Recovering data from Docker volumes

/ in dev, ops

After a fresh install of my operating system and after copying over my Docker backup, I discovered a gigantic file called Docker.raw, which stores all of the containers, images, and volumes. In my case, it was a whopping 86+ GB! 😳 Sure, storage is relatively cheap, but that’s needlessly wasteful, and on a laptop it’s a bit ridiculous. I now believe this was a result of having used Docker since an old version and that the problem may have since been solved, but at least on Docker for Mac, the raw file cannot be shrunk while retaining the data. This means that before you can get rid of the file, you first have to get any data you want to keep out of your volumes.

While figuring out how I was going to get the data out of my volumes, I was poking around my Docker setup and noticed that I had a lot of extra volumes. I didn’t know whether the data inside was important, so I wanted to find out more about them to see if it needed to be properly backed up. Searching the web, I couldn’t find much documentation on viewing volumes and getting data out of them for backup and restore when migrating between machines. I’m sharing this in the hope that it helps someone else.

First, let’s start by seeing how much space Docker resources are using on your system:
$ docker system df

Now, let’s take a look at the dangling volumes, the ones that no container references:
$ docker volume ls -qf dangling=true

You can inspect a Docker volume like below:
$ docker volume inspect $VOLUME
Note: replace $VOLUME with the name or ID of the volume you want to use.
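If you only want the path Docker reports for each volume, the inspect output can be narrowed with a Go template. This is a minimal sketch; the function name show_mountpoints is made up for illustration, and it assumes you want to look at every dangling volume:

```shell
# Hypothetical helper: print "name -> mountpoint" for every dangling volume.
show_mountpoints() {
  for volume in $(docker volume ls -qf dangling=true); do
    # --format takes a Go template; Mountpoint is a field of the inspect output.
    printf '%s -> %s\n' "$volume" \
      "$(docker volume inspect --format '{{ .Mountpoint }}' "$volume")"
  done
}
```

Calling show_mountpoints prints one line per volume.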

I discovered after restoring my Docker setup from my previous machine that this command’s info was largely useless. The reason is that the paths in the command output were actually not even on the system. While this could be some kind of filesystem abstraction that I’m not aware of, the information essentially was just not helpful in recovering any data, at least for me.

At this point, I began to get frustrated, thinking that I was just going to have to lose whatever data may be stored in these volumes and hope that I could live with it, but it later dawned on me💡 that I could use the built-in Docker commands to solve this problem.

Since the volumes were originally mounted into Docker containers, I first wanted to see what the data looked like on the filesystem to know what it was used for. We can do that by mounting the volume into a container running a Linux image and using some shell commands.

To dig around your volume filesystem you can tinker in the shell (pick your image preference):
BusyBox => $ docker run --rm -it -v $VOLUME:/data busybox
Ubuntu => $ docker run --rm -it -v $VOLUME:/data ubuntu bash
Note: replace $VOLUME with the name or ID of the volume you want to use.

For those who are unfamiliar with low-level Docker usage or who primarily use Docker Compose, I want to break this down a bit. The first part is a pretty common workflow: docker run tells Docker to start a container from an image, in this case BusyBox, a lightweight Linux image. The -v $VOLUME:/data argument mounts the volume at /data inside the container. The -it argument tells Docker we want an interactive terminal session; BusyBox drops us into its default sh shell, while the Ubuntu command asks for Bash explicitly. The best part is --rm, which tells Docker to remove the container once we exit the shell, so we don’t have to clean up ourselves once we are done looking at the filesystem.
Note: while you could use another lightweight flavor of Linux like Alpine, such images often leave out many useful commands. That is great for production but makes inspecting volumes more difficult. If you need more power and flexibility, you can always use Ubuntu.

Try listing the directory contents of a volume like this (pick your image preference):
BusyBox => $ docker run --rm -v $VOLUME:/data busybox ls -la /data
Ubuntu => $ docker run --rm -v $VOLUME:/data ubuntu bash -c "ls -la /data"
Note: replace $VOLUME with the name or ID of the volume you want to use.

While it’s similar to the previous command, the main difference is that we don’t open a shell, so it’s much quicker to get a listing of the volume’s top-level directory, which is useful if you want to automate this process with scripts.
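Because the listing command needs no interactive shell, it fits naturally in a loop. Here is a rough sketch that walks every dangling volume. The function name list_dangling_volumes is my own invention, and the :ro suffix (a read-only mount) is an extra precaution that the commands above do not use:

```shell
# Hypothetical helper: list the top level of every dangling volume.
list_dangling_volumes() {
  for volume in $(docker volume ls -qf dangling=true); do
    echo "== $volume =="
    # Mount read-only (:ro) so that inspecting cannot modify the data.
    docker run --rm -v "$volume":/data:ro busybox ls -la /data
  done
}
```

Run list_dangling_volumes and skim the output for anything that looks worth keeping.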

Another useful example shows a tree of the volume’s filesystem:
$ docker run --rm -it -v $VOLUME:/data iankoulski/tree /data

Now that the contents of the volume are figured out, we simply need to recreate the environment in which they were used, with images from Docker Hub, and then connect and use the appropriate commands to complete the export.

Try creating a compressed tarball archive of the volume’s filesystem and saving it to the current directory:
$ docker run --rm -v $VOLUME:/data -v $(pwd):/backup busybox tar -zcvf /backup/data.tar.gz /data
Note: replace $VOLUME with the name or ID of the volume you want to use.
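The reverse direction, restoring that archive into a volume on the new machine, can be sketched the same way. This assumes data.tar.gz sits in the current directory and was created by the command above; the function name restore_volume is hypothetical:

```shell
# Hypothetical helper: unpack ./data.tar.gz into the named volume.
restore_volume() {
  volume="$1"
  # tar stored the members as data/..., so extracting at / recreates /data,
  # which is where the volume is mounted.
  docker run --rm -v "$volume":/data -v "$(pwd)":/backup busybox \
    tar -xzvf /backup/data.tar.gz -C /
}
```

For example, restore_volume my_restored_volume fills a volume of that name, and Docker creates the volume first if it does not exist yet.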

If you want to know whether the volume is a data store, look at its files and directories and compare them to the standard layout of each data store. In the case of MySQL, there should be mysql, sys, and performance_schema directories; for Redis, you’ll find a single dump.rdb file.
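That comparison can be roughed out as a script. The sketch below only knows the two layouts mentioned above and reports everything else as unknown; the function names has_entry and guess_volume_type are made up for illustration:

```shell
# Hypothetical helper: succeed if the newline-separated list $1 contains the exact name $2.
has_entry() {
  printf '%s\n' "$1" | grep -qxF -- "$2"
}

# Hypothetical helper: guess what a volume holds from its top-level names.
guess_volume_type() {
  # -1 prints one name per line, -A includes dotfiles; :ro mounts read-only.
  names="$(docker run --rm -v "$1":/data:ro busybox ls -1A /data)"
  if has_entry "$names" mysql && has_entry "$names" sys \
      && has_entry "$names" performance_schema; then
    echo mysql
  elif has_entry "$names" dump.rdb; then
    echo redis
  else
    echo unknown
  fi
}
```

Treat the result as a hint rather than proof, since other software could use the same file names.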

When I found a volume that used to belong to a container storing a MySQL database, I wanted to use the MySQL image and connect to it with my GUI and then export the database contents. Here’s an example of how you would do just that by forwarding the ports from the container to your host machine, allowing you to easily connect to it:
$ docker run -d -p 3306:3306 -v $VOLUME:/var/lib/mysql mysql:5.7
Note: replace $VOLUME with the name or ID of the volume you want to use.
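If you would rather skip the GUI, the export can also be done with mysqldump from inside the running container. This is a sketch under a few assumptions: the container name or ID comes from docker ps, you still know the root password that was set when the database was first created, and the function name dump_all_databases is hypothetical:

```shell
# Hypothetical helper: dump every database from a running MySQL container.
dump_all_databases() {
  container="$1"  # container name or ID, for example from `docker ps`
  password="$2"   # root password chosen when the data directory was first created
  # mysqldump runs inside the container; the redirect writes the file on the host.
  docker exec "$container" mysqldump --all-databases -uroot -p"$password" \
    > all-databases.sql
}
```

The resulting all-databases.sql lands in the current directory on the host.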

From here you’ll be able to retrieve your data from your volumes. If this helped you out or you found a better way to do this, let me know on Twitter! 🎉

Taking Docker to the Next Level

/ in dev, ops

Last year I dove into Vagrant and Chef to set up developer environments. For a while now I’ve been trying to wrap my head around Docker and why people are raving about it in the devops world, so I decided to try it more.

Why Docker

Docker is a very powerful tool for spinning up isolated “containers”, which are similar to virtual machines except that they aren’t. They are built and run as developers choose, and every step inside a build file creates an image layer that can be used as a starting point for another image.

What does that mean? Well, say you have 3 steps to set up a simple WordPress server.

Continue reading…

Getting Started in Open Source

/ in dev, open source

Open source is so ubiquitous nowadays that it’s inevitable you are benefiting from it every day, even if you don’t know it, just by using applications on the desktop, web, or mobile. The applications you use most likely have dependencies that are open source. If you’re on Mac or Linux, the Unix-like kernel that powers them is open source.

Many people, including myself, feel encouraged to contribute to projects that we use and benefit from because it helps the ecosystem as a whole. The entire ecosystem grows exponentially as more people contribute and work together. It’s a powerful thing.

Continue reading…

Design for Programmers

/ in design

Getting Started

I am not the best designer, but I feel I am able to distinguish good design from bad. I like to keep up with the design trends of the web and be active in the community.

Resources

Frameworks

One way to build out a design is by wireframing with frameworks as a base layer and adding your custom design on top.

Continue reading…

The Future of Hosting

/ in dev, hosting, ops

The web hosting industry is rapidly changing. In the early days you had traditional hosting, but now the web is seeing a shift towards scalable PaaS and managed web hosting.

Multiple software versions

With traditional hosting, you were stuck with whatever version the server was running by your provider. If they were running PHP 4 and you wanted to build a PHP 5 app, you were just out of luck.

With cloud offerings you can generally choose between different versions of the software, whether that’s PHP or MySQL, instead of a single version that fits all.

Continue reading…