Recovering data from Docker volumes


After a fresh install of my operating system and after copying over my Docker backup, I discovered a gigantic file called Docker.raw, the disk image in which Docker for Mac stores all of its containers, images, and volumes. In my case, it was a whopping 86+ GB! 😳 Sure, storage is relatively cheap, but that's needlessly wasteful, and on a laptop it's a bit ridiculous. I now believe this was a result of having used Docker since an older version and that the problem may have since been solved, but at least with Docker for Mac, the raw file cannot be shrunk while retaining its data. This means that if you want to reclaim the space without losing anything, you first have to get any data you want to keep out of your volumes before you can get rid of the file.

While figuring out how to get the data out of my volumes, I was poking around my Docker setup and noticed that I had a lot of extra volumes. I didn't know whether the data inside them was important, so I wanted to learn more about them to decide whether they needed to be properly backed up. Searching the web, I couldn't find much documentation on viewing volumes and getting data out of them for backing up and restoring between machines, so I'm sharing what I learned in the hope that it helps someone else.

First, let's start by seeing how much space Docker's resources are using on your system:
$ docker system df

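Now, let's take a look at the volumes. This command lists only dangling volumes (volumes not referenced by any container) and, thanks to -q, prints just their names (run docker volume ls without flags to see all of them):
$ docker volume ls -qf dangling=true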
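If you have a lot of dangling volumes, it helps to know which ones actually hold a meaningful amount of data before digging into them. Here's a minimal sketch (the function name volume_sizes is hypothetical) that loops over the dangling volumes and runs du inside a throwaway BusyBox container for each one. It assumes a POSIX shell and that the busybox image can be pulled.

```shell
# Hypothetical helper: print "<volume><TAB><size>" for every dangling volume.
# Assumes the busybox image can be pulled (it is tiny).
volume_sizes() {
  docker volume ls -qf dangling=true | while read -r vol; do
    # du -sh prints "<size><TAB>/data"; cut keeps only the size column.
    # </dev/null stops docker from reading the remaining names from the pipe.
    size=$(docker run --rm -v "$vol":/data busybox du -sh /data </dev/null | cut -f1)
    printf '%s\t%s\n' "$vol" "$size"
  done
}
```

Paste it into your shell and run $ volume_sizes; the biggest volumes are the ones worth looking at first. docker system df -v also reports a size per volume, but a loop like this is easy to adapt, for example by dropping the dangling=true filter.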

You can inspect a Docker volume like this:
$ docker volume inspect $VOLUME
Note: replace $VOLUME with the name of the volume you want to inspect (as shown by docker volume ls).

After restoring my Docker setup from my previous machine, I discovered that this command's output was largely useless. The Mountpoint paths it reports don't exist on the macOS filesystem: with Docker for Mac, they live inside the Linux virtual machine that runs Docker (the one backed by Docker.raw). So, at least for me, the information was no help in recovering any data.
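Even though the Mountpoint isn't usable from the host, the rest of the inspect output can still hint at where a volume came from. This sketch (describe_volume is a hypothetical name) prints the volume's creation time and labels using Go template formatting, then lists any containers that still reference it. Volumes created by Docker Compose usually carry labels such as com.docker.compose.project, which tells you which project they belonged to; dangling volumes, by definition, will show no containers.

```shell
# Hypothetical helper: show when a volume was created, its labels, and any
# containers (running or stopped) that still reference it.
describe_volume() {
  local vol="$1"
  docker volume inspect --format 'created: {{.CreatedAt}}  labels: {{json .Labels}}' "$vol"
  echo "containers using $vol:"
  docker ps -a --filter volume="$vol" --format '  {{.Names}} ({{.Image}})'
}
```

Usage: $ describe_volume $VOLUME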

At this point, I was getting frustrated and thought I would just have to lose whatever data might be stored in these volumes and live with it, but it later dawned on me 💡 that I could use Docker's own built-in commands to solve this problem.

Since the volumes were originally mounted into Docker containers, I first wanted to see what the data looked like on the filesystem to figure out what it was used for. We can do that by starting a container from a Linux image with the volume mounted into it and using a few shell commands.

To dig around your volume's filesystem, you can open a shell inside such a container (pick your image preference):
BusyBox => $ docker run --rm -it -v $VOLUME:/data busybox
Ubuntu => $ docker run --rm -it -v $VOLUME:/data ubuntu bash
Note: replace $VOLUME with the name of the volume you want to use.

For those who are unfamiliar with low-level Docker usage or primarily use Docker Compose, I want to break this down a bit. docker run starts a new container from an image, in this case BusyBox (a tiny image that bundles many common Unix utilities) or Ubuntu. The -v $VOLUME:/data argument mounts the volume at /data inside the container, so its files show up there. The -it arguments (-i keeps standard input open and -t allocates a terminal) give us an interactive shell: sh for BusyBox, bash for Ubuntu. The best part is --rm, which tells Docker to remove the container once we exit the shell, so we don't have to clean it up ourselves when we're done looking at the filesystem.
Note: minimal images like BusyBox or Alpine have a small footprint, which is great for production, but they often lack many useful binaries, which can make inspecting volumes more difficult. If you need more power and flexibility, you can always use Ubuntu.

Try listing the directory contents of a volume like this (pick your image preference):
BusyBox => $ docker run --rm -v $VOLUME:/data busybox ls -la /data
Ubuntu => $ docker run --rm -v $VOLUME:/data ubuntu bash -c "ls -la /data"
Note: replace $VOLUME with the name of the volume you want to use.

This is similar to the previous command, but instead of opening an interactive shell (note that -it is gone), it runs a single command and exits. That makes it a much quicker way to list the top-level contents of the volume, and it's useful if you want to automate this process with scripts.
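As an example of that kind of automation, here's a minimal sketch (the function name list_all_volumes is hypothetical) that runs the BusyBox listing for every dangling volume in one go, printing a header before each one. It assumes a POSIX shell and the busybox image.

```shell
# Hypothetical helper: print the top-level contents of every dangling volume,
# one section per volume, without opening an interactive shell.
list_all_volumes() {
  docker volume ls -qf dangling=true | while read -r vol; do
    echo "== $vol"
    # </dev/null keeps docker from reading the remaining volume names.
    docker run --rm -v "$vol":/data busybox ls -la /data </dev/null
  done
}
```

Redirecting the output to a file, e.g. $ list_all_volumes > volumes.txt, gives you a single report you can skim to decide which volumes are worth keeping.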

Another useful example shows a tree of the volume's filesystem using the iankoulski/tree image:
$ docker run --rm -it -v $VOLUME:/data iankoulski/tree /data
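If you'd rather not pull a third-party image, BusyBox's find can give a rough approximation. This is a sketch, not a real tree view: the function name volume_tree and the default depth of 2 are arbitrary choices.

```shell
# Hypothetical helper: approximate a tree view using only BusyBox's find,
# sorted so paths roughly group under their parent directories.
# Usage: volume_tree <volume> [depth]  (depth defaults to 2)
volume_tree() {
  docker run --rm -v "$1":/data busybox find /data -maxdepth "${2:-2}" | sort
}
```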

Once you've figured out what the volume contains, you need to recreate the environment it was used in, using images from Docker Hub, and then connect to it and use the appropriate commands to export the data.

Try creating a compressed tarball of the volume's filesystem and saving it to the current directory:
$ docker run --rm -v $VOLUME:/data -v "$(pwd)":/backup busybox tar -zcvf /backup/data.tar.gz /data
Note: replace $VOLUME with the name of the volume you want to use. The second -v mounts your current directory at /backup, which is where the archive is written.
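Since the whole point is moving data between machines, it helps to archive every volume at once and to have a matching restore step. The sketch below (function names hypothetical) wraps the tar command in a loop over the dangling volumes and adds a restore function that creates a volume with docker volume create and unpacks an archive into it. Unlike the one-off command, it archives with -C /data . so paths inside the archive are relative to the volume root. On Docker for Mac, the backup directory has to be in a location Docker shares with containers (/Users is shared by default).

```shell
# Hypothetical helpers built on the tar command above.
# backup_volumes [dir]: archive every dangling volume to <dir>/<volume>.tar.gz
backup_volumes() {
  local dir="${1:-.}"
  mkdir -p "$dir" || return 1
  # Resolve to an absolute path, since a bind mount's host side should be absolute.
  dir=$(cd "$dir" && pwd) || return 1
  docker volume ls -qf dangling=true | while read -r vol; do
    # -C /data . stores paths relative to the volume root instead of data/...
    docker run --rm -v "$vol":/data -v "$dir":/backup busybox \
      tar -czf "/backup/$vol.tar.gz" -C /data . </dev/null
  done
}

# restore_volume <volume> <archive>: create the volume and unpack the archive into it
restore_volume() {
  local vol="$1" archive="$2" dir
  dir=$(cd "$(dirname "$archive")" && pwd) || return 1
  docker volume create "$vol" >/dev/null
  docker run --rm -v "$vol":/data -v "$dir":/backup busybox \
    tar -xzf "/backup/$(basename "$archive")" -C /data
}
```

On the old machine you'd run $ backup_volumes ~/docker-volume-backups, copy that directory over, and on the new machine run $ restore_volume my_volume ~/docker-volume-backups/my_volume.tar.gz (my_volume being whichever volume name you want to recreate).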

To find out whether a volume holds a data store, look at its files and directories and compare them to that software's standard layout. For MySQL, there should be mysql, sys, and performance_schema directories; for Redis, you'll typically find a single dump.rdb file.
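Those conventions are easy to check automatically. Here's a minimal sketch (both function names are hypothetical) that lists a volume's top-level entries and applies only the two rules above, reporting anything else as unknown; you'd extend the checks for whatever other software you run.

```shell
# Hypothetical helpers: guess what kind of data store a volume holds from
# the names of its top-level entries.
classify_entries() {
  # Reads one entry name per line on stdin.
  local entries
  entries=$(cat)
  if printf '%s\n' "$entries" | grep -qxF 'mysql' &&
     printf '%s\n' "$entries" | grep -qxF 'performance_schema'; then
    echo mysql
  elif printf '%s\n' "$entries" | grep -qxF 'dump.rdb'; then
    echo redis
  else
    echo unknown
  fi
}

identify_volume() {
  docker run --rm -v "$1":/data busybox ls -1 /data </dev/null | classify_entries
}
```

Usage: $ identify_volume $VOLUME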

When I found a volume that used to belong to a container storing a MySQL database, I wanted to run it with the MySQL image, connect to it with my GUI client, and export the database contents. Here's how you can do that, publishing the container's port on your host machine so you can easily connect to it:
$ docker run -d -p 3306:3306 -v $VOLUME:/var/lib/mysql mysql:5.7
Note: replace $VOLUME with the name of the volume you want to use. Because the volume already contains a database, MySQL starts with the users and passwords from your old setup, so connect with those credentials.
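If you'd rather skip the GUI, the same kind of container can produce a plain SQL dump with mysqldump via docker exec, which doesn't need the port to be published at all. This sketch is hypothetical: it assumes the root account and the password it had on your old machine, and that mysql:5.7 is the version that originally wrote the data (a different major version may fail to start on old data files or upgrade them in place, so match versions when you can). The function name and the wait of 30 attempts, 2 seconds apart, are arbitrary choices.

```shell
# Hypothetical helper: start a throwaway MySQL container on the old volume,
# wait for the server, dump every database to a file, then remove the container.
# Usage: dump_mysql_volume <volume> <output.sql> <old-root-password>
dump_mysql_volume() {
  local vol="$1" out="$2" pass="$3" cid tries=0
  cid=$(docker run -d -v "$vol":/var/lib/mysql mysql:5.7) || return 1
  # mysqladmin ping exits 0 once the server is running (even if login is refused).
  until docker exec "$cid" mysqladmin ping --silent >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge 30 ]; then
      docker rm -f "$cid" >/dev/null
      return 1
    fi
    sleep 2
  done
  # Pass the password through the MYSQL_PWD environment variable, which the
  # MySQL 5.7 client reads, instead of on the mysqldump command line.
  docker exec -e MYSQL_PWD="$pass" "$cid" mysqldump -uroot --all-databases > "$out"
  local status=$?
  docker rm -f "$cid" >/dev/null
  return "$status"
}
```

Usage: $ dump_mysql_volume $VOLUME backup.sql 'old-root-password'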

From here, you'll be able to retrieve the data from your volumes. If this helped you out or you found a better way to do this, let me know on Twitter! 🎉