Restoring MongoDB from a Dump File in AWS S3

Hi everyone!

It’s been a long time already since my last blog post! *cue music: “It’s been a long time without you, my friend!”* Haha. :))

Life has been pretty fast and busy lately, wooo, but fun nonetheless! I actually just got back from a family vacation in Palawan and it was super nice! Clear waters, sunny skies, fresh air, yummy seafood, and crisp waves humming in one’s ear. All my favorite elements combined!

Woooo, so since today is #backtowork day, I started it by preparing a golden image for our QA database.

Backing up one of our databases wasn’t as tedious before (it would already complete after an hour or so). But due to some major changes in data collection and recording, that database became huge, which also made restoring it take a while.

Because of this, preparing the testing database became one of the challenges during our last QA testing session. I started restoring the database at 6 pm and it was still creating indices at 3 am. So from now on, I plan to regularly create a golden database image for QA (maybe twice a month) and use it for QA testing sessions.

So there, sorry for the long introduction! In this blog post, we’ll walk through the steps of creating a golden image for your MongoDB database: pulling your dump from AWS S3 and setting it up in your AWS EC2 instance. 🙂

My setup includes:

  • Mongo Database
  • Database Dump in S3
  • AWS EC2 Instances

We can divide the whole process into 5 parts:

  1. Preparing the AWS EC2 Instance
  2. Copying the Dump from S3
  3. Mounting AWS EBS storage
  4. Preparing the Copied MongoDB Dump
  5. Restoring the Copied MongoDB Dump

Before we start, let us begin with the following quote:

TMUX is always a great idea!

Oftentimes, we get disconnected from our SSH sessions, sometimes unfortunately with a process still running. Oftentimes too, we want to get back to whatever our workspace was. For this purpose, we can use tools like tmux or GNU screen that provide session management (along with other awesome features like screen multiplexing, etc.).
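Here’s a minimal tmux sketch of what that looks like in practice (the session name “restore” is just an example):

$ tmux new -s restore      # start a named session and run the long process inside it
$ tmux attach -t restore   # reattach to that session after getting disconnected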

I. Preparing the AWS EC2 Instance

For the first part, we will be preparing the AWS EC2 instance where we will run Mongo and restore our database.

A. Provisioning the AWS EC2 Instance

For this, I used an Ubuntu 14.04 server,

[Screenshot: choosing the Ubuntu Server 14.04 AMI]

and provisioned it with 72 GB of main memory and an additional 100 GB EBS volume. These sizes may be too big or too small for your setup; feel free to change them to whatever suits you best.

[Screenshot: instance configuration with the additional 100 GB EBS volume]
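By the way, if you prefer provisioning from the command line rather than the console, a rough AWS CLI equivalent looks like the sketch below. The AMI ID, key pair name, and instance type are placeholders here; use whatever matches your own account, region, and memory needs. The extra 100 GB EBS volume is declared in the block device mapping.

$ aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type r3.2xlarge \
    --key-name my-key-pair \
    --block-device-mappings '[{"DeviceName":"/dev/sdb","Ebs":{"VolumeSize":100,"VolumeType":"gp2"}}]' \
    --region us-west-2

(On Ubuntu, /dev/sdb usually shows up inside the instance as /dev/xvdb, which is the device name we’ll use in Part III.)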

B. Installing MongoDB

i. Import MongoDB public key
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
ii. Generate a file with the MongoDB repository URL
$ echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list
iii. Refresh and update packages
$ sudo apt-get update
iv. Install MongoDB
$ sudo apt-get install -y mongodb-org
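
As a quick sanity check (not strictly part of the setup), you can confirm that the binaries we’ll need later are installed:

$ mongod --version
$ mongorestore --version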

C. Operating MongoDB

Here are some useful commands for operating MongoDB.

i. Starting Mongo:
$ sudo service mongod start
ii. Checking If It Is Running:
$ tail -n 500 /var/log/mongodb/mongod.log

You should see something like:

[initandlisten] waiting for connections on port 27017
iii. Stopping Mongo
$ sudo service mongod stop
iv. Restarting Mongo
$ sudo service mongod restart
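
If you want to confirm that mongod is not only started but actually accepting connections, a quick ping from the mongo shell works too (an optional sanity check):

$ mongo --eval "printjson(db.runCommand({ ping: 1 }))"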

II. Copying the Dump from AWS S3

If your dump in S3 is publicly available, go ahead and use wget with the URL that S3 provides for your file. But in case its security settings allow it to be viewable only from certain accounts, you can use the AWS CLI to copy it from S3.
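For the public case, the wget command would look roughly like this (the URL below is just a placeholder for whatever link S3 gives you):

$ wget https://s3-us-west-2.amazonaws.com/bucket-name/path/to/file/dump-filename.tar.gz

For the private case, here are the AWS CLI steps: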

i. Install AWS CLI
$ sudo apt-get install awscli
ii. Configure your Credentials
$ aws configure
iii. Execute the Copy Command

* Feel free to change the region to the one where your bucket is located.

$ aws s3 cp s3://bucket-name/path/to/file/filename /desired/destination/path --region us-west-2
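
Once the copy finishes, it’s worth a quick check that the file actually arrived and roughly matches the size reported in S3 (the paths below just reuse the placeholders from the copy command):

$ aws s3 ls s3://bucket-name/path/to/file/filename --region us-west-2
$ ls -lh /desired/destination/path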


III. Mounting AWS EBS Storage

In Part I, we provisioned our EC2 instance with 100 GB of EBS storage; now it’s time to mount it to make it usable.

We first want to see a summary of available and used disk space in our file system:

$ df -h

We can see that our 100 GB volume is not yet part of this summary. Listing all block devices with:

$ lsblk

We get:

[Screenshot: lsblk output showing the 100 GB volume (xvdb) with no mount point]

Since this is a new EBS volume, it doesn’t have a file system on it yet, so we proceed to create one and mount the volume:

i. Check and Create File System
$ sudo file -s /dev/xvdb
$ sudo mkfs -t ext4 /dev/xvdb
ii. Create, Mount, Prepare Directory
$ sudo mkdir /data
$ sudo mount /dev/xvdb /data
$ cd /data
$ sudo chmod 777 .
$ sudo chown ubuntu:ubuntu -R .
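
To double-check that the volume is mounted and formatted as expected, you can run the checks below; the second command should now report an ext4 file system instead of just “data”:

$ df -h /data
$ sudo file -s /dev/xvdb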

For an in-depth tutorial on attaching EBS volumes, you may check my other blog post: Amazon EBS: Detachable Persistent Data Storage.

IV. Preparing the Copied MongoDB Dump

Once you have downloaded your dump from S3, it is most likely compressed to save space. In that case, you need to uncompress it first.

If your dump file has a .tar extension, you can untar it by:

$ tar -xvf /path/to/dump/dump-filename.tar

On the other hand, if your dump file has a .tar.gz extension, you can untar-gz it by:

$ tar xvzf /path/to/dump/dump-filename.tar.gz -C desired/destination/path/name

Continue un-tarring and unzipping your files if the main dump file contains nested compressed resources.
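
If you’re not sure what’s inside, you can peek at the archive’s contents first, and then unzip any nested .gz files in one go. The /data/dump path below is just an example of wherever you extracted the archive to:

$ tar -tzf /path/to/dump/dump-filename.tar.gz | head
$ find /data/dump -name '*.gz' -exec gunzip {} \;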

V. Restoring the Copied MongoDB Dump

$ export LC_ALL="en_US.UTF-8"
$ mongorestore --drop --host localhost --db db_name_here path/to/the/copied/dump/filename
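
Once mongorestore finishes (index creation at the end can take a while, as I mentioned earlier), you can do a quick check that the collections made it in. db_name_here is the same placeholder used in the restore command:

$ mongo db_name_here --eval "printjson(db.getCollectionNames())"
$ mongo db_name_here --eval "printjson(db.stats())"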

If you were working inside tmux and got disconnected along the way, you can get back to your previous workspace with:

$ tmux attach


So there, a really quick tutorial on how we can get our Mongo dumps and databases up and running. 🙂

One thought on “Restoring MongoDB from a Dump File in AWS S3”

  1. Hi!
    Starting from Mongo 3.2 you may find stream archiving useful for both backup and restore, teamed with awscli s3 stream support:

    mongodump --oplog --gzip --archive | aws s3 cp - s3://bucket-name/dump-date.tar.gz
    aws s3 cp s3://bucket-name/dump-date.tar.gz - | mongorestore --gzip --archive --drop --host localhost

    For low impact backups, I also like using this together with nice/ionice/pipeviewer in the mix.

    HTH
    Bye.

