Restoring MongoDB from a Dump File in AWS S3

Hi everyone!

It’s been a long time since my last blog post! *cue music: “It’s been a long time without you, my friend!”* Haha. :))

Life has been pretty fast and busy lately, wooo, but fun nonetheless! I just got back from a family vacation in Palawan and it was super nice! Clear waters, sunny skies, fresh air, yummy seafood, and crisp waves humming in one’s ear. All my favorite elements combined!

Woooo, so since today is #backtowork day, I started it by preparing a golden image for our QA database.

Backing up one of our databases wasn’t as tedious before (it used to finish in an hour or so). But due to some major changes in data collection and recording, that database became huge, which also made restoring it take a while.

Due to this, preparing the testing database became one of the challenges during our last QA testing session. I started restoring the database at 6 pm and it was still creating indices at 3 am. Because of this, I plan to create a golden database image regularly (maybe twice every month) and use it for QA testing sessions.

So there, sorry for the long introduction! In this blog post, we’ll walk through the steps for creating a golden image of your MongoDB database: pulling your dump from AWS S3 and setting it up on your AWS EC2 instance. 🙂

My setup includes:

  • Mongo Database
  • Database Dump in S3
  • AWS EC2 Instances.

We can divide the whole process into 5 parts:

  1. Preparing the AWS EC2 Instance
  2. Copying the Dump from S3
  3. Mounting AWS EBS storage
  4. Preparing the Copied MongoDB Dump
  5. Restoring the Copied MongoDB Dump

Before we start, let us start with the following quote:

TMUX is always a great idea!

Oftentimes, we get disconnected from our SSH connections, and sometimes, unfortunately, while a process is still running. Oftentimes too, we want to get back to whatever our workspace was. For this purpose, we can use tools like tmux or GNU screen, which provide session management (along with other awesome features like screen multiplexing, etc).
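
As a small sketch of that idea, the helper below starts a named tmux session, or re-attaches to it if it already exists. The session name mongo-restore and the function name open_workspace are just placeholders of mine, and the -A flag assumes tmux 1.8 or newer.

```shell
# Hypothetical session name; pick anything you like.
SESSION_NAME="${SESSION_NAME:-mongo-restore}"

# Create the session, or attach to it if it already exists (-A).
# Run the long jobs (download, restore) inside it so they survive an SSH drop.
open_workspace() {
  tmux new-session -A -s "$SESSION_NAME"
}
```

After a reconnect, calling open_workspace again should bring you back to the same shell, with whatever was running still running.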

I. Preparing the AWS EC2 Instance

For the first part, we will prepare the AWS EC2 instance that will run Mongo and that we will restore our database to.

A. Provisioning the AWS EC2 Instance

For this, I used an Ubuntu 14.04 server and provisioned it with 72 GB for the main memory and an additional 100 GB EBS volume. These sizes may be too big or too small for your setup, so feel free to change them to whatever suits you best.

B. Installing MongoDB

i. Import MongoDB public key
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
ii. Generate a file with the MongoDB repository URL
$ echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list
iii. Refresh and update packages
$ sudo apt-get update
iv. Install MongoDB
$ sudo apt-get install -y mongodb-org

C. Operating MongoDB

Here are some useful commands on operating MongoDB.

i. Starting Mongo:
$ sudo service mongod start
ii. Checking If It is Running:
$ tail -n 500 /var/log/mongodb/mongod.log

You should see something like:

[initandlisten] waiting for connections on port 27017
iii. Stopping Mongo
$ sudo service mongod stop
iv. Restarting Mongo
$ sudo service mongod restart

II. Copying the Dump from AWS S3

If your dump in S3 is publicly available, go ahead and use wget with the URL that S3 provides for your file. But if its security settings make it viewable only from certain accounts, you can use the AWS CLI to copy it from S3.

i. Install AWS CLI
$ sudo apt-get install awscli
ii. Configure your Credentials
$ aws configure
iii. Execute the Copy Command

* Feel free to change the region to the region where your bucket is

$ aws s3 cp s3://bucket-name/path/to/file/filename /desired/destination/path --region us-west-2
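
Since a dump like this can be huge, it may be worth checking that the copy actually produced a non-empty file before moving on. Here is a minimal sketch that wraps the same aws s3 cp command; the function name fetch_dump is hypothetical, and the region defaults to the us-west-2 used above.

```shell
# Copy one object from S3 and fail loudly if the local file is missing or empty.
# Arguments: S3 URL, local destination file, region (optional).
fetch_dump() {
  local src="$1"
  local dest="$2"
  local region="${3:-us-west-2}"

  aws s3 cp "$src" "$dest" --region "$region" || return 1

  # -s is true only when the file exists and its size is greater than zero.
  if [ ! -s "$dest" ]; then
    echo "download finished but $dest is empty" >&2
    return 1
  fi
  echo "saved $dest"
}
```

It is called the same way as the raw command, for example: fetch_dump s3://bucket-name/path/to/file/filename /desired/destination/path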

 

III. Mounting AWS EBS Storage

In part I, we provisioned our EC2 instance with 100 GB of EBS storage. Now it’s time to mount it in our EC2 instance to make it usable.

We first want to see a summary of available and used disk space in our file system:

$ df -h

We can see that our 100 GB is still not part of this summary. Listing all block devices with:

$ lsblk

We get a listing in which the new volume appears as xvdb, with no mount point yet.

Since this is a new EBS volume, it does not have a file system yet, so we proceed to create one and then mount the volume:

i. Check and Create File System
$ sudo file -s /dev/xvdb
$ sudo mkfs -t ext4 /dev/xvdb
ii. Create, Mount, Prepare Directory
$ sudo mkdir /data
$ sudo mount /dev/xvdb /data
$ cd /data
$ sudo chmod 777 .
$ sudo chown ubuntu:ubuntu -R .
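
One thing to be careful about: mkfs wipes whatever is on the device, so it is safer to format only when file -s reports plain "data" (meaning no file system). Below is a rough sketch of that check, using the same device and mount point as above; the function names is_blank_volume and prepare_volume are made up for illustration.

```shell
# Decide from `file -s` output whether a device still lacks a file system.
# For a blank volume, `file -s` prints "<device>: data".
is_blank_volume() {
  case "$1" in
    *": data") return 0 ;;
    *)         return 1 ;;
  esac
}

# Format only when the volume is blank, then mount it.
prepare_volume() {
  local device="${1:-/dev/xvdb}"
  local mount_point="${2:-/data}"

  if is_blank_volume "$(sudo file -s "$device")"; then
    sudo mkfs -t ext4 "$device"
  fi
  sudo mkdir -p "$mount_point"
  sudo mount "$device" "$mount_point"
}
```

This way, re-running the steps on a volume that already holds data would skip the formatting and go straight to mounting.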

For an in-depth tutorial on attaching EBS volumes, you may check my other blog post: Amazon EBS: Detachable Persistent Data Storage.

IV. Preparing the Copied MongoDB Dump

Once you have downloaded your dump from S3, it is most likely archived and compressed to save space. In that case, you need to extract it.

If your dump file has a .tar extension, you can untar it by:

$ tar -xvf /path/to/dump/dump-filename.tar

On the other hand, if your dump file has a .tar.gz extension, you can extract it into a directory of your choice by:

$ tar xvzf /path/to/dump/dump-filename.tar.gz -C desired/destination/path/name

Continue un-tarring and unzipping your files if the main dump file contains nested compressed resources.
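
The two cases above can be folded into one small helper that picks the tar flags from the file extension. This is only a sketch; the name extract_dump is hypothetical, and it handles just the .tar and .tar.gz (or .tgz) cases.

```shell
# Pick tar flags from the archive extension and extract into a destination directory.
# Arguments: archive path, destination directory (defaults to the current one).
extract_dump() {
  local archive="$1"
  local dest="${2:-.}"

  mkdir -p "$dest"
  case "$archive" in
    *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" ;;
    *.tar)          tar -xf  "$archive" -C "$dest" ;;
    *)
      echo "unsupported archive type: $archive" >&2
      return 1
      ;;
  esac
}
```

For nested archives, you would call it again on each inner file.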

V. Restoring the Copied MongoDB Dump

$ export LC_ALL="en_US.UTF-8"
$ mongorestore --drop --host localhost --db db_name_here path/to/the/copied/dump/filename

If you are in tmux and you get disconnected, you can get back to your previous workspace with:

$ tmux attach

 

So there, a really quick and short tutorial on how we can get our Mongo Dumps and Databases up and running. 🙂

PostgreSQL 101: Getting Started! (Part 1)

PostgreSQL

An object-relational database system

I. Installation

A. Mac OS X

brew install postgresql

B. Ubuntu

sudo apt-get update
sudo apt-get install postgresql postgresql-contrib

II. Console Commands

A. Connecting to PostgreSQL Server

To connect to the PostgreSQL server as user postgres:

psql -U postgres

By default, psql connects to a PostgreSQL server running on localhost at port 5432. To connect to a different port and/or host, add the -p and -h flags:

psql -U postgres -p 12345 -h 192.32.123.32

Once in, you may navigate via the following commands:

  • \l – list databases
  • \c – change databases
  • \d – list tables
  • \df – list functions
  • \df+ – list functions with definitions
  • \q – quit
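
If you only need to run one statement and leave, psql can also be used non-interactively. Here is a minimal sketch; the wrapper name pg_query is made up, -c runs a single command, and -A and -t strip the table decoration so the output is easier to use in scripts. Instead of -h and -p, it relies on the PGHOST and PGPORT environment variables, which psql reads on its own.

```shell
# Run one SQL statement as user postgres and print bare rows.
# Host and port come from PGHOST / PGPORT when they are set.
pg_query() {
  psql -U postgres -At -c "$1"
}
```

For example: PGHOST=192.32.123.32 PGPORT=12345 pg_query 'SELECT datname FROM pg_database;'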

III. Database Creation

CREATE DATABASE < database name >;

-- Creates database with name: test_db
CREATE DATABASE test_db;

IV. Database Drop

DROP DATABASE < database name >;

-- Drops database with name: test_db
DROP DATABASE test_db;

V. Table Creation

CREATE TABLE programs(
  programid SERIAL PRIMARY KEY,
  degree CHARACTER VARYING,
  program CHARACTER VARYING
);

CREATE TABLE students(
  studentid SERIAL PRIMARY KEY,
  student_number CHARACTER VARYING UNIQUE,
  first_name CHARACTER VARYING,
  last_name CHARACTER VARYING,
  programid INTEGER REFERENCES programs,
  insertedon TIMESTAMP WITHOUT TIME ZONE DEFAULT now()
);

A. Column Data Types

  • SERIAL
  • CHARACTER VARYING
  • CHARACTER(10)
  • INTEGER
  • TIMESTAMP WITHOUT TIME ZONE

B. Common Added Options

  • PRIMARY KEY
  • UNIQUE
  • DEFAULT

VI. CRUD Operations

A. Insertion of Rows

Template:

INSERT INTO table_name(column1, column2, column3...)
VALUES(value1, value2, value3...);

Sample:

INSERT INTO programs(degree, program)
VALUES('BS', 'Computer Science');

INSERT INTO programs(degree, program)
VALUES('BS', 'Business Administration and Accountancy');

INSERT INTO students(student_number, first_name, last_name, programid)
VALUES('2010-00031', 'Juan', 'Cruz', 1);

INSERT INTO students(student_number, first_name, last_name, programid)
VALUES('2010-00032', 'Pedro', 'Santos', 2);

B. Read/Lookup of Row

i. Get All Rows

SELECT * FROM students;

ii. Get Rows Satisfying Certain Conditions

-- Gets row/s with studentid = 1

SELECT * FROM students where studentid = 1;

-- Gets row/s where the last_name starts with 'cru' (case-insensitive)

SELECT * FROM students where last_name ilike 'cru%';

-- Gets row/s where the student_number column is either '2010-00033', '2010-30011', or '2010-18415'

SELECT * FROM students where student_number in ('2010-00033', '2010-30011', '2010-18415');

iii. Get Specific Columns from Resulting Rows

-- Selects the last_name and first_name columns from the students table

SELECT last_name, first_name from students;

-- Selects the program column from rows of the programs table satisfying the condition, prepending the given string to each value

SELECT 'BUSINESS PROGRAM: ' || program from programs where program ilike '%business%';

C. Update of Row

i. Update all Rows

UPDATE students SET last_name = 'Cruz';

ii. Update Rows Satisfying Conditions

UPDATE students SET last_name = 'Santos' where studentid = 1;

UPDATE programs SET degree = 'BA' where programid NOT IN (2);

D. Deletion of Row

i. Delete all Rows

DELETE FROM students;

ii. Delete Rows Satisfying Conditions

DELETE FROM students WHERE studentid NOT IN (1,2);

VII. Queries

A. Joins

i. Inner Join

Syntax:
SELECT * FROM table_1 JOIN table_2 using (common_column_name);
Example:
SELECT student_number, program FROM students JOIN programs using (programid);

ii. Left Join

Syntax:
SELECT * FROM table_1 LEFT JOIN table_2 on table_1.column_name = table_2.column_name;
Example:

We insert a student row without a program

INSERT INTO students(student_number, first_name, last_name)
VALUES('2010-35007', 'Juana', 'Change');

Doing a left join would still return the recently inserted row but with empty Programs-related fields.

SELECT * FROM students LEFT join programs on students.programid = programs.programid;

iii. Right Join

Syntax:
SELECT * FROM table_1 RIGHT JOIN table_2 on table_1.column_name = table_2.column_name;
Example:

We insert a program row without any students attached

INSERT INTO programs(degree, program)
VALUES('BS', 'Information Technology');

Doing a right join would still return the recently inserted row but with empty Students-related fields.

SELECT * FROM students RIGHT join programs on students.programid = programs.programid;

B. Where

Specify conditions by which rows from the query will be filtered.

SELECT * from students where programid IS NOT NULL;

C. Group By

Groups rows by the columns given in the GROUP BY clause, so that aggregate functions are computed per group.

SELECT program, COUNT(*) FROM students
JOIN programs USING (programid) GROUP BY program;

Above example counts students per program.

D. Having

Similar to WHERE but applies the condition to the groups produced with GROUP BY.

SELECT program, COUNT(*) FROM students
JOIN programs USING (programid) GROUP BY program HAVING COUNT(*) > 1;

E. Union

Combines the result sets of multiple queries into one, removing duplicate rows (use UNION ALL to keep them).

select * from students where programid in (1, 2)

UNION

select * from students;

MongoDB 101: A Starter Guide

MongoDB

Open source document database

I. Definitions

A. Document

  • Represents one record in MongoDB; consists of key-value pairs.

  • Similar to JSON Objects

  • Values may include other documents, arrays, or arrays of documents

     {
         "_id" : ObjectId("54c955492b7c8eb21818bd09"),
         "student_number" : "2010-30010",
         "last_name" : "Dela Cruz",
         "first_name" : "Juan",
         "middle_name" : "Masipag",
         "address" : {
             "street" : "#35 Maharlika St.",
             "zipcode" : "30011",
             "city" : "Quezon City",
             "coord" : [ -73.9557413, 40.7720266 ]
         },
         "gwa" : "1.75",
         "program" : "Computer Science",
         "degree" : "Bachelor of Science"
     }
    

B. Collection

Documents are stored in collections. They are similar to tables but unlike a table, a collection does not require documents to have the same schema.

Each document stored in a collection has a unique identifier _id that acts as the primary key.

II. Installation

A. Mac OS X

brew install mongodb

B. Ubuntu

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927

echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list

sudo apt-get update

sudo apt-get install -y mongodb-org

III. Setup

A. Running the Database

By default, mongod looks for your database at /data/db

 mongod

In case your database is at a different path, provide the --dbpath parameter

 mongod --dbpath .

B. Running the Console

 mongo

By default, the Mongo console connects to localhost at port 27017. Make sure that mongod is running before you issue the mongo command.
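
If you start mongod from a script, a small wait loop can make sure it is accepting connections before the console is opened. This is only a sketch: wait_for_mongod is a made-up name, and it simply retries a ping through mongo --eval once per second.

```shell
# Retry a ping against the local mongod until it answers or we run out of tries.
# Argument: maximum number of attempts (defaults to 30, one per second).
wait_for_mongod() {
  local tries="${1:-30}"

  while [ "$tries" -gt 0 ]; do
    # mongo exits with a non-zero status when it cannot connect.
    if mongo --quiet --eval 'db.runCommand({ ping: 1 }).ok' >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

A typical use would be: wait_for_mongod 10 && mongo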

C. Switching database

use mongo-cheatsheet
// switched to db mongo-cheatsheet; it will be created if it does not exist yet

IV. CRUD Operations

A. Insertion

db.students.insert(
  {
     "student_number" : "2010-30010",
     "last_name" : "Dela Cruz",
     "first_name" : "Juan",
     "middle_name" : "Masipag",
     "address" : {
       "street" : "#35 Maharlika St.",
       "zipcode" : "30011",
       "city" : "Quezon City",
       "coord" : [ -73.9557413, 40.7720266 ]
     },
     "gwa" : 1.75,
     "course" : "BS Computer Science"
  })
  
 // => WriteResult({ "nInserted" : 1 })

 // _id is automatically assigned

B. Read or Lookup

// Find student with student_number 2010-30010
db.students.find( { "student_number": "2010-30010" } )

// Find student with zipcode (embedded attribute) 30011
db.students.find( { "address.zipcode": "30011" } )

// Find students with the gwa field greater than 1.25
db.students.find( { "gwa": { $gt: 1.25 } } )

// Find students with the gwa field less than 1.25 and course is BS Computer Science
db.students.find( { "gwa": { $lt: 1.25 } , "course": "BS Computer Science"} )

// Find students with the gwa field less than 1.25 or course is BS Computer Science
db.students.find( { $or: [{ "gwa": { $lt: 1.25 } } , {"course": "BS Computer Science"}]})

C. Update

i. Update Attribute/s

Updates first matching document with first_name: Juan

db.students.update(
    { "first_name" : "Juan" },
    {
        $set: { "first_name": "Juana" }
    }
)

The following code snippet updates the first matching document with first name Juan. It also sets the field lastModified to the current date, which is what passing true to $currentDate does (since the field is non-existent on the first run based on our schema, it will be created and set).

db.students.update(
    { "first_name" : "Juan" },
    {
        $set: { "first_name": "Juana" },
        $currentDate: { "lastModified": true }
    }
)

ii. Update Embedded Fields

db.students.update(
    { "first_name" : "Juana" },
    { $set: { "address.street": "#45 Maginhawa St.",
              "address.city": "Quezon City" }}
)

iii. Updating All Matching Documents

By default, update only updates the first matching document. To tell MongoDB to update all matching documents, we pass multi: true

db.students.update(
  { "first_name" : "Juan" },
  { $set: { "address.street": "East 31st Street" } },
  { multi: true}
)
// WriteResult({ "nMatched" : 2, "nUpserted" : 0, "nModified" : 2 })

iv. Replace

db.students.update(
    { "first_name" : "Juan" },
    {
        "first_name" : "Victoria",
        "address" : {
            "street" : "Emerson Subdivision",
            "city" : "Saog, Marilao"}
    }
)

If you want to insert in case the data is non-existent, pass upsert: true as well to the update call.

{ upsert: true }

D. Delete

i. Delete a document

db.students.remove( { "first_name": "Juan" } )
// removes all matching documents

db.students.remove( { "first_name": "Victoria" }, { justOne: true } )
// removes only one of the matching documents

ii. Drop a Collection

db.students.drop()
// => true

V. Query

In addition to simple lookup commands, you can also use aggregation:

db.students.aggregate(
[
 { $group: { "_id": "$address.city", "count": { $sum: 1 } } }
]
);

Other available pipeline operators:

  • $sort
  • $project
  • and many more…

VI. Data Import

A. Import from JSON, CSV, TSV

To import a dataset from a JSON, CSV, or TSV file:

mongoimport --db mongo-cheatsheet --collection students --drop --file primer-dataset.json

Where:

  • Database name: mongo-cheatsheet
  • Collection name: students
  • Source File: primer-dataset.json

By default, mongoimport connects to localhost:27017. If you wish to connect to another host or port, add the flags --host and --port

mongoimport --db mongo-cheatsheet --collection students --drop --file primer-dataset.json --host 192.168.123.321 --port 27019

B. Restore from Mongo Backup

To restore from a MongoDB backup:

mongorestore --drop --db mongo-cheatsheet /path/to/your/dump

Where:

  • Database name: mongo-cheatsheet
  • Dump path: /path/to/your/dump

C. Backup a Mongo Database

mongodump --db mongo-cheatsheet

Where:

  • Database name: mongo-cheatsheet
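
If you plan to keep several backups around, a dated and compressed archive is easier to manage than a bare dump directory. Here is a rough sketch that dumps one database with --out and packs the result with tar; the function names backup_name and backup_db and the naming pattern are my own assumptions.

```shell
# Build an archive name such as mongo-cheatsheet-2016-05-13.tar.gz
backup_name() {
  echo "$1-$2.tar.gz"
}

# Dump one database into a scratch directory, then pack it into a dated tarball.
# Arguments: database name, output directory (defaults to the current one).
backup_db() {
  local db="$1"
  local out_dir="${2:-.}"
  local stamp
  local scratch
  stamp="$(date +%Y-%m-%d)"
  scratch="$(mktemp -d)"

  # mongodump writes the files under <scratch>/<db>/
  mongodump --db "$db" --out "$scratch" || return 1
  tar -czf "$out_dir/$(backup_name "$db" "$stamp")" -C "$scratch" "$db"
}
```

Running backup_db mongo-cheatsheet /path/to/backups would then leave a single .tar.gz file per day in that directory.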