Running a MongoDB cluster

wlanboy

Content Contributer
This tutorial is about installing a MongoDB cluster on Debian/Ubuntu.

There are different ways to run a MongoDB cluster. My preferred one is the ReplicaSet.

For a working ReplicaSet you need at least three servers.

  1. Running the master
  2. Running the slave
  3. Running the Arbiter
This is because the members of a MongoDB cluster elect the master by voting. Each member gives its vote to one of the servers, and the one with the majority of votes becomes the master. That is why the number of cluster servers should be odd.
An Arbiter is part of the cluster but does not hold any data; it only votes. You need about 5 MB of free RAM to run an Arbiter.
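
The arithmetic behind the odd member count can be sketched in a few lines of plain JavaScript. This only illustrates the majority rule described above; the function names majorityNeeded and tolerableFailures are made up for this example and are not part of MongoDB.

```javascript
// Illustrative helpers (hypothetical names, not a MongoDB API).

// Votes needed for a strict majority among `votingMembers` members.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many members may be down while a master can still be elected.
function tolerableFailures(votingMembers) {
  return votingMembers - majorityNeeded(votingMembers);
}

[2, 3, 4, 5].forEach(function (n) {
  console.log(n + " members: majority " + majorityNeeded(n) +
              ", tolerates " + tolerableFailures(n) + " failure(s)");
});
```

A fourth member does not tolerate more failures than three members do, which is why a two-node setup gets an Arbiter as the third voter.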

I don't want to talk about the pros and cons of MongoDB or NoSQL here. If you want to, please join this discussion.

So back to the installation:

  1. Adding apt key

    sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

  2. Adding the repo
    Code:
    sudo nano /etc/apt/sources.list
    Add this line for Ubuntu:
    deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen
    Add this line for Debian:
    deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen
  3. Install MongoDB
    Code:
    sudo apt-get update
    sudo apt-get install mongodb-10gen
  4. Configuration of MongoDB
    For me it was easier to create new directories for the data and for logging:
    Code:
    sudo mkdir /mongodb
    sudo mkdir /mongodb/log
    sudo mkdir /mongodb/journal
    sudo chown -R mongodb:mongodb /mongodb
    Now we can edit the mongodb.conf:

    sudo nano /etc/mongodb.conf

    Code:
    #Path for the db files
    dbpath=/mongodb
    #Path for the log file
    logpath=/mongodb/log/mongodb.log
    logappend=true
    
    #For cluster mode mongodb has to listen on a public ip
    #Enter a public ip here (if you have more than one)
    #bind_ip = 127.0.0.1
    port = 27017
    
    journal=true
    noauth = true
    #auth = true
    
    #quota = true
    
    nohttpinterface = true
    rest = true
    
    #Sets the default size of db files
    #smallfiles reduces the initial size for data files and limits them to 512 megabytes
    #smallfiles setting also reduces the size of each journal files from 1 gigabyte to 128 megabytes
    smallfiles = true
    
    #shared secret - authentication information for replica set members
    keyFile = /etc/keymongodb
    #name of replica set
    replSet = myreplica
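    To generate a shared secret (authentication key) for the replica set, run the following commands:

    openssl rand -base64 80 | sudo tee /etc/keymongodb > /dev/null
    sudo chown mongodb:mongodb /etc/keymongodb && sudo chmod 600 /etc/keymongodb

    The pipe through "sudo tee" is needed because a plain "> /etc/keymongodb" redirect would be executed by your own shell without root rights. The key file must be readable by the mongodb user only; MongoDB refuses to start with a key file that is accessible to group or others.
    You have to copy this key file to all members of the replica set.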
  5. Setup iptables rules
    You should limit the access to your MongoDB instances (this has to be done on all replica set members):

    #MongoDB
    #INPUT rules match the source address (-s), OUTPUT rules the destination address (-d)
    iptables -A INPUT -s ip-of-master -i $device -m state --state NEW -p tcp --dport 27017 -j ACCEPT
    iptables -A OUTPUT -d ip-of-master -m state --state NEW -p tcp --dport 27017 -j ACCEPT
    iptables -A INPUT -s ip-of-slave -i $device -m state --state NEW -p tcp --dport 27017 -j ACCEPT
    iptables -A OUTPUT -d ip-of-slave -m state --state NEW -p tcp --dport 27017 -j ACCEPT
    iptables -A INPUT -s ip-of-arbiter -i $device -m state --state NEW -p tcp --dport 27017 -j ACCEPT
    iptables -A OUTPUT -d ip-of-arbiter -m state --state NEW -p tcp --dport 27017 -j ACCEPT

  6. Restart the instances
    Code:
    sudo service mongodb restart
  7. Setup the replicaset
    We have to start the mongo client "mongo". This only has to be run on a single member because the configuration is synced automatically between the replica set members.
    Code:
    mongo
    rs.initiate()
    cfg = rs.conf()
    cfg.members[0].priority = 10
    cfg.members[0].host = "ip-of-master:27017"
    rs.reconfig(cfg)
    rs.add("ip-of-slave:27017")
    rs.addArb("ip-of-arbiter:27017")
    cfg = rs.conf()
    rs.reconfig(cfg)
    What we do:
    - initiate the ReplicaSet
    - load the config
    - set the priority of the master to 10 (to ensure that the first vote results in our desired master)
    - set the host of the master to its public ip (mongodb uses the hostname, which often does not resolve to the public ip)
    - save the config
    - add the slave to the replica set
    - add the arbiter to the replica set
    - reload the config (check if every ip and port is correct)
    - save the config
  8. After some minutes the members of the replica set start a vote and afterwards start to sync each collection.
We are done. The cluster is running.
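
The configuration that step 7 builds interactively can also be written down as a single document. The sketch below only constructs that document in plain JavaScript; the helper name buildReplicaSetConfig is an assumption for illustration, and the host strings are the placeholders used in this tutorial. The field names (_id, members, host, priority, arbiterOnly) are the ones you see in the document returned by rs.conf().

```javascript
// Hypothetical helper: builds a replica set configuration document.
// The host strings passed in below are placeholders, not real addresses.
function buildReplicaSetConfig(name, masterHost, slaveHost, arbiterHost) {
  return {
    _id: name, // must match the replSet value in mongodb.conf
    members: [
      { _id: 0, host: masterHost, priority: 10 },       // preferred master
      { _id: 1, host: slaveHost },                      // data-bearing slave
      { _id: 2, host: arbiterHost, arbiterOnly: true }  // votes only, holds no data
    ]
  };
}

var cfg = buildReplicaSetConfig(
  "myreplica",
  "ip-of-master:27017",
  "ip-of-slave:27017",
  "ip-of-arbiter:27017"
);
console.log(JSON.stringify(cfg, null, 2));
```

In the mongo shell a document of this shape can be passed to rs.initiate(cfg) on a set that has not been initiated yet, as an alternative to the initiate / reconfig / add sequence.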

To test it:

Connect to the master (the primary) and run the following commands:

mongo
use testdata
doc1 = { name: "test1", value: 10}
doc2 = { name: "test2", value: 15}
db.simple.insert( doc1 )
db.simple.insert( doc2 )

show collections

db.simple.find()

We are switching to the database "testdata". If it is not present it will be automatically created with the first insert.

We are creating two JSON documents "doc1" and "doc2".

We are inserting them into the collection "simple".

Afterwards we list all available collections and search for all "simple" documents.

Output should be like:


PRIMARY> show collections
simple
system.indexes
system.users
PRIMARY> db.simple.find()
{ "_id" : ObjectId("520f2728c3633ec65806eadc"), "name" : "test1", "value" : 10 }
{ "_id" : ObjectId("520f272cc3633ec65806eadd"), "name" : "test2", "value" : 15 }


Now we connect to the slave:


mongo
rs.slaveOk()
show collections
db.simple.find()

The second command (rs.slaveOk()) is needed to allow queries on the slave side.

Output should be:


mongo
MongoDB shell version: 2.4.5
connecting to: test
> use testdata
switched to db testdata
> db.auth('******','******');
1
> show collections
Sat Aug 17 03:35:40.095 JavaScript execution failed: error: { "$err" : "not master and slaveOk=false", "code" : 13435 } at src/mongo/shell/query.js:L128
> rs.slaveOk()
> show collections
simple
system.indexes
system.users
> db.simple.find()
{ "_id" : ObjectId("520f2728c3633ec65806eadc"), "name" : "test1", "value" : 10 }
{ "_id" : ObjectId("520f272cc3633ec65806eadd"), "name" : "test2", "value" : 15 }

So the replication is working.

Let's look at the slaveOk setting again.

MongoDB uses votes to ensure that the member with the best uptime and connection becomes the master.

The master handles all queries, and all slaves pull their data from the master.

If you want to do something like load balancing, you can set the "query from slaves too" flag in your MongoDB client. The ReplicaSetClient is able to handle a list of IPs. The first thing it does is find out which member is the master, to ensure that the inserts go to the right member.
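
As a sketch of what such a client setup can look like: current MongoDB drivers accept a connection string that lists several members and names the replica set, and its readPreference option plays the role of the "query from slaves too" flag. The helper name buildMongoUri is an assumption for this example, and the hosts are the placeholders from this tutorial. Older drivers may use a different option (such as slaveOk), so check the documentation of your driver.

```javascript
// Hypothetical helper: builds a replica set connection string.
function buildMongoUri(hosts, database, replicaSet, readPreference) {
  var options = [
    "replicaSet=" + encodeURIComponent(replicaSet),
    "readPreference=" + encodeURIComponent(readPreference)
  ];
  return "mongodb://" + hosts.join(",") + "/" + database + "?" + options.join("&");
}

var uri = buildMongoUri(
  ["ip-of-master:27017", "ip-of-slave:27017"], // placeholder hosts
  "testdata",
  "myreplica",
  "secondaryPreferred" // read from a slave when one is available
);
console.log(uri);
```

Writes still go to the master; the read preference only changes where queries are sent.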

Next topic would be "security". The config setting:


noauth = true
#auth = true

If you know user rights from MySQL/Oracle you might think that "auth=true" is a must, but MongoDB only knows users per database.

So you either have access to a database or you don't. Every user of a database is able to do everything in it.

If you want to use this feature to separate web applications (as you see in my last output log) you have to create one admin user first:


mongo
use admin
db.addUser("admin", "your-super-password")
db.auth('admin','your-super-password');

You can use any name because MongoDB does not have any naming conventions.

After you have added that user you can switch the config settings (comment out "noauth = true" and enable "auth = true") and restart each node. Users are replicated too.

Next time you connect to your mongodb you have to run:


mongo
use admin
db.auth('admin','your-super-password');

Or you will see this error message:

MongoDB shell version: 2.0.4
connecting to: test
> show collections
Sat Aug 17 10:48:36 uncaught exception: error: {
"$err" : "unauthorized db:test lock type:-1 client:127.0.0.1",
"code" : 10057
}

Once you are authenticated you can add additional users:


use servers
db.addUser("servers", "super-password-2")

After adding the user the database "servers" is created automatically.

The last topic is the schemaless nature of MongoDB collections. A collection is just a list of documents of the same kind. They don't have to have the same attributes:


PRIMARY> use testdata
switched to db testdata
PRIMARY> doc3 = { name: "test3", value: 10, isactive: false}
{ "name" : "test3", "value" : 10, "isactive" : false }
PRIMARY> db.simple.insert( doc3 )
PRIMARY> db.simple.find()
{ "_id" : ObjectId("520f2728c3633ec65806eadc"), "name" : "test1", "value" : 10 }
{ "_id" : ObjectId("520f272cc3633ec65806eadd"), "name" : "test2", "value" : 15 }
{ "_id" : ObjectId("520f2b94c75fcbd13a79119b"), "name" : "test3", "value" : 10, "isactive" : false }

But no schema also means no constraints.

You can use indexes to get some of them back.

An index can be added easily:


db.events.ensureIndex( { "username" : 1, "timestamp" : -1 } )

This would speed up queries of events sorted by username (asc) and timestamp (desc).
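
To make the meaning of 1 and -1 concrete, the following sketch sorts a few in-memory documents into the order such an index would keep them. It is plain JavaScript and does not talk to MongoDB; compareByIndexSpec and the sample events are invented for this example.

```javascript
// Hypothetical helper: builds a comparator from an index-style key spec,
// where 1 means ascending and -1 means descending.
function compareByIndexSpec(spec) {
  var fields = Object.keys(spec);
  return function (a, b) {
    for (var i = 0; i < fields.length; i++) {
      var f = fields[i];
      if (a[f] < b[f]) return -spec[f];
      if (a[f] > b[f]) return spec[f];
    }
    return 0; // equal on all indexed fields
  };
}

// Invented sample data.
var events = [
  { username: "bob",   timestamp: 100 },
  { username: "alice", timestamp: 200 },
  { username: "alice", timestamp: 300 }
];

events.sort(compareByIndexSpec({ username: 1, timestamp: -1 }));
console.log(events.map(function (e) {
  return e.username + "@" + e.timestamp;
}).join(", "));
```

The usernames come out in ascending order, and within one username the newest timestamp comes first.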

You can also use an index to ensure that some values are unique:


db.logins.ensureIndex( { "user_id": 1 }, { unique: true } )

By default, unique is false on MongoDB indexes - so you have to set this option.

If you have a lot of documents in one collection you should set the option "{background: true}" to ensure that the index is created in the background and is therefore non-blocking.

That's it.

Select your favorite MongoDB driver (you will find a lot of them at http://docs.mongodb.org/ecosystem/drivers/) and start using your MongoDB.
 

Damian

New Member
Verified Provider
Do you happen to have any tips or best practices for keeping the Mongo cluster online and preventing the database from corrupting itself? We have a 3-node Mongo cluster in place and it's, hands down, the most fragile thing ever
 

wlanboy

Content Contributer
I would check four things:

First thing: Use a lot of Arbiters:

Nodes do not vote for themselves. So you should run Arbiters on the client side to represent the clients' view of the quality of the ReplicaSet members.

If you have packet loss between a secondary and the master, the secondary will not vote for the master. But the connection to the master can be top-notch for the web client. So the Arbiters should be on the client side and not on the db side of the network.

Second thing: OpenVZ:


[initandlisten] ** WARNING: You are running in OpenVZ. This is known to be broken!!!

That is true and false at the same time. It all depends on the (inaccurate) bean_counters and the burst RAM. Both can kill MongoDB because it depends on the OS for memory management. So only use quality OpenVZ containers.

Third thing: Index size:

Indexes have to fit in RAM. This is basically one of the "magic tricks" that make NoSQL DBs quite fast.

You can look at the current index size by running the following commands:


> use pingstest
switched to db pingstest
> db.printCollectionStats()
netpingstest
{
"ns" : "pingstest.netpings",
"count" : 211777,
"size" : 35088268,
"avgObjSize" : 165.68497995533036,
"storageSize" : 49999872,
"numExtents" : 8,
"nindexes" : 1,
"lastExtentSize" : 18124800,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 6884192,
"indexSizes" : {
"_id_" : 6884192
},
"ok" : 1
}

So 211777 elements and an index size of 6884192 bytes or 6.6 MB.
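
The conversion from bytes to MB, and a rough per-document figure, can be reproduced with a few lines of JavaScript. The stats object below just copies two fields from the output above; summarizeIndexSize is a made-up helper name.

```javascript
// Values copied from the collection stats output above.
var stats = { count: 211777, totalIndexSize: 6884192 };

// Hypothetical helper: converts the index size to MB (1 MB = 1024 * 1024 bytes)
// and computes the average number of index bytes per document.
function summarizeIndexSize(s) {
  return {
    indexMB: s.totalIndexSize / (1024 * 1024),
    bytesPerDoc: s.totalIndexSize / s.count
  };
}

var summary = summarizeIndexSize(stats);
console.log(summary.indexMB.toFixed(1) + " MB total, about " +
            summary.bytesPerDoc.toFixed(1) + " bytes per document");
```

Comparing the total against the free RAM of the node gives a first idea of whether the indexes fit in memory.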

Fourth thing: Drivers + DM

Check how your driver is handling connection errors, pooling and replica sets.

And of course how well your data mapper handles all of this too.

And whether both are thread safe ...
 