wlanboy
Content Contributer
This tutorial is about installing a MongoDB cluster on Debian/Ubuntu.
There are different ways to run a MongoDB cluster. My preferred one is the ReplicaSet.
For a working ReplicaSet you need at least three servers.
- Running the master
- Running the slave
- Running the Arbiter
An Arbiter is part of the cluster but does not hold any data; it only votes. About 5 MB of free RAM is enough to run an Arbiter.
I don't want to talk about the pros and cons of MongoDB or NoSQL here. If you want to, please join this discussion.
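Why three members? A replica set can only elect a master while a strict majority of its voting members can reach each other. Here is a minimal sketch of that rule in plain JavaScript. The function names `majority` and `canElectPrimary` are made up for illustration and are not part of MongoDB.

```javascript
// Illustration only: these helpers are hypothetical, not a MongoDB API.

// Smallest number of votes that is more than half of all voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A master can be elected only if the reachable voters form a majority.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

// Two data nodes, one of them down: 1 of 2 votes is not a majority.
console.log(canElectPrimary(2, 1)); // false
// Two data nodes plus an arbiter, one data node down: 2 of 3 votes is enough.
console.log(canElectPrimary(3, 2)); // true
```

So the cheap Arbiter is what lets the remaining data node stay (or become) master when the other one goes down.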
So back to the installation:
- Adding apt key
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
- Adding the repo
Code:
sudo nano /etc/apt/sources.list
Add this line for Ubuntu:
deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen
Add this line for Debian:
deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen
- Install MongoDB
Code:
sudo apt-get update
sudo apt-get install mongodb-10gen
- Configuration of MongoDB
For me it was easier to create new directories for the data and for logging:
Code:
sudo mkdir /mongodb
sudo mkdir /mongodb/log
sudo mkdir /mongodb/journal
sudo chown -R mongodb:mongodb /mongodb
sudo nano /etc/mongodb.conf
Code:
#Path for the db files
dbpath=/mongodb
#Path for the log file
logpath=/mongodb/log/mongodb.log
logappend=true
#For cluster mode mongodb has to listen to a public ip
#Enter a public ip (if you have more than one)
#bind_ip = 127.0.0.1
port = 27017
journal=true
noauth = true
#auth = true
#quota = true
nohttpinterface = true
rest = true
#Sets the default size of db files
#smallfiles reduces the initial size for data files and limits them to 512 megabytes
#smallfiles setting also reduces the size of each journal files from 1 gigabyte to 128 megabytes
smallfiles = true
#shared secret - authentication information for replica set members
keyFile = /etc/keymongodb
#name of replica set
replSet = myreplica
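openssl rand -base64 80 | sudo tee /etc/keymongodb > /dev/null
You have to copy this key file to all members of the replica set. The file must be readable by the mongodb user only (for example owner mongodb and mode 600), otherwise mongod refuses to use it.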
- Setup iptables rules
You should limit the access to your MongoDB instances (this has to be done on all replica set members):
#MongoDB
iptables -A INPUT -s ip-of-master -i $device -m state --state NEW -p tcp --dport 27017 -j ACCEPT
iptables -A OUTPUT -d ip-of-master -m state --state NEW -p tcp --dport 27017 -j ACCEPT
iptables -A INPUT -s ip-of-slave -i $device -m state --state NEW -p tcp --dport 27017 -j ACCEPT
iptables -A OUTPUT -d ip-of-slave -m state --state NEW -p tcp --dport 27017 -j ACCEPT
iptables -A INPUT -s ip-of-arbiter -i $device -m state --state NEW -p tcp --dport 27017 -j ACCEPT
iptables -A OUTPUT -d ip-of-arbiter -m state --state NEW -p tcp --dport 27017 -j ACCEPT
- Restart the instances
Code:sudo service mongodb restart
- Setup the replicaset
We have to start the mongo client "mongo". This has to be run on a single member only, because this information is synced automatically between the different replica set members.
Code:
mongo
rs.initiate()
cfg = rs.conf()
cfg.members[0].priority = 10
cfg.members[0].host = "ip-of-master:27017"
rs.reconfig(cfg)
rs.add("ip-of-slave:27017")
rs.addArb("ip-of-arbiter:27017")
cfg = rs.conf()
rs.reconfig(cfg)
- Initiate the ReplicaSet
- load the config
- set the priority of the master to 10 (to ensure that the first vote results in the master we want)
- set the host of the master to its public ip (mongodb uses the hostname, which often does not resolve to the public ip)
- add the slave node to the replica set
- add the arbiter to the replica set
- reload the config (check if every ip and port is correct)
- save the config
- After some minutes the members of the replica set start a vote and afterwards start to sync each collection.
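The same setup can also be described as one configuration document, which `rs.initiate()` accepts as an argument. The sketch below only builds that document in plain JavaScript. The helper name `buildReplicaSetConfig` is my own invention and the host names are the placeholders used above, so treat it as a starting point and not as a tested recipe.

```javascript
// Hypothetical helper: builds a replica set configuration document.
function buildReplicaSetConfig(name, masterHost, slaveHost, arbiterHost) {
  return {
    _id: name, // must match the replSet value in mongodb.conf
    members: [
      { _id: 0, host: masterHost, priority: 10 },       // preferred master
      { _id: 1, host: slaveHost },                      // data-bearing slave
      { _id: 2, host: arbiterHost, arbiterOnly: true }  // votes only, no data
    ]
  };
}

var cfg = buildReplicaSetConfig(
  "myreplica",
  "ip-of-master:27017",
  "ip-of-slave:27017",
  "ip-of-arbiter:27017"
);
console.log(JSON.stringify(cfg, null, 2));
// In the mongo shell, rs.initiate(cfg) would be the next step.
```

Building the document first makes it easy to check every ip and port before anything is sent to the server.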
To test it:
Connect to the master (the primary) and run the following commands:
mongo
use testdata
doc1 = { name: "test1", value: 10}
doc2 = { name: "test2", value: 15}
db.simple.insert( doc1 )
db.simple.insert( doc2 )
show collections
db.simple.find()
We are switching to the database "testdata". If it is not present it will be created automatically on the first insert.
We are creating two JSON documents "doc1" and "doc2".
We are inserting them into the collection "simple".
Afterwards we list all available collections and search for all "simple" documents.
Output should look like this:
PRIMARY> show collections
simple
system.indexes
system.users
PRIMARY> db.simple.find()
{ "_id" : ObjectId("520f2728c3633ec65806eadc"), "name" : "test1", "value" : 10 }
{ "_id" : ObjectId("520f272cc3633ec65806eadd"), "name" : "test2", "value" : 15 }
Now we connect to the slave:
mongo
use testdata
rs.slaveOk()
show collections
db.simple.find()
The rs.slaveOk() command is needed to allow queries on the slave side.
Output should be:
mongo
MongoDB shell version: 2.4.5
connecting to: test
> use testdata
switched to db testdata
> db.auth('******','******');
1
> show collections
Sat Aug 17 03:35:40.095 JavaScript execution failed: error: { "$err" : "not master and slaveOk=false", "code" : 13435 } at src/mongo/shell/query.js:L128
> rs.slaveOk()
> show collections
simple
system.indexes
system.users
> db.simple.find()
{ "_id" : ObjectId("520f2728c3633ec65806eadc"), "name" : "test1", "value" : 10 }
{ "_id" : ObjectId("520f272cc3633ec65806eadd"), "name" : "test2", "value" : 15 }
So the replication is working.
Let's look at the slaveOk part again.
MongoDB uses votes to decide which reachable member becomes the master, taking into account its priority and how up to date its data is.
By default the master handles all queries and all slaves pull the data from the master.
If you want to do something like load balancing you can set the "query from slaves too" flag (the read preference) in your MongoDB client. The ReplicaSetClient is able to handle a list of IPs. The first thing it does is find out who the master is, to ensure that the inserts go to the right member.
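Most drivers take the member list and the read preference as part of a connection string. The sketch below builds such a string in plain JavaScript. `buildMongoUri` is a made-up helper, and while `replicaSet` and `readPreference` are standard connection string options, please check the documentation of your driver version before relying on them.

```javascript
// Hypothetical helper: builds a MongoDB connection string for a replica set.
function buildMongoUri(hosts, database, replicaSet, readPreference) {
  var options = [
    "replicaSet=" + encodeURIComponent(replicaSet),
    "readPreference=" + encodeURIComponent(readPreference)
  ];
  return "mongodb://" + hosts.join(",") + "/" + database + "?" + options.join("&");
}

var uri = buildMongoUri(
  ["ip-of-master:27017", "ip-of-slave:27017"], // the driver finds the master itself
  "testdata",
  "myreplica",
  "secondaryPreferred" // read from a slave when one is available
);
console.log(uri);
```

Writes still go to the master, whatever read preference you choose.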
The next topic is "security". The config setting:
noauth = true
#auth = true
If you know user rights from MySQL/Oracle you might think that "auth=true" is a must, but MongoDB only knows users per database.
A user either has access to a database or not. Every user of a database is able to do everything in it.
If you want to use this feature to separate web applications (as you can see in my last output log) you have to create an admin user first:
mongo
use admin
db.addUser("admin", "your-super-password")
db.auth('admin','your-super-password');
You can use any name because MongoDB does not have any naming conventions.
After you have added that user you can switch the config settings (comment out noauth, enable auth) and restart each node (users are replicated too).
The next time you connect to your MongoDB you have to run:
mongo
use admin
db.auth('admin','your-super-password');
Or you will see this error message:
MongoDB shell version: 2.0.4
connecting to: test
> show collections
Sat Aug 17 10:48:36 uncaught exception: error: {
"$err" : "unauthorized db:test lock type:-1 client:127.0.0.1",
"code" : 10057
}
Once authenticated you can add additional users with:
use servers
db.addUser("servers", "super-password-2")
After adding the user the database "servers" is created automatically.
The last topic is the schemaless nature of MongoDB collections. A collection is just a list of documents of the same kind. They don't have to have the same attributes:
PRIMARY> use testdata
switched to db testdata
PRIMARY> doc3 = { name: "test3", value: 10, isactive: false}
{ "name" : "test3", "value" : 10, "isactive" : false }
PRIMARY> db.simple.insert( doc3 )
PRIMARY> db.simple.find()
{ "_id" : ObjectId("520f2728c3633ec65806eadc"), "name" : "test1", "value" : 10 }
{ "_id" : ObjectId("520f272cc3633ec65806eadd"), "name" : "test2", "value" : 15 }
{ "_id" : ObjectId("520f2b94c75fcbd13a79119b"), "name" : "test3", "value" : 10, "isactive" : false }
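Since MongoDB does not check the shape of a document, such checks usually end up in the application. A minimal sketch, assuming you want every document in "simple" to carry a non-empty string "name" and a numeric "value". The function `validateSimpleDoc` is hypothetical and not part of any driver.

```javascript
// Hypothetical application-side check; MongoDB itself does not enforce this.
function validateSimpleDoc(doc) {
  var errors = [];
  if (typeof doc.name !== "string" || doc.name.length === 0) {
    errors.push("name must be a non-empty string");
  }
  if (typeof doc.value !== "number" || isNaN(doc.value)) {
    errors.push("value must be a number");
  }
  return errors; // an empty array means the document passed
}

// Extra attributes such as isactive are fine.
console.log(validateSimpleDoc({ name: "test3", value: 10, isactive: false }).length); // 0
// A missing value is reported.
console.log(validateSimpleDoc({ name: "test4" }).join("; ")); // value must be a number
```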
But no schema also means no constraints.
You can use indexes to get some of them back.
An index can be added easily:
db.events.ensureIndex( { "username" : 1, "timestamp" : -1 } )
This would speed up queries of events sorted by username (asc) and timestamp (desc).
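To make the key order concrete: 1 means ascending, -1 means descending, and the first key takes precedence. The sketch below sorts a small in-memory array the same way in plain JavaScript. The sample events and the function `compareEvents` are invented for illustration and nothing here talks to MongoDB.

```javascript
// Orders events the way the index { username: 1, timestamp: -1 } is ordered.
function compareEvents(a, b) {
  if (a.username < b.username) return -1; // username ascending
  if (a.username > b.username) return 1;
  return b.timestamp - a.timestamp;       // timestamp descending
}

var events = [
  { username: "bob", timestamp: 100 },
  { username: "alice", timestamp: 100 },
  { username: "alice", timestamp: 300 }
];
var sorted = events.slice().sort(compareEvents);
var labels = sorted.map(function (e) { return e.username + "@" + e.timestamp; });
console.log(labels.join(", "));
// prints: alice@300, alice@100, bob@100
```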
You can also use an index to ensure that some values are unique:
db.logins.ensureIndex( { "user_id": 1 }, { unique: true } )
By default, unique is false on MongoDB indexes - so you have to set this option.
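What a unique index enforces can be pictured with a small in-memory stand-in. This is only a sketch of the behaviour (the second insert with the same user_id is rejected), not how MongoDB implements it, and the `UniqueLogins` type is made up for this example.

```javascript
// In-memory stand-in for a collection with a unique index on user_id.
// Hypothetical illustration only; this is not how MongoDB is implemented.
function UniqueLogins() {
  this.seen = {}; // user_id values already stored
  this.docs = [];
}

UniqueLogins.prototype.insert = function (doc) {
  var key = String(doc.user_id);
  if (Object.prototype.hasOwnProperty.call(this.seen, key)) {
    throw new Error("duplicate key: user_id " + key);
  }
  this.seen[key] = true;
  this.docs.push(doc);
};

var logins = new UniqueLogins();
logins.insert({ user_id: 1 });
try {
  logins.insert({ user_id: 1 }); // same user_id again
} catch (e) {
  console.log(e.message); // duplicate key: user_id 1
}
```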
If you have a lot of documents in one collection you should set the option "{background: true}" to ensure that the index creation is done in the background and is therefore non-blocking.
That's it.
Select your favorite MongoDB driver - you will find a lot: http://docs.mongodb.org/ecosystem/drivers/ - and start using your MongoDB.