What is Replication
Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability with multiple copies of data on different database servers. Replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
Replica Set
MongoDB achieves replication through replica sets. A replica set is a group of MongoDB database servers that implements primary-secondary (formerly called master-slave) replication. In a replica set, exactly one node is the primary, and it receives all write operations. All other members, the secondaries, apply operations from the primary so that they hold the same data set. A replica set can have only one primary node. Replica sets also fail over automatically: when the primary becomes unavailable, the replica set holds an election to decide which secondary should become the new primary, so your data stays accessible.
Ref: https://docs.mongodb.com/manual/replication/
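To picture what "apply operations from the primary" means, here is a toy sketch in plain JavaScript (runnable with Node.js). The class names ToyPrimary and ToySecondary are made up for this illustration, and this is not how MongoDB is implemented internally. It only shows the general idea of a primary logging its writes and a secondary replaying them.

```javascript
// Toy model (hypothetical names): the primary records each write in a log,
// and a secondary replays the entries it has not applied yet.
class ToyPrimary {
  constructor() {
    this.data = {};
    this.oplog = [];
  }

  write(key, value) {
    this.data[key] = value;
    this.oplog.push({ key: key, value: value });
  }
}

class ToySecondary {
  constructor() {
    this.data = {};
    this.applied = 0; // number of log entries already replayed
  }

  syncFrom(primary) {
    while (this.applied < primary.oplog.length) {
      const op = primary.oplog[this.applied];
      this.data[op.key] = op.value;
      this.applied += 1;
    }
  }
}

const primary = new ToyPrimary();
const secondary = new ToySecondary();
primary.write("a", 1);
primary.write("b", 2);
secondary.syncFrom(primary);
console.log(JSON.stringify(secondary.data) === JSON.stringify(primary.data)); // true
```

Because the secondary only remembers how far it has read, it can fall behind for a while and catch up later, as long as the entries it still needs are in the log.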
Replica Set oplog
The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.
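The "rolling record" behaviour can be sketched in a few lines of plain JavaScript (Node.js). The RollingLog class below is a hypothetical illustration, not MongoDB code, and it caps the log by entry count, whereas the real oplog is capped by size.

```javascript
// Hypothetical illustration of a capped, rolling log.
class RollingLog {
  constructor(capacity) {
    this.capacity = capacity; // maximum number of entries kept
    this.entries = [];
  }

  // Append an operation; drop the oldest one when the log is full.
  append(op) {
    this.entries.push(op);
    if (this.entries.length > this.capacity) {
      this.entries.shift();
    }
  }

  // Oldest entry still available to a secondary that is catching up.
  oldest() {
    return this.entries[0];
  }
}

const log = new RollingLog(3);
for (let i = 1; i <= 5; i++) {
  log.append({ op: "insert", x: i });
}
console.log(log.entries.map((e) => e.x)); // entries 1 and 2 have been discarded
```

A secondary that still needed entry 1 or 2 could no longer catch up from this log, which is why the oplog size matters.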
Oplog Size
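When you start a replica set member for the first time, MongoDB creates an oplog of a default size if you do not specify the oplog size.
On Unix and Windows systems, the default oplog size depends on the storage engine. With the WiredTiger storage engine, it is 5% of free disk space, with a lower bound of 990 MB and an upper bound of 50 GB.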
In most cases, the default oplog size is sufficient. For example, if an oplog is 5% of free disk space and fills up in 24 hours of operations, then secondaries can stop copying entries from the oplog for up to 24 hours without becoming too stale to continue replicating. However, most replica sets have much lower operation volumes, and their oplogs can hold much higher numbers of operations.
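The arithmetic behind that example can be sketched as follows in plain JavaScript (Node.js). The function names are hypothetical, and the 990 MB and 50 GB bounds are an assumption taken from the documented WiredTiger defaults. The write rate is something you would have to measure on your own system.

```javascript
// Assumption: WiredTiger defaults of 5% of free disk space,
// clamped between 990 MB and 50 GB.
const MIN_OPLOG_MB = 990;
const MAX_OPLOG_MB = 50 * 1024;

function defaultOplogSizeMB(freeDiskMB) {
  const fivePercent = (freeDiskMB * 5) / 100;
  return Math.min(MAX_OPLOG_MB, Math.max(MIN_OPLOG_MB, fivePercent));
}

// Hours of writes the oplog can hold at a steady write rate.
// This is how long a secondary can be offline and still catch up.
function oplogWindowHours(oplogSizeMB, oplogMBPerHour) {
  return oplogSizeMB / oplogMBPerHour;
}

const sizeMB = defaultOplogSizeMB(100 * 1024); // 100 GB of free disk
console.log(sizeMB); // 5120
console.log(oplogWindowHours(sizeMB, 512)); // 10
```

On a running member, the mongo shell helper rs.printReplicationInfo() reports the actual oplog size and the time range it currently covers.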
Before mongod creates an oplog, you can specify its size with the oplogSizeMB option. Once you have started a replica set member for the first time, use the replSetResizeOplog administrative command to change the oplog size. replSetResizeOplog, available from MongoDB 3.6 onwards, enables you to resize the oplog dynamically without restarting the mongod process.
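As a rough sketch, the command document for replSetResizeOplog can be built like this in plain JavaScript (Node.js). The helper name resizeOplogCommand is hypothetical. The size is given in megabytes, and the 990 MB minimum is taken from the MongoDB documentation for this command.

```javascript
// Hypothetical helper that builds the replSetResizeOplog command document.
function resizeOplogCommand(sizeMB) {
  if (sizeMB < 990) {
    throw new RangeError("oplog size must be at least 990 MB");
  }
  return { replSetResizeOplog: 1, size: sizeMB };
}

const cmd = resizeOplogCommand(16000);
console.log(JSON.stringify(cmd));

// In the mongo shell, connected to the member you want to resize,
// the same document would be passed to adminCommand:
//   db.adminCommand({ replSetResizeOplog: 1, size: 16000 })
```

The command changes only the member it is run on, so you would repeat it on each member whose oplog should be resized.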
Automatic Failover
When a primary does not communicate with the other members of the set for longer than the configured electionTimeoutMillis period (10 seconds by default), an eligible secondary calls for an election to nominate itself as the new primary. The cluster attempts to complete the election of a new primary and resume normal operations.
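The timeout check itself is simple. Here is a minimal sketch in plain JavaScript (Node.js), assuming a hypothetical function shouldCallElection that compares the time since the primary was last heard from against the timeout. MongoDB's real election logic involves more than this (eligibility, priorities, votes), so treat it only as an illustration of the timeout.

```javascript
const ELECTION_TIMEOUT_MS = 10000; // default electionTimeoutMillis

// True when a secondary has waited longer than the timeout
// without hearing from the primary.
function shouldCallElection(lastHeardFromPrimaryMs, nowMs, timeoutMs = ELECTION_TIMEOUT_MS) {
  return nowMs - lastHeardFromPrimaryMs > timeoutMs;
}

console.log(shouldCallElection(0, 8000)); // false, still within 10 seconds
console.log(shouldCallElection(0, 12000)); // true, the primary has been silent too long

// The timeout is part of the replica set configuration. In the mongo shell
// on the primary, it could be changed along these lines:
//   cfg = rs.conf()
//   cfg.settings.electionTimeoutMillis = 5000
//   rs.reconfig(cfg)
```

A shorter timeout detects failures faster but may trigger unnecessary elections on a slow or flaky network.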
Enough theory. Let's get started.
- First, create three Ubuntu instances.
- Set the hostname of each instance.
- Configure the hosts file at /etc/hosts on each instance.
- Install MongoDB on each instance. A replica set should have at least three members.
- Initialize the primary database server and add the other servers as secondaries.
Step-1
Create three Ubuntu instances.
The IPs and hostnames are:
172.31.11.1 mongo-manager
172.31.11.2 mongo-node1
172.31.11.3 mongo-node2
Step-2
To set the hostname, ssh into each instance (replace <user> with your login user) and edit the /etc/hostname file so that it contains only that instance's hostname:
$ ssh <user>@172.31.11.1
$ sudo vim /etc/hostname
mongo-manager
$ ssh <user>@172.31.11.2
$ sudo vim /etc/hostname
mongo-node1
$ ssh <user>@172.31.11.3
$ sudo vim /etc/hostname
mongo-node2
Step-3
Each member of your replica set should have a hostname that identifies it as a member of the set. This way, you’ll be able to keep your infrastructure organized at scale (for example, if you add more replica sets). In order to simplify the configuration of your replica set, add the following lines to the /etc/hosts file on each member of the replica set. So, you need to ssh into each node and add the below lines:
172.31.11.1 mongo-manager
172.31.11.2 mongo-node1
172.31.11.3 mongo-node2
NOTE: Use your private IP addresses here, as we don't want replication to happen over the public network.
NOTE: After changing the hostname and hosts files, reboot each instance with the sudo reboot command.
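If you end up with more nodes, you may prefer to generate these lines rather than type them on every machine. A small sketch in plain JavaScript (Node.js); the members array and the hostsEntries helper are hypothetical names used only for this example.

```javascript
// The three members used in this tutorial.
const members = [
  { ip: "172.31.11.1", host: "mongo-manager" },
  { ip: "172.31.11.2", host: "mongo-node1" },
  { ip: "172.31.11.3", host: "mongo-node2" },
];

// Produce one "ip hostname" line per member, ready for /etc/hosts.
function hostsEntries(list) {
  return list.map((m) => `${m.ip} ${m.host}`).join("\n");
}

console.log(hostsEntries(members));
```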
Step-4
Now install MongoDB on each instance:
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6
$ echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.4.list
$ sudo apt-get update
$ sudo apt-get install -y mongodb-org
$ sudo systemctl start mongod
$ sudo systemctl enable mongod
On each of your nodes, make the following changes to your /etc/mongod.conf file:
net:
  port: 27017
  #bindIp: 127.0.0.1

replication:
  replSetName: rs0
The replication section needs to be uncommented to be enabled. Directives in this section directly affect the configuration of your replica set. The value rs0 is the name we're using for our replica set; you can choose a different name if you like. Commenting out the bindIp line lets mongod accept connections from the other members instead of only from localhost.
After changing the MongoDB configuration file, restart the service on each node:
$ sudo systemctl restart mongod
Step-5
Now initialize the replica set from the server that will act as the primary. To do so, ssh into the mongo-manager node and open the mongo shell:
$ mongo
> rs.initiate()
output
{
"info2" : "no configuration specified. Using a default configuration for the set",
"me" : "mongo-manager:27017",
"ok" : 1
}
rs0:SECONDARY> rs.add('mongo-node1:27017')
{ "ok" : 1 }
rs0:PRIMARY> rs.add('mongo-node2:27017')
{ "ok" : 1 }
The rs.initiate() command initiates a replica set with the current host as its only member. Each rs.add() call then adds one of the other nodes to the set.
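As an alternative to calling rs.initiate() with no arguments and then rs.add() twice, rs.initiate() also accepts a configuration document that lists all members up front. Here is a sketch in plain JavaScript (Node.js) that builds such a document; the helper name replicaSetConfig is hypothetical, while the _id and members fields follow the replica set configuration format.

```javascript
// Hypothetical helper: build a replica set configuration document.
function replicaSetConfig(name, hosts) {
  return {
    _id: name, // must match replSetName in /etc/mongod.conf
    members: hosts.map((host, index) => ({ _id: index, host: host })),
  };
}

const config = replicaSetConfig("rs0", [
  "mongo-manager:27017",
  "mongo-node1:27017",
  "mongo-node2:27017",
]);
console.log(JSON.stringify(config, null, 2));

// In the mongo shell on mongo-manager, the printed document
// would be passed to rs.initiate(...).
```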
Test your replica set
Your replica set is now fully functional and ready to use. Connect to the mongo shell on the primary node (replace <user> with your login user):
$ ssh <user>@172.31.11.1
$ mongo
rs0:PRIMARY> use test
rs0:PRIMARY> for (var i = 0; i <= 10; i++) db.testCol.insert( { x : i } )
Now ssh into one of your secondary nodes (replace <user> with your login user) and connect to the mongo shell:
$ ssh <user>@172.31.11.2
$ mongo
rs0:SECONDARY> rs.slaveOk()
rs0:SECONDARY> use test
rs0:SECONDARY> db.testCol.find()
output
{ "_id" : ObjectId("5a1483c65826dc1165ca7d19"), "x" : 0 }
{ "_id" : ObjectId("5a1483c65826dc1165ca7d1c"), "x" : 3 }
{ "_id" : ObjectId("5a1483c65826dc1165ca7d1b"), "x" : 2 }
{ "_id" : ObjectId("5a1483c65826dc1165ca7d1d"), "x" : 4 }
{ "_id" : ObjectId("5a1483c65826dc1165ca7d1a"), "x" : 1 }
{ "_id" : ObjectId("5a1483c65826dc1165ca7d1e"), "x" : 5 }
{ "_id" : ObjectId("5a1483c65826dc1165ca7d20"), "x" : 7 }
{ "_id" : ObjectId("5a1483c65826dc1165ca7d21"), "x" : 8 }
{ "_id" : ObjectId("5a1483c65826dc1165ca7d22"), "x" : 9 }
{ "_id" : ObjectId("5a1483c65826dc1165ca7d1f"), "x" : 6 }
{ "_id" : ObjectId("5a1483c65826dc1165ca7d23"), "x" : 10 }
Repeat the same steps on mongo-node2.
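Note that the documents in the output above do not come back in insertion order, so comparing by eye is error-prone. A small sketch in plain JavaScript (Node.js) that checks whether every expected value of x is present regardless of order; the function name missingValues is hypothetical, and the sample array stands in for what db.testCol.find().toArray() would return in the mongo shell.

```javascript
// Return the x values in [from, to] that do not appear in docs.
function missingValues(docs, from, to) {
  const seen = new Set(docs.map((d) => d.x));
  const missing = [];
  for (let x = from; x <= to; x++) {
    if (!seen.has(x)) {
      missing.push(x);
    }
  }
  return missing;
}

// Same order of x values as in the output shown above.
const sample = [0, 3, 2, 4, 1, 5, 7, 8, 9, 6, 10].map((x) => ({ x: x }));
console.log(missingValues(sample, 0, 10).length === 0); // true, all 11 documents arrived
```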
You can view the status of the replica set and its members with rs.status() (replace <user> with your login user):
$ ssh <user>@172.31.11.1
$ mongo
rs0:PRIMARY> rs.status()
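The rs.status() output is fairly long, and usually you only care about a few fields. Here is a sketch in plain JavaScript (Node.js) that boils it down; the summarizeStatus helper is hypothetical, and sampleStatus is a hand-written, heavily trimmed stand-in for real output that uses the set, name, health and stateStr fields.

```javascript
// Hypothetical helper: reduce rs.status()-style output to the essentials.
function summarizeStatus(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return {
    set: status.set,
    primary: primary ? primary.name : null,
    secondaries: status.members
      .filter((m) => m.stateStr === "SECONDARY")
      .map((m) => m.name),
    unhealthy: status.members
      .filter((m) => m.health !== 1)
      .map((m) => m.name),
  };
}

// Hand-written sample, not captured from a real server.
const sampleStatus = {
  set: "rs0",
  members: [
    { name: "mongo-manager:27017", health: 1, stateStr: "PRIMARY" },
    { name: "mongo-node1:27017", health: 1, stateStr: "SECONDARY" },
    { name: "mongo-node2:27017", health: 0, stateStr: "(not reachable/healthy)" },
  ],
};

console.log(JSON.stringify(summarizeStatus(sampleStatus), null, 2));
```

In a healthy set you would expect one primary, the remaining members as secondaries, and an empty unhealthy list.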
All set. We've successfully configured a MongoDB replica set.
When do we need an arbiter in MongoDB?
The point of an arbiter is to break the deadlock when an election needs to be held for a primary. It adds a vote, so that a majority of nodes can still decide which node to elect.
In our current configuration, we have an odd number of nodes, so the election process is simple when all nodes are up, and in a failover one of the remaining nodes will simply be elected.
If a replica set has an even number of nodes to begin with, an arbiter may be required. This is the case when you do not want to commit the same level of hardware needed for, say, a five-node replica set. Here you could run an arbiter on a lower-spec machine in order to avoid a deadlock in elections.
An arbiter votes in elections but holds no data and can never become primary itself, so it adds a vote without competing with the data-bearing nodes you would prefer to see elected as primary.
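The odd-versus-even argument comes down to simple arithmetic. Here is a sketch in plain JavaScript (Node.js); the function names majority and faultTolerance are hypothetical. A primary can only be elected while a majority of the voting members is reachable.

```javascript
// Votes needed to elect a primary.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(`members=${n} majority=${majority(n)} tolerated failures=${faultTolerance(n)}`);
}

// With two data nodes, adding an arbiter gives three voters, so one failure
// can be tolerated. In the mongo shell on the primary, an arbiter is added
// with rs.addArb(), for example (hypothetical hostname):
//   rs.addArb("mongo-arbiter:27017")
```

Going from three members to four does not let the set survive any additional failures, which is why an even-sized set is usually rounded up with an arbiter or another data node.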
The MongoDB documentation has plenty of further information on arbiters.