MonsterDB – Replication and Sharding


monsterDB can be run in a shared-nothing cluster mode, which means that multiple machines can be connected and the data distributed amongst them. On this page you will learn how to configure and manage this.

Starting in Replicated Mode

This section shows how to connect one server node to another so that they can share data. In this example we have two nodes running on:

localhost:27018
localhost:27019

easy start-up

usage from CLI: java -jar monsterDB.jar -s -p 27018 -db ./one -rs localhost:27019 -n 0

This command starts a server on node 0, physically hosted on localhost:27018:

-db specifies the directory in which the database files will be stored (useful when you have two nodes on one machine)
-rs (replicaSet) specifies the other nodes involved in the cluster; it can be provided n times
-n specifies the node number of this node; it has to be set for replication to work

usage from CLI: java -jar monsterDB.jar -s -p 27019 -db ./two -rs localhost:27018 -n 1

This command starts a server on node 1, physically hosted on localhost:27019; the parameters point back to node 0 in the cluster.
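Because -rs can be provided multiple times, larger clusters follow the same pattern. The sketch below assumes a hypothetical third node on localhost:27020; the ports, directories and the repetition of the -rs switch are illustrative only.

usage from CLI: java -jar monsterDB.jar -s -p 27018 -db ./one -rs localhost:27019 -rs localhost:27020 -n 0
usage from CLI: java -jar monsterDB.jar -s -p 27019 -db ./two -rs localhost:27018 -rs localhost:27020 -n 1
usage from CLI: java -jar monsterDB.jar -s -p 27020 -db ./three -rs localhost:27018 -rs localhost:27019 -n 2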

start-up with authentication

usage from CLI: java -jar monsterDB.jar -s -p 27018 -db ./one -rs user:password@localhost:27019 -a auth.json -n 0
 

This command starts a server on node 0, physically hosted on localhost:27018:

-a specifies the location of the authentication file
-rs now includes a username and password in the link to the other server; this is mandatory because the -a switch disables anonymous connections

The other node follows the same pattern.

usage from CLI: java -jar monsterDB.jar -s -p 27019 -db ./two -rs user:password@localhost:27018 -a auth.json -n 1
 

This command starts a server on node 1, physically hosted on localhost:27019; the parameters point back to node 0 in the cluster.

Replication Modes

The servers can operate in three replication modes. The mode is defined at the collection level, so each collection can distribute its data in a different manner.

Range based replication is where a filter, defined in standard monsterDB format, determines which documents a node should manage. Each node can have one filter, which can contain multiple fields and conditions, just as a find statement would work on a collection; this filter is defined for each node at the creation of the collection. Defining a range ensures that the collection will be created with the range distribution method only, and other methods will be disabled.

The collection creation statement must include the ranges, one per node in the cluster. If a range filter is not provided for a node, that node defaults to receiving all documents in the collection. This may be your desired effect; however, consider that when you subsequently query the cluster across a set of nodes, if the returned documents overlap, the processor has to ensure the result of your query is distinct, and this may take more effort.

db createCollection test {"0": {"FieldName": 1}, "1": {"FieldName": {$gt: 5}}}

Note also that the node id is always an integer; for replication to work it should start at 0 and increment by 1 for each additional node.
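As a sketch of a non-overlapping layout, the statement below partitions the same field so that every document matches exactly one node; it assumes an $lte operator is available alongside the $gt shown above, and the collection name, field name and boundary value are illustrative only.

db createCollection test3 {"0": {"FieldName": {$lte: 5}}, "1": {"FieldName": {$gt: 5}}}

Because the two ranges do not overlap, a query that fans out across the cluster avoids the extra de-duplication work described above.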

Hash based replication, in contrast to range based replication (where you have full control over how documents are distributed amongst the nodes), determines from a static value in the document which node should manage the document. This essentially means that you need to define a hashKey field for a collection at the time of creation. If the number of nodes increases or decreases after a restart of the servers then you may be subject to data loss, because the number of nodes is the constant required to determine where a document resides. If this constant changes then a document's home node changes accordingly, and although you could still save or update a record that would normally live on the missing node, if the missing node came alive again you would find that one of the documents survives and the other is hidden by the natural order of the data in storage. This is because the contract of the cursor returned to the user states that all documents will only be returned once, and this is governed by the internal id of the document.

db createCollection test2 FieldName

FieldName determines the document field that will be used to bucket the data across the different nodes. The value of this field in a document should not change often; see the section “Relocating Nodes”.
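To make the node-count dependency concrete, assume purely for illustration that the home node is chosen by hashing the hashKey value and reducing it by the number of nodes (hash mod nodeCount): a value hashing to 8 would map to node 2 in a three node cluster (8 mod 3) but to node 0 in a two node cluster (8 mod 2), which is why resizing the cluster changes where existing documents are expected to live. A sketch of another hash keyed collection, using a hypothetical customerId field chosen because it rarely changes:

db createCollection customers customerId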

Simple Replication is the default: if the node is involved in a cluster (i.e. the server was started with -rs) then the data is automatically copied to the other nodes in the cluster without any intervention. The -rs switch tells monsterDB to notify all other nodes about new and changed data, and they will store it regardless of your intentions! This is helpful if you wish to have complete replication across nodes to create redundancy in your database system. It can however cause issues in the event of a server going offline and missing a data update: when it comes back online it will not be notified of the missing changes, and it is the responsibility of the designer to ensure updates are reapplied in this situation. Future releases will address this.

db createCollection test2

Relocating Nodes

Unfortunately, due to the nature of range based and hashKey replication, a change to a document can cause it to move from the responsibility of one node to another. In addition, if you create a new range based node, documents may become covered by the new node and need to be relocated.

monsterDB tries to deal with this in the best way possible. When a document's destination changes, it is re-evaluated on the next effective update or replacement: it is removed from the current node as a result of the update to the data (that node knows it is no longer its job to manage the document and will physically delete and de-index it), and inserted into the node where it is now to reside.

The document retains its internal _ID, and the new node will index it based on the current state of the document.