Running monsterDB

monsterDB is a distributed, collection-based fuzzy matching database that stores the data used by the EntityStream Custodian application. It is also available as a separate download if you wish to run it separately from the Custodian server.

By default monsterDB runs in embedded mode. This is controlled by your installation: if the Custodian server does not detect the database at the configured location, it starts its own embedded server running inside Custodian. This is perfectly acceptable for non-production machines such as development and QA, but may not meet your organisation's requirements for data security and recoverability.

For this reason you can install your own monsterDB instances and point your Custodian server at them. Some examples are shown below.

set dbHost=localhost:27018

This makes the server look for a database instance running on the same machine; this is in fact the default setting for Custodian. If no database is running there, Custodian starts one as per this setting.

set dbHost=

Here the host part of the value is the local IP address of the machine running Custodian. If no database is detected, Custodian starts one on the local machine at the given port (7777).

set dbHost=

If the value is not an IP address of the machine running Custodian and no database is found there, Custodian will not try to start one and fails with the message “No DB Server Found”.
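The cases above can be read as a small decision table. The sketch below only illustrates the behaviour described in this section; it is not Custodian's actual code, and the names DbAction and decide_db_action are hypothetical. It also assumes that a database which is already running at the configured address is simply used, whether or not the address is local.

```python
from enum import Enum


class DbAction(Enum):
    """Outcomes described in this section (names are illustrative only)."""
    USE_EXISTING = "use the database that is already running"
    START_EMBEDDED = "start a database on the local machine"
    FAIL = "No DB Server Found"


def decide_db_action(db_is_running: bool, host_is_local: bool) -> DbAction:
    """Mirror the documented start-up behaviour for a given dbHost value.

    db_is_running: a database answers at the configured dbHost.
    host_is_local: dbHost names the machine that runs Custodian.
    """
    if db_is_running:
        # Assumption: a reachable database is used as-is.
        return DbAction.USE_EXISTING
    if host_is_local:
        # Nothing is running locally, so a database is started there.
        return DbAction.START_EMBEDDED
    # A remote address with no database behind it cannot be started from here.
    return DbAction.FAIL
```

For example, decide_db_action(False, True) corresponds to the default localhost case in which no database is running yet, and decide_db_action(False, False) corresponds to the failure case.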

Setting the storage location for the embedded monsterDB instance

To specify the location of the database files (this applies to the embedded server instance only), use the environment setting dbLoc. In *nix environments:

export dbLoc=/usr/local/entitystream

or in Windows:

set dbLoc=C:\entitystream
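If you launch Custodian from a wrapper script, it can help to check the dbLoc directory first. The sketch below is one possible pre-flight check and not part of the product: the function name resolve_db_loc is hypothetical, and it assumes that the directory should exist and be writable before start-up (whether Custodian creates it for you is not stated here).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_db_loc(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the dbLoc directory, creating it if it does not exist yet.

    Returns None when dbLoc is unset or blank, in which case the
    installation's own default location applies.
    """
    raw = env.get("dbLoc", "").strip()
    if not raw:
        return None
    path = Path(raw).expanduser()
    # Create the directory (and any parents); does nothing if it is already there.
    path.mkdir(parents=True, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError(f"dbLoc is not writable: {path}")
    return path
```

Passing a plain dictionary instead of the real environment, for example resolve_db_loc({"dbLoc": "/usr/local/entitystream"}), makes the check easy to try out without changing your shell settings.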

Running monsterDB in clustered mode

To run monsterDB across multiple machines, the cluster must be configured and running before you start Custodian. In this example we have 2 database nodes running on

Start up each node (from

java -jar monsterDB.jar -s -p 27018 -db ./one -rs -n 0

This command starts a server on node 0, physically hosted on

-db specifies the directory in which the database files will be stored (also very useful when you have two nodes on one machine)
-rs (replicaSet) specifies the other nodes involved in the cluster. It can be provided n times, i.e. -rs -rs etc.
-n specifies the node number of this node; this must be set for replication to work

java -jar monsterDB.jar -s -p 27019 -db ./two -rs -n 1

This command starts a server on node 1, physically hosted on localhost:27019. The parameters point back to node 0 in the cluster.
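When several nodes have to be started, the command lines can be assembled programmatically. The sketch below is a minimal illustration using only the flags shown above (-s, -p, -db, -rs, -n). The helper build_node_command is hypothetical, and because the format of the value that follows -rs is not shown in this section, peer values are passed through unchanged as whatever your installation expects.

```python
from typing import List, Sequence


def build_node_command(port: int, db_dir: str, node_number: int,
                       peers: Sequence[str]) -> List[str]:
    """Assemble the monsterDB start-up command for one cluster node."""
    command = ["java", "-jar", "monsterDB.jar", "-s",
               "-p", str(port), "-db", db_dir]
    for peer in peers:
        # -rs is repeated once for every other node in the cluster.
        command += ["-rs", peer]
    # -n must be set for replication to work.
    command += ["-n", str(node_number)]
    return command
```

The returned list can be handed to subprocess.Popen as-is, which avoids quoting problems with directory names that contain spaces.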

Connecting Custodian to this cluster:

set dbHost=
java -jar Entitystream-thorntail.jar

You can pick either of the nodes to talk to. When planning these machines, we recommend that the node you connect to has more memory and CPU available, because although it pushes work to the other nodes in the cluster, it remains responsible for collating the data on its way back to Custodian.
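Because the cluster has to be up before Custodian starts, a launcher script may want to confirm that the chosen node answers first. The sketch below is one way to do that. It assumes that dbHost uses the host:port form seen in the localhost:27018 example and that a node accepts plain TCP connections on that port. The helper names parse_host_port and is_reachable are hypothetical, and an open port does not prove that the process behind it is monsterDB.

```python
import socket
from typing import Tuple


def parse_host_port(value: str) -> Tuple[str, int]:
    """Split a 'host:port' string such as 'localhost:27018'."""
    host, _, port = value.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


def is_reachable(value: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    host, port = parse_host_port(value)
    try:
        # The connection is closed again as soon as it has been established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A launcher could call is_reachable on the value it is about to put in dbHost and stop with a clear message if the node is not up yet, rather than letting Custodian fail during start-up.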