MonsterDB – Collection Commands

Query Language

The monsterDB query language is based on the syntax used by MongoDB. This does not mean monsterDB uses MongoDB: it is built from the ground up as a pure Java engine.

Collection Commands

A collection stores one or more documents. These documents are in monsterDB JSON format, which is the same as standard JSON but can also store additional tagged objects such as images, document references and identifiers. To use the following commands you must first create a collection to work with. Collections are independent, but they can store references to documents in other collections.

find

usage from API:
  cursor = aCollection.find() 
  cursor = aCollection.find(query)

usage from CLI: 
  db.aCollection.find(query) 

aCollection is the name of an existing normal or fuzzy collection. query is optional: a Document (in Java) or the JSON string representation of a JSON object (in the CLI). From Java, find returns a DBCursor, which is a lazy iterable over Document objects. The CLI writes JSON text to standard output, which can be redirected to a file or piped to another program from the command line, for example:

monsterCLI -h localhost -d fuzzyTest -r "db.TasksFlat.find()" | more

A query has the form:

{ aList.columnName: "a value", numericColumn: {$gt: 0}, textColumn: {"pattern": "regex.*"} }
...
{ $or: [ {...}, {...} ] }
...
{ $and: [ {...}, {...} ] }

Notable features above include the $ operators and regular expressions; for further details see the operators guide.
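To make the query shape above concrete, here is a minimal in-memory sketch in Java of how such a query document could be evaluated against a single document. It is an illustration only and not monsterDB's engine: the class QuerySketch and its methods are hypothetical, documents are modelled as plain java.util.Map objects rather than the monsterDB Document class, dotted paths such as aList.columnName are not handled, and the sketch assumes that a "pattern" must match the whole string.

```java
import java.util.List;
import java.util.Map;

// Illustrative only: a tiny in-memory matcher, not monsterDB's query engine.
final class QuerySketch {

    // Returns true when every condition in the query holds for the document.
    static boolean matches(Map<String, Object> doc, Map<String, Object> query) {
        for (Map.Entry<String, Object> entry : query.entrySet()) {
            String key = entry.getKey();
            Object cond = entry.getValue();
            if (key.equals("$or")) {
                // At least one sub-query must match.
                boolean any = false;
                for (Object sub : (List<?>) cond) {
                    if (matches(doc, asMap(sub))) {
                        any = true;
                        break;
                    }
                }
                if (!any) return false;
            } else if (key.equals("$and")) {
                // Every sub-query must match.
                for (Object sub : (List<?>) cond) {
                    if (!matches(doc, asMap(sub))) return false;
                }
            } else if (cond instanceof Map) {
                // Operator document such as {$gt: 0} or {"pattern": "regex.*"}.
                if (!matchesOperators(doc.get(key), asMap(cond))) return false;
            } else if (!cond.equals(doc.get(key))) {
                // Plain value: simple equality.
                return false;
            }
        }
        return true;
    }

    private static boolean matchesOperators(Object value, Map<String, Object> ops) {
        if (ops.containsKey("$gt")) {
            if (!(value instanceof Number)) return false;
            double bound = ((Number) ops.get("$gt")).doubleValue();
            if (((Number) value).doubleValue() <= bound) return false;
        }
        if (ops.containsKey("pattern")) {
            // Assumption: the pattern must match the whole string.
            if (!(value instanceof String)) return false;
            if (!((String) value).matches((String) ops.get("pattern"))) return false;
        }
        return true;
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> asMap(Object o) {
        return (Map<String, Object>) o;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("numericColumn", 5, "textColumn", "regex123");
        Map<String, Object> highNumber = Map.of("numericColumn", Map.of("$gt", 10));
        Map<String, Object> textMatch = Map.of("textColumn", Map.of("pattern", "regex.*"));
        Map<String, Object> query = Map.of("$or", List.of(highNumber, textMatch));
        System.out.println(matches(doc, query)); // prints true
    }
}
```

In the usage example the $gt branch fails (5 is not greater than 10) but the pattern branch succeeds, so the $or query matches.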

findFuzzy

usage from API:
  cursor = aCollection.findFuzzy(query)

usage from CLI: 
  db.aCollection.findFuzzy(query) 
Although similar to find, findFuzzy does not use the standard indexes; it uses only the fuzzy indexes, which lets it find records that appear similar to the search query. For example:
{Entity:{LegalName:"INTERAGROS VP, a.s."}}
The optimiser passes the query to the fuzzy interpreter, which takes into account any rules that apply to the filter above and returns any records that match it:
{"score":100.0,"acceptance":"Auto-Match","action":"EID","actionText":"Link","rule":{"canMatchSameSystem":true,"systemMatchType":0,"lowScore":80,"highScore":95,"actionText":""},"_id":"08abcb80-cfc4-4cd2-aa7a-76a65f907b16"}
{"score":100.0,"acceptance":"Auto-Match","action":"EID","actionText":"Link","rule":{"canMatchSameSystem":true,"systemMatchType":0,"lowScore":80,"highScore":95,"actionText":""},"_id":"e2237bfa-fd15-4d40-ba8f-aef4bb174046"}
These results indicate that two identical records match the search criteria, and show which action the fuzzy match rules designate for them. The _id field is the internal document id, which lets you inspect the record in more detail. See the section on pipelines for how to take further action on these events.

save

usage from API:
  cursor = aCollection.save(document)

usage from CLI: 
  db.aCollection.save(document) 
The record represented by the document object or JSON text is saved to the database. If the collection has fuzzy indexes the document is standardised and indexed; if it has only standard indexes the document is just indexed.

insertOne

usage from API:
  cursor = aCollection.insertOne(document)

usage from CLI: 
  db.aCollection.insertOne(document) 

The record represented by the document object or JSON text is saved to the database. If the collection has fuzzy indexes the document is standardised and indexed; if it has only standard indexes the document is just indexed.

insertMany


updateOne

usage from API:
  cursor = aCollection.updateOne(filter, updates)

usage from CLI: 
  db.aCollection.updateOne(filter, updates) 

The filter is processed as a query that yields at most one document. That document is then updated with the values given under the $set operator in the updates document.

Consider the following document in the collection:

{"FirstName":"Robert","LastName":"Smith","ROWID":1}

If we run the following command:

db.aCollection.updateOne({"LastName":"Smith","FirstName":"Robert"}, {"$set":{"FirstName":"Robert", "LastName":"Smith","ROWID":6, "MiddleName":"Jim"}})  

The record would now look like this:

{"FirstName":"Robert","LastName":"Smith","ROWID":6,"_id":"de62f3cf-c5be-4840-a16e-c17d643f4f85","MiddleName":"Jim"}
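The effect of $set in the example above can be sketched in Java as a simple merge: fields named under $set overwrite existing values, and new fields are appended. This is an illustration under stated assumptions, not monsterDB code: the class SetSketch and the method applySet are hypothetical, and documents are modelled as java.util.Map objects rather than the monsterDB Document class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: models $set as a field-by-field merge.
final class SetSketch {

    // Returns a copy of the stored document with the $set fields applied.
    static Map<String, Object> applySet(Map<String, Object> stored, Map<String, Object> updates) {
        Map<String, Object> result = new LinkedHashMap<>(stored);
        Object set = updates.get("$set");
        if (set instanceof Map) {
            for (Map.Entry<?, ?> field : ((Map<?, ?>) set).entrySet()) {
                // Existing fields are overwritten, new fields are added.
                result.put(String.valueOf(field.getKey()), field.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("FirstName", "Robert");
        stored.put("LastName", "Smith");
        stored.put("ROWID", 1);

        Map<String, Object> set = new LinkedHashMap<>();
        set.put("ROWID", 6);
        set.put("MiddleName", "Jim");

        Map<String, Object> updates = new LinkedHashMap<>();
        updates.put("$set", set);

        // ROWID becomes 6 and MiddleName is added; other fields are unchanged.
        System.out.println(applySet(stored, updates));
    }
}
```

The sketch returns a new map instead of mutating the stored one, which keeps the example easy to check; whether monsterDB updates in place is not stated in this document.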

updateMany

usage from API:
  cursor = aCollection.updateMany(filter, updates)

usage from CLI: 
  db.aCollection.updateMany(filter, updates) 

The filter is processed as a query that may yield any number of documents, each of which is updated with the values given under the $set operator in the updates document. Although this is a bulk statement, it is executed as a parallel stream of individual updates, so it is more costly than something like insertMany.

findOneAndReplace

usage from API:
cursor = aCollection.findOneAndReplace(filter, replacement)

usage from CLI:
db.aCollection.findOneAndReplace(filter, replacement)
The filter is processed as a query that yields at most one document, which is then replaced with the new document. If the replacement includes an _id field, it will likely be overwritten by the _id of the original document that was retrieved.
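The _id behaviour described above can be sketched as follows. This is a hypothetical illustration, not monsterDB code: the class ReplaceSketch and the method replaceKeepingId are invented for the example, documents are modelled as java.util.Map objects, and the sketch assumes the original _id always wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: replaces a document while keeping the stored _id.
final class ReplaceSketch {

    // Returns the replacement document carrying the _id of the stored document.
    static Map<String, Object> replaceKeepingId(Map<String, Object> stored,
                                                Map<String, Object> replacement) {
        Map<String, Object> result = new LinkedHashMap<>(replacement);
        // Assumption: any _id supplied in the replacement is overwritten.
        result.put("_id", stored.get("_id"));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("_id", "original-id");
        stored.put("FirstName", "Robert");

        Map<String, Object> replacement = new LinkedHashMap<>();
        replacement.put("_id", "some-other-id");
        replacement.put("FirstName", "Bob");

        // The result has FirstName "Bob" but keeps _id "original-id".
        System.out.println(replaceKeepingId(stored, replacement));
    }
}
```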

findOneAndUpdate

findOneAndUpdate behaves the same as updateOne.

deleteOne

usage from API:
cursor = aCollection.deleteOne(filter)

usage from CLI:
db.aCollection.deleteOne(filter)
The filter is processed as a query that yields at most one document, which is then deleted.

deleteMany

usage from API:
cursor = aCollection.deleteMany(filter)

usage from CLI:
db.aCollection.deleteMany(filter)
The filter is processed as a query and all matching documents are deleted. To delete all documents in the collection, use an empty document {}.

count

usage from API:
  cursor = aCollection.count(query) 
  
usage from CLI: 
  db.aCollection.count(query) 
aCollection is the name of an existing normal or fuzzy collection. query is optional: a Document (in Java) or the JSON string representation of a JSON object (in the CLI). The result is a document containing the number of documents in the collection that match the query; use an empty document to count all documents. This value is currently not cached, but in most cases it is obtained by counting a unique index rather than the data.

createIndex

usage from API:
  cursor = aCollection.createIndex(document, options)
 
usage from CLI: 
  db.aCollection.createIndex(document, options)
aCollection is the name of an existing normal or fuzzy collection. document is a Document or JSON text representation that lists the fields within the index:
{FirstName: 1, LastName: 1}
Here 1 or 0 indicates the ordering of the index (not currently used). options is a Document or JSON text representation of a document that describes the properties of the index, for example uniqueness and name:
{unique: true|false, name: "LastFirst_1"}
unique specifies whether the combination of indexed fields must be unique. If it is true and a new record with the same composite key is inserted, the record is rejected unless its _id is the same.
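The uniqueness rule can be illustrated with a small in-memory sketch in Java. It is not monsterDB's index implementation: the class UniqueIndexSketch and the method tryIndex are hypothetical, documents are modelled as java.util.Map objects, and the composite key is simply the list of indexed field values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative only: a unique composite-key index held in memory.
final class UniqueIndexSketch {
    private final List<String> fields;
    // Maps each composite key to the _id of the document that owns it.
    private final Map<List<Object>, Object> keyToId = new HashMap<>();

    UniqueIndexSketch(List<String> fields) {
        this.fields = List.copyOf(fields);
    }

    // Returns false (rejected) when another _id already owns the same key.
    boolean tryIndex(Map<String, Object> doc) {
        List<Object> key = new ArrayList<>();
        for (String field : fields) {
            key.add(doc.get(field));
        }
        Object id = doc.get("_id");
        if (keyToId.containsKey(key) && !Objects.equals(keyToId.get(key), id)) {
            return false;
        }
        keyToId.put(key, id);
        return true;
    }

    public static void main(String[] args) {
        UniqueIndexSketch index = new UniqueIndexSketch(List.of("FirstName", "LastName"));
        Map<String, Object> first = Map.of("_id", "a", "FirstName", "Robert", "LastName", "Smith");
        Map<String, Object> sameId = Map.of("_id", "a", "FirstName", "Robert", "LastName", "Smith");
        Map<String, Object> otherId = Map.of("_id", "b", "FirstName", "Robert", "LastName", "Smith");
        System.out.println(index.tryIndex(first));   // accepted
        System.out.println(index.tryIndex(sameId));  // accepted, same _id
        System.out.println(index.tryIndex(otherId)); // rejected, different _id
    }
}
```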

listIndexes

usage from API:
cursor = aCollection.listIndexes()

usage from CLI:
db.aCollection.listIndexes()

aCollection is the name of a collection. This statement lists all known indexes and the fields within each.

dropIndex

usage from API:
cursor = aCollection.dropIndex(name)

usage from CLI:
db.aCollection.dropIndex(name)

aCollection is the name of a collection. This statement drops the named index from storage.

rebuildIndex

usage from API:
cursor = aCollection.rebuildIndex(name)

usage from CLI:
db.aCollection.rebuildIndex(name)

aCollection is the name of a collection. This statement rebuilds the index from the values in the collection's documents. It is generally recommended only if the integrity of the index is in doubt, or if the fuzzy definitions or rule sets have changed and been reloaded.

To force restandardisation of the data after a change to the fuzzy definition, specify the special index name “STD_1”. This re-evaluates the comparison data stored for the documents in the collection. Restandardisation always happens when a save or update is performed on the collection, but you may wish to force it.

applyFuzzy

usage from API:
  void aCollection.applyFuzzy(definition, map)

usage from CLI:
  db.aCollection.applyFuzzy definitionFile ruleSetMapDirectory

API explanation: Creates or reapplies the fuzzy setup for the collection. “definition” is a Document object that defines the match rules, as defined in the MARS schema definition here. map is a RuleSetMap containing commonly used values and translations, for example nicknames and company legal name types. The RuleSetMap can be read using a convenient factory method in Java:
RuleSetMap map=RuleSetMap.readMap(mapDir);
Where mapDir is a string containing the full path of the ruleset files. Standard ruleset files can be requested from EntityStream, or you can build your own as described here.

CLI explanation: Creates or reapplies the setup required for a fuzzy collection. “definitionFile” is the name of a text file that contains the JSON schema definition, as defined in the MARS schema definition here. ruleSetMapDirectory is a location on the server disk containing commonly used values and translations, for example nicknames and company legal name types. Standard ruleset files can be requested from EntityStream, or you can build your own as described here.