Bulk API Insert

Introduction

monsterDB provides two integration points for loading data outside the user interface provided by Custodian:

  • the Java API, which incorporates a bulk loader;
  • loading a file directly from the command prompt.

This document discusses the bulk commands of the Java API. Every bulk command accepts many documents at once, anywhere from one to many thousands. Be aware that a bulk command locks the indexes for as long as it takes to process the block, so keep in mind when coding that a thread will wait until the indexes are freed.
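Because a bulk call can block while the indexes are locked, one option is to issue it from a worker thread so that the calling thread stays responsive. The following is a minimal sketch of that idea in plain Java, not part of the monsterDB API: BackgroundBulkLoader, submitBatch and the bulkCall parameter are hypothetical names, and bulkCall stands in for whichever bulk command you actually invoke.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

final class BackgroundBulkLoader {

    // Hypothetical helper: runs bulkCall on the worker thread and completes
    // with the number of documents that were handed over.
    static <T> CompletableFuture<Integer> submitBatch(
            ExecutorService worker, List<T> batch, Consumer<List<T>> bulkCall) {
        // Copy the batch so the caller may clear or reuse its own list.
        final List<T> snapshot = new ArrayList<T>(batch);
        return CompletableFuture.supplyAsync(() -> {
            bulkCall.accept(snapshot); // assumption: this may block while indexes are locked
            return snapshot.size();
        }, worker);
    }

    public static void main(String[] args) {
        // A single worker keeps the batches in submission order.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            List<String> batch = Arrays.asList("{\"name\":\"a\"}", "{\"name\":\"b\"}");
            // Placeholder for the real bulk call; it does nothing here.
            Consumer<List<String>> bulkCall = docs -> { };
            CompletableFuture<Integer> pending = submitBatch(worker, batch, bulkCall);
            // The calling thread is free to do other work before joining.
            System.out.println("documents handed over: " + pending.join());
        } finally {
            worker.shutdown();
        }
    }
}
```

A single-threaded executor is used so that batches are applied in the order they were submitted; whether a wider pool is safe depends on how your documents overlap.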

The collection used in the following examples is akin to the domain in your project.

insertMany

usage from API:
cursor = aCollection.insertMany(List<Document>)

aCollection is an existing collection in the database. The list of documents is applied to the collection using an upsert. This means that a document that already exists, either according to the internal “_id” property of the document or because of the primary key of the record, will be replaced with the new document you are supplying.

Please be aware that this completely replaces the document rather than updating it. It can therefore result in a race condition in which the last document applied wins.
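The replace-not-merge behaviour described above can be illustrated with a small plain-Java model. This is only a sketch under stated assumptions: UpsertReplaceModel, upsertAll, find and doc are hypothetical names, an in-memory map stands in for the collection, and matching is done on "_id" only (primary keys are not modelled). It does not call monsterDB.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UpsertReplaceModel {

    // Stand-in for a collection: documents keyed by their "_id" value.
    private final Map<Object, Map<String, Object>> store =
            new LinkedHashMap<Object, Map<String, Object>>();

    // Models the described upsert: an existing document with the same "_id"
    // is replaced as a whole, it is not merged field by field.
    void upsertAll(List<Map<String, Object>> docs) {
        for (Map<String, Object> doc : docs) {
            store.put(doc.get("_id"), new LinkedHashMap<String, Object>(doc));
        }
    }

    Map<String, Object> find(Object id) {
        return store.get(id);
    }

    // Convenience builder: doc("key1", value1, "key2", value2, ...)
    static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> d = new LinkedHashMap<String, Object>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    public static void main(String[] args) {
        UpsertReplaceModel collection = new UpsertReplaceModel();
        collection.upsertAll(Arrays.asList(
                doc("_id", 1, "name", "Acme", "city", "Oslo"),
                doc("_id", 1, "name", "Acme Ltd"))); // same _id, supplied later
        // The later document wins and "city" is gone, because the whole
        // document was replaced.
        System.out.println(collection.find(1)); // prints {_id=1, name=Acme Ltd}
    }
}
```

If fields from the earlier version must survive, the model suggests either supplying the complete document every time or using an update with $set instead of an insert.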

insertMany is designed to handle many documents in one batch (e.g. 100 or 1000), applied according to the unique indexes currently defined. Because it locks the indexes for a period of time, it may cause other inserts to wait. Depending on the size and complexity of your documents and indexes, you should therefore spend some time optimising the batch size for your situation.
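To make the batch size easy to tune, the documents can be split into fixed-size batches before each bulk call. The helper below is a minimal plain-Java sketch; BatchSplitter and partition are hypothetical names, and integers stand in for documents so that the example runs without a database.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchSplitter {

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<T>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<Integer>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);
        }
        for (List<Integer> batch : partition(docs, 1000)) {
            // In real code, each batch would be handed to one bulk call here.
            System.out.println("batch of " + batch.size());
        }
    }
}
```

With 2500 items and a batch size of 1000 this yields batches of 1000, 1000 and 500. Keeping the size in one parameter makes it simple to measure how long the indexes stay locked for different values.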

saveMany

usage from API:
cursor = aCollection.saveMany(List<Document> docs)

This command is an alias for insertMany and is supplied for backwards compatibility only. It uses exactly the same code base as insertMany; you should migrate to insertMany.

updateMany

usage from API:
int updateMany(Document filter, Document amendments, Document options);

aCollection is an existing collection in the default database. The amendments are applied to the documents in the collection that match the filter. The integrity of the collection is maintained, and any records that exist with the same unique id or indexed unique id will be updated accordingly.

The format of the update amendment should be as follows:

{
  $set : {
    avaluetochange: 'new value',
    anothervalue: 'another new value'
  }
}

All documents matching the filter condition will have the above changes applied to them; all other existing values remain unchanged.
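The effect of a $set amendment can be modelled in plain Java as follows. This is a sketch only and does not call monsterDB: SetAmendmentModel, applySet and doc are hypothetical names, the filter is assumed to be a simple equality match on every filter field, and the returned count is a property of this model rather than a statement about what updateMany returns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SetAmendmentModel {

    // Applies setFields to every document whose fields equal all filter
    // fields. Returns how many documents were changed (model behaviour).
    static int applySet(List<Map<String, Object>> docs,
                        Map<String, Object> filter,
                        Map<String, Object> setFields) {
        int updated = 0;
        for (Map<String, Object> doc : docs) {
            if (doc.entrySet().containsAll(filter.entrySet())) {
                doc.putAll(setFields); // only the named fields change
                updated++;
            }
        }
        return updated;
    }

    // Convenience builder: doc("key1", value1, "key2", value2, ...)
    static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> d = new LinkedHashMap<String, Object>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<Map<String, Object>>();
        docs.add(doc("Table", "Company", "avaluetochange", "old", "untouched", 7));
        docs.add(doc("Table", "Person", "avaluetochange", "old"));

        int changed = applySet(docs,
                doc("Table", "Company"),
                doc("avaluetochange", "new value", "anothervalue", "another new value"));

        System.out.println(changed + " document(s) changed");
        System.out.println(docs.get(0)); // matched: fields set, "untouched" kept
        System.out.println(docs.get(1)); // not matched: left as it was
    }
}
```

Compared with the upsert performed by insertMany, this leaves every field that is not named under $set in place.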

Example Java

private String databaseName="custodia-demo";
private String collectionName="NODES";
private String tableName="Company"; 
...
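private Document readDocument(){
   String text=  ... your code to read some json value from somewhere....
   // Return null once the input is exhausted so that the loop in run() stops.
   if (text == null) return null;
   return Document.parse(text);
}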

public void run() {
      MonsterClient client = null;
      try {
              client = new MonsterClient("localhost:27018");
              client.useDatabase(databaseName);
              client.useCollection(collectionName);
              
              List<Document> records = new ArrayList<Document>(1000);
              Document record = null;
              int count=0;
              while ((record = readDocument())!=null) {
                  count++;
                  record.append("Table", tableName);
                  records.add(record);
                  if (count % 1000 == 0){
                     client.saveMany(records);
                     records.clear();
                  }
              }
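              // Flush the final, partially filled batch (if any).
              if (!records.isEmpty())
                  client.saveMany(records);
         } catch (Exception e){
              e.printStackTrace();
         } finally {
              // Always release the connection, even when the load fails.
              if (client!=null)
                  client.disconnect();
         }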
}
...

databaseName is in the format [instanceName]-[projectName].
collectionName is the domain context in Custodian.