MonsterDB – Loading data

Data can currently be loaded into monsterDB in one of three ways: with a standard insert/save command, from either the API or the command line interface (CLI); with the bulk saveMany command, which effectively performs an “upsert” into the collection; or with the loadFile command. An alternative loadJDBC method is currently under construction.

loadFile

loadFile is currently available via the command line:

java -jar monsterDB.jar -h host -p port -u username -pw password -d aDatabase -c aCollection -f file -r recordType -o options

aCollection is the target collection; it may already contain documents.

recordType is the table type within the fuzzy rules. If the collection is not a fuzzy collection, either omit this parameter or supply any non-null value; it will be ignored.

file is a filename or URL: either a local (to the CLI) file that can be read by the current user, or a URL such as http://…, https://… or ftp://… pointing at a file or zip file on the network/internet.

options (the last parameter) is a property document in JSON text form. The valid keys are as follows:

encoding – file character encoding, defaults to UTF8
rootNode – in XML this is used to extract the documents: every XML element with this name is converted to a document.
extractPattern – when the URL points at a zip file, this pattern is used to select the files to extract from it.
quote – the character used to quote fields of data that might contain the delimiter.
delimiter – the delimiter character(s) between fields in the rows of a delimited file, e.g. a comma.
fileType – e.g. CSV, XML or XLS; indicates the type of the file that will eventually be read.
sheetno – for an XLS file, the 0-based number of the sheet to read.

httpusername – username used to login to the remote http server
httppassword – associated http password
httpheaders – any headers required to create a http connection
ftpusername – ftp username to login to the remote server
ftppassword – ftp password
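
The options document is easy to get wrong when quoting it by hand. As a minimal sketch, the JSON text could be built from a map instead. The class LoadFileOptions and its methods are hypothetical helpers, not part of monsterDB, and the sketch assumes the -o parameter accepts standard JSON with quoted keys (as in the zip example on this page). Whether numeric options such as sheetno should be passed as numbers or as strings is not stated here, so the sketch simply leaves Number values unquoted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of monsterDB): renders loadFile options as JSON text. */
final class LoadFileOptions {

    /** Wraps a value in double quotes, escaping backslashes and double quotes.
     *  Control characters are not escaped in this sketch. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders a flat JSON object; Number values stay bare, everything else is quoted. */
    static String toJson(Map<String, Object> options) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : options.entrySet()) {
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append(quote(e.getKey())).append(": ");
            Object v = e.getValue();
            sb.append(v instanceof Number ? v.toString() : quote(String.valueOf(v)));
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        // Keys taken from the option list above; values from the XML example.
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("fileType", "XML");
        options.put("rootNode", "LEIRecord");
        System.out.println(toJson(options)); // prints {"fileType": "XML", "rootNode": "LEIRecord"}
    }
}
```

The resulting string still has to be quoted for the shell when it is passed on the command line.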

The following is an example that loads a local XML file:

java -jar monsterDB.jar -h host -p port -u username -pw password -d aDatabase -c fuzzyCollection -f leismall.xml -r GLEIF -o "{fileType: \"XML\", rootNode: \"LEIRecord\"}"

And to load an XML file that has been zipped and placed on a website:

java -jar monsterDB.jar -d newdb -c fuzzy2 -t GLEIF -f https://leidata.gleif.org/api/v1/concatenated-files/lei2/20190805/zip -o "{\"fileType\": \"zip\", \"rootNode\": \"LEIRecord\", \"extractPattern\": \".*.xml\"}"

loadFile works against the unique indexes currently defined and is designed to handle many documents in one batch (e.g. 100 or 1000). Because it locks the indexes for a period of time, it may cause other inserts to wait. Depending on the size and complexity of your documents and indexes, you should therefore spend some time tuning this for your situation.

insertMany

usage from API:
cursor = aCollection.insertMany(List<Document>)

This command is not available from the CLI – utilise the LoadFile or LoadJDBC commands as an alternative.

aCollection is an existing collection in the default database. The list of documents is applied to the collection and the integrity of the collection is maintained: any records that already exist with the same unique id or indexed unique id are replaced. Be aware that this completely replaces the document rather than updating it, so concurrent writers can race, and the last update to be applied wins.

insertMany works against the unique indexes currently defined and is designed to handle many documents in one batch (e.g. 100 or 1000). Because it locks the indexes for a period of time, it may cause other inserts to wait. Depending on the size and complexity of your documents and indexes, you should therefore spend some time tuning the batch size for your situation.
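
One way to experiment with the batch size is to split a large list into fixed-size slices and hand each slice to the bulk call separately. The following is only a sketch: BatchedLoad, partition and loadInBatches are hypothetical names, and the sink stands in for whatever bulk call is used, so nothing here depends on the monsterDB API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not part of monsterDB): feeds a bulk call in fixed-size batches. */
final class BatchedLoad {

    /** Splits items into consecutive slices of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    /** Passes each batch to the sink in order and returns the number of batches sent. */
    static <T> int loadInBatches(List<T> items, int batchSize, Consumer<List<T>> sink) {
        List<List<T>> batches = partition(items, batchSize);
        batches.forEach(sink);
        return batches.size();
    }
}
```

Assuming aCollection and a list docs are already in scope, a call might look like BatchedLoad.loadInBatches(docs, 1000, batch -> aCollection.insertMany(batch)). Smaller batches hold the index locks for less time per call, at the cost of more calls.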

saveMany

usage from API:
cursor = aCollection.saveMany(List<Document> docs)

This command is not available from the CLI – utilise the LoadFile or LoadJDBC commands as an alternative.

aCollection is an existing collection in the default database. The list of Documents is applied to the collection and the integrity of the collection is maintained: any records that already exist with the same unique id or indexed unique id are replaced. Be aware that this completely replaces the document rather than updating it, so concurrent writers can race, and the last update to be applied wins.
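
The difference between replacing and updating a document can be illustrated with a small in-memory model. This is not monsterDB code: the store is a plain map keyed by unique id, and ReplaceVsMerge, replace and merge are hypothetical names used only to show why fields missing from the incoming document are lost when the whole document is replaced.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory illustration only: contrasts whole-document replacement with a field merge. */
final class ReplaceVsMerge {

    /** Replace semantics: the stored document becomes exactly the incoming one. */
    static void replace(Map<String, Map<String, Object>> store, String id, Map<String, Object> doc) {
        store.put(id, new LinkedHashMap<>(doc));
    }

    /** Merge semantics, for contrast: incoming fields overwrite, other fields are kept. */
    static void merge(Map<String, Map<String, Object>> store, String id, Map<String, Object> doc) {
        store.computeIfAbsent(id, k -> new LinkedHashMap<>()).putAll(doc);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> store = new LinkedHashMap<>();

        Map<String, Object> first = new LinkedHashMap<>();
        first.put("name", "Acme");
        first.put("country", "GB");

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("name", "Acme Ltd");

        replace(store, "1", first);
        replace(store, "1", second);
        // The country field from the first write is gone.
        System.out.println(store.get("1")); // prints {name=Acme Ltd}
    }
}
```

If two writers save different versions of the same id, whichever call is applied last determines the stored document, so a caller that needs to keep existing fields has to read the document and resend it in full.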

saveMany works against the unique indexes currently defined and is designed to handle many documents in one batch (e.g. 100 or 1000). Because it locks the indexes for a period of time, it may cause other inserts to wait. Depending on the size and complexity of your documents and indexes, you should therefore spend some time tuning the batch size for your situation.

saveMany (fuzzy load)

usage from API:
cursor = aCollection.saveMany(List<Map<String,Object>> records, String tableName)

This command is not available from the CLI – utilise the LoadFile or LoadJDBC commands as an alternative.

aCollection is an existing collection in the default database. The list of Maps is applied to the collection and the integrity of the collection is maintained: any records that already exist with the same unique id or indexed unique id are replaced. Be aware that this completely replaces the document rather than updating it, so concurrent writers can race, and the last update to be applied wins.
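
As a sketch of how the records argument might be prepared, the helper below turns delimited text lines into a List<Map<String,Object>>. FuzzyRecords and fromDelimited are hypothetical names and not part of monsterDB. The sketch assumes the first line is a header row, and it splits naively on the delimiter without any handling of quoted fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of monsterDB): builds records for the fuzzy saveMany call. */
final class FuzzyRecords {

    /** Treats the first line as a header and maps each later line to header -> value.
     *  Naive split: quoted fields containing the delimiter are not supported. */
    static List<Map<String, Object>> fromDelimited(List<String> lines, String delimiter) {
        List<Map<String, Object>> records = new ArrayList<>();
        if (lines.isEmpty()) {
            return records;
        }
        // Quote the delimiter so characters such as | are not treated as regex syntax.
        String splitter = Pattern.quote(delimiter);
        String[] header = lines.get(0).split(splitter, -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(splitter, -1);
            Map<String, Object> record = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < fields.length; i++) {
                record.put(header[i].trim(), fields[i].trim());
            }
            records.add(record);
        }
        return records;
    }
}
```

The resulting list could then be passed as aCollection.saveMany(records, "GLEIF"), where "GLEIF" is borrowed from the loadFile examples and assumed to be a table name defined in the fuzzy rules.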

The fuzzy saveMany works against the unique indexes currently defined and is designed to handle many records in one batch (e.g. 100 or 1000). Because it locks the indexes for a period of time, it may cause other inserts to wait. Depending on the size and complexity of your records and indexes, you should therefore spend some time tuning the batch size for your situation.