Introduction

Welcome to EntityStream Custodian.

We created a full-featured Master Data Management (MDM) solution with a lower total cost of deployment and ownership. You can deploy it in the cloud, on premises, or as a combination of both, and migrating between these options is straightforward.

The EntityStream architecture is deliberately simple: monsterDB is the data storage layer underneath the application, and the application itself is 100% web based. We have set out to include the functionality users need to build a full multi-domain data model with both fuzzy and exact data matching, run either automatically or with attended (manual) review. The solution draws on 7 years of fuzzy matching development, built on lessons learned over 15 years of working alongside financial services, healthcare, government and consumer packaged goods manufacturers.

We aim to provide pre-built data models for most applications, but both the database and the system are designed to reduce reliance on bound data models, because rigid models cause continual problems as a business upgrades, changes and evolves. So although you can build a model (or several models, organised into domains), the system does not strictly enforce it. This metadata is mostly used as guidance for the matching engine and as a layout guide that helps the user interface present data to the user more effectively.
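To illustrate the idea of model metadata as guidance rather than a bound schema, here is a minimal Python sketch. It is not Custodian's implementation: `MODEL_HINTS`, `match_plan` and `layout` are hypothetical names. The model tells the matcher which fields to compare (and how), and tells the UI how to label and order them, while a record carrying an unmodelled field such as `sector` is still accepted and displayed rather than rejected.

```python
# Hypothetical model metadata: hints for the matcher and the UI, not an enforced schema.
MODEL_HINTS = {
    "company": {
        "name": {"match": "fuzzy", "weight": 0.6, "label": "Company name"},
        "registration_no": {"match": "exact", "weight": 0.4, "label": "Registration number"},
    }
}


def match_plan(entity_type, record):
    """Modelled fields the matching engine should compare for this record.

    Fields that are missing or empty are simply skipped; nothing is rejected.
    """
    hints = MODEL_HINTS.get(entity_type, {})
    return {f: h for f, h in hints.items() if record.get(f) not in (None, "")}


def layout(entity_type, record):
    """Display order for the UI: modelled fields first (with labels), then any extras."""
    hints = MODEL_HINTS.get(entity_type, {})
    known = [(h["label"], record[f]) for f, h in hints.items() if f in record]
    extra = [(f, v) for f, v in record.items() if f not in hints]
    return known + extra
```

For a record such as `{"sector": "Retail", "name": "Acme Ltd"}`, the matcher would compare only `name`, and the UI would show the labelled company name first followed by the unmodelled `sector` field.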

All storage in EntityStream is handled by monsterDB, a distributed, collection-based NoSQL engine that uses pipelines and relies heavily on queues and streaming to perform its work. We know it well because we wrote it too, and we have released it as a fully open source project on GitHub.

Task Management

We handle task management using what we call the pair review methodology, a relatively little-discussed method for reviewing data matches:

The premise of the methodology is that a user, however skilled, makes more accurate decisions with a faster turnaround when presented with a single pair of records to compare, rather than having to “unpick” the much more complex detail needed to assess whether a whole cluster of records should be matched and how that cluster should be broken down. As the diagram shows, the matching process never considered a link between supplier 1 and supplier 5, yet the end result of matching the cluster could still yield this as a strong linkage. In pair review, the user is presented only with the linkages the matching process actually proposed and is asked to decide on those alone. This reduces errors and greatly reduces the chance of records snowballing into ever-larger groups in the system.
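The contrast between the two review styles can be sketched in Python. This is a minimal illustration, not EntityStream's implementation: `CandidatePair`, `clusters_from_pairs`, `pair_review_queue` and `was_proposed` are hypothetical names, and a union-find (transitive closure) stands in for however a cluster view might be assembled. Chaining four proposed links puts supplier 1 and supplier 5 in the same cluster even though nobody compared them, whereas the pair queue only ever asks about the links the matcher actually produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePair:
    left: str
    right: str
    score: float  # matcher confidence, assumed to be in 0..1


def clusters_from_pairs(pairs):
    """Cluster view: transitive closure of the proposed links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p in pairs:
        parent[find(p.left)] = find(p.right)

    groups = {}
    for record in list(parent):
        groups.setdefault(find(record), set()).add(record)
    return list(groups.values())


def pair_review_queue(pairs):
    """Pair view: only the links the matcher actually proposed, strongest first."""
    return sorted(pairs, key=lambda p: p.score, reverse=True)


def was_proposed(pairs, a, b):
    """True if the matcher itself proposed a direct link between records a and b."""
    return any({p.left, p.right} == {a, b} for p in pairs)
```

With links 1-2, 2-3, 3-4 and 4-5, `clusters_from_pairs` returns a single five-record cluster, while `pair_review_queue` yields four independent decisions and `was_proposed(pairs, "supplier 1", "supplier 5")` is `False`.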

A major financial services organisation adopted this method on our advice, and many years later the same method is still in use. The productivity gain of the pair method over the cluster method was assessed at more than 10x, with decision time falling from between 2 and 7 minutes to 10 seconds.

Set Focus

The concept of set focus was developed after the pair review methodology, as a way to speed up the decision process further. We found that letting users see related record pairs together on one screen enabled them to find the best match and reject the obvious non-matches. This is particularly useful when you are looking not just for matches between records (companies, persons, etc.) but also for relationships, i.e. family groups, legal entity ownership structures, etc. Matching will often expose a potential match that, once the other records are considered together in a focused review, turns out to be a parent-child relationship.
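A minimal Python sketch of the set focus idea follows, under stated assumptions: `Pair`, `focus_set` and `decide` are hypothetical names, and candidate pairs are assumed to carry a single match score. All pairs touching one record are gathered onto one "screen" ordered by score, and the reviewer accepts one as the match, keeps others as relationships (for example parent-child), and rejects the rest.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    left: str
    right: str
    score: float


def focus_set(pairs, record_id):
    """All candidate pairs that involve record_id, best score first, for one screen."""
    related = [p for p in pairs if record_id in (p.left, p.right)]
    return sorted(related, key=lambda p: p.score, reverse=True)


def decide(focus, match_index=0, relationship_indexes=()):
    """Record one decision per pair in the focus set.

    The pair at match_index is accepted as the match, pairs listed in
    relationship_indexes are kept as relationships (e.g. parent-child),
    and every other pair is rejected.
    """
    decisions = {}
    for i, pair in enumerate(focus):
        if i == match_index:
            decisions[pair] = "match"
        elif i in relationship_indexes:
            decisions[pair] = "relationship"
        else:
            decisions[pair] = "reject"
    return decisions
```

Seeing every pair around record A at once is what lets the reviewer spot that the second-best candidate is really a related entity rather than a duplicate, a judgment that is hard to make when each pair is viewed in isolation.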

Hierarchy Management

Custodian allows its users to define multiple formal and informal hierarchies, a subject discussed later in the documentation. Essentially, when a relationship is defined between two records, it is often necessary to capture further information about that link, such as an ownership percentage or an employment start date. We therefore let users define multiple hierarchies, each collecting its own type of link data. Some hierarchies are formal, with fixed rules about which kinds of record can be linked: for example, a company owned by another company (although in practice a company can also be owned by a person), or a person reporting to another person in an employment role. Other hierarchies are informal: a product can feed into a brand, a category, a product type, and so on. The ownership and reporting structures are formal hierarchies; the product structure is an informal one.
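The distinction between formal and informal hierarchies, each with its own link attributes, can be sketched as a small data structure. This is a hypothetical Python illustration, not Custodian's configuration format: `HierarchyType`, `Link`, `make_link` and the three example hierarchies are assumed names. A formal hierarchy lists the (parent type, child type) combinations it allows; an informal one leaves `allowed` as `None` and accepts any combination.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class HierarchyType:
    name: str
    link_attributes: dict  # attribute name -> expected Python type
    # Formal hierarchies restrict (parent type, child type); informal ones use None.
    allowed: Optional[frozenset] = None


@dataclass
class Link:
    hierarchy: str
    parent: tuple  # (entity_type, record_id)
    child: tuple
    attributes: dict


def make_link(h, parent, child, **attributes):
    """Create a link, enforcing the hierarchy's rules where it has any."""
    if h.allowed is not None and (parent[0], child[0]) not in h.allowed:
        raise ValueError(f"{h.name}: {parent[0]} -> {child[0]} is not allowed")
    for key, value in attributes.items():
        expected = h.link_attributes.get(key)
        if expected is None or not isinstance(value, expected):
            raise ValueError(f"{h.name}: bad attribute {key}={value!r}")
    return Link(h.name, parent, child, dict(attributes))


# Formal: owners may be companies or persons; the owned party is a company.
OWNERSHIP = HierarchyType(
    "ownership",
    {"ownership_pct": float},
    allowed=frozenset({("company", "company"), ("person", "company")}),
)
# Formal: a person reports to another person.
REPORTING = HierarchyType(
    "reporting", {"start_date": date}, allowed=frozenset({("person", "person")})
)
# Informal: a product can feed into a brand, category, product type, etc.
PRODUCT = HierarchyType("product", {})
```

For example, `make_link(OWNERSHIP, ("company", "Holdco"), ("company", "Opco"), ownership_pct=51.0)` succeeds, while linking a product as the owner of a company raises `ValueError`.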

Multi Domains

Multi-domain modeling is one of the most overused expressions in the Master Data Management space; in fact, very few of the technologies on the market really understand the expression, and even fewer enable their users to model it well. We created a domain model that lets users model like entities together in an interrelated space, while also allowing those entities to be related across domains. This provides a clear delineation in the data governance sense while retaining a proper cross-domain model and reducing the need to replicate and distribute master data.
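A minimal Python sketch of the idea, assuming a simple in-memory model: `Domain`, `Relationship` and `DomainModel` are hypothetical names, not Custodian's API. Each domain groups like entity types under its own steward (the governance delineation), while relationships may still point from one domain into another, so product data can reference party records instead of copying them.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


@dataclass
class Domain:
    name: str
    entity_types: set
    steward: str  # who governs this domain's data


class Relationship(NamedTuple):
    name: str
    from_type: str
    to_type: str
    from_domain: str
    to_domain: str


@dataclass
class DomainModel:
    domains: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_domain(self, domain):
        self.domains[domain.name] = domain

    def domain_of(self, entity_type):
        for domain in self.domains.values():
            if entity_type in domain.entity_types:
                return domain.name
        raise KeyError(f"{entity_type} is not in any domain")

    def relate(self, name, from_type, to_type):
        """Relate two entity types, within one domain or across domains."""
        rel = Relationship(
            name, from_type, to_type, self.domain_of(from_type), self.domain_of(to_type)
        )
        self.relationships.append(rel)
        return rel

    def cross_domain(self):
        """Relationships that span two domains."""
        return [r for r in self.relationships if r.from_domain != r.to_domain]
```

With a `party` domain (companies, persons) and a `product` domain (products, brands), an `employs` relationship stays inside `party`, while `supplies` links a company to a product across domains; each side remains governed by its own domain steward.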

Summary

Custodian is a full-featured, full-service MDM solution that gets your business up to speed very quickly. Although the space is very complex, we have enabled our users to move much faster and reap the rewards of data governance with a shorter time to market and a reduced total cost of ownership (TCO).