What is Unsupervised Data Integration?
A Faster, Better Way to Integrate
Unsupervised Data Integration (UDI) is a technique for quickly evaluating how new data fits into your data landscape, with the minimum of effort and elapsed time.
UDI provides a mechanism to evaluate data in real time as it is discovered and to plug this new data into your landscape far more quickly. It uses advanced techniques to ensure that data can be classified, verified, matched and published to downstream systems within seconds of being created or changed.
Better than Traditional Integration (ETL) Tools
Traditionally, integration technology tried to glue together datasets and information through complex mapping, at both the high structural level and the low field level. This approach is very effective where we can afford the time to coordinate data manually to that degree, but it is not effective in a “bigger” data world.
UDI is an alternative to complex Extract, Transform and Load (ETL) technologies, which are expensive to acquire and have a high cost of ownership. These more traditional approaches require constant review and versioning to ensure data stays aligned.
Once a source of data is identified, either because it is proposed to the system or because the system finds it through file system monitoring, the source is assessed using a set of complex rules that have been learnt over the history of the product and the installation. As we learn more about your data, we become more accurate in our assessment of the usefulness of data sources and the meaning of the information encapsulated in them.
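As a rough illustration of the discovery step, the sketch below polls a directory and reports files it has not seen before. It is a minimal sketch, not the product's mechanism: the function name `discover_new_sources` and the idea of tracking seen paths in a set are assumptions made for this example, and the rule-based assessment itself is not shown.

```python
import os


def discover_new_sources(directory, seen):
    """Return the paths of files in `directory` that are not yet in `seen`.

    `seen` is a set supplied by the caller; newly found paths are added to it
    so that a later poll does not report the same file twice.
    """
    new_paths = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.path not in seen:
            seen.add(entry.path)
            new_paths.append(entry.path)
    return sorted(new_paths)
```

Calling this function on a timer gives a simple form of file system monitoring; each returned path would then be handed to the assessment rules.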
Identity shows us the type of data and the fields of information it contains: for example, does a field contain common first names or salutations? This, plus the metadata associated with the field itself, tells us the likely usable values and how the field fits into the jigsaw of other information in the landscape.
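The first-name and salutation check can be pictured as a vocabulary lookup over a sample of field values. The sketch below is illustrative only: the two small word lists, the `classify_field` name and the 0.6 threshold are all assumptions, and a real installation would rely on much larger, learnt vocabularies together with the field metadata.

```python
# Hypothetical reference vocabularies, kept tiny for the example.
SALUTATIONS = {"mr", "mrs", "ms", "miss", "dr", "prof"}
FIRST_NAMES = {"james", "mary", "john", "linda", "sarah", "david"}


def classify_field(values, threshold=0.6):
    """Guess a field's type from the share of values found in each vocabulary."""
    # Normalise: drop blanks, lower-case, and strip a trailing full stop ("Mr." -> "mr").
    cleaned = [v.strip().lower().rstrip(".") for v in values if v and v.strip()]
    if not cleaned:
        return "unknown"
    scores = {
        "salutation": sum(v in SALUTATIONS for v in cleaned) / len(cleaned),
        "first_name": sum(v in FIRST_NAMES for v in cleaned) / len(cleaned),
    }
    label, score = max(scores.items(), key=lambda item: item[1])
    # Only commit to a label when enough of the sample supports it.
    return label if score >= threshold else "unknown"
```

For instance, `classify_field(["Mr.", "Mrs", "Dr.", "Ms"])` returns `"salutation"`, while a column of numeric strings falls through to `"unknown"`.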
We collate information using match rules that tell us whether data we have acquired is similar to data we already hold, and whether the profile of a company or person overlaps with a previously known entity. This enables us to build up a golden view of the data, a view that is publishable to downstream applications.
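To make the idea of match rules concrete, here is a minimal sketch with two assumed rules: an exact email match, or a fuzzy name match combined with the same postcode. The rules, the field names, the 0.85 threshold and the "keep known values, fill gaps" merge policy are all hypothetical choices for illustration; the name comparison uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join((text or "").lower().split())


def name_similarity(a, b):
    """Similarity ratio between two names, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def is_match(record, known, name_threshold=0.85):
    """Rule 1: same non-empty email. Rule 2: same postcode and similar name."""
    email = normalise(record.get("email"))
    if email and email == normalise(known.get("email")):
        return True
    postcode = normalise(record.get("postcode"))
    same_postcode = bool(postcode) and postcode == normalise(known.get("postcode"))
    similar_name = name_similarity(record.get("name"), known.get("name")) >= name_threshold
    return same_postcode and similar_name


def merge_golden(known, record):
    """Build a golden record: keep known values and fill gaps from the new record."""
    golden = dict(known)
    for key, value in record.items():
        if value and not golden.get(key):
            golden[key] = value
    return golden
```

With these rules, "Jon Smith" and "John Smith" at the same postcode are treated as one entity, and a phone number present only on the new record is carried into the golden view.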
Not all data is useful, and we understand that the quality of attributes and profiles needs to be checked regularly. UDI ensures that only data meeting certain standards of completeness is considered for any purpose, and our data firewall ensures that anything published via our firehose technology is fit for purpose.
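A completeness standard of this kind can be sketched as a score over required fields plus a filter that only lets qualifying records through. The required fields, the 0.6 threshold and the function names below are assumptions for the example and do not describe the actual data firewall.

```python
# Hypothetical set of fields a publishable profile is expected to carry.
REQUIRED_FIELDS = ("name", "email", "postcode")


def completeness(record, fields=REQUIRED_FIELDS):
    """Fraction of the required fields that hold a non-blank value."""
    filled = sum(1 for field in fields if str(record.get(field) or "").strip())
    return filled / len(fields)


def firewall(records, min_completeness=0.6):
    """Yield only the records whose completeness meets the threshold."""
    for record in records:
        if completeness(record) >= min_completeness:
            yield record
```

A record with two of the three fields filled scores about 0.67 and passes; a record with only a name scores about 0.33 and is held back.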
How the Pipeline Works
The entire end-to-end process is a microservice framework that allows us to plug together complex algorithms and modules, which keeps the pipeline open to extension. Data is assessed, collated, validated and published in near real time, which means that once we can make a decision on the effectiveness of the data, it will flow out to our consumers in seconds.
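The plug-together design can be pictured as a chain of stages, each of which either passes a record on or drops it. The sketch below runs the stages as plain functions in one process, standing in for separate microservices; `run_pipeline` and the three toy stages are hypothetical names, not part of the product.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence

Record = dict
# A stage returns the (possibly enriched) record, or None to drop it.
Stage = Callable[[Record], Optional[Record]]


def run_pipeline(records: Iterable[Record], stages: Sequence[Stage]) -> Iterator[Record]:
    """Pass each record through every stage in order; stop early if a stage drops it."""
    for record in records:
        current: Optional[Record] = record
        for stage in stages:
            current = stage(current)
            if current is None:
                break
        if current is not None:
            yield current


def assess(record: Record) -> Optional[Record]:
    """Toy stage: mark the record as assessed."""
    return {**record, "assessed": True}


def validate(record: Record) -> Optional[Record]:
    """Toy stage: drop records that have no name."""
    return record if record.get("name") else None


def publish(record: Record) -> Optional[Record]:
    """Toy stage: mark the record as published."""
    return {**record, "published": True}
```

Adding a new capability then means adding one more function to the `stages` list, for example `run_pipeline(records, [assess, validate, publish])`, without touching the existing stages.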