Quality Rules (Tuning)

Quality rules are defined in the “rules” section of the schema. Most of them are supplied to Custodian by the cloud service when it starts, but they can also be customised in your installation. Quality rules manipulate how matches are scored.

The rules can be maintained in the User Interface. The sections below describe the function of each type of rule.

There are 7 classifications of dynamic rules, and they are described here.

ER – Equivalent in Rule

ER dynamic rules tell the matching process that two (or more) terms are functionally equivalent. Use them where the matching algorithm should treat the terms as having the same meaning.

{
  "rulePurpose": "MatchCompanyName",
  "type": "ER",
  "parent": "",
  "items": [
    "ACCOUNTING",
    "ACCOUNTANCY",
    "ACCOUNTANT",
    "ACCOUNTANTS",
    "ACCT"
  ]
}

items is a list of equivalent terms. Please use UPPERCASE, because the system performs all matching comparisons in uppercase.

parent is not used in ER rules and should be included as an empty string.
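
The effect of an ER rule can be pictured as a simple membership test. The following is a minimal Python sketch, not Custodian's actual implementation: the function name tokens_equivalent is hypothetical, and it assumes terms are compared one token at a time in uppercase.

```python
# Illustrative sketch only; tokens_equivalent is a hypothetical helper.
ER_RULE = {
    "rulePurpose": "MatchCompanyName",
    "type": "ER",
    "parent": "",
    "items": ["ACCOUNTING", "ACCOUNTANCY", "ACCOUNTANT", "ACCOUNTANTS", "ACCT"],
}


def tokens_equivalent(a: str, b: str, rule: dict) -> bool:
    """Return True if the tokens are identical or both appear in the rule's items."""
    a, b = a.upper(), b.upper()
    items = set(rule["items"])
    return a == b or (a in items and b in items)


print(tokens_equivalent("Accountancy", "ACCT", ER_RULE))   # True
print(tokens_equivalent("Accountancy", "LEGAL", ER_RULE))  # False
```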

NR – Not Equivalent in Rule

NR dynamic rules are very important for preventing false positives in the match system. You will often see similar names such as “Robert Haynes 1st” and “Robert Haynes II”. To a human reader these are clearly not the same person; most likely they are father and son. Matching rules often fail to notice the difference, which can lead to false matches. A more subtle example is “Haynes LLP” and “Haynes LLC”: the names are very similar and could produce a positive match, but they are different legal forms and there is no case for considering them a match. They are in fact 0% similar. NR rules exist to handle cases like these.

Please note that some of the more common rules, e.g. Mr and Mrs, are already built into the match system, so there is no need to add rules for these more obvious cases.

{
  "rulePurpose": "MatchCompanyName",
  "type": "NR",
  "parent": "",
  "items": [
    "LTD",
    "LLP",
    "PLC",
    "INC",
    "LLC",
    "SA",
    "BV",
    "CORP"
  ]
}

items is a list of non-equivalent terms. Please use UPPERCASE, because the system performs all matching comparisons in uppercase.

parent is not used in NR rules and should be included as an empty string.

Scoring note: when two records are compared and only one of them has an NR item from a given dynamic rule in its value, the pair is still considered as a potential match. The same applies if neither has an NR item. However, if both have NR items from the same dynamic rule and the item values are not the same, the match is discarded.
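
The scoring note can be read as a small decision function. The following is a minimal Python sketch, not Custodian's actual implementation: the function names nr_items and nr_discards_match are hypothetical, and it assumes values are split on whitespace and compared in uppercase.

```python
# Illustrative sketch only; both helper names are hypothetical.
NR_RULE = {
    "rulePurpose": "MatchCompanyName",
    "type": "NR",
    "parent": "",
    "items": ["LTD", "LLP", "PLC", "INC", "LLC", "SA", "BV", "CORP"],
}


def nr_items(value: str, rule: dict) -> set:
    """Collect the rule's NR items that occur as whole tokens in the value."""
    return set(value.upper().split()) & set(rule["items"])


def nr_discards_match(a: str, b: str, rule: dict) -> bool:
    """True only when both values carry NR items from the rule and they differ."""
    found_a, found_b = nr_items(a, rule), nr_items(b, rule)
    return bool(found_a) and bool(found_b) and found_a != found_b


print(nr_discards_match("Haynes LLP", "Haynes LLC", NR_RULE))  # True
print(nr_discards_match("Haynes LLP", "Haynes", NR_RULE))      # False
print(nr_discards_match("Haynes", "Haynes", NR_RULE))          # False
```

Only the first pair is discarded; the other two pairs go on to normal scoring.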

IR – Ignore in Rule

Ignore in Rule is far more subtle than NR. It does not reject potential matches; it suggests to the match algorithm that a certain term is unimportant and likely a noise word. Please do not confuse noise words with equivalences in matching. People often use IR to ignore commonly abbreviated words such as LTD and Limited. This should not be done with an IR rule, as it would cause the system to ignore key tokens in the value being prepared. Use an ER or TR rule instead.

{
  "rulePurpose": "MatchCompanyName",
  "type": "IR",
  "parent": "",
  "items": [
    "MR", "MRS", "MS", "DR"
  ]
}
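
One way to picture an IR rule is as a filter applied to the tokens before comparison. The following is a minimal Python sketch under that assumption, not Custodian's actual implementation: the function name strip_ignored is hypothetical, and the real algorithm may down-weight noise words rather than remove them.

```python
# Illustrative sketch only; strip_ignored is a hypothetical helper.
IR_RULE = {
    "rulePurpose": "MatchCompanyName",
    "type": "IR",
    "parent": "",
    "items": ["MR", "MRS", "MS", "DR"],
}


def strip_ignored(value: str, rule: dict) -> list:
    """Return the uppercase tokens of value with the rule's noise words removed."""
    ignored = set(rule["items"])
    return [token for token in value.upper().split() if token not in ignored]


print(strip_ignored("Dr Robert Haynes", IR_RULE))  # ['ROBERT', 'HAYNES']
```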

TR – Translate in Rule

Translate in Rule replaces values in the record with other values. It can be used to standardise commonly misspelt or variant terms, such as England -> United Kingdom. Unlike an ER rule, where the values are only considered the same for match comparison, a TR rule actually replaces the values in the record with the parent (or stem).

{
  "rulePurpose" : "MatchCountry",
  "type" : "TR",
  "parent" : "Algeria",
  "items" : [ 
    "Algeria", 
    "DZ", 
    "DZA", 
    "012"
  ]
}

items is the list of values to search for; parent is the value that replaces them.
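
The replacement can be sketched as a lookup against items. The following is a minimal Python sketch, not Custodian's actual implementation: the function name translate is hypothetical, and it assumes the whole field value is compared case-insensitively.

```python
# Illustrative sketch only; translate is a hypothetical helper.
TR_RULE = {
    "rulePurpose": "MatchCountry",
    "type": "TR",
    "parent": "Algeria",
    "items": ["Algeria", "DZ", "DZA", "012"],
}


def translate(value: str, rule: dict) -> str:
    """Return the rule's parent if value matches one of its items, else value unchanged."""
    lookup = {item.upper() for item in rule["items"]}
    return rule["parent"] if value.strip().upper() in lookup else value


print(translate("dza", TR_RULE))     # Algeria
print(translate("France", TR_RULE))  # France
```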

EK – Equivalent in Key

EK dynamic rules tell the indexing process that two (or more) terms are functionally equivalent. Use them where the indexing algorithm should treat the terms as having the same meaning. This generally results in multiple output index keys for the record.

{
  "rulePurpose": "MatchCompanyName",
  "type": "EK",
  "parent": "",
  "items": [
    "LTD",
    "LIMITED"
  ]
}

In this example rulePurpose is MatchCompanyName. Please excuse the misleading property name: the value is actually one of the matchClass types defined here.

items is a list of equivalent terms. Please use UPPERCASE, because the system performs indexing comparisons in uppercase.

parent is not used in EK rules and should be included as an empty string.
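
To show why an EK rule leads to multiple index keys, the following is a minimal Python sketch, not Custodian's actual implementation. The function name index_keys is hypothetical, and the key format (uppercase tokens joined by spaces) is an assumption made purely for illustration.

```python
# Illustrative sketch only; index_keys and its key format are hypothetical.
EK_RULE = {
    "rulePurpose": "MatchCompanyName",
    "type": "EK",
    "parent": "",
    "items": ["LTD", "LIMITED"],
}


def index_keys(value: str, rule: dict) -> list:
    """Return one key per equivalent spelling of any token listed in the rule."""
    tokens = value.upper().split()
    items = rule["items"]
    keys = {" ".join(tokens)}
    for position, token in enumerate(tokens):
        if token in items:
            for alternative in items:
                variant = tokens[:position] + [alternative] + tokens[position + 1:]
                keys.add(" ".join(variant))
    return sorted(keys)


print(index_keys("Haynes Ltd", EK_RULE))  # ['HAYNES LIMITED', 'HAYNES LTD']
```

A record indexed under both keys can be found whichever spelling the other record uses.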

IK – Ignore in Key

Ignore in Key suggests to the indexing algorithm that a certain term is unimportant and likely a noise word.

{
  "rulePurpose": "MatchCompanyName",
  "type": "IK",
  "parent": "",
  "items": [
    "GROUP"
  ]
}

TK – Translate in Key

Translate in Key replaces values in the record with other values. It can be used to standardise commonly misspelt or variant terms, such as England -> United Kingdom. Unlike an EK rule, where the system generally generates multiple keys for each record, a TK rule replaces the term with the parent value before the key is generated.

{
  "rulePurpose" : "MatchCountry",
  "type" : "TK",
  "parent" : "Algeria",
  "items" : [ 
    "Algeria", 
    "DZ", 
    "DZA", 
    "012"
  ]
}

items is the list of values to search for; parent is the value that replaces them.