
Sample screening hit

Updated over a year ago

The Person

It's always easier to understand something if we have an example. Let's say we have a customer named Shehadeh Rafiq Deha.

Next, we'll want to make sure he isn't on our sanctions list. To do that, you'll screen his name against the sanctions database.


1. Make an API call

Your first step will be to make an API call to screen for the name Shehadeh Rafiq Deha.

You might find an exact match immediately, but it's much more likely that your customer's name won't match exactly, even if he is on the sanctions list. That's especially true if the name comes from a language that doesn't use the Latin alphabet and has been transliterated. Or, if Shehadeh is actually a shady guy, he might have deliberately changed the spelling of his name so he can't be matched so easily.

That's why we need to search for similar names.


2. Find similar names

There are lists and lists with rows and rows of names. If we compared every single entry with Shehadeh Rafiq Deha, it would take all day, because these databases contain thousands and thousands of rows. Not so efficient, right? So we won't do that.

Instead, we first identify names that are really similar, using a search engine called Elasticsearch. We won't get into the technicalities here, but essentially Elasticsearch lets us define parameters that narrow down the set of names we'll compare Shehadeh Rafiq Deha against.

For example, Elasticsearch would skip the name "Princess Sarah" (because it has almost nothing in common with Shehadeh Rafiq Deha) but would pick out SHEHADEH, Rafik, because it closely matches 2 of the 3 parts of our customer's name (Shehadeh and Rafiq).
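As a rough illustration only, candidate retrieval of this kind could use an Elasticsearch `match` query with fuzziness, like the request body built below (it would be sent to an index's `_search` endpoint). The field name `name`, `fuzziness: "AUTO"`, `minimum_should_match: 2` and the `size` cap are assumptions for this sketch, not the parameters our system actually uses.

```python
import json


def candidate_query(customer_name):
    """Build a hypothetical Elasticsearch query body for finding similar names.

    Assumptions (not the production settings):
    - sanctioned names are stored in a text field called "name";
    - "AUTO" fuzziness tolerates small spelling differences such as rafiq/rafik;
    - at least 2 of the customer's name parts must match.
    """
    return {
        "query": {
            "match": {
                "name": {
                    "query": customer_name,
                    "fuzziness": "AUTO",
                    "minimum_should_match": 2,
                }
            }
        },
        "size": 50,  # hypothetical cap on how many candidates come back
    }


print(json.dumps(candidate_query("Shehadeh Rafiq Deha"), indent=2))
```

Assuming the field uses Elasticsearch's default standard analyzer, SHEHADEH, Rafik would be indexed as the terms shehadeh and rafik. With `AUTO` fuzziness a five-letter term may differ by one edit, so rafik matches rafiq and two of the three query terms match, while Princess Sarah matches none.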

Next, to find out how close SHEHADEH, Rafik and Shehadeh Rafiq Deha actually are, we'll need to calculate a fuzziness score.


3. Calculate a fuzziness score

There are actually two scores you need to calculate, a full name score and a composite score, before you can get the final fuzziness score of SHEHADEH, Rafik. For all of this, we'll use a fuzziness measure called the Jaro Winkler algorithm, which gives a score between 0 and 1 stating how similar two names are.


3.1 Calculate the full name score

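For the full name score, we'll measure how close Shehadeh Rafiq Deha is to SHEHADEH, Rafik as full names. But first we need to do some prep work. As the examples below show, both names are lowercased and punctuation such as the comma is dropped.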

  1. Write out our customer's name, Shehadeh Rafiq Deha, in every possible order of its name parts.

    • shehadeh rafiq deha

    • shehadeh deha rafiq

    • rafiq deha shehadeh

    • rafiq shehadeh deha

    • deha rafiq shehadeh

    • deha shehadeh rafiq

  2. Then remove the spaces from each of the orderings we came up with.

    • shehadehrafiqdeha

    • shehadehdeharafiq

    • rafiqdehashehadeh

    • rafiqshehadehdeha

    • deharafiqshehadeh

    • dehashehadehrafiq

  3. Once we have every ordering without spaces, we use the Jaro Winkler algorithm to score each one against shehadehrafik (SHEHADEH, Rafik prepared the same way). The closer the score is to 1, the closer the match.

  • "shehadehrafiqdeha" and "shehadehrafik" (score: 0.93)

  • "shehadehdeharafiq" and "shehadehrafik" (score: 0.91)

  • "rafiqdehashehadeh" and "shehadehrafik" (score: 0.53)

  • etc.

After we try all of the combinations we prepared, we'd take the highest score and call that our full name score. In this example, our full name score would be 0.93.
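The full name score can be sketched in Python as below. This is a textbook Jaro-Winkler implementation (prefix scaling factor p = 0.1, shared prefix capped at 4 characters); those parameters are our assumption rather than a statement of the production settings, although with them the sketch reproduces the example scores above. The helper names `normalise` and `full_name_score` are hypothetical.

```python
import re
from itertools import permutations


def jaro(s1, s2):
    """Jaro similarity between two strings, from 0.0 (no match) to 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters only count as matching if they are this close to each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as transpositions.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def normalise(name):
    """Lowercase a name and split it into parts, dropping punctuation such as commas."""
    return re.findall(r"\w+", name.lower())


def full_name_score(customer, candidate):
    """Best Jaro-Winkler score over every ordering of the customer's name parts,
    with the parts joined without spaces."""
    target = "".join(normalise(candidate))
    return max(
        jaro_winkler("".join(order), target)
        for order in permutations(normalise(customer))
    )


print(round(full_name_score("Shehadeh Rafiq Deha", "SHEHADEH, Rafik"), 2))  # 0.93
```

Scoring the individual orderings with `jaro_winkler` also reproduces the other example values when rounded to two decimals: 0.91 for shehadehdeharafiq and 0.53 for rafiqdehashehadeh.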


3.2 Calculate the composite score

Now we need to calculate the composite score, which is almost the opposite of the full name score. Instead of putting all the names into one big string of letters, we look at each name separately (as long as the name has more than 2 letters) and give each of those names a score. Our composite score will then be the average of those scores.

  • [ shehadeh rafiq deha ] and [ shehadeh rafik ]

    • shehadeh and shehadeh match completely. So the score is: 1.0

    • rafiq and rafik are almost a complete match. This pair is scored with a separate function that depends on the Jaro Winkler and Soundex algorithms, which here gives the score: 0.9232

    • deha has no remaining name to be scored against. So the score is: 0.0

    Averaging these scores (1.0, 0.9232 and 0.0) gives a composite score of 0.641 for this name order: shehadeh rafiq deha. Now we'll try another name order and calculate the composite score for that.

    • [ shehadeh deha rafiq ] and [ shehadeh rafik ]

      • shehadeh and shehadeh match completely. So the score is again: 1.0

      • deha and rafik don't match. So the score is: 0.1152

      • rafiq has no other names to score against. So the score is: 0.0

      So the score for this ordering is 0.372 (the average of 1, 0.1152, 0).

We would continue making composite scores for every possible order of the names.

At the end, we take the ordering that gave the highest score and use that as our composite score. Here that's the first ordering, so our composite score is 0.641.
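A sketch of the composite score follows, under two assumptions. First, the per-name scorer (which in the real system depends on Jaro Winkler and Soundex) is replaced by a hypothetical lookup, `EXAMPLE_SCORES`, holding the example's numbers, with every unlisted pair scored 0.0. Second, the pairing rule (reorder the longer name, pair name parts by position, divide by the number of customer name parts) is inferred from the worked examples in this article rather than taken from production code. `token_score` and `composite_score` are hypothetical names.

```python
from itertools import permutations

# Stand-in for the real per-name scorer: scores copied from the worked example.
EXAMPLE_SCORES = {
    ("shehadeh", "shehadeh"): 1.0,
    ("rafiq", "rafik"): 0.9232,
    ("deha", "rafik"): 0.1152,
}


def token_score(a, b):
    """Hypothetical per-name scorer: look the pair up in either order, else 0.0."""
    return EXAMPLE_SCORES.get((a, b), EXAMPLE_SCORES.get((b, a), 0.0))


def composite_score(customer_parts, candidate_parts, min_length=3):
    """Best average per-name score over every ordering of the longer name.

    Only names with more than 2 letters are scored. Parts are compared position
    by position; zip stops at the shorter list, so unpaired parts add 0.0.
    """
    customer = [p for p in customer_parts if len(p) >= min_length]
    candidate = [p for p in candidate_parts if len(p) >= min_length]
    if not customer or not candidate:
        return 0.0
    if len(customer) >= len(candidate):
        pairings = [(order, candidate) for order in permutations(customer)]
    else:
        pairings = [(customer, order) for order in permutations(candidate)]
    best = 0.0
    for left, right in pairings:
        total = sum(token_score(a, b) for a, b in zip(left, right))
        best = max(best, total / len(customer))
    return best


print(round(composite_score(["shehadeh", "rafiq", "deha"], ["shehadeh", "rafik"]), 3))  # 0.641
```

Swapping the two arguments returns 0.9616 instead, which is the asymmetry described in the final notes of this article.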


3.3 Calculate the final fuzzy matching score

Once you have the full name score and the composite score, this part is easy. You just pick whichever of the two numbers is higher, and that's your final score. Using our example here:

  • Full name score: 0.93

  • Composite score: 0.641

Our full name score is higher, so voila: our final score is 0.93.

❗ The above example assumes a fuzzy matching threshold of 92 (that is, 0.92 on the 0 to 1 scale used above). Since 0.93 is above that threshold, Shehadeh Rafiq Deha would be considered a match.
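The final step and the threshold check can be sketched as follows. The helper names are hypothetical, and two details are our assumptions: that a threshold of 92 means 0.92 on the 0 to 1 scale, and that a score exactly equal to the threshold counts as a match.

```python
def final_score(full_name, composite):
    """The final fuzzy matching score is the higher of the two sub-scores."""
    return max(full_name, composite)


def is_hit(score, threshold=92):
    """Compare a 0-1 score with a percentage threshold (assumed: 92 means 0.92)."""
    return score >= threshold / 100


score = final_score(0.93, 0.641)
print(score, is_hit(score))  # 0.93 True
```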


4. The API sends you info back

In step 1, you sent the name via the API. All of the fuzzy scoring logic then happened in steps 2 and 3. In the final step, the API sends the calculated final score back to you. Voila!


A few final notes

  1. The comparison isn't commutative. That's a fancy way of saying that if we reversed the previous example and compared shehadeh rafik to shehadeh rafiq deha, we wouldn't necessarily get the same final score. The full name score would stay the same in this example, but the composite score would be a lot higher because there are fewer names to compare, so both of the 2 names would have a counterpart to match against.

    • The composite score in this case would include the following orderings and scores.

      • [ shehadeh rafik ] and [ shehadeh rafiq deha ]

        • shehadeh and shehadeh match completely, the score: 1.0

        • rafik and rafiq are almost a complete match, score: 0.9232

        Score for this ordering: 0.9616 (the average of 1.0 and 0.9232).

      • [ shehadeh rafik ] and [ shehadeh deha rafiq ]

        • shehadeh and shehadeh match completely, the score: 1.0

        • rafik and deha don't match, the score: 0.1152

        Score for this ordering: 0.5576 (the average of 1.0 and 0.1152).

      • etc. until all orderings of the longer name are scored.

    • You get the picture. Because the composite score (0.9616) is now higher than the full name score (0.93), the final score in this case is 0.9616.

  2. A check returns a 100% match if the sanctioned entity's name fully contains the customer name you're screening. For example, if the name Abdulla is screened and a sanctioned person's name contains Abdulla, this counts as a full match, in line with the algorithm OFAC itself uses.

  3. Some names are so long that it's not possible to check all of their orderings in under a second, so names with more than 7 name parts are handled differently. This is rare, but it happens with some legal entity names and occasionally with person names. One example of such a name is Nesrine Bent Zine El Abidine Ben Haj Hamda BEN ALI.

  4. Our system is optimised to run these checks faster; this document only gives a rough outline of what happens when the fuzziness score is calculated.
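Long names are a problem because the number of orderings grows factorially: a name with n parts has n! orderings, and each one needs its own full name and composite comparison. The short sketch below (with a hypothetical helper, `orderings`) only counts them; it does not describe how names with more than 7 parts are actually handled, which this article doesn't specify.

```python
from math import factorial


def orderings(name):
    """Number of orderings itertools.permutations would generate for a name's parts.

    This is n! for n parts; repeated parts (e.g. "Ben" and "BEN") are not deduplicated.
    """
    return factorial(len(name.split()))


for parts in (3, 7, 8):
    print(f"{parts} parts -> {factorial(parts):,} orderings")

print(f"{orderings('Nesrine Bent Zine El Abidine Ben Haj Hamda BEN ALI'):,}")  # 3,628,800
```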
