In my previous post I mentioned that duplicate detection could be better in CRM 4.0. A few days ago I found a nice article on how to enhance duplicate detection. It describes how to use the Soundex algorithm for this purpose.
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English, so its use for other languages is questionable. On its own, Soundex detects 3 duplicates in my “Coca Cola problem”.
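To show what the phonetic key looks like, here is a minimal sketch of the classic American Soundex rules in Python. It is not the code from the article mentioned above and not a CRM plug-in; the function name `soundex` and the sample names are my own illustrations. The idea is that two records whose names produce the same four-character code are treated as possible duplicates.

```python
# Consonant groups of classic American Soundex; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code, or "" if the name has no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")      # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                      # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code                       # a vowel resets prev to ""
    return ("".join(result) + "000")[:4]  # pad with zeros, cut to 4 characters


# Hypothetical spellings, not the exact list from my previous post.
print(soundex("Coca Cola"))   # C224
print(soundex("Coka Kola"))   # C224
print(soundex("Robert"))      # R163
```

Because spaces and punctuation are ignored and only the first four positions are kept, different spellings of a name tend to collapse to the same code, while longer names that share a beginning can collide as well.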
In my example I decided to use double duplicate detection. I am using Soundex for phonetic detection, but also a strip algorithm that removes all non-alphanumeric characters from a CRM entity name. The strip algorithm also removes all company type abbreviations. With it, duplicates are detected for the first 4 names in my “Coca Cola problem”. Coca Cola Beverages company is still undetected. By using the same principle you can create your own duplicate detection algorithms.
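The strip step can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code running in CRM: the function name `strip_key`, the list `COMPANY_SUFFIXES` and the sample names are hypothetical, and the abbreviation list should be replaced with the company types that appear in your own data.

```python
import re

# Hypothetical list of company type abbreviations; extend it for your data.
COMPANY_SUFFIXES = {"inc", "ltd", "llc", "corp", "gmbh", "plc"}


def strip_key(name: str) -> str:
    """Build a comparison key: lower-case, letters and digits only, no company type."""
    tokens = []
    for raw in name.lower().split():
        token = re.sub(r"[^a-z0-9]", "", raw)   # drop punctuation inside the word
        if token and token not in COMPANY_SUFFIXES:
            tokens.append(token)
    return "".join(tokens)


# Hypothetical spellings of the same account.
print(strip_key("Coca-Cola Ltd."))       # cocacola
print(strip_key("COCA COLA, Inc"))       # cocacola
print(strip_key("Coca Cola Beverages"))  # cocacolabeverages
```

Two records are flagged as duplicates when their keys are equal. The last example shows why a name with an extra word is not caught by this step: its key differs from the others. Note that the character class keeps only ASCII letters and digits, so names with accented characters would need a wider pattern.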