Just wondering if you could recommend a low-cost identity resolution tool for advanced name matching – preferably some open source data quality software? We’re looking for something that can take into account out-of-order names, misspelled names and name variations in various languages – for example, Francis Julien = Frank Julien = Julien Franciose.
Here is my advice. First, a quick Web search will reveal a number of open source and/or free options, so you’re probably in as good a position as I am to answer this question. Second, as I did 15 years ago, you can spend a little time reading up on generally accepted algorithms (edit distance calculations, n-gramming, data edit rules) and then implement them yourself. Check out dataqualitybook.com for a link to my new book, which has chapters discussing these techniques.
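To give a sense of how little code the basic techniques need, here is a minimal Python sketch of the three ideas named above: an edit distance calculation, n-gram comparison, and a simple edit rule that maps nicknames to a canonical form and ignores token order. The function names and the tiny NICKNAMES table are illustrative assumptions, not taken from any particular product, and a real deployment would need a far larger variant table and language-specific rules.

```python
# Hypothetical nickname table (an assumption for illustration only);
# a real system would load a much larger, language-aware list.
NICKNAMES = {"frank": "francis"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ngrams(s: str, n: int = 2) -> set:
    """Set of character n-grams, padded with '#' so word edges count."""
    padded = "#" * (n - 1) + s + "#" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the two n-gram sets (1.0 means identical sets)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def normalize(name: str) -> str:
    """Edit rule: lowercase, map nicknames, and sort tokens to ignore order."""
    tokens = [NICKNAMES.get(t, t) for t in name.lower().split()]
    return " ".join(sorted(tokens))


def name_similarity(a: str, b: str) -> float:
    """Edit-distance similarity in [0, 1] computed on the normalized names."""
    na, nb = normalize(a), normalize(b)
    longest = max(len(na), len(nb))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(na, nb) / longest


if __name__ == "__main__":
    for other in ("Frank Julien", "Julien Franciose"):
        print(other, round(name_similarity("Francis Julien", other), 3))
```

Sorting tokens is the simplest way to handle out-of-order names, but it can misalign tokens when a misspelling changes a first letter; pairing each token with its closest counterpart is a common refinement. Deciding what similarity threshold counts as a match is a tuning exercise against your own data.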
You also have to consider the total cost of ownership associated with open source data quality software. Determine whether the upfront effort required to get the product up and running, and to find the right expertise to help you adjust the rules in the software, outweighs the value of investing in tools whose vendors will get you going relatively quickly. I’m not advocating one way or the other; you just have to think about what is best for your environment.
This was first published in February 2011