Just wondering if you could recommend a low-cost identity resolution tool for advanced name matching – preferably some open source data quality software? We’re looking for something that can take into account out-of-order names, misspelled names and name variations in various languages – for example, Francis Julien = Frank Julien = Julien Franciose.
My advice comes in two parts. First, a quick Web search will reveal a number of open source and free options, so you’re probably in as good a position as I am to answer this question. Second, as I did 15 years ago, you can spend a little time reading up on generally accepted algorithms (edit distance calculations, n-gramming, data edit rules; check out dataqualitybook.com for a link to my new book, which has chapters discussing these things) and then implement them yourself.
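To give a feel for what implementing these yourself involves, here is a minimal Python sketch of two of the techniques named above: edit distance and character n-grams (bigrams here), plus a simple token sort to handle out-of-order names. The function names (`normalize`, `edit_distance`, `bigram_similarity`, `name_key`) are illustrative choices of mine, not taken from any particular product or from the book, and any threshold you use to call two names "the same" is something you would need to tune on your own data.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents, and keep only letters and spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.lower() if c.isalpha() or c.isspace())


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def bigrams(s):
    """Set of overlapping two-character substrings of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_similarity(a, b):
    """Dice coefficient over character bigram sets, from 0.0 to 1.0."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga and not gb:
        # Strings too short to have bigrams: fall back to exact comparison.
        return 1.0 if a == b else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def name_key(name):
    """Normalize and sort tokens so word order no longer matters."""
    return " ".join(sorted(normalize(name).split()))


if __name__ == "__main__":
    a, b = name_key("Francis Julien"), name_key("Julien Franciose")
    print(a, "|", b)
    print("edit distance:", edit_distance(a, b))
    print("bigram similarity:", round(bigram_similarity(a, b), 2))
```

Note that this only covers reordering and misspelling. Treating Frank as a variant of Francis is not a spelling problem, so it would need something like a nickname lookup table applied as a data edit rule before comparison, which this sketch leaves out.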
You also have to consider the total cost of ownership of open source data quality software. Determine whether the upfront effort required to get the product up and running, and to find the right expertise to help you adjust its rules, outweighs the benefit of investing in tools whose vendors will get you going relatively quickly. I’m not advocating one way or the other, just that you have to think about what is best for your environment.
Related Q&A from David Loshin