Open source data quality software to help with name matching

Just wondering if you could recommend a low-cost identity resolution tool for advanced name matching – preferably some open source data quality software? We’re looking for something that can take into account out-of-order names, misspelled names and name variations in various languages – for example, Francis Julien = Frank Julien = Julien Franciose.

Here is the advice I would prefer to give you. First, a quick Web search will reveal a number of open source and/or free options, so you’re probably in as good a position as I am to answer this question. In addition, as I did 15 years ago, you can spend a little time reading up on generally accepted algorithms (edit distance calculations, n-gramming and data edit rules; my new book has chapters discussing these things) and then implement them yourself.
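To give a feel for what implementing these yourself might involve, here is a minimal Python sketch that combines the three ideas: a Levenshtein edit distance, a bigram (n-gram) Dice similarity, and a simple data edit rule that maps nicknames to a canonical form and sorts name tokens so that word order no longer matters. The function names, the `NICKNAMES` table and the way the two scores are combined are illustrative assumptions, not taken from any particular product, and a real rule set would need to be far larger and tuned to your data.

```python
# Hypothetical data edit rule: a tiny nickname table (assumption for illustration).
NICKNAMES = {"frank": "francis"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def ngrams(text: str, n: int = 2) -> set:
    """Set of character n-grams, padded so word edges count too."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the two n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def normalize(name: str) -> str:
    """Lowercase, apply the nickname rule, and sort tokens to ignore order."""
    tokens = name.lower().replace(",", " ").split()
    tokens = [NICKNAMES.get(token, token) for token in tokens]
    return " ".join(sorted(tokens))


def name_similarity(a: str, b: str) -> float:
    """Best of the edit-distance score and the n-gram score after normalizing."""
    norm_a, norm_b = normalize(a), normalize(b)
    longest = max(len(norm_a), len(norm_b)) or 1
    edit_score = 1 - edit_distance(norm_a, norm_b) / longest
    return max(edit_score, ngram_similarity(norm_a, norm_b))
```

With this sketch, `name_similarity("Francis Julien", "Frank Julien")` scores a perfect match because of the nickname rule, and `name_similarity("Francis Julien", "Julien Franciose")` still scores high because token sorting handles the reversed order and the edit distance absorbs the spelling difference. Where to set the match threshold is a judgment call that depends on your data.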

You also have to consider the total cost of ownership associated with open source data quality software. Weigh the upfront effort required to get the product up and running, and to find the right expertise to help you adjust the rules in the software, against the value of investing in tools whose vendors will get you going relatively quickly. I’m not advocating one way or the other, just that you have to think about what is best for your environment.

This was first published in February 2011
