Cleansing data after an acquisition often calls for data quality software

After an acquisition saddled MSC Direct with a serious duplicate customer data problem, the direct marketer decided it was time to tap the commercial data quality software market.

Given that MSC Direct was founded in 1941 and has hundreds of thousands of active customers, it's not surprising that the company would have a data quality problem.

"Any time when you're dealing with that many customers and there are many touch points, there's bound to be some duplications," said Patrick Hashimoto, new business development manager at MSC.

The problem came into even sharper focus when, in March 2006, the Melville, N.Y.-based firm -- which distributes more than half a million different steel-cutting tools and other industrial supplies to manufacturing companies -- acquired rival J&L Industrial Supply.

Not only did J&L have duplicate customer records of its own, but the two companies had many of the same customers, creating even more duplicates when the firms' customer data records were merged.

It was then, while manually mapping and transferring J&L's customer data to MSC's own AS/400 database (a process that took close to a year), that Hashimoto realized the extent of his data quality problem and the need for outside help. It's a common occurrence, according to Ted Friedman, an analyst with Stamford, Conn.-based Gartner Inc.

Acquisitions highlight need for better data quality

Acquisitions, and the integration of disparate data sources that go along with them, often cause companies to realize the need for improved data governance practices, including better data quality, Friedman said, though not necessarily the need for commercial rather than homegrown tools.

"[However] organizations that are in the mode of repeated acquisitions over time tend to gravitate to [vendor] tools because of the benefits of best practices, re-use, and leverage that the tools bring," he said. "This is particularly true in the case of data quality tools, where requirements like matching can get rather complex and it is difficult to account for all possibilities and exceptions when writing your own code to get that work done."

Increased economic pressures are also forcing companies to focus on data quality. Hashimoto didn't put a figure on what bad customer data was costing MSC, but he said it caused a number of operational problems, including shipping orders to incorrect or out-of-date customer addresses and crediting the wrong sales agents with commissions.

With companies like MSC facing more pressure to rein in bad data, vendors have responded with increasingly mature and complex data quality tools. Indeed, no fewer than five vendors were named "leaders" in Gartner's 2008 data quality Magic Quadrant report -- including BusinessObjects, DataFlux and Trillium Software -- and Friedman said he expects the market for data quality tools to grow at a 15% compound annual rate over the next several years.

As for MSC, after issuing an RFP and narrowing the competition to three vendors, the company last year decided to go with data matching and standardization software from Pitney Bowes Group 1 Software. Hashimoto, who declined to name the other vendors he considered, said Lanham, Md.-based Group 1, in addition to proving its ability to connect to MSC's AS/400 database, did the best job of tailoring its pitch to MSC's specific duplicate customer data problem.

"The other vendors came in and tried to sell us more than what we needed," Hashimoto said. "We had a fixed idea as to what we needed and Group 1 met that need exactly. I'm sure Group 1 could have competed with the other stuff we didn't need at the time, but at least they stuck to the parameters of what we were looking for."

Two software licenses, one vendor

Since its deployment, the Group 1 data quality software has served as a "gatekeeper" to MSC's customer database, Hashimoto said. When a call center agent tries to create a new record for an existing customer, the system sends an alert, preventing the creation of a duplicate record.

The company also built and implemented a homegrown tool with "Group 1 as the back end" to identify already existing duplicate customer data records, he said. Actually reconciling the duplicates is left to the MSC sales agents.

"We built an interface for our field sales associates to go in, see all the duplicates within their territory, and [gave] them the onus of cleaning them up in a very systematic and easy fashion," Hashimoto said. In some instances, the system may flag records that aren't truly duplicates, or there may be a business reason to have duplicate entries, he said, "but at least we know we've done our due diligence."
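The batch side, finding duplicates already in the database and routing them to the right field sales associate, might look something like the sketch below. The record fields ('id', 'territory', 'name', 'address'), the use of Python's difflib similarity ratio and the 0.85 threshold are all assumptions for illustration, not MSC's or Group 1's actual matching logic. A fuzzy threshold like this is also one reason such a system can flag pairs that turn out not to be true duplicates.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits, single-spaced."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: dict, b: dict) -> float:
    """Fuzzy similarity (0..1) of two records' combined name and address."""
    left = normalize(a["name"] + " " + a["address"])
    right = normalize(b["name"] + " " + b["address"])
    return SequenceMatcher(None, left, right).ratio()


def duplicates_by_territory(records: list[dict], threshold: float = 0.85) -> dict:
    """Return {territory: [(id_a, id_b, score), ...]} for likely duplicate pairs."""
    by_territory = defaultdict(list)
    for r in records:
        by_territory[r["territory"]].append(r)

    flagged = {}
    for territory, rows in by_territory.items():
        pairs = []
        # Compare every pair of records within one territory.
        for a, b in combinations(rows, 2):
            score = similarity(a, b)
            if score >= threshold:
                pairs.append((a["id"], b["id"], round(score, 2)))
        if pairs:
            flagged[territory] = pairs
    return flagged
```

Comparing every pair within a territory grows quadratically; with hundreds of thousands of customers, a production tool would typically bucket candidates first (by ZIP code, say) and only score records within the same bucket.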

But that's only half the story, he said. MSC actually bought two software licenses from Group 1, one to use on live production data, as described above, and another to serve as a "staging area" where MSC can test and improve its matching logic.

If, for example, MSC notices specific types of duplicate customer data slipping through the system, Hashimoto and his team can tweak the software's matching logic in the test environment to catch them, then test and deploy the changes only when he's confident they're ready and won't have any adverse effects on the live data.
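The value of the second license is that changes to the matching logic can be measured before they touch production. Below is a minimal sketch of such a promotion check, assuming a staging sample of records with hand-labeled duplicate pairs: score the current and candidate matchers on the same sample, and promote the candidate only if it catches more true duplicates without raising more false alarms. The function names, the record layout and the two example matchers are hypothetical.

```python
import re
from typing import Callable

# A matcher decides whether two customer records are the same customer.
Matcher = Callable[[dict, dict], bool]


def evaluate(matcher: Matcher, records: list[dict],
             labeled_pairs: dict[tuple[str, str], bool]) -> dict[str, int]:
    """Score a matcher against hand-labeled pairs from the staging data."""
    by_id = {r["id"]: r for r in records}
    caught = false_alarms = missed = 0
    for (a, b), is_duplicate in labeled_pairs.items():
        predicted = matcher(by_id[a], by_id[b])
        if predicted and is_duplicate:
            caught += 1
        elif predicted:
            false_alarms += 1
        elif is_duplicate:
            missed += 1
    return {"caught": caught, "false_alarms": false_alarms, "missed": missed}


def ready_to_promote(current: Matcher, candidate: Matcher, records: list[dict],
                     labeled_pairs: dict[tuple[str, str], bool]) -> bool:
    """Promote only if the candidate catches more duplicates with no new false alarms."""
    old = evaluate(current, records, labeled_pairs)
    new = evaluate(candidate, records, labeled_pairs)
    return new["caught"] > old["caught"] and new["false_alarms"] <= old["false_alarms"]


# Hypothetical example matchers.
def exact_name(a: dict, b: dict) -> bool:
    """Current logic: case-insensitive exact name match."""
    return a["name"].lower() == b["name"].lower()


def normalized_name(a: dict, b: dict) -> bool:
    """Candidate logic: also ignore punctuation and the token 'inc'."""
    def norm(s: str) -> list[str]:
        return [t for t in re.findall(r"[a-z0-9]+", s.lower()) if t != "inc"]
    return norm(a["name"]) == norm(b["name"])
```

With "Acme Tool, Inc." and "ACME TOOL" labeled as the same customer, the candidate catches a duplicate the current logic misses without flagging anything new, so it passes; a looser rule such as matching on the first word alone would catch the same pair but also raise false alarms, and would be held back.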

"That plays into the tremendous flexibility of this tool," he said. "It's not one match code or one match logic. We can continually tweak it and enhance it based on live data that comes in."

Hashimoto said that, in the end, improved data quality has led to better customer service, both in terms of communicating with customers on the front end and managing their accounts on the back end. And thanks to training from Group 1, MSC doesn't often need the vendor for support either.

"We've gotten pretty good training [from Group 1] the couple of times they were here in terms of how to build the matching logic," Hashimoto said. "So I think we're in a pretty good place in terms of using their software. It's pretty much a self-running, stable application."
