Anyone who's ever completed the paperwork for a home mortgage knows all too well the stacks of forms -- and wealth of data -- involved in the process.
Perhaps few know this better than the staff at LoanPerformance, a San Francisco-based mortgage risk intelligence firm and a subsidiary of First American CoreLogic Inc., which aggregates and analyzes data from mortgage lenders. The firm has historical data on more than 100 million home loans, and banks and mortgage companies use that data to better understand their loan portfolios and benchmark themselves against the industry, according to Carlos Santiago, vice president of data content. But to provide that information, LoanPerformance regularly has to load and normalize incoming data feeds from its many clients, usually delivered as flat files. This was a time-consuming process, Santiago said -- until the company installed data integration and data profiling software.
"We were hand-coding a lot of the routines and scripts that we needed to interpret the data, translate it and normalize it into a common format," he explained. "Every time we got a new data source, it was like a software development project."
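The kind of hand-coded normalization Santiago describes -- mapping each lender's flat-file layout onto a common format -- can be sketched roughly as follows. This is an illustrative example only; the field names (`LOAN_AMT`, `loan_amount`, and so on) are hypothetical and not LoanPerformance's actual schema.

```python
import csv
import io

# Hypothetical mapping from one lender's column names to a common schema.
# A new data source would need its own mapping -- the "software development
# project" Santiago mentions.
FIELD_MAP = {"LOAN_AMT": "loan_amount", "ORIG_DT": "origination_date", "ZIP": "zip_code"}

def normalize_row(row: dict) -> dict:
    """Rename source fields to the common schema and coerce basic types."""
    out = {}
    for src, dst in FIELD_MAP.items():
        value = row.get(src, "").strip()
        if dst == "loan_amount":
            value = float(value) if value else None  # blank -> missing
        out[dst] = value
    return out

# A toy two-record flat file, the second record missing its loan amount.
raw = "LOAN_AMT,ORIG_DT,ZIP\n250000,2003-06-15,94105\n,2003-07-01,10001\n"
rows = [normalize_row(r) for r in csv.DictReader(io.StringIO(raw))]
print(rows[0]["loan_amount"])  # 250000.0
print(rows[1]["loan_amount"])  # None
```

The point of the sketch is the maintenance burden: every incoming feed needs its own mapping and type-coercion logic, which is what an off-the-shelf ETL tool replaces.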
The lengthy process was creating a backlog of client files -- and unhappy customers. So, last year, LoanPerformance began looking for tools to help improve this process -- focusing first on extract, transform and load (ETL) tools that could help speed data integration. Many of the ETL tools the company looked at fell into two distinct categories, Santiago said. On one end of the spectrum were "skinny" ETL tools, which had only very basic functionality; on the other were "elephant guns," or more robust platforms that were more difficult to use and more expensive. Somewhere in the middle was Austin, Texas-based Pervasive Software Inc., which had the right combination of data integration features for the price and could handle LoanPerformance's high volume of data.
The team narrowed its choices to Pervasive and Redwood City, Calif.-based Informatica Corp. and completed proofs of concept with demo versions of both tools. That's when they discovered Pervasive's data profiling features, Santiago said. The data profiling tool uses customized business rules to analyze data files and identify incomplete database fields, inconsistent field formats or other problems.
"In a very easy way, you could load a file in a format you've never seen before, get a schema, code it up really quickly, and then push a button and automatically generate all these metrics on fields," Santiago said. "To code that up -- especially for a big file with 10 million records -- would take a long time to code and a long time to run."
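The field-level metrics Santiago describes -- record counts, blank fields, distinct values per column -- can be sketched in a few lines. This is a minimal illustration of the general technique, not Pervasive's implementation; the sample fields (`state`, `loan_type`) are invented for the example.

```python
import csv
import io
from collections import Counter

def profile(rows, fields):
    """Compute per-field metrics: record count, blank count, distinct values."""
    stats = {f: {"count": 0, "blank": 0, "values": Counter()} for f in fields}
    for row in rows:
        for f in fields:
            v = (row.get(f) or "").strip()
            s = stats[f]
            s["count"] += 1
            if not v:
                s["blank"] += 1        # incomplete field
            else:
                s["values"][v] += 1    # track value distribution
    return {f: {"count": s["count"], "blank": s["blank"],
                "distinct": len(s["values"])} for f, s in stats.items()}

# A toy incoming file with one blank value in each column.
raw = "state,loan_type\nCA,fixed\n,fixed\nNY,ARM\nCA,\n"
report = profile(csv.DictReader(io.StringIO(raw)), ["state", "loan_type"])
print(report["state"])  # {'count': 4, 'blank': 1, 'distinct': 2}
```

Running a pass like this over a new file surfaces incomplete or inconsistent fields before any data is loaded -- the "push a button" step Santiago contrasts with hand-coding the same checks for a 10-million-record file.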
Though Informatica also offers data profiling, LoanPerformance chose the Pervasive tool based on functionality, usability and price, Santiago said. The firm completed installation last fall and now uses the Pervasive tool to process incoming files. As hoped, the ETL tools have significantly cut integration time -- an average incoming data file is now processed in approximately a quarter of the time it used to take, he said. The data profiling features also add value. Before data profiling, problems with incoming files might not be evident until erroneous data was live in the LoanPerformance application, he said. Now, upfront data profiling helps the team identify problems or changes in the incoming data before it's processed. If there are issues, LoanPerformance passes the information back to clients so they can fix problems in the source systems producing the data.
"This is data coming from our clients' business-critical data warehouse systems," Santiago said. "Oftentimes, they don't know they've got these data quality problems, and they're grateful for having these issues pointed out."
The end results -- faster integration of incoming data files, better data quality, and improved customer service -- are helping LoanPerformance maintain a competitive edge, he said.
Data profiling software evaluation and implementation advice
The administrative interface of a data integration and data profiling tool may seem like a small part of the overall technology decision, but usability matters a lot to the people who will be working with the tool on a day-to-day basis and could make or break its long-term success, Santiago advised. That's why the ultimate end users should be "heavily" involved in the evaluation, he said.
And though data profiling might start as an afterthought in an integration project, as it did with LoanPerformance, he urged others to consider the benefits.
"Upfront profiling can save much wasted time and wasted work," Santiago said, "and it can have such a positive impact on data quality downstream."