Manage Learn to apply best practices and optimize your operations.

Rogue data management

Every company or organization has some of its key data assets managed outside of its business systems. While most companies are permeated with "rogue data", this type data is often difficult to find, and even harder to manage.

Every company or organization has some of its key data assets managed outside of its business systems. While most...

companies are permeated with "rogue data", this type data is often difficult to find, and can be even harder to manage. This article discusses an approach to identifying and managing this data, and provides some insight into some of the challenges associated with rogue data.

One of the more interesting businesses areas I've worked in was an area where there were a large number of statisticians and analytical business users. As I started to take scope of exactly what data there was to manage, I discovered a rather large amount of what I call "rogue data". I consider this data that is maintained outside the control of business systems, aka spreadsheets, and perhaps the most famous database of all, MS Access. Now let's admit it, every company has some of this, but in this case it seemed pretty excessive, and it was in the hands of a rather highly analytical group of professional people. Many of the people in this group who could probably teach an advanced course in MS Excel, and could query a database better than most of the IT staff.

One particular incident I remember was one of these business users testing an interface built for an application. He requested that the data be exported into a comma separated value (csv) format so he could import it into a spreadsheet and 'test it'. He went to work like a concert pianist carefully and intensely pounding the keyboard, manipulating the data as only an expert could do. After about ten minutes of "ahhhhs"' and "hhhhhms", copying data to several tabs in the workbook, multiple pivot tables, VLOOKUPS, and other formulas, and maybe even an incantation that even Harry Potter couldn't conjure up, he proudly declared that a certain data was missing from the export and that certain expected totals were short. Not exactly what you would call a repeatable test case, but you can't get a show like that just anywhere.

So how do you manage rogue data? Well this article provides some guidelines that should help your organization "bring in" those outlaw data rustlers, and maybe even win them over to "the Force".

More information on data management

BI's seven fatal flaws

Expert: Data quality is misunderstood

Moving customer data quality upstream

It is important to make some determination of the scope and nature of this type of data in your organization. You'll need to be a little clever here as you go about this. If these outlaws think you're out to take their spreadsheets away, they may become somewhat opaque. (Sssshhh, there could be one in the cube right next to you) You also have to realize that to some extent this will be a "guesstimate", and an effort you periodically repeat.

You can learn a lot about the rogue data in your organization from the people who build reports, administrative staff, people knowledgeable with interfaces, and finally from the business users themselves. You may be able to look at database logs, or other sources that can point to routinely run queries to help identify it.

Just as important as identifying rogue data is "guesstimating" its turnover or life cycle, and frequency of use. You'll usually find a handful of these things corresponding with a large system implementation because of all the out of scope items that the business users determined they would do themselves. So you go about building a list of this rogue data, its source(s), the type of data it is from a business perspective, its frequency of use, when it started being used, and how long it might continue. This just helps you understand the scope of the problem.

There isn't a cookie cutter approach to managing rogue data because every organization is different. You should work to develop both a strategic and tactical approach that fits into your over all data architecture. Your goal should be to provide the best solution to manage data and meet business needs, not to globally stop all spreadsheets and desktop databases. A positive approach is a winning approach. If you carve out "Thou shall not use spreadsheets to manage business data" on stone tablets, your success is likely to be limited and short lived.

As you go about implementing your plan for each data set, you'll also need to determine what the best "political approach" is. Some solutions are easy sells, or "win-win" scenarios, while other solutions may challenge someone's domain or turf and may require higher level management involvement. Most importantly you should NEVER even begin to think you will conquer all of the rogue data in the universe or that one solution fits all. In reality you'll be lucky to get it identified, and even luckier to tame a few of the wild rustlers.

Strategically, you should identify how each set of data fits into your over all data architecture, this can help determine your approach to managing this data.

Some of this data is similar to data key business systems data, but for practical reasons has to be processed outside of these systems. For example, some system can't handle a certain "unique kind" of widget, loan, order, policy, part, security, etc., so these "unique kinds" are maintained in a spreadsheet or desktop database. The number of these "unique kind" of business data is usually small relative to the more "common kind" that can be processed by the business system. Even if the cost of modifying a key business system prohibits maintaining this data within the system it most likely belongs in, this data can usually be easily integrated with its counterpart with good interface design. (By the way, the primary reason key business systems can't handle "unique kinds" of business data is poor abstraction in the database design.)

Some of this data is often what is called "master data" or "reference data" such as lists of organizational hierarchies, operational hierarchies, product catalogs, suppliers, portfolio managers, etc. There are more systems and better quality systems that can now manage this type of data than ever before. This is especially true in the area of business intelligence or informational systems, where master or reference data management is not as difficult as it was in the past. Making this data available to everyone, and integrating it back into source systems can be difficult.

Some of this data is "intermediate" or what I refer to as "special case" data. This is data that requires highly specialized processing or manipulation by a business person with a unique skill set. It might be that this data is processed only quarterly, or only in response to some unique business change, or in a highly technical or specialized manner such that only customized software development would meet this processing need. There simply may not be systems that provide an affordable alternative to processing this type of data.

Strategically it is important that adequate controls be placed around the processing of all data, (rogue or otherwise), so that regulatory, security, disaster recovery, data quality, and other concerns are addressed.

Tactically it's hard to argue to a business group about the use of spreadsheets and desktop databases and other query tools. They're quick, easy, cheap, and as reliable as the users, which in my sample case were people who could perform certain types of higher math better than I could. Some of the reasons not to use spreadsheets and desktop databases to manage key business data include:

  • Compliance – It is unlikely a real audit trail of who created and updated what data exists for this rogue data. A date or time stamp associated with a file means nothing. An author associated with a file means nothing either. These tools weren't designed with compliance in mind.
  • Cost of Processing - Business systems exist to process core business data. There is a tremendous cost for highly specialized business people to "baby-sit" this data relative to data entry persons or knowledge workers.
  • Disaster Recovery - When somebody's hard drive crashes and key business data is lost, can it be recovered, and at what cost?
  • Cost of Integration - This data usually has to be worked into systems and reports, often at the cost of double data entry or with specialized programming code.
  • Systems Development Processes - If your organization has a prescribed Systems Development Life Cycle, the use of these tools generally violates best practices for software or systems development. In reality, spreadsheets and desktop databases are a "down and dirty" application to handle business data. They never go through the rigor of review that other applications go through.

I consider tactics as the "how to get it done" part. I feel there are short term and long term solutions because in reality it takes time to execute a strategy. Some of the long term solutions include master or reference data management solutions, improved customer data management and a refined understanding of a what a customer or party is, enhanced requirements analysis and vendor requests, and simply better software development.

Some of these long term solutions are also short term solutions to the extent that they can be implemented sooner. For example, if one of the rogue data sets is such that there is a business application that processes similar data, but not for this "unique kind or type", then tactically you might work with the business and vendor to clearly define what data structure is needed so that the application can process this, and other future "unique kinds" as well.

If you're in a larger organization, you might be able to arrange for a development group that can develop application add-ons to address some of the needs for rogue data. This assures that some consistent and hopefully best practices are put into place. This must be done in a manner that assures the business needs are met in a rapid fashion, otherwise the business users are likely to demand their "application of choice".

Inevitably some of this data will remain managed through spreadsheets and/or desktop databases. For this data, you should define policies or procedures to use these applications to manage this data. This might include such things as:

  • Disaster Recovery – Maintaining this data on backed up network drives rather than local drives, and implementing immediate down stream processing can help reduce some of the business continuity risks associated with rogue data. You may be able to arrange limited access network drives or folders for specific business user groups.
  • Compliance Processes – You can place some compliance practices around this data, such as validation by secondary business users, transferring validated copies to appropriate personnel immediately after processing, etc.
  • Integration – It is important to design and implement integration points or services that work with all data. By emphasizing use of the data, the business can more easily understand how rogue data fits into the business and the level of effort it takes to manage it.

It is important to have a clearly defined strategy, measurable objectives, and commitment in order to manage and reduce the amount of rogue data. It is also important to be a solution provider rather than the "revelator of remorse". By identifying your rogue data and determining, on a case by case basis, the best solution, you can help to "corral" your rogue data in a manner that fosters participation from your business users and creates a win-win situation.


Dig Deeper on Data stewardship

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.