Data Quality and Enhancement: Similarity, Self-Selection and Locality

There are several ways to enhance data that improve its quality and value in support of our business intelligence conclusions.

This article originally appeared on the BeyeNETWORK.

Do birds of a feather flock together? This saying is commonly used to indicate a degree of proximity of individuals based on their similar characteristics. But we can delve a little deeper into this saying, as it conveys a message that has some bearing on ways that we can enhance data, both to improve its quality and value and to help communicate our business intelligence conclusions.

Consider what motivates an individual to select a particular place to live. Every location has characteristics that make it appealing to a specific person, although the metrics used to describe “appealing” may differ significantly based on demographics. For example, for an upper-middle class family with young children, the determining factor may be the quality of the schools; for an unmarried professional, it may be a hip, urban setting with an active nightlife; for a poverty-class family, it may be affordability. Therefore, if we characterize locations by their attributes, we are likely to find that the people who live there also share many characteristics.

Or is it the other way around? Does the congregation of people with similar needs determine the characteristics of the neighborhood? The urban location may have taken on its “hipness” because of the multitude of young professionals that live there, and the schools in the upper-middle class areas may be better because greater property tax collections provide better school funding. The answer is that it is probably a bit of both, since clearly, over time, areas and neighborhoods change.

Basically, the congregation of similar individuals within a particular locality is an example of self-selection. When a critical mass of like-minded individuals assembles in a place, that place adapts to both reflect and satisfy those individuals’ needs. From an analytical standpoint, this introduces some interesting opportunities:

  1. At some point, we can associate behavioral “psychographics” with a location based on its residents, which provides some flexibility (with some caveats) in personalization.
  2. The patterns of location evolution can be analyzed and used to predict how “micro-populations” change over time.
  3. We can use geographical information systems that enable the projection of behavior characteristics and trends onto maps in a way that can convey a message much more effectively than bullet points and bar charts can.

Locality and Psychographics

This is a simple concept: by segmenting the populations within a limited geographic space based on some evaluation criteria, “personalities” of the largest segments emerge as the overriding characteristics of the people comprising those segments. The relevant challenge is in determining the evaluation criteria used for segmentation. For a large organization with many customers, those criteria might be a combination of readily available (sometimes even self-submitted!) demographic data with behavior patterns culled from historical transactions. If you have ever answered the survey questions (e.g., “What is your age?” “What is your ZIP code?” “What is your annual salary?”) on a product registration card, you have provided exactly the kind of data that contributes to this knowledge base. Other information is readily available from other sources, such as information vendors or even the government. (There is a plethora of data available from the U.S. Census Bureau, for example.)

When all is said and done, the tabulated statistics projected across locations provide an ability to discern groupings of individuals across relevant demographic criteria (e.g., age, gender) and psychographic behavior (e.g., “drives to work,” “owns two cars,” “plays golf twice a month”). In turn, this information can be used to develop plans of action when attempting to convey a message. For example, an approach to reaching a target audience that commutes to work via public transportation might avoid radio advertising and opt for messages displayed on billboards along the public transportation routes.
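As a rough illustration of the tabulation described above, the following Python sketch groups records by location and reports the most common trait in each. The record layout, the ZIP codes, the trait labels and the function name `dominant_trait_by_location` are all hypothetical, chosen only for this example; a real segmentation would combine many demographic and behavioral criteria.

```python
from collections import Counter, defaultdict

# Hypothetical records: (zip_code, commute_mode). The values are invented
# for illustration and do not come from any real data set.
records = [
    ("20901", "public_transit"),
    ("20901", "public_transit"),
    ("20901", "drives"),
    ("20850", "drives"),
    ("20850", "drives"),
    ("20850", "public_transit"),
    ("20850", "drives"),
]

def dominant_trait_by_location(rows):
    """Return {location: (most common trait, share of records with it)}."""
    counts = defaultdict(Counter)
    for location, trait in rows:
        counts[location][trait] += 1

    result = {}
    for location, counter in counts.items():
        trait, n = counter.most_common(1)[0]
        result[location] = (trait, n / sum(counter.values()))
    return result

profiles = dominant_trait_by_location(records)
for location, (trait, share) in sorted(profiles.items()):
    print(f"{location}: {trait} ({share:.0%} of records)")
```

In this toy data, a location whose dominant trait is public transit would be a candidate for transit-route billboards rather than radio advertising. The share value is a reminder of the caveat that a dominant segment rarely describes every resident.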

Micro-Population and Evolution

For the most part, the character of a neighborhood does not change significantly over time; rural communities tend to remain rural, while it is rare that an urban area will be transformed into farmland. Yet the denser the population, the more likely it is that behavioral variations will exist within a geographical area. The greater the variation, the greater the chance is that the overwhelming “signature” of the area will change more frequently or dramatically over time.

Even so, I suspect that the ways that micro-populations change are probably very similar; consider our hip young urbanites and the transformation of low-rent neighborhoods into trendy hot spots. Therefore, historical analysis of geographic evolution patterns can provide insight for predicting the next locations whose personality will change. Thinking about this will probably suggest numerous ways to exploit this knowledge, especially in terms of investment or marketing.
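One simple way to begin that kind of historical analysis is to compare the dominant "signature" of each location across two points in time and count which transitions recur. The sketch below assumes each snapshot has already been reduced to one label per location; the snapshot names, location keys and labels are hypothetical.

```python
from collections import Counter

def changed_signatures(earlier, later):
    """Return {location: (old, new)} for locations present in both
    snapshots whose dominant signature differs between them."""
    return {
        loc: (earlier[loc], later[loc])
        for loc in earlier.keys() & later.keys()
        if earlier[loc] != later[loc]
    }

# Hypothetical snapshots mapping a location to its dominant signature.
snapshot_earlier = {"A": "low_rent", "B": "rural", "C": "low_rent", "D": "family"}
snapshot_later = {"A": "trendy", "B": "rural", "C": "trendy", "D": "family", "E": "urban"}

shifts = changed_signatures(snapshot_earlier, snapshot_later)

# Count how often each (old, new) transition occurs across locations.
transition_counts = Counter(shifts.values())
for (old, new), n in transition_counts.most_common():
    print(f"{old} -> {new}: {n} location(s)")
```

Transitions that recur across many locations are the patterns worth studying as possible predictors. This sketch only counts them and makes no claim about what drives the change.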

Conveying the Message

Here is the point I want to make: the results of any analysis that is based on affinity to a geographic region should be easy to communicate visually using maps. To quote another old saying: “A picture is worth a thousand words.” This is especially true if that picture is a map with meaningful colors and insets. Converting analysis relevant to geographic spaces into a visual framework is what geographical information systems (GIS) are intended to do. In future articles, I hope to explore the value that can be derived from GIS to enhance a business intelligence program.

David Loshin
David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at (301) 754-6350.
