Dark data is digital information that is not being used. Consulting and market research company Gartner Inc. describes dark data as "information assets that an organization collects, processes and stores in the course of its regular business activity, but generally fails to use for other purposes."
Organizations often leave data dark for practical reasons. The data may be dirty, and by the time it can be scrubbed, it may be too old to be useful. Dirty records may contain incomplete or outdated values, be parsed incorrectly, or be stored in file formats or on devices that have become obsolete.
Increasingly, the term dark data is being associated with big data and operational data. Examples include server log files that could provide clues to website visitor behavior, customer call detail records that incorporate unstructured consumer sentiment data and mobile geolocation data that could reveal traffic patterns that would help with business planning.
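To make the server log example concrete, here is a minimal Python sketch of how stored access logs might be turned into a simple view of visitor behavior. It assumes the logs follow the Common Log Format. The sample lines, the `count_page_views` helper and the choice to count only successful GET requests are illustrative assumptions, not something the definition above prescribes.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident authuser [timestamp] "METHOD path protocol" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_page_views(lines):
    """Count successful GET requests per path.

    Returns (views, skipped), where skipped is the number of lines
    that did not match the assumed log format (the "dirty" records).
    """
    views = Counter()
    skipped = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            skipped += 1
            continue
        if match["method"] == "GET" and match["status"].startswith("2"):
            views[match["path"]] += 1
    return views, skipped


# Hypothetical sample data, using documentation-reserved IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:13:57:12 +0000] "GET /docs HTTP/1.1" 200 5120',
    '198.51.100.7 - - [10/Oct/2023:13:57:40 +0000] "GET /missing HTTP/1.1" 404 312',
    'corrupted entry with no recognizable fields',
]

views, skipped = count_page_views(SAMPLE_LOG)
for path, hits in views.most_common():
    print(path, hits)
print("unparseable lines:", skipped)
```

The count of unparseable lines is reported rather than silently dropped, since the share of dirty records is often what determines whether a dark data set is worth cleaning at all.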
Potentially, this type of dark data can be used to create new revenue sources, eliminate waste and reduce costs. As a result, many organizations that store dark data for regulatory compliance purposes are using Hadoop to identify the useful portions of it and map them to possible business uses.