What data lake governance challenges do organizations face?

Expert Anne Marie Smith explores challenges an organization may face when applying data governance policies to data lakes -- and the benefits of doing so.

Data governance applies policies, standards, practices and processes to manage data and enable effective use of high-quality data across an organization. If an organization wants to have high-quality data in its data lake and achieve high-quality results, it needs to engage in proper data lake governance.

Data lakes pose many challenges across all the disciplines of enterprise data management. Some data lake governance challenges could include:

1. Identification and maintenance of correct sources for data in the lake (system of record, business owner of data elements, obviously redundant data in the lake causing issues, etc.).

2. Metadata management issues (correct data definitions for data in the lake; conflicts between valid data definitions caused by issues in ownership or stewardship assignments; data standards applied or not applied to data before or while it is stored in the lake, causing issues for analytics; etc.).

3. Lack of coordination between a data lake governance program and data quality efforts, which can result in poor-quality data entering the data lake. This may lead to inaccurate results when the data is used for analysis and decisions, causing loss of confidence in the data lake and a general distrust of data across the organization.

4. Lack of coordination between data governance and data security, where standards and policies for protected data aren't applied properly, causing issues with access to sensitive data or data protected by regulations.

5. Conflict among business units or departments that use the same data lake. Different departments may have different business rules for similar data, applying differing policies or standards to data stored in the data lake and resulting in an inability to reconcile data differences for analytics.
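Some of the challenges above, such as missing source and ownership information, undefined data elements, unprotected sensitive data and poor-quality values, lend themselves to automated checks before data enters the lake. The following is a minimal Python sketch of such a check, not a prescribed implementation. The `DatasetMetadata` class, its fields and the `governance_issues` function are hypothetical names invented for this illustration; a real program would draw this information from the organization's own metadata catalog and policies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical governance metadata recorded for one data set."""
    name: str
    system_of_record: Optional[str] = None
    business_owner: Optional[str] = None
    # Maps each data element (column) to its agreed business definition.
    definitions: Dict[str, str] = field(default_factory=dict)
    contains_sensitive_data: bool = False
    access_policy: Optional[str] = None


def governance_issues(meta: DatasetMetadata,
                      rows: List[dict],
                      required_columns: List[str]) -> List[str]:
    """Return a list of governance problems found; empty means none found."""
    issues = []

    # Challenge 1: source and ownership must be identified.
    if not meta.system_of_record:
        issues.append("missing system of record")
    if not meta.business_owner:
        issues.append("missing business owner")

    # Challenge 2: every required data element needs a definition.
    for col in required_columns:
        if col not in meta.definitions:
            issues.append(f"no definition for column '{col}'")

    # Challenge 4: sensitive data needs an access policy.
    if meta.contains_sensitive_data and not meta.access_policy:
        issues.append("sensitive data without access policy")

    # Challenge 3: a basic quality rule (required values must be present).
    for i, row in enumerate(rows):
        for col in required_columns:
            if row.get(col) in (None, ""):
                issues.append(f"row {i}: missing value for '{col}'")

    return issues


if __name__ == "__main__":
    meta = DatasetMetadata(
        name="customers",
        system_of_record="CRM",
        definitions={"id": "Customer identifier"},
        contains_sensitive_data=True,
    )
    rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": ""}]
    for issue in governance_issues(meta, rows, ["id", "email"]):
        print(issue)
```

A check like this only flags problems. Deciding who resolves them, and reconciling conflicting definitions between departments, remains the work of the governance program itself.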

In conclusion, without data lake governance, organizations may find themselves with a data swamp or data quagmire.

