Data lakes have the potential to improve business outcomes with data-driven insights, but there's a catch: users need to understand what's in a data lake before they can benefit from it.
The goal of getting better data lake insights is the driver behind the Okera Spotlight tool, which the San Francisco-based data access and governance vendor unveiled as a preview technology Oct. 31. Okera Spotlight for Amazon Web Services (AWS) provides data classification capabilities, as well as data insights for data lakes that use the Amazon S3 cloud storage service, including those built with AWS Lake Formation, which recently became generally available.
The need for data lake management and governance is growing, Gartner analyst Sanjeev Mohan said. Data lakes provide a centralized structure for collecting data, giving users a single target for business analytics and other analysis.
Making sense of data lake contents
"Many organizations have focused their efforts on collecting data with the impression that once the data is available in one place, they will surely find all answers," Mohan said. "However, if the data is not tagged and identified properly then it is nearly impossible to analyze that data."
Inability to discover data makes it difficult to identify data quality and data privacy problems and then reduce or eliminate them. As a result, problems with finding data reduce trust in data, which in turn translates to lower trust in reports, dashboards and machine learning models.
"What separates data lakes from data swamps is data governance," Mohan said.
Gartner is seeing a major uptick in organizations deploying data lakes to support a growing number of use cases. Meanwhile, data privacy regulations, such as GDPR, are also proliferating.
"Hence a tool like Okera Spotlight becomes essential before data lakes can be operationalized," Mohan said. "Organizations need to understand what data has been collected, who has been granted access to it and whether its usage complies with proper consent."
Data lake insights
Okera was founded in 2016 with a focus on helping enterprises manage data lakes, said Nong Li, CEO and co-founder of the company. To date, the startup has gone to market with its Active Data Access platform, which focuses on governance and compliance.
Now, the new Okera Spotlight tool adds visibility and discoverability to help deliver actionable data lake insights.
Okera Spotlight works by pulling audit information about the data lake storage system, which in the case of AWS comes from CloudTrail. Okera then combines the audit information with its own detection and analysis capabilities, enriching the data with other metadata that exists in the data lake system.
With that analysis, Okera Spotlight creates reports with actionable data lake insights, Li said. For example, one business question Spotlight can help answer is which of the biggest data sets in a data lake are not being analyzed.
"We have a client that we're working with now where we suspect that they have several petabytes of data that are unused," Li said. "That's the kind of information that we're able to surface for them from a data lake optimization point of view."
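To make the unused-data question concrete, here is a minimal Python sketch of the general idea. It is not Okera's implementation. It joins a hypothetical inventory of data set sizes with hypothetical read events from an audit log and lists the largest data sets that have no recorded reads. The record shapes, field names, sample values and the `unused_datasets` function are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    prefix: str      # hypothetical storage prefix identifying the data set
    size_bytes: int  # total size of the objects under that prefix


@dataclass(frozen=True)
class AuditEvent:
    object_key: str  # key of the object that was read
    user: str        # identity that performed the read


def unused_datasets(datasets, events):
    """Return the data sets with no recorded read events, largest first."""
    accessed = set()
    for event in events:
        for ds in datasets:
            if event.object_key.startswith(ds.prefix):
                accessed.add(ds.prefix)
    unused = [ds for ds in datasets if ds.prefix not in accessed]
    return sorted(unused, key=lambda ds: ds.size_bytes, reverse=True)


# Made-up sample inventory and audit log, for illustration only.
datasets = [
    Dataset("sales/", 5_000),
    Dataset("logs/", 90_000),
    Dataset("archive/", 40_000),
]
events = [AuditEvent("sales/2019/q3.parquet", "analyst1")]

report = unused_datasets(datasets, events)
for ds in report:
    print(ds.prefix, ds.size_bytes)
```

In this made-up sample, only the sales data has been read, so the report lists the logs and archive data sets, with the larger one first.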
Data lake access
Spotlight also provides visibility into data lake access, which can be useful for data governance related to access to sensitive data. A common pattern for many users is to have one portion of a data lake strictly reserved for sensitive data and another area for nonsensitive data. With Spotlight it's possible for users to ensure that the different areas of a data lake are being accessed properly, tracking activity and providing insight into usage, Li said.
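The sensitive-versus-nonsensitive pattern described above can be checked in a similar way. The Python sketch below is only an illustration under stated assumptions, not how Spotlight works internally. It scans hypothetical audit events and flags reads of a sensitive area by users who are not on an allow list. The prefix, the user names and the `flag_improper_access` function are invented for the example.

```python
# Hypothetical area of the data lake reserved for sensitive data.
SENSITIVE_PREFIX = "sensitive/"
# Hypothetical allow list of users permitted to read that area.
AUTHORIZED_USERS = {"compliance_officer"}


def flag_improper_access(events, prefix=SENSITIVE_PREFIX, allowed=AUTHORIZED_USERS):
    """Return (user, object_key) pairs where a non-allowed user read sensitive data."""
    return [
        (user, key)
        for user, key in events
        if key.startswith(prefix) and user not in allowed
    ]


# Made-up audit events as (user, object_key) pairs.
events = [
    ("compliance_officer", "sensitive/customers.csv"),
    ("analyst1", "public/clicks.csv"),
    ("analyst1", "sensitive/customers.csv"),
]

flagged = flag_improper_access(events)
print(flagged)
```

Only the third event is flagged here, because it is the only read of the sensitive area by a user outside the allow list.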
The two core questions most users ask are how to reduce their storage and how to confirm that privacy and access control policies are working.
"We are trying to help administrators surface where they should focus their attention," Li said.
Okera Spotlight is now in beta. Okera said it expects to make it generally available by the end of the year.