After months in preview, Amazon Web Services made its managed cloud data lake service, AWS Lake Formation, generally available.
AWS first unveiled Lake Formation at its 2018 re:Invent conference, with the service officially becoming commercially available on Aug. 8. AWS Lake Formation is a managed service that enables users to build and manage cloud data lakes. A data lake is a form of data repository that stores large volumes of information in native formats.
While data lake technology has been available for nearly a decade, the market is still immature, said Mike Leone, senior analyst at Enterprise Strategy Group.
According to his data, just 22% of organizations currently use or plan to use data lakes. The most intriguing opportunity is reflected in his finding that 38% of organizations are evaluating and exploring how they can benefit from the right data lake technology, which is where AWS Lake Formation comes into play, Leone said.
"While many organizations run into various roadblocks in their data lake adoption, AWS Lake Formation looks to intelligently simplify, automate and secure the currently complex management and orchestration of data access and availability within a data lake," Leone said. "And that's before factoring in the fact that Lake Formation serves as a data foundation for organizations looking to grow their data and analytics strategies within AWS and their 60 or so database and analytics services."
Data lakes are often associated with unstructured data, but according to Leone, AWS Lake Formation is looking to add value beyond simply dumping unstructured data into Amazon's S3 cloud data storage service. Some 81% of organizations view the cloud as being important to their data analytics strategy, according to Leone's data.
"Organizations recognize the value of being able to quickly leverage data and analytics services to further their data-driven initiatives," he said. "But you need a data foundation to get there, especially one that is able to simplify the usage of a data lake. AWS Lake Formation fills that need."
Cloud data lake technology matures
Customers can use a number of different methods to handle large volumes of data, including data warehouses and data virtualization approaches. The data lake approach is a better option for a variety of reasons, said Rahul Pathak, general manager of databases, analytics and blockchain at AWS.
"Data lakes can handle the scale, agility and flexibility required to combine different types of data and analytics approaches to gain deeper insights, in ways that traditional data silos and data warehouses cannot," Pathak said. "AWS gives customers the widest array of analytics and machine learning services, for easy access to all relevant data, without compromising on security or governance. Data virtualization solutions don't provide the same combination of flexibility and control."
How AWS Lake Formation enables cloud data lakes
The AWS Lake Formation service builds on multiple existing AWS services, including Amazon S3 as the storage infrastructure layer. Pathak said that AWS Lake Formation manages data access for registered data that is stored in Amazon S3, and manages query access from AWS Glue, Athena, Redshift and (in beta) EMR with Apache Spark, through a unified security model and permissions.
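As a rough illustration of the unified permissions model Pathak describes, the sketch below builds a request that would grant a principal read access to one table. The helper name `build_table_grant`, the role ARN, and the database and table names are hypothetical and do not come from AWS or the article; the dictionary keys follow the parameters of the `grant_permissions` call in the boto3 Lake Formation client.

```python
def build_table_grant(principal_arn, database, table, permissions=("SELECT",)):
    """Build keyword arguments shaped for Lake Formation's GrantPermissions call."""
    return {
        # The IAM user or role that receives the permission.
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # The catalog table the permission applies to.
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


# Hypothetical analyst role, database and table, for illustration only.
grant = build_table_grant(
    "arn:aws:iam::123456789012:role/ExampleAnalyst",
    "example_sales_db",
    "orders",
)

# With AWS credentials configured, the request could be sent with boto3:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**grant)
```

The point of the sketch is that a grant is expressed once against the catalog table rather than separately for each query service that reads it.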
Also, AWS Lake Formation can ingest data from Amazon S3, Amazon RDS databases and AWS CloudTrail logs, understand their formats, and make the data clean and queryable. Lake Formation configures the flows, centralizes their orchestration, and enables users to monitor the execution of jobs.
Getting started with a cloud data lake is another challenge that AWS is looking to simplify with Lake Formation. Pathak said that customers can use one of the blueprints available in AWS Lake Formation to ingest data into their data lake. Blueprints are used to create AWS Glue workflows that crawl source tables, extract the data, and load it to Amazon S3.
"In Amazon S3, AWS Lake Formation organizes the data, sets up required partitions and formats the data for optimized performance and cost," Pathak said. "For data already in Amazon S3, a crawler can create the metadata describing this data, and register the Amazon S3 paths to have AWS Lake Formation manage them."
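For data that already sits in Amazon S3, registering a path is what places it under Lake Formation's management. The following is a minimal sketch of what such a registration request might look like; the helper `build_s3_registration` and the bucket and prefix names are hypothetical, while the `ResourceArn` and `UseServiceLinkedRole` keys follow the `register_resource` call in the boto3 Lake Formation client.

```python
def build_s3_registration(bucket, prefix=""):
    """Build keyword arguments shaped for Lake Formation's RegisterResource call."""
    cleaned = prefix.strip("/")
    path = f"{bucket}/{cleaned}" if cleaned else bucket
    return {
        # S3 locations are identified by ARN rather than by s3:// URL.
        "ResourceArn": f"arn:aws:s3:::{path}",
        # Assumption: let Lake Formation use its service-linked role for access.
        "UseServiceLinkedRole": True,
    }


# Hypothetical bucket and prefix, for illustration only.
registration = build_s3_registration("example-data-lake", "/raw/orders/")

# With AWS credentials configured, the request could be sent with boto3:
#   import boto3
#   boto3.client("lakeformation").register_resource(**registration)
```

The crawler that creates the metadata is a separate AWS Glue step and is not shown here.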
Now that AWS Lake Formation is generally available, Pathak said the plan is to add more integration with other analytics services like Amazon QuickSight and Amazon SageMaker, as well as to expand into more regions globally. AWS Lake Formation is currently available across regions in the United States, Europe and Asia-Pacific.