Definition

Apache Flink

Contributor(s): Jack Vaughan

Apache Flink is a distributed data processing platform for big data applications, primarily those that analyze data stored in Hadoop clusters. Flink combines in-memory and disk-based processing and handles both batch and stream processing jobs. Streaming is its default execution model; batch jobs run as a special case of streaming applications, in which the input stream is finite.
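
The idea of a batch job as a special case of a streaming job can be illustrated with a minimal sketch. It is written in plain Java and does not use Flink's API; the class name BoundedStreamSketch, the method countWords and the maxRecords cutoff are hypothetical names chosen for illustration only. The same per-record operator is fed first a finite input and then an endless one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Conceptual sketch only: plain Java, not Flink's API. All names are hypothetical.
final class BoundedStreamSketch {

    // A streaming-style operator: it consumes one record at a time and keeps
    // running counts. It never asks how many records exist, so it works the
    // same way for finite (batch) and endless (streaming) input.
    static Map<String, Integer> countWords(Iterator<String> records, int maxRecords) {
        Map<String, Integer> counts = new HashMap<>();
        int seen = 0;
        while (seen < maxRecords && records.hasNext()) {
            counts.merge(records.next(), 1, Integer::sum);
            seen++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // "Batch" input: a finite data set, read as a stream that happens to end.
        Iterator<String> bounded = Arrays.asList("a", "b", "a").iterator();
        System.out.println(countWords(bounded, Integer.MAX_VALUE));

        // "Streaming" input: an endless source. The cutoff of 5 records exists
        // only so that this demo terminates.
        Iterator<String> unbounded = new Iterator<String>() {
            private int i = 0;
            @Override public boolean hasNext() { return true; }
            @Override public String next() { return (i++ % 2 == 0) ? "x" : "y"; }
        };
        System.out.println(countWords(unbounded, 5));
    }
}
```

The operator itself is identical in both calls; only the source differs. A real engine such as Flink adds distribution, state management and fault tolerance on top of this basic idea.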

Flink was designed as an alternative to MapReduce, the batch-only processing engine that was paired with the Hadoop Distributed File System (HDFS) in Hadoop's initial incarnation. The Flink software is open source and adheres to The Apache Software Foundation's licensing provisions. Its development is primarily being driven by DataArtisans GmbH, a startup vendor based in Berlin.

Flink streaming applications are programmed via a DataStream API using either Java or Scala. These languages, as well as Python, can also be used to program against a complementary DataSet API for processing static data. Flink can run locally on a single Java virtual machine (JVM), in standalone cluster mode, on YARN-based Hadoop clusters or on cloud systems.

The core Flink runtime supports a pipelined streaming architecture; it also offers a built-in method to support iterative data processing for machine learning and other analytics applications. Dedicated APIs and libraries are provided for development of machine learning programs, as well as string handling, graph processing and other uses. Another API is focused on Hadoop application integration.
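
Iterative data processing, in general terms, means feeding the output of a step function back in as its next input until a fixed number of rounds has run or the result stops changing. The following is a minimal sketch of that loop in plain Java, not Flink's iteration API; the class IterationSketch, the method iterate and its parameters are hypothetical, and Newton's method for a square root stands in for a real analytics step.

```java
import java.util.function.DoubleUnaryOperator;

// Conceptual sketch only: plain Java, not Flink's API. All names are hypothetical.
final class IterationSketch {

    // Applies the step function repeatedly, feeding each result back in as the
    // next input. Stops after maxIterations rounds, or earlier once two
    // successive results differ by less than the tolerance.
    static double iterate(double initial, DoubleUnaryOperator step,
                          int maxIterations, double tolerance) {
        double current = initial;
        for (int i = 0; i < maxIterations; i++) {
            double next = step.applyAsDouble(current);
            if (Math.abs(next - current) < tolerance) {
                return next; // converged
            }
            current = next;
        }
        return current; // iteration budget used up
    }

    public static void main(String[] args) {
        // Stand-in step function: one Newton update toward the square root of 2.
        double root = iterate(1.0, x -> (x + 2.0 / x) / 2.0, 50, 1e-12);
        System.out.println(root);
    }
}
```

Many machine learning algorithms have this shape: a model is refined round by round until it converges. Supporting the loop inside the runtime avoids resubmitting a separate job for each round.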

Flink arose as an offshoot of Stratosphere, a project begun in 2009 at three universities in Germany: TU Berlin, Humboldt University of Berlin and the Hasso Plattner Institute. The Flink technology subsequently became an Apache incubator project in April 2014 and a top-level project late that year; after nine earlier releases, Apache Flink 1.0.0 was released in March 2016. With that, Flink officially joined other Hadoop ecosystem frameworks such as Spark, Storm and Samza in the competition to provide big data streaming capabilities.

This was last updated in May 2016
