Finally, some peace and tranquility

Apache Flink 1.5.0 is here: Smooth streaming and batch data processing

big data
© Shutterstock / iurii  

The new Flink stable release is here and makes streaming and batch data processing super smooth. The 1.5.0 release comes stuffed with a number of new features and major improvements.

For the past 5 months, the Apache Flink community has been working really hard to tackle more than 780 issues. But the new stable release of Apache Flink 1.5 is here and it’s full of handy new features and major improvements.

The Flink community does not see stream processing as just a faster analytics and a more principled way of building fast continuous data pipelines, but rather a paradigm to build data-driven and data-intensive applications, bringing together data processing logic and application/business logic.

Without further ado, let’s take a closer look at the release notes.

The evolution of streaming

The team behind Flink aims for the platform to feel natural to users who do data engineering/data processing, as well as users who build data/event-driven applications and, of course,  those who combine both aspects inside their applications.

Reworked deployment and process model (FLIP-6) – Additional support for dynamic resource allocation and dynamic release of resources on YARN and Mesos schedulers for better resource utilization, failure recovery, and also dynamic scaling. What’s more, this feature builds the foundation for future improvements of Flink’s integration with Kubernetes.

SEE ALSO: Kubernetes for data science and machine learning applications: Benefits

Support for a broadcast state (FLINK-4940) – The processing of the regular stream is configured by the messages of the control stream. By broadcasting rules or patterns to all parallel instances of a function, they can be applied to all events of the regular stream.

Improved network stack (FLINK-7315) – In the context of stream processing, two performance metrics are important: latency and throughput. Flink 1.5 features improvements in network stack, credit-based flow control and improving the transfer latency. Credit-based flow control reduces the amount of data “on the wire” to a minimum while preserving a high throughput.

Task-local state recovery (FLINK-8360) – Task-local state recovery leverages the fact that a job typically fails due to a single crashed operator, TaskManager, or machine. When writing the state of operators to the remote storage, Flink can now also keep a copy on the local disk of each machine. In case of failover, the scheduler tries to reschedule tasks to their previous machine and load the state from the local disk instead of the remote storage, resulting in faster recovery.

Extended join support for SQL and table API (FLIP-24) – Flink 1.5 adds support for windowed outer equi-joins. For cases where two streaming tables should not be joined within a bounded time interval, Flink SQL also now supports non-windowed inner joins. This enables full-history matching, which is common in many standard SQL statements.

But that is not all. Flink 1.5 features a number of other new features and improvements that are not highlighted here. For the extended list of the various other features and improvements, make sure to check out the official release notes.

Eirini-Eleni Papadopoulou
Eirini-Eleni Papadopoulou was the editor for Coming from an academic background in East Asian Studies, she decided that it was time to go back to her high-school hobby that was computer science and she dived into the development world. Other hobbies include esports and League of Legends, although she never managed to escape elo hell (yet), and she is a guest writer/analyst for competitive LoL at TGH.

Inline Feedbacks
View all comments