Hazelcast Jet 0.4: More functional and faster than ever
The folks over at Hazelcast are at it again. Hazelcast, the company behind the open source in-memory data grid of the same name, has just announced the 0.4 update for Hazelcast Jet, an application-embeddable, distributed processing engine for big data stream and batch processing.
Hazelcast is all about its in-memory data grid. The release of Hazelcast Jet, its open source distributed data processing engine, was widely anticipated. Now the latest update, Hazelcast Jet 0.4, brings major new functionality and speed.
“The Jet project is progressing faster than we could have hoped,” said Greg Luck, CEO of Hazelcast. “The new functionality in 0.4 brings stream processing for the first time. As with batch, we are achieving a new performance level, giving us a real edge over alternative market solutions.”
What’s new about Hazelcast Jet 0.4?
A quick look at the specs shows some dramatic changes, including:
- Improved streaming support including windowing support with event-time semantics.
- Out of the box support for tumbling, sliding and session window aggregations.
- AggregateOperation abstraction with several built-in ones including count, average, sum, min, max and linear regression.
- Hazelcast version updated to 3.8.2 and Hazelcast IMDG is now shaded inside
- Several built-in diagnostic processors and unit test support for writing custom processors.
- Many new code samples including several streaming examples and enrichment and co-group for batch operations.
- New sources and sinks including ICache, socket and file.
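To give a feel for what an aggregate-operation abstraction buys you, here is a minimal plain-Java sketch of an "average" aggregation split into accumulate, combine and finish steps. It does not use the Hazelcast Jet API: the class and method names (`AverageSketch`, `Acc`, `accumulateAll`) are hypothetical and chosen only for illustration. The assumption is that partial results computed independently can be merged later.

```java
import java.util.Arrays;
import java.util.List;

public class AverageSketch {

    /** Hypothetical accumulator: keeps a running sum and count. */
    static final class Acc {
        long sum;
        long count;

        /** Fold one input value into this accumulator. */
        void accumulate(long value) {
            sum += value;
            count++;
        }

        /** Merge another partial result into this one. */
        void combine(Acc other) {
            sum += other.sum;
            count += other.count;
        }

        /** Turn the accumulated state into the final average (0.0 if empty). */
        double finish() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    /** Accumulate a batch of values into a fresh accumulator. */
    static Acc accumulateAll(List<Long> values) {
        Acc acc = new Acc();
        for (long v : values) {
            acc.accumulate(v);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Two partial results, e.g. computed on different nodes or time slices
        Acc first = accumulateAll(Arrays.asList(1L, 2L, 3L));
        Acc second = accumulateAll(Arrays.asList(5L));
        first.combine(second);
        System.out.println(first.finish()); // prints 2.75
    }
}
```

Keeping the sum and count separate until the final step is what makes the partial results mergeable; averaging two averages directly would give the wrong answer.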
The newest update includes event-time processing with tumbling, sliding and session windowing. Users now benefit from a feature-rich stream processing architecture, which provides a flexible mechanism to build and evaluate windows over continuous data streams.
Companies have begun to use stream processing over batch processing for big data sets that require immediate analysis. The data is partitioned and then each data element in the stream is associated with a timestamp. That makes it possible to assign elements to windows during processing.
In Hazelcast Jet 0.4 this is done via event-time processing. Event time is a logical, data-dependent timestamp embedded in the event itself. While this is useful, there is a downside: events may arrive out of order or late, making it difficult to be certain that you have seen all events in a given time window.
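To make the out-of-order problem concrete, the following plain-Java sketch shows one common way of deciding when an event is "too late": keep a watermark that trails the highest timestamp seen so far by a fixed lag, and reject anything older than the watermark. This is an illustrative assumption, not a description of Jet's internals; the class `LatenessCheck` and its `allowedLag` parameter are hypothetical.

```java
public class LatenessCheck {

    private final long allowedLag;
    // Events with a timestamp below the watermark are considered late
    private long watermark = Long.MIN_VALUE;

    LatenessCheck(long allowedLag) {
        this.allowedLag = allowedLag;
    }

    /** Returns true if the event is on time, false if it arrived too late. */
    boolean accept(long eventTimestamp) {
        if (eventTimestamp < watermark) {
            return false;
        }
        // The watermark only moves forward, trailing the newest event by allowedLag
        watermark = Math.max(watermark, eventTimestamp - allowedLag);
        return true;
    }

    long watermark() {
        return watermark;
    }

    public static void main(String[] args) {
        LatenessCheck check = new LatenessCheck(100);
        System.out.println(check.accept(1000)); // true, watermark becomes 900
        System.out.println(check.accept(950));  // true, out of order but within the lag
        System.out.println(check.accept(1200)); // true, watermark becomes 1100
        System.out.println(check.accept(1000)); // false, older than the watermark
    }
}
```

A larger lag tolerates more disorder but delays the point at which a window can be considered complete.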
The folks behind Hazelcast Jet have come up with a solution. In the 0.4 release, they have included windowing functionality. This enables users to evaluate stream processing jobs at regular time intervals, regardless of how many incoming messages the job is processing.
Hazelcast Jet offers three types of windows:
- Fixed/tumbling – time is partitioned into same-length, non-overlapping chunks. Each event belongs to exactly one window.
- Session – windows have various sizes and are defined based on the data, which should carry some session identifiers.
- Sliding – windows have a fixed length, and consecutive windows start a fixed time interval (step) apart. The step can be smaller than the window length, in which case windows overlap. Typically the window length is a multiple of the step.
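The difference between tumbling and sliding windows is easiest to see in terms of which windows a single event falls into. The sketch below is plain Java, not the Hazelcast Jet API; the class `WindowAssignment` and its methods are hypothetical, and it assumes windows are half-open intervals [start, start + length) aligned to multiples of the window length (tumbling) or the step (sliding).

```java
import java.util.ArrayList;
import java.util.List;

public class WindowAssignment {

    /** Start of the single tumbling window that contains the timestamp. */
    static long tumblingWindowStart(long timestamp, long windowLength) {
        return Math.floorDiv(timestamp, windowLength) * windowLength;
    }

    /**
     * Starts of all sliding windows [start, start + windowLength) that
     * contain the timestamp, newest first. Window starts are multiples of step.
     */
    static List<Long> slidingWindowStarts(long timestamp, long windowLength, long step) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the event
        long lastStart = Math.floorDiv(timestamp, step) * step;
        // Walk backwards while the window still covers the timestamp
        for (long start = lastStart; start > timestamp - windowLength; start -= step) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Tumbling: an event at t=1250 with 1000 ms windows belongs to [1000, 2000) only
        System.out.println(tumblingWindowStart(1250, 1000)); // prints 1000

        // Sliding: 1000 ms windows every 500 ms, so the event is in [1000, 2000) and [500, 1500)
        System.out.println(slidingWindowStarts(1250, 1000, 500)); // prints [1000, 500]
    }
}
```

With a step equal to the window length, the sliding case reduces to the tumbling one: each event lands in exactly one window.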
If you’re interested, there’s more here about the types of windows supported by Jet.
Need for speed
Some of the big news about this update is the dramatic increase in speed. According to the latest benchmark test, Hazelcast Jet 0.4 outperformed its competitors with a 40 ms average latency for stream processing computations. Flink and Spark's execution latencies were hundreds of milliseconds, rising to seconds at the higher message throughputs.
The study compares the average latencies of Hazelcast Jet, Flink and Spark Streaming under various criteria such as message rate and window size. The full benchmark is available here.
New Sources and Sinks
The 0.4 release also adds several new connectors:
- Hazelcast ICache can be used as a source or sink, and it can also be used as a source for distributed java.util.stream computations.
- Socket readers and writers can read and write to simple text-based sockets. An example can be found inside the Hazelcast Jet Code Samples Repository:
Vertex source = dag.newVertex("source", Sources.streamSocket(HOST, PORT));
- Batch and streaming file readers and writers can be used for either reading static files or watching a directory for changes:
Vertex streamFiles = dag.newVertex("stream-files", Sources.streamFiles(DIRECTORY));
Vertex readFiles = dag.newVertex("read-files", Sources.readFiles(DIRECTORY));