Let it flow

Getting started with Java 8 Streams

Sven Ruppert

Puzzling out streams in Java 8? This tutorial will get you off on the right foot.

With the launch of Java 8 comes the Streams API. But what are the
advantages of this addition for developers? How is it used?
In this article (originally published in JAX Magazine), we'll walk
you through the API step by step.

What are these Streams again?

 

At some point in Java, everybody has been
confronted with streams of some kind. But what exactly constitutes
a stream in JDK8?

  • Streams are not data structures, which means
    that they do not store data. They are better regarded as a
    pipeline through which the data flows and in which different
    transformations are applied to it. These transformations are
    not performed on the data of the source structure itself.
    Underlying data structures such as arrays or lists are therefore
    not changed. A stream wraps the data structure, pulls the source
    elements from it and hands the result of each step on to the
    next one.

  • Streams have been designed for use with
    lambdas. In practice there are no streams without lambdas. This
    doesn't pose a problem, since streams and lambdas are both
    part of JDK 8.

  • Streams do not offer random access to the source
    data via an index or the like. Access to the first element is
    possible, but not to an arbitrary element further down the stream.

  • Streams offer good support for delivering their
    results as, for example, an array or a list.

  • Streams are lazy. This means that
    elements are not fetched until an operation actually needs
    them. Assuming that the data source consists of 1000
    elements, the first access takes not 1000 time units but
    one (provided that every element costs the same to
    access).

  • Streams are parallel if requested. Streams
    can basically be divided into two main groups: the serial and the
    parallel implementations. As long as the operations are stateless
    and free of side effects, no hand-written multithreaded code is
    necessary to use the cores idling in the system.

  • Streams are unbounded since they are not filled up front
    like collections. Hence streams can also be infinite. One can
    define generator functions that keep delivering
    source data. This source data is generated while the client is
    consuming elements of the stream.
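
Two of the properties listed above, laziness and the untouched source, can be made visible with a small experiment. The following is only a sketch: the class name LazyStreamSketch and the helper firstTwoEven are made up for this illustration, and peek() is used purely to record which elements are actually pulled from the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LazyStreamSketch {

    // Returns the first two even numbers and records in 'visited'
    // every element that was actually pulled from the source.
    static List<Integer> firstTwoEven(List<Integer> source, List<Integer> visited) {
        return source.stream()
                .peek(visited::add)          // side effect for demonstration only
                .filter(v -> v % 2 == 0)
                .limit(2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        final List<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        final List<Integer> visited = new ArrayList<>();

        System.out.println("result  = " + firstTwoEven(source, visited));
        // Fewer elements than the source holds: the rest was never requested.
        System.out.println("visited = " + visited);
        // The source list itself is unchanged.
        System.out.println("source  = " + source);
    }
}
```

The pipeline stops pulling elements once limit(2) is satisfied, so the tail of the list is never looked at.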

Where does all the source data come from?

Given that streams do not
keep their own data the way collections do, the question arises:
where does the data come from? The most common way to create
a stream is to use one of the following methods, each of which
builds a stream from a fixed number of elements:

Stream.of(val1, val2, val3…), Stream.of(array) and list.stream().

The same family covers streams created from a string, since a
string is nothing but a finite sequence of chars.

final Stream<String> splitOf = Stream.of("A,B,C".split(","));

Streams can equally be generated from other streams.
This will be demonstrated more precisely in the next article. Two
further ways to create streams are still missing. The first
one is to create a stream programmatically with a builder. The
other, and last, possibility is to use a generator. This is done with
the method Stream.generate(..), which takes an instance of the
functional interface Supplier<T> as its argument.

Listing 1

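final Stream<Pair> stream
    = Stream.<Pair>builder().add(new Pair()).build();

// Infinite stream: the Supplier is called whenever an element is requested
final Random random = new Random();
final Stream<Pair> generated = Stream.generate(() -> {
    final Pair p = new Pair();
    p.id = random.nextInt(100);
    p.value = "Value + " + p.id;
    return p;
});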
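
As a self-contained variant of the builder and the generator, the following sketch avoids the Pair class and uses plain strings and integers instead. The class GenerateSketch and its method names are assumptions made for this example only. The generator is endless, so the stream is cut off with limit(n) before it is collected.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GenerateSketch {

    // Builds a finite stream element by element with a builder.
    static List<String> fromBuilder() {
        return Stream.<String>builder().add("A").add("B").add("C").build()
                .collect(Collectors.toList());
    }

    // Takes the first n elements of an endless generator of square numbers.
    static List<Integer> firstSquares(int n) {
        final AtomicInteger counter = new AtomicInteger();
        return Stream.generate(() -> {
                    final int i = counter.incrementAndGet();
                    return i * i;
                })
                .limit(n)   // without a limit, collect would never finish
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromBuilder());
        System.out.println(firstSquares(4));
    }
}
```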

Where is all the data going?

Now that we know where the data is coming from,
the question arises: how can we retrieve the data from the stream?
After all, the idea usually is to continue working with it. The
easiest way is to generate an array with the method
stream.toArray() or a list by use of
stream.collect(Collectors.toList()).

This covers the large majority of use cases. However, sets and maps
can also be generated: sets with the method
stream.collect(Collectors.toSet()), maps by use of
stream.collect(Collectors.groupingBy(..)). The
argument of groupingBy() is at
least one function that assigns each element to a group.
The value it returns is the key in the map; the value in the map is a
list of the stream elements that share this key. One possibility that
might seem a little unusual for some developers is to output the
stream as a string. To achieve this, we map the elements to strings
and pass Collectors.joining(delimiter) to the method collect. The
result is a single string made up of the toString()
representations of all elements, concatenated
with this delimiter.

Listing 2

public static void main(String[] args) {
    final List<Pair> generateDemoValues 
        = generateDemoValues();
    //Stream from Values
    final Stream<Pair> fromValues 
        = Stream.of(new Pair(), new Pair());
    //Stream from Array
    final Pair[] pairs = {new Pair(), new Pair()};
    final Stream<Pair> fromArray = Stream.of(pairs);
    //Stream from List
    final Stream<Pair> fromList 
        = generateDemoValues.stream();
    //Stream from String
    final Stream<String> abc = Stream.of("ABC");
    final Stream<IntStream> of = Stream.of("ABC".chars());
    final Stream<String> splitOf 
        = Stream.of("A,B,C".split(","));
    //Stream from builder
    final Stream<Pair> builderPairStream =
        Stream.<Pair>builder().add(new Pair()).build();
    //Stream to Array
    final Pair[] toArray =
        generateDemoValues.stream().toArray(Pair[]::new);
    //Stream to List
    final List<Pair> toList =
        generateDemoValues.stream()
        .collect(Collectors.toList());
    //Stream to Set
    final Set<Pair> toSet =
        generateDemoValues.stream()
        .collect(Collectors.toSet());
    //Stream to Map
    final Map<Integer,List<Pair>> collectedToMap =
        generateDemoValues.stream()
        .collect(Collectors.groupingBy(Pair::getId));
    System.out.println("collectedToMap.size() = " 
        + collectedToMap.size());
    for (final Map.Entry<Integer, List<Pair>> entry :
        collectedToMap.entrySet()) {
        System.out.println("entry = " + entry);
    }
}
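
The listing does not show the conversion into a string. A minimal sketch, using Collectors.joining from the released Java 8 API together with a small groupingBy example, might look as follows. The class CollectSketch and both helper methods are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CollectSketch {

    // Concatenates the string form of every element, separated by the delimiter.
    static String joinAll(List<Integer> values, String delimiter) {
        return values.stream()
                .map(String::valueOf)    // joining() works on character sequences
                .collect(Collectors.joining(delimiter));
    }

    // Groups words by length: the length is the key,
    // the list of words with that length is the value.
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        System.out.println(joinAll(Arrays.asList(1, 2, 3), ", "));  // 1, 2, 3
        System.out.println(byLength(Arrays.asList("a", "bb", "cc", "ddd")));
    }
}
```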

Core methods

Now that we've discussed how the data is fed
into the streams and how it is retrieved, we will deal with the
data transformation. Among others, there are three
basic methods, forEach, match and find, with
which one can quickly and easily make the first
attempts.

ForEach – a lambda for each element

The method
forEach(<lambda>) does
exactly what one expects. It applies the lambda that has been
passed as argument to every single element of the stream. This
method can also be found on Iterable, List, Map and some other
classes and interfaces, which fortunately leads to shorter
code.

When using forEach(<lambda>) one has to consider the
following: the element is consumed by the method accept of the
Consumer. This means that forEach(<lambda>) can be applied only
once to a stream. In this context one also speaks of a terminal
operation. If you need to apply more than one operation to the
element, this can be done within the lambda that is passed.

The argument of the
forEach(<lambda>) method, however, can be
reused by holding on to the instance and applying it to several
streams.

Likewise, reassigning local variables of the surrounding scope from
within the lambda is not permitted; they have to be effectively
final. How results can be produced instead we will see in the
context of the methods map and reduce. The greatest difference to a
for loop, though, is that forEach cannot be terminated ahead of time:
break is not available, and return only ends the processing of the
current element.

Listing 3

final List<Pair> generateDemoValues = 
    new PairListGenerator(){}.generateDemoValues();
        
//pre JDK8
for (final Pair generateDemoValue : generateDemoValues) {
    System.out.println(generateDemoValue);
}
//long version
generateDemoValues.stream()
    .forEach(v -> { System.out.println(v); });
//short version - serial
generateDemoValues.stream()
    .forEach(System.out::println);
//short version - parallel
generateDemoValues.parallelStream()
    .forEach(System.out::println);
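
Two of the points made about forEach can be checked with a short sketch: a Consumer instance may be reused across several streams, while a stream itself can be traversed only once. The class ForEachSketch and its method names are made up for this example. The list that collects the elements is a demonstration aid; its reference stays effectively final, only its content changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

class ForEachSketch {

    // Applies one and the same Consumer instance to two different streams.
    static List<String> collectFromBoth(List<String> first, List<String> second) {
        final List<String> seen = new ArrayList<>();
        final Consumer<String> recorder = seen::add;
        first.stream().forEach(recorder);
        second.stream().forEach(recorder);
        return seen;
    }

    // A stream is consumed by its terminal operation; a second traversal fails.
    static boolean secondTraversalFails() {
        final Stream<String> stream = Stream.of("a", "b");
        stream.forEach(s -> { });
        try {
            stream.forEach(s -> { });
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(collectFromBoth(Arrays.asList("a", "b"), Arrays.asList("c")));
        System.out.println("second traversal fails: " + secondTraversalFails());
    }
}
```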

Map – How about transformations?

The method
map(<lambda>) generates a new stream
consisting of the transformed elements of the
source stream. Here again the argument is a lambda. This means that,
apart from the functional coupling, the target stream does not have
to have anything in common with the source stream. The method can
be applied as many times as required since the result is
a new stream every time.

Listing 4

final List<String> stringList = generateDemoValues.stream()
    .map(v -> {
        final String value = v.getValue();
        final DemoElement d = new DemoElement();
        d.setDatum(new Date());
        d.setValue(Base64.getEncoder()
        .encodeToString(value.getBytes()));
        return d;
    })
    .map(DemoElement::getValue)
    .collect(Collectors.toList());
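
Since Listing 4 depends on the Pair and DemoElement classes, here is a reduced sketch of the same idea that runs on its own. It chains two map calls, the second of which changes the element type of the stream. The class MapSketch and the method trimmedLengths are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapSketch {

    // Each map call yields a new stream; the element type may change on the way.
    static List<Integer> trimmedLengths(List<String> words) {
        return words.stream()
                .map(String::trim)      // Stream<String> -> Stream<String>
                .map(String::length)    // Stream<String> -> Stream<Integer>
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(trimmedLengths(Arrays.asList(" a ", "bcd", "  ef")));  // [1, 3, 2]
    }
}
```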

Filter – What method should it be?

Just like the method
map(<lambda>), the method
filter(<lambda>) also generates a new
stream. From the set of source elements, only those that satisfy
the predicate are passed on to the next
steps. The method filter(<lambda>) can be applied several
times in sequence, with every call filtering the set
further. The method can also be
combined freely with others, e.g. map -> filter -> map ->
filter -> filter.

Listing 5

 final Stream<Pair> filteredPairStream =
        generateDemoValues.stream()
            .filter(v -> v.getId() % 2 == 0);
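
A combination such as map -> filter -> map -> filter could look like the following sketch. The class FilterSketch, the method name and the sample task (parse numbers, keep the positive ones, square them, keep the small squares) are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterSketch {

    // map -> filter -> map -> filter in one pipeline.
    static List<Integer> smallSquaresOfPositives(List<String> raw, int upperBound) {
        return raw.stream()
                .map(Integer::parseInt)          // String -> Integer
                .filter(v -> v > 0)              // keep positive numbers
                .map(v -> v * v)                 // square them
                .filter(v -> v < upperBound)     // keep squares below the bound
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(
                smallSquaresOfPositives(Arrays.asList("-3", "2", "5", "10"), 50));  // [4, 25]
    }
}
```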

Sometimes there is a set of elements where the
order is not defined and the quantity is indefinite, but from which
exactly one element with certain characteristics is to be extracted.
Queries that, thanks to SQL, rarely pose a problem on the database
side can bloat the source code on the imperative
side.

The method findFirst()
provides the first element of the stream. A trivial method
at first sight, it turns out to be a delight on the second. The
return value is an Optional; for an empty stream it is an empty
Optional.

With findFirst(), the first hit in the stream is returned as an
Optional. For a stream with a defined encounter order, such as the
stream of a list, this is the first element in that order, even if
the stream is processed in parallel. If any hit will do, findAny()
is the less constrained alternative: on a parallel stream it may
return any element that matches the criterion, whichever is found
first.

The method findFirst()
belongs to the "terminal" methods. This means that after the
invocation of findFirst() no further
stream operations can be performed; the stream is terminated.
With findFirst() quite
complex patterns can be expressed to obtain specific objects from the
stream. Since streams are basically pipelines, only as
many elements are produced as are necessary to find this one
element. Compared with the conventional
notation, expressions via streams are usually much more compact.
findFirst() is suitable when
a declarative, set-based description of the individual element
cannot be applied and an imperative approach would otherwise be
necessary.

Listing 6

final List<String> demoValues =
    Arrays.asList("AB","AAB","AAAB","AAAAB","AAAAAB");
final String value = demoValues
    .stream()
    .filter(o -> o.contains("AAA"))
    .findFirst().orElse("noop ");
System.out.println("value = " + value);
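
How findFirst() behaves on a parallel stream, and how findAny() differs, can be sketched as follows. The class FindSketch and its method names are invented for this example. For a list, which has a defined encounter order, findFirst() yields the same element in serial and in parallel mode, whereas findAny() makes no such promise.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FindSketch {

    // First match in encounter order, serial processing.
    static Optional<String> firstContaining(List<String> values, String part) {
        return values.stream().filter(v -> v.contains(part)).findFirst();
    }

    // Still the first match in encounter order, although processed in parallel.
    static Optional<String> firstContainingParallel(List<String> values, String part) {
        return values.parallelStream().filter(v -> v.contains(part)).findFirst();
    }

    // Any match: in parallel mode this may be a different element on each run.
    static Optional<String> anyContaining(List<String> values, String part) {
        return values.parallelStream().filter(v -> v.contains(part)).findAny();
    }

    public static void main(String[] args) {
        final List<String> values = Arrays.asList("AB", "AAB", "AAAB", "AAAAB", "AAAAAB");
        System.out.println(firstContaining(values, "AAA"));          // Optional[AAAB]
        System.out.println(firstContainingParallel(values, "AAA"));  // Optional[AAAB]
        System.out.println(anyContaining(values, "AAA"));            // some element containing AAA
        System.out.println(firstContaining(values, "X"));            // Optional.empty
    }
}
```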

Reduce – Bring it down to a common denominator

None of the methods that we have looked at so far was
able to include, for example, the element at position n-1 when
processing the element n. How, then, can we generate values that
build on each other? As an example, say that each value n is to be
appended to what has been built from the values up to n-1. The
input values are the elements of the sequence "A,B,C,D,E". These
elements are to be merged. The method
reduce((v1,v2)->)
receives a lambda with two parameters, v1 and v2: v1 is the result
accumulated up to element n-1, v2 is the element n from the stream.
The output must be one element, based on the two input elements.

The method reduce enables us to merge the values
from the source stream into a single result. Here it is
important to consider the differences between serial and
parallel processing. The results can differ depending on the
particular reduction function, namely when it is not associative.
With simple things such as finding a maximum, no such side effects
occur. However, even with supposedly trivial transformations you
should test whether the result is still the desired one. The API
already includes many basic functions that spare you the
development of basic utilities.

Listing 7

 

final List<String> demoValues
    = Arrays.asList("A", "B", "C", "D", "E");
System.out.println(demoValues.stream()
    .reduce(String::concat)); //Optional[ABCDE]
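
Besides the one-argument form shown in Listing 7, reduce can also take a start value (identity), in which case it returns the result directly instead of an Optional. The sketch below uses only associative functions (sum, maximum, string concatenation), for which serial and parallel processing give the same result. A non-associative function such as subtraction would not offer that guarantee. The class ReduceSketch and its method names are assumptions for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class ReduceSketch {

    // With an identity value the result is never absent.
    static int sum(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // Without an identity the result is an Optional (empty for an empty stream).
    static Optional<Integer> max(List<Integer> values) {
        return values.stream().reduce(Integer::max);
    }

    // String concatenation is associative, so parallel processing is safe here.
    static String concatParallel(List<String> values) {
        return values.parallelStream().reduce("", String::concat);
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4)));                       // 10
        System.out.println(max(Arrays.asList(1, 2, 3, 4)));                       // Optional[4]
        System.out.println(concatParallel(Arrays.asList("A", "B", "C", "D", "E")));  // ABCDE
    }
}
```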

Limit / Skip – Please not everything

Streams can be indefinitely long. In the extreme case a stream has
no end. Therefore, it can sometimes be useful to process a stream
only up to a certain length, or to collect only a certain number of
results, since the rest cannot be used by the following logic. The
method limit(count) is designed exactly for this
case.

The method skip(count)
works a little differently. Here, too, the stream is restricted,
but at its beginning rather than its end: the counter
indicates how many elements are skipped without being processed,
while the end stays open. The method skip(count) can also occur
several times and in several places of the entire construct.

Listing 8

final List<Integer> demoValues
    = Arrays.asList(1,2,3,4,5,6,7,8,9,10);
//limit the input -> [1, 2, 3, 4]
System.out.println(demoValues
    .stream().limit(4)
    .collect(Collectors.toList()));
//jumping over the first 4 elements -> [5, 6, 7, 8, 9, 10]
System.out.println(demoValues
    .stream().skip(4)
    .collect(Collectors.toList()));
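
Combined, skip and limit cut a window out of a stream, which also works when the stream is infinite. The following sketch pages through the endless sequence 1, 2, 3, … produced by Stream.iterate. The class LimitSkipSketch and the method page are hypothetical names for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LimitSkipSketch {

    // Returns one "page" of the infinite sequence 1, 2, 3, ...
    static List<Integer> page(int pageIndex, int pageSize) {
        return Stream.iterate(1, n -> n + 1)
                .skip((long) pageIndex * pageSize)  // jump over the preceding pages
                .limit(pageSize)                    // take exactly one page
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(page(0, 3));  // [1, 2, 3]
        System.out.println(page(2, 3));  // [7, 8, 9]
    }
}
```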

Distinct – All at once

From SQL one knows the command
distinct, which reduces a set of values so that every value occurs
only once, i.e. it generates a set of unique values. The
method distinct() does exactly the
same. The implementation itself, in the class
DistinctOps, works on a
ConcurrentHashMap because this operation has also
been developed for parallel streams. The distinct set is then
the key set of the
map. The determining factor is the
hashCode() and equals()
implementation of the elements that are to be transferred
into the unique set. This is where you can influence the behavior
and the performance of the distinct operation.

Listing 9

 final Random random = new Random();
    System.out.println(
        Stream.generate(() -> random.nextInt(100))
            .limit(40)
            .distinct()
            .collect(Collectors.toList())
    );
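
How equals() and hashCode() steer the result of distinct() can be sketched with a small value class. The class Item below is hypothetical and exists only for this example: two items count as equal when their ids match, regardless of the label, so distinct() keeps only the first item per id.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctSketch {

    // Hypothetical value class: equality is defined by the id alone.
    static final class Item {
        final int id;
        final String label;

        Item(int id, String label) {
            this.id = id;
            this.label = label;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Returns the labels of the items that survive distinct().
    static List<String> demo() {
        final List<Item> items = Arrays.asList(
                new Item(1, "a"), new Item(2, "b"), new Item(1, "c"));
        return items.stream()
                .distinct()              // Item(1, "c") equals Item(1, "a") and is dropped
                .map(item -> item.label)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo());  // [a, b]
    }
}
```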

Min / Max – Very small, very large

The methods
min(<Comparator>) and
max(<Comparator>) return the minimum
and the maximum, respectively, of the values in the stream. This
value is determined by use of a Comparator, which means that all
elements have to be visited. Thus it cannot be performed on
infinite streams. The definition of the Comparator
also allows different interpretations of what a minimum and what
a maximum is. At the same time, the implementation of the Comparator
is one of the defining factors for performance because it is
applied to all elements. In any case it is faster than sorting the
elements and then calling findFirst(),
because the complexity of min/max is O(n) and the complexity
of sorting is O(n log n).

Listing 10

 System.out.println(demoValues
        .stream().max(Integer::compareTo));
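
That the Comparator decides what counts as minimum and maximum can be sketched with strings compared by length instead of alphabetically. The class MinMaxSketch and its method names are assumptions for this example. The last method shows the slower alternative via sorting mentioned above, purely for comparison.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class MinMaxSketch {

    private static final Comparator<String> BY_LENGTH =
            Comparator.comparingInt(String::length);

    // The "minimum" is the shortest word, not the alphabetically first one.
    static Optional<String> shortest(List<String> words) {
        return words.stream().min(BY_LENGTH);
    }

    static Optional<String> longest(List<String> words) {
        return words.stream().max(BY_LENGTH);
    }

    // Same result as shortest(), but sorts all elements first: O(n log n).
    static Optional<String> shortestViaSort(List<String> words) {
        return words.stream().sorted(BY_LENGTH).findFirst();
    }

    public static void main(String[] args) {
        final List<String> words = Arrays.asList("pear", "fig", "banana");
        System.out.println(shortest(words));         // Optional[fig]
        System.out.println(longest(words));          // Optional[banana]
        System.out.println(shortestViaSort(words));  // Optional[fig]
    }
}
```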

allMatch, anyMatch, noneMatch, count

The methods
allMatch(<Predicate>),
anyMatch(<Predicate>) and
noneMatch(<Predicate>) return a boolean. The result is true for

  • allMatch if the condition holds for
    all elements

  • anyMatch if at least one element satisfies the
    condition

  • noneMatch if not a single element satisfies
    the condition

Looking at the runtime of these methods, you
can observe that all three stop as soon as the result can be
derived: anyMatch(<Predicate>) and noneMatch(<Predicate>) stop at
the first element that satisfies the condition,
allMatch(<Predicate>) at the first one that does not. Only if no
such element exists does the entire value supply have to be
traversed.

Now only the method count()
is missing. It can be explained quite simply: this terminal
method returns the number of elements in the stream, i.e. the
elements that are left after all preceding operations such as
filter have been applied.

Listing 11

 

//true, some are matching
System.out.println("anyMatch " + demoValues.stream()
    .map((e) -> {
        System.out.println("e = " + e);
        return e;
    })
    .anyMatch((v) -> v % 2 == 0));
//false, not all are matching
System.out.println("allMatch " + demoValues.stream()
    .map((e) -> {
        System.out.println("e = " + e);
        return e;
    })
    .allMatch((v) -> v % 2 == 0));
//false, some are matching
System.out.println("noneMatch " + demoValues.stream()
    .map((e) -> {
        System.out.println("e = " + e);
        return e;
    })
    .noneMatch((v) -> v % 2 == 0));
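
How many elements each matcher actually looks at can be counted instead of printed. The sketch below wraps the stream in a peek() that increments a counter. The class MatchSketch and the helper countInspected are made-up names, and the counter is a demonstration aid only. The second method shows count() after a filter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Stream;

class MatchSketch {

    // Runs the given terminal operation and reports how many
    // elements were pulled from the (serial) stream to answer it.
    static int countInspected(List<Integer> values,
                              Function<Stream<Integer>, Boolean> terminal) {
        final AtomicInteger inspected = new AtomicInteger();
        terminal.apply(values.stream().peek(v -> inspected.incrementAndGet()));
        return inspected.get();
    }

    // count() returns the number of elements that reach the end of the pipeline.
    static long evenCount(List<Integer> values) {
        return values.stream().filter(v -> v % 2 == 0).count();
    }

    public static void main(String[] args) {
        final List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println("anyMatch  inspected "
                + countInspected(values, s -> s.anyMatch(v -> v % 2 == 0)));
        System.out.println("allMatch  inspected "
                + countInspected(values, s -> s.allMatch(v -> v % 2 == 0)));
        System.out.println("noneMatch inspected "
                + countInspected(values, s -> s.noneMatch(v -> v % 2 == 0)));
        System.out.println("even numbers: " + evenCount(values));
    }
}
```

On this input none of the three matchers needs to see all ten elements, because the first two values already settle the answer.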

Parallel / Sequential – Switch if Necessary

The last two methods that we are going to look
at here are
parallel() and
sequential(). With them a stream can be switched explicitly to
serial or parallel processing. Note that the mode applies to the
pipeline as a whole, not to individual steps: if both methods are
called on the same pipeline, the last call before the terminal
operation decides. If an operation cannot be performed in parallel,
calling sequential() therefore makes the entire pipeline run
serially. You can decide for every individual stream whether it
should work in parallel or serially.

Listing 12

System.out.println(demoValues.stream()  //serial
    .map((m1) -> m1)
    .parallel()    //requests parallel execution
    .map((m2) -> m2)
    .sequential()  //last call wins: the whole pipeline runs serially
    .collect(Collectors.toList()));
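
Which mode a pipeline ends up in can be queried with isParallel(). The following sketch, with the invented class name ModeSketch, shows that the last of several parallel()/sequential() calls determines the mode of the whole pipeline.

```java
import java.util.Arrays;
import java.util.List;

class ModeSketch {

    // parallel() followed by sequential(): the pipeline is serial.
    static boolean parallelThenSequential(List<Integer> values) {
        return values.stream()
                .map(v -> v)
                .parallel()
                .map(v -> v)
                .sequential()
                .isParallel();
    }

    // sequential() followed by parallel(): the pipeline is parallel.
    static boolean sequentialThenParallel(List<Integer> values) {
        return values.stream()
                .sequential()
                .map(v -> v)
                .parallel()
                .isParallel();
    }

    public static void main(String[] args) {
        final List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(parallelThenSequential(values));  // false
        System.out.println(sequentialThenParallel(values));  // true
    }
}
```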

Matrix as Stream

With streams you can also work elegantly on an
n-dimensional matrix. The following example searches
for the number 66 in a two-dimensional matrix. For simplicity it
is assumed that the number occurs only once. The
pre-streams solution is based on nested for loops with a label on
the outermost loop. Generally you can deduce the following
transformation rules:

  • Common for loops can be mapped onto forEach if
    the loop never has to be left early.
  • If a condition is to be checked via if, there are
    two alternatives.
  • If without else: this can be mapped onto the method
    filter.
  • If with else: this is mapped onto the method map, within
    which the case differentiation is performed.

Whether a transformation into streams is
worthwhile thus depends strongly on the control
flow.

Listing 13

public static void main(String[] args) {

    final List<List<Integer>> matrix = new ArrayList<>();
    matrix.add(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
    matrix.add(Arrays.asList(1,2,3,4,5,66,7,8,9));
    matrix.add(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
    matrix.forEach(System.out::println);

    final Integer s = matrix.stream()
        .map(l -> l.stream()
            .filter(v -> v.equals(66))
            .findFirst().orElse(null))
        .filter(f->f != null)
        .findFirst().orElse(null);
    System.out.println("s = " + s);

    Integer result = null;
    endPos:
    for (final List<Integer> integers : matrix) {
        for (final Integer integer : integers) {
            if(integer.equals(66)){
                result = integer;
                break endPos;
            }
        }
    }
    System.out.println("result " + result);
}
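
As an alternative to the nested map/filter construct in the listing, the rows of the matrix can be flattened into a single stream with flatMap. This is a different technique from the one used above, shown here only as a sketch. The class MatrixSketch and the method find are hypothetical names, and the result is returned as an Optional instead of null.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class MatrixSketch {

    // Flattens the rows into one stream of cells and returns the first match.
    static Optional<Integer> find(List<List<Integer>> matrix, int wanted) {
        return matrix.stream()
                .flatMap(List::stream)      // Stream<List<Integer>> -> Stream<Integer>
                .filter(v -> v == wanted)
                .findFirst();
    }

    public static void main(String[] args) {
        final List<List<Integer>> matrix = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 66, 6),
                Arrays.asList(7, 8, 9));
        System.out.println(find(matrix, 66));  // Optional[66]
        System.out.println(find(matrix, 42));  // Optional.empty
    }
}
```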

Conclusion

When using streams the following questions
(among others) come up:

• Is concurrency required or not? If yes, then
in many cases streams via parallelStream() are a simple and
quick approach.

• Should the nesting of the control flow be
reduced? Here it depends on the constructs within the case
differentiation itself. Quite often, with slight alterations, you can
build convincing constructs via streams that lead
to better maintainability in the long run. Whether this pays off in
old projects must be decided case by case.

• Do you need to map mathematical functions? In
many cases you can get there faster by using streams,
without having to integrate Scala or other functional languages
into the project.

All in all, streams are a very effective support
in the daily work with Java.

You will realize that the generic approach turns
out to be quite a relief when working with typical business
applications. The switch to streams should usually lead to
noticeable results within two to three working days. Try
it!

The sources are available at [2]. More
examples are here [3].

Sven Ruppert [1] has been speaking Java since
1996 and has worked on both national and international projects.
You can follow him on twitter @SvenRuppert.

References

[1] de.linkedin.com/in/svenruppert/

[2] https://bitbucket.org/rapidpm/entwicklerpress-shortcut-jdk8-streams

[3] https://bitbucket.org/rapidpm/modules


 


Author
Sven Ruppert
Sven Ruppert has been coding Java since 1996. He is a Principal IT Consultant for codecentric in Munich. In his free time he regularly contributes to German IT periodicals, including Java Magazin, Eclipse Magazin, and Entwickler Magazin, as well as tech portals such as jaxenter.de