Let it flow

Getting started with Java 8 Streams

Sven Ruppert

Puzzling out streams in Java 8? This tutorial will get you off on the right foot.

The launch of Java 8 brings with it the Streams API. But what are the advantages of this addition for developers? How is it used? In this article (originally published in JAX Magazine), we'll walk you through the API step by step.

What are these Streams again?

 

At some point in Java, everybody has been confronted with streams of some kind. But what exactly constitutes a stream in JDK8?

  • Streams are not data structures, which means that they do not store data. They are better regarded as a pipeline through which data flows and in which different transformations are applied to it. These transformations are not performed on the data of the source structure itself, so underlying data structures such as arrays or lists are not changed. A stream wraps the data structure, pulls the source elements from it and passes them through the pipeline.

  • Streams were designed for use with lambdas. In practice there are no streams without lambdas, which is not a problem, since both are part of JDK 8.

  • Streams do not offer random access to the source data via an index or the like. The first element can be accessed, but not an arbitrary later one.

  • Streams offer good support for delivering their results as, for example, an array or a list.

  • Streams are lazy. Elements are not fetched until an operation is actually applied to them. If the data source consists of 1000 elements, the first access therefore costs one time unit rather than 1000 (provided that every element costs the same to access).

  • Streams are parallel on request. They fall into two main groups: sequential and parallel implementations. As long as the operations are stateless and do not interfere with each other, no hand-written multithreaded code is needed to put the idle cores in the system to work.

  • Streams are unbounded, since they are not filled up front like collections. Streams can therefore also be infinite: you can define generator functions that keep delivering source data, which is produced only as the client consumes elements of the stream.
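Two of the properties listed above, laziness and the untouched source, can be observed directly. The following is a minimal sketch; the class name StreamPropertiesSketch and its helper methods are made up for illustration and are not part of the article's sources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the article's repository.
final class StreamPropertiesSketch {

    // Records which source elements the pipeline actually touched.
    static List<String> touchedElements() {
        final List<String> touched = new ArrayList<>();
        final List<String> source = Arrays.asList("a", "b", "c", "d");
        // Intermediate operation: only describes the pipeline, nothing runs yet.
        final Stream<String> pipeline = source.stream().map(s -> {
            touched.add(s);
            return s.toUpperCase();
        });
        // Terminal operation: pulls elements on demand and stops after the first one.
        pipeline.findFirst();
        return touched;
    }

    // Transforms the elements into a new list; the source list is left as it was.
    static List<String> upperCased(final List<String> source) {
        return source.stream()
            .map(String::toUpperCase)
            .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        System.out.println(touchedElements());
        final List<String> source = Arrays.asList("x", "y");
        System.out.println(upperCased(source) + " from " + source);
    }
}
```

In a sequential run only the first element reaches the lambda, although the source holds four.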

Where does all the source data come from?

When you consider that streams do not hold their own data the way collections do, the question arises: where does the data come from? The most common way to create a stream is one of the following methods, which build it from a fixed number of elements:

Stream.of(val1,val2,val3…) , Stream.of(array) and list.stream(). 

These methods, which create a stream from a fixed set of elements, also cover the case of creating a stream from a string. A string is nothing more than a finite sequence of chars.

final Stream<String> splitOf = Stream.of("A,B,C".split(","));

Streams can equally be generated from other streams; this will be demonstrated in more detail in the next article. Two further ways of creating streams are still missing. The first is to build a stream programmatically with a builder. The second is to use a generator: the method Stream.generate(..) takes an instance of the interface Supplier<T> as its argument.

Listing 1

final Stream<Pair> stream 
    = Stream.<Pair>builder().add(new Pair()).build();

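//infinite stream from a generator; needs java.util.Random
final Random random = new Random();
final Stream<Pair> generated = Stream.generate(() -> {
    final Pair p = new Pair();
    p.id = random.nextInt(100);
    p.value = "Value + " + p.id;
    return p;
});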

Where is all the data going?

Now that we know where the data comes from, the question arises of how to get the data out of the stream again. After all, the idea is usually to continue working with it. The easiest way is to create an array with stream.toArray() or a list with stream.collect(Collectors.toList()).

This covers almost 90 percent of the use cases. However, sets and maps can also be generated: sets with stream.collect(Collectors.toSet()), maps with stream.collect(Collectors.groupingBy(..)). The argument of groupingBy() is at least one function that computes the value by which the elements are aggregated. This value becomes the key in the map; the map value is a list of the stream's element type. One possibility that might seem a little unusual to some developers is to output the stream as a string. To do so, pass Collectors.joining(..), whose parameter is a delimiter, to the method collect. Since joining works on character sequences, the elements are first mapped to their toString() representations; the result is a single string of these representations, concatenated with the delimiter.

Listing 2

public static void main(String[] args) {
    final List<Pair> generateDemoValues 
        = generateDemoValues();
    //Stream from Values
    final Stream<Pair> fromValues 
        = Stream.of(new Pair(), new Pair());
    //Stream from Array
    final Pair[] pairs = {new Pair(), new Pair()};
    final Stream<Pair> fromArray = Stream.of(pairs);
    //Stream from List
    final Stream<Pair> fromList 
        = generateDemoValues.stream();
    //Stream from String
    final Stream<String> abc = Stream.of("ABC");
    final Stream<IntStream> of = Stream.of("ABC".chars());
    final Stream<String> splitOf 
        = Stream.of("A,B,C".split(","));
    //Stream from builder
    final Stream<Pair> builderPairStream =
        Stream.<Pair>builder().add(new Pair()).build();
    //Stream to Array
    final Pair[] toArray =
        generateDemoValues.stream().toArray(Pair[]::new);
    //Stream to List
    final List<Pair> toList =
        generateDemoValues.stream()
        .collect(Collectors.toList());
    //Stream to Set
    final Set<Pair> toSet =
        generateDemoValues.stream()
        .collect(Collectors.toSet());
    //Stream to Map
    final Map<Integer,List<Pair>> collectedToMap =
        generateDemoValues.stream()
        .collect(Collectors.groupingBy(Pair::getId));
    System.out.println("collectedToMap.size() = " 
        + collectedToMap.size());
    for (final Map.Entry<Integer, List<Pair>> entry : 
            collectedToMap.entrySet()) {
        System.out.println("entry = " + entry);
    }
}
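The listing does not cover output as a string. A minimal sketch of that case, together with a small groupingBy example, could look as follows. The class CollectSketch and its method names are hypothetical; Collectors.joining is the collector shipped with the final Java 8 API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, not taken from the article's repository.
final class CollectSketch {

    // Joins the toString() representation of every element with the given delimiter.
    static <T> String joinWithDelimiter(final List<T> elements, final String delimiter) {
        return elements.stream()
            .map(Object::toString)
            .collect(Collectors.joining(delimiter));
    }

    // Groups words by their length: the length is the map key, the words are the values.
    static Map<Integer, List<String>> groupByLength(final List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(String::length));
    }

    public static void main(final String[] args) {
        System.out.println(joinWithDelimiter(Arrays.asList(1, 2, 3), ", "));
        System.out.println(groupByLength(Arrays.asList("a", "bb", "cc", "d")));
    }
}
```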

Core methods

Now that we have discussed how data gets into streams and how it is retrieved, we turn to transforming it. Among others, there are three basic methods, forEach, match and find, with which you can quickly and easily make your first attempts.

ForEach – a lambda for each element

The method forEach(<lambda>) does exactly what one would expect: it applies the lambda passed as its argument to every single element of the stream. The method is also available on Iterable, List, Map and some other classes and interfaces, which fortunately leads to shorter code.

When using forEach(<lambda>), keep the following in mind: the element is consumed by the method accept of the Consumer. This means that forEach(<lambda>) can be applied to a stream only once; it is what is called a terminal operation. If you need to apply more than one operation to an element, do so inside the lambda you pass in.

The argument of forEach(<lambda>), however, can be reused by holding on to an instance and applying it to several streams.

Likewise, the lambda must not reassign surrounding local variables. How values can nevertheless be derived and accumulated is shown in the context of the methods map and reduce. The greatest difference from a for loop, though, is that forEach cannot be aborted ahead of time: break is not available, and return only ends the processing of the current element.

Listing 3

final List<Pair> generateDemoValues = 
    new PairListGenerator(){}.generateDemoValues();
        
//pre JDK8
for (final Pair generateDemoValue : generateDemoValues) {
    System.out.println(generateDemoValue);
}
//long version
generateDemoValues.stream()
    .forEach(v -> { System.out.println(v); });
//short version - serial
generateDemoValues.stream()
    .forEach(System.out::println);
//short version - parallel
generateDemoValues.parallelStream()
    .forEach(System.out::println);
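The two points about reuse can be tried out with a short sketch: a Consumer instance may be applied to several streams, whereas a stream itself can be traversed only once. The class ForEachSketch and its methods are illustrative names, not code from the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the article's repository.
final class ForEachSketch {

    // The same Consumer instance is handed to two different streams.
    static List<String> reuseConsumer() {
        final List<String> seen = new ArrayList<>();
        final Consumer<String> collector = seen::add;
        Stream.of("A", "B").forEach(collector);
        Stream.of("C").forEach(collector);
        return seen;
    }

    // A second terminal operation on the same stream is rejected at runtime.
    static boolean secondTraversalFails() {
        final Stream<String> stream = Stream.of("A", "B");
        stream.forEach(s -> { });
        try {
            stream.forEach(s -> { });
            return false;
        } catch (final IllegalStateException expected) {
            return true;
        }
    }

    public static void main(final String[] args) {
        System.out.println(reuseConsumer());
        System.out.println("second traversal fails: " + secondTraversalFails());
    }
}
```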

Map – How about transformations?

The method map(<lambda>) generates a new stream consisting of the transformed elements of the source stream. Here, too, the argument is a lambda. Apart from this functional coupling, the target stream does not need to have anything in common with the source stream; in particular, its element type can differ. The method can be applied as many times as required, since the result is a new stream every time.

Listing 4

final List<String> stringList = generateDemoValues.stream()
    .map(v -> {
        final String value = v.getValue();
        final DemoElement d = new DemoElement();
        d.setDatum(new Date());
        d.setValue(Base64.getEncoder()
            .encodeToString(value.getBytes()));
        return d;
    })
    .map(DemoElement::getValue)
    .collect(Collectors.toList());

Filter – What method should it be? 

Just like map(<lambda>), the method filter(<lambda>) generates a new stream. From the set of source elements, it passes on only those that are wanted for the next steps. filter(<lambda>) can be applied several times in sequence, with each call narrowing the set further. It can also be combined freely with other operations, e.g. map -> filter -> map -> filter -> filter.

Listing 5

 final Stream<Pair> filteredPairStream =
        generateDemoValues.stream()
            .filter(v -> v.getId() % 2 == 0);
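A chain that alternates filter and map might look like the following sketch. It works on plain integers instead of the article's Pair class so that it stays self-contained; FilterChainSketch and evenSquaresAsText are invented names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not taken from the article's repository.
final class FilterChainSketch {

    // filter -> map -> filter -> map: every step narrows or transforms the previous result.
    static List<String> evenSquaresAsText(final List<Integer> values) {
        return values.stream()
            .filter(v -> v % 2 == 0)   // keep even numbers
            .map(v -> v * v)           // square them
            .filter(sq -> sq > 10)     // keep squares above 10
            .map(sq -> "sq=" + sq)     // change the element type to String
            .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        System.out.println(evenSquaresAsText(Arrays.asList(1, 2, 3, 4, 5, 6)));
    }
}
```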

Sometimes you have a set of elements whose order is not defined and whose size is unknown, and from which exactly one element with certain characteristics is to be extracted. Queries that pose no problem on the database thanks to SQL can bloat the source code on the imperative side.

The method findFirst() provides the first element of the stream. It looks trivial at first sight and turns out to be delightful at second. The return value is an Optional; for an empty stream it is an empty Optional.

With findFirst(), the first hit from the stream content is returned as an Optional. For a stream with a defined encounter order, such as one created from a list, this is the first matching element in that order, even if the stream is processed as a parallel stream. Only if the stream has no encounter order may any element be returned. If any match will do, use findAny() instead: on a parallel stream it may return whichever matching element is found first, which can be cheaper.

The method findFirst() is a terminal method: after it has been invoked, no further stream operations can be performed, and the stream is terminated. With findFirst(), quite complex patterns can be expressed to obtain specific objects from the stream. Since streams are basically pipelines, only as many elements are produced as are needed to find this one element. Compared with the conventional notation, the stream expressions are usually much more compact. findFirst() is suitable when a declarative, set-based description of the individual entity cannot be applied and an imperative approach is therefore necessary.

Listing 6

final List<String> demoValues =
    Arrays.asList("AB","AAB","AAAB","AAAAB","AAAAAB");
final String value = demoValues
    .stream()
    .filter(o -> o.contains("AAA"))
    .findFirst().orElse("noop ");
System.out.println("value = " + value);
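How findFirst() and findAny() behave on a parallel stream can be compared with a small sketch. FindSketch and its method names are assumptions made for this example; the input values are those of Listing 6.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not taken from the article's repository.
final class FindSketch {

    // A list has an encounter order, so findFirst() respects it even in parallel.
    static String firstMatchParallel(final List<String> values) {
        return values.parallelStream()
            .filter(o -> o.contains("AAA"))
            .findFirst()
            .orElse("noop");
    }

    // findAny() is free to return whichever matching element a worker finds first.
    static String anyMatchParallel(final List<String> values) {
        return values.parallelStream()
            .filter(o -> o.contains("AAA"))
            .findAny()
            .orElse("noop");
    }

    public static void main(final String[] args) {
        final List<String> demoValues =
            Arrays.asList("AB", "AAB", "AAAB", "AAAAB", "AAAAAB");
        System.out.println("findFirst = " + firstMatchParallel(demoValues));
        System.out.println("findAny   = " + anyMatchParallel(demoValues));
    }
}
```

The findFirst() result is the same on every run, while the findAny() result is only guaranteed to be one of the matching elements.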

Reduce – Bring it down to a common denominator

None of the methods we have looked at so far can take the element at position n-1 into account when processing element n. So how can we generate values that build on each other? As an example, say that the value n always has to be appended to the result for n-1. The input values are the characters of the sequence “A,B,C,D,E”, and these elements are to be merged. The method reduce((v1,v2)->) receives a lambda with two parameters: v1 holds the result accumulated so far (on the first call, the first element) and v2 the next element of the stream. The output must be a single element based on the two inputs.

The method reduce lets us merge the values of the source stream into a single result. Here it is important to consider the difference between sequential and parallel processing: if the reduction function is not associative, the results can differ. With trivial things such as finding a maximum, no such side effects occur. Even with supposedly trivial transformations, however, you should test whether the result still matches the desired outcome. When using streams you will find that many basic functions are already included in the API, which spares you the development of basic utilities.

Listing 7

 

   final List<String> demoValues 
        = Arrays.asList("A", "B", "C", "D", "E");
    System.out.println(demoValues.stream()
        .reduce(String::concat)); //Optional[ABCDE]
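The difference between an associative and a non-associative reduction can be sketched as follows. The variant of reduce with an identity value returns the result directly instead of an Optional. ReduceSketch and its methods are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the article's repository.
final class ReduceSketch {

    // With an identity value the result is a plain String, not an Optional.
    static String concat(final List<String> values) {
        return values.stream().reduce("", String::concat);
    }

    // Addition is associative, so the parallel result equals the sequential one.
    static int parallelSum(final List<Integer> values) {
        return values.parallelStream().reduce(0, Integer::sum);
    }

    // Subtraction is NOT associative: only the sequential result is predictable.
    static int difference(final Stream<Integer> values) {
        return values.reduce(0, (a, b) -> a - b);
    }

    public static void main(final String[] args) {
        final List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(concat(Arrays.asList("A", "B", "C", "D", "E")));
        System.out.println(parallelSum(numbers));
        System.out.println("sequential: " + difference(numbers.stream()));
        // May print a different value, depending on how the work is split.
        System.out.println("parallel:   " + difference(numbers.parallelStream()));
    }
}
```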

Limit / Skip – Please not everything

Streams can be infinitely long; in the extreme case a stream has no end. It can therefore be useful to process a stream only up to a certain length, or to collect only a certain number of results because the rest is of no use to the logic that follows. The method limit(count) is designed for exactly this case.

The method skip(count) works a little differently. It also restricts the stream, but at the front: the count states how many elements are skipped, while the end remains open. The restriction therefore happens at the beginning, by discarding the first n elements without processing them further. skip(count) can also occur several times and in several places within a pipeline.

Listing 8

final List<Integer> demoValues
    = Arrays.asList(1,2,3,4,5,6,7,8,9,10);
//limit the input -> [1, 2, 3, 4]
System.out.println(demoValues
    .stream().limit(4)
    .collect(Collectors.toList()));
//jumping over the first 4 elements -> [5, 6, 7, 8, 9, 10]
System.out.println(demoValues
    .stream().skip(4)
    .collect(Collectors.toList()));
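limit becomes essential as soon as the source is infinite, and combining skip with limit gives a simple form of paging. The following sketch illustrates both; LimitSkipSketch, firstPowersOfTwo and page are names invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the article's repository.
final class LimitSkipSketch {

    // Stream.iterate never ends by itself; limit cuts the stream to a finite length.
    static List<Integer> firstPowersOfTwo(final int count) {
        return Stream.iterate(1, n -> n * 2)
            .limit(count)
            .collect(Collectors.toList());
    }

    // skip drops the preceding pages, limit keeps exactly one page.
    static List<Integer> page(final List<Integer> values, final int pageIndex, final int pageSize) {
        return values.stream()
            .skip((long) pageIndex * pageSize)
            .limit(pageSize)
            .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        System.out.println(firstPowersOfTwo(5));
        System.out.println(page(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 1, 4));
    }
}
```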

Distinct – All at once

From SQL you know the keyword DISTINCT, which reduces a set of values to a single occurrence of each value, that is, it produces a set of unique values. The method distinct() does exactly the same. The implementation in the class DistinctOps works on a ConcurrentHashMap, because this operation has also been developed for parallel streams; the distinct set is then the key set of the map. The determining factor is the hashCode() and equals() implementation of the elements that are to be transferred into the unique set. This is where you can influence the behavior and the performance of the distinct operation.

Listing 9

 final Random random = new Random();
    System.out.println(
        Stream.generate(() -> random.nextInt(100))
            .limit(40)
            .distinct()
            .collect(Collectors.toList())
    );
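How equals() and hashCode() steer distinct() can be seen in a small sketch. The class Item, which is compared by its id only, is a made-up stand-in and not a type from the article.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the article's repository.
final class DistinctSketch {

    // Two items count as equal when their ids match; the label is ignored.
    static final class Item {
        final int id;
        final String label;

        Item(final int id, final String label) {
            this.id = id;
            this.label = label;
        }

        @Override
        public boolean equals(final Object o) {
            return o instanceof Item && ((Item) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // For an ordered stream, distinct() keeps the first occurrence of equal elements.
    static List<String> distinctLabels() {
        return Stream.of(new Item(1, "a"), new Item(2, "b"), new Item(1, "c"))
            .distinct()
            .map(item -> item.label)
            .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        System.out.println(distinctLabels());
    }
}
```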

Min / Max – Very small, very large

The methods min(<Comparator>) and max(<Comparator>) return the minimum and the maximum, respectively, of the values in the stream. The value is determined with the help of the Comparator, which means that all elements have to be visited; the methods therefore cannot be used on infinite streams. The definition of the Comparator in turn allows different interpretations of what counts as a minimum and what as a maximum. At the same time, the implementation of the Comparator is one of the decisive factors for performance, because it is applied to all elements. In any case, min/max is faster than sorting the elements and then calling findFirst(), because the complexity of min/max is O(n) while that of sorting is O(n log n).

Listing 10

 System.out.println(demoValues
        .stream().max(Integer::compareTo));
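That the Comparator decides what "minimum" and "maximum" mean can be illustrated with strings, compared once by length and once alphabetically. MinMaxSketch and its methods are illustrative names only.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class, not taken from the article's repository.
final class MinMaxSketch {

    // Minimum in terms of length.
    static Optional<String> shortest(final List<String> words) {
        return words.stream().min(Comparator.comparingInt(String::length));
    }

    // Maximum in terms of length.
    static Optional<String> longest(final List<String> words) {
        return words.stream().max(Comparator.comparingInt(String::length));
    }

    // Maximum in terms of natural (alphabetical) order.
    static Optional<String> alphabeticallyLast(final List<String> words) {
        return words.stream().max(Comparator.naturalOrder());
    }

    public static void main(final String[] args) {
        final List<String> words = Arrays.asList("pear", "fig", "banana");
        System.out.println(shortest(words));
        System.out.println(longest(words));
        System.out.println(alphabeticallyLast(words));
    }
}
```

As with findFirst(), the result is an Optional, which is empty for an empty stream.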

allMatch, anyMatch, noneMatch, count

The methods allMatch(<Predicate>), anyMatch(<Predicate>) and noneMatch(<Predicate>) return a boolean:

  • allMatch returns true if the condition holds for every element

  • anyMatch returns true if at least one element fulfills the condition

  • noneMatch returns true if not a single element fulfills the condition

Looking at the runtime of the individual methods, you can observe that all three short-circuit: they stop as soon as the result can be derived. anyMatch(<Predicate>) and noneMatch(<Predicate>) stop at the first element that fulfills the condition, allMatch(<Predicate>) at the first element that does not. Only if no such element exists is the entire stream processed.

Now only the method count() is missing. It is easy to explain: it is a terminal operation that returns the number of elements in the stream, after all preceding intermediate operations such as filter have been applied.

Listing 11

 

// true, some are matching
    System.out.println("anyMatch " + demoValues.stream()
        .map((e) -> {
            System.out.println("e = " + e);
            return e;
        })
        .anyMatch((v) -> v % 2 == 0));
    //false, not all are matching
    System.out.println("allMatch " + demoValues.stream()
        .map((e) -> {
            System.out.println("e = " + e);
            return e;
        })
        .allMatch((v) -> v % 2 == 0));
    //false, not all are NOT matching
    System.out.println("noneMatch " + demoValues.stream() 
        .map((e) -> {
            System.out.println("e = " + e);
            return e;
        })
        .noneMatch((v) -> v % 2 == 0));
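The short-circuit behavior can also be made measurable by counting how often the predicate is evaluated. The sketch below does this for a sequential stream over the values 1 to 10 and adds a count() example; MatchSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not taken from the article's repository.
final class MatchSketch {

    static final List<Integer> VALUES = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

    // Stops at 2, the first even value.
    static int evaluationsForAnyMatch() {
        final AtomicInteger calls = new AtomicInteger();
        VALUES.stream().anyMatch(v -> { calls.incrementAndGet(); return v % 2 == 0; });
        return calls.get();
    }

    // Stops at 1, the first value that is not even.
    static int evaluationsForAllMatch() {
        final AtomicInteger calls = new AtomicInteger();
        VALUES.stream().allMatch(v -> { calls.incrementAndGet(); return v % 2 == 0; });
        return calls.get();
    }

    // Stops at 2 as well: one match is enough to know the answer is false.
    static int evaluationsForNoneMatch() {
        final AtomicInteger calls = new AtomicInteger();
        VALUES.stream().noneMatch(v -> { calls.incrementAndGet(); return v % 2 == 0; });
        return calls.get();
    }

    // count() reports how many elements are left after the filter.
    static long evenCount() {
        return VALUES.stream().filter(v -> v % 2 == 0).count();
    }

    public static void main(final String[] args) {
        System.out.println("anyMatch  evaluations: " + evaluationsForAnyMatch());
        System.out.println("allMatch  evaluations: " + evaluationsForAllMatch());
        System.out.println("noneMatch evaluations: " + evaluationsForNoneMatch());
        System.out.println("even values: " + evenCount());
    }
}
```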

Parallel / Sequential – Switch if Necessary

The last two methods that we are going to look at here are parallel() and sequential(). They switch a stream explicitly to parallel or sequential execution. Note that the mode applies to the pipeline as a whole, not to individual steps: the pipeline runs in whichever mode was set last before the terminal operation. If an operation cannot be performed in parallel, call sequential(). You can thus decide for every individual stream whether it should work in parallel or sequentially.

Listing 12

System.out.println(demoValues.stream()  //sequential
          .map((m1) -> m1)
          .parallel()    //request parallel execution
          .map((m2) -> m2)
          .sequential() //last call wins: the whole pipeline runs sequentially
          .collect(Collectors.toList()));
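Which mode a pipeline ends up in can be queried with isParallel(). The following sketch shows that the most recent call to parallel() or sequential() determines the answer; ParallelSwitchSketch and its methods are made-up names.

```java
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the article's repository.
final class ParallelSwitchSketch {

    // parallel() followed by sequential(): the pipeline is sequential.
    static boolean endsSequential() {
        return Stream.of(1, 2, 3)
            .parallel()
            .map(m -> m)
            .sequential()
            .isParallel();
    }

    // sequential() followed by parallel(): the pipeline is parallel.
    static boolean endsParallel() {
        return Stream.of(1, 2, 3)
            .sequential()
            .map(m -> m)
            .parallel()
            .isParallel();
    }

    public static void main(final String[] args) {
        System.out.println("parallel then sequential -> isParallel = " + endsSequential());
        System.out.println("sequential then parallel -> isParallel = " + endsParallel());
    }
}
```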

Matrix as Stream

With streams you can also work elegantly on an n-dimensional matrix. The following example searches for the number 66 in a two-dimensional matrix. For simplicity, it is assumed that the number occurs only once. The pre-streams solution is based on nested for loops with a label on the outermost loop. In general, the following transformation rules can be derived:

  • Ordinary for loops can be mapped onto forEach if the loop never has to be aborted during iteration.

  • If a condition is checked via if, there are two alternatives:
      • if without else: this can be mapped onto the method filter
      • if with else: this is mapped onto the method map, within which the case distinction is made

Whether a transformation into streams pays off thus depends strongly on the control flow.
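As a rough illustration of these rules, the sketch below rewrites a loop that contains an if without else and an if with else. The class LoopRulesSketch and the example logic are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not taken from the article's repository.
final class LoopRulesSketch {

    // Conventional version: nested case distinctions inside a for loop.
    static List<String> labelsWithLoop(final List<Integer> values) {
        final List<String> result = new ArrayList<>();
        for (final Integer v : values) {
            if (v > 0) {                      // if without else
                if (v % 2 == 0) {             // if with else
                    result.add("even:" + v);
                } else {
                    result.add("odd:" + v);
                }
            }
        }
        return result;
    }

    // Stream version: the plain if becomes filter, the if/else moves into map.
    static List<String> labelsWithStream(final List<Integer> values) {
        return values.stream()
            .filter(v -> v > 0)
            .map(v -> v % 2 == 0 ? "even:" + v : "odd:" + v)
            .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        final List<Integer> values = Arrays.asList(-1, 1, 2, 3);
        System.out.println(labelsWithLoop(values));
        System.out.println(labelsWithStream(values));
    }
}
```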

Listing 13

public static void main(String[] args) {

    final List<List<Integer>> matrix = new ArrayList<>();
    matrix.add(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
    matrix.add(Arrays.asList(1,2,3,4,5,66,7,8,9));
    matrix.add(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
    matrix.forEach(System.out::println);

    final Integer s = matrix.stream()
        .map(l -> l.stream()
            .filter(v -> v.equals(66))
            .findFirst().orElse(null))
        .filter(f -> f != null)
        .findFirst().orElse(null);
    System.out.println("s = " + s);

    Integer result = null;
    endPos:
    for (final List<Integer> integers : matrix) {
        for (final Integer integer : integers) {
            if(integer.equals(66)){
                result = integer;
                break endPos;
            }
        }
    }
    System.out.println("result " + result);
}
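As an alternative to mapping every row to a possibly null hit, the rows can be flattened into a single stream with flatMap, which avoids the null handling altogether. This is a sketch of a different technique than the one used in the listing; MatrixSearchSketch and find are invented names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class, not taken from the article's repository.
final class MatrixSearchSketch {

    // Flattens the rows into one stream of cells and returns the first match, if any.
    static Optional<Integer> find(final List<List<Integer>> matrix, final int wanted) {
        return matrix.stream()
            .flatMap(List::stream)
            .filter(v -> v == wanted)
            .findFirst();
    }

    public static void main(final String[] args) {
        final List<List<Integer>> matrix = new ArrayList<>();
        matrix.add(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
        matrix.add(Arrays.asList(1, 2, 3, 4, 5, 66, 7, 8, 9));
        matrix.add(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
        System.out.println("found = " + find(matrix, 66));
        System.out.println("found = " + find(matrix, 99));
    }
}
```

Returning the Optional instead of calling orElse(null) leaves it to the caller to decide what a missing value means.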

Conclusion

When using streams the following questions (among others) come up:

• Is concurrency required? If so, streams via parallelStream() are in many cases a simple and quick approach.

• Should the nesting of the control flow be reduced? This depends on the constructs within the case distinctions themselves. Quite often, slight alterations yield convincing stream-based constructs that lead to better maintainability in the long run. Whether this pays off in old projects has to be decided case by case.

• Do you need to map mathematical functions? In many cases streams get you there faster, without having to integrate Scala or other functional languages into the project.

All in all streams are a very effective support in the daily work with Java.

You will realize that the generic approach turns out to be quite a relief when working with typical business applications. The adjustment to streams should usually lead to noticeable results within two to three work days. Try it!

The sources are available at [2]. More examples can be found at [3].

Sven Ruppert [1] has been speaking Java since 1996 and has worked on both national and international projects. You can follow him on Twitter @SvenRuppert.

References

[1] de.linkedin.com/in/svenruppert/

[2] https://bitbucket.org/rapidpm/entwicklerpress-shortcut-jdk8-streams

[3] https://bitbucket.org/rapidpm/modules



 

