Warm tip: This article is reproduced from stackoverflow.com, please click
java parallel-processing java-8 java-stream

Should I always use a parallel stream when possible?

发布于 2020-03-27 10:22:31

With Java 8 and lambdas it's easy to iterate over collections as streams, and just as easy to use a parallel stream. Two examples from the docs, the second one using parallelStream:

myShapesCollection.stream()
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

myShapesCollection.parallelStream() // <-- This one uses parallel
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

As long as I don't care about the order, would it always be beneficial to use the parallel? One would think it is faster dividing the work on more cores.

Are there other considerations? When should parallel stream be used and when should the non-parallel be used?

(This question is asked to trigger a discussion about how and when to use parallel streams, not because I think always using them is a good idea.)

Questioner
Matsemann
Viewed
60
7,968 2017-05-10 11:07

A parallel stream has a much higher overhead compared to a sequential one. Coordinating the threads takes a significant amount of time. I would use sequential streams by default and only consider parallel ones if

  • I have a massive amount of items to process (or the processing of each item takes time and is parallelizable)

  • I have a performance problem in the first place

  • I don't already run the process in a multi-thread environment (for example: in a web container, if I already have many requests to process in parallel, adding an additional layer of parallelism inside each request could have more negative than positive effects)

In your example, the performance will anyway be driven by the synchronized access to System.out.println(), and making this process parallel will have no effect, or even a negative one.

Moreover, remember that parallel streams don't magically solve all the synchronization problems. If a shared resource is used by the predicates and functions used in the process, you'll have to make sure that everything is thread-safe. In particular, side effects are things you really have to worry about if you go parallel.

In any case, measure, don't guess! Only a measurement will tell you if the parallelism is worth it or not.