With Java 8 and lambdas it's easy to iterate over collections as streams, and just as easy to use a parallel stream. Two examples from the docs, the second one using parallelStream:
myShapesCollection.stream()
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));
myShapesCollection.parallelStream() // <-- This one uses parallel
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));
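To try the two pipelines above without the documentation's surrounding code, here is a self-contained sketch. The Shape class, the Color enum and the sample data are hypothetical stand-ins for the types in the docs example, and the names are collected into lists instead of printed so the two results can be compared. Because the source list is ordered, collect() keeps encounter order even for the parallel stream, whereas forEach on a parallel stream makes no ordering promise.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the types used in the documentation example.
enum Color { RED, GREEN, BLUE }

final class Shape {
    private final String name;
    private final Color color;

    Shape(String name, Color color) {
        this.name = name;
        this.color = color;
    }

    String getName() { return name; }
    Color getColor() { return color; }
}

class ShapeStreams {
    // Hypothetical sample data; Arrays.asList gives an ordered source.
    static final List<Shape> SHAPES = Arrays.asList(
            new Shape("circle", Color.RED),
            new Shape("square", Color.BLUE),
            new Shape("triangle", Color.RED),
            new Shape("hexagon", Color.GREEN),
            new Shape("star", Color.RED));

    // Sequential pipeline: names of the red shapes, in encounter order.
    static List<String> redNamesSequential(List<Shape> shapes) {
        return shapes.stream()
                .filter(e -> e.getColor() == Color.RED)
                .map(Shape::getName)
                .collect(Collectors.toList());
    }

    // Parallel pipeline: elements are processed concurrently, but collect()
    // still returns them in the encounter order of the ordered source list.
    static List<String> redNamesParallel(List<Shape> shapes) {
        return shapes.parallelStream()
                .filter(e -> e.getColor() == Color.RED)
                .map(Shape::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(redNamesSequential(SHAPES)); // [circle, triangle, star]
        System.out.println(redNamesParallel(SHAPES));   // [circle, triangle, star]

        // forEach on a parallel stream may print in any order;
        // forEachOrdered prints in encounter order, at some cost to parallelism.
        SHAPES.parallelStream()
                .filter(e -> e.getColor() == Color.RED)
                .forEachOrdered(e -> System.out.println(e.getName()));
    }
}
```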
As long as I don't care about the order, would it always be beneficial to use the parallel one? One would think it is faster to divide the work across more cores.
Are there other considerations? When should a parallel stream be used, and when should a sequential one be used?
(This question is asked to trigger a discussion about how and when to use parallel streams, not because I think always using them is a good idea.)
A parallel stream has a much higher overhead than a sequential one: coordinating the threads takes a significant amount of time. I would use sequential streams by default and only consider parallel ones if:
- I have a massive number of items to process (or the processing of each item takes time and is parallelizable)
- I have a performance problem in the first place
- I don't already run the process in a multi-threaded environment (for example, in a web container, if I already have many requests to process in parallel, adding an additional layer of parallelism inside each request could have more negative than positive effects)
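One reason the last condition above matters: by default, parallel streams started from ordinary threads run on a single JVM-wide pool, ForkJoinPool.commonPool(), with the calling thread also taking part. Many request threads that each start a parallel stream therefore compete for the same small set of workers. A minimal sketch for inspecting that pool's target parallelism (the class and method names are illustrative):

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {
    // Target parallelism of the JVM-wide common pool. By default it is
    // availableProcessors() - 1 (minimum 1); it can be overridden with the
    // system property java.util.concurrent.ForkJoinPool.common.parallelism.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

Every parallel stream in the application shares this one pool, so nesting parallelism inside already-concurrent request handling mostly adds contention rather than cores.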
In your example, the performance will be dominated by the synchronized access to System.out.println() anyway, and making this process parallel will have no effect, or even a negative one.
Moreover, remember that parallel streams don't magically solve all synchronization problems. If the predicates and functions in the pipeline use a shared resource, you'll have to make sure that everything is thread-safe. In particular, side effects are something you really have to worry about if you go parallel.
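A small sketch of the side-effect problem, assuming a hypothetical task of gathering the even numbers below n. Adding to a plain ArrayList from a parallel forEach is a data race; letting the stream build the result with collect(), or using a thread-safe object when shared state really is needed, avoids it.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SideEffects {
    // UNSAFE, shown only as a comment: ArrayList is not thread-safe, so adding
    // to it from a parallel forEach can lose elements or throw an exception.
    //   List<Integer> evens = new ArrayList<>();
    //   IntStream.range(0, n).parallel().filter(i -> i % 2 == 0).forEach(evens::add);

    // Safe: no shared mutable state. The stream gives each subtask its own
    // intermediate list and merges them, preserving encounter order.
    static List<Integer> evensCollected(int n) {
        return IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    // If shared state is unavoidable, it must be thread-safe (AtomicLong here).
    // For a plain count, .count() would be simpler; this only illustrates the point.
    static long evensCountedWithSharedCounter(int n) {
        AtomicLong counter = new AtomicLong();
        IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .forEach(i -> counter.incrementAndGet());
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(evensCollected(1_000_000).size());         // 500000
        System.out.println(evensCountedWithSharedCounter(1_000_000)); // 500000
    }
}
```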
In any case, measure, don't guess! Only a measurement will tell you if the parallelism is worth it or not.
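As a starting point for measuring, here is a deliberately naive timing sketch using System.nanoTime. The workload (summing squares over a range) and the method names are illustrative assumptions. Single-shot timings like this are distorted by JIT compilation and garbage collection even with a few warm-up runs, so for real decisions a benchmark harness such as JMH is the appropriate tool. Which variant wins depends on the machine and the workload.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

class NaiveTiming {
    // Hypothetical workload: sum of i*i over [0, n).
    static long sumOfSquaresSequential(long n) {
        return LongStream.range(0, n).map(i -> i * i).sum();
    }

    static long sumOfSquaresParallel(long n) {
        return LongStream.range(0, n).parallel().map(i -> i * i).sum();
    }

    // Runs the task a few times as a crude warm-up, then times one run.
    // The result is printed so the work cannot be optimized away.
    static long timeNanos(String label, LongSupplier task) {
        for (int i = 0; i < 5; i++) {
            task.getAsLong(); // warm-up runs, results discarded
        }
        long start = System.nanoTime();
        long result = task.getAsLong();
        long elapsed = System.nanoTime() - start;
        System.out.printf("%s: result=%d, %.2f ms%n", label, result, elapsed / 1e6);
        return elapsed;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        timeNanos("sequential", () -> sumOfSquaresSequential(n));
        timeNanos("parallel", () -> sumOfSquaresParallel(n));
    }
}
```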
Good answer. I would add that if you have a massive number of items to process, that only increases the thread coordination issues; it's only when the processing of each item takes time and is parallelizable that parallelization might be useful.
@WarrenDew I disagree. The Fork/Join system will simply split the N items into, for example, 4 parts, and process these 4 parts sequentially. The 4 results will then be reduced. If massive really is massive, even for fast unit processing, parallelization can be effective. But as always, you have to measure.
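The split/process/reduce scheme described here can be sketched directly with the Fork/Join framework. This is a simplified model, not the actual parallel-stream implementation (streams split their source through a Spliterator): the array is halved until a chunk is no larger than a hypothetical THRESHOLD, each chunk is summed sequentially, and the partial sums are combined. Different chunks can run on different workers at the same time.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ChunkedSum extends RecursiveTask<Long> {
    // Hypothetical cut-off: chunks this small are processed sequentially.
    static final int THRESHOLD = 10_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ChunkedSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Leaf: process the elements of this part sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Split in two: the left half is forked so an idle worker can steal it,
        // while the current thread computes the right half.
        int mid = (from + to) >>> 1;
        ChunkedSum left = new ChunkedSum(data, from, mid);
        ChunkedSum right = new ChunkedSum(data, mid, to);
        left.fork();
        long rightSum = right.compute();
        // Reduce: combine the partial results.
        return left.join() + rightSum;
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ChunkedSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(sum(data)); // 499999500000
    }
}
```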
I have a collection of objects that implement Runnable, which I run as Threads by calling start(). Is it OK to change that to a parallelized Java 8 stream using .forEach()? Then I'd be able to strip the thread code out of the class. But are there any downsides?
@JBNizet If the 4 parts are processed sequentially, then there is no difference between processing them in parallel or sequentially, no? Please clarify.
@Harshana he obviously means that the elements of each of the 4 parts will be processed sequentially. However, the parts themselves may be processed simultaneously. In other words, if you have several CPU cores available, each part can run on its own core independently of the other parts, while processing its own elements sequentially. (NOTE: I don't know if this is how parallel Java streams work; I'm just trying to clarify what JBNizet meant.)
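On how parallel streams divide the work: the source is split through its Spliterator, where trySplit() hands off a prefix of the remaining elements as a new Spliterator. In OpenJDK's ArrayList implementation the split is in half, which can be observed directly. The stream framework repeats such splits and processes the resulting parts on pool threads; how many parts are created depends on the source and the pool, so this 8-element list is only an illustration, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class SplitDemo {
    // Drains a Spliterator into a list, processing its elements sequentially.
    static <T> List<T> drain(Spliterator<T> s) {
        List<T> out = new ArrayList<>();
        s.forEachRemaining(out::add);
        return out;
    }

    // Splits the source once and returns the resulting parts.
    static List<List<Integer>> splitOnce(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if it cannot split
        List<List<Integer>> parts = new ArrayList<>();
        if (prefix != null) {
            parts.add(drain(prefix));
        }
        parts.add(drain(rest));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        System.out.println(splitOnce(source)); // [[1, 2, 3, 4], [5, 6, 7, 8]]
    }
}
```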