What is the basic difference between stream processing and traditional message processing? People say that Kafka is a good choice for stream processing, but essentially Kafka is a messaging framework similar to ActiveMQ, RabbitMQ, etc.
Why do we generally not say that ActiveMQ is good for stream processing as well?
Is it the speed at which messages are consumed by the consumer that determines whether it is a stream?
In traditional message processing, you apply simple computations to the messages -- in most cases individually, per message.
In stream processing, you apply complex operations to multiple input streams and multiple records (i.e., messages) at the same time (such as aggregations and joins).
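To make the contrast concrete, here is a minimal sketch in plain Python (no real messaging system or streaming API -- the function names `uppercase_handler` and `count_by_key` are made up for illustration). The first function handles each message in isolation, as a traditional consumer would; the second accumulates state across many records, as a streaming aggregation does.

```python
def uppercase_handler(message):
    """Traditional style: each message is processed individually and statelessly."""
    return message.upper()

def count_by_key(records):
    """Streaming style: an aggregation accumulates state across many records."""
    counts = {}
    for key, _value in records:
        counts[key] = counts.get(key, 0) + 1
    return counts

print(uppercase_handler("order placed"))               # per-message, stateless
print(count_by_key([("a", 1), ("b", 2), ("a", 3)]))    # multi-record, stateful
```

The per-message handler needs no memory of earlier messages; the aggregation is only meaningful over the stream as a whole, which is the kind of operation the answer is referring to.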
Furthermore, traditional messaging systems cannot go "back in time" -- i.e., they automatically delete messages after they have been delivered to all subscribed consumers. In contrast, Kafka, which uses a pull-based model (i.e., consumers pull data out of Kafka), keeps messages for a configurable amount of time. This allows consumers to "rewind" and consume messages multiple times -- and if you add a new consumer, it can read the complete history. This makes stream processing possible, because it allows for more complex applications. Furthermore, stream processing is not necessarily about real-time processing -- it's about processing infinite input streams (in contrast to batch processing, which is applied to finite inputs).
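The retention-plus-pull idea can be sketched with a toy in-memory log (this is not the actual Kafka API; the `RetainedLog` class and its methods are invented purely to illustrate the semantics). Messages stay in the log after delivery, and each consumer chooses the offset it pulls from -- including offset 0 to replay the full history:

```python
class RetainedLog:
    """Toy model of a Kafka-like log: delivery does not delete messages."""

    def __init__(self):
        self._log = []  # messages are retained (subject to a retention policy)

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset assigned to the new message

    def pull(self, offset, max_messages=10):
        """Consumers pull from an offset they control; nothing is removed."""
        return self._log[offset:offset + max_messages]

log = RetainedLog()
for m in ["m0", "m1", "m2"]:
    log.append(m)

print(log.pull(offset=1))  # an existing consumer resumes where it left off
print(log.pull(offset=0))  # a new consumer replays the complete history
```

In a push-based queue the second pull would be impossible, because the broker deletes each message once every subscriber has acknowledged it.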
Additionally, Kafka offers Kafka Connect and the Streams API -- so it is a stream processing platform and not just a messaging/pub-sub system (even if it uses one at its core).
Also, the input stream might be infinite, but the processing is more like a sliding window over a finite portion of the input. In that sense there isn't really any difference between stream and batch processing: batch processing is just a special case of stream processing where the windows are strictly defined.
Very well put! Just one thought, though: these days the word "streaming" is used interchangeably with (and confused with) "micro-batching". The moment one talks about sliding windows, one is already talking about micro-batching. Streaming in the strictest sense is processing each record/event/fact as it arrives. In that sense, a sliding window would be of size 1 in the case of streaming.
Micro-batching limits how you can define window boundaries (e.g., a hopping window must advance by at least one batch), while in stream processing you can advance a window at any granularity you like. Also, there is a difference between sliding and hopping windows (and many systems use the term "sliding window" to actually describe a hopping window, which can add to the confusion). Thus, I don't see why windowing implies micro-batching. Following your argument, you could not aggregate over a stream at all, which is certainly possible in stream processing.
Micro-batching is more about when to execute a computation, and it also (as mentioned) leaks implementation details (i.e., batch boundaries) into the operator semantics. Stream processing does not.
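The windowing distinction in this comment thread can be sketched in plain Python (these helpers are illustrative only, not any real streaming API): a hopping window has a fixed size and a fixed advance step, while a sliding window in the stream-processing sense can advance at per-record granularity -- here, one window ending at each event timestamp.

```python
def hopping_windows(events, size, advance):
    """events: list of (timestamp, value). Windows start at multiples of `advance`."""
    end = max(t for t, _ in events)
    out = {}
    start = 0
    while start <= end:
        # fixed-size window [start, start + size), advancing in fixed steps
        out[start] = sum(v for t, v in events if start <= t < start + size)
        start += advance
    return out

def sliding_windows(events, size):
    """One window ending at each event -- advances at per-record granularity."""
    return {
        t: sum(v for u, v in events if t - size < u <= t)
        for t, _ in events
    }

events = [(1, 10), (4, 20), (9, 30)]
print(hopping_windows(events, size=5, advance=5))
print(sliding_windows(events, size=5))
```

A micro-batching engine effectively forces the `advance` of the first function to be a whole batch, whereas the second function's windows are driven by the records themselves -- which is why windowing as such does not imply micro-batching.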