Warm tip: This article is reproduced from stackoverflow.com, please click
java apache-kafka-streams spring-kafka

Kafka streams creating a simple materialized view

发布于 2020-03-27 10:28:18

I have events coming in to Kafka with a bunch of non-unique String fields and an event timestamp. I want to create a materialized view of these events so that I can query them. For example:

  1. Display all of the events
  2. Display all of the events where field1 = some string
  3. Display all of the events that match multiple fields
  4. Display the events between 2 dates

All of the examples that I have seen have an aggregation, a join or some other transformative operation on the stream. I cannot find a single simple example of creating a view on a set of events. I don't want to perform any operations, I just want to be able to query the original events coming into the stream.

I am using Spring Kafka so an example with Spring Kafka would be ideal.

I am able to get messages into Kafka and to consume them. However, I have not been able to create a materialized view.

I have the following code which filters the events (not really what I wanted, I want all events, but I just wanted to see if I could get a materialized view):

@StreamListener
    public void process(@Input("input") KTable<String,MyMessage> myMessages) {
        keyValueStore = interactiveQueryService.getQueryableStore(ALL_MESSAGES,QueryableStoreTypes.keyValueStore());

        myMessages.filter((key,value) -> (value.getKey() != null));
                Materialized.<String,MyMessage,KeyValueStore<Bytes,byte[]>> as(ALL_MESSAGES)
                .withKeySerde(Serdes.String())
                .withValueSerde(new MyMessageSerde());

This is throwing an exception:

java.lang.ClassCastException: [B cannot be cast to MyMessage
at org.apache.kafka.streams.kstream.internals.KTableFilter.computeValue(KTableFilter.java:57)
    at org.apache.kafka.streams.kstream.internals.KTableFilter.access$300(KTableFilter.java:25)
    at org.apache.kafka.streams.kstream.internals.KTableFilter$KTableFilterProcessor.process(KTableFilter.java:79)
    at org.apache.kafka.streams.kstream.internals.KTableFilter$KTableFilterProcessor.process(KTableFilter.java:63)
    at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
    at org.apache.kafka.streams.kstream.internals.ForwardingCacheFlushListener.apply(ForwardingCacheFlushListener.java:42)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.putAndMaybeForward(CachingKeyValueStore.java:101)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.access$000(CachingKeyValueStore.java:38)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore$1.apply(CachingKeyValueStore.java:83)
    at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:141)
    at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:99)
    at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:125)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:123)
    at org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.flush(InnerMeteredKeyValueStore.java:284)
    at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.flush(MeteredKeyValueBytesStore.java:149)
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:239)
    ... 21 more

I don't understand why, because I set the valueSerde of the store to MyMessageSerde which knows how to serialize/deserialize MyMessage to a byte array.

Update

I changed the code to the following:

myMessages.filter((key,value) -> (value.getKey() != null));

and added the following to my application.yml

spring.cloud.stream.kafka.streams.bindings.input:
  consumer:
    materializedAs: all-messages
    key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
    value-deserializer: MyMessageDeserializer  `

Now I get the following stack trace:

Exception in thread "raven-a43f181b-ccb6-4d9b-a8fd-9fe96542c210-StreamThread-1" org.apache.kafka.streams.errors.ProcessorStateException: task [0_3] Failed to flush state store all-messages
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:242)
at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:202)
at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:420)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:394)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:382)
at org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:67)
at org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:362)
at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:352)
at org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:401)
at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1042)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:845)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: java.lang.ClassCastException: [B cannot be cast to MyMessage
at org.apache.kafka.streams.kstream.internals.KTableFilter.computeValue(KTableFilter.java:57)
at org.apache.kafka.streams.kstream.internals.KTableFilter.access$300(KTableFilter.java:25)
at org.apache.kafka.streams.kstream.internals.KTableFilter$KTableFilterProcessor.process(KTableFilter.java:79)
at org.apache.kafka.streams.kstream.internals.KTableFilter$KTableFilterProcessor.process(KTableFilter.java:63)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50)
at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
at org.apache.kafka.streams.kstream.internals.ForwardingCacheFlushListener.apply(ForwardingCacheFlushListener.java:42)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.putAndMaybeForward(CachingKeyValueStore.java:101)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.access$000(CachingKeyValueStore.java:38)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore$1.apply(CachingKeyValueStore.java:83)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:141)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:99)
at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:125)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:123)
at org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.flush(InnerMeteredKeyValueStore.java:284)
at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.flush(MeteredKeyValueBytesStore.java:149)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:239)
... 12 more`
Questioner
mbluke
Viewed
28
Matthias J. Sax 2019-07-06 05:09

The type of queries you want are not supported easily. Note, that there are not secondary indexes, but only regular key-based lookups and ranges are supported.

If you know all queries upfront, you might be able to re-group the data into derived KTables that have the query attribute as key. Note, that keys must be unique, and hence, if a query attribute contains non-unique data, you would need to use some Collection type as value:

KTable originalTable = builder.table(...)
KTable keyedByFieldATable = originalTable.groupBy(/*select field A*/).aggregate(/* the aggregation return a list or similar of entries for the key*/);

Note that you duplicate your storage requirement each time you re-key the original table.

As an alternative, you can do full table scans over the original table and evaluate your filter conditions when you use the returned iterator.

It's a space vs CPU tradeoff. Maybe Kafka Streams is not the right tool for your problem.