I have a couple of questions about CCA 175 scenarios where we consume data from Kafka into Spark Streaming, or use Flume to consume from a socket/Kafka source and write to a(ny) sink.
Question 1: Real-time streaming data can only be read once, so there is no room for mistakes. Does anybody know how to test your streaming solution (and possibly correct a mistake)?
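One practice approach I have seen (not an official exam technique, just a sketch) is to replay a static file through a local socket, the way `nc -lk 9999` is often used, so you can re-run a `socketTextStream` job as many times as you like instead of burning through once-only live data. The port 9999 and the helper name below are my own choices:

```python
import socket

def serve_file(path, host="localhost", port=9999):
    """Replay a file's lines to the first client that connects, then close.

    Mimics `nc -lk` for a single connection: a Spark Streaming job reading
    from (host, port) via socketTextStream will receive the file's lines,
    and you can simply restart this server to replay the same test data.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    conn, _ = srv.accept()          # wait for the streaming job to connect
    with open(path, "rb") as f:
        for line in f:
            conn.sendall(line)      # stream the test data line by line
    conn.close()
    srv.close()
```

Point your job at it with `ssc.socketTextStream("localhost", 9999)`, check the output, fix your code, and replay; for Kafka specifically, the analogous trick is to re-publish a test file with `kafka-console-producer` into a scratch topic you control.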
Question 2: I also read that the current CDH version is 5.15, and the Cloudera documentation for 5.8 says:
“Due to the change of offset storage from ZooKeeper to Kafka in the CDH 5.8 Flume Kafka client, data might not be consumed by the Flume agents, or might be duplicated (if kafka.auto.offset.reset=smallest) during an upgrade to CDH 5.8”
Does that mean the likelihood of a question that requires reading from Kafka and writing to some sink using Flume is low?
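For reference, the kind of Flume agent such a question would describe is a Kafka source feeding a channel and a sink. This is only a minimal sketch; the agent name, topic, broker address, and HDFS path are placeholders I made up:

```properties
# Hypothetical Flume agent: Kafka source -> memory channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.topics = test_topic
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /tmp/flume_out
a1.sinks.k1.channel = c1
```

Note the `kafka.bootstrap.servers` / `kafka.topics` property names belong to the newer Kafka client mentioned in the quote above (offsets stored in Kafka rather than ZooKeeper), which is exactly the change the CDH 5.8 upgrade warning is about.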