Kafka + Spark Streaming Python integration Issue


#1

Dear ITversity team,

I would be grateful for your help to resolve the issue with Spark Streaming. I submit the code using the following command and provide path for code and output path:

spark-submit --master yarn --conf spark.ui.port=12901 --jars “/usr/hdp/2.5.0.0-1245/spark/kafka/libs/spark-streamin
g-kafka_2.10-1.6.2.jar,/usr/hdp/2.5.0.0-1245/kafka/libs/kafka_2.10-0.8.2.1.jar,/usr/hdp/2.5.0.0-1245/kafka/libs/metrics-core-2.2.0.jar”
/home/alexandergsv/spark/firstTrial.py /user/alexandergsv/output/cnt

However, i get the mistake, that I need a file with version 1.6.3. In the kafka/lib folder I can find only version 1.6.2. How should I fix the problem? Thank you in advance!


#2

@Alexander Use the jar location below and try /usr/hdp/current/kafka-broker/libs/ojdbc6-11.2.0.3.jar


#3

Thanks for the quick response. I changed the parameters, but still get the same error message. Any more ideas, what the problem could be? Here is the code and the screenshot:


spark-submit --master yarn --conf spark.ui.port=12901 --jars “/usr/hdp/current/kafka-broker/libs/ojdbc6-11.2.0.3.jar,/usr/hdp/current/kafka-broker/libs/kafka_2.11-1.0.0.2.6.5.0-292.jar,/usr/hdp/current/kafka-broker/libs/metrics-core-2.2.0.jar” /home/alexandergsv/spark/firstTrial.py /user/alexandergsv/output/cnt


#4

It is not just path issue.

Our team will run the command and see where the issue is.

@Sunil_Itversity @Gurpreet_Singh


#5

Alexander,

We investigated the issue, and found the following inconsistencies in your code:

  1. spark-streaming-kafka_2.10-1.6.2.jar is under /usr/hdp/2.5.0.0-1245/kafka/libs/ and NOT /usr/hdp/2.5.0.0-1245/spark/kafka/libs/
  2. metrics-core-2.2.0.jar is under /usr/hdp/current/kafka-broker/libs/
  3. Broker list in your code should be wn01.itversity.com:6667,wn02.itversity.com:6667,wn03.itversity.com:6667. You can refer to http://gw02.itversity.com:8080/#/main/services/KAFKA/configs in Ambari and see that port 6667 is assigned to wn02, wn02, wn03
  4. ssc.start() and ssc.awaitTermination() is missing in your code which allows the streaming context to start accepting the messages.
  5. Then you need to push the messages to your topic, and accordingly write the code to process the data as needed.

Please revisit the videos again in case of any unclarity. Hope it helps.