Jar file missing for Flume and Spark Streaming Integration

flume
#1

Hi Team,

I am trying to send Flume data to a Spark sink and use it to compute the department count from gen_logs.
As mentioned in the tutorial, the required jars have to be passed in the spark-submit command.
I am using the command below:

spark-submit --master yarn  \
  --conf spark.ui.port=12890 \
  --jars "/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume_2.10-1.6.2.jar,/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume-sink_2.10-1.6.2.jar,/usr/hdp/2.6.5.0-292/flume/lib/flume-ng-sdk-1.5.2.2.6.5.0-292.jar"  \
  /home/anujidgupta/python_demo/streamingFlumeDeptCount.py \
  gw02.itversity.com 8123 \
  /user/anujidgupta/streamingFlumeDeptCnt1/cnt
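
The script itself is not shown in this post. For context, here is a minimal sketch of what a script taking these three arguments (host, port, output prefix) might look like. It is not the actual streamingFlumeDeptCount.py. Assumptions: the helper name department_of is hypothetical, the 30-second batch interval is arbitrary, gen_logs lines are assumed to be space-separated access-log lines with the request path in the 7th field (for example /department/fitness/products), and the stream is read with FlumeUtils.createPollingStream, the pull-based receiver that pairs with the Spark sink.

```python
import sys


def department_of(line):
    """Return the department name from one log line, or None.

    Assumes the request path is the 7th space-separated field and
    looks like /department/<name>/... (an assumption about gen_logs).
    """
    fields = line.split(" ")
    if len(fields) < 7:
        return None
    parts = fields[6].split("/")
    if len(parts) > 2 and parts[1] == "department":
        return parts[2]
    return None


def main(host, port, output_prefix):
    # Imported here so the helper above can be tried without Spark installed.
    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.flume import FlumeUtils

    conf = SparkConf().setAppName("Flume department count (sketch)")
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 30)  # 30-second batches, chosen arbitrarily

    # Each record is a (headers, body) pair; the body is the log line.
    stream = FlumeUtils.createPollingStream(ssc, [(host, port)])
    counts = (stream.map(lambda event: department_of(event[1]))
                    .filter(lambda dept: dept is not None)
                    .map(lambda dept: (dept, 1))
                    .reduceByKey(lambda a, b: a + b))
    counts.saveAsTextFiles(output_prefix)

    ssc.start()
    ssc.awaitTermination()


# Runs only when started as a script with exactly three arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv[1], int(sys.argv[2]), sys.argv[3])
```

With a script shaped like this, output is written for every batch, so empty files can also appear when no events reach the sink during a batch or when no line matches the department filter.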

Please note that the flume-ng-sdk jar comes from a different HDP version (/usr/hdp/2.6.5.0-292), while the other required jars are under /usr/hdp/2.5.0.0-1245/.

While running it, I only get empty files created.
Please suggest whether this is due to the jar file locations or to some other issue.
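
One quick way to rule out the jar location question is to check, on the node where spark-submit runs, that every path in the --jars value actually exists. A minimal sketch follows; the helper name missing_jars is hypothetical, and it only checks that the files exist, not that their versions are compatible with each other.

```python
import os


def missing_jars(jars_arg):
    """Return the paths in a comma-separated --jars value that are not existing files."""
    paths = [p.strip() for p in jars_arg.split(",") if p.strip()]
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # The --jars value from the command in this post.
    jars = ("/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume_2.10-1.6.2.jar,"
            "/usr/hdp/2.5.0.0-1245/spark/lib/spark-streaming-flume-sink_2.10-1.6.2.jar,"
            "/usr/hdp/2.6.5.0-292/flume/lib/flume-ng-sdk-1.5.2.2.6.5.0-292.jar")
    for path in missing_jars(jars):
        print("missing: " + path)
```

If nothing is printed, all three jars are present at those paths and the empty output is more likely to have another cause.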


#2

Not sure. I need to troubleshoot this further. It could be due to Spark 1.6.3 or to some missing jars.
