PySpark streaming on Kafka


#1

Hi,

Can you please let me know how to access Kafka streaming on the lab using PySpark?

I tried the following, but I am getting the error message below.

from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

ssc = StreamingContext(sc, 5)
kafkaStream = KafkaUtils.createStream(ssc, 'nn01.itversity.com:6667,nn02.itversity.com:6667,rm01.itversity.com:6667', 'spark-streaming', {'lingeshtest': 1})


Spark Streaming’s Kafka libraries not found in class path. Try one of the following.

  1. Include the Kafka library and its dependencies with in the
    spark-submit command as

    $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.6.2 …

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
    Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.6.2.
    Then, include the jar in the spark-submit command as

    $ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> …
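
For reference, here is a minimal sketch of option 1 against the Spark 1.6.2 build mentioned in the error. The _2.10 Scala suffix on the package, the 2181 ZooKeeper port, and the script name are assumptions (not verified lab settings), so adjust them as needed; 6667 in the original snippet is the Kafka broker port on HDP, while createStream expects the ZooKeeper quorum.

# Launch either the shell or a job with the Kafka package on the classpath:
#   pyspark --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2
#   spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 kafka_stream_demo.py
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaStreamDemo")  # in the pyspark shell, reuse the existing sc instead
ssc = StreamingContext(sc, 5)                 # 5-second micro-batches

# createStream takes the ZooKeeper quorum (default client port 2181), not the broker list.
zkQuorum = "nn01.itversity.com:2181,nn02.itversity.com:2181,rm01.itversity.com:2181"
kafkaStream = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming", {"lingeshtest": 1})

# Each record is a (key, message) tuple; print the message values for every batch.
kafkaStream.map(lambda kv: kv[1]).pprint()

ssc.start()
ssc.awaitTermination()

Once the job starts without the classpath error, messages published to the topic should show up in the pprint output every batch interval.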



#2

Can anyone help me? I would like to use Kafka from the lab and stream that data using Python. Could anyone let me know if there is any configuration I have to change?