Unable to run spark streaming job


#1

Unable to start spark streaming job

from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext

def main():
    conf = SparkConf(appName="Operations", Master="yarn")
    sc = StreamingContext(conf)
    ssc = StreamingContext(sc, 10)
    data = ssc.socketTextStream("gw02.itversity.com", 19001)
    mapData = data.map(lambda x: x.split(" ")).groupByKey(lambda w: w(0))
    print(mapData.take(5))

    ssc.start()
    ssc.awaitTermination()

if __name__ == "__main__":
    main()

spark-submit /home/haryanam/pyspark-jobs/streaming.py --master yarn --conf spark.ui.port=12890
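Note: spark-submit treats everything after the application file as arguments passed to the script itself, so flags placed after `streaming.py` never reach spark-submit. A corrected form of the command above, with the options moved before the script path:

```shell
# spark-submit options must come before the application file;
# anything after it is forwarded to the Python script as sys.argv
spark-submit --master yarn --conf spark.ui.port=12890 /home/haryanam/pyspark-jobs/streaming.py
```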


#2

What is the exception you are getting?


#3

Sunil, I don't see any exception. The job just doesn't start.

[haryanam@gw02 pyspark-jobs]$ spark-submit streaming.py --master yarn --conf='spark.ui.port=10101'
Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
Spark1 will be picked by default
[haryanam@gw02 pyspark-jobs]$
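The warning above comes from the cluster's Spark wrapper scripts when both Spark 1 and Spark 2 are installed. The version can be pinned explicitly before submitting (assuming Spark 2 is the one intended here):

```shell
# Pin the major version so the wrapper scripts pick Spark 2 instead of
# defaulting to Spark 1
export SPARK_MAJOR_VERSION=2
spark-submit --version   # confirm which version is now selected
```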


#4

@HYm Can you try the streaming.py code below:

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = SparkConf() \
    .setAppName("Streaming Department Count") \
    .setMaster("yarn-client")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 15)
lines = ssc.socketTextStream("gw03.itversity.com", 19001)
words = lines.flatMap(lambda line: line.split(" "))
wordTuples = words.map(lambda word: (word, 1))
wordCount = wordTuples.reduceByKey(lambda x, y: x + y)
wordCount.pprint()
ssc.start()
ssc.awaitTermination()

#spark submit command:

spark-submit --master yarn --conf spark.ui.port=12901 streaming.py

#Netcat input:

[annapurnachinta@gw03 ~]$ nc -lk gw03.itversity.com 19001
Hi my name is Farhan Misarwala
I am learning big data
Hi my name is Farhan Misarwala
I am learning big data

#Sample word count data from the logs
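For reference, the flatMap / map / reduceByKey pipeline above can be sanity-checked with plain Python (a sketch that does not need Spark), to see what counts each batch should print for the sample netcat input:

```python
from collections import Counter

# Same logic as the DStream pipeline: split lines into words (flatMap),
# then count occurrences per word (map to (word, 1) + reduceByKey).
def word_count(lines):
    words = [w for line in lines for w in line.split(" ")]
    return dict(Counter(words))

# Sample netcat input from above (each line was sent twice)
sample = [
    "Hi my name is Farhan Misarwala",
    "I am learning big data",
] * 2

counts = word_count(sample)
print(counts)  # every word appears twice, e.g. ('Hi', 2), ('data', 2)
```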