Getting an error while running a Spark Streaming job


#1

Hi Team,

I am getting the error below while running a Spark Streaming job:

18/10/11 12:39:48 ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error connecting to gw01.itversity.com:19999 - java.net.ConnectException: Connection refused

Below is the streaming job I am running:

from pyspark import SparkConf,SparkContext
from pyspark.streaming import StreamingContext

import sys

hostname = sys.argv[1]
port = int(sys.argv[2])
conf = SparkConf().setAppName("Streaming department count").setMaster("yarn-client")
sc=SparkContext(conf=conf)

ssc= StreamingContext(sc,30)

messages= ssc.socketTextStream(hostname,port)

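# Keep only requests whose path starts with /department/
departmentMessages = messages.filter(lambda msg: msg.split(" ")[6].split("/")[1] == "department")
# Emit (department name, 1) pairs so that reduceByKey can sum the counts
departmentnames = departmentMessages.map(lambda msg: (msg.split(" ")[6].split("/")[2], 1))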

from operator import add

departmentCount=departmentnames.reduceByKey(add)

outputPrefix= sys.argv[3]
departmentCount.saveAsTextFiles(outputPrefix)

ssc.start()
ssc.awaitTermination()
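
The filter and map logic above can be tried without Spark. This is only a minimal sketch: it assumes the incoming messages look like Apache-style access log lines in which field 6 (after splitting on spaces) is the request path. The sample lines and the helper names `is_department` and `to_pair` are made up for illustration, and the length checks are an addition that the job above does not have.

```python
from collections import Counter


def is_department(msg):
    """Return True if the request path (field 6) starts with /department/."""
    fields = msg.split(" ")
    if len(fields) <= 6:
        return False  # malformed line: no request path field
    path = fields[6].split("/")
    return len(path) > 2 and path[1] == "department"


def to_pair(msg):
    """Return a (department name, 1) tuple, the shape reduceByKey expects."""
    return (msg.split(" ")[6].split("/")[2], 1)


# Hypothetical sample lines, assumed to match the log format of the stream
sample_lines = [
    '1.2.3.4 - - [11/Oct/2018:12:39:48 -0400] "GET /department/fitness/products HTTP/1.1" 200 1234',
    '1.2.3.5 - - [11/Oct/2018:12:39:49 -0400] "GET /department/golf/products HTTP/1.1" 200 987',
    '1.2.3.6 - - [11/Oct/2018:12:39:50 -0400] "GET /product/123 HTTP/1.1" 200 555',
    '1.2.3.7 - - [11/Oct/2018:12:39:51 -0400] "GET /department/fitness/categories HTTP/1.1" 200 321',
]

pairs = [to_pair(m) for m in sample_lines if is_department(m)]

# Local stand-in for reduceByKey(add): sum the 1s per department name
counts = Counter()
for name, n in pairs:
    counts[name] += n

print(dict(counts))
```

Note that the map function has to return the whole `(name, 1)` tuple. If the `1` sits outside the lambda's parentheses, it is passed to `map` as a second argument instead of becoming part of the pair.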


Below is the spark-submit command I am executing:

spark-submit --master yarn --conf spark.ui.port=12890 StreamingDepartment.py gw01.itversity.com 19999 /user/abhirajs25/Streamingdeparmentpython/cnt

Please help.


#2

@jitendrapp You can’t connect to gw01.itversity.com. Use the host name assigned to you, which is shown here: https://labs.itversity.com/user/lab
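
Some background that may help: `socketTextStream` only connects to a host and port as a client, so "Connection refused" generally means nothing was listening there when the receiver tried to connect. A data source (for example a netcat listener) has to be running on that host and port first. A rough way to check from the gateway before submitting the job is sketched below; the function name `can_connect` and the 3-second timeout are just illustrative choices.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection resolves the host and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures
        return False
```

If `can_connect(hostname, port)` returns False for the values you pass to spark-submit, the streaming receiver will most likely fail in the same way.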


#3

@BaLu_Sal Which port number should be used for gw02.itversity.com?


#4

@jitendrapp You can use any available port.


#5

Yes, but I am not sure how to check which ports are available. Please let me know where I can see that information.


#6

You can use any 5-digit number less than 62222. For now, you can use a port in the range 19000-19010.
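
If you want to check a port yourself, one possible approach is to try binding to it on the gateway host: if the bind fails, something else is already using the port. This is only a sketch, to be run on the same host where your data source will listen. The helper names `is_port_free` and `first_free_port` are made up for illustration, and the default range is simply the 19000-19010 suggested above.

```python
import socket


def is_port_free(port, host="0.0.0.0"):
    """Return True if the port can be bound on this machine right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            # Typically "Address already in use" or a permission error
            return False


def first_free_port(start=19000, end=19010, host="0.0.0.0"):
    """Return the first bindable port in [start, end], or None if none is free."""
    for port in range(start, end + 1):
        if is_port_free(port, host):
            return port
    return None
```

Keep in mind that a port that is free at the moment of the check can still be taken by another user before your listener starts, so treat the result as a hint only.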