Regarding an error in PySpark


#1

Hi,
I am getting the error shown in the attached screenshot when I try to execute the following code in PySpark. Can anyone tell me what the error is?

from datetime import datetime

# Read all CSV parts and drop the header line
crimefile = sc.textFile("/public/crime/csv/*")
crimefirst = crimefile.first()
crimerdd = crimefile.filter(lambda rec: rec != crimefirst)
# Key: (yyyymm, primary type); value: crime ID
crimemap = crimerdd.map(lambda rec: ((datetime.strptime(rec.split(",")[2], '%m/%d/%Y %H:%M:%S %p').strftime('%Y%m'), rec.split(",")[5]), rec.split(",")[0]))
# Count records per (month, type)
crimecountbytype = crimemap.aggregateByKey(0,
    lambda inter, id: inter + 1,
    lambda final, inter: final + inter)
# Re-key as ((month, -count), type) so sortByKey orders by month ascending, then count descending
crimecountmap = crimecountbytype.map(lambda rec: ((rec[0][0], -rec[1]), rec[0][1]))
crimecountsort = crimecountmap.sortByKey()
# Tab-separated output line: month, count, type
crimecount = crimecountsort.map(lambda rec: rec[0][0] + "\t" + str(-rec[0][1]) + "\t" + rec[1])
crimecount.saveAsTextFile(path="Location", compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")
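To check the per-record logic without a cluster, here is a minimal pure-Python sketch of the same pipeline. It assumes the column layout used above (ID in column 0, Date in column 2, Primary Type in column 5) and dates such as `05/03/2016 11:40:00 PM`; the sample rows and function names are hypothetical. `collections.Counter` stands in for `aggregateByKey`, and `sorted` on `(month, -count)` stands in for `sortByKey`. Note that the quotes must be plain ASCII quotes: curly quotes such as ’ or “ are not valid Python string delimiters and raise a SyntaxError.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows mimicking the assumed CSV layout:
# ID, Case Number, Date, Block, IUCR, Primary Type
HEADER = "ID,Case Number,Date,Block,IUCR,Primary Type"
LINES = [
    HEADER,
    "1,HX1,05/03/2016 11:40:00 PM,001XX W MAIN ST,0486,BATTERY",
    "2,HX2,05/10/2016 01:15:00 AM,002XX W MAIN ST,0820,THEFT",
    "3,HX3,05/21/2016 09:00:00 AM,003XX W MAIN ST,0460,BATTERY",
    "4,HX4,06/02/2016 02:30:00 PM,004XX W MAIN ST,0820,THEFT",
]

DATE_FORMAT = "%m/%d/%Y %H:%M:%S %p"  # same format string as the Spark job


def to_key_value(rec):
    """Mirror of the map step: ((yyyymm, crime_type), crime_id)."""
    fields = rec.split(",")
    month = datetime.strptime(fields[2], DATE_FORMAT).strftime("%Y%m")
    return ((month, fields[5]), fields[0])


def count_by_month_and_type(lines):
    header = lines[0]
    pairs = [to_key_value(rec) for rec in lines if rec != header]  # filter + map
    counts = Counter(key for key, _ in pairs)  # stands in for aggregateByKey
    # Month ascending, then count descending (the negated-count trick).
    ordered = sorted(counts.items(), key=lambda kv: (kv[0][0], -kv[1]))
    # Tab-separated output: "\t" is a tab, whereas "/t" is a literal slash and t.
    return ["{}\t{}\t{}".format(month, n, ctype) for (month, ctype), n in ordered]


for row in count_by_month_and_type(LINES):
    print(row)
```

Because the date is only used to build `%Y%m`, the hour directive does not affect the result here; strictly, `%p` only changes the parsed hour when paired with `%I` rather than `%H`.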


#2

@karthick_raja I was able to run the snippet without any issue. Try the code below with a different output directory name; saveAsTextFile fails if the output directory already exists.

from datetime import datetime
crimefile = sc.textFile("/public/crime/csv/*")
crimefirst = crimefile.first()
crimerdd = crimefile.filter(lambda rec: rec != crimefirst)
crimemap = crimerdd.map(lambda rec: ((datetime.strptime(rec.split(",")[2], '%m/%d/%Y %H:%M:%S %p').strftime('%Y%m'), rec.split(",")[5]), rec.split(",")[0]))
crimecountbytype = crimemap.aggregateByKey(0, lambda inter, id: inter + 1, lambda final, inter: final + inter)
crimecountmap = crimecountbytype.map(lambda rec: ((rec[0][0], -rec[1]), rec[0][1]))
crimecountsort = crimecountmap.sortByKey()
crimecount = crimecountsort.map(lambda rec: rec[0][0] + "\t" + str(-rec[0][1]) + "\t" + rec[1])
crimecount.saveAsTextFile(path="/user/sseashu1/outdirectoryname", compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")

# Alternatively, save without compressing the output files:

crimecount.saveAsTextFile(path="/user/sseashu1/outdirectoryname")
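saveAsTextFile refuses to write into an output directory that already exists, so rerunning the job with the same path fails. One way to avoid renaming the directory by hand is to append a run timestamp to the path. The helper below is a hypothetical sketch (the name `timestamped_output_path` and the suffix format are assumptions, not part of Spark); the base path reuses the one from the reply.

```python
from datetime import datetime


def timestamped_output_path(base_dir, now=None):
    """Hypothetical helper: return base_dir with a run timestamp appended."""
    if now is None:
        now = datetime.now()
    return "{}_{}".format(base_dir.rstrip("/"), now.strftime("%Y%m%d_%H%M%S"))


# In the pyspark shell, with crimecount defined as above:
# crimecount.saveAsTextFile(timestamped_output_path("/user/sseashu1/outdirectoryname"))
print(timestamped_output_path("/user/sseashu1/outdirectoryname", datetime(2016, 5, 3, 23, 40, 0)))
```

Each run then writes to a fresh directory, at the cost of old output directories accumulating until they are deleted.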

#3

Thank you @BaLu_SaI. But now I am getting another error, shown in the attached screenshot, when I try to execute the following command.

crimecountsort=crimecountmap.sortByKey()

I have checked the date format in the table against the date format I specified. They are the same. Can you tell me what the error is?
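Spark transformations such as map are lazy, so a strptime failure inside the map is not raised where the map is defined; it appears at the first step that actually runs a job. In PySpark, sortByKey with more than one partition typically counts and samples the RDD to choose partition boundaries, so it can be the first command that surfaces a date-format problem. To find the records that fail to parse, one option is a helper that returns None instead of raising. This is a plain-Python sketch; the helper names and sample strings are hypothetical.

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S %p"  # format string from the thread


def try_parse_month(date_str, fmt=FMT):
    """Return yyyymm for date_str, or None if it does not match fmt."""
    try:
        return datetime.strptime(date_str, fmt).strftime("%Y%m")
    except ValueError:
        return None


def bad_dates(date_strings, fmt=FMT):
    """Collect the values that the given format cannot parse."""
    return [d for d in date_strings if try_parse_month(d, fmt) is None]


samples = ["05/03/2016 11:40:00 PM", "2016-05-03 23:40:00"]
print(bad_dates(samples))

# The same idea on the RDD (column 2 assumed to hold the date):
# crimerdd.filter(lambda rec: try_parse_month(rec.split(",")[2]) is None).take(5)
```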


#4

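@karthick_raja Use %Y, not %y. %y matches a two-digit year, but the dates in the data have four-digit years, so strptime cannot parse them.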
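A quick check in plain Python, using a hypothetical sample value in the date format the thread assumes: with `%y` the parser consumes only the first two digits of the year, the remaining characters no longer match the format, and `datetime.strptime` raises ValueError, which is the kind of failure that then shows up in the Spark job.

```python
from datetime import datetime

SAMPLE = "05/03/2016 11:40:00 PM"  # hypothetical value in the assumed format


def parses(fmt, value=SAMPLE):
    """True if value matches fmt, False if strptime raises ValueError."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


print(parses("%m/%d/%Y %H:%M:%S %p"))  # %Y consumes the four-digit year "2016"
print(parses("%m/%d/%y %H:%M:%S %p"))  # %y consumes only "20", leaving "16" unmatched
```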