PySpark - Getting a ClassNotFoundException when calling .saveAsNewAPIHadoopFile

From PySpark, I am trying to save a sequence file to HDFS using the saveAsNewAPIHadoopFile method.

Following is the actual command:

x: tuple(x.split(",",1))).saveAsNewAPIHadoopFile("/user/cloudera/pyspark/departmentsSeq","org.apache.hadoop.mapreduce.lib.output.sequenceFileOutputFormat",keyClass="",valueClass="")
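The map step in the command turns each text line into a (key, value) tuple by splitting at the first comma only, which is the pair shape saveAsNewAPIHadoopFile expects. Below is a small pure-Python sketch of that same expression; the function name to_pair and the sample lines are hypothetical, chosen only to show the behaviour.

```python
def to_pair(line):
    # Same expression as the map step above: split on the FIRST comma only,
    # so any later commas stay inside the value.
    return tuple(line.split(",", 1))


print(to_pair("2,Fitness"))       # ('2', 'Fitness')
print(to_pair("7,Golf,Outdoor"))  # ('7', 'Golf,Outdoor')
print(to_pair("no-comma"))        # ('no-comma',)  a 1-tuple, not a key/value pair
```

A line without a comma produces a 1-tuple, so any such lines would need to be filtered out before saving the RDD as key/value pairs.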

I am getting the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.saveAsNewAPIHadoopFile.
: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.lib.output.sequenceFileOutputFormat

PS: I have tried the other way of saving sequence files to HDFS via PySpark, using saveAsSequenceFile, and it works. I am trying to get this way working as well. Any help is appreciated.
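For comparison, here is a minimal sketch of the saveAsSequenceFile route mentioned in the PS. It assumes an existing RDD of (str, str) pairs and an existing SparkContext; the helper names are hypothetical. The relevant difference is that saveAsSequenceFile takes only a path (plus an optional compressionCodecClass), so no fully qualified class name has to be typed by hand. saveAsNewAPIHadoopFile, by contrast, needs the output format class spelled exactly.

```python
# Sketch of the route the poster reports already works.
# Assumption: `pairs` is a pyspark RDD of (str, str) tuples and `sc` is an
# existing SparkContext. Function names are hypothetical.


def save_with_save_as_sequence_file(pairs, output_path):
    """Write a pair RDD as a SequenceFile without naming any Hadoop classes."""
    # PySpark chooses the output format itself and converts str keys and
    # values to Hadoop Writables, so there is no class name to misspell.
    pairs.saveAsSequenceFile(output_path)


def peek_sequence_file(sc, path, n=5):
    """Read the first n records back to check what was written."""
    return sc.sequenceFile(path).take(n)
```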

I am using the Cloudera VM.

@hsksmails, try the solution below.

Thank you, but it doesn't help.

The error I am getting is a ClassNotFoundException for sequenceFileOutputFormat.
The resource you pointed out is for a similar use case in Scala, but even going by that, the format I have used looks right. Something else is the problem here.

Sorry, guys, I figured out what was wrong: the output format class name was wrong.
I had a lowercase s instead of a capital S in SequenceFileOutputFormat.

This worked:

x:tuple(x.split(",",1))).saveAsNewAPIHadoopFile("/user/cloudera/pyspark/departmentsSeq","org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat",keyClass="",valueClass="")
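Here is a minimal sketch of the corrected call wrapped in a reusable function, along with a matching read-back through the new (mapreduce) API. It assumes an existing SparkContext sc. It also assumes that keys and values should be stored as org.apache.hadoop.io.Text; the post itself leaves keyClass and valueClass empty, so that choice is an assumption. The function names are hypothetical.

```python
# Sketch: the corrected saveAsNewAPIHadoopFile call, plus a read-back.
# Assumptions: `sc` is an existing pyspark SparkContext; keys and values are
# stored as Hadoop Text. Function names are hypothetical.

# Java class lookup is case-sensitive: the class is SequenceFileOutputFormat
# (capital S). "sequenceFileOutputFormat" raises ClassNotFoundException.
SEQUENCE_FILE_OUTPUT_FORMAT = "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat"
SEQUENCE_FILE_INPUT_FORMAT = "org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat"
TEXT_CLASS = "org.apache.hadoop.io.Text"


def save_as_new_api_sequence_file(sc, input_path, output_path):
    """Turn "key,value" text lines into (key, value) pairs and write a SequenceFile."""
    pairs = sc.textFile(input_path).map(lambda line: tuple(line.split(",", 1)))
    pairs.saveAsNewAPIHadoopFile(
        output_path,
        SEQUENCE_FILE_OUTPUT_FORMAT,
        keyClass=TEXT_CLASS,
        valueClass=TEXT_CLASS,
    )


def read_new_api_sequence_file(sc, path):
    """Read the SequenceFile back through the matching new-API input format."""
    return sc.newAPIHadoopFile(path, SEQUENCE_FILE_INPUT_FORMAT, TEXT_CLASS, TEXT_CLASS)
```

With the output path from this thread, the call would be save_as_new_api_sequence_file(sc, input_dir, "/user/cloudera/pyspark/departmentsSeq"), where input_dir stands for whatever text directory the original map step read from (it is not shown in the post). Leaving keyClass and valueClass as None should also work, because PySpark then infers the Writable types from the first record.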
