Failed CCA175 on 15 Sep 2018. This is why

pyspark
apache-spark

#1

Failed the exam because of this error in PySpark when I was trying to save my output to the target HDFS location: "UnicodeEncodeError: 'ascii' codec can't encode character u'\xxxx' in position xxxx: ordinal not in range(128)".

I performed all operations with DataFrames/Spark SQL and then converted back to an RDD, but when I run yourrdd.saveAsTextFile(), this error shows up. Any idea how to solve this? Thanks…
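The u'\xxxx' in the message points to Python 2. A common trigger there is building each output line with str(), e.g. lambda r: str(r[0]) + '|' + str(r[1]), because Python 2's str() encodes unicode values with the default ASCII codec and fails on the first non-ASCII character. The post doesn't show the map code, so this cause is an assumption. The plain-Python sketch below (no Spark needed) reproduces the same error and shows the usual fix: keep every field as Unicode text when joining. to_pipe_line is a hypothetical helper, not a PySpark API.

```python
# Sketch of the failure and the fix. Assumption: row fields are unicode
# text or numbers, as they are after df.rdd in PySpark on Python 2.
# to_pipe_line is a hypothetical helper meant for rdd.map(...).

try:
    text_type = unicode  # Python 2: the Unicode text type
except NameError:
    text_type = str      # Python 3: str is already Unicode text


def to_pipe_line(fields, sep=u"|"):
    """Join fields into one delimited line without leaving Unicode.

    Using text_type instead of Python 2's str() means non-ASCII
    characters are never pushed through the default ASCII codec.
    """
    return sep.join(text_type(f) for f in fields)


row = (1, u"Jos\u00e9", u"M\u00fcller")
line = to_pipe_line(row)

# Forcing the line through ASCII reproduces the same kind of error...
ascii_error = None
try:
    line.encode("ascii")
except UnicodeEncodeError as err:
    ascii_error = err
    print(err)

# ...while an explicit UTF-8 encode handles the same characters.
utf8_bytes = line.encode("utf-8")
print(utf8_bytes)
```

In the job itself this would be used roughly as yourrdd.map(to_pipe_line).saveAsTextFile(...). On Python 2, PySpark's saveAsTextFile encodes unicode lines as UTF-8 itself, so returning unicode from the map is usually enough.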


#2

Hi, did you face the same issue using df.write.text("folder_path")?


#3

Hi,

I have not tried that.

They asked for a specific column delimiter in the text file output, so I guess I need to convert the DF to an RDD and map it again?


#4

Let's say your DF has 3 columns (cust_id, cust_fname, cust_lname) and we are asked to use | as the delimiter. Then we can do the following:

df.alias("c").selectExpr("concat_ws('|', c.*) as result").write.text("folder_path")  # alias(), since "as" is a Python keyword
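One detail worth checking before relying on this: Spark SQL's concat_ws skips NULL values instead of writing an empty field. A row whose cust_lname is NULL therefore comes out as 2|John, not 2|John|, and the field count shifts. The plain-Python sketch below only models that behaviour. Both function names are hypothetical, not Spark APIs, and the values are assumed to be strings already (Spark casts the numeric column implicitly).

```python
# Plain-Python model of concat_ws semantics (NOT Spark code).
# None stands in for SQL NULL; values are assumed already cast to strings.


def concat_ws_like(sep, *values):
    """Join non-None values with sep, mimicking concat_ws skipping NULLs."""
    return sep.join(v for v in values if v is not None)


def concat_keep_empty(sep, *values):
    """Join values with sep, writing '' for None, like coalesce(col, '')."""
    return sep.join("" if v is None else v for v in values)


full_row = ("1", "Mary", "Smith")
null_row = ("2", "John", None)  # cust_lname is NULL

print(concat_ws_like("|", *full_row))     # 1|Mary|Smith
print(concat_ws_like("|", *null_row))     # 2|John   (field dropped)
print(concat_keep_empty("|", *null_row))  # 2|John|  (empty field kept)
```

If empty fields must be preserved, one option is to list the columns explicitly and wrap each one, e.g. coalesce(c.cust_lname, ''), because the c.* shorthand cannot wrap individual columns.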