How can we save the output of a DataFrame operation with a delimiter of our choice? One option I found is to import the Databricks CSV package, com.databricks.spark.csv, but with it we can only save the output as a CSV file. Is there an option to save in any other format with a different delimiter, or are delimiters only applicable to text files?
Try using this:
input is a dataframe:
input.map(r => r(0)+""+r(1)+""+r(2)).toDF.write.orc("/user/cloudera/result/")
Please mark the question as answered if this solves your problem.
- Convert the DataFrame to a Dataset (similar to an RDD here) so that you can apply a map function and change the delimiter:
val convToDS=inputDF.map(rec => rec(0)+"delim "+rec(1)+"delim "+rec(2)).toDS()
- Convert the Dataset back to a DataFrame
- Save/write the file in the desired format
- The output file will be saved with the desired delimiter
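The map step above can be sketched in plain Python, without a Spark session, to show what happens to each record. This is only a rough illustration: format_row, null_text and the sample rows are hypothetical names and data, not part of any Spark API.

```python
from typing import Iterable


def format_row(row: Iterable[object], delim: str = "|", null_text: str = "") -> str:
    """Join every field of one record into a single delimited string."""
    # A None value (SQL NULL) becomes null_text instead of the literal "None"
    return delim.join(null_text if value is None else str(value) for value in row)


# Hypothetical sample records standing in for DataFrame rows
rows = [(1, "alice", 3.5), (2, None, 4.0)]
lines = [format_row(r, delim="|") for r in rows]
print(lines)  # ['1|alice|3.5', '2||4.0']
```

Assuming a PySpark DataFrame named df, the same function could probably be applied as df.rdd.map(lambda r: format_row(r, "|")) followed by saveAsTextFile, since a PySpark Row behaves like a tuple. Unlike indexing r(0), r(1), r(2), this does not hard-code the number of columns.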
As per my understanding, this code snippet will fail.
One more part of the question: are delimiters applicable only to text files, or to all the other file formats too? Please don't mind if my question is dumb.
@santoshchada I think you can use this while using dataframe.write function
This is only available if I import the csv package. When I tried to do it directly, it didn't work.
In Python we can do it in two ways.
1. Using a for loop: data = read.map(lambda x: x.split(",")).map(lambda x: "|".join(x[i] for i in range(3)))
- Manual process
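One caveat with a plain split(",") is that it also splits on commas inside quoted fields. A minimal sketch of a safer manual approach, using only Python's standard csv module, is below. The redelimit function and the sample text are hypothetical and only meant to illustrate the idea outside Spark.

```python
import csv
import io


def redelimit(text: str, old: str = ",", new: str = "|") -> str:
    """Rewrite delimited text with a new delimiter, honouring quoted fields."""
    out = io.StringIO()
    reader = csv.reader(io.StringIO(text), delimiter=old)
    writer = csv.writer(out, delimiter=new, lineterminator="\n")
    for record in reader:
        # The writer quotes a field only if it contains the new delimiter
        writer.writerow(record)
    return out.getvalue()


# Hypothetical sample: the second row has a comma inside a quoted field
sample = 'a,b,c\n1,"x,y",3\n'
result = redelimit(sample)
print(result)
```

Here the quoted field x,y stays as one field in the output, whereas split(",") would have broken it into two.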
No, Venkat, this works. I have verified it.