Changing the delimiter after a data frame operation

How can we save the output of a data frame operation with a delimiter of our choice? One thing I found is to import the Databricks CSV package, com.databricks.spark.csv. But using this we can only save it as a CSV file. Is there an option to save in any other format with a different delimiter, or are delimiters only applicable to text files?

Try using this, where input is a DataFrame:

// "|" stands for whatever delimiter you want; toDF needs spark.implicits._ (or sqlContext.implicits._) imported
input.map(r => r(0) + "|" + r(1) + "|" + r(2)).toDF.write.orc("/user/cloudera/result/")
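
If you want to sanity-check the result, you can read the ORC output back. This is a quick sketch, assuming a Spark 2.x session; the path is the one used above, and the concatenated strings land in a single string column:

// Read the ORC output back and print a few rows to confirm the delimiter is in the data
val check = spark.read.orc("/user/cloudera/result/")
check.show(false)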

Mark the question as answered if this solves your question.

Try this; a full sketch putting these steps together is shown after the list:

  1. Convert the data frame into a DS (similar to an RDD here) to apply a map function and change the delimiter ("delim" below stands for whatever delimiter you want):

val convToDS = inputDF.map(rec => rec(0) + "delim" + rec(1) + "delim" + rec(2)).toDS()

  2. Convert the DS back to a DF:

val convBackToDF = convToDS.toDF()

  3. Save/write the file in the desired format:

convBackToDF.write.text("output path")

  4. The output file will be saved with the desired delimiter.
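
Putting the steps together, here is a rough end-to-end sketch. It assumes Spark 2.x in spark-shell (so spark and spark.implicits._ are available); the sample inputDF, the "|" delimiter, and the output path are placeholders rather than anything from the original post, and the extra .toDS() call is dropped because DataFrame.map already returns a Dataset in Spark 2.x:

import spark.implicits._

// Hypothetical sample data standing in for inputDF
val inputDF = Seq(
  ("1", "abc", "xyz"),
  ("2", "def", "uvw")
).toDF("c1", "c2", "c3")

// Step 1: map each row to a single delimited string (gives a Dataset[String])
val convToDS = inputDF.map(rec => rec(0) + "|" + rec(1) + "|" + rec(2))

// Step 2: back to a DataFrame with one string column
val convBackToDF = convToDS.toDF()

// Step 3: write as plain text; each output line keeps the chosen delimiter, e.g. 1|abc|xyz
convBackToDF.write.text("/user/cloudera/result_pipe")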

@rajevenkat

As per my understanding, this code snippet will fail.

One more part to the question is: are delimiters applicable only to text files or to all the other file formats too? Please don't mind if my question is dumb.

Thanks,
Santosh.

@santoshchada I think you can use this while using the dataframe.write function:

.option("delimiter", "||")

This is only available if I import the csv package. When I tried to do it directly, it didn't work.
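
For completeness, here is roughly how that looks when going through the csv data source. This is just a sketch, assuming Spark 1.x with the com.databricks:spark-csv package added via --packages; df and the output path are placeholders, and since the csv source may only accept a single-character delimiter, "|" is used here instead of "||":

// Writing through the spark-csv data source, which understands the delimiter option;
// df and the output path are placeholders.
df.write
  .format("com.databricks.spark.csv")
  .option("delimiter", "|")
  .save("/user/cloudera/result_csv")

Formats like ORC or Parquet are binary and have no delimiter at all, which is probably why setting the option on a plain write did not do anything.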

In Python we can do it in two ways:

  1. Using a for loop:

data = read.map(lambda x: x.split(",")).map(lambda x: "|".join(x[i] for i in range(3)))

  2. Manual process

No Venkat, this works. I have verified it.