Read and save Avro, Parquet, and sequence files in Spark 1.2.1

spark-shell

I have some queries about Spark 1.2.1 for the CCA 175 certification.

  1. To save as a sequence file in Scala we need to give a key and a value, and in the video this is shown for a two-column table. For a table with more than two columns, how can we read it and save it as a sequence file? Is the following way of saving, using NullWritable as the key, OK? Or is there a way, like Python's x.split(",", 1), to use the first column's value as the key and the rest as the value?

import org.apache.hadoop.io.{NullWritable, Text}

val categories = sc.textFile("/user/cloudera/sqoop_import/categories")

categories.map(rec => (NullWritable.get(), rec)).saveAsSequenceFile("/user/cloudera/sparkscala/categoriesseq")

val categoriesRead = sc.sequenceFile("/user/cloudera/sparkscala/categoriesseq", classOf[NullWritable], classOf[Text])

categoriesRead.map(rec => rec._2.toString).collect().foreach(println)
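On the Python-style splitting: Scala's String.split takes the same limit argument, so split(",", 2) keeps the first column as the key and the rest of the line (commas included) as the value. A minimal sketch, reusing the input path above; the output path /user/cloudera/sparkscala/categorieskv is hypothetical:

```scala
import org.apache.hadoop.io.Text

val categories = sc.textFile("/user/cloudera/sqoop_import/categories")

// split(",", 2) behaves like Python's split(",", 1):
// at most two pieces, so the value keeps any later commas
val keyedByFirstColumn = categories.map { rec =>
  val Array(key, value) = rec.split(",", 2)
  (key, value)
}

keyedByFirstColumn.saveAsSequenceFile("/user/cloudera/sparkscala/categorieskv")

// Read it back as (Text, Text) and rebuild the original lines
val readBack = sc.sequenceFile("/user/cloudera/sparkscala/categorieskv",
  classOf[Text], classOf[Text])
readBack.map { case (k, v) => s"$k,$v" }.collect().foreach(println)
```

This way a String key replaces NullWritable, so the file stays sortable/joinable by the first column.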

  2. How do we read and save an Avro data file?
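Plain Spark 1.2.1 has no built-in Avro method, but you can go through the Hadoop formats in the avro-mapred jar (shipped with CDH), or add the com.databricks:spark-avro package for a SchemaRDD-based API. Below is a sketch of the avro-mapred route, assuming GenericRecord data and hypothetical paths:

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroJob, AvroOutputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.JobConf

// Read: each element is (AvroWrapper[GenericRecord], NullWritable)
val avroIn = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable,
  AvroInputFormat[GenericRecord]]("/user/cloudera/avro/categories")  // hypothetical path

avroIn.map(_._1.datum.toString).take(5).foreach(println)

// Write: the output schema must be set on the JobConf first
val conf = new JobConf(sc.hadoopConfiguration)
AvroJob.setOutputSchema(conf, avroIn.first()._1.datum.getSchema)

avroIn.saveAsHadoopFile("/user/cloudera/avro/categories_copy",  // hypothetical path
  classOf[AvroWrapper[GenericRecord]], classOf[NullWritable],
  classOf[AvroOutputFormat[GenericRecord]], conf)
```

This is only a sketch: the exact jar versions on your classpath matter, and with spark-avro instead you would read/write via sqlContext and skip the JobConf plumbing.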

  3. How do we read and write a Parquet file?
    There is a method sqlContext.parquetFile() to read, and saveAsParquetFile to save. But how can we save that data to a text file?
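For the text-file part: in 1.2.x a SchemaRDD is also an RDD[Row], and Row behaves like a Seq of column values, so you can join the columns yourself and call saveAsTextFile. A sketch with hypothetical paths:

```scala
// Read Parquet into a SchemaRDD
val parquetData = sqlContext.parquetFile("/user/cloudera/parquet/categories")  // hypothetical path

// Row acts like a Seq in 1.2.x, so mkString joins the column values
parquetData.map(row => row.mkString(","))
  .saveAsTextFile("/user/cloudera/text/categories")  // hypothetical path

// The reverse direction: save any SchemaRDD back as Parquet
parquetData.saveAsParquetFile("/user/cloudera/parquet/categories_copy")  // hypothetical path
```

If a column can itself contain commas, pick a different delimiter (or quote the fields) before writing.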
