How to read an ORC File using Spark Core APIs

Hello Everyone,

Has anyone tried to read ORC files using the Spark Core APIs? I can see that this can be done with Spark SQL, but I would like to try it with the Spark Core APIs. If anyone has tried this, please let me know.

@email2dgk Here is one way, using a DataFrame:

sc.parallelize(records).toDF().write.format("orc").save("people")
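And reading it back should look something like the following (assuming a HiveContext-backed sqlContext in Spark 1.x, since ORC support lives in the Hive module there):

// Read the ORC data written above back into a DataFrame.
val people = sqlContext.read.format("orc").load("people")
people.show()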

@Rahul Thanks for sharing. However, I am not looking to do this with Spark SQL (DataFrames).

I could not find any other approach.

I also tried, but could not find a direct solution for reading ORC into an RDD.

Here is my finding below. I didn't test it, so if you get time, please check whether it works.

SparkContext provides an option to read a file using a Hadoop InputFormat, like below:

sparkContext.hadoopFile(String path, Class<? extends org.apache.hadoop.mapred.InputFormat<K,V>> inputFormatClass, Class<K> keyClass, Class<V> valueClass, int minPartitions)

You can specify org.apache.orc.mapreduce.OrcInputFormat as the input format class.
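Putting that together, an untested Scala sketch could look like the one below. Since org.apache.orc.mapreduce.OrcInputFormat uses the newer MapReduce API, I think the matching call is sparkContext.newAPIHadoopFile (hadoopFile would pair with org.apache.orc.mapred.OrcInputFormat instead). Having the orc-mapreduce jar on the classpath, and the "people" path from the earlier write, are assumptions on my side:

import org.apache.hadoop.io.NullWritable
import org.apache.orc.mapred.OrcStruct
import org.apache.orc.mapreduce.OrcInputFormat

// Read the ORC files written earlier into an RDD of (NullWritable, OrcStruct).
val orcRdd = sc.newAPIHadoopFile(
  "people",
  classOf[OrcInputFormat[OrcStruct]],
  classOf[NullWritable],
  classOf[OrcStruct])

// Copy the field values out right away, since Hadoop may reuse
// the Writable objects between records.
orcRdd.map { case (_, row) => row.getFieldValue(0).toString }
  .take(10)
  .foreach(println)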

https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/SparkContext.html#hadoopFile(java.lang.String,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20int)

https://orc.apache.org/api/orc-mapreduce/index.html?org/apache/orc/mapreduce/OrcInputFormat.html

Thanks,
Suresh Selvaraj