Hi, can anyone please help me?


#1

I am trying to load data from an Avro directory that contains imported Avro files of the orders table. I used the following command to launch spark-shell:

spark-shell --packages com.databricks:spark-avro_2.11:4.0.0 --master yarn --conf spark.ui.port=12564

I imported the Avro package from Databricks:

import com.databricks.spark.avro._;

After that, when I tried to load the data into a Scala variable using the following code,

var dataFile = sqlContext.read.avro("/user/karteekkhadoop/solutions/problem4/avro")

I got an "erroneous or inaccessible type" error.

I am unable to figure out what is wrong with it. Please help.

I also got the following message.

Thank you in advance


#2

Hi @Karteek_Kadari,

We need to use spark-avro version 2.0.1 with Spark version 1.6.2 to read an Avro file, and we also need to change the Scala version to 2.10.

So please use the command below to launch spark-shell:

spark-shell --packages com.databricks:spark-avro_2.10:2.0.1 --master yarn --conf spark.ui.port=12564
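With the matching Scala 2.10 package loaded, the original read from post #1 should then work inside spark-shell. A minimal sketch (using the path from the question; `sqlContext` is the `SQLContext` that spark-shell provides on Spark 1.6.x):

```scala
// spark-avro adds the .avro convenience method to DataFrameReader via this import
import com.databricks.spark.avro._

// read the Avro directory into a DataFrame
val dataFile = sqlContext.read.avro("/user/karteekkhadoop/solutions/problem4/avro")

// inspect the inferred schema to confirm the load worked
dataFile.printSchema()
```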

PFA the list of compatible versions.

Thanks & Regards,
Sunil Abhishek


#3

Thank you for the reply. But now I get the following exception when I try to convert the Avro data to Parquet. Please help.


#4

Hi @Karteek_Kadari,
Launch spark-shell using the command below:

spark-shell --packages com.databricks:spark-avro_2.10:2.0.1 --master yarn --conf spark.ui.port=12564

Then try the following way to read the Avro file and convert it to a Parquet file:

val df = sqlContext.read.format("com.databricks.spark.avro").load("/user/username/filename")

df.repartition(1).write.parquet("/user/username/filename")
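To confirm the conversion succeeded, the Parquet output can be read back and compared against the source DataFrame. A sketch, assuming the placeholder output path above (replace `/user/username/filename` with your actual paths):

```scala
// read the Parquet files that were just written
val parquetDf = sqlContext.read.parquet("/user/username/filename")

// the row count should match the original Avro DataFrame
println(parquetDf.count() == df.count())
```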

Check whether the file exists in HDFS using the command below:

hdfs dfs -ls /user/username/path

Thanks & Regards,
Sunil Abhishek