Avro File reading Issue

I’m starting Spark with the option below to read an Avro file using Scala, but I’m running into an issue.

spark-shell --packages com.databricks:spark-avro_2.10:2.0.1

Then I ran the command below:
import com.databricks.spark.avro._

After that, I tried reading the Avro file in a couple of ways:

val df1 = spark.read.format("avro").load("/user/itv769399/customers-avro/part-m-00003.avro")

I also tried the code below:
val df = spark.read.format("com.databricks.spark.avro").load("/user/itv769399/kiran/customers-avro/part-m-00003.avro")

However, I end up seeing the message below. Can you please help me get this resolved? I’m wondering what else I’m missing.

java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.avro.AvroFileFormat. Please find packages at Third-Party Projects | Apache Spark
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:675)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:213)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:197)
… 51 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.avro.AvroFileFormat.DefaultSource
at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:652)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:652)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:652)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:652)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:652)
… 53 more

Hi @hadoop_training

For Spark 2.4.7, use:

org.apache.spark:spark-avro_2.11:2.4.0 

spark-shell --master yarn --conf spark.ui.port=0 --packages org.apache.spark:spark-avro_2.11:2.4.0
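
The coordinate above appears to follow the pattern org.apache.spark:spark-avro_<Scala binary version>:<Spark version>. The earlier com.databricks:spark-avro_2.10:2.0.1 package is built for Scala 2.10 and does not contain the org.apache.spark.sql.avro.AvroFileFormat class named in the error. Below is a minimal sketch of deriving the coordinate from the versions a shell reports. AvroPackage and coordinate are illustrative names, not part of Spark. The artifact version is usually chosen to match the Spark version; 2.4.0 is simply what was used above.

```scala
// Hypothetical helper (not a Spark API): builds the --packages coordinate
// for the built-in Avro module that ships separately since Spark 2.4.
object AvroPackage {
  def coordinate(scalaVersion: String, sparkVersion: String): String = {
    // Keep only major.minor of the Scala version, e.g. "2.11.12" becomes "2.11".
    val scalaBinary = scalaVersion.split('.').take(2).mkString(".")
    s"org.apache.spark:spark-avro_${scalaBinary}:${sparkVersion}"
  }
}

// Example with the versions from this thread.
println(AvroPackage.coordinate("2.11.12", "2.4.0"))
```

Inside spark-shell, spark.version and scala.util.Properties.versionNumberString report the two versions to pass in.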

This is not a lab issue.

Thank you, it worked now!!

Also, from your example I see that you are writing the file in Avro format. I tried something similar and applied deflate compression; the code is below. When I look at the file created in the HDFS location, I can see it is an Avro file, but the deflate compression does not show up. Do you have any advice?

data.write.option("compression", "deflate").mode("overwrite").format("avro").save("/user/itv769399/kiran/subm/rslts/pdts_avro")
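
One possible way to check this, offered as a sketch rather than a confirmed answer: as far as I know, Avro part files keep the .avro extension whatever codec is used, and the codec is recorded in the file header under the avro.codec metadata key instead. The code below reads that key using only the JDK, following the Avro object container file layout (magic bytes, then a metadata map of length-prefixed entries). AvroCodecCheck and codecOf are illustrative names, not part of Spark or Avro.

```scala
import java.io.{DataInputStream, EOFException, InputStream}
import java.nio.charset.StandardCharsets.UTF_8

object AvroCodecCheck {
  // Reads one zigzag-encoded variable-length long, the integer encoding used in Avro files.
  private def readLong(in: InputStream): Long = {
    var raw = 0L
    var shift = 0
    var b = in.read()
    while (b >= 0 && (b & 0x80) != 0) {
      raw |= (b & 0x7FL) << shift
      shift += 7
      b = in.read()
    }
    if (b < 0) throw new EOFException("truncated Avro header")
    raw |= (b & 0x7FL) << shift
    (raw >>> 1) ^ -(raw & 1L)
  }

  // Reads a length-prefixed byte sequence (Avro strings and bytes share this layout).
  private def readBytes(in: DataInputStream): Array[Byte] = {
    val buf = new Array[Byte](readLong(in).toInt)
    in.readFully(buf)
    buf
  }

  // Returns the codec named in the header; a missing entry means "null" (uncompressed).
  def codecOf(stream: InputStream): String = {
    val in = new DataInputStream(stream)
    val magic = new Array[Byte](4)
    in.readFully(magic)
    // Magic bytes are 'O', 'b', 'j', 1.
    require(magic.sameElements(Array[Byte](0x4F, 0x62, 0x6A, 0x01)), "not an Avro container file")
    var codec = "null"
    var count = readLong(in)
    while (count != 0) {
      if (count < 0) {
        count = -count
        readLong(in) // a negative count is followed by the block size in bytes; skip it
      }
      var i = 0L
      while (i < count) {
        val key = new String(readBytes(in), UTF_8)
        val value = readBytes(in)
        if (key == "avro.codec") codec = new String(value, UTF_8)
        i += 1
      }
      count = readLong(in)
    }
    codec
  }
}
```

In spark-shell, the input stream could come from org.apache.hadoop.fs.FileSystem.get(spark.sparkContext.hadoopConfiguration).open(...) on one of the part files under the output directory (the part file name is not shown here, so substitute the real one). A result of deflate would suggest the option was applied.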