Reading avro file


#1

I am unable to import avro.schema and read an Avro file in pyspark.

Please help

Thanks
Dileep


#2

Try the code below to read an Avro file into a DataFrame:

import avro.schema

df = sqlContext.read.format("com.databricks.spark.avro").load("file.avro")
df.show()

Also, start pyspark with the command below:

pyspark --packages com.databricks:spark-avro_2.10:2.0.1

#3

Hi Varun,

I ran pyspark --packages com.databricks:spark-avro_2.10:2.0.1 and the pyspark shell started. But when I then run import avro.schema, I get the error below.

Using Python version 2.7.5 (default, Aug 4 2017 00:39:18)
SparkContext available as sc, HiveContext available as sqlContext.

import avro.schema
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named avro.schema

Please help on this.

Thanks


#4

@dileepdominic
Apologies for the late reply.

You don’t need the import at all. Once pyspark has been started with the Avro package, go ahead and run the read code directly.
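
To spell that out: as far as I understand it, --packages pulls in a JVM library for Spark, not a Python module, so there is nothing called avro.schema for Python to import, and the DataFrame reader does not need it. A rough sketch of the read without the import follows. The names read_avro and AVRO_FORMAT are only illustrative helpers I made up for this post; the format string and file name are the ones used earlier in this thread.

```python
# Sketch only: read_avro and AVRO_FORMAT are illustrative names,
# not part of Spark or of the spark-avro package.
AVRO_FORMAT = "com.databricks.spark.avro"  # data source name from post #2


def read_avro(sql_context, path):
    """Load an Avro file into a DataFrame through the spark-avro data source.

    No `import avro.schema` is involved: the reader is provided by the
    package given to `pyspark --packages`, which lives on the JVM side.
    """
    return sql_context.read.format(AVRO_FORMAT).load(path)


# Inside the pyspark shell, where sqlContext already exists:
# df = read_avro(sqlContext, "file.avro")
# df.show()
```

The last two lines are commented out because they only work inside a pyspark session started with the --packages option shown above.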