Reading an Avro file in pyspark


I am unable to import avro.schema and read an Avro file in pyspark.

Please help



Try the code below to read the Avro file into a DataFrame:

import avro.schema

df = sqlContext.read.format("com.databricks.spark.avro").load("file.avro")

Also, start pyspark with the command below so that the spark-avro package is loaded:

pyspark --packages com.databricks:spark-avro_2.10:2.0.1


Hi Varun,

I ran pyspark --packages com.databricks:spark-avro_2.10:2.0.1 and the pyspark shell started. But when I then run import avro.schema, I get the error below.

Using Python version 2.7.5 (default, Aug 4 2017 00:39:18)
SparkContext available as sc, HiveContext available as sqlContext.

import avro.schema
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named avro.schema

Please help with this.



Apologies for a late reply.

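You don’t need the import. The --packages option only adds the spark-avro JAR to Spark as a data source; it does not install the Python avro module, which is why import avro.schema fails. Skip that line and run the read code directly once pyspark has started with the spark-avro package.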
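
For reference, here is a minimal sketch of the read without the import. It assumes the Spark 1.x pyspark shell from this thread, where sqlContext already exists, and that the shell was started with the --packages command above. The helper name read_avro is made up for illustration, and "file.avro" is the placeholder path used earlier in the thread.

```python
# Sketch for a pyspark shell started with:
#   pyspark --packages com.databricks:spark-avro_2.10:2.0.1
# No `import avro.schema` is needed: spark-avro is a JVM data source
# that Spark looks up by its format name.

AVRO_FORMAT = "com.databricks.spark.avro"


def read_avro(sql_context, path):
    """Load an Avro file into a DataFrame through the spark-avro data source.

    `read_avro` is a hypothetical helper name; `sql_context` is the
    `sqlContext` object that the pyspark shell creates on startup.
    """
    return sql_context.read.format(AVRO_FORMAT).load(path)


# In the pyspark shell (assumes "file.avro" exists at that path):
# df = read_avro(sqlContext, "file.avro")
# df.printSchema()
# df.show(5)
```

If the load itself fails with a message saying the data source could not be found, the package was most likely not picked up when the shell started, so check the --packages argument first.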