Avro Format Not Working

Team,

I am trying to work with Avro, but for some reason it doesn't work. Has anyone set up a process or a program that could give me a hint about what is missing?



Hi @Shree_M,

To work with Avro in PySpark, you first have to load the Avro package when launching pyspark.

Use the command below to launch Spark with Avro support:

pyspark2 --master yarn --conf spark.ui.port=0 --packages com.databricks:spark-avro_2.11:4.0.0

After this you can perform Avro operations.

If you are still getting an issue, please post the code you tried.
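
As a minimal sketch of what an Avro operation could look like once the shell is started with the package above: with the `com.databricks:spark-avro` package, the data source is normally addressed by the name `com.databricks.spark.avro`. The helper name `csv_to_avro` and its parameters are hypothetical, used only for illustration, and `spark` is assumed to be the session the pyspark shell already provides.

```python
# Data source name registered by the com.databricks:spark-avro package.
AVRO_FORMAT = "com.databricks.spark.avro"


def csv_to_avro(spark, src, dst):
    """Read a headerless CSV from src, write it to dst as Avro, then read it back.

    `spark` is an existing SparkSession (for example the one created by the
    pyspark shell); `src` and `dst` are paths chosen by the caller.
    """
    df = (
        spark.read.format("csv")
        .option("sep", ",")
        .option("header", "false")
        .load(src)
    )
    # Overwrite any previous output at dst with Avro files.
    df.write.mode("overwrite").format(AVRO_FORMAT).save(dst)
    # Reading the result back is a quick check that the Avro package is loaded.
    return spark.read.format(AVRO_FORMAT).load(dst)
```

In the shell this could be called as `csv_to_avro(spark, "/public/orders/part-00000", out_dir).show()`, where `out_dir` is a placeholder for an output directory you can write to.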

Here is my code

import pyspark
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf

conf = SparkConf()
conf.set("spark.jars.packages", "com.databricks:spark-avro_2.11:2.4.0")
spark = SparkSession.builder.appName("AVRO-Excersices"). \
    config(conf=conf). \
    getOrCreate()

spark.conf.set("spark.sql.legacy.replaceDatabricksSparkAvro.enabled", "true")
inn="/public/orders/part-00000"
out="/user/shree624/Solutions/problem1"

#SQL="select sum(cast(_c2 as Bigint)) as Total_Value from mytable "

df = spark.read.format("csv").option("sep", ",").option("header", "false").load(inn)
df.printSchema()
df.show()

df.write.mode("overwrite").option("compression", "snappy").format("avro").save(out)
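
One thing that may be worth checking in the code above is whether the package coordinate and the format name belong together. As far as I know, the short name `avro` comes from the Avro module that ships with Spark 2.4 and later (`org.apache.spark:spark-avro`, versioned like Spark itself), while the `com.databricks:spark-avro` package is addressed as `com.databricks.spark.avro` and its releases stop at 4.0.0. The sketch below pairs the two under that assumption. The function name `avro_settings` is hypothetical, and the 4.0.0 release is assumed to match your Spark 2.x build.

```python
def avro_settings(spark_version, scala_version="2.11"):
    """Return (package coordinate, format name) for a given Spark version.

    Assumption: Spark 2.4+ uses the Apache-maintained spark-avro module,
    older 2.x versions use the Databricks package at release 4.0.0.
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) >= (2, 4):
        package = "org.apache.spark:spark-avro_{}:{}".format(scala_version, spark_version)
        return package, "avro"
    package = "com.databricks:spark-avro_{}:4.0.0".format(scala_version)
    return package, "com.databricks.spark.avro"


# Example: settings for a Spark 2.4.0 build on Scala 2.11.
package, fmt = avro_settings("2.4.0")
print(package, fmt)
```

The first value would go into `spark.jars.packages` (or `--packages`), and the second into `.format(...)` when reading or writing.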