Urgent: how to import the Avro package in the lab environment


#1

Here is what I did in the lab:

scala> import com.databricks.spark.avro._
<console>:25: error: object databricks is not a member of package com
import com.databricks.spark.avro._

Why does it throw this error? I need to read and write Avro files. How can I practice this in the lab?

Thank you very much.


#2

Hello,

By default, spark-shell does not come with the Avro libraries, so you need to pull in the Avro package first. Launch the shell with this command:

spark-shell --packages com.databricks:spark-avro_2.11:4.0.0

Then start typing your commands in the shell and it will work fine. :slight_smile:
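For example, once the shell is up with that package, a minimal read/write sketch would look like this (the paths are just placeholders for your own lab directories, and this assumes the spark-avro 4.0.0 implicits):

import com.databricks.spark.avro._

// the spark-avro package adds an .avro() method to the DataFrame reader/writer
val avroDF = sqlContext.read.avro("/user/your_user/input_avro")

// write the DataFrame back out in Avro format
avroDF.write.avro("/user/your_user/output_avro")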

Regards
Venkatesh


#3

Thank you for telling me that. What about the exam environment? And what about the ORC and Parquet formats? Can you give me a working example in the lab?

Thank you.


#4

Hello,

In the exam environment, Spark comes with the Parquet and ORC libraries by default, so there are no issues there.

Parquet

import org.apache.spark.sql._
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// read Parquet files from the given path into a DataFrame
val parquetDF = sqlContext.read.parquet("path")

// append the DataFrame to the target path in Parquet format
parquetDF.write.mode(SaveMode.Append).parquet("path")

ORC

import org.apache.spark.sql._
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// read ORC files from the given path into a DataFrame
val orcDF = sqlContext.read.orc("path")

// append the DataFrame to the target path in ORC format
orcDF.write.mode(SaveMode.Append).orc("path")
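A quick note on SaveMode: SaveMode.Append adds new files to an existing output directory, SaveMode.Overwrite replaces it, and the default (SaveMode.ErrorIfExists) fails if the path already exists, so pick whichever fits your use case.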

Regards
Venkatesh


#5

@venkateshm,

Thank you, you helped me :slight_smile:

@paslechoix,

Older Spark versions did not ship Avro support, which is why the external Databricks spark-avro package was needed; Parquet and ORC, on the other hand, have been built into Spark SQL for a long time.
In recent versions (Spark 2.4 onwards) the Avro data source has been merged into the Apache Spark project itself, so you no longer depend on the Databricks library.
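For example, on Spark 2.4 or later a rough sketch would be the following (the Avro module is part of the Apache Spark project now, but it is still added when launching the shell; the paths are placeholders):

spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0

// Avro data source shipped with Apache Spark 2.4+, no Databricks package needed
val df = spark.read.format("avro").load("/user/your_user/input_avro")
df.write.format("avro").save("/user/your_user/output_avro")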