Cannot convert to DataFrame when packaged in an application, but works standalone

Hi @itversity @itversity1 @Itversity_Training @annapurna,

When I run the code below standalone, it runs fine and the data gets saved.
But when I package it inside an application and compile with sbt package, I get the following error at compile time:
[error] /home/sabby5180/sparkPipeline/src/main/scala/getDepartments.scala:42: value toDF is not a member of org.apache.spark.rdd.RDD[(String, String)]
[error] possible cause: maybe a semicolon is missing before `value toDF'?
[error] toDF("dept_id","count_in_rdd").

Below is my entire code:

import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming._
import scala.io.Source
import org.apache.spark.sql.SaveMode._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.SQLContext._

object getDepartments {
  def main(args: Array[String]) {

    val conf = new SparkConf().
      setMaster("yarn-client").
      setAppName("spark streaming").
      set("spark.executor.cores", "2").
      set("spark.executor.instances", "2").
      set("spark.executor.memory", "2g")

    val sc = new SparkContext(conf)

    val sqlContext = new HiveContext(sc) // although not required

    val rawRdd = sc.parallelize(Source.
      fromFile("/home/sabby5180/testFile").
      getLines().
      toList)

    val onlyDepartmenetsRdd =
      rawRdd.
        filter(x => x.split(" ")(6).split("/")(1).toLowerCase == "department").
        map(row => {
          val rowArr = row.split(" ")
          val onlyDeptPart = rowArr(6).split("/")(2).toString
          (onlyDeptPart, 1.toString)
        })

    // saving as text file
    onlyDepartmenetsRdd.
      coalesce(1).
      toDF("dept_id", "count_in_rdd").
      write.
      partitionBy("dept_id").
      format("text").
      mode("append").
      save("streaming_data")
  }
}

Below is my build.sbt:
name := "spark_pipelines"
version := "0.13"
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "1.6.3",
  "org.apache.spark" % "spark-sql_2.10" % "1.6.3",
  "org.apache.spark" % "spark-hive_2.10" % "1.6.3",
  "org.apache.spark" % "spark-streaming_2.10" % "1.6.3"
)

Can you please help?

Thanks and Regards,
Sabyasachi


Solved it.
I needed to import "sqlContext.implicits._" after creating the "sqlContext" object, i.e. I basically added the line below right after the sqlContext val:

import sqlContext.implicits._

Thanks and Regards,
Sabyasachi