Issue while adding spark-hive and spark-sql dependencies together



I am trying to load Spark RDD data into a Hive table after a transformation, so I am using the code below:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

case class Dataflt(id: Int, name: String, sal: Int)

object filterHiveDemo {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster(args(0)).setAppName("Filter_LOAD_Hive")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    val path_dt = args(1)
    val dat = scala.io.Source.fromFile(path_dt).getLines.toList
    val dat_rdd = sc.parallelize(dat)
    val dat_filtered = dat_rdd.filter(x => x.split('|')(2).toInt > 3000)

    val dat_tuple = dat_filtered
      .map(x => Dataflt(x.split('|')(0).toInt, x.split('|')(1), x.split('|')(2).toInt))
      .toDF()
    dat_tuple.write.mode("append").saveAsTable("swayam_db.filter_data")
  }
}
Below are the dependencies I have added for this sbt project:

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.3"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.6.3"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.3"
libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.6.3"
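
(For reference, I believe the same dependencies could also be declared with sbt's %% operator, which appends the configured scalaVersion automatically; I don't think this changes anything, but I'm including it in case the artifact naming matters.)

// Same dependencies using %%, assuming scalaVersion := "2.10.6" is set in build.sbt
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.6.3",
  "org.apache.spark" %% "spark-streaming" % "1.6.3",
  "org.apache.spark" %% "spark-sql"       % "1.6.3",
  "org.apache.spark" %% "spark-hive"      % "1.6.3"
)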

The code works fine when run through the REPL, but when I build it as an sbt project with sbt package, it throws the error below.

value toDF is not a member of org.apache.spark.rdd.RDD[Dataflt]
[error] val dat_tuple = dat_filtered.map(x=>Dataflt(x.split('|')(0).toInt,x.split('|')(1),x.split('|')(2).toInt)).toDF()

But if I remove the hive dependency from the project, the toDF() function works fine since sqlContext.implicits is already imported; however, the HiveContext import then fails because spark-hive is unavailable. Can't we add both the spark-hive and spark-sql dependencies in the same project? How should I work around this?
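
For what it's worth, one workaround I am considering (I'm not sure it is the right fix) is to drop the separate SQLContext and keep only the HiveContext, which as far as I understand extends SQLContext in Spark 1.x, so that only one set of implicits providing toDF() is in scope. A rough sketch of what I mean:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

case class Dataflt(id: Int, name: String, sal: Int)

object filterHiveDemo {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster(args(0)).setAppName("Filter_LOAD_Hive")
    val sc = new SparkContext(conf)

    // Only one context: HiveContext extends SQLContext, so its implicits
    // alone should bring toDF() into scope without a competing conversion.
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    val dat = scala.io.Source.fromFile(args(1)).getLines.toList
    val dat_filtered = sc.parallelize(dat).filter(x => x.split('|')(2).toInt > 3000)

    val dat_tuple = dat_filtered
      .map(x => Dataflt(x.split('|')(0).toInt, x.split('|')(1), x.split('|')(2).toInt))
      .toDF()

    dat_tuple.write.mode("append").saveAsTable("swayam_db.filter_data")
  }
}

Is keeping just the HiveContext the expected way to do this, or is there a way to have both contexts and still resolve toDF()?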