Spark foreach failing to print the RDD read from HBase table

apache-spark
scala
apache-hbase

#1

Hi All,

In the snippet below, I'm trying to read an HBase table from a Spark application.

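import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

// HBase connection settings; sc is assumed to be an existing SparkContext
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("hbase.zookeeper.quorum", "nn01.itversity.com,nn02.itversity.com,rm01.itversity.com")
conf.set(TableInputFormat.INPUT_TABLE, "user")

val usersRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
usersRDD.cache()
usersRDD.foreach { case (_, result) =>
  val key = Bytes.toString(result.getRow)
  val name = Bytes.toString(result.getValue("basic".getBytes, "name".getBytes))
  val age = Bytes.toString(result.getValue("basic".getBytes, "age".getBytes))
  println("Row Key : " + key + " Name : " + name + " Age : " + age)
}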

With the snippet above, I'm able to fetch the table data, but I fail to iterate through the rows and print them.
I also checked usersRDD.count(), and it returns the correct number of rows, which confirms that the table data is being fetched.

Could someone please tell me where I'm making the mistake?
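
One likely explanation, offered as an assumption rather than a confirmed diagnosis: `foreach` runs on the executors, so on YARN the `println` output lands in the executor logs rather than on the driver console. A minimal sketch of one workaround is to convert each row to a plain String on the executors and bring a small sample back to the driver. The object name `PrintUsers`, the helper `describeRow`, and the sample size of 20 are hypothetical; the table, column family, and ZooKeeper settings are copied from the snippet above.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name, used only for this sketch.
object PrintUsers {

  // Pure helper (hypothetical) so the formatting can be checked without a cluster.
  def describeRow(key: String, name: String, age: String): String =
    "Row Key : " + key + " Name : " + name + " Age : " + age

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PrintUsers"))

    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("hbase.zookeeper.quorum", "nn01.itversity.com,nn02.itversity.com,rm01.itversity.com")
    conf.set(TableInputFormat.INPUT_TABLE, "user")

    val usersRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // Turn each HBase Result into a String on the executors.
    val lines = usersRDD.map { case (_, result) =>
      val key = Bytes.toString(result.getRow)
      val name = Bytes.toString(result.getValue(Bytes.toBytes("basic"), Bytes.toBytes("name")))
      val age = Bytes.toString(result.getValue(Bytes.toBytes("basic"), Bytes.toBytes("age")))
      describeRow(key, name, age)
    }

    // take() returns an Array[String] to the driver, so println runs on the driver.
    // The limit of 20 is an arbitrary sample size for this sketch.
    lines.take(20).foreach(println)

    sc.stop()
  }
}
```

Mapping to String before `take` also matters because `Result` and `ImmutableBytesWritable` are not Java-serializable, so collecting the raw pairs would typically fail. The same conversion is worth doing before `cache()`, since Hadoop input formats may reuse Writable objects between records.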


#2

When I try this import:
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
I get the following error:
error: object mapreduce is not a member of package org.apache.hadoop.hbase
So I am unable to use TableInputFormat in the first place. Can you help me with this? Thank you.


#3

The libraries I added to my project are listed below.
Adding hbase-server should solve your problem, because TableInputFormat is part of the hbase-server library.

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "2.1.0",
  "org.apache.hbase" % "hbase-client" % "1.1.2",
  "org.apache.hbase" % "hbase-common" % "1.1.2",
  "org.apache.hbase" % "hbase-server" % "1.1.2"
)


#4

I am still facing this issue.

I am using these Maven dependencies:

In my code:

conf.set("hbase.zookeeper.quorum", "nn01.itversity.com,nn02.itversity.com")
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("zookeeper.znode.parent", "/hbase-unsecure")
conf.set("spark.driver.extraClassPath", "/usr/hdp/2.5.0.0-1245/hbase/conf")

My spark-submit script:

spark-submit --verbose --master yarn \
--jars /usr/hdp/2.5.0.0-1245/hbase/lib/hbase-common.jar,\
/usr/hdp/2.5.0.0-1245/hbase/lib/hbase-client.jar,\
/usr/hdp/2.5.0.0-1245/hbase/lib/hbase-protocol.jar,\
/usr/hdp/2.5.0.0-1245/hbase/lib/hbase-annotations.jar \
--class sparkhbasexample hbase-example-1.0.0.jar

Please help me resolve this.


#5

Can you please post your complete code snippet?