Spark-HBase integration



Hi, I’m trying to connect HBase to Spark from the Scala REPL (spark-shell). Going through resources on the internet, most of them say to import the HBase libraries, while a few say to start spark-shell with the HBase lib directory on the classpath so that the jars are included at runtime.

When I try to import these libraries, I get errors:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HTableDescriptor,HColumnDescriptor}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.{Put,HTable}

Because of this, I’m unable to create the configuration with val conf = HBaseConfiguration.create().
Also, I’m not able to find any HBase bin/lib directory through the Ambari UI or anywhere else.
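
In case it helps: one common way to make the HBase client classes visible to the REPL is to pass the HBase client jars to spark-shell explicitly. A minimal sketch, assuming an HDP-style layout where the jars live under /usr/hdp/current/hbase-client/lib (the paths are assumptions; adjust them to your installation):

```shell
# Locate the HBase client jars on your cluster first, e.g.:
#   find / -name "hbase-client*.jar" 2>/dev/null
# Then launch spark-shell with those jars on the classpath, and ship
# hbase-site.xml so HBaseConfiguration.create() can find the ZooKeeper quorum.
spark-shell --master yarn \
  --jars $(echo /usr/hdp/current/hbase-client/lib/*.jar | tr ' ' ',') \
  --files /etc/hbase/conf/hbase-site.xml
```

With the jars on the classpath, the imports above should resolve inside the REPL.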

Please help me with this issue, as it is a key part of working with Spark when storing data.



I just started my session with
spark-shell --master yarn

and imported all of the below, required for my job (reading data from HBase and performing some counts based on the row key):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.log4j.Logger
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions
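

With those imports in place, the read-and-count step can be sketched roughly as follows. This is a hedged sketch rather than the exact job: the table name `my_table` and the 3-character key prefix are assumptions for illustration, and it relies on hbase-site.xml being on the classpath so that HBaseConfiguration.create() picks up the ZooKeeper quorum:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes

// Reads hbase-site.xml from the classpath; "my_table" is an assumed table name.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

// Read the table as an RDD of (row key, row result) pairs.
val hbaseRDD = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Count rows grouped by the first 3 characters of the row key.
val countsByPrefix = hbaseRDD
  .map { case (key, _) => (Bytes.toString(key.get()).take(3), 1L) }
  .reduceByKey(_ + _)

countsByPrefix.collect().foreach(println)
```

The rddToPairRDDFunctions import is what makes reduceByKey available on the mapped RDD (implicit in Spark 1.3+ anyway).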


Okay, thank you for your response. The problem got resolved. :slight_smile: @tejasvini