Load Data into Hive From HDFS



I'm facing an exercise in which I have to create an RDD and then load its data into Hive.
I don't know if I did it the correct way.

Create the RDD.
Create the DF, which I registered as a temp table: rddTemp.
Create the hive_table (the RDD and the Hive table have the same structure).
Run sqlContext.sql("insert into database.hive_table select * from rddTemp").

Or should I have used LOAD DATA INPATH 'path' INTO TABLE hive_table?
If yes, where can I find the path for INPATH?

@itversity am I correct?

I really find Hive quite difficult.


Can you give the link for the exercise?


Hi, it's Problem 13 of the 20 problems given by itversity.


Data frames have an API called write, which has methods such as insertInto. You can use that instead of struggling with Hive.


  • LOAD is used to copy the data as is
  • INSERT is used to insert the data after transformations

Here are the steps if you want to use the INSERT statement as part of Spark SQL:

  • Create data frame
  • Register temp table
  • Run the INSERT command through sqlContext.sql (you cannot use LOAD in this case, because the data sits in a temp table rather than in files)
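
The steps above can be sketched in PySpark roughly as follows. This is only a minimal sketch: it assumes a Spark 1.x-style sqlContext with Hive support (a HiveContext), a data frame df that was already built from the RDD, and a target table that already exists with the same column order. The helper name insert_via_temp_table is hypothetical; rddTemp and database.hive_table are the names used in the question.

```python
def insert_via_temp_table(sql_context, df, temp_name, target_table):
    """Register df as a temp table, then INSERT its rows into a Hive table.

    Assumptions (not confirmed by the thread): sql_context is Hive-enabled,
    and target_table already exists with the same column order as df.
    """
    # Step 2 of the list: make the data frame visible to SQL under a temp name.
    df.registerTempTable(temp_name)

    # Step 3 of the list: run the INSERT statement through Spark SQL.
    statement = "INSERT INTO TABLE {0} SELECT * FROM {1}".format(
        target_table, temp_name
    )
    sql_context.sql(statement)

    # Return the statement so the caller can log or inspect it.
    return statement
```

With the names from the question, the call would look like insert_via_temp_table(sqlContext, df, "rddTemp", "database.hive_table"). Note that SELECT * relies on the column order of the temp table matching the Hive table.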

If you are struggling with Hive, try the data frame's write APIs, such as insertInto, to load data into Hive tables.
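
For the write API route, a minimal PySpark sketch might look like the one below. It assumes df was created from a Hive-enabled context and that the target table already exists; the helper name insert_with_writer is hypothetical, and database.hive_table is the table name from the question.

```python
def insert_with_writer(df, target_table, overwrite=False):
    """Insert df's rows into an existing Hive table via the writer API.

    Assumptions (not confirmed by the thread): df comes from a Hive-enabled
    context and target_table already exists. insertInto resolves columns by
    position, so df's column order has to match the table's.
    """
    # overwrite=False appends to the table; overwrite=True replaces its data.
    df.write.insertInto(target_table, overwrite=overwrite)
```

A call such as insert_with_writer(df, "database.hive_table") would then replace both the temp table registration and the INSERT statement.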