Updating Records in a Hive Table

#1

Thanks in advance…

#2

Hi Mohitkumar, typically the source database should have an update date/update timestamp column that shows which records were recently changed. During the Sqoop import to Hive, the last fetched date/timestamp should be stored in a log table and used the next time you import, so that you only fetch the recent changes. Refer to https://www.youtube.com/watch?v=ntSK_oJtWlQ for more on incremental loads.
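A minimal sketch of the log-table idea, assuming a hypothetical import_log table (kept in SQLite here only so the example runs locally) and a hypothetical updated_at column in the source rows. In a real pipeline the filtering would be done by the source database or by Sqoop, not in memory.

```python
import sqlite3


# Hypothetical log table: one watermark (last fetched update timestamp) per source table.
# Timestamps are "YYYY-MM-DD HH:MM:SS" strings, which sort correctly as plain text.
def init_log(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS import_log ("
        "source_table TEXT PRIMARY KEY, last_fetched TEXT NOT NULL)"
    )


def get_last_fetched(conn: sqlite3.Connection, source_table: str,
                     default: str = "1970-01-01 00:00:00") -> str:
    row = conn.execute(
        "SELECT last_fetched FROM import_log WHERE source_table = ?",
        (source_table,),
    ).fetchone()
    return row[0] if row else default


def select_changed(rows: list[dict], last_fetched: str) -> list[dict]:
    # Stand-in for "WHERE updated_at > :last_fetched" on the source database.
    return [r for r in rows if r["updated_at"] > last_fetched]


def record_fetch(conn: sqlite3.Connection, source_table: str, rows: list[dict]) -> None:
    # Store the newest timestamp seen so the next run starts from there.
    if not rows:
        return
    newest = max(r["updated_at"] for r in rows)
    conn.execute(
        "INSERT OR REPLACE INTO import_log (source_table, last_fetched) VALUES (?, ?)",
        (source_table, newest),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_log(conn)
    source = [
        {"id": 1, "updated_at": "2017-01-01 10:00:00"},
        {"id": 2, "updated_at": "2017-01-02 09:00:00"},
    ]
    first = select_changed(source, get_last_fetched(conn, "customers"))
    record_fetch(conn, "customers", first)
    source[0]["updated_at"] = "2017-01-03 08:00:00"  # row 1 is updated at the source
    second = select_changed(source, get_last_fetched(conn, "customers"))
    print([r["id"] for r in first], [r["id"] for r in second])  # [1, 2] [1]
```

The second run picks up only row 1, because only its timestamp moved past the stored watermark.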

#3

Hi gnanaprakasam,

First of all, thanks for your efforts.
I will just go through it and let you know.

#4

@mohitkumar, @gnanaprakasam

This question is more related to Hive than to bigdata-labs, so I have changed the category to Apache Hive, which is a subcategory of Big Data.

You can change the category by clicking the edit (pencil) icon beside the topic title.

#5

:+1:

Exactly …

Is there any other way, for example using HBase, Hive, etc.?

#6

While importing with Sqoop, we have --incremental lastmodified, which imports rows that have been updated since the last import (based on a timestamp check column). In the same way, if rows are only appended to your base table, we have --incremental append (based on an increasing key column).
Hope this helps.
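A sketch of how the two incremental modes differ on the command line, assuming Sqoop 1. The function only builds the argument list; the JDBC URL, table, directory and column names in the example are hypothetical.

```python
from typing import Optional


def sqoop_incremental_args(connect: str, table: str, target_dir: str, mode: str,
                           check_column: str, last_value: str,
                           merge_key: Optional[str] = None) -> list[str]:
    """Build a Sqoop 1 incremental import command line (not executed here)."""
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", mode,           # append: rows with a higher key; lastmodified: newer timestamp
        "--check-column", check_column,  # increasing id (append) or update timestamp (lastmodified)
        "--last-value", last_value,      # watermark from the previous run
    ]
    if mode == "lastmodified" and merge_key:
        # Merge changed rows into the existing target directory by key
        # instead of appending a second copy of each updated row.
        args += ["--merge-key", merge_key]
    return args


if __name__ == "__main__":
    cmd = sqoop_incremental_args(
        "jdbc:mysql://dbhost/shop", "customers", "/data/customers",
        "lastmodified", "updated_at", "2017-01-02 09:00:00", merge_key="id",
    )
    print(" ".join(cmd))
```

Passing the list straight to subprocess.run avoids shell-quoting problems with the space inside the timestamp.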

#7

Dear All,

Please read through the scenario completely.
I am not asking how to insert or import new rows into the Hive table.

I am asking how to import only the newly updated records, not newly inserted records!

If you look closely at the two tables above, you will get the complete idea.
Note that both tables above have the same row count.

#8

@mohitkumar we cannot update data directly in Hive; we need a periodic merge strategy.

If the table is small:

  • Import the whole table on a regular basis using Sqoop's overwrite option (--hive-overwrite)
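For the small-table case, a sketch of a full-refresh Sqoop 1 command, again only built as an argument list; the connection string and table names are hypothetical.

```python
def sqoop_full_refresh_args(connect: str, table: str, hive_table: str) -> list[str]:
    """Build a Sqoop 1 command that re-imports a whole table into Hive (not executed here)."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--hive-import",              # load the imported files into Hive
        "--hive-table", hive_table,
        "--hive-overwrite",           # replace the existing Hive table contents
    ]


if __name__ == "__main__":
    print(" ".join(sqoop_full_refresh_args("jdbc:mysql://dbhost/shop", "customers", "shop.customers")))
```

Because every run replaces the whole table, updated rows and unchanged rows are handled the same way, at the cost of re-reading everything.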

If the table is big, you have to develop a merge strategy:

  • Get the updated or newly inserted data into a staging table (using an incremental load or a WHERE condition on a timestamp column)
  • Perform a full outer join between the staged data and the original data on the primary key column
  • Replace the original data with the result of the full outer join, taking the staged version of each row wherever one exists
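The merge steps above can be sketched in memory with Python dictionaries, assuming each row is a dict and "id" is the (hypothetical) primary key column. It illustrates the join logic only; it is not a Hive implementation.

```python
from typing import Any

Row = dict[str, Any]


def merge_full_outer(original: list[Row], staged: list[Row], key: str = "id") -> list[Row]:
    """Merge staged (new or updated) rows into the original rows on `key`."""
    original_by_key = {row[key]: row for row in original}
    staged_by_key = {row[key]: row for row in staged}
    merged = []
    # Full outer join: every key present on either side appears exactly once.
    for k in sorted(original_by_key.keys() | staged_by_key.keys()):
        if k in staged_by_key:
            merged.append(staged_by_key[k])    # updated or newly inserted row wins
        else:
            merged.append(original_by_key[k])  # unchanged row is kept as-is
    return merged


if __name__ == "__main__":
    original = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}, {"id": 3, "city": "Goa"}]
    staged = [{"id": 2, "city": "Mumbai"}]  # only an update, no new keys
    print(merge_full_outer(original, staged))
```

When the staging table holds only updated rows, as in the scenario from post #7, the merged result keeps the same row count as the original. In Hive itself the same idea is usually written as an INSERT OVERWRITE ... SELECT over a FULL OUTER JOIN of the two tables, picking the staged columns whenever the staged key is not null.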

Another approach is to store the data in HBase, which supports updates by row key, and create an external table over it in Hive.
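A sketch that generates the Hive DDL for such an external table, assuming Hive's HBase integration (HBaseStorageHandler) is installed and the HBase table already exists. All table, column family and column names are hypothetical, and every column is mapped as STRING for simplicity.

```python
def hbase_external_table_ddl(hive_table: str, hbase_table: str, column_family: str,
                             columns: list[str], key_column: str = "rowkey") -> str:
    """Return HiveQL that maps an existing HBase table as an external Hive table."""
    hive_columns = ",\n  ".join([f"{key_column} STRING"] + [f"{c} STRING" for c in columns])
    # The first mapping entry is the HBase row key; the rest are family:qualifier
    # pairs, matched to the Hive columns by position.
    mapping = ",".join([":key"] + [f"{column_family}:{c}" for c in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table} (\n  {hive_columns}\n)\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES (\"hbase.columns.mapping\" = \"{mapping}\")\n"
        f"TBLPROPERTIES (\"hbase.table.name\" = \"{hbase_table}\");"
    )


if __name__ == "__main__":
    print(hbase_external_table_ddl("customers_hbase", "customers", "cf", ["name", "city"]))
```

Since HBase returns the latest version of a cell by default, writing a row again with the same row key changes what Hive queries see, so no separate merge step is needed.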
