Hive incremental load


#1

How can I achieve an incremental load on a Hive table? Please mention the steps involved.

Note: The Hive table already contains some old data. We now have to load updated versions of some existing rows, plus some new rows, into the table.


#2

Hi @siddharthmahakur,

If you have created your target table as a transactional table, then you can simply update your table using the UPDATE…SET query.
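
As a rough sketch of the transactional case (the table names `customers` and `customer_updates` and their columns are made up for illustration, and this assumes ACID transactions are enabled on your cluster), it could look something like this. The MERGE statement is only available from Hive 2.2 onwards, so treat that part as optional.

```sql
-- Hypothetical ACID table. It must be stored as ORC with transactional=true;
-- older Hive versions also require it to be bucketed (CLUSTERED BY ... INTO n BUCKETS).
CREATE TABLE customers (
  id   INT,
  name STRING,
  city STRING
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Update existing rows in place.
UPDATE customers
SET city = 'Pune'
WHERE id = 42;

-- Hive 2.2+: apply a whole batch of changes from a (hypothetical) staging
-- table in one statement, updating matched rows and inserting new ones.
MERGE INTO customers AS t
USING customer_updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET name = s.name, city = s.city
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.name, s.city);
```

MERGE handles updated old rows and brand-new rows in one pass, which is usually what an incremental load needs.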

If your target table is a non-transactional table, you can update it via the following workaround.

  • Create another intermediate table and populate it with the rows from the target table except for those you want to update.

  • Now manually insert the rows containing your updated data into the intermediate table.

  • Now overwrite the target table with the rows from the intermediate table.
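
The three steps above can be sketched roughly as follows. All names here are hypothetical: `customers` is the target, `customers_tmp` is the intermediate table, and `customer_updates` is assumed to be a staging table that already holds the changed rows with the same columns as the target and `id` as the key. With a staging table, the updated rows do not have to be typed in one by one.

```sql
-- 1. Build the intermediate table from every target row that is NOT being updated.
CREATE TABLE customers_tmp AS
SELECT t.*
FROM customers t
LEFT JOIN customer_updates u ON t.id = u.id
WHERE u.id IS NULL;

-- 2. Add the rows that carry the updated data.
INSERT INTO TABLE customers_tmp
SELECT id, name, city
FROM customer_updates;

-- 3. Overwrite the target table with the contents of the intermediate table.
INSERT OVERWRITE TABLE customers
SELECT * FROM customers_tmp;

-- Clean up the intermediate table once the overwrite has succeeded.
DROP TABLE customers_tmp;
```

Any rows in the staging table whose `id` does not exist in the target yet are simply carried along in step 2, so new data lands in the same run.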

For inserting new data, you can do the following.

  • Use an INSERT INTO query if you want to insert rows from another table.

  • Use LOAD DATA…INTO query if you want to insert rows from a file.
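
A minimal sketch of both options, assuming a hypothetical target table `customers`, a hypothetical source table `customers_new`, and made-up file paths. LOAD DATA just moves or copies the file into the table's directory, so this assumes the file layout already matches the table's storage format (for example a delimited text table).

```sql
-- Append rows from another table.
INSERT INTO TABLE customers
SELECT id, name, city
FROM customers_new;

-- Append rows from a file that is already in HDFS
-- (the file is moved into the table's directory).
LOAD DATA INPATH '/data/incoming/customers_new.csv'
INTO TABLE customers;

-- Append rows from a file on the local filesystem
-- (the file is copied rather than moved).
LOAD DATA LOCAL INPATH '/tmp/customers_new.csv'
INTO TABLE customers;
```

Adding the OVERWRITE keyword before INTO TABLE would replace the existing data instead of appending to it.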


#3

But in a production environment we cannot update records manually one by one, right?
That is not a feasible approach for production.


#4

@siddharthmahakur
Check this out.

https://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/