Sqoop Export - Using update-mode - allow-insert

Let us understand how to insert new records and update existing records in the target table in relational database using sqoop export.

  • If we only specify --update-key, the export will run update-only mode.
  • To ensure that data is inserted if the key does not exist we have to define --update-mode with allowinsert.
  • When we perform sqoop export with --update-mode allowinsert into tables in MySQL database, it will build INSERT ON DUPLICATE KEY UPDATE command.
  • It will ensure that new records are inserted and existing records are updated based on the primary key.

Let us export the data into table using update-mode allowinsert where there is no primary key and check the behavior.

sqoop export \
  --connect jdbc:mysql://ms.itversity.com:3306/retail_export \
  --username retail_user \
  --password itversity \
  --export-dir /apps/hive/warehouse/training_sqoop_retail.db/daily_revenue \
  --table training_daily_revenue \
  --update-key order_date \
  --update-mode allowinsert

Let us validate against the table where there is no primary key defined.

USE retail_export;
SELECT count(1) FROM training_daily_revenue;
SELECT * FROM training_daily_revenue
  WHERE order_date LIKE '2013-07-25%';

Let us see the steps to perform sqoop export to merge or upsert into the target table.

  • Connect to MySQL
  • Switch to the database retail_export
  • Truncate table training_daily_revenue
  • Alter table and add primary key to the table
  • Export data into the table, training_daily_revenue
  • Delete data beyond 2013-12-31 from training_daily_revenue, so that we can upsert into the target table.

Here is the SQL Script to prepare the table for the export in upsert mode.

USE retail_export;
TRUNCATE TABLE training_daily_revenue;
ALTER TABLE training_daily_revenue ADD PRIMARY KEY (order_date);

Here is the first sqoop export command to export data once again into training_daily_revenue.

sqoop export \
  --connect jdbc:mysql://ms.itversity.com:3306/retail_export \
  --username retail_user \
  --password itversity \
  --export-dir /apps/hive/warehouse/training_sqoop_retail.db/daily_revenue \
  --table training_daily_revenue

Let us delete the data beyond 2013-12-31.

USE retail_export;
DELETE FROM training_daily_revenue 
  WHERE order_date > '2013-12-31 00:00:00.0';

Here is the second sqoop export command to export the data in upsert mode.

sqoop export \
  --connect jdbc:mysql://ms.itversity.com:3306/retail_export \
  --username retail_user \
  --password itversity \
  --export-dir /apps/hive/warehouse/training_sqoop_retail.db/daily_revenue \
  --table training_daily_revenue \
  --update-key order_date \
  --update-mode allowinsert

Now data beyond 2013-12-31 will be inserted and upto 2013-12-31 will be updated.

Hi Team,

The source db has billion records using sqoop I have exported half billion records in between the process job got failed I would like to export only remaining data without any duplication. and I dont have any incremental.