Problem 5 - Some advanced Sqoop options


#1

Originally published at: http://www.itversity.com/lessons/problem-5-some-advanced-sqoop-options/

Sqoop is one of the important topics for the exam, and to cover most of its important options we have divided this material into two pieces. Based on the exam pattern generally reported by anonymous internet bloggers, you can expect 2 out of 10 questions on this topic, related to Data Ingest and Data Export using Sqoop.…


#2

hi @itversity

There's an issue when importing a table from MySQL to Hive. When I run select * in Hive, the result is 0 rows. Help me please!

Here is the code:

sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--password=cloudera \
--username=retail_dba \
--table products_replica \
--target-dir /user/cloudera/problem5/products-text \
--fields-terminated-by '|' \
--lines-terminated-by '\n' \
--null-non-string -1 \
--null-string 'NOT-AVAILABLE' \
--where "product_id >= 1 and product_id <= 1000" \
--num-mappers 3 \
--outdir /home/cloudera/sqoop1
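As an aside, the combined effect of --fields-terminated-by, --null-string and --null-non-string on the emitted text files can be sketched in Python. This is a toy model, not Sqoop's actual implementation, and the row below is made up rather than taken from products_replica:

```python
# Sketch of how a row is serialized into delimited text, mirroring
# --fields-terminated-by '|', --null-string 'NOT-AVAILABLE' and
# --null-non-string -1 from the command above.
def serialize_row(columns, field_sep="|",
                  null_string="NOT-AVAILABLE", null_non_string="-1"):
    """columns: list of (value, is_string_column) pairs."""
    out = []
    for value, is_string in columns:
        if value is None:
            # NULLs are replaced by the configured substitution value,
            # which differs for string and non-string columns.
            out.append(null_string if is_string else null_non_string)
        else:
            out.append(str(value))
    return field_sep.join(out)

# Made-up row: id, name (string), NULL description (string), price, NULL grade (int)
row = [(1346, False), ("something 1", True), (None, True), (300.0, False), (None, False)]
print(serialize_row(row))  # → 1346|something 1|NOT-AVAILABLE|300.0|-1
```

This is why a Hive table defined over these files must use the same field delimiter, and why a later query has to treat 'NOT-AVAILABLE' and -1 as NULL markers rather than real values.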

hdfs dfs -rm -R /user/cloudera/problem5/products-text
hdfs dfs -ls /user/cloudera/problem5/products-text

hdfs dfs -rm -R /user/cloudera/problem5/products-text-part1

sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--password=cloudera \
--username=retail_dba \
--table products_replica \
--target-dir /user/cloudera/problem5/products-text-part1 \
--fields-terminated-by '*' \
--lines-terminated-by '\n' \
--null-non-string -1000 \
--null-string 'NA' \
--where "product_id <= 1111" \
--as-textfile \
-m 2 \
--outdir /home/cloudera/sqoop2

hdfs dfs -ls /user/cloudera/problem5/products-text-part1

hdfs dfs -rm -R /user/cloudera/problem5/products-text-part2

sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--password=cloudera \
--username=retail_dba \
--table products_replica \
--target-dir /user/cloudera/problem5/products-text-part2 \
--fields-terminated-by '*' \
--lines-terminated-by '\n' \
--null-non-string -1000 \
--null-string 'NA' \
--where "product_id > 1111" \
--as-textfile \
-m 5 \
--outdir /home/cloudera/sqoop3

hdfs dfs -ls /user/cloudera/problem5/products-text-part2


SQOOP MERGE

sqoop merge \
--class-name products_replica \
--jar-file /tmp/sqoop-cloudera/compile/1f7e8605ae8ebf2632880078ffe4c03f/products_replica.jar \
--new-data /user/cloudera/problem5/products-text-part2 \
--onto /user/cloudera/problem5/products-text-part1 \
--target-dir /user/cloudera/problem5/products-text-both-parts \
--merge-key product_id

hdfs dfs -ls /user/cloudera/problem5/products-text-both-parts
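For intuition, sqoop merge keeps exactly one record per merge key: when a key appears in both inputs, the record from --new-data wins over the one from --onto. A minimal Python sketch of that behavior, using made-up records rather than real products_replica rows:

```python
# Toy model of `sqoop merge --merge-key product_id`: for keys present in both
# datasets the --new-data record overrides the --onto record; all other
# records pass through unchanged.
def sqoop_merge(onto, new_data, merge_key):
    result = {rec[merge_key]: rec for rec in onto}
    for rec in new_data:  # --new-data records override --onto records
        result[rec[merge_key]] = rec
    return sorted(result.values(), key=lambda r: r[merge_key])

# Made-up records, not actual products data:
onto = [{"product_id": 1, "price": 10.0}, {"product_id": 2, "price": 20.0}]
new_data = [{"product_id": 2, "price": 25.0}, {"product_id": 3, "price": 30.0}]
for rec in sqoop_merge(onto, new_data, "product_id"):
    print(rec)
# product_id 2 comes out with price 25.0 (the --new-data version)
```

In the commands above, part2 (product_id > 1111) is merged onto part1 (product_id <= 1111), so the output holds one row per product_id across both imports.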


SQOOP JOB-IMPORT

sqoop job --create import_prod_replica \
-- import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--password "cloudera" \
--username "retail_dba" \
--table products_replica \
--target-dir /user/cloudera/problem5/products-incremental \
--incremental append \
--check-column product_id \
--last-value 0

sqoop job --exec import_prod_replica

hdfs dfs -ls /user/cloudera/problem5/products-incremental

mysql -u retail_dba -p
use retail_db;
insert into products_replica values (1346,2,'something 1','something 2',300.00,'not avaialble',3,'STRONG');
insert into products_replica values (1347,5,'something 787','something 2',356.00,'not avaialble',3,'STRONG');

sqoop job --exec import_prod_replica

mysql
insert into products_replica values (1376,4,'something 1376','something 2',1.00,'not avaialble',3,'WEAK');
insert into products_replica values (1365,4,'something 1376','something 2',10.00,'not avaialble',null,'NOT APPLICABLE');

sqoop job --exec import_prod_replica

hdfs dfs -tail /user/cloudera/problem5/products-incremental/part-m-00004
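The incremental bookkeeping that makes repeated --exec runs pick up only the new inserts can be sketched in Python. A toy model with made-up product_ids, not Sqoop's real metastore logic:

```python
# Toy model of `--incremental append`: each run imports only rows whose
# --check-column value is greater than the stored --last-value, and the
# saved job then advances last-value to the maximum value it saw.
def incremental_append(source_rows, check_column, last_value):
    new_rows = [r for r in source_rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last

# Made-up ids: the table already had rows up to 1345, then two inserts landed.
table = [{"product_id": i} for i in (1345, 1346, 1347)]
imported, last = incremental_append(table, "product_id", 1345)
print([r["product_id"] for r in imported], last)  # → [1346, 1347] 1347
```

That is why the job is created once with --last-value 0 but each subsequent --exec only appends the rows inserted since the previous run.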


HIVE

hive
create database problem5;
use problem5;
create table products_hive (product_id int, product_category_id int, product_name string, product_description string, product_price float, product_image string, product_grade int, product_sentiment string);

sqoop job \
--create hive_sqoop_job \
-- import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username "retail_dba" \
--password "cloudera" \
--table products_replica \
--check-column product_id \
--incremental append \
--last-value 0 \
--hive-import \
--hive-table products_hive \
--hive-database problem5

Here are the table descriptions in Hive and MySQL:


#3

Hi,
Did you miss the execute statement?
sqoop job --exec hive_sqoop_job

Or even after executing the job, you are not getting any data in products_hive table?


#4

Hi @Apurva_Shrivastava,
I didn't include the sqoop job --exec command in the question, but I definitely ran that command to execute the Hive import via Sqoop.


#5

Hi, when I tried to create a Sqoop job for the Hive incremental import, I got the message below:

Append mode for hive imports is not yet supported. Please remove the parameter --append-mode

Maybe this is because of differences in the Hive version.

However, I found this workaround to achieve the Hive incremental import:
https://community.hortonworks.com/questions/11373/sqoop-incremental-import-in-hive-i-get-error-messa.html