Sqoop Question 5 from Arun's Blog



Using Sqoop, import the products_replica table from MySQL into HDFS such that fields are separated by '|' and lines are separated by '\n'. Null values should be represented as -1 for numbers and "NOT-AVAILABLE" for strings. Only records with a product id greater than or equal to 1 and less than or equal to 1000 should be imported, using 3 mappers. The output should be stored as text files in the directory /user/cloudera/problem5/products-text.

Step 1:
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --table products_replica \
  --target-dir /user/cloudera/problem5/products-text \
  --fields-terminated-by '|' \
  --lines-terminated-by '\n' \
  --null-non-string -1 \
  --null-string "NOT-AVAILABLE" \
  -m 3 \
  --where "product_id between 1 and 1000" \
  --outdir /home/cloudera/sqoop1 \
  --boundary-query "select min(product_id), max(product_id) from products_replica where product_id between 1 and 1000";

This is the solution from Arun's blog. Why did he use a boundary query when the table and the where clause are already specified? And what is the outdir mentioned here?
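
For context on the boundary query part: as far as I understand, Sqoop uses the minimum and maximum values of the split column to divide the import into one key range per mapper. The snippet below is only a simplified sketch of that idea in plain shell, not Sqoop's actual splitter code. The function name split_ranges is made up for illustration, the values 1 and 1000 assume the table really contains those product ids, and the real split boundaries may fall slightly differently.

```shell
# Simplified illustration: divide an inclusive [LO, HI] key range among N mappers.
# This is NOT Sqoop's implementation; it only shows the general idea.
# split_ranges LO HI N prints one inclusive range per mapper.
split_ranges() {
  lo=$1; hi=$2; n=$3
  total=$(( hi - lo + 1 ))      # number of key values in the range
  base=$(( total / n ))         # minimum number of values per mapper
  extra=$(( total % n ))        # leftover values, handed to the first mappers
  start=$lo
  i=0
  while [ "$i" -lt "$n" ]; do
    size=$base
    if [ "$i" -lt "$extra" ]; then size=$(( base + 1 )); fi
    end=$(( start + size - 1 ))
    echo "mapper $i: product_id >= $start AND product_id <= $end"
    start=$(( end + 1 ))
    i=$(( i + 1 ))
  done
}

# Assumed boundary values 1 and 1000, with 3 mappers as in -m 3.
split_ranges 1 1000 3
```

With these assumed values the sketch prints three ranges: 1 to 334, 335 to 667, and 668 to 1000.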