Why do we use WHERE $CONDITIONS in Sqoop - EXPLAINED and ANSWERED

This is an FYI…

We often see WHERE $CONDITIONS in Sqoop commands, and I personally did not know what it was for until I recently found out.

Explanation: Sqoop performs highly efficient data transfers by inheriting Hadoop's parallelism. To help Sqoop split your query into multiple chunks that can be transferred in parallel, you need to include the $CONDITIONS placeholder in the WHERE clause of your query. Sqoop automatically replaces this placeholder with generated conditions that specify which slice of the data each individual task should transfer. Forcing Sqoop to run a single map task with the --num-mappers 1 parameter removes the need for splitting, but it does not remove the need for the placeholder, and such a limitation would have a severe performance impact on large transfers.
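To make the substitution concrete, here is a small shell sketch that prints what the per-task queries could look like once the placeholder is replaced. It is only an illustration: the helper per_mapper_queries is hypothetical, the range 1 to 100 for custid is made up, and the boundary arithmetic is a simplification rather than Sqoop's exact splitting algorithm. It also assumes the value range is at least as large as the number of mappers.

```shell
#!/bin/sh
# Illustration only: a simplified imitation of how the $CONDITIONS
# placeholder could be replaced with one range predicate per map task.
# The boundary arithmetic is an assumption, not Sqoop's exact algorithm.

# Single quotes keep the shell from expanding $CONDITIONS.
query='select * from customer_mapper_time where $CONDITIONS'

# Hypothetical helper: print one query per mapper.
# Arguments: split column, min value, max value, number of mappers.
per_mapper_queries() {
  col=$1; min=$2; max=$3; mappers=$4
  step=$(( (max - min) / mappers ))
  i=0
  while [ "$i" -lt "$mappers" ]; do
    lo=$(( min + i * step ))
    if [ "$i" -eq $(( mappers - 1 )) ]; then
      # The last slice includes the maximum value.
      cond="( $col >= $lo ) AND ( $col <= $max )"
    else
      cond="( $col >= $lo ) AND ( $col < $(( lo + step )) )"
    fi
    # Replace the literal text $CONDITIONS with this mapper's predicate.
    printf '%s\n' "$query" | sed "s/[$]CONDITIONS/$cond/"
    i=$(( i + 1 ))
  done
}

per_mapper_queries custid 1 100 4
```

Each printed line stands for the query one map task would run, so the four tasks together cover the whole range without overlapping.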

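Even when we use a single mapper, $CONDITIONS is mandatory with the --query parameter. Before importing, Sqoop replaces $CONDITIONS with (1 = 0) and runs the query once. This is only to get the query definition (the column metadata, without data) and is not related to parallelism. For the actual transfer with one mapper, the placeholder is replaced with an always-true condition such as (1 = 1), so all matching rows are returned.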
Example:
sqoop import \
--connect jdbc:mysql://localhost:3306/increment \
--username root --password pass \
-m 1 \
--query 'select * from customer_mapper_time where custid = 1 and $CONDITIONS' \
--target-dir /user/cloudera/conditions

Sqoop first runs this query as shown below to get the definition of the query (without data records):
select * from customer_mapper_time where custid = 1 and (1 = 0)
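
For comparison, a parallel variant of the same kind of import might look like the sketch below. It assumes custid is a reasonably evenly distributed numeric column and uses a made-up target directory. When --query is used with more than one mapper, Sqoop needs a --split-by column to build the range conditions that replace $CONDITIONS.

```shell
# Sketch only: the split column choice and the target directory are assumptions.
sqoop import \
  --connect jdbc:mysql://localhost:3306/increment \
  --username root --password pass \
  --num-mappers 4 \
  --split-by custid \
  --query 'select * from customer_mapper_time where $CONDITIONS' \
  --target-dir /user/cloudera/conditions_parallel
```

The query is kept in single quotes so that the shell passes $CONDITIONS through to Sqoop unchanged. With double quotes it would have to be written as \$CONDITIONS.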