Why do we use WHERE $CONDITIONS in Sqoop - EXPLAINED and ANSWERED


#1

This is an FYI…

We have often seen WHERE $CONDITIONS in Sqoop, and I personally didn't know what it was used for until I recently found out.

Explanation: Sqoop performs highly efficient data transfers by inheriting Hadoop's parallelism. To help Sqoop split your query into multiple chunks that can be transferred in parallel, you need to include the $CONDITIONS placeholder in the WHERE clause of your query. Sqoop will automatically substitute this placeholder with generated conditions specifying which slice of data each individual task should transfer. While you could skip $CONDITIONS by forcing Sqoop to run only one map task using the --num-mappers 1 parameter, such a limitation would have a severe performance impact.
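
To make the substitution concrete, here is a rough Python sketch of the idea. It is not Sqoop's actual code, just a simplified illustration assuming an integer split column whose minimum and maximum are already known. The table name employees, the column id, and the helper names split_conditions and expand_query are all made up for this example.

```python
def split_conditions(column, lo, hi, num_mappers):
    """Return one WHERE fragment per mapper, together covering [lo, hi].

    Simplified illustration only: assumes an integer split column and
    evenly sized ranges. Sqoop's real splitter handles more cases.
    """
    # Boundaries of the slices, e.g. 0, 250, 500, 750, 1000 for 4 mappers.
    bounds = [lo + (hi - lo) * i // num_mappers for i in range(num_mappers + 1)]
    conditions = []
    for i in range(num_mappers):
        # The last slice includes the upper bound so the max row is not lost.
        upper_op = "<=" if i == num_mappers - 1 else "<"
        conditions.append(
            f"{column} >= {bounds[i]} AND {column} {upper_op} {bounds[i + 1]}"
        )
    return conditions


def expand_query(query, column, lo, hi, num_mappers):
    """Replace the $CONDITIONS placeholder once per mapper."""
    return [
        query.replace("$CONDITIONS", cond)
        for cond in split_conditions(column, lo, hi, num_mappers)
    ]


if __name__ == "__main__":
    # Hypothetical query; each printed line is what one task would run.
    template = "SELECT id, name FROM employees WHERE $CONDITIONS"
    for q in expand_query(template, "id", 0, 1000, 4):
        print(q)
```

With four mappers and ids from 0 to 1000, each task gets its own range (0 to 249, 250 to 499, and so on), so no two tasks read the same rows. With a single mapper there is only one range covering everything, which is why the placeholder stops mattering for splitting in that case.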