Sqoop Import - Execution Flow


Let us understand the execution flow of a Sqoop import.

  1. Get the metadata of the table (column names and data types) by running a simple query against the source database.
  2. Generate a POJO class for the table with appropriate getters and setters.
  3. Compile the POJO class and package it into a JAR file.
  4. Run the boundary values query, or a custom boundary query if one is specified, to get the minimum and maximum values of the split column (the primary key column by default).
  5. Compute the range of the split column as max - min.
  6. Divide the range by the number of mappers to get the split size, and use it to compute the splits.
  7. Submit the MapReduce job. The number of mappers is 4 by default.
  8. Each map task runs a SELECT query on the source table, with a WHERE condition based on its split, to read its share of the data.
  9. The data is written to files in the specified location.
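
The split arithmetic in steps 4 to 8 can be sketched in Python. This is a simplified illustration for an integer split column, not Sqoop's actual implementation: the function names `compute_splits` and `where_clauses`, the column name `order_id`, and the boundary values 1 and 100 are all hypothetical, and real Sqoop handles other column types and may distribute remainders differently.

```python
def compute_splits(min_val, max_val, num_mappers=4):
    """Return (low, high) integer ranges that together cover [min_val, max_val]."""
    if min_val == max_val:
        # Only one distinct value: a single split is enough.
        return [(min_val, max_val)]
    # Step 5 and 6: range divided by the number of mappers (at least 1).
    split_size = max((max_val - min_val) // num_mappers, 1)
    # Split points start at min and advance by split_size; the last point is max.
    points = list(range(min_val, max_val, split_size))[:num_mappers]
    points.append(max_val)
    return list(zip(points[:-1], points[1:]))


def where_clauses(column, splits):
    """Build one WHERE condition per split, as each map task would use (step 8)."""
    clauses = []
    for i, (low, high) in enumerate(splits):
        # The last split includes the upper bound so the max row is not dropped.
        upper_op = "<=" if i == len(splits) - 1 else "<"
        clauses.append(f"{column} >= {low} AND {column} {upper_op} {high}")
    return clauses


if __name__ == "__main__":
    # Hypothetical boundary query result: min = 1, max = 100.
    for clause in where_clauses("order_id", compute_splits(1, 100)):
        print(clause)
```

With these assumed values the range is 99 and the split size is 24, so the four map tasks would read `order_id` in [1, 25), [25, 49), [49, 73) and [73, 100]. Only the last split includes its upper bound, which keeps every row in exactly one split.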