Question about executing --boundary-query in Sqoop import

[rkathiravan@gw01 ~]$ sqoop-import --connect "jdbc:mysql://" \
--username retail_dba \
--password itversity \
--table categories \
--target-dir "/user/rkathiravan/rkathiravan_sqoop_import/categories" \
-m 8 \
--outdir javafiles \
--fields-terminated-by '|' \
--lines-terminated-by '\n' \
--boundary-query "select 1,45 from categories" \
--columns category_id,category_department_id,category_name
Warning: /usr/hdp/ does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/03/01 16:50:32 INFO sqoop.Sqoop: Running Sqoop version:
17/03/01 16:50:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/03/01 16:50:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/03/01 16:50:32 INFO tool.CodeGenTool: Beginning code generation
17/03/01 16:50:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM categories AS t LIMIT 1
17/03/01 16:50:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM categories AS t LIMIT 1
17/03/01 16:50:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/
Note: /tmp/sqoop-rkathiravan/compile/398b447c4815dced3930f4c310d6bd3c/ uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/03/01 16:50:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-rkathiravan/compile/398b447c4815dced3930f4c310d6bd3c/categories.jar
17/03/01 16:50:34 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/03/01 16:50:34 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/03/01 16:50:34 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/03/01 16:50:34 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/03/01 16:50:34 INFO mapreduce.ImportJobBase: Beginning import of categories
17/03/01 16:50:34 WARN db.DataDrivenDBInputFormat: Could not find $CONDITIONS token in query: select 1,45 from categories ; splits may not partition data.
17/03/01 16:50:36 INFO impl.TimelineClientImpl: Timeline service address:
17/03/01 16:50:36 INFO client.RMProxy: Connecting to ResourceManager at
17/03/01 16:50:36 INFO client.AHSProxy: Connecting to Application History server at
17/03/01 16:50:41 INFO db.DBInputFormat: Using read commited transaction isolation
17/03/01 16:50:41 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: select 1,45 from categories
17/03/01 16:50:41 INFO db.IntegerSplitter: Split size: 5; Num splits: 8 from: 1 to: 45
17/03/01 16:50:41 INFO mapreduce.JobSubmitter: number of splits:8
17/03/01 16:50:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1485099329996_7773
17/03/01 16:50:42 INFO impl.YarnClientImpl: Submitted application application_1485099329996_7773
17/03/01 16:50:42 INFO mapreduce.Job: The url to track the job:
17/03/01 16:50:42 INFO mapreduce.Job: Running job: job_1485099329996_7773
17/03/01 16:50:48 INFO mapreduce.Job: Job job_1485099329996_7773 running in uber mode : false
17/03/01 16:50:48 INFO mapreduce.Job: map 0% reduce 0%
17/03/01 16:50:53 INFO mapreduce.Job: map 13% reduce 0%
17/03/01 16:50:54 INFO mapreduce.Job: map 75% reduce 0%
17/03/01 16:50:55 INFO mapreduce.Job: map 100% reduce 0%
17/03/01 16:50:56 INFO mapreduce.Job: Job job_1485099329996_7773 completed successfully
17/03/01 16:50:56 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=1282176
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=942
HDFS: Number of bytes written=837
HDFS: Number of read operations=32
HDFS: Number of large read operations=0
HDFS: Number of write operations=16
Job Counters
Launched map tasks=8
Other local map tasks=8
Total time spent by all maps in occupied slots (ms)=54496
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=27248
Total vcore-milliseconds taken by all map tasks=27248
Total megabyte-milliseconds taken by all map tasks=41852928
Map-Reduce Framework
Map input records=45
Map output records=45
Input split bytes=942
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=626
CPU time spent (ms)=9670
Physical memory (bytes) snapshot=1976111104
Virtual memory (bytes) snapshot=26113978368
Total committed heap usage (bytes)=1641021440
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=837
17/03/01 16:50:56 INFO mapreduce.ImportJobBase: Transferred 837 bytes in 21.2073 seconds (39.4675 bytes/sec)
17/03/01 16:50:56 INFO mapreduce.ImportJobBase: Retrieved 45 records.
17/03/01 16:50:56 INFO util.AppendUtils: Appending to directory categories

What does "splits may not partition data" mean here?
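As context for that warning: in Sqoop's documented free-form query imports, the $CONDITIONS placeholder is where each mapper's split predicate gets substituted, and Sqoop warns when it cannot find that token in a query it is asked to partition. A sketch of the free-form query form, where the host, database, and column list are placeholder assumptions (note the single quotes, so the shell does not expand $CONDITIONS):

```sh
sqoop-import --connect "jdbc:mysql://<host>/<db>" \
  --username retail_dba \
  --password itversity \
  --query 'SELECT category_id, category_name FROM categories WHERE $CONDITIONS' \
  --split-by category_id \
  --target-dir /user/rkathiravan/rkathiravan_sqoop_import/categories \
  -m 8
```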

I am facing the same issue. When I inspected the output with the -cat command, I found that the row referenced in the boundary-query condition was absent, which is very strange behavior.
Could someone please help?

@Revathi_K Were you able to resolve the issue? If yes, kindly share how.
Thank you.

The boundary query should be used only on primary-key or unique-key columns. I am not sure what "splits may not partition data" means either.
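For reference, the boundary query is conventionally written to return the true MIN and MAX of the split column, so the generated ranges cover every row instead of the hard-coded 1 and 45 used above. A sketch of that form, reusing the options from the original post; the connect string host and database are placeholders:

```sh
sqoop-import --connect "jdbc:mysql://<host>/<db>" \
  --username retail_dba \
  --password itversity \
  --table categories \
  --split-by category_id \
  --boundary-query "SELECT MIN(category_id), MAX(category_id) FROM categories" \
  --target-dir /user/rkathiravan/rkathiravan_sqoop_import/categories \
  -m 8
```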


I am using the boundary query on the primary key and am still facing the issue: one record that falls within the boundary-query range is missing from the output files, and the splits are not created properly.

If anyone knows the root cause, kindly post it.
Thank you.
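One likely cause of missing records: the two values returned by the boundary query become hard lower and upper bounds, and every mapper's WHERE clause is derived from them, so any row whose split-column value falls outside those bounds is never selected by any mapper. The sketch below is a simplified illustration of that carving-up, not Sqoop's exact IntegerSplitter code; the range endpoints and WHERE shapes are assumptions for illustration, though the computed split size of 5 does match the "Split size: 5" line in the log above.

```shell
# Simplified illustration: carve the hard-coded bounds [1, 45]
# into contiguous ranges for 8 mappers, the way a data-driven
# split roughly works.
min=1
max=45
mappers=8

size=$(( (max - min) / mappers ))   # integer division: 44 / 8 = 5
lo=$min
for i in $(seq 1 $mappers); do
  if [ "$i" -eq "$mappers" ]; then
    hi=$max   # last split is clamped to the upper bound
    echo "mapper $i: WHERE category_id >= $lo AND category_id <= $hi"
  else
    hi=$(( lo + size ))
    echo "mapper $i: WHERE category_id >= $lo AND category_id < $hi"
  fi
  lo=$hi
done
# No mapper's range reaches below 1 or above 45, so a row with
# category_id outside [1, 45] silently never appears in the output.
```

This is why pointing --boundary-query at the real MIN/MAX of the split column (rather than literals) is the safer pattern.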

@mharkhani - Could you please post the sqoop import command you are trying? Are you trying in ?