Using the --query argument in sqoop import

Hello,
I have been following the itversity CCA-175 YouTube videos and I have a doubt about sqoop import with the --query argument.
sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --query "SELECT order_items.order_item_product_id, orders.order_status FROM orders INNER JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE $CONDITIONS" --target-dir /user/cloudera/order_join1 --split-by order_id --num-mappers 4

When I run the above query in MySQL and with sqoop eval, it works fine, but when I use it with sqoop import I get the error below:

[cloudera@quickstart ~]$ sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --query "SELECT order_items.order_item_product_id, orders.order_status FROM orders INNER JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE $CONDITIONS" --target-dir /user/cloudera/order_join1 --split-by order_id --num-mappers 4
Warning: /usr/lib/sqoop/…/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/01/15 14:34:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
17/01/15 14:34:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/15 14:35:00 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/01/15 14:35:00 INFO tool.CodeGenTool: Beginning code generation
17/01/15 14:35:02 INFO manager.SqlManager: Executing SQL statement: SELECT order_items.order_item_product_id, orders.order_status FROM orders INNER JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE (1 = 0)
17/01/15 14:35:02 INFO manager.SqlManager: Executing SQL statement: SELECT order_items.order_item_product_id, orders.order_status FROM orders INNER JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE (1 = 0)
17/01/15 14:35:03 INFO manager.SqlManager: Executing SQL statement: SELECT order_items.order_item_product_id, orders.order_status FROM orders INNER JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE (1 = 0)
17/01/15 14:35:03 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/f6cf89b54d33e5676419b1646a648100/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/01/15 14:35:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/f6cf89b54d33e5676419b1646a648100/QueryResult.jar
17/01/15 14:35:10 INFO mapreduce.ImportJobBase: Beginning query import.
17/01/15 14:35:12 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/01/15 14:35:15 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/01/15 14:35:16 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/10.0.2.15:8032
17/01/15 14:35:20 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:862)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeInternal(DFSOutputStream.java:830)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:826)
17/01/15 14:35:23 INFO db.DBInputFormat: Using read commited transaction isolation
17/01/15 14:35:23 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(order_id), MAX(order_id) FROM (SELECT order_items.order_item_product_id, orders.order_status FROM orders INNER JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE (1 = 1) ) AS t1
17/01/15 14:35:24 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/cloudera/.staging/job_1484512313628_0005
17/01/15 14:35:24 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'order_id' in 'field list'
17/01/15 14:35:24 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'order_id' in 'field list'
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:207)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:305)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:203)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:176)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:273)
at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:748)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:509)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'order_id' in 'field list'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
at com.mysql.jdbc.Util.getInstance(Util.java:360)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:978)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2526)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1446)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:178)
… 22 more

Can anyone please help me find the mistake in my command, and also explain why we have to use WHERE $CONDITIONS in the query?

Thank you
Chandana

@chandana204

Whenever Sqoop is given --split-by "column_name", it executes a BoundingValsQuery to work out the split ranges for the mappers.

In your case, the SELECT list does not include the split-by column, i.e. order_id. The bounding query runs against the derived table built from your SELECT (you can see it in your log above), so it cannot find that column and throws an exception.

If you include order_id in the SELECT, it solves the problem :)

sqoop import --connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" \
  --username retail_dba \
  --password itversity \
  --query 'SELECT T2.order_item_product_id, T1.order_status FROM orders T1 INNER JOIN order_items T2 ON T1.order_id = T2.order_item_order_id WHERE $CONDITIONS' \
  --target-dir /user/email2dgk/review/mysql/orders_n_items \
  --split-by T1.order_id \
  --num-mappers 4

Snippet from the Logs:
17/01/15 21:46:36 INFO mapreduce.ImportJobBase: Transferred 3.4094 MB in 19.1158 seconds (182.6334 KB/sec)
17/01/15 21:46:36 INFO mapreduce.ImportJobBase: Retrieved 172198 records.
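
If you want to sanity-check the output after the import finishes, the usual HDFS commands are enough (the path here is just the target directory from the command above; adjust it to your own):

hadoop fs -ls /user/email2dgk/review/mysql/orders_n_items
hadoop fs -cat /user/email2dgk/review/mysql/orders_n_items/part-m-00000 | head -5

Sqoop writes one part-m-* file per mapper, so with --num-mappers 4 you should see four of them.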

Thank you for the response. But I am still getting an error after including orders.order_id in --split-by.

I ran the command as below:
sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --query "select order_items.order_item_product_id, orders.order_status from orders inner join order_items on orders.order_id=order_items.order_item_order_id where $CONDITIONS" \
  --target-dir /user/cloudera/order_join1 \
  --split-by orders.order_id \
  --num-mappers 4

The error shows up as below:

[cloudera@quickstart ~]$ sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --query "select order_items.order_item_product_id, orders.order_status from orders inner join order_items on orders.order_id=order_items.order_item_order_id where $CONDITIONS" --target-dir /user/cloudera/order_join1 --split-by orders.order_id --num-mappers 4
Warning: /usr/lib/sqoop/…/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/01/16 07:14:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
17/01/16 07:14:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/16 07:14:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/01/16 07:14:30 INFO tool.CodeGenTool: Beginning code generation
17/01/16 07:14:33 INFO manager.SqlManager: Executing SQL statement: select order_items.order_item_product_id, orders.order_status from orders inner join order_items on orders.order_id=order_items.order_item_order_id where (1 = 0)
17/01/16 07:14:33 INFO manager.SqlManager: Executing SQL statement: select order_items.order_item_product_id, orders.order_status from orders inner join order_items on orders.order_id=order_items.order_item_order_id where (1 = 0)
17/01/16 07:14:33 INFO manager.SqlManager: Executing SQL statement: select order_items.order_item_product_id, orders.order_status from orders inner join order_items on orders.order_id=order_items.order_item_order_id where (1 = 0)
17/01/16 07:14:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/0c55b0d6db1e4ea8f9b80ba6a01b900a/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/01/16 07:14:41 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/0c55b0d6db1e4ea8f9b80ba6a01b900a/QueryResult.jar
17/01/16 07:14:41 INFO mapreduce.ImportJobBase: Beginning query import.
17/01/16 07:14:42 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/01/16 07:14:45 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/01/16 07:14:45 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/10.0.2.15:8032
17/01/16 07:14:50 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:862)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:600)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:789)
17/01/16 07:14:51 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:862)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:600)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:789)
17/01/16 07:14:51 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:862)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:600)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:789)
17/01/16 07:14:53 INFO db.DBInputFormat: Using read commited transaction isolation
17/01/16 07:14:53 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(t1.order_id), MAX(t1.order_id) FROM (select order_items.order_item_product_id, orders.order_status from orders inner join order_items on orders.order_id=order_items.order_item_order_id where (1 = 1) ) AS t1
17/01/16 07:14:54 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/cloudera/.staging/job_1484579099702_0004
17/01/16 07:14:54 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 't1.order_id' in 'field list'
17/01/16 07:14:54 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 't1.order_id' in 'field list'
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:207)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:305)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:203)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:176)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:273)
at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:748)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:509)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 't1.order_id' in 'field list'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
at com.mysql.jdbc.Util.getInstance(Util.java:360)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:978)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2526)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1446)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:178)
… 22 more

@chandana204 I meant that you need to include order_id in the SELECT list itself, not only in --split-by.
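
Put together, the corrected command would look roughly like this (an untested sketch reusing the connect string and target directory from your posts; the key change is that orders.order_id is now part of the SELECT list and --split-by refers to that projected column, with the query in single quotes so the shell does not expand $CONDITIONS; change the target directory if it already exists in HDFS):

sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username retail_dba \
  --password cloudera \
  --query 'SELECT orders.order_id, order_items.order_item_product_id, orders.order_status FROM orders INNER JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE $CONDITIONS' \
  --target-dir /user/cloudera/order_join1 \
  --split-by order_id \
  --num-mappers 4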

Thank you, I got the point.