A simple sqoop export failed with an unknown error, and I couldn't find the log


#1

I have the following data on HDFS:

[paslechoix@gw03 ~]$ hdfs dfs -cat departments_new/*
10,physicss,2018-01-31 22:24:18.0
11,chemistry,2018-01-31 22:24:18.0
12,math,2018-01-31 22:24:18.0
13,science,2018-01-31 22:24:18.0
14,engineering,2018-01-31 22:24:18.0
10,physicss,2018-01-31 22:24:18.0
11,chemistry,2018-01-31 22:24:18.0
12,math,2018-01-31 22:24:18.0
13,science,2018-01-31 22:24:18.0
14,engineering,2018-01-31 22:24:18.0
110,Civil,2018-01-31 22:32:41.0
111,Mechanical,2018-01-31 22:32:49.0
112,Automobile,2018-01-31 22:32:49.0
113,Pharma,2018-01-31 22:32:49.0
114,social engineering,2018-01-31 22:32:52.0
10,physicss,2018-01-31 22:24:18.0
11,chemistry,2018-01-31 22:24:18.0
12,math,2018-01-31 22:24:18.0
13,science,2018-01-31 22:24:18.0
14,engineering,2018-01-31 22:24:18.0
110,Civil,2018-01-31 22:32:41.0
111,Mechanical,2018-01-31 22:32:49.0
112,Automobile,2018-01-31 22:32:49.0
113,Pharma,2018-01-31 22:32:49.0
114,social engineering,2018-01-31 22:32:52.0
[paslechoix@gw03 ~]$

I created a new table in the retail_export database on MySQL:

mysql> create table dep_export (dep_id int, dep_name varchar(20), createdon date);
Query OK, 0 rows affected (0.00 sec)

mysql> desc dep_export;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| dep_id    | int(11)     | YES  |     | NULL    |       |
| dep_name  | varchar(20) | YES  |     | NULL    |       |
| createdon | date        | YES  |     | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
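(In hindsight, one thing I wasn't sure about: the createdon values in the files carry a time component, like 2018-01-31 22:24:18.0, and I don't know whether a DATE column can parse that during export. If it can't, declaring the column as TIMESTAMP might be the safer choice — this is an assumption on my part, not something I've confirmed from the task logs:)

```sql
-- Assumption: values like '2018-01-31 22:24:18.0' include a time component,
-- which a DATE column may fail to parse during export; a TIMESTAMP column
-- accepts them as-is.
CREATE TABLE dep_export (
  dep_id    INT,
  dep_name  VARCHAR(20),
  createdon TIMESTAMP
);
```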

Now I want to export the data from HDFS to MySQL using sqoop export. Here is the script:

sqoop export \
  --connect jdbc:mysql://ms.itversity.com/retail_export \
  --username=retail_user \
  --password=itversity \
  --table dep_export \
  --export-dir departments_new

The sqoop export started but failed; the output is below:
[paslechoix@gw03 ~]$ sqoop export \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username=retail_user \
--password=itversity \
--table dep_export \
--export-dir departments_new
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/02/23 23:54:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245
18/02/23 23:54:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/02/23 23:54:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/02/23 23:54:39 INFO tool.CodeGenTool: Beginning code generation
18/02/23 23:54:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM dep_export AS t LIMIT 1
18/02/23 23:54:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM dep_export AS t LIMIT 1
18/02/23 23:54:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.0.0-1245/hadoop-mapreduce
Note: /tmp/sqoop-paslechoix/compile/054e70232380bec119d1681814932e4d/dep_export.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/02/23 23:54:41 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-paslechoix/compile/054e70232380bec119d1681814932e4d/dep_export.jar
18/02/23 23:54:41 INFO mapreduce.ExportJobBase: Beginning export of dep_export
18/02/23 23:54:43 INFO impl.TimelineClientImpl: Timeline service address: http://rm01.itversity.com:8188/ws/v1/timeline/
18/02/23 23:54:43 INFO client.RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
18/02/23 23:54:43 INFO client.AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
18/02/23 23:55:12 INFO input.FileInputFormat: Total input paths to process : 3
18/02/23 23:55:12 INFO input.FileInputFormat: Total input paths to process : 3
18/02/23 23:55:12 INFO mapreduce.JobSubmitter: number of splits:3
18/02/23 23:55:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517228278761_14994
18/02/23 23:55:13 INFO impl.YarnClientImpl: Submitted application application_1517228278761_14994
18/02/23 23:55:13 INFO mapreduce.Job: The url to track the job: http://rm01.itversity.com:8088/proxy/application_1517228278761_14994/
18/02/23 23:55:13 INFO mapreduce.Job: Running job: job_1517228278761_14994
18/02/23 23:55:22 INFO mapreduce.Job: Job job_1517228278761_14994 running in uber mode : false
18/02/23 23:55:22 INFO mapreduce.Job: map 0% reduce 0%
18/02/23 23:55:30 INFO mapreduce.Job: map 100% reduce 0%
18/02/23 23:55:31 INFO mapreduce.Job: Job job_1517228278761_14994 failed with state FAILED due to: Task failed task_1517228278761_14994_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0

18/02/23 23:55:31 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=1
Killed map tasks=2
Launched map tasks=3
Other local map tasks=1
Rack-local map tasks=2
Total time spent by all maps in occupied slots (ms)=28772
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=14386
Total vcore-milliseconds taken by all map tasks=14386
Total megabyte-milliseconds taken by all map tasks=29462528
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
18/02/23 23:55:31 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/02/23 23:55:31 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 48.88 seconds (0 bytes/sec)
18/02/23 23:55:31 INFO mapreduce.ExportJobBase: Exported 0 records.
18/02/23 23:55:31 ERROR mapreduce.ExportJobBase: Export job failed!
18/02/23 23:55:31 ERROR tool.ExportTool: Error during export: Export job failed!

Can anyone point out where the problem is?

I also tried to check the log, but I don't see any log files under /var/log/sqoop. Where is the log location? Can we see the logs in the lab?
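(Something I found while searching: Sqoop itself only prints the client-side output shown above; the per-task errors land in the YARN container logs. Assuming log aggregation is enabled on the lab cluster, they can be pulled with the application id from the Sqoop output:)

```shell
# Fetch the aggregated container logs for the failed job. The application id
# is the one printed in the Sqoop output above; requires YARN log
# aggregation to be enabled on the cluster.
yarn logs -applicationId application_1517228278761_14994 | less
```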

Thanks.


#2

@paslechoix

The port number is missing.
Try the command below:

sqoop export \
  --connect jdbc:mysql://ms.itversity.com:3306/retail_export \
  --username retail_user \
  --password itversity \
  --table dep_export \
  --export-dir departments_new \
  --input-lines-terminated-by '\n' \
  --input-fields-terminated-by '|'


#3

Thank you for your reply, but the command doesn't work. By the way, I've noticed that the port number is not needed.
Here is the error output:
18/02/24 06:40:47 INFO mapreduce.JobSubmitter: number of splits:3
18/02/24 06:40:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517228278761_15132
18/02/24 06:40:48 INFO impl.YarnClientImpl: Submitted application application_1517228278761_15132
18/02/24 06:40:48 INFO mapreduce.Job: The url to track the job: http://rm01.itversity.com:8088/proxy/application_1517228278761_15132/
18/02/24 06:40:48 INFO mapreduce.Job: Running job: job_1517228278761_15132
18/02/24 06:40:57 INFO mapreduce.Job: Job job_1517228278761_15132 running in uber mode : false
18/02/24 06:40:57 INFO mapreduce.Job: map 0% reduce 0%
18/02/24 06:41:05 INFO mapreduce.Job: map 100% reduce 0%
18/02/24 06:41:07 INFO mapreduce.Job: Job job_1517228278761_15132 failed with state FAILED due to: Task failed task_1517228278761_15132_m_000002
Job failed as tasks failed. failedMaps:1 failedReduces:0

18/02/24 06:41:07 INFO mapreduce.Job: Counters: 9
Job Counters
Failed map tasks=3
Launched map tasks=3
Other local map tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=38124
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=19062
Total vcore-milliseconds taken by all map tasks=19062
Total megabyte-milliseconds taken by all map tasks=39038976
18/02/24 06:41:07 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/02/24 06:41:07 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 47.4853 seconds (0 bytes/sec)
18/02/24 06:41:07 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
18/02/24 06:41:07 INFO mapreduce.ExportJobBase: Exported 0 records.
18/02/24 06:41:07 ERROR mapreduce.ExportJobBase: Export job failed!
18/02/24 06:41:07 ERROR tool.ExportTool: Error during export: Export job failed!

Thank you.


#4

I dug out more details on the error from the failed map task, but I still don't understand what happened behind the scenes or how to fix it:

Error: java.io.IOException: Can't export data, please check failed map task logs
	at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:122)
	at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: Can't parse input data: '10,physicss,2018-01-31 22:24:18.0'
	at dep_export.__loadFromFields(dep_export.java:316)
	at dep_export.parse(dep_export.java:254)
	at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:89)
	... 10 more
Caused by: java.lang.NumberFormatException: For input string: "10,physicss,2018-01-31 22:24:18.0"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.valueOf(Integer.java:766)
	at dep_export.__loadFromFields(dep_export.java:303)
	... 12 more
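If I read this right (an assumption on my part), the generated dep_export.java splits each input line on the configured field delimiter and then parses each field. With --input-fields-terminated-by '|', a comma-separated line never splits, so the whole line reaches Integer.valueOf and throws the NumberFormatException above. A quick Python sketch of that behaviour (names here are illustrative, not Sqoop's actual code):

```python
# Illustrative sketch of how the generated record class splits and parses
# one input line; parse_record is a made-up name, not a Sqoop API.
def parse_record(line, delimiter):
    fields = line.split(delimiter)
    # mirrors Integer.valueOf(...) on the first field in dep_export.java
    dep_id = int(fields[0])
    dep_name, createdon = fields[1], fields[2]
    return dep_id, dep_name, createdon

line = "10,physicss,2018-01-31 22:24:18.0"

# With '|' the line never splits, so int() sees the whole line and raises,
# matching the NumberFormatException in the task log.
try:
    parse_record(line, "|")
except ValueError as e:
    print("parse failed:", e)

# With ',' — Sqoop's default input field delimiter — the line parses fine.
print(parse_record(line, ","))
```

So re-running with --input-fields-terminated-by ',' (or simply dropping that option, since comma is the default) should at least get past this parse error.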