Sqoop Export - .csv file

Hi All,

I’m trying to export a .csv file from an HDFS location to a MySQL table.

Step 1: put the file from the local filesystem into HDFS.
Step 2: create the table in MySQL.
Step 3: export the data from HDFS into the MySQL table (the command I used is below; a sketch of steps 1 and 2 follows).
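
For reference, steps 1 and 2 might look like the commands below. This is only a sketch: the local file path and the movies column names/types are assumptions, since the actual DDL isn’t shown in this thread.

# Step 1: copy the local .csv into the HDFS export directory (local path is illustrative)
hadoop fs -mkdir -p /home/sqoopex/movielens/data/movies
hadoop fs -put /home/cloudera/movies.csv /home/sqoopex/movielens/data/movies/

# Step 2: create the target table in MySQL (column names and types are assumed)
mysql -u root -pcloudera movielens -e "CREATE TABLE movies (movie_id INT, title VARCHAR(255), genres VARCHAR(255))"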

sqoop export \
  --connect "jdbc:mysql://quickstart.cloudera:3306/movielens" \
  -m 1 \
  --username root --password cloudera \
  --table movies \
  --export-dir /home/sqoopex/movielens/data/movies \
  --input-fields-terminated-by "," \
  --input-lines-terminated-by '\n'

Error:
17/01/20 02:43:04 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/01/20 02:43:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484889594587_0023
17/01/20 02:43:04 INFO impl.YarnClientImpl: Submitted application application_1484889594587_0023
17/01/20 02:43:04 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1484889594587_0023/
17/01/20 02:43:04 INFO mapreduce.Job: Running job: job_1484889594587_0023
17/01/20 02:43:16 INFO mapreduce.Job: Job job_1484889594587_0023 running in uber mode : false
17/01/20 02:43:16 INFO mapreduce.Job: map 0% reduce 0%
17/01/20 02:43:26 INFO mapreduce.Job: map 100% reduce 0%
17/01/20 02:43:26 INFO mapreduce.Job: Job job_1484889594587_0023 failed with state FAILED due to: Task failed task_1484889594587_0023_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

17/01/20 02:43:26 INFO mapreduce.Job: Counters: 8
Job Counters
Failed map tasks=1
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=8024
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=8024
Total vcore-seconds taken by all map tasks=8024
Total megabyte-seconds taken by all map tasks=8216576
17/01/20 02:43:26 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
17/01/20 02:43:26 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 25.1481 seconds (0 bytes/sec)
17/01/20 02:43:26 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/01/20 02:43:26 INFO mapreduce.ExportJobBase: Exported 0 records.
17/01/20 02:43:26 ERROR tool.ExportTool: Error during export: Export job failed!

I get an error while doing this; the command I used and the error log are shown above.

@sowmya_nagaraj Apparently there is no syntax error. But verify once again that the contents of the given --export-dir are actually in .csv format; if not, the export will fail.

Yep, the data is in the proper format.

@sowmya_nagaraj OK, then please send the CREATE TABLE DDL of the movies table and some sample records from your export-dir.
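
One way to gather both, using the paths and credentials already shown in this thread (illustrative, not the only way):

# Dump the table definition from MySQL
mysql -u root -pcloudera movielens -e "SHOW CREATE TABLE movies"

# Print the first few records from the export directory in HDFS
hadoop fs -cat /home/sqoopex/movielens/data/movies/* | head -5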

Have you verified the task-level logs at this link: http://quickstart.cloudera:8088/proxy/application_1484889594587_0023/

This is a good exercise in troubleshooting by reading the logs.
Click on the URL and you will see the failed tasks. Click on those, go to the logs, and dig into the details (or pull the same logs from the command line, as sketched below).
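
If you prefer the command line, the same container logs can usually be retrieved with yarn logs, using the application ID from the job output above (log aggregation must be enabled for this to work):

# Fetch the aggregated task logs for the failed job
yarn logs -applicationId application_1484889594587_0023 | less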

Please confirm whether you are able to figure out the issue.

Thank you for the suggestion. The error is caused by the header line in the file in the HDFS location. How do I skip the header line while exporting the data to the MySQL table?


First, use the following command to remove the header line from the file. The filename appears to have been eaten by the forum formatting in the original reply, so <your-file> below is a placeholder. Note that sed -i only edits a local file, so run it before putting the file into HDFS:

sed -i '1d' <your-file>.csv
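
If the file is already in HDFS, sed -i cannot edit it in place there. One workaround (a sketch, with illustrative paths) is to stream it through sed and write a header-less copy back to HDFS:

# Strip the first line of the HDFS file and write a header-less copy to a new directory
hadoop fs -mkdir -p /home/sqoopex/movielens/data/movies_noheader
hadoop fs -cat /home/sqoopex/movielens/data/movies/movies.csv | sed '1d' | hadoop fs -put -f - /home/sqoopex/movielens/data/movies_noheader/movies.csv

As far as I know, Sqoop 1’s export tool has no built-in option to skip a header row, so the header has to be removed from the data itself before the export.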

Hi Sowmya,
Did you find a solution to skip the header record in the file during the Sqoop export? I’m also looking for one.