Sqoop import-all-tables to Hive

Hi,
I am executing the below Sqoop command to import all tables into Hive:
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SanppyCodec \
--outdir java_files

This starts executing and then gets stuck at the point below.

@gauravfrankly - You have a typo in SnappyCodec (you wrote SanppyCodec); see if fixing that resolves it:
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
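For reference, the full command with just the codec name corrected would look something like this (same quickstart connection details as in your post):

sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_files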


Gaurav,

Were you able to resolve your issue? If so, could you please let me know the steps you followed to identify and fix the root cause?

All,

I am facing the same issue. The Sqoop import starts and creates the files under /user/cloudera, then gets stuck after the "Logging initialized" stage. I don't see any tables created under the default database in Hive. Is there any log we can check to identify the issue?


[cloudera@quickstart ~]$ sqoop import --num-mappers 1 --connect "jdbc:mysql://localhost:3306/retail_db" --username retail_dba --password cloudera --table departments --hive-import --hive-overwrite -create-hive-table --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --outdir java_files
Warning: /usr/lib/sqoop/…/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/01/11 16:25:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
17/01/11 16:25:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/11 16:25:56 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/01/11 16:25:56 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/01/11 16:25:57 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/01/11 16:25:57 INFO tool.CodeGenTool: Beginning code generation
17/01/11 16:25:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM departments AS t LIMIT 1
17/01/11 16:25:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM departments AS t LIMIT 1
17/01/11 16:25:59 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/aa244f523f14fac6a93d4851b01e4a8c/departments.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/01/11 16:26:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/aa244f523f14fac6a93d4851b01e4a8c/departments.jar
17/01/11 16:26:09 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/01/11 16:26:09 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/01/11 16:26:09 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/01/11 16:26:09 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/01/11 16:26:09 INFO mapreduce.ImportJobBase: Beginning import of departments
17/01/11 16:26:11 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/01/11 16:26:15 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/01/11 16:26:16 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/192.168.56.102:8032
17/01/11 16:26:24 INFO db.DBInputFormat: Using read commited transaction isolation
17/01/11 16:26:24 INFO mapreduce.JobSubmitter: number of splits:1
17/01/11 16:26:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484180326003_0002
17/01/11 16:26:26 INFO impl.YarnClientImpl: Submitted application application_1484180326003_0002
17/01/11 16:26:27 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1484180326003_0002/
17/01/11 16:26:27 INFO mapreduce.Job: Running job: job_1484180326003_0002
17/01/11 16:26:54 INFO mapreduce.Job: Job job_1484180326003_0002 running in uber mode : false
17/01/11 16:26:54 INFO mapreduce.Job: map 0% reduce 0%
17/01/11 16:27:14 INFO mapreduce.Job: map 100% reduce 0%
17/01/11 16:27:14 INFO mapreduce.Job: Job job_1484180326003_0002 completed successfully
17/01/11 16:27:15 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=139592
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=98
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=2193664
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=17138
Total vcore-seconds taken by all map tasks=17138
Total megabyte-seconds taken by all map tasks=2193664
Map-Reduce Framework
Map input records=8
Map output records=8
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=88
CPU time spent (ms)=1630
Physical memory (bytes) snapshot=132718592
Virtual memory (bytes) snapshot=725782528
Total committed heap usage (bytes)=48234496
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=98
17/01/11 16:27:15 INFO mapreduce.ImportJobBase: Transferred 98 bytes in 59.07 seconds (1.659 bytes/sec)
17/01/11 16:27:15 INFO mapreduce.ImportJobBase: Retrieved 8 records.
17/01/11 16:27:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM departments AS t LIMIT 1
17/01/11 16:27:15 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/jars/hive-common-1.1.0-cdh5.7.0.jar!/hive-log4j.properties


Hi guys,

Try the below steps:

  1. Shut down the VM and start it again.
  2. Open Cloudera Manager and try restarting the services (HDFS, Impala, Hive, Sqoop and YARN); if you prefer the terminal, see the sketch below this list.
  3. Try the Sqoop import again.
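A minimal sketch for step 2 on the quickstart VM, assuming the stock CDH init-script service names (check with ls /etc/init.d if unsure):

# check the Hive and ZooKeeper daemons
sudo service hive-metastore status
sudo service hive-server2 status
sudo service zookeeper-server status
# restart anything that is stopped, e.g.
sudo service hive-metastore restart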

I guess this should work.

Thanks.

Thanks Santy.

HDFS, Impala, Hive, Sqoop and YARN are up, and the VM was also restarted, but I am still facing the same issue. Are there any logs we can look at to identify the exact issue? Thanks.

It didn't work for me in the VM.
@itversity
I am now using the below command in the bigdata-lab:
sqoop import-all-tables \
-m 1 \
--connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" \
--username=retail_dba \
--password=itversity \
--hive-import \
--hive-overwrite \
--hive-database grv_sqoop_import \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_files

I am getting the below error:
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/01/13 04:59:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245
17/01/13 04:59:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/13 04:59:01 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/01/13 04:59:01 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/01/13 04:59:01 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/01/13 04:59:02 INFO tool.CodeGenTool: Beginning code generation
17/01/13 04:59:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM categories AS t LIMIT 1
17/01/13 04:59:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM categories AS t LIMIT 1
17/01/13 04:59:02 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.0.0-1245/hadoop-mapreduce
Note: /tmp/sqoop-gauravfrankly/compile/6fb180103c8c0204ab6d257e82ff5c2f/categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/01/13 04:59:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-gauravfrankly/compile/6fb180103c8c0204ab6d257e82ff5c2f/categories.jar
17/01/13 04:59:03 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/01/13 04:59:03 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/01/13 04:59:03 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/01/13 04:59:03 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/01/13 04:59:03 INFO mapreduce.ImportJobBase: Beginning import of categories
17/01/13 04:59:05 INFO impl.TimelineClientImpl: Timeline service address: http://rm01.itversity.com:8188/ws/v1/timeline/
17/01/13 04:59:05 INFO client.RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
17/01/13 04:59:05 INFO client.AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
17/01/13 04:59:05 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://nn01.itversity.com:8020/user/gauravfrankly/categories already exists

Why is it looking at the /user/gauravfrankly location when I am passing my Hive database?
Please help me correct it.


Add --hive-home /apps/hive/warehouse and try once more.

@Avinash_Parida I have tried the below and it is also not working:
sqoop import-all-tables -m 1 --connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" --username=retail_dba --password=itversity --hive-home /apps/hive/warehouse/grv_sqoop_import.db --hive-import --hive-database grv_sqoop_import --hive-overwrite --create-hive-table --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --outdir java_files

Try with the below script:

sqoop import-all-tables \
-m 1 \
--connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" \
--username=retail_dba \
--password=itversity \
--hive-import \
--hive-home /apps/hive/warehouse \
--hive-overwrite \
--hive-database grv_sqoop_import \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_files

I just tried the script you gave and am facing the same issue:
sqoop import-all-tables \
-m 1 \
--connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" \
--username=retail_dba \
--password=itversity \
--hive-import \
--hive-home /apps/hive/warehouse \
--hive-overwrite \
--hive-database grv_sqoop_import \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_files

I've found the solution below, but I wanted to understand whether there is a better way to handle it.

Either you need to specify --warehouse-dir or --target-dir pointing to a location other than your default HDFS location,

or

you need to delete/remove the files from the default HDFS location before you perform a Hive import.

I got it done by removing /user/gauravfrankly/categories.
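Roughly, for example (the paths are just the ones from this thread, and the retail_db_stage staging directory name is made up; use whatever fits your setup):

# option 1: remove the leftover directory from the earlier failed run
hdfs dfs -rm -R /user/gauravfrankly/categories

# option 2: stage the import under a different parent directory with --warehouse-dir
sqoop import-all-tables \
-m 1 \
--connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" \
--username=retail_dba \
--password=itversity \
--warehouse-dir /user/gauravfrankly/retail_db_stage \
--hive-import \
--hive-database grv_sqoop_import \
--hive-overwrite \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_files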


@itversity

Please help. I am not able to import any table into Hive using Sqoop. The tables from MySQL are getting loaded into the temp space in Hadoop. However, once the process gets to the "Loading uploaded data into Hive" step, it just gets stuck. Is there any log I can look at to check why it is stuck?

Based on some comments on Stack Overflow, I have ensured that the hive-metastore service is running as well, without any luck. I have also tried removing the compression, just to rule that out, and confirmed that the files got created under /user/cloudera.

I would appreciate any feedback on this. Thanks.

Apart from HDFS, Impala, Hive, Sqoop and YARN, I also turned ZooKeeper on, as mentioned by chandana in the linked discussion group, and now the import is working fine.

In addition to these services, I also have Oozie, Spark and Hue turned on, so I am not sure whether ZooKeeper being off was the only culprit.

It would still be nice to know which logs to look at in order to identify the cause of such issues.
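A few places that are usually worth checking (a rough sketch; paths and service layout assume a default CDH/HDP install, so adjust for your cluster):

# container logs for the MapReduce half of the import (needs YARN log aggregation enabled);
# the application id is printed on the console, e.g. the one from the run above
yarn logs -applicationId application_1484180326003_0002

# the Hive CLI that Sqoop launches for the "Loading uploaded data into Hive" step
# writes hive.log under /tmp/<user> with the stock hive-log4j.properties
less /tmp/$USER/hive.log

# daemon logs (metastore, HiveServer2, YARN) usually live under /var/log
ls /var/log/hive /var/log/hadoop-yarn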

[manojkbig@gw01 ~]$ sqoop import-all-tables \
-m 1 \
--connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" \
--username=retail_dba \
--password=itversity \
--hive-import \
--hive-home /apps/hive/warehouse/manojk_sqoop_import.db
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/01/22 00:37:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245
17/01/22 00:37:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/22 00:37:11 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/01/22 00:37:11 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/01/22 00:37:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/01/22 00:37:11 INFO tool.CodeGenTool: Beginning code generation
17/01/22 00:37:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM categories AS t LIMIT 1
17/01/22 00:37:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM categories AS t LIMIT 1
17/01/22 00:37:11 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.0.0-1245/hadoop-mapreduce
Note: /tmp/sqoop-manojkbig/compile/83c3aaccb53fd210180d47da60307e6a/categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/01/22 00:37:13 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-manojkbig/compile/83c3aaccb53fd210180d47da60307e6a/categories.jar
17/01/22 00:37:13 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/01/22 00:37:13 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/01/22 00:37:13 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/01/22 00:37:13 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/01/22 00:37:13 INFO mapreduce.ImportJobBase: Beginning import of categories
17/01/22 00:37:14 INFO impl.TimelineClientImpl: Timeline service address: http://rm01.itversity.com:8188/ws/v1/timeline/
17/01/22 00:37:14 INFO client.RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
17/01/22 00:37:14 INFO client.AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
17/01/22 00:37:21 INFO db.DBInputFormat: Using read commited transaction isolation
17/01/22 00:37:22 INFO mapreduce.JobSubmitter: number of splits:1
17/01/22 00:37:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484271824919_1280
17/01/22 00:37:22 INFO impl.YarnClientImpl: Submitted application application_1484271824919_1280
17/01/22 00:37:22 INFO mapreduce.Job: The url to track the job: http://rm01.itversity.com:8088/proxy/application_1484271824919_1280/
17/01/22 00:37:22 INFO mapreduce.Job: Running job: job_1484271824919_1280
17/01/22 00:37:28 INFO mapreduce.Job: Job job_1484271824919_1280 running in uber mode : false
17/01/22 00:37:28 INFO mapreduce.Job: map 0% reduce 0%
17/01/22 00:37:33 INFO mapreduce.Job: map 100% reduce 0%
17/01/22 00:37:34 INFO mapreduce.Job: Job job_1484271824919_1280 completed successfully
17/01/22 00:37:35 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=159690
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=1029
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=5400
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=2700
Total vcore-milliseconds taken by all map tasks=2700
Total megabyte-milliseconds taken by all map tasks=4147200
Map-Reduce Framework
Map input records=58
Map output records=58
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=41
CPU time spent (ms)=1040
Physical memory (bytes) snapshot=225886208
Virtual memory (bytes) snapshot=3271639040
Total committed heap usage (bytes)=170917888
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1029
17/01/22 00:37:35 INFO mapreduce.ImportJobBase: Transferred 1.0049 KB in 20.8275 seconds (49.4059 bytes/sec)
17/01/22 00:37:35 INFO mapreduce.ImportJobBase: Retrieved 58 records.
17/01/22 00:37:35 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
17/01/22 00:37:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM categories AS t LIMIT 1
17/01/22 00:37:35 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/hdp/2.5.0.0-1245/hive/lib/hive-common-1.2.1000.2.5.0.0-1245.jar!/hive-log4j.properties
FAILED: IllegalStateException Unxpected Exception thrown: Unable to fetch table categories. org.apache.hadoop.security.AccessControlException: Permission denied: user=manojkbig, access=EXECUTE, inode="/user/subramanyamsibbala/categories":subramanyamsibbala:hdfs:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1827)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3972)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1130)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:851)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)

Thanks @Kiran. Turning on ZooKeeper worked fine for me.

To import all tables, every table in that database must have a primary key field.
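One rough way to check for tables without a primary key from the gateway host (sqoop eval simply runs the query against MySQL; the information_schema lookup is standard SQL, but verify it against your own database):

sqoop eval \
--connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" \
--username retail_dba \
--password itversity \
--query "SELECT t.table_name FROM information_schema.tables t LEFT JOIN information_schema.table_constraints c ON c.table_schema = t.table_schema AND c.table_name = t.table_name AND c.constraint_type = 'PRIMARY KEY' WHERE t.table_schema = 'retail_db' AND c.constraint_name IS NULL"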

Check this, and also add the --driver option to your command:

--driver com.mysql.jdbc.Driver

See if that helps you.
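For instance, added to the earlier lab command (just a sketch; --driver com.mysql.jdbc.Driver forces Sqoop to use that JDBC driver class explicitly):

sqoop import-all-tables \
-m 1 \
--driver com.mysql.jdbc.Driver \
--connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" \
--username=retail_dba \
--password=itversity \
--hive-import \
--hive-database grv_sqoop_import \
--hive-overwrite \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_files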