Hello Team,
I successfully imported the data into HDFS as an Avro data file using Sqoop, with the command below:
sqoop import -Dmapreduce.job.user.classpath.first=true -m 1 --connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" --username retail_dba --password itversity --table departments --target-dir /user/shubhaprasadsamal/training/sqoop_import/department_avro --as-avrodatafile
But the record count of the part-m-00000.avro file does not match the row count in the database. Please find the results below:
Data count from HDFS location:
[shubhaprasadsamal@gw01 ~]$ hadoop fs -cat /user/shubhaprasadsamal/training/sqoop_import/department_avro/part*|wc -l
1
Data count from database:
[shubhaprasadsamal@gw01 ~]$ sqoop eval --connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" --username retail_dba --password itversity \
--query "select count(1) from departments"
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/01/10 13:39:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245
17/01/10 13:39:08 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/01/10 13:39:08 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
count(1) |
---|
7 |
Is this the expected result? Is the count from the file smaller because the data in the HDFS location appears garbled?
I get the same result with sequencefile as well.
Please suggest. Thanks,
Shitansu.
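
A note on the comparison above: wc -l counts newline bytes, not records, so on a binary file it need not agree with the table's row count. The sketch below illustrates this with a stand-in file. It is not real Avro; the file contents, the NUL separator, and the variable names are all hypothetical and chosen only for illustration.

```shell
# Stand-in for a binary container file (NOT real Avro): seven hypothetical
# records, each terminated by a NUL byte, followed by a single newline byte.
sample_file="$(mktemp)"
printf 'd1\000d2\000d3\000d4\000d5\000d6\000d7\000\n' > "$sample_file"

# wc -l counts newline bytes, not records.
newline_count=$(wc -l < "$sample_file" | tr -d ' ')

# Counting the (hypothetical) record terminator recovers the record count.
record_count=$(tr -cd '\000' < "$sample_file" | wc -c | tr -d ' ')

echo "newline bytes: $newline_count, records: $record_count"
rm -f "$sample_file"
```

For an actual Avro file the records would have to be decoded before counting, for example with avro-tools tojson, which prints one JSON record per line. Whether avro-tools is available on this cluster is an assumption.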