Sqoop Import data compression

sqoop

#1

Hi,

I imported data from the retail_DB database into the Hive database satishp381_db using the sqoop import command with Snappy compression. When I check the table properties with the Hive command "describe formatted categories", the Compressed property shows No:

Detailed Table Information

Database: satishp381_db
Owner: satishp38
CreateTime: Sun Apr 08 22:44:59 EDT 2018
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://nn01.itversity.com:8020/apps/hive/warehouse/satishp381_db.db/categories
Table Type: MANAGED_TABLE
Table Parameters:
comment Imported by sqoop on 2018/04/08 22:44:49
numFiles 4
numRows 0
rawDataSize 0
totalSize 943
transient_lastDdlTime 1523241901

Storage Information

SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \u0001
line.delim \n
serialization.format \u0001

When I look at the files in the table's location, I can see that the data is Snappy-compressed: the files have a .snappy extension. Why do the table properties not show that the data is compressed, when the files in the physical location clearly are?
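For reference, an import like the one described above might be assembled as follows. This is a minimal Python sketch that only builds and prints the command line. The JDBC URL and username are hypothetical placeholders because the post does not show them, and a MySQL source is assumed since the post only says "sql". The flags used (--hive-import, --hive-database, --compress, --compression-codec, -P) are standard Sqoop 1.x import options.

```python
import shlex

# Hypothetical placeholders: the original post does not show the JDBC URL or
# the username, and a MySQL source is assumed.
JDBC_URL = "jdbc:mysql://MYSQL_HOST:3306/retail_db"
USERNAME = "SQOOP_USER"


def build_sqoop_import(table: str, hive_db: str) -> list[str]:
    """Build the argv for a Snappy-compressed Sqoop import into a Hive table."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", USERNAME,
        "-P",                          # prompt for the password on the console
        "--table", table,
        "--hive-import",               # load the imported files into Hive
        "--hive-database", hive_db,
        "--hive-table", table,
        "--compress",                  # compress the output files...
        "--compression-codec",         # ...using the Snappy codec
        "org.apache.hadoop.io.compress.SnappyCodec",
    ]


cmd = build_sqoop_import("categories", "satishp381_db")
# Print a shell-safe version of the command instead of running it.
print(shlex.join(cmd))
```

The compression flags only affect the files Sqoop writes. They do not change the table's "Compressed" metadata, which is why the output files end in .snappy while describe formatted still reports No.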





#2

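The Compressed field is not a reliable indicator of whether the table contains compressed data. It typically shows No because the compression settings apply only during the session that loads the data and are not stored persistently with the table metadata. Hive can still read the files, because the text input format chooses a decompression codec from each file's extension (here .snappy), not from the table properties.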
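To illustrate, here is a minimal Python sketch of the lookup involved. In Hadoop, the LineRecordReader behind org.apache.hadoop.mapred.TextInputFormat asks CompressionCodecFactory.getCodec() for a codec based on the file name. The dictionary below only mimics that factory, using the default extensions of Hadoop's bundled codecs. The file names are hypothetical examples of Sqoop's part-m-NNNNN output (numFiles is 4 in the table parameters, consistent with four mappers).

```python
from pathlib import PurePosixPath
from typing import Optional

# Default extensions of codecs bundled with Hadoop (org.apache.hadoop.io.compress.*).
# This dict only mimics CompressionCodecFactory; it is not the real implementation.
CODEC_BY_EXTENSION = {
    ".deflate": "DefaultCodec",
    ".gz": "GzipCodec",
    ".bz2": "BZip2Codec",
    ".snappy": "SnappyCodec",
    ".lz4": "Lz4Codec",
}


def codec_for(path: str) -> Optional[str]:
    """Return the codec chosen for a file from its extension, or None if uncompressed."""
    return CODEC_BY_EXTENSION.get(PurePosixPath(path).suffix)


# Hypothetical file names under the table location from the question.
location = "/apps/hive/warehouse/satishp381_db.db/categories"
files = [f"{location}/part-m-{i:05d}.snappy" for i in range(4)]

for f in files:
    # The table's "Compressed" flag is never consulted, only the file name.
    print(f, "->", codec_for(f) or "uncompressed")
```

Because the codec is resolved per file, a single table directory could even mix compressed and uncompressed text files. A table-level Compressed flag could not describe that accurately anyway.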

Reference: https://www.cloudera.com/documentation/cdh/5-0-x/Impala/Installing-and-Using-Impala/ciiu_describe.html