Sqoop Import - Using Compression

Let us understand how to compress data while importing it with sqoop import.

  • We can enable compression by using --compress (or its short form, -z).
  • The default codec is gzip, which is based on the DEFLATE algorithm.
  • We can choose a different compression algorithm by passing a Hadoop codec class name to --compression-codec.
  • We can review the io.compression.codecs property in core-site.xml to get the list of compression codecs configured on the cluster.
  • Not every compression algorithm is compatible with every file format, so it is important to pick a codec that works with the file format being used.
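The codec list mentioned above can also be read from the command line instead of opening core-site.xml. This is a minimal sketch, assuming the hdfs client is installed and the property is set explicitly on the cluster (if it is not, hdfs getconf reports it as missing). The helper name list_codecs is hypothetical and only used here for illustration.

```shell
# Print each configured codec class on its own line.
# list_codecs is a hypothetical helper name, not part of Hadoop or Sqoop.
list_codecs() {
  tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | sed '/^$/d'
}

# Only query the cluster when the hdfs client is on PATH.
if command -v hdfs >/dev/null 2>&1; then
  # io.compression.codecs holds a comma-separated list of codec class names.
  hdfs getconf -confKey io.compression.codecs | list_codecs
fi
```

Any class name printed here is a candidate value for --compression-codec.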

Here is an example of a sqoop import command that compresses the data using the default compression algorithm.

sqoop import \
  --connect "jdbc:mysql://ms.itversity.com:3306/retail_db" \
  --username retail_user \
  --password itversity \
  --table order_items \
  --warehouse-dir /user/training/sqoop_import/retail_db \
  --delete-target-dir \
  --compress
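To confirm that the import was compressed, we can inspect the target directory afterwards. This is a rough sketch, assuming the import above has finished and that the part files follow the usual part-m-* naming; with the default codec they are expected to end in .gz.

```shell
# With --warehouse-dir, Sqoop writes each table to <warehouse-dir>/<table>.
WAREHOUSE_DIR=/user/training/sqoop_import/retail_db
TABLE=order_items
TARGET_DIR="${WAREHOUSE_DIR}/${TABLE}"

# Only touch HDFS when the hdfs client is on PATH.
if command -v hdfs >/dev/null 2>&1; then
  # List the part files; check their extension to see which codec was used.
  hdfs dfs -ls "${TARGET_DIR}"
  # -text decompresses files with a recognised codec before printing them.
  # The glob is quoted so that HDFS expands it, not the local shell.
  hdfs dfs -text "${TARGET_DIR}/part-m-*" | head -n 5
fi
```

Using hdfs dfs -cat instead of -text would print the raw compressed bytes, which is a quick way to see the difference.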

Here is an example of a sqoop import command that compresses the data using the Snappy compression algorithm.

sqoop import \
  --connect "jdbc:mysql://ms.itversity.com:3306/retail_db" \
  --username retail_user \
  --password itversity \
  --table order_items \
  --warehouse-dir /user/training/sqoop_import/retail_db \
  --delete-target-dir \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec
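One way to check which codec was actually applied is to look at the extensions of the part files; with SnappyCodec they are expected to end in .snappy. The sketch below assumes the standard hdfs dfs -ls output layout, where the path is the last field of each line. The helper name part_file_extensions is hypothetical.

```shell
# Read `hdfs dfs -ls` output on stdin and print the distinct extensions
# of the part files. part_file_extensions is a hypothetical helper name.
part_file_extensions() {
  sed -n 's|.*/part-[^/]*\.\([^./]*\)$|\1|p' | sort -u
}

# Only touch HDFS when the hdfs client is on PATH.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls /user/training/sqoop_import/retail_db/order_items | part_file_extensions
fi
```

If the import fails instead of producing files, running hadoop checknative -a shows whether the native Snappy library is available on that node.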