Generic question on compression

Hi @annapurna ,

When I run Sqoop commands with the Parquet file format and Snappy compression, the files are stored with only a .parquet extension.
Example:
sqoop import \
  --connect jdbc:mysql://ms.itversity.com/retail_db \
  --username retail_user \
  --password itversity \
  --table products \
  --columns product_id,product_category_id,product_name,product_price,product_image \
  --where "product_price < 100" \
  --delete-target-dir \
  --target-dir /user/rumanshi/set3/prob1 \
  --as-parquetfile \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec
Result:
/user/rumanshi/set3/prob1/067b2964-2e83-40be-8260-e1d399bbfb26.parquet

But when I execute Spark SQL commands and then save the result (Parquet with Snappy), the files are named differently.
Example:
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
result.coalesce(5).write.parquet("/user/rumanshi/prob1")
Result:
/user/rumanshi/prob1/part-r-00004-188128df-bcd4-42c4-aed5ea9fce3978f2.snappy.parquet

Please help me understand why there is a difference, or whether this is normal behavior.
Thanks.


@Rumanshi_Ahuja Yes, this is normal behavior. Sqoop does not embed the codec in the file name; you can find the compression information in the .metadata directory it creates alongside the files. Spark, by contrast, includes the codec name (e.g. .snappy.parquet) in the file name itself. Either way, the files are Snappy-compressed.