I am unable to merge small Snappy-encoded files on Cloudera. The files are the output of a Hive query (insert into tableB select * from tableA) run with hive.execution.engine=spark.
Merging works fine with hive.execution.engine=mr.
I am using the following properties to merge, and the stocks_eod data is compressed with SnappyCodec.
set hive.hadoop.supports.splittable.combineinputformat = true;
With these properties, if the execution engine is set to mr, the output is 1 snappy file of 183.5 MB.
If the execution engine is set to spark, the output is 3 snappy files of approximately 68, 77, and 37 MB.
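
For anyone trying to reproduce this, a minimal sketch of the comparison is below. Only hive.execution.engine, the combineinputformat setting, and the insert query come from the description above. The hive.merge.* lines are standard Hive merge properties added here as an assumption about what a merge setup might look like, and their values are purely illustrative.

```sql
-- Setting taken from the description above
set hive.hadoop.supports.splittable.combineinputformat=true;

-- Assumed merge settings (not confirmed as the ones used above); values are illustrative
-- Merge small files at the end of map-only and map-reduce jobs (mr engine)
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
-- Merge small files at the end of a Hive on Spark job
set hive.merge.sparkfiles=true;
-- Target size of merged files, and the average output size below which a merge is triggered
set hive.merge.size.per.task=256000000;
set hive.merge.smallfiles.avgsize=256000000;

-- Run once with spark and once with mr, then compare the files written under tableB
set hive.execution.engine=spark;
insert into tableB select * from tableA;
```

Note that hive.merge.smallfiles.avgsize is compared against the average size of the output files, so with three files averaging roughly 61 MB a low threshold would likely not trigger a merge at all, and hive.merge.sparkfiles is a separate flag from the mr ones.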