Hi all, I have a question. When we load data into a Hive table that is stored as ORC, we can see the data files in the HDFS location. Sometimes there is only one file, and sometimes the data is split across more than one file, depending on the size of the input file we are loading. My question is: is that division based on row counts (for example, the first 100 rows in the first file, the next 100 in the second file, the next 100 in the third file, and so on), or is there a chance that half of a row ends up in one file and the remaining half in the next file?
You need to provide examples.
The ORC file format stores data in a columnar layout, not a row layout.
So will it divide the data by column into different files? I mean, is there a chance that part of a column's data is in one file and the rest of that column's data is in a second file? Please suggest. Thanks.
I am also not sure whether it will divide the data of the same column across multiple files.
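As far as I understand the format, each ORC file is self-contained: it holds complete rows, and the columnar layout applies inside each file. So the values of one column for the whole table can be spread over several files, but a single row is not split between two files. Here is a rough Python sketch of that idea. It is only a toy model, not the real ORC writer or its on-disk format, and the names split_into_files, to_columnar and rows_per_file are made up for illustration. In practice the number of files usually depends on how many tasks write the table and how much data each one handles, not on a fixed row count.

```python
# Toy model only: this is NOT the real ORC writer or its on-disk format.
# All function and parameter names here are hypothetical.
from typing import Dict, List

Row = Dict[str, object]


def split_into_files(rows: List[Row], rows_per_file: int) -> List[List[Row]]:
    """Split at row boundaries, so every row lands whole in exactly one file."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def to_columnar(rows: List[Row]) -> Dict[str, List[object]]:
    """Inside one file, regroup the values column by column."""
    columns: Dict[str, List[object]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows = [{"id": i, "name": f"user{i}"} for i in range(5)]

# Each "file" holds whole rows, stored column by column.
files = [to_columnar(chunk) for chunk in split_into_files(rows, rows_per_file=2)]

for number, columns in enumerate(files):
    print(number, columns)
```

Every file in this sketch has the same number of values in each of its columns, which is another way of saying that no row is cut in half. The "id" column of the table as a whole is spread over three files, but row 3, for example, has both its id and its name in the same file.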