Question on Data Partition

#1

Hello Team,

I have two basic questions to ask.

  1. While importing a MySQL table, HDFS creates partitions for it, whereas putting a text file into HDFS does not create any partitions. What is the reason for this?
  2. Can we allocate a partition to a different machine for processing, instead of processing everything on a single node?

Please clarify these doubts.

Regards,
Amit

#2

Hi, @amit0900, when you import a table, a MapReduce program is run and it emits output files depending upon the number of mappers. If the number of mappers specified is 4, then you will find 4 files in the output folder; if you specify just one mapper, there will be only 1 file in the output folder (a rough sketch is shown at the end of this post).
When you put data into an HDFS location, underneath the data is split into blocks depending upon the configured HDFS block size. The HDFS path is just a logical abstraction to make things easy for users; that is why you see only one file even though underneath it is stored as many blocks.
2. Yes, depending upon the available nodes in the cluster, any machine can process the partition, but it is up to the master JobTracker to schedule jobs on the available slave machines.
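
For illustration, here is a minimal Sqoop import, assuming a hypothetical MySQL database retail_db and table orders (placeholder names, not from this thread):

  sqoop import \
    --connect jdbc:mysql://localhost:3306/retail_db \
    --username retail_user \
    --password retail_pass \
    --table orders \
    --target-dir /user/amit/orders \
    --num-mappers 4

  # With 4 mappers the target directory typically contains
  # part-m-00000 through part-m-00003 plus a _SUCCESS marker.
  hadoop fs -ls /user/amit/orders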

#3

Changed the category to Hive, which is a subcategory of Big Data.

#4

Pramod, thanks a lot for your response. My question is: if we put a text document into HDFS it does not partition the data, whereas if we import a MySQL table into HDFS it does partition it. May I know the reason why?

#5

Hi Amit,

When you put a text document into HDFS, it is not partitioned because all the data in that file fits within one HDFS block. So you need to check your default HDFS block size. (For example, if your file size is 100 MB and your default HDFS block size is 128 MB, then any file under 128 MB that you copy to HDFS won't be split into multiple blocks.) Hope this clarifies your query.
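
As a rough way to verify this, assuming a hypothetical file path (not from this thread), you can check the configured block size and see how a given file is actually stored as blocks:

  # Default block size in bytes (134217728 = 128 MB)
  hdfs getconf -confKey dfs.blocksize

  # Show how many blocks the file occupies and on which DataNodes the replicas live
  hdfs fsck /user/amit/data/sample.txt -files -blocks -locations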

Kind Regards,
Srini.

#6

@amit0900

If you run the hadoop fs -put command to copy data into a partitioned table, it does not understand how the table is partitioned. It will only copy the data to the target directory as is.

When you run sqoop import, the MapReduce job understands the underlying table structure, including partitions, and tries to partition the data accordingly (see the sketch below).
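
A minimal sketch of the difference, assuming a hypothetical Hive table and placeholder connection details (not from this thread):

  # Plain copy: the file lands in the target directory as is,
  # with no awareness of any partition columns.
  hadoop fs -put sales.txt /user/hive/warehouse/sales_txt/

  # Sqoop import into a Hive table, loading one static partition.
  sqoop import \
    --connect jdbc:mysql://localhost:3306/retail_db \
    --username retail_user \
    --password retail_pass \
    --table sales \
    --hive-import \
    --hive-table sales_partitioned \
    --hive-partition-key sale_date \
    --hive-partition-value '2016-01-01' \
    --num-mappers 1

Here every imported row lands in the sale_date='2016-01-01' partition; loading multiple partitions dynamically would need extra handling.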

Your 2nd question is not clear.

#7

Durga Sir, my 2nd question is: can we assign each partition to a different system? As we know, HDFS stores the data on different nodes of the cluster. If yes, how do we assign different partitions to different systems? Hope my question is clear enough.

#8

@amit0900 assigning blocks to different systems is taken care of by HDFS as per the properties mentioned in hdfs-site.xml and core-site.xml (a rough sketch of the relevant hdfs-site.xml properties is shown below).
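
For reference, the block size and replication factor (which control how HDFS spreads blocks across DataNodes) are typically set in hdfs-site.xml, roughly like this sketch:

  <!-- hdfs-site.xml: sketch of placement-related properties -->
  <configuration>
    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value> <!-- 128 MB blocks -->
    </property>
    <property>
      <name>dfs.replication</name>
      <value>3</value> <!-- each block is replicated to 3 DataNodes -->
    </property>
  </configuration>

The actual placement of individual block replicas is decided by the NameNode's block placement policy, not by the user.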
