This was one of the hardest interviews I have faced. Can anybody tell me how close my answers are, give some suggestions, and share any detailed articles or videos on these topics, please?
**1) HDFS**
i) What are the important components of HDFS? (My answer: NameNode and DataNode.)
ii) Who does the splitting in HDFS: the NameNode or the DataNode?
My answer: the client contacts the master node. The master node looks at the size of the file and works out where to place the copies (calculating (total file size / block size) * replication factor), then tells the client where to store the data. Finally, the client splits the data and copies it to the respective DataNodes. Once the client has copied the data, the DataNodes send a block report, which contains the block ID, location, owner, and permissions.
iii) Is the metadata updated before or after the data is copied? If after, who updates it? (I did not have a sharp answer.)
iv) Does the block report contain the starting and ending index of the file? I said no.
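
The calculation in my answer to (ii) can be written out as a small Python sketch. The 128 MB block size and replication factor of 3 are assumed values for illustration, and the function name is made up; on a real cluster both numbers come from its configuration.

```python
def block_replica_count(file_size_bytes, block_size_bytes=128 * 1024 * 1024, replication=3):
    """Return (number of blocks, total block replicas) for one file.

    The defaults are assumptions for illustration, not values read from a cluster.
    """
    # The last block may be only partly filled, so round the division up.
    num_blocks = -(-file_size_bytes // block_size_bytes)
    return num_blocks, num_blocks * replication


# A 300 MB file with the assumed defaults.
blocks, replicas = block_replica_count(300 * 1024 * 1024)
print(blocks, replicas)
```

With these assumed values, a 300 MB file needs 3 blocks (two full and one partial), so 9 block replicas in total.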
**2) HBase**
HBase sits on top of HDFS. So who splits the file here, and who updates the metadata in the NameNode?
I said the client does the splitting as per ZooKeeper's (ZK) instructions. The interviewer then asked whether ZK gives the storage information based on the input split size.
First of all, I did not expect this kind of question.
I have worked with HBase and know this much: when a client wants to store data, it talks to ZooKeeper, which knows which region server can store the data. ZK gives that information to the client, and the client then starts writing data to the memstore and the WAL. Once the memstore grows large, the region server creates a region and stores the data as an HFile.
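
To make the write path above concrete, here is a toy Python sketch. It is only an in-memory model under my own assumptions: the class name, the three-cell flush threshold, and the data structures are invented for illustration and are not HBase's API. It skips the ZooKeeper lookup and only shows writes going to the WAL and the memstore, with the memstore being written out as a file once it grows past a threshold.

```python
class ToyRegionServer:
    """Toy model of a region server's write path; not the HBase API."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # assumed tiny limit, in number of rows
        self.wal = []        # append-only log of every write
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # each flush produces one sorted, immutable file

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # log the write first
        self.memstore[row_key] = value     # then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffered rows out in sorted key order, then clear the buffer.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


server = ToyRegionServer()
for key in ["row3", "row1", "row2", "row4"]:
    server.put(key, "v-" + key)
print(len(server.hfiles), server.memstore)
```

In this run the first three writes fill the memstore and trigger one flush, and the fourth write stays in the memstore until the next flush.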
- He also asked about sharding and what modifications happen at the HDFS level.