Introduction to Hadoop ecosystem - Overview of HDFS - Getting File Metadata

Let us see how to get metadata for the files stored in HDFS using the hdfs fsck command.

  • We have files copied under the HDFS location /user/${USER}/retail_db, and some sample large files under /public/randomtextwriter. We can inspect both with the hdfs fsck command.
  • We will first see how to get the metadata of these files and then interpret it in subsequent topics.
  • HDFS stands for Hadoop Distributed File System, which means files are stored in a distributed fashion across the cluster.
  • Our cluster has master nodes and worker nodes. The files are physically stored on the worker nodes, where the data node process runs. We will cover this as part of the HDFS architecture.
  • Here are the worker nodes along with their private IPs.
Private IP     Full DNS             Short DNS
172.16.1.102   wn01.itversity.com   wn01
172.16.1.103   wn02.itversity.com   wn02
172.16.1.104   wn03.itversity.com   wn03
172.16.1.107   wn04.itversity.com   wn04
172.16.1.108   wn05.itversity.com   wn05
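
Block locations are usually reported by IP address rather than by host name, so it can help to translate an address back to a worker node. The sketch below is one way to do that, using only the mapping from the table above. The function name worker_name is a hypothetical helper, not part of Hadoop.

```shell
# Map a worker node's private IP to its short DNS name.
# The mapping is copied from the table above; worker_name is a hypothetical helper.
worker_name() {
  # Drop a trailing ":port" if the address was given as IP:port.
  ip="${1%%:*}"
  case "$ip" in
    172.16.1.102) echo "wn01" ;;
    172.16.1.103) echo "wn02" ;;
    172.16.1.104) echo "wn03" ;;
    172.16.1.107) echo "wn04" ;;
    172.16.1.108) echo "wn05" ;;
    *) echo "unknown" ;;
  esac
}

worker_name 172.16.1.107   # prints wn04
```

Any address that is not in the table prints unknown, which makes typos easy to spot.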

Key Concepts Explanation

HDFS fsck Command

The hdfs fsck command checks the status of files and directories in HDFS and, with the right options, reports metadata about them. To list the available options, run:

hdfs fsck -help

Checking Metadata for a Folder

To get a high-level overview of a folder in HDFS, you can use the command hdfs fsck /user/${USER}/retail_db.

hdfs fsck /user/${USER}/retail_db

Getting File Names

To retrieve details about file names within a directory, use the -files option with the hdfs fsck command.

hdfs fsck /user/${USER}/retail_db -files

Understanding Block Storage

Files in HDFS are split into blocks, and the blocks are physically stored on the worker nodes. You can get details about the blocks associated with each file using the -blocks option.

hdfs fsck /user/${USER}/retail_db -files -blocks
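
To get a feel for how many blocks to expect for a file, the following sketch estimates the count from the file size. It assumes a block size of 128 MB, which is the usual default for dfs.blocksize; your cluster may be configured differently. The function name block_count is a hypothetical helper.

```shell
# Estimate how many blocks a file of a given size occupies.
# Assumption: 128 MB block size (the usual dfs.blocksize default); pass a
# second argument to override it. block_count is a hypothetical helper.
block_count() {
  file_bytes="$1"
  block_bytes="${2:-134217728}"   # 128 * 1024 * 1024
  # Ceiling division: a partially filled last block still counts as one block.
  echo $(( (file_bytes + block_bytes - 1) / block_bytes ))
}

block_count 1073741824   # 1 GB file, prints 8
block_count 1000000      # roughly 1 MB file, prints 1
```

Small files such as those under retail_db would each fit in a single block under this assumption, while the large files under /public/randomtextwriter would span several.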

Retrieving Block Locations

To view details about the worker nodes where blocks are physically stored, use the -locations option along with -blocks.

hdfs fsck /user/${USER}/retail_db -files -blocks -locations

Hands-On Tasks

Here are some hands-on tasks you can perform to practice working with file metadata in HDFS:

  1. Use the hdfs fsck command to check the metadata of a specific file in HDFS.
  2. Explore the different options available with the hdfs fsck command and analyze the output for better understanding.
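
One way to approach these tasks is to run fsck against the same path with progressively more options and compare the output at each step. The sketch below does that using only the options covered in this article. The names explore_fsck and DRY_RUN are hypothetical and exist only for this example; with DRY_RUN=1 the commands are printed instead of executed, so you can review them first.

```shell
# Run (or, with DRY_RUN=1, just print) hdfs fsck on a path with
# progressively more detail. explore_fsck and DRY_RUN are hypothetical
# names used only in this sketch.
explore_fsck() {
  target="$1"
  for opts in "" "-files" "-files -blocks" "-files -blocks -locations"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "hdfs fsck ${target} ${opts}"
    else
      # opts is left unquoted on purpose so each flag becomes its own argument.
      hdfs fsck "${target}" ${opts}
    fi
  done
}

# Print the four commands without contacting the cluster.
DRY_RUN=1 explore_fsck /public/randomtextwriter
```

On the cluster, drop DRY_RUN=1 and pass either a directory or the full path of a single file.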

Conclusion

In this article, we learned how to utilize the hdfs fsck command to retrieve metadata for files stored in HDFS. Understanding the storage and block structure in HDFS is crucial for efficient data management. Practice these commands and concepts to enhance your skills in Hadoop environments. Happy learning!
