Introduction to Hadoop Ecosystem - Overview of HDFS - Using HDFS Stat Commands

Let us see how to get metadata about HDFS files and directories, such as file size, type, block size, and replication factor.

hdfs dfs -stat

The hdfs dfs -stat command prints metadata about a file or directory according to a format string. If no format is given, it prints the modification time (%y). Below are some examples of using this command:

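hdfs dfs -help stat                                           # usage and the full list of format specifiers
hdfs dfs -stat /user/${USER}/retail_db/orders                 # no format given: modification time (%y)
hdfs dfs -stat %b /user/${USER}/retail_db/orders/part-00000   # file size in bytes
hdfs dfs -stat %F /user/${USER}/retail_db/orders/part-00000   # file type (a regular file here)
hdfs dfs -stat %F /user/${USER}/retail_db/orders/             # file type (a directory here)
hdfs dfs -stat %o /user/${USER}/retail_db/orders/part-00000   # block size in bytes
hdfs dfs -stat %r /user/${USER}/retail_db/orders/part-00000   # replication factor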
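Several specifiers can also be combined in one quoted format string, and -stat accepts multiple paths or a quoted glob that HDFS expands itself. Because a file is split into blocks of size %o and each block is stored %r times, these values also give the number of blocks a file spans and a rough figure for the raw cluster storage it uses (size × replication). The sketch below assumes the hdfs CLI is on the PATH and that the retail_db/orders directory from the examples above exists; the function names orders_stats and summarize_stats are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes the hdfs CLI is installed and on PATH.

# Print "name size block_size replication" for every entry under a directory.
# The format string and the glob are quoted so bash passes them through
# unchanged; HDFS itself expands the glob.
orders_stats() {
  local dir="${1:-/user/${USER}/retail_db/orders}"
  hdfs dfs -stat "%n %b %o %r" "${dir}/*"
}

# Read "name size block_size replication" lines on stdin and append the
# number of blocks (size / block_size, rounded up) and the approximate
# raw storage in bytes (size * replication).
summarize_stats() {
  awk '{
    # Round up: a partially filled last block still counts as a block.
    # Entries with block size 0 (such as directories) report 0 blocks.
    blocks = ($3 > 0) ? int(($2 + $3 - 1) / $3) : 0
    printf "%s size=%.0f blocks=%.0f raw_bytes=%.0f\n", $1, $2, blocks, $2 * $4
  }'
}

# Usage: orders_stats | summarize_stats
```

Keeping the HDFS call separate from the arithmetic makes the calculation easy to check without a cluster. The whitespace-separated output assumes file names without spaces.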

Hands-On Tasks

Here are some hands-on tasks that you can perform to apply the concepts discussed in the article:

  1. Use hdfs dfs -stat to get the statistics of a file in your HDFS directory.
  2. Experiment with different format specifiers, such as %b (file size in bytes), %F (file type), %o (block size), and %r (replication factor), to understand their outputs.
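For the second task, a small loop can print each specifier next to its output for a single path. This is a sketch assuming the hdfs CLI is on the PATH; the function name explore_stat and its default path are illustrative. Besides the four specifiers above, it includes %n (name), %u (owner), %g (group), and %y (modification time), which hdfs dfs -help stat also lists.

```shell
#!/usr/bin/env bash
# Print each -stat format specifier alongside its value for one path.
# Hypothetical helper; assumes the hdfs CLI is on PATH.
explore_stat() {
  local path="${1:-/user/${USER}/retail_db/orders/part-00000}"
  local spec
  for spec in %n %F %b %o %r %u %g %y; do
    # Each specifier is passed as its own format string; printf prints
    # the specifier literally because it is an argument, not the format.
    printf '%-3s %s\n' "$spec" "$(hdfs dfs -stat "$spec" "$path")"
  done
}

# Usage: explore_stat /user/${USER}/retail_db/orders/part-00000
```

Each iteration launches a separate hdfs process (and JVM), so this suits exploration; in scripts, a single combined format string such as "%n %b %o %r" is faster.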

Conclusion

In this article, we learned how to use the hdfs dfs -stat command to gather information about files and directories stored in the Hadoop Distributed File System. With it, you can obtain details such as file size, type, block size, and replication factor, as well as the owner (%u) and group (%g) of a file. Practice these commands and explore the other format specifiers listed by hdfs dfs -help stat to deepen your understanding of HDFS.
