Introduction to Hadoop Ecosystem - Overview of HDFS - Previewing Data in HDFS Files

Let us see how we can preview data in HDFS. For files that contain plain text, we can preview the contents with commands such as -tail and -cat.

  • -tail can be used to preview the last 1 KB of the file.
  • -cat prints the entire contents of the file on the screen. Be careful with -cat: even for medium-sized files, it can take a while to finish.
  • If you want only the first few lines of a file, you can pipe the output of hadoop fs -cat or hdfs dfs -cat to the Linux more command, which shows the output one screen at a time.
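
As an illustration of the caution about -cat, the sketch below checks the size of a file before printing it. The function name safe_cat and the 1 MB default limit are assumptions made for this example and are not part of HDFS. The sketch relies on hdfs dfs -du -s reporting the size in bytes in its first column and expects the path of a single file.

```shell
# Hypothetical helper: print an HDFS file only if it is small enough.
# Usage: safe_cat <hdfs_path> [max_bytes]   (max_bytes defaults to 1 MB)
safe_cat() {
  local size
  # The first column of `hdfs dfs -du -s` is the size in bytes.
  size=$(hdfs dfs -du -s "$1" | awk '{ print $1; exit }')
  # Stop if the size could not be read (for example, the path does not exist).
  [ -n "$size" ] || return 1
  if [ "$size" -le "${2:-1048576}" ]; then
    hdfs dfs -cat "$1"
  else
    echo "$1 is $size bytes; showing only the last 1 KB" >&2
    hdfs dfs -tail "$1"
  fi
}
```

For example, safe_cat /user/${USER}/retail_db/orders/part-00000 prints the file if it is at most 1 MB and falls back to -tail otherwise.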

Key Concepts Explanation

Previewing Files with HDFS Commands

The following commands list the retail_db files, copy the data set into HDFS, and preview files with -tail and -cat:

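# List the retail_db directory, then list it recursively
hdfs dfs -ls /user/${USER}/retail_db
hdfs dfs -ls -R /user/${USER}/retail_db
# Copy the local data set into HDFS, overwriting existing files (-f)
hdfs dfs -put -f /data/retail_db /user/${USER}/
# Show the usage of -tail, then print the last 1 KB of a file
hdfs dfs -help tail
hdfs dfs -tail /user/${USER}/retail_db/orders/part-00000
# Show the usage of -cat, then print every part file under departments
hdfs dfs -help cat
hdfs dfs -cat /user/${USER}/retail_db/departments/part-*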

To see only the first few lines of a file, pipe the output of -cat to more:

hdfs dfs -cat /user/${USER}/retail_db/orders/part-00000 | more
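
Note that more pages through the whole file one screen at a time. If you want a fixed number of lines instead, piping to the Linux head command is a common alternative. The sketch below wraps that pipe in a small helper; the name hdfs_head and the default of 10 lines are assumptions for this example.

```shell
# Hypothetical helper: print the first N lines (default 10) of an HDFS text file.
# Usage: hdfs_head <hdfs_path> [lines]
hdfs_head() {
  # head exits after N lines, so the rest of the file is never shown.
  hdfs dfs -cat "$1" | head -n "${2:-10}"
}
```

For example, hdfs_head /user/${USER}/retail_db/orders/part-00000 5 prints the first five lines. Depending on the Hadoop version, -cat may print a message to stderr when head closes the pipe early; the lines shown on screen are still correct.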

Hands-On Tasks

Here are some hands-on tasks for you to try:

  1. Use hdfs dfs -ls to list files in a directory.
  2. Preview the last 1 KB of a file using hdfs dfs -tail.
  3. Print the contents of a file using hdfs dfs -cat.
  4. Pipe the output of hdfs dfs -cat to the more command to view the first few lines of a file.
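
One possible way to run through these tasks in a single step is sketched below. The function name preview_tasks and its two arguments are assumptions for this example. It uses head in place of more so that it does not wait for keyboard input, and it leaves out a bare -cat of the whole file because that output can be long.

```shell
# Hypothetical helper: list a directory, then preview one file inside it.
# Usage: preview_tasks <hdfs_dir> <file_name>
preview_tasks() {
  echo "== listing of $1 =="
  hdfs dfs -ls "$1"
  echo "== last 1 KB of $1/$2 =="
  hdfs dfs -tail "$1/$2"
  echo "== first 5 lines of $1/$2 =="
  hdfs dfs -cat "$1/$2" | head -n 5
}
```

For example, preview_tasks /user/${USER}/retail_db/orders part-00000 lists the orders directory and previews both ends of part-00000.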

Conclusion

In this article, we learned how to preview data in HDFS files with -tail and -cat, and how to pipe the output of -cat to more to look at the beginning of a file. Practicing these commands will help you inspect files in HDFS quickly. Feel free to explore further and engage with the community to keep learning.
