Introduction to Hadoop Ecosystem - Overview of HDFS - Overriding Properties

Let us understand how we can override configuration properties while running hdfs dfs or hadoop fs commands.

  • We can override any property that is not marked as final in core-site.xml or hdfs-site.xml.

  • We can change the block size as well as the replication factor while copying files. After a file is copied, its replication factor can still be changed in place, whereas changing its block size requires rewriting the file (for example, removing it and copying it again with the new setting).

  • We can either pass individual properties using -D, or pass several properties at once via an XML file (in the same format as core-site.xml or hdfs-site.xml) using -conf.

  • Let us copy the file /data/crime/csv/rows.csv with the default values. The file is split into 12 blocks with 2 copies each, as our default block size is 128 MB and our default replication factor is 2.
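The block counts above can be sanity-checked with a little arithmetic. A sketch in Python, where the ~1.5 GB file size is an assumption chosen to be consistent with 12 blocks of 128 MB:

```python
import math

def hdfs_block_count(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_size_bytes / block_size_bytes)

MB = 1024 * 1024
file_size = 1500 * MB  # assumption: ~1.5 GB, consistent with 12 x 128 MB blocks

# Default settings: 128 MB block size, replication factor 2
print(hdfs_block_count(file_size, 128 * MB))  # 12 blocks (24 block replicas in total)

# Overridden settings: 64 MB block size, replication factor 3
print(hdfs_block_count(file_size, 64 * MB))   # 24 blocks (72 block replicas in total)
```

Halving the block size doubles the number of blocks, and raising replication from 2 to 3 multiplies the total number of stored block replicas accordingly.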

# Clean up any earlier copy and recreate the target directory
hdfs dfs -ls /user/${USER}/crime
hdfs dfs -rm -R -skipTrash /user/${USER}/crime
hdfs dfs -mkdir -p /user/${USER}/crime/csv

# Check the size of the local source file
ls -lhtr /data/crime/csv

# Copy with the default properties
hdfs dfs -put /data/crime/csv/rows.csv /user/${USER}/crime/csv

# Validate: %r = replication, %o = block size (bytes), %b = file size (bytes)
hdfs dfs -stat %r /user/${USER}/crime/csv/rows.csv
hdfs dfs -stat %o /user/${USER}/crime/csv/rows.csv
hdfs dfs -stat %b /user/${USER}/crime/csv/rows.csv

# Remove the file and copy it again, overriding block size and replication
hdfs dfs -rm -R -skipTrash /user/${USER}/crime/csv/rows.csv
hdfs dfs -Ddfs.blocksize=64M -Ddfs.replication=3 -put /data/crime/csv/rows.csv /user/${USER}/crime/csv

# Validate the overridden values
hdfs dfs -stat %r /user/${USER}/crime/csv/rows.csv
hdfs dfs -stat %o /user/${USER}/crime/csv/rows.csv
hdfs dfs -stat %b /user/${USER}/crime/csv/rows.csv

# Review the cluster-level defaults
ls -ltr /etc/hadoop/conf/
cat /etc/hadoop/conf/hdfs-site.xml
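As a sketch of the -conf approach, a small XML file (the name overrides.xml is hypothetical) in the same format as hdfs-site.xml could hold both overrides:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- 64 MB block size, expressed in bytes -->
  <property>
    <name>dfs.blocksize</name>
    <value>67108864</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

It could then be applied while copying: hdfs dfs -conf overrides.xml -put /data/crime/csv/rows.csv /user/${USER}/crime/csv. The replication factor of an already-copied file can also be changed in place with hdfs dfs -setrep 3 /user/${USER}/crime/csv/rows.csv.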

Watch the video tutorial here