Exercise 11 - Copying files to HDFS

This exercise is important for getting hands-on experience working with files in HDFS as a developer.

Description

  • External source (your PC)
  • Stage the files on the gateway node (which has access to the entire cluster)
  • The files can be downloaded to your PC from the GitHub account

Problem Statement

  • Copy files to HDFS (both the retail_db directory and the largedeck.txt file)
  • For largedeck.txt:
      • Change the block size to 64 MB
      • Change the replication factor to 1
  • Get the metadata of the files (hdfs fsck)

Also, answer these questions

  • What is the default block size? What is the purpose of block size?
  • What is the default replication factor? Explain the purpose of the replication factor
  • Make sure you understand the roles of the name node, secondary name node and data node
  • What are the different commands to copy files from the local filesystem on the gateway node to HDFS?
  • What are the different commands to copy files from HDFS to the local filesystem on the gateway node?
  • Also make sure you understand the commands to copy/move files from one HDFS location to another
  • Permissions on the files
  • Locations of the parameter files

Note: for certification purposes, the commands to copy files to and from HDFS are enough.
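
For reference, a minimal end-to-end sketch of the exercise (assuming retail_db and largedeck.txt sit in the home directory of the gateway node; destinations are relative to the user's HDFS home directory):

# copy the retail_db directory recursively, with default settings
hadoop fs -put retail_db .
# copy largedeck.txt with a 64 MB block size (67108864 bytes) and replication factor 1
hadoop fs -Ddfs.blocksize=67108864 -Ddfs.replication=1 -put largedeck.txt .
# inspect the block metadata of the copied file
hdfs fsck largedeck.txt -files -blocks -locations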

hdfs fsck largedeck.txt -files -blocks -locations
Connecting to namenode via http://nn01.itversity.com:50070/fsck?ugi=parameshpathakunta&files=1&blocks=1&locations=1&path=%2Fuser%2Fparameshpathakunta%2Flargedeck.txt
FSCK started by parameshpathakunta (auth:SIMPLE) from /172.16.1.100 for path /user/parameshpathakunta/largedeck.txt at Mon Jan 23 06:10:12 EST 2017
/user/parameshpathakunta/largedeck.txt 726663168 bytes, 11 block(s): OK
0. BP-292116404-172.16.1.101-1479167821718:blk_1074196145_456097 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-7fb58858-abe9-4a52-9b75-755d849a897b,DISK]]
1. BP-292116404-172.16.1.101-1479167821718:blk_1074196146_456098 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.102:50010,DS-1edb1d35-81bf-471b-be04-11d973e2a832,DISK]]
2. BP-292116404-172.16.1.101-1479167821718:blk_1074196147_456099 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.104:50010,DS-f4667aac-0f2c-463c-9584-d625928b9af5,DISK]]
3. BP-292116404-172.16.1.101-1479167821718:blk_1074196148_456100 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-1f4edfab-2926-45f9-a37c-ae9d1f542680,DISK]]
4. BP-292116404-172.16.1.101-1479167821718:blk_1074196149_456101 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.102:50010,DS-b0f1636e-fd08-4ddb-bba9-9df8868dfb5d,DISK]]
5. BP-292116404-172.16.1.101-1479167821718:blk_1074196150_456102 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.104:50010,DS-98fec5a6-72a9-4590-99cc-cee3a51f4dd5,DISK]]
6. BP-292116404-172.16.1.101-1479167821718:blk_1074196151_456103 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-7fb58858-abe9-4a52-9b75-755d849a897b,DISK]]
7. BP-292116404-172.16.1.101-1479167821718:blk_1074196152_456104 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-1f4edfab-2926-45f9-a37c-ae9d1f542680,DISK]]
8. BP-292116404-172.16.1.101-1479167821718:blk_1074196153_456105 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-7fb58858-abe9-4a52-9b75-755d849a897b,DISK]]
9. BP-292116404-172.16.1.101-1479167821718:blk_1074196154_456106 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.102:50010,DS-1edb1d35-81bf-471b-be04-11d973e2a832,DISK]]
10. BP-292116404-172.16.1.101-1479167821718:blk_1074196155_456107 len=55574528 repl=1 [DatanodeInfoWithStorage[172.16.1.104:50010,DS-f4667aac-0f2c-463c-9584-d625928b9af5,DISK]]

Status: HEALTHY
Total size: 726663168 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 11 (avg. block size 66060288 B)
Minimally replicated blocks: 11 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Jan 23 06:10:12 EST 2017 in 0 milliseconds

The filesystem under path '/user/parameshpathakunta/largedeck.txt' is HEALTHY

Default block size = 128 MB (in Hadoop 2.x)

Block size is the unit in which a file is split up for distribution across the different nodes
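
For example, with the 64 MB block size used here: 726663168 bytes = 10 × 67108864 + 55574528, i.e. ten full 64 MB blocks plus one smaller final block, which matches the 11 blocks reported by fsck above.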

Default replication factor = 3

The replication factor is used for high availability of data (whenever a node crashes or a server goes down, the data is still served from another replica)

Name node
Stores the metadata of the filesystem (the FSImage), including which blocks of each file live on which data nodes

Secondary name node
Periodically merges the edit log into the FSImage (a checkpointing helper, not a standby name node)

Data node
Stores the actual data in the form of blocks

hadoop fs -put/copyFromLocal

hadoop fs -get/copyToLocal
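
The same commands with explicit arguments (a sketch; youruser and the file names are placeholders):

# local filesystem on the gateway node -> HDFS
hadoop fs -put /home/youruser/file.txt /user/youruser/
hadoop fs -copyFromLocal /home/youruser/file.txt /user/youruser/
# HDFS -> local filesystem on the gateway node
hadoop fs -get /user/youruser/file.txt /home/youruser/
hadoop fs -copyToLocal /user/youruser/file.txt /home/youruser/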

cd /etc/hadoop/conf

hdfs-site.xml
core-site.xml
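
These files hold properties such as dfs.blocksize and dfs.replication. To read the effective values without opening the XML, the standard hdfs getconf utility can be used:

hdfs getconf -confKey dfs.blocksize
hdfs getconf -confKey dfs.replication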

Copy files to HDFS

[mahesh007@gw01 ~]$ hadoop fs -ls retail_db
Found 6 items
drwxr-xr-x - mahesh007 hdfs 0 2017-01-23 05:39 retail_db/categories
drwxr-xr-x - mahesh007 hdfs 0 2017-01-23 05:39 retail_db/customers
drwxr-xr-x - mahesh007 hdfs 0 2017-01-23 05:39 retail_db/departments
drwxr-xr-x - mahesh007 hdfs 0 2017-01-23 05:39 retail_db/order_items
drwxr-xr-x - mahesh007 hdfs 0 2017-01-23 05:39 retail_db/orders
drwxr-xr-x - mahesh007 hdfs 0 2017-01-23 05:39 retail_db/products

[mahesh007@gw01 ~]$ hadoop fs -Ddfs.replication=1 -Ddfs.blocksize=67108864 -put largedeck.txt cards

[mahesh007@gw01 ~]$ hdfs fsck cards/largedeck.txt -files -blocks -locations
Status: HEALTHY
Total size: 726663168 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 11 (avg. block size 66060288 B)
Minimally replicated blocks: 11 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Jan 23 05:57:03 EST 2017 in 0 milliseconds

Answers:

  1. Default block size is 128 MB.
    Block size is the minimum amount of data that can be read or written (generally referred to as a “block”) in HDFS.

  2. Default replication factor is 3.
    Hadoop uses the replication factor to provide fault tolerance and high availability in HDFS, i.e., when data is stored in HDFS it is replicated to several DataNodes. If a DataNode goes down, the NameNode automatically serves the data from the remaining replicas and re-replicates it to another node, so there is no fear of data loss.

  3. Commands to copy from LFS to HDFS:
    a. hadoop fs -put <lfs_src> ... <hdfs_dst>
    b. hadoop fs -copyFromLocal <lfs_src> ... <hdfs_dst>

  4. Commands to copy from HDFS to LFS:
    a. hadoop fs -get <hdfs_src> ... <lfs_dst>
    b. hadoop fs -copyToLocal <hdfs_src> ... <lfs_dst>
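
The exercise also asks about copying/moving files between two HDFS locations, which the answers above skip; a short sketch (paths are placeholders):

# copy within HDFS
hadoop fs -cp /user/youruser/src.txt /user/youruser/backup/
# move (rename) within HDFS
hadoop fs -mv /user/youruser/src.txt /user/youruser/archive/
# check the replication factor of a file (prints a single number)
hdfs dfs -stat %r /user/youruser/src.txt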

[raoufkhan@gw01 ~]$ hdfs fsck retail_db/largedeck.txt
Connecting to namenode via http://nn01.itversity.com:50070/fsck?ugi=raoufkhan&path=%2Fuser%2Fraoufkhan%2Fretail_db%2Flargedeck.txt
FSCK started by raoufkhan (auth:SIMPLE) from /172.16.1.100 for path /user/raoufkhan/retail_db/largedeck.txt at Mon Jan 23 06:25:20 EST 2017
.Status: HEALTHY
Total size: 726663168 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 11 (avg. block size 66060288 B)
Minimally replicated blocks: 11 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Jan 23 06:25:20 EST 2017 in 1 milliseconds

The filesystem under path '/user/raoufkhan/retail_db/largedeck.txt' is HEALTHY

This is an alternative way to change the replication factor of a file in HDFS. Is this legit in this case?

We can also change the replication factor of all the files in HDFS recursively, using the same command with -R added.

Here are the required commands:

hadoop fs -setrep -w 1 /user/harsha4480/largedeck.txt.gz

hadoop fs -setrep -R -w 1 /
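
The -w flag makes the command wait until the target replication is actually reached. Afterwards the change can be confirmed with fsck, as earlier in the thread:

hdfs fsck /user/harsha4480/largedeck.txt.gz -files -blocks -locations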

Yes, it is legit; -setrep changes the replication factor of files that have already been copied.

 hadoop fs -Ddfs.replication=1 -Ddfs.blocksize=67108864 -put largedeck.txt /user/mamilla_revathi
 hdfs fsck /user/mamilla_revathi/largedeck.txt -files -blocks -locations
[mamilla_revathi@gw01 ~]$ hadoop fs -ls /user/mamilla_revathi/cards
Found 1 items
-rw-r--r-- 1 mamilla_revathi hdfs 726663168 2017-01-23 07:52 /user/mamilla_revathi/cards/largedeck.txt
[mamilla_revathi@gw01 ~]$ hdfs fsck /user/mamilla_revathi/cards/largedeck.txt -files -blocks -locations
Connecting to namenode via http://nn01.itversity.com:50070/fsck?ugi=mamilla_revathi&files=1&blocks=1&locations=1&path=%2Fuser%2Fmamilla_revathi%2Fcards%2Flargedeck.txt
FSCK started by mamilla_revathi (auth:SIMPLE) from /172.16.1.100 for path /user/mamilla_revathi/cards/largedeck.txt at Mon Jan 23 07:53:18 EST 2017
/user/mamilla_revathi/cards/largedeck.txt 726663168 bytes, 11 block(s): OK
0. BP-292116404-172.16.1.101-1479167821718:blk_1074197219_457171 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-1f4edfab-2926-45f9-a37c-ae9d1f542680,DISK]]
1. BP-292116404-172.16.1.101-1479167821718:blk_1074197221_457173 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.104:50010,DS-f4667aac-0f2c-463c-9584-d625928b9af5,DISK]]
2. BP-292116404-172.16.1.101-1479167821718:blk_1074197222_457174 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.102:50010,DS-b0f1636e-fd08-4ddb-bba9-9df8868dfb5d,DISK]]
3. BP-292116404-172.16.1.101-1479167821718:blk_1074197223_457175 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.102:50010,DS-1edb1d35-81bf-471b-be04-11d973e2a832,DISK]]
4. BP-292116404-172.16.1.101-1479167821718:blk_1074197224_457176 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.102:50010,DS-b0f1636e-fd08-4ddb-bba9-9df8868dfb5d,DISK]]
5. BP-292116404-172.16.1.101-1479167821718:blk_1074197225_457177 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-1f4edfab-2926-45f9-a37c-ae9d1f542680,DISK]]
6. BP-292116404-172.16.1.101-1479167821718:blk_1074197226_457178 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.104:50010,DS-98fec5a6-72a9-4590-99cc-cee3a51f4dd5,DISK]]
7. BP-292116404-172.16.1.101-1479167821718:blk_1074197227_457179 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.102:50010,DS-1edb1d35-81bf-471b-be04-11d973e2a832,DISK]]
8. BP-292116404-172.16.1.101-1479167821718:blk_1074197228_457180 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-7fb58858-abe9-4a52-9b75-755d849a897b,DISK]]
9. BP-292116404-172.16.1.101-1479167821718:blk_1074197229_457181 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.102:50010,DS-b0f1636e-fd08-4ddb-bba9-9df8868dfb5d,DISK]]
10. BP-292116404-172.16.1.101-1479167821718:blk_1074197230_457182 len=55574528 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-1f4edfab-2926-45f9-a37c-ae9d1f542680,DISK]]

Status: HEALTHY
Total size: 726663168 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 11 (avg. block size 66060288 B)
Minimally replicated blocks: 11 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Jan 23 07:53:18 EST 2017 in 0 milliseconds

The filesystem under path '/user/mamilla_revathi/cards/largedeck.txt' is HEALTHY

• What is the default block size? - 128 MB
• What is the purpose of block size?
Block size is the unit of the data that is being distributed across the nodes.
• What is the default replication factor? - 3
• Explain the purpose of the replication factor - High availability
• What are the different commands to copy files from the local filesystem on the gateway node to HDFS?
hadoop fs -put/-copyFromLocal LOCAL_location HDFS_location
• What are the different commands to copy files from HDFS to the local filesystem on the gateway node?
hadoop fs -get/-copyToLocal HDFS_location LOCAL_location

[souravkumar@gw01 ~]$ hdfs fsck /user/souravkumar/largedeck.txt -files -blocks -locations
Connecting to namenode via http://nn01.itversity.com:50070/fsck?ugi=souravkumar&files=1&blocks=1&locations=1&path=%2Fuser%2Fsouravkumar%2Flargedeck.txt
FSCK started by souravkumar (auth:SIMPLE) from /172.16.1.100 for path /user/souravkumar/largedeck.txt at Tue Jan 24 01:04:35 EST 2017
/user/souravkumar/largedeck.txt 726663168 bytes, 11 block(s): OK
0. BP-292116404-172.16.1.101-1479167821718:blk_1074196979_456931 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-1f4edfab-2926-45f9-a37c-ae9d1f542680,DISK]]
1. BP-292116404-172.16.1.101-1479167821718:blk_1074196980_456932 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-7fb58858-abe9-4a52-9b75-755d849a897b,DISK]]
2. BP-292116404-172.16.1.101-1479167821718:blk_1074196981_456933 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.104:50010,DS-98fec5a6-72a9-4590-99cc-cee3a51f4dd5,DISK]]
3. BP-292116404-172.16.1.101-1479167821718:blk_1074196982_456934 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.102:50010,DS-1edb1d35-81bf-471b-be04-11d973e2a832,DISK]]
4. BP-292116404-172.16.1.101-1479167821718:blk_1074196983_456935 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.104:50010,DS-f4667aac-0f2c-463c-9584-d625928b9af5,DISK]]
5. BP-292116404-172.16.1.101-1479167821718:blk_1074196984_456936 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.104:50010,DS-98fec5a6-72a9-4590-99cc-cee3a51f4dd5,DISK]]
6. BP-292116404-172.16.1.101-1479167821718:blk_1074196985_456937 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.102:50010,DS-b0f1636e-fd08-4ddb-bba9-9df8868dfb5d,DISK]]
7. BP-292116404-172.16.1.101-1479167821718:blk_1074196986_456938 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.102:50010,DS-1edb1d35-81bf-471b-be04-11d973e2a832,DISK]]
8. BP-292116404-172.16.1.101-1479167821718:blk_1074196987_456939 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.104:50010,DS-f4667aac-0f2c-463c-9584-d625928b9af5,DISK]]
9. BP-292116404-172.16.1.101-1479167821718:blk_1074196988_456940 len=67108864 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-1f4edfab-2926-45f9-a37c-ae9d1f542680,DISK]]
10. BP-292116404-172.16.1.101-1479167821718:blk_1074196989_456941 len=55574528 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-7fb58858-abe9-4a52-9b75-755d849a897b,DISK]]

Status: HEALTHY
Total size: 726663168 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 11 (avg. block size 66060288 B)
Minimally replicated blocks: 11 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Tue Jan 24 01:04:35 EST 2017 in 1 milliseconds

The filesystem under path '/user/souravkumar/largedeck.txt' is HEALTHY

hadoop fs -put retail_db/
hadoop fs -Ddfs.replication=1 -Ddfs.blocksize=67108864 -put cards/largedeck.txt
hdfs fsck largedeck.txt
Connecting to namenode via http://nn01.itversity.com:50070/fsck?ugi=himamarkonda&path=%2Fuser%2Fhimamarkonda%2Flargedeck.txt
FSCK started by himamarkonda (auth:SIMPLE) from /172.16.1.100 for path /user/himamarkonda/largedeck.txt at Tue Jan 24 01:17:47 EST 2017
.Status: HEALTHY
Total size: 726663168 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 11 (avg. block size 66060288 B)
Minimally replicated blocks: 11 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Tue Jan 24 01:17:47 EST 2017 in 0 milliseconds

The filesystem under path '/user/himamarkonda/largedeck.txt' is HEALTHY

[gopikreddy143@gw01 ~]$ hadoop fs -D fs.block.size=67108864 -put data-master/cards/largedeck.txt.gz
[gopikreddy143@gw01 ~]$ hadoop dfs -setrep -w 1 data-master/cards/largedeck.txt.gz
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

setrep: `data-master/cards/largedeck.txt.gz': No such file or directory
[gopikreddy143@gw01 ~]$ hadoop dfs -setrep -w 1 /user/gopikreddy143/largedeck.txt.gz
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Replication 1 set: /user/gopikreddy143/largedeck.txt.gz
Waiting for /user/gopikreddy143/largedeck.txt.gz …
WARNING: the waiting time may be long for DECREASING the number of replications.
. done
[gopikreddy143@gw01 ~]$ hadoop fsck /user/gopikreddy143/largedeck.txt.gz
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nn01.itversity.com:50070/fsck?ugi=gopikreddy143&path=%2Fuser%2Fgopikreddy143%2Flargedeck.txt.gz
FSCK started by gopikreddy143 (auth:SIMPLE) from /172.16.1.100 for path /user/gopikreddy143/largedeck.txt.gz at Tue Jan 24 02:12:15 EST 2017
.Status: HEALTHY
Total size: 3523170 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 3523170 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Tue Jan 24 02:12:15 EST 2017 in 0 milliseconds

The filesystem under path '/user/gopikreddy143/largedeck.txt.gz' is HEALTHY

hadoop fs -put /home/gopikreddy143/data-master/retail_db /user/gopikreddy143/retail_db

hadoop fs -Dsfs.replication=1 -Ddfs.blocksize=67108864 -put largedeck.txt.gz
hadoop fs -put retail_db/
hadoop fsck /user/sharatchandra/largedeck.txt.gz
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nn01.itversity.com:50070/fsck?ugi=sharatchandra&path=%2Fuser%2Fsharatchandra%2Flargedeck.txt.gz
FSCK started by sharatchandra (auth:SIMPLE) from /172.16.1.100 for path /user/sharatchandra/largedeck.txt.gz at Tue Jan 24 01:52:21 EST 2017
.Status: HEALTHY
Total size: 3523170 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 3523170 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Tue Jan 24 01:52:21 EST 2017 in 0 milliseconds

The filesystem under path '/user/sharatchandra/largedeck.txt.gz' is HEALTHY
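
(Note: -Dsfs.replication in the put command above is a typo for -Ddfs.replication; since the misspelled property is ignored, fsck still shows an average block replication of 3.0.)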

Sir, for sqoop eval the -p option does not work. I used the regular procedure for the password, which we used for show databases and tables, and it worked fine. Can you look into it please?

The commands are as follows :

sqoop eval --connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" --username retail_dba --password itversity --query "select count(1) from order_items" # worked fine

sqoop eval --connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" --username retail_dba -p --query "select count(1) from order_items" # not working

the error is as follows:

Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/01/24 13:44:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245
17/01/24 13:44:18 ERROR tool.BaseSqoopTool: Error parsing arguments for eval:
17/01/24 13:44:18 ERROR tool.BaseSqoopTool: Unrecognized argument: -p
17/01/24 13:44:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --query
17/01/24 13:44:18 ERROR tool.BaseSqoopTool: Unrecognized argument: select count(1) from order_items
Try --help for usage instructions.

It's -P, not -p. Update the code and try.
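
For reference, the corrected command (the same as the failing one, but with the capital -P, which makes sqoop prompt for the password interactively):

sqoop eval --connect "jdbc:mysql://nn01.itversity.com:3306/retail_db" --username retail_dba -P --query "select count(1) from order_items"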

Worked fine. Thank you!

Hi,

I am facing this error when I run the -put command with -p in it:

[rkathiravan@gw01 /]$ hadoop fs -put -p /data/cards/largedeck.txt /user/rkathiravan/data
put: `/user/rkathiravan/data/largedeck.txt': File exists
[rkathiravan@gw01 /]$ hadoop fs -put -p -f /data/cards/largedeck.txt /user/rkathiravan/data
17/02/25 22:39:57 WARN retry.RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.setOwner over null. Not retrying because try once and fail.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Non-super user cannot change owner
at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:85)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1708)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:821)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:472)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
at org.apache.hadoop.ipc.Client.call(Client.java:1496)
at org.apache.hadoop.ipc.Client.call(Client.java:1396)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy10.setOwner(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setOwner(ClientNamenodeProtocolTranslatorPB.java:419)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
at com.sun.proxy.$Proxy11.setOwner(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.setOwner(DFSClient.java:2519)
at org.apache.hadoop.hdfs.DistributedFileSystem$32.doCall(DistributedFileSystem.java:1624)

help me out!!

What is your requirement? -p is used to preserve the ownership, permissions and timestamps of the source files in HDFS. largedeck.txt is owned by root, so with -p you are trying to make root the owner of the HDFS file as well. As you are not a superuser, HDFS is not letting you do it.
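
So the straightforward fix is to drop -p; -f alone overwrites the existing file without attempting to change its ownership:

hadoop fs -put -f /data/cards/largedeck.txt /user/rkathiravan/data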

Thanks, sir. I analysed it and later came to know that the file is in a root-owned directory on which I have no permission. I also forgot to update the forum.

[vishnureddy@gw01 ~]$ hdfs fsck retail_db/categories/ -files -blocks -locations
Connecting to namenode via http://nn01.itversity.com:50070/fsck?ugi=vishnureddy&files=1&blocks=1&locations=1&path=%2Fuser%2Fvishnureddy%2Fretail_db%2Fcategories
FSCK started by vishnureddy (auth:SIMPLE) from /172.16.1.100 for path /user/vishnureddy/retail_db/categories at Sat Mar 11 14:31:58 EST 2017
/user/vishnureddy/retail_db/categories
/user/vishnureddy/retail_db/categories/part-00000 1029 bytes, 1 block(s): OK
0. BP-292116404-172.16.1.101-1479167821718:blk_1074645580_905789 len=1029 repl=1 [DatanodeInfoWithStorage[172.16.1.103:50010,DS-1f4edfab-2926-45f9-a37c-ae9d1f542680,DISK]]

Status: HEALTHY
Total size: 1029 B
Total dirs: 1
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 1029 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Sat Mar 11 14:31:58 EST 2017 in 1 milliseconds

The filesystem under path '/user/vishnureddy/retail_db/categories' is HEALTHY