Error occurred while moving data from itversity cluster to HDFS in EMR cluster



@dgadiraju sir, kindly help me

I am trying to copy data from one cluster (ITVersity labs) to HDFS in an EMR cluster. I ran the following command:

hadoop distcp -pb hdfs://nn01.itversity.com:8020/user/nikhilvemula/flight_data/1994.csv hdfs://ec2-13-59-78-105.us-east-2.compute.amazonaws.com:8020/itversity

Here is the log:

18/07/12 18:18:51 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://nn01.itversity.com:8020/user/nikhilvemula/flight_data/1994.csv], targetPath=hdfs://ec2-13-59-78-105.us-east-2.compute.amazonaws.com:8020/itversity, targetPathExists=true, filtersFile='null'}
18/07/12 18:18:52 INFO impl.TimelineClientImpl: Timeline service address: http://rm01.itversity.com:8188/ws/v1/timeline/
18/07/12 18:18:52 INFO client.RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
18/07/12 18:18:52 INFO client.AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
18/07/12 18:18:52 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
18/07/12 18:18:52 INFO tools.SimpleCopyListing: Build file listing completed.
18/07/12 18:18:53 INFO tools.DistCp: Number of paths in the copy list: 1
18/07/12 18:18:53 INFO tools.DistCp: Number of paths in the copy list: 1
18/07/12 18:18:53 INFO impl.TimelineClientImpl: Timeline service address: http://rm01.itversity.com:8188/ws/v1/timeline/
18/07/12 18:18:53 INFO client.RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
18/07/12 18:18:53 INFO client.AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
18/07/12 18:18:54 INFO mapreduce.JobSubmitter: number of splits:1
18/07/12 18:18:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528589352821_22168
18/07/12 18:18:54 INFO impl.YarnClientImpl: Submitted application application_1528589352821_22168
18/07/12 18:18:54 INFO mapreduce.Job: The url to track the job: http://rm01.itversity.com:19288/proxy/application_1528589352821_22168/
18/07/12 18:18:54 INFO tools.DistCp: DistCp job-id: job_1528589352821_22168
18/07/12 18:18:54 INFO mapreduce.Job: Running job: job_1528589352821_22168
18/07/12 18:19:00 INFO mapreduce.Job: Job job_1528589352821_22168 running in uber mode : false
18/07/12 18:19:00 INFO mapreduce.Job:  map 0% reduce 0%
18/07/12 18:19:10 INFO mapreduce.Job:  map 100% reduce 0%
18/07/12 18:19:18 INFO mapreduce.Job: Task Id : attempt_1528589352821_22168_m_000000_0, Status : FAILED
**_`Error: java.io.IOException: File copy failed: hdfs://nn01.itversity.com:8020/user/nikhilvemula/flight_data/1994.csv --> hdfs://ec2-13-59-78-105.us-east-2.compute.amazonaws.com:8020/itversity/1994.csv`_**
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:287)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:255)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://nn01.itversity.com:8020/user/nikhilvemula/flight_data/1994.csv to hdfs://ec2-13-59-78-105.us-east-2.compute.amazonaws.com:8020/itversity/1994.csv
        at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
        at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:283)
        ... 10 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /itversity/.distcp.tmp.attempt_1528589352821_22168_m_000000_0 could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1735)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2561)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:829)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
        at org.apache.hadoop.ipc.Client.call(Client.java:1496)
        at org.apache.hadoop.ipc.Client.call(Client.java:1396)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:457)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
        at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1489)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1284)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)

Error: java.io.IOException: File copy failed: hdfs://nn01.itversity.com:8020/user/nikhilvemula/flight_data/1994.csv --> hdfs://ec2-13-59-78-105.us-east-2.compute.amazonaws.com:8020/itversity/1994.csv
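
The line that matters in the trace above is the innermost `Caused by`: the NameNode on the EMR side reports that the block could be placed on 0 datanodes, with 1 datanode running and 1 node excluded. As a rough aid for reading that message, here is a minimal sketch that pulls those counts out of a pasted log. The function name `summarize_placement_error` and the `all_datanodes_excluded` flag are made up for illustration and are not part of Hadoop. When every running datanode is excluded, one common explanation is that the writing client could not reach the datanode (for example because it is advertised on a private address), but that is an assumption to verify, not something the log proves; a full disk on the datanode is another possibility.

```python
import re

# Pattern for the NameNode's block-placement failure message,
# written to match the wording seen in the log above.
PLACEMENT_ERROR = re.compile(
    r"could only be replicated to (?P<placed>\d+) nodes instead of "
    r"minReplication \(=(?P<min_replication>\d+)\)\.\s+There are "
    r"(?P<running>\d+) datanode\(s\) running and (?P<excluded>\d+) node\(s\) "
    r"are excluded in this operation"
)


def summarize_placement_error(log_text):
    """Return the counts from the first placement error in log_text, or None.

    Hypothetical helper for illustration only; it is not a Hadoop API.
    """
    match = PLACEMENT_ERROR.search(log_text)
    if match is None:
        return None
    counts = {name: int(value) for name, value in match.groupdict().items()}
    # True when no running datanode was left as a candidate for the write.
    counts["all_datanodes_excluded"] = (
        counts["running"] > 0 and counts["excluded"] >= counts["running"]
    )
    return counts


# The message copied from the log above.
sample = (
    "File /itversity/.distcp.tmp.attempt_1528589352821_22168_m_000000_0 could only be "
    "replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) "
    "running and 1 node(s) are excluded in this operation."
)

print(summarize_placement_error(sample))
```

For the log in this post the flag comes out true, which suggests the problem is between the copy task and the EMR datanode rather than with the source file on the ITVersity side.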