Hadoop streaming using shell script: reducer fails with error "No such file or directory"

hadoop

I am a user of Big Data Lab.

I am trying to run a simple WordCount job with Hadoop streaming from Bash, using a shell script as the reducer. Below is the command I am using:

yarn jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.5.0-292.jar \
-mapper 'wc -l' \
-reducer './reducer_wordcount.sh' \
-file /home/pathirippilly/map_reduce_jobs/shell_scripts/reducer_wordcount.sh \
-numReduceTasks 1 \
-input /user/pathirippilly/cards/smalldeck.txt \
-output /user/pathirippilly/mapreduce_jobs/output_shell
  1. reducer_wordcount.sh is the reducer shell script, located in my local directory /home/pathirippilly/map_reduce_jobs/shell_scripts
  2. smalldeck.txt is the input file, in the HDFS directory /user/pathirippilly/cards
  3. /user/pathirippilly/mapreduce_jobs/output_shell is the output directory

reducer_wordcount.sh contains:

#! /user/bin/env bash
awk '{line_count += $1} END  { print line_count }'

When I run this, I get the error below for reducer_wordcount.sh:

     Error: java.lang.RuntimeException: Error in configuring object
                at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
                at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
                at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:410)
                at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
                at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:422)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
                at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
        Caused by: java.lang.reflect.InvocationTargetException
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:498)
                at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
                ... 9 more
        Caused by: java.lang.RuntimeException: configuration exception
                at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
                at org.apache.hadoop.streaming.PipeReducer.configure(PipeReducer.java:67)
                ... 14 more
        Caused by: java.io.IOException: Cannot run program "/hdp01/hadoop/yarn/local/usercache/pathirippilly/appcache/application_1533622723243_17238/container_e38_1533622723243_17238_01_000004/./reducer_wordcount.sh": error=2, No such file or directory
                at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
                at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
                ... 15 more
        Caused by: java.io.IOException: error=2, No such file or directory
                at java.lang.UNIXProcess.forkAndExec(Native Method)
                at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
                at java.lang.ProcessImpl.start(ProcessImpl.java:134)
                at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
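
One check that may help narrow this down: when a script is executed directly, error=2 (ENOENT) can be raised either because the script file itself is missing or because the interpreter named on its shebang line does not exist. The sketch below reports which interpreter a script asks for and whether that path is executable. `check_shebang` is a hypothetical helper written for this post, not part of Hadoop, and the demo script with its `/no/such/dir/env` path is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report whether the interpreter named on a script's shebang line exists.
# check_shebang is a hypothetical helper, not part of Hadoop.
check_shebang() {
  local script=$1 first interp
  first=$(head -n 1 "$script")
  case $first in
    '#!'*) ;;                               # line starts with a shebang, keep going
    *) echo "no shebang"; return 0 ;;
  esac
  # Strip "#!" and optional spaces, then keep the first word (the interpreter path)
  interp=$(printf '%s\n' "$first" | sed 's/^#![[:space:]]*//' | awk '{print $1}')
  if [ -x "$interp" ]; then
    echo "ok: $interp"
  else
    echo "missing: $interp"
  fi
}

# Demo on a throwaway script with a deliberately nonexistent interpreter path
demo=$(mktemp)
printf '#! /no/such/dir/env bash\necho hi\n' > "$demo"
check_shebang "$demo"
```

The same function could be pointed at the real reducer, for example `check_shebang /home/pathirippilly/map_reduce_jobs/shell_scripts/reducer_wordcount.sh`. It only inspects the local machine, so it assumes the cluster nodes have the same interpreter paths.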

If I pass the same awk command inline as the reducer instead of using the script file, as below, it works:

yarn jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming.jar \
-mapper 'wc -l' \
-reducer "awk '{line_count += \$1} END  { print line_count }'" \
-numReduceTasks 1 \
-input /user/pathirippilly/cards/smalldeck.txt \
-output /user/pathirippilly/mapreduce_jobs/output_shell
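
Since the inline form works, the awk logic itself is probably fine, and the difference seems to lie in how the script file is launched. A rough way to exercise the script form without the cluster is to run mapper, sort, and reducer as a local pipeline. This is only a sketch: the three-line sample file and the stand-in `reducer.sh` are made up here (the reducer reuses the awk body from reducer_wordcount.sh), and `sort` only approximates the shuffle phase.

```shell
#!/usr/bin/env bash
# Local dry run: emulate mapper | shuffle/sort | reducer without a cluster.
workdir=$(mktemp -d)

# Hypothetical three-line sample standing in for smalldeck.txt
printf 'line one\nline two\nline three\n' > "$workdir/sample.txt"

# Stand-in reducer using the same awk body as reducer_wordcount.sh
cat > "$workdir/reducer.sh" <<'EOF'
#!/usr/bin/env bash
awk '{line_count += $1} END { print line_count }'
EOF
chmod +x "$workdir/reducer.sh"

# wc -l plays the mapper, sort stands in for the shuffle, then the reducer runs
result=$(wc -l < "$workdir/sample.txt" | sort | "$workdir/reducer.sh")
echo "$result"
```

Running the real reducer_wordcount.sh in place of the stand-in, executed directly as `./reducer_wordcount.sh` rather than through `bash reducer_wordcount.sh`, should show whether the script can be launched at all outside of Hadoop.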

Any help would be appreciated; I am pretty new to Hadoop streaming.