Step 02 - Writing map reduce programs - Aggregations

This is the second two-week plan for preparing for the HDPCD Java examination.

Goal: Get comfortable writing, executing, and troubleshooting simple MapReduce programs, and make sure you understand HDFS commands.

  • HDFS briefly
  • Introduction to the MapReduce APIs
  • Default Mappers and Reducers
  • Develop a row count program
  • Run the row count program on the cluster
  • Understand counters
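The steps above can be sketched in plain Java (no Hadoop dependencies) to show the data flow the real Mapper and Reducer classes implement: the mapper emits a ("row", 1) pair per input record, the framework groups values by key, and the reducer sums them. The class and variable names here are illustrative stand-ins, not the Hadoop API; the `inputRecords` variable mimics a built-in counter such as MAP_INPUT_RECORDS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowCountSketch {
    public static void main(String[] args) {
        // Stand-in for the input split HDFS would feed the mapper.
        List<String> lines = Arrays.asList("AAPL,100", "GOOG,200", "MSFT,300");

        // Map phase: emit the constant key "row" with value 1 per record.
        Map<String, List<Integer>> grouped = new HashMap<>();
        long inputRecords = 0; // mimics a framework counter like MAP_INPUT_RECORDS
        for (String line : lines) {
            grouped.computeIfAbsent("row", k -> new ArrayList<>()).add(1);
            inputRecords++;
        }

        // Reduce phase: sum the grouped values for each key.
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            System.out.println(e.getKey() + "\t" + sum);
        }
        System.out.println("MAP_INPUT_RECORDS=" + inputRecords);
    }
}
```

Because every record maps to the same key, a single reducer receives all the 1s and the sum is the row count; in the real API the same shape is expressed with Mapper and Reducer subclasses and Writable types.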

Here is the playlist for reference.

Next Plan: Aggregations and role of combiners

I pulled the master branch from the Git repository for MapReduce.
I tried running RecordCount.java, and it fails with the error below.

P.S.: It's a Windows machine.

Error:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:483)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:815)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:798)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:728)
at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:486)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:527)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:305)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:133)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:144)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at nyse.RecordCount.run(RecordCount.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at nyse.RecordCount.main(RecordCount.java:58)
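On Windows, an NPE in Shell.runCommand like the one above is the classic symptom of Hadoop not finding winutils.exe, which it shells out to for local file-system operations such as the setPermission call in the trace. A hedged sketch of a common workaround, not an official fix: set hadoop.home.dir before the job is constructed. The C:\hadoop path is an assumption; point it at a directory whose bin folder contains a winutils.exe matching your Hadoop version.

```java
public class WinutilsSetup {
    public static void main(String[] args) {
        // Assumed path: C:\hadoop\bin must contain winutils.exe for your
        // Hadoop version. Set this property (or the HADOOP_HOME environment
        // variable) before constructing the Job / calling ToolRunner.run.
        System.setProperty("hadoop.home.dir", "C:\\hadoop");
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```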

What parameters did you pass?

I ran the same program in Eclipse on my Linux Cloudera VM, and my project works there. I passed the same parameters. Thanks.

Do we have video tutorials on map-side joins and reduce-side joins, or any plans for them in the near future? I could only see distributed cache, and the HDPCD-JAVA task list does include map-side joins.

Yes, I will add videos soon.