MapReduce job: map task gets stuck at 67%

hadoop
mapreduce

#1

I have created a MapReduce word-count job in Java in Eclipse. It reads data (approximately 7,453,215 records, ~670 MB) from SQL Server and stores the result back in SQL Server. I created an HDInsight cluster on Azure with 2 head nodes and 3 worker nodes; each node has 4 cores and 14 GB RAM. The job runs successfully on my local machine, but when I submit the job's jar file to the HDInsight cluster on Azure, it stops on the map task at 67%.

Here is the log:

17/12/01 13:23:20 INFO client.AHSProxy: Connecting to Application History server at headnodehost/10.0.0.20:10200
17/12/01 13:23:21 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]…
17/12/01 13:23:21 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
17/12/01 13:23:21 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/12/01 13:23:36 INFO mapreduce.JobSubmitter: number of splits:2
17/12/01 13:23:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512119994740_0011
17/12/01 13:23:37 INFO impl.YarnClientImpl: Submitted application application_1512119994740_0011
17/12/01 13:23:37 INFO mapreduce.Job: The url to track the job: http://hn1-hdpclu.53o3id15rwte5en44vyo02sv0h.dx.internal.cloudapp.net:8088/proxy/application_1512119994740_0011/
17/12/01 13:23:37 INFO mapreduce.Job: Running job: job_1512119994740_0011
17/12/01 13:23:47 INFO mapreduce.Job: Job job_1512119994740_0011 running in uber mode : false

17/12/01 13:23:47 INFO mapreduce.Job: map 0% reduce 0%
17/12/01 13:24:00 INFO mapreduce.Job: map 33% reduce 0%
17/12/01 13:24:06 INFO mapreduce.Job: map 67% reduce 0%

And I am getting this error:

2017-12-01 12:56:41,303 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:1 AssignedMaps:0 AssignedReds:0 CompletedMaps:2 CompletedReds:0 ContAlloc:2 ContRel:0 HostLocal:0 RackLocal:0
2017-12-01 12:56:41,304 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1512119994740_0008_m_000001_0: Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
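For context, exit code 143 corresponds to SIGTERM (128 + 15), which YARN sends when it kills a container, most commonly for exceeding its memory limit. One thing I have considered trying (a sketch only, with illustrative values for 14 GB worker nodes, not something from my current code) is raising the container and heap sizes in the driver before the job is created:

```java
// Fragment for main(), before the Job is constructed.
// All values below are illustrative assumptions, not tuned settings.
Configuration conf = new Configuration();
conf.set("mapreduce.map.memory.mb", "3072");       // YARN container size for map tasks
conf.set("mapreduce.map.java.opts", "-Xmx2560m");  // JVM heap, kept below the container size
conf.set("mapreduce.reduce.memory.mb", "3072");
conf.set("mapreduce.reduce.java.opts", "-Xmx2560m");
Job job = Job.getInstance(conf);
```

The same properties could also go in mapred-site.xml, or be passed as -D options on the command line if the Tool interface is implemented as the ToolRunner warning in the log suggests.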


#2

When you write from the database to HDFS, does it work?


#3

I am not writing anything to HDFS. I connect to the SQL Server database, process the data, and store the output back in the SQL Server database; all of these steps are performed in a single MapReduce job. I wrote this MapReduce job in Java in Eclipse, then created a runnable jar and executed it on the HDInsight cluster on Azure.

Following is my Main Class:

package com.hadoop;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;

public class Main {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DBConfiguration.configureDB(conf,
        "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "jdbc:sqlserver://xxx.xxx.xxx.xx:xxxxx;databaseName=HadoopFlight",
        "username",
        "password");
    System.out.println("connected to db");

    Job job = Job.getInstance(conf);
    job.setJarByClass(Main.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(DBOutputWritable.class);
    job.setOutputValueClass(NullWritable.class);
    job.setInputFormatClass(DBInputFormat.class);
    job.setOutputFormatClass(DBOutputFormat.class);
    job.setNumReduceTasks(10);

    DataDrivenDBInputFormat.setInput(job, DBInputWritable.class,
        "SELECT * FROM Flight WHERE Id IS NOT NULL",
        "SELECT MIN(Id),MAX(Id) FROM Flight");
    System.out.println("execute the query");

    DBOutputFormat.setOutput(job,
        "appflight",
        new String[] {"FlightNum", "cnt"});

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
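The Map, Reduce, DBInputWritable, and DBOutputWritable classes referenced in Main are not shown in the post. For completeness, here is a minimal hypothetical sketch of what such classes would typically look like for a per-flight count; the getFlightNum() accessor and the DBOutputWritable constructor are my assumptions, not the poster's actual code:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical sketch, not the actual code: emits (flightNum, 1) per record.
// DBInputFormat supplies LongWritable keys and the DBWritable value class.
class Map extends Mapper<LongWritable, DBInputWritable, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);

  @Override
  protected void map(LongWritable key, DBInputWritable value, Context ctx)
      throws IOException, InterruptedException {
    // getFlightNum() is an assumed accessor on DBInputWritable
    ctx.write(new Text(value.getFlightNum()), ONE);
  }
}

// Hypothetical sketch: sums the counts per flight number.
class Reduce extends Reducer<Text, IntWritable, DBOutputWritable, NullWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    // assumed constructor mapping to the appflight (FlightNum, cnt) columns
    ctx.write(new DBOutputWritable(key.toString(), sum), NullWritable.get());
  }
}
```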