I created a MapReduce word-count job in Java (using Eclipse) that reads its input from SQL Server and writes the result back to SQL Server. I also created an HDInsight cluster on Azure with 2 head nodes and 3 worker nodes; each node has 4 cores and 14 GB of RAM. The job runs correctly on my local machine, but when I submit the JAR file to the HDInsight cluster, every count in the output is tripled. For example, if a word's count should be 1, the value stored is 3.
The following is the log I get when submitting the job to the HDInsight cluster:
17/12/02 11:59:55 INFO client.AHSProxy: Connecting to Application History server at headnodehost/10.0.0.13:10200
17/12/02 11:59:58 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]…
17/12/02 11:59:58 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
17/12/02 11:59:58 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/12/02 12:00:09 INFO mapreduce.JobSubmitter: number of splits:3
17/12/02 12:00:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512210926531_0008
17/12/02 12:00:10 INFO impl.YarnClientImpl: Submitted application application_1512210926531_0008
17/12/02 12:00:10 INFO mapreduce.Job: The url to track the job: http://hn0-hdpclu.tm3xcsmmkyjeddccs1mje5d5nd.ix.internal.cloudapp.net:8088/proxy/application_1512210926531_0008/
17/12/02 12:00:10 INFO mapreduce.Job: Running job: job_1512210926531_0008
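The log reports `number of splits:3`, which matches the factor by which the counts are inflated. The following is a minimal plain-Java sketch (no Hadoop or SQL Server involved) of how that could produce tripled counts, under the unconfirmed assumption that each of the three map tasks reads the entire input table instead of a disjoint slice. The class name `SplitOverlapSketch` and the method `countWords` are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitOverlapSketch {

    // Simulates numSplits map tasks feeding one shared word count.
    // overlapping == true : every task reads ALL rows (the assumed problem).
    // overlapping == false: each task reads only the rows it owns.
    static Map<String, Integer> countWords(List<String> rows, int numSplits, boolean overlapping) {
        Map<String, Integer> counts = new HashMap<>();
        for (int split = 0; split < numSplits; split++) {
            for (int i = 0; i < rows.size(); i++) {
                // Round-robin ownership stands in for a real split boundary.
                boolean ownsRow = (i % numSplits == split);
                if (overlapping || ownsRow) {
                    for (String word : rows.get(i).trim().split("\\s+")) {
                        if (!word.isEmpty()) {
                            counts.merge(word, 1, Integer::sum);
                        }
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("hello world", "hello hadoop");
        // Disjoint splits: each row is counted once.
        System.out.println(countWords(rows, 3, false));
        // Overlapping splits: each row is counted once per split.
        System.out.println(countWords(rows, 3, true));
    }
}
```

If this is what is happening on the cluster, the place to look would be how the job's input format divides the SQL Server table among the splits.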