Exercise 22 - Compression and Map Reduce

compression
mapreduce
hadoop
#1

Problem statement

  • Run word count program on /public/randomtextwriter
  • Jar file location /usr/hdp/2.5.0.0-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar
  • Set the compression codec to gzip for both the map output and the final output files, using these parameters:
    mapreduce.output.fileoutputformat.compress
    mapreduce.output.fileoutputformat.compress.codec
    mapreduce.map.output.compress
    mapreduce.map.output.compress.codec
  • Get the gzip compression codec class from core-site.xml
  • Increase the number of reducers to 6 (mapreduce.job.reduces)
  • Increase the split size to 256 MB (mapreduce.input.fileinputformat.split.minsize)
  • Add all the parameters to a configuration file in XML format and pass it to the hadoop jar command with --conf filename.xml
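The configuration file can also be generated instead of typed by hand. Below is a minimal Python sketch. It assumes Python 3.9+, which is needed for `ET.indent`. It prints a `<configuration>` document with the six properties listed above. For the 256 MB split size it uses 268435456 bytes (256 * 1024 * 1024).

```python
import xml.etree.ElementTree as ET

# Job properties from the problem statement.
PROPERTIES = {
    "mapreduce.output.fileoutputformat.compress": "true",
    "mapreduce.output.fileoutputformat.compress.codec": "org.apache.hadoop.io.compress.GzipCodec",
    "mapreduce.map.output.compress": "true",
    "mapreduce.map.output.compress.codec": "org.apache.hadoop.io.compress.GzipCodec",
    "mapreduce.job.reduces": "6",
    # 256 MB expressed in bytes (binary megabytes).
    "mapreduce.input.fileinputformat.split.minsize": str(256 * 1024 * 1024),
}


def build_conf_xml(props: dict) -> str:
    """Return a Hadoop-style <configuration> document containing the given properties."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_conf_xml(PROPERTIES))
```

Redirect the printed output to a file and pass that file with `--conf conf.xml`. For example, `python make_conf.py > conf.xml`, where `make_conf.py` is a hypothetical name for this script.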

Please provide the following:

  • hadoop jar command
  • Output of hadoop fs -ls on output directory
  • Counters

Make sure you understand the information provided by counters.
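One way to study the counters is to load them into a dictionary, so that values can be compared across runs. Below is a small Python sketch. It assumes the counter block has been copied from the job's console output into a string. Lines without `=` are skipped, such as group headers like `Job Counters` and the `Counters: 49` log line. It also assumes counter names are unique across groups, which is true for the dumps in this thread.

```python
import re

# "Name=value" or "Name =value" (Hadoop prints "Shuffled Maps =300" with a space).
COUNTER_LINE = re.compile(r"^\s*(?P<name>[^=]+?)\s*=\s*(?P<value>\d+)\s*$")


def parse_counters(text: str) -> dict:
    """Parse a MapReduce counter dump into {counter name: integer value}."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


sample = """Job Counters
        Launched map tasks=50
        Launched reduce tasks=6
        Shuffled Maps =300
"""
print(parse_counters(sample))
```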


#2

hadoop jar /usr/hdp/2.5.0.0-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount --conf filename.xml /public/randomtextwriter /user/paramesh/randomtextwriter_outputcompress

output

-rw-r--r-- 3 paramesh hdfs 0 2016-12-22 00:24 /user/paramesh/randomtextwriter_outputcompress/_SUCCESS
-rw-r--r-- 3 paramesh hdfs 9492364 2016-12-22 00:24 /user/paramesh/randomtextwriter_outputcompress/part-r-00000.gz
-rw-r--r-- 3 paramesh hdfs 9501513 2016-12-22 00:24 /user/paramesh/randomtextwriter_outputcompress/part-r-00001.gz
-rw-r--r-- 3 paramesh hdfs 9498540 2016-12-22 00:24 /user/paramesh/randomtextwriter_outputcompress/part-r-00002.gz
-rw-r--r-- 3 paramesh hdfs 9498600 2016-12-22 00:24 /user/paramesh/randomtextwriter_outputcompress/part-r-00003.gz
-rw-r--r-- 3 paramesh hdfs 9500911 2016-12-22 00:24 /user/paramesh/randomtextwriter_outputcompress/part-r-00004.gz
-rw-r--r-- 3 paramesh hdfs 9489039 2016-12-22 00:24 /user/paramesh/randomtextwriter_outputcompress/part-r-00005.gz

Launched map tasks=50
Launched reduce tasks=6
Data-local map tasks=50
Total time spent by all maps in occupied slots (ms)=4775511
Total time spent by all reduces in occupied slots (ms)=703569
Total time spent by all map tasks (ms)=4775511
Total time spent by all reduce tasks (ms)=703569
Total vcore-milliseconds taken by all map tasks=4775511
Total vcore-milliseconds taken by all reduce tasks=703569
Total megabyte-milliseconds taken by all map tasks=4890123264
Total megabyte-milliseconds taken by all reduce tasks=1080681984
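Dividing the megabyte-millisecond counters by the matching task-time counters estimates how much memory YARN gave each container. Below is a quick Python check using the numbers above. It assumes all map containers were the same size, and likewise all reduce containers. These sizes would normally come from `mapreduce.map.memory.mb` and `mapreduce.reduce.memory.mb`, but their cluster values are not shown in this thread.

```python
def container_mb(mb_ms: int, task_ms: int) -> float:
    """Average container size in MB: megabyte-milliseconds divided by task milliseconds."""
    return mb_ms / task_ms


# Values from the counters above.
map_mb = container_mb(4890123264, 4775511)
reduce_mb = container_mb(1080681984, 703569)
print(f"map containers: {map_mb:.0f} MB, reduce containers: {reduce_mb:.0f} MB")
# map containers: 1024 MB, reduce containers: 1536 MB
```

The vcore-millisecond counters are also equal to the task-time counters, which suggests each container got one vcore.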


#3

#hadoop jar command

hadoop jar /usr/hdp/2.5.0.0-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount --conf conf.xml /public/randomtextwriter /user/infosnehasish/dec22out01


#conf.xml

<configuration>

    <property>
      <name>mapreduce.job.reduces</name>
      <value>6</value>
    </property>

    <property>
      <name>mapreduce.input.fileinputformat.split.minsize</name>
      <value>256000000</value>
    </property>

    <property>
      <name>mapreduce.output.fileoutputformat.compress</name>
      <value>true</value>
    </property>

    <property>
      <name>mapreduce.output.fileoutputformat.compress.codec</name>
      <value>org.apache.hadoop.io.compress.GzipCodec</value>
    </property>

    <property>
      <name>mapreduce.map.output.compress</name>
      <value>true</value>
    </property>

    <property>
      <name>mapreduce.map.output.compress.codec</name>
      <value>org.apache.hadoop.io.compress.GzipCodec</value>
    </property>
</configuration>
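The split size above, 256000000, is 256 MB in decimal units (about 244 MiB). Some other answers use 268435456 (256 * 1024 * 1024), which is the binary value. FileInputFormat computes the split size as max(minSize, min(maxSize, blockSize)). It also lets the last split of each file run up to 10% over that size. The Python sketch below reproduces this logic to show why both values give the 50 splits seen in the job log. It assumes a 128 MB HDFS block size and roughly equal-sized input files; neither is confirmed in this thread.

```python
SPLIT_SLOP = 1.1  # FileInputFormat lets the last split be up to 10% larger than splitSize
LONG_MAX = 2**63 - 1  # default mapreduce.input.fileinputformat.split.maxsize


def compute_split_size(block_size: int, min_size: int, max_size: int = LONG_MAX) -> int:
    """Mirror of FileInputFormat.computeSplitSize."""
    return max(min_size, min(max_size, block_size))


def splits_for_file(length: int, split_size: int) -> int:
    """Count the splits FileInputFormat would cut from one splittable file."""
    count, remaining = 0, length
    while remaining / split_size > SPLIT_SLOP:
        count += 1
        remaining -= split_size
    if remaining != 0:
        count += 1  # the tail (at most 1.1 * split_size) becomes the last split
    return count


BLOCK_SIZE = 128 * 1024 * 1024   # assumed dfs.blocksize (not shown in the thread)
TOTAL_BYTES = 11027884743        # "Bytes Read" from File Input Format Counters
NUM_FILES = 10                   # "Total input paths to process : 10" in the job log
PER_FILE = TOTAL_BYTES // NUM_FILES  # assumes equally sized input files

for min_size in (256000000, 256 * 1024 * 1024):
    size = compute_split_size(BLOCK_SIZE, min_size)
    print(f"minsize={min_size}: split size {size}, "
          f"{NUM_FILES * splits_for_file(PER_FILE, size)} splits")
```

Under the same assumptions, leaving minsize at its default would fall back to the block size. That would give 9 splits per file, so more map tasks.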

#Counters

16/12/22 00:12:55 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=258301492
FILE: Number of bytes written=382966808
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=11027760271
HDFS: Number of bytes written=56980967
HDFS: Number of read operations=168
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Launched map tasks=50
Launched reduce tasks=6
Data-local map tasks=50
Total time spent by all maps in occupied slots (ms)=4597512
Total time spent by all reduces in occupied slots (ms)=482695
Total time spent by all map tasks (ms)=4597512
Total time spent by all reduce tasks (ms)=482695
Total vcore-milliseconds taken by all map tasks=4597512
Total vcore-milliseconds taken by all reduce tasks=482695
Total megabyte-milliseconds taken by all map tasks=4707852288
Total megabyte-milliseconds taken by all reduce tasks=741419520
Map-Reduce Framework
Map input records=724222
Map output records=1009320852
Map output bytes=15225100073
Map output materialized bytes=148467163
Input split bytes=6600
Combine input records=1009320852
Combine output records=22691499
Reduce input groups=9905414
Reduce shuffle bytes=148467163
Reduce input records=22691499
Reduce output records=9905414
Spilled Records=66407780
Shuffled Maps =300
Failed Shuffles=0
Merged Map outputs=300
GC time elapsed (ms)=108905
CPU time spent (ms)=3161680
Physical memory (bytes) snapshot=44328914944
Virtual memory (bytes) snapshot=160603971584
Total committed heap usage (bytes)=44898975744
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=11027753671
File Output Format Counters
Bytes Written=56980967
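A few ratios make these counters easier to read. `Map output bytes` is the serialized map output before the combiner runs. `Map output materialized bytes` is what was written to local disk after combining and gzip compression. Their ratio therefore mixes both effects. Below is a Python sketch with the values from this run.

```python
# Values copied from the counters above.
map_output_bytes = 15225100073        # Map output bytes
map_output_materialized = 148467163   # Map output materialized bytes
combine_in = 1009320852               # Combine input records
combine_out = 22691499                # Combine output records
hdfs_read = 11027760271               # HDFS: Number of bytes read
hdfs_written = 56980967               # HDFS: Number of bytes written

shrink = map_output_bytes / map_output_materialized   # combiner + gzip on map output
combine_factor = combine_in / combine_out             # records merged by the combiner
output_fraction = hdfs_written / hdfs_read            # gzipped result vs. raw input

print(f"map output shrink factor: {shrink:.1f}x")
print(f"combiner record reduction: {combine_factor:.1f}x")
print(f"final output / input size: {output_fraction:.4f}")
```

This is why `FILE: Number of bytes written` stays in the hundreds of megabytes even though the raw map output is about 15 GB. The map output is combined and compressed before it is spilled to local disk.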


#Output of hadoop fs -ls on output directory

[infosnehasish@gw01 ~]$ hdfs dfs -ls -R /user/infosnehasish/dec22out01
-rw-r--r-- 3 infosnehasish hdfs 0 2016-12-22 00:12 /user/infosnehasish/dec22out01/_SUCCESS
-rw-r--r-- 3 infosnehasish hdfs 9492364 2016-12-22 00:12 /user/infosnehasish/dec22out01/part-r-00000.gz
-rw-r--r-- 3 infosnehasish hdfs 9501513 2016-12-22 00:12 /user/infosnehasish/dec22out01/part-r-00001.gz
-rw-r--r-- 3 infosnehasish hdfs 9498540 2016-12-22 00:12 /user/infosnehasish/dec22out01/part-r-00002.gz
-rw-r--r-- 3 infosnehasish hdfs 9498600 2016-12-22 00:12 /user/infosnehasish/dec22out01/part-r-00003.gz
-rw-r--r-- 3 infosnehasish hdfs 9500911 2016-12-22 00:12 /user/infosnehasish/dec22out01/part-r-00004.gz
-rw-r--r-- 3 infosnehasish hdfs 9489039 2016-12-22 00:12 /user/infosnehasish/dec22out01/part-r-00005.gz
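To confirm that the part files really contain gzip data, and are not just named `.gz`, check their first two bytes for the gzip magic number `1f 8b` (RFC 1952). Below is a Python sketch of the check, demonstrated on locally generated bytes. On the cluster, the first bytes of a part file could be taken from `hdfs dfs -cat <file> | head -c 2`.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def looks_gzipped(data: bytes) -> bool:
    """Return True if data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


compressed = gzip.compress(b"hello\t1\n")  # a wordcount-style "word<TAB>count" line
print(looks_gzipped(compressed))       # True
print(looks_gzipped(b"hello\t1\n"))    # False
```

`hdfs dfs -text <file>` decompresses files whose codec Hadoop recognizes from the extension. It should therefore print readable word/count lines from these part files.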


#4

hadoop jar command ::

hadoop jar /usr/hdp/2.5.0.0-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount --conf conf.xml /public/randomtextwriter /user/saswat232/wordconf

Output of hadoop fs -ls on output directory ::

[saswat232@gw01 ~]$ hadoop dfs -ls /user/saswat232/wordconf
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 7 items
-rw-r--r-- 3 saswat232 hdfs 0 2016-12-22 00:26 /user/saswat232/wordconf/_SUCCESS
-rw-r--r-- 3 saswat232 hdfs 9492364 2016-12-22 00:26 /user/saswat232/wordconf/part-r-00000.gz
-rw-r--r-- 3 saswat232 hdfs 9501513 2016-12-22 00:26 /user/saswat232/wordconf/part-r-00001.gz
-rw-r--r-- 3 saswat232 hdfs 9498540 2016-12-22 00:26 /user/saswat232/wordconf/part-r-00002.gz
-rw-r--r-- 3 saswat232 hdfs 9498600 2016-12-22 00:26 /user/saswat232/wordconf/part-r-00003.gz
-rw-r--r-- 3 saswat232 hdfs 9500911 2016-12-22 00:26 /user/saswat232/wordconf/part-r-00004.gz
-rw-r--r-- 3 saswat232 hdfs 9489039 2016-12-22 00:26 /user/saswat232/wordconf/part-r-00005.gz

Counters ::

Job Counters
Launched map tasks=50
Launched reduce tasks=6
Data-local map tasks=50
Total time spent by all maps in occupied slots (ms)=4125942
Total time spent by all reduces in occupied slots (ms)=721131
Total time spent by all map tasks (ms)=4125942
Total time spent by all reduce tasks (ms)=721131
Total vcore-milliseconds taken by all map tasks=4125942
Total vcore-milliseconds taken by all reduce tasks=721131
Total megabyte-milliseconds taken by all map tasks=4224964608
Total megabyte-milliseconds taken by all reduce tasks=1107657216

conf.xml ::

<configuration>

  <property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
  </property>

  <property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>

  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>

  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>

  <property>
    <name>mapreduce.job.reduces</name>
    <value>6</value>
  </property>

  <property>
    <name>mapreduce.input.fileinputformat.split.minsize</name>
    <value>268435456</value>
  </property>

</configuration>
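Hadoop expects a `--conf` file to be a single XML document whose top-level element is `<configuration>`. A bare list of `<property>` elements is not well-formed XML. Below is a Python sketch that checks a config file before the job is submitted. The function name `conf_error` is hypothetical.

```python
import xml.etree.ElementTree as ET


def conf_error(xml_text: str):
    """Return None if xml_text looks like a usable Hadoop conf file, else a short reason."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:  # e.g. several top-level <property> elements
        return f"not well-formed XML: {exc}"
    if root.tag != "configuration":
        return f"top-level element is <{root.tag}>, expected <configuration>"
    for prop in root.findall("property"):
        if prop.findtext("name") is None or prop.findtext("value") is None:
            return "a <property> is missing <name> or <value>"
    return None


good = "<configuration><property><name>a</name><value>1</value></property></configuration>"
print(conf_error(good))  # None
```

To check a real file, pass its contents, for example `conf_error(open("conf.xml").read())`.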

#5

#hadoop jar command

hadoop jar hadoop-mapreduce-examples.jar wordcount --conf config.xml /public/randomtextwriter /user/farhanmisarwala/output/mr-output

###config.xml

 <configuration>

    <property>
      <name>mapreduce.output.fileoutputformat.compress</name>
      <value>true</value>
    </property>

    <property>
      <name>mapreduce.output.fileoutputformat.compress.codec</name>
      <value>org.apache.hadoop.io.compress.GzipCodec</value>
    </property>

    <property>
      <name>mapreduce.map.output.compress</name>
      <value>true</value>
    </property>

    <property>
      <name>mapreduce.map.output.compress.codec</name>
      <value>org.apache.hadoop.io.compress.GzipCodec</value>
    </property>

    <property>
      <name>mapreduce.job.reduces</name>
      <value>6</value>
    </property>

    <property>
      <name>mapreduce.input.fileinputformat.split.minsize</name>
      <value>268435456</value>
    </property>

</configuration>

#Output of hadoop fs -ls on output directory

[farhanmisarwala@gw01 jars]$ hadoop fs -ls /user/farhanmisarwala/output/mr-output
Found 7 items
-rw-r--r--   3 farhanmisarwala hdfs          0 2016-12-22 00:34 /user/farhanmisarwala/output/mr-output/_SUCCESS
-rw-r--r--   3 farhanmisarwala hdfs    9492364 2016-12-22 00:34 /user/farhanmisarwala/output/mr-output/part-r-00000.gz
-rw-r--r--   3 farhanmisarwala hdfs    9501513 2016-12-22 00:34 /user/farhanmisarwala/output/mr-output/part-r-00001.gz
-rw-r--r--   3 farhanmisarwala hdfs    9498540 2016-12-22 00:34 /user/farhanmisarwala/output/mr-output/part-r-00002.gz
-rw-r--r--   3 farhanmisarwala hdfs    9498600 2016-12-22 00:34 /user/farhanmisarwala/output/mr-output/part-r-00003.gz
-rw-r--r--   3 farhanmisarwala hdfs    9500911 2016-12-22 00:34 /user/farhanmisarwala/output/mr-output/part-r-00004.gz
-rw-r--r--   3 farhanmisarwala hdfs    9489039 2016-12-22 00:34 /user/farhanmisarwala/output/mr-output/part-r-00005.gz

#Counters

16/12/22 00:23:07 INFO impl.TimelineClientImpl: Timeline service address: http://rm01.itversity.com:8188/ws/v1/timeline/
16/12/22 00:23:07 INFO client.RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
16/12/22 00:23:07 INFO client.AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
16/12/22 00:23:08 INFO input.FileInputFormat: Total input paths to process : 10
16/12/22 00:23:09 INFO mapreduce.JobSubmitter: number of splits:50
16/12/22 00:23:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480307771710_4336
16/12/22 00:23:09 INFO impl.YarnClientImpl: Submitted application application_1480307771710_4336
16/12/22 00:23:10 INFO mapreduce.Job: The url to track the job: http://rm01.itversity.com:8088/proxy/application_1480307771710_4336/
16/12/22 00:23:10 INFO mapreduce.Job: Running job: job_1480307771710_4336
16/12/22 00:28:25 INFO mapreduce.Job: Job job_1480307771710_4336 running in uber mode : false
16/12/22 00:28:25 INFO mapreduce.Job:  map 0% reduce 0%
16/12/22 00:30:53 INFO mapreduce.Job:  map 1% reduce 0%
16/12/22 00:30:56 INFO mapreduce.Job:  map 2% reduce 0%
16/12/22 00:30:59 INFO mapreduce.Job:  map 3% reduce 0%
16/12/22 00:31:02 INFO mapreduce.Job:  map 4% reduce 0%
16/12/22 00:31:04 INFO mapreduce.Job:  map 5% reduce 0%
16/12/22 00:31:05 INFO mapreduce.Job:  map 6% reduce 0%
16/12/22 00:31:08 INFO mapreduce.Job:  map 7% reduce 0%
16/12/22 00:31:10 INFO mapreduce.Job:  map 8% reduce 0%
16/12/22 00:31:13 INFO mapreduce.Job:  map 9% reduce 0%
16/12/22 00:31:16 INFO mapreduce.Job:  map 10% reduce 0%
16/12/22 00:31:17 INFO mapreduce.Job:  map 11% reduce 0%
16/12/22 00:31:19 INFO mapreduce.Job:  map 12% reduce 0%
16/12/22 00:31:20 INFO mapreduce.Job:  map 13% reduce 0%
16/12/22 00:31:22 INFO mapreduce.Job:  map 14% reduce 0%
16/12/22 00:31:23 INFO mapreduce.Job:  map 15% reduce 0%
16/12/22 00:31:26 INFO mapreduce.Job:  map 16% reduce 0%
16/12/22 00:31:28 INFO mapreduce.Job:  map 17% reduce 0%
16/12/22 00:31:29 INFO mapreduce.Job:  map 18% reduce 0%
16/12/22 00:31:32 INFO mapreduce.Job:  map 19% reduce 0%
16/12/22 00:31:34 INFO mapreduce.Job:  map 20% reduce 0%
16/12/22 00:31:37 INFO mapreduce.Job:  map 21% reduce 0%
16/12/22 00:31:38 INFO mapreduce.Job:  map 22% reduce 0%
16/12/22 00:31:40 INFO mapreduce.Job:  map 23% reduce 0%
16/12/22 00:31:44 INFO mapreduce.Job:  map 24% reduce 0%
16/12/22 00:31:48 INFO mapreduce.Job:  map 25% reduce 0%
16/12/22 00:31:51 INFO mapreduce.Job:  map 26% reduce 0%
16/12/22 00:31:55 INFO mapreduce.Job:  map 27% reduce 0%
16/12/22 00:31:59 INFO mapreduce.Job:  map 28% reduce 0%
16/12/22 00:32:02 INFO mapreduce.Job:  map 29% reduce 0%
16/12/22 00:32:04 INFO mapreduce.Job:  map 30% reduce 0%
16/12/22 00:32:08 INFO mapreduce.Job:  map 31% reduce 0%
16/12/22 00:32:12 INFO mapreduce.Job:  map 32% reduce 0%
16/12/22 00:32:14 INFO mapreduce.Job:  map 33% reduce 0%
16/12/22 00:32:16 INFO mapreduce.Job:  map 34% reduce 0%
16/12/22 00:32:18 INFO mapreduce.Job:  map 35% reduce 0%
16/12/22 00:32:20 INFO mapreduce.Job:  map 36% reduce 0%
16/12/22 00:32:22 INFO mapreduce.Job:  map 37% reduce 0%
16/12/22 00:32:24 INFO mapreduce.Job:  map 38% reduce 0%
16/12/22 00:32:26 INFO mapreduce.Job:  map 39% reduce 0%
16/12/22 00:32:27 INFO mapreduce.Job:  map 40% reduce 0%
16/12/22 00:32:29 INFO mapreduce.Job:  map 41% reduce 0%
16/12/22 00:32:31 INFO mapreduce.Job:  map 43% reduce 0%
16/12/22 00:32:32 INFO mapreduce.Job:  map 45% reduce 0%
16/12/22 00:32:34 INFO mapreduce.Job:  map 47% reduce 0%
16/12/22 00:32:35 INFO mapreduce.Job:  map 48% reduce 0%
16/12/22 00:32:36 INFO mapreduce.Job:  map 49% reduce 0%
16/12/22 00:32:37 INFO mapreduce.Job:  map 50% reduce 0%
16/12/22 00:32:40 INFO mapreduce.Job:  map 51% reduce 0%
16/12/22 00:32:42 INFO mapreduce.Job:  map 52% reduce 0%
16/12/22 00:32:45 INFO mapreduce.Job:  map 52% reduce 1%
16/12/22 00:32:46 INFO mapreduce.Job:  map 53% reduce 1%
16/12/22 00:32:47 INFO mapreduce.Job:  map 55% reduce 1%
16/12/22 00:32:48 INFO mapreduce.Job:  map 57% reduce 2%
16/12/22 00:32:50 INFO mapreduce.Job:  map 58% reduce 3%
16/12/22 00:32:51 INFO mapreduce.Job:  map 60% reduce 3%
16/12/22 00:32:52 INFO mapreduce.Job:  map 60% reduce 5%
16/12/22 00:32:54 INFO mapreduce.Job:  map 61% reduce 6%
16/12/22 00:32:56 INFO mapreduce.Job:  map 62% reduce 6%
16/12/22 00:32:57 INFO mapreduce.Job:  map 63% reduce 8%
16/12/22 00:32:59 INFO mapreduce.Job:  map 63% reduce 10%
16/12/22 00:33:00 INFO mapreduce.Job:  map 63% reduce 11%
16/12/22 00:33:01 INFO mapreduce.Job:  map 64% reduce 11%
16/12/22 00:33:02 INFO mapreduce.Job:  map 65% reduce 13%
16/12/22 00:33:03 INFO mapreduce.Job:  map 67% reduce 13%
16/12/22 00:33:04 INFO mapreduce.Job:  map 69% reduce 13%
16/12/22 00:33:05 INFO mapreduce.Job:  map 70% reduce 13%
16/12/22 00:33:06 INFO mapreduce.Job:  map 71% reduce 15%
16/12/22 00:33:08 INFO mapreduce.Job:  map 73% reduce 16%
16/12/22 00:33:09 INFO mapreduce.Job:  map 74% reduce 17%
16/12/22 00:33:10 INFO mapreduce.Job:  map 75% reduce 17%
16/12/22 00:33:14 INFO mapreduce.Job:  map 76% reduce 17%
16/12/22 00:33:17 INFO mapreduce.Job:  map 77% reduce 18%
16/12/22 00:33:18 INFO mapreduce.Job:  map 79% reduce 18%
16/12/22 00:33:19 INFO mapreduce.Job:  map 81% reduce 18%
16/12/22 00:33:20 INFO mapreduce.Job:  map 83% reduce 18%
16/12/22 00:33:22 INFO mapreduce.Job:  map 84% reduce 19%
16/12/22 00:33:24 INFO mapreduce.Job:  map 85% reduce 19%
16/12/22 00:33:25 INFO mapreduce.Job:  map 86% reduce 19%
16/12/22 00:33:26 INFO mapreduce.Job:  map 87% reduce 20%
16/12/22 00:33:28 INFO mapreduce.Job:  map 88% reduce 21%
16/12/22 00:33:29 INFO mapreduce.Job:  map 89% reduce 21%
16/12/22 00:33:30 INFO mapreduce.Job:  map 89% reduce 22%
16/12/22 00:33:31 INFO mapreduce.Job:  map 89% reduce 23%
16/12/22 00:33:32 INFO mapreduce.Job:  map 90% reduce 23%
16/12/22 00:33:33 INFO mapreduce.Job:  map 91% reduce 24%
16/12/22 00:33:36 INFO mapreduce.Job:  map 91% reduce 25%
16/12/22 00:33:40 INFO mapreduce.Job:  map 92% reduce 25%
16/12/22 00:33:48 INFO mapreduce.Job:  map 93% reduce 26%
16/12/22 00:33:51 INFO mapreduce.Job:  map 93% reduce 27%
16/12/22 00:33:52 INFO mapreduce.Job:  map 94% reduce 27%
16/12/22 00:33:54 INFO mapreduce.Job:  map 95% reduce 27%
16/12/22 00:33:56 INFO mapreduce.Job:  map 95% reduce 28%
16/12/22 00:33:59 INFO mapreduce.Job:  map 95% reduce 29%
16/12/22 00:34:00 INFO mapreduce.Job:  map 96% reduce 29%
16/12/22 00:34:03 INFO mapreduce.Job:  map 97% reduce 29%
16/12/22 00:34:05 INFO mapreduce.Job:  map 97% reduce 30%
16/12/22 00:34:06 INFO mapreduce.Job:  map 98% reduce 30%
16/12/22 00:34:07 INFO mapreduce.Job:  map 99% reduce 30%
16/12/22 00:34:08 INFO mapreduce.Job:  map 99% reduce 31%
16/12/22 00:34:09 INFO mapreduce.Job:  map 99% reduce 32%
16/12/22 00:34:21 INFO mapreduce.Job:  map 99% reduce 33%
16/12/22 00:34:31 INFO mapreduce.Job:  map 100% reduce 33%
16/12/22 00:34:35 INFO mapreduce.Job:  map 100% reduce 35%
16/12/22 00:34:36 INFO mapreduce.Job:  map 100% reduce 39%
16/12/22 00:34:37 INFO mapreduce.Job:  map 100% reduce 40%
16/12/22 00:34:38 INFO mapreduce.Job:  map 100% reduce 43%
16/12/22 00:34:39 INFO mapreduce.Job:  map 100% reduce 47%
16/12/22 00:34:40 INFO mapreduce.Job:  map 100% reduce 49%
16/12/22 00:34:41 INFO mapreduce.Job:  map 100% reduce 52%
16/12/22 00:34:42 INFO mapreduce.Job:  map 100% reduce 58%
16/12/22 00:34:43 INFO mapreduce.Job:  map 100% reduce 60%
16/12/22 00:34:44 INFO mapreduce.Job:  map 100% reduce 64%
16/12/22 00:34:45 INFO mapreduce.Job:  map 100% reduce 68%
16/12/22 00:34:46 INFO mapreduce.Job:  map 100% reduce 69%
16/12/22 00:34:47 INFO mapreduce.Job:  map 100% reduce 72%
16/12/22 00:34:48 INFO mapreduce.Job:  map 100% reduce 78%
16/12/22 00:34:49 INFO mapreduce.Job:  map 100% reduce 79%
16/12/22 00:34:50 INFO mapreduce.Job:  map 100% reduce 88%
16/12/22 00:34:51 INFO mapreduce.Job:  map 100% reduce 90%
16/12/22 00:34:52 INFO mapreduce.Job:  map 100% reduce 92%
16/12/22 00:34:53 INFO mapreduce.Job:  map 100% reduce 94%
16/12/22 00:34:54 INFO mapreduce.Job:  map 100% reduce 96%
16/12/22 00:34:55 INFO mapreduce.Job:  map 100% reduce 99%
16/12/22 00:34:56 INFO mapreduce.Job:  map 100% reduce 100%
16/12/22 00:34:58 INFO mapreduce.Job: Job job_1480307771710_4336 completed successfully
16/12/22 00:34:58 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=263094019
                FILE: Number of bytes written=388073782
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=11027891343
                HDFS: Number of bytes written=56980967
                HDFS: Number of read operations=168
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=12
        Job Counters
                Launched map tasks=50
                Launched reduce tasks=6
                Data-local map tasks=50
                Total time spent by all maps in occupied slots (ms)=4312799
                Total time spent by all reduces in occupied slots (ms)=791455
                Total time spent by all map tasks (ms)=4312799
                Total time spent by all reduce tasks (ms)=791455
                Total vcore-milliseconds taken by all map tasks=4312799
                Total vcore-milliseconds taken by all reduce tasks=791455
                Total megabyte-milliseconds taken by all map tasks=4416306176
                Total megabyte-milliseconds taken by all reduce tasks=1215674880
        Map-Reduce Framework
                Map input records=724222
                Map output records=1009320852
                Map output bytes=15225100073
                Map output materialized bytes=147190776
                Input split bytes=6600
                Combine input records=1009320852
                Combine output records=22630930
                Reduce input groups=9905414
                Reduce shuffle bytes=147190776
                Reduce input records=22630930
                Reduce output records=9905414
                Spilled Records=67239423
                Shuffled Maps =300
                Failed Shuffles=0
                Merged Map outputs=300
                GC time elapsed (ms)=94721
                CPU time spent (ms)=3215940
                Physical memory (bytes) snapshot=44258541568
                Virtual memory (bytes) snapshot=160554237952
                Total committed heap usage (bytes)=44638404608
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=11027884743
        File Output Format Counters
                Bytes Written=56980967
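Several of these counters should agree with each other, which is a useful way to understand them. Below is a Python sketch of some consistency checks using values from this run. The relationships are assumptions that hold for this job. On other jobs they could differ, for example with failed or speculative task attempts, or with a combiner that also runs during the reduce-side merge.

```python
# Counter values copied from the job above.
COUNTERS = {
    "Launched map tasks": 50,
    "Launched reduce tasks": 6,
    "Shuffled Maps": 300,
    "Map output materialized bytes": 147190776,
    "Reduce shuffle bytes": 147190776,
    "Combine output records": 22630930,
    "Reduce input records": 22630930,
    "Reduce input groups": 9905414,
    "Reduce output records": 9905414,
}

CHECKS = {
    # Every reducer fetches one segment from every map task.
    "shuffled maps = maps x reducers":
        lambda c: c["Shuffled Maps"] == c["Launched map tasks"] * c["Launched reduce tasks"],
    # Reducers fetch exactly the (compressed) bytes the maps materialized.
    "shuffle bytes = materialized bytes":
        lambda c: c["Reduce shuffle bytes"] == c["Map output materialized bytes"],
    # The combiner ran only on the map side here, so reducers see its output records.
    "reduce input = combine output":
        lambda c: c["Reduce input records"] == c["Combine output records"],
    # WordCount writes one (word, total) record per distinct word.
    "reduce output = reduce groups":
        lambda c: c["Reduce output records"] == c["Reduce input groups"],
}


def failed_checks(counters: dict) -> list:
    """Return the names of the checks that do not hold for these counters."""
    return [name for name, check in CHECKS.items() if not check(counters)]


print(failed_checks(COUNTERS))  # []
```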

#6

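conf file:

<configuration>

  <property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
  </property>

  <property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>

  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>

  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>

  <property>
    <name>mapreduce.input.fileinputformat.split.minsize</name>
    <value>268435456</value>
  </property>

  <property>
    <name>mapreduce.job.reduces</name>
    <value>6</value>
  </property>

</configuration>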

hadoop jar command:
hadoop jar /usr/hdp/2.5.0.0-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount -conf compression_conf.xml /public/randomtextwriter wordcount_compression

Output of hadoop fs -ls on output directory:
hadoop fs -ls wordcount_compression
Found 7 items
-rw-r--r-- 3 sumanthsharma21 hdfs 0 2016-12-22 00:37 wordcount_compression/_SUCCESS
-rw-r--r-- 3 sumanthsharma21 hdfs 9492364 2016-12-22 00:37 wordcount_compression/part-r-00000.gz
-rw-r--r-- 3 sumanthsharma21 hdfs 9501513 2016-12-22 00:37 wordcount_compression/part-r-00001.gz
-rw-r--r-- 3 sumanthsharma21 hdfs 9498540 2016-12-22 00:37 wordcount_compression/part-r-00002.gz
-rw-r--r-- 3 sumanthsharma21 hdfs 9498600 2016-12-22 00:37 wordcount_compression/part-r-00003.gz
-rw-r--r-- 3 sumanthsharma21 hdfs 9500911 2016-12-22 00:37 wordcount_compression/part-r-00004.gz
-rw-r--r-- 3 sumanthsharma21 hdfs 9489039 2016-12-22 00:37 wordcount_compression/part-r-00005.gz

Counters:
File System Counters
FILE: Number of bytes read=263094019
FILE: Number of bytes written=388077422
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=11027891343
HDFS: Number of bytes written=56980967
HDFS: Number of read operations=168
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Launched map tasks=50
Launched reduce tasks=6
Data-local map tasks=50
Total time spent by all maps in occupied slots (ms)=4269234
Total time spent by all reduces in occupied slots (ms)=801423
Total time spent by all map tasks (ms)=4269234
Total time spent by all reduce tasks (ms)=801423
Total vcore-milliseconds taken by all map tasks=4269234
Total vcore-milliseconds taken by all reduce tasks=801423
Total megabyte-milliseconds taken by all map tasks=4371695616
Total megabyte-milliseconds taken by all reduce tasks=1230985728
Map-Reduce Framework
Map input records=724222
Map output records=1009320852
Map output bytes=15225100073
Map output materialized bytes=147190776
Input split bytes=6600
Combine input records=1009320852
Combine output records=22630930
Reduce input groups=9905414
Reduce shuffle bytes=147190776
Reduce input records=22630930
Reduce output records=9905414
Spilled Records=67239423
Shuffled Maps =300
Failed Shuffles=0
Merged Map outputs=300
GC time elapsed (ms)=87672
CPU time spent (ms)=3214330
Physical memory (bytes) snapshot=44237713408
Virtual memory (bytes) snapshot=160544153600
Total committed heap usage (bytes)=44774195200
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=11027884743
File Output Format Counters
Bytes Written=56980967


#7

config file:

<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>

<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>

<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapred.output.compression.type</name>
  <value>block</value>
</property>

<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>

<property>
  <name>mapreduce.input.fileinputformat.split.minsize</name>
  <value>268435456</value>
</property>

<property>
  <name>mapreduce.job.reduces</name>
  <value>2</value>
</property>

command:

hadoop jar traBG.jar com.tavant.bg.Wordcount --conf /home/aruncse11/mapred-site.xml /public/randomtextwriter /user/aruncse11/arun/wordcountoutput

output:
[aruncse11@gw01 ~]$ hdfs dfs -ls /user/aruncse11/arun/wordcountoutput
Found 3 items
-rw-r--r-- 3 aruncse11 hdfs 0 2016-12-22 00:48 /user/aruncse11/arun/wordcountoutput/_SUCCESS
-rw-r--r-- 3 aruncse11 hdfs 25734808 2016-12-22 00:47 /user/aruncse11/arun/wordcountoutput/part-r-00000.deflate
-rw-r--r-- 3 aruncse11 hdfs 25726269 2016-12-22 00:48 /user/aruncse11/arun/wordcountoutput/part-r-00001.deflate
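This answer uses the older MRv1 property names, which Hadoop 2 still accepts and maps to their newer equivalents. The run above used `DefaultCodec` and 2 reducers. That is why it produced two `.deflate` files instead of six `.gz` files. Below is a Python sketch that rewrites the old names to the current ones and looks up the file extension each codec produces. The mapping covers only the properties used in this thread. It should be checked against the deprecated-properties table for the Hadoop version in use.

```python
# MRv1 names used in the config above -> names used by Hadoop 2 (MRv2).
DEPRECATED = {
    "mapred.compress.map.output": "mapreduce.map.output.compress",
    "mapred.map.output.compression.codec": "mapreduce.map.output.compress.codec",
    "mapred.output.compress": "mapreduce.output.fileoutputformat.compress",
    "mapred.output.compression.type": "mapreduce.output.fileoutputformat.compress.type",
    "mapred.output.compression.codec": "mapreduce.output.fileoutputformat.compress.codec",
    "mapred.reduce.tasks": "mapreduce.job.reduces",
}

# Default file extension of some built-in codecs.
CODEC_EXTENSION = {
    "org.apache.hadoop.io.compress.DefaultCodec": ".deflate",
    "org.apache.hadoop.io.compress.GzipCodec": ".gz",
    "org.apache.hadoop.io.compress.BZip2Codec": ".bz2",
}


def modernize(props: dict) -> dict:
    """Rename deprecated keys; keys without a known replacement are kept as-is."""
    return {DEPRECATED.get(name, name): value for name, value in props.items()}


old = {
    "mapred.compress.map.output": "true",
    "mapred.output.compression.codec": "org.apache.hadoop.io.compress.DefaultCodec",
    "mapreduce.job.reduces": "2",
}
new = modernize(old)
print(new)
print(CODEC_EXTENSION[new["mapreduce.output.fileoutputformat.compress.codec"]])  # .deflate
```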


#8

Hadoop Jar Command:

hadoop jar /usr/hdp/2.5.0.0-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount --conf jobconfig.xml /public/randomtextwriter /user/mahesh007/jobconfOut

Output:

[mahesh007@gw01 ~]$ hadoop fs -ls /user/mahesh007/jobconfOut
Found 8 items
-rw-r--r-- 3 mahesh007 hdfs 0 2016-12-22 00:59 /user/mahesh007/jobconfOut/_SUCCESS
-rw-r--r-- 3 mahesh007 hdfs 9492364 2016-12-22 00:59 /user/mahesh007/jobconfOut/part-r-00000.gz
-rw-r--r-- 3 mahesh007 hdfs 9501513 2016-12-22 00:59 /user/mahesh007/jobconfOut/part-r-00001.gz
-rw-r--r-- 3 mahesh007 hdfs 9498540 2016-12-22 00:59 /user/mahesh007/jobconfOut/part-r-00002.gz
-rw-r--r-- 3 mahesh007 hdfs 9498600 2016-12-22 00:59 /user/mahesh007/jobconfOut/part-r-00003.gz
-rw-r--r-- 3 mahesh007 hdfs 9500911 2016-12-22 00:59 /user/mahesh007/jobconfOut/part-r-00004.gz
-rw-r--r-- 3 mahesh007 hdfs 9489039 2016-12-22 00:59 /user/mahesh007/jobconfOut/part-r-00005.gz

Counters:

16/12/22 00:59:45 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=263094019
FILE: Number of bytes written=388071406
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=11027891343
HDFS: Number of bytes written=56980967
HDFS: Number of read operations=168
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Launched map tasks=50
Launched reduce tasks=6
Data-local map tasks=50
Total time spent by all maps in occupied slots (ms)=4479345
Total time spent by all reduces in occupied slots (ms)=673487
Total time spent by all map tasks (ms)=4479345
Total time spent by all reduce tasks (ms)=673487
Total vcore-milliseconds taken by all map tasks=4479345
Total vcore-milliseconds taken by all reduce tasks=673487
Total megabyte-milliseconds taken by all map tasks=4586849280
Total megabyte-milliseconds taken by all reduce tasks=1034476032
Map-Reduce Framework
Map input records=724222
Map output records=1009320852
Map output bytes=15225100073
Map output materialized bytes=147190776
Input split bytes=6600
Combine input records=1009320852
Combine output records=22630930
Reduce input groups=9905414
Reduce shuffle bytes=147190776
Reduce input records=22630930
Reduce output records=9905414
Spilled Records=67239423
Shuffled Maps =300
Failed Shuffles=0
Merged Map outputs=300
GC time elapsed (ms)=98386
CPU time spent (ms)=3230770
Physical memory (bytes) snapshot=44287664128
Virtual memory (bytes) snapshot=160598126592
Total committed heap usage (bytes)=44850216960
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=11027884743
File Output Format Counters
Bytes Written=56980967


#9

hadoop jar command

hadoop jar \
/usr/hdp/2.5.0.0-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
wordcount \
--conf override.xml \
/public/randomtextwriter \
/user/jasonbourne/confoverride1

Output of hadoop fs -ls on output directory

Found 7 items
-rw-r--r--   3 jasonbourne hdfs          0 2016-12-22 00:57 /user/jasonbourne/confoverride1/_SUCCESS
-rw-r--r--   3 jasonbourne hdfs    9492364 2016-12-22 00:57 /user/jasonbourne/confoverride1/part-r-00000.gz
-rw-r--r--   3 jasonbourne hdfs    9501513 2016-12-22 00:57 /user/jasonbourne/confoverride1/part-r-00001.gz
-rw-r--r--   3 jasonbourne hdfs    9498540 2016-12-22 00:57 /user/jasonbourne/confoverride1/part-r-00002.gz
-rw-r--r--   3 jasonbourne hdfs    9498600 2016-12-22 00:57 /user/jasonbourne/confoverride1/part-r-00003.gz
-rw-r--r--   3 jasonbourne hdfs    9500911 2016-12-22 00:57 /user/jasonbourne/confoverride1/part-r-00004.gz
-rw-r--r--   3 jasonbourne hdfs    9489039 2016-12-22 00:57 /user/jasonbourne/confoverride1/part-r-00005.gz

Counters

File System Counters
                FILE: Number of bytes read=263094019
                FILE: Number of bytes written=388072246
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=11027891343
                HDFS: Number of bytes written=56980967
                HDFS: Number of read operations=168
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=12
        Job Counters
                Launched map tasks=50
                Launched reduce tasks=6
                Data-local map tasks=50
                Total time spent by all maps in occupied slots (ms)=4567998
                Total time spent by all reduces in occupied slots (ms)=713546
                Total time spent by all map tasks (ms)=4567998
                Total time spent by all reduce tasks (ms)=713546
                Total vcore-milliseconds taken by all map tasks=4567998
                Total vcore-milliseconds taken by all reduce tasks=713546
                Total megabyte-milliseconds taken by all map tasks=4677629952
                Total megabyte-milliseconds taken by all reduce tasks=1096006656
        Map-Reduce Framework
                Map input records=724222
                Map output records=1009320852
                Map output bytes=15225100073
                Map output materialized bytes=147190776
                Input split bytes=6600
                Combine input records=1009320852
                Combine output records=22630930
                Reduce input groups=9905414
                Reduce shuffle bytes=147190776
                Reduce input records=22630930
                Reduce output records=9905414
                Spilled Records=67239423
                Shuffled Maps =300
                Failed Shuffles=0
                Merged Map outputs=300
                GC time elapsed (ms)=101925
                CPU time spent (ms)=3211130
                Physical memory (bytes) snapshot=44274192384
                Virtual memory (bytes) snapshot=160586977280
                Total committed heap usage (bytes)=44883771392
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=11027884743
        File Output Format Counters
                Bytes Written=56980967

#10

hadoop jar /usr/hdp/2.5.0.0-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount --conf compressConfig.xml /public/randomtextwriter /user/nagellarajashyam/randomtextwriter_compress_output_2

o/p:

[nagellarajashyam@gw01 ~]$ hdfs dfs -ls hdfs://nn01.itversity.com:8020/user/nagellarajashyam/ran*2
Found 7 items
-rw-r--r-- 3 nagellarajashyam hdfs 0 2016-12-22 00:39 hdfs://nn01.itversity.com:8020/user/nagellarajashyam/randomtextwriter_output_2/_SUCCESS
-rw-r--r-- 3 nagellarajashyam hdfs 9492364 2016-12-22 00:39 hdfs://nn01.itversity.com:8020/user/nagellarajashyam/randomtextwriter_output_2/part-r-00000.gz
-rw-r--r-- 3 nagellarajashyam hdfs 9501513 2016-12-22 00:39 hdfs://nn01.itversity.com:8020/user/nagellarajashyam/randomtextwriter_output_2/part-r-00001.gz
-rw-r--r-- 3 nagellarajashyam hdfs 9498540 2016-12-22 00:39 hdfs://nn01.itversity.com:8020/user/nagellarajashyam/randomtextwriter_output_2/part-r-00002.gz
-rw-r--r-- 3 nagellarajashyam hdfs 9498600 2016-12-22 00:39 hdfs://nn01.itversity.com:8020/user/nagellarajashyam/randomtextwriter_output_2/part-r-00003.gz
-rw-r--r-- 3 nagellarajashyam hdfs 9500911 2016-12-22 00:39 hdfs://nn01.itversity.com:8020/user/nagellarajashyam/randomtextwriter_output_2/part-r-00004.gz
-rw-r--r-- 3 nagellarajashyam hdfs 9489039 2016-12-22 00:39 hdfs://nn01.itversity.com:8020/user/nagellarajashyam/randomtextwriter_output_2/part-r-00005.gz

config file:

 <configuration>
	<property>
		<name>mapreduce.job.reduces</name>
		<value>6</value>
	</property>
	<property>
		<name>mapreduce.input.fileinputformat.split.minsize</name>
		<value>256000000</value>
	</property>
	<property>
		<name>mapreduce.output.fileoutputformat.compress</name>
		<value>true</value>
	</property>
	<property>
		<name>mapreduce.output.fileoutputformat.compress.codec</name>
		<value>org.apache.hadoop.io.compress.GzipCodec</value>
	</property>
	<property>
		<name>mapreduce.map.output.compress</name>
		<value>true</value>
	</property>
	<property>
		<name>mapreduce.map.output.compress.codec</name>
		<value>org.apache.hadoop.io.compress.GzipCodec</value>
	</property>
</configuration>

#11

hadoop jar /usr/hdp/2.5.0.0-1245/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount --conf Configurstion.xml /public/randomtextwriter /user/parulshine92/randomtextOutPut


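<configuration>
  <property>
    <name>mapreduce.job.reduces</name>
    <value>6</value>
  </property>
  <property>
    <name>mapreduce.input.fileinputformat.split.minsize</name>
    <value>256000000</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
</configuration>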
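The problem statement asks for the gzip codec class to be looked up in core-site.xml. There, the `io.compression.codecs` property lists the available codecs as a comma-separated string. Below is a Python sketch that extracts that list and picks the gzip entry. The sample XML is hypothetical; on an HDP gateway the real file is typically under `/etc/hadoop/conf/`.

```python
import xml.etree.ElementTree as ET


def compression_codecs(core_site_xml: str) -> list:
    """Return the codec classes listed in io.compression.codecs (comma-separated)."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "io.compression.codecs":
            value = prop.findtext("value") or ""
            return [codec.strip() for codec in value.split(",") if codec.strip()]
    return []


def gzip_codec(codecs: list):
    """Pick the gzip codec class, or None if it is not configured."""
    return next((codec for codec in codecs if codec.endswith(".GzipCodec")), None)


# Hypothetical core-site.xml excerpt.
SAMPLE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,
      org.apache.hadoop.io.compress.DefaultCodec,
      org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>"""

print(gzip_codec(compression_codecs(SAMPLE)))
```

Setting `io.compression.codecs` in the job config, as done above, should not be needed to compress the output. The codec class is already named directly in the two `compress.codec` properties.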
--------------------------------------------------------------------------------

File System Counters
FILE: Number of bytes read=258301492
FILE: Number of bytes written=382966808
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=11027760271
HDFS: Number of bytes written=56980967
HDFS: Number of read operations=168
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Launched map tasks=50
Launched reduce tasks=6
Data-local map tasks=50
Total time spent by all maps in occupied slots (ms)=4597512
Total time spent by all reduces in occupied slots (ms)=482695
Total time spent by all map tasks (ms)=4597512
Total time spent by all reduce tasks (ms)=482695
Total vcore-milliseconds taken by all map tasks=4597512
Total vcore-milliseconds taken by all reduce tasks=482695
