Plain vanilla Hadoop - Hadoop commands/jobs take a long time to run

@itversity

Hi, I’ve successfully set up a plain vanilla Hadoop cluster as per your video series. But while practicing, I observed that MapReduce jobs, and even regular Hadoop commands like ‘ls’, take a long time to execute.

Please suggest some performance improvements.
I’m using a Lenovo laptop with 8GB of DDR3 RAM and a 2.6 GHz Core i7 processor.

@iamteja

Buddy, since you mentioned a Hadoop cluster and a laptop configuration, are you using VirtualBox for this?

thanks
Saurabh

@saurabhbhanwala
No buddy, I’m using VMware Workstation.

@iamteja Okay, please check the configuration of the virtual machine you are using to run commands and execute MapReduce. Slowness on virtual machines is usually caused by limited resources.

Try allocating some more RAM to it and see if performance improves.

Also monitor your laptop’s stats to see whether some process is using enough RAM, CPU, or disk to cause spikes.

Hope this will help!

thanks
Saurabh

@saurabhbhanwala
Thanks for the reply. I currently have 1GB of memory assigned to the virtual machine on which I am executing MapReduce. Below is a screenshot of my Windows RAM usage. Do you think I should increase it to 2GB for my virtual machine? I worry that would slow my laptop down badly, as I won’t be left with much RAM for other usage.
Let me know your thoughts.


Thank you!

@iamteja Check the cache and try refreshing it…

@iamteja Any luck with performance or still facing issues?

@saurabhbhanwala
I’ve increased VM memory size to 2GB and I can hardly notice any difference :disappointed:

@iamteja Is the setup you are using standalone or pseudo-distributed?

If it is pseudo-distributed, you can check what value is set for the mapred.child.java.opts property.

Also run the hadoop fs -ls / command or similar commands and see how much time they take.
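One way to put a number on "how much time" is to run the command a few times and record the wall-clock duration of each run. The following is only a minimal sketch: time_command is a hypothetical helper name, not part of Hadoop, and it assumes the command you pass is available on your PATH.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        timings.append(time.perf_counter() - start)
    return timings
```

For example, time_command(["hadoop", "fs", "-ls", "/"]) would return a list of three durations. Comparing the numbers with and without a MapReduce job running shows how much the job itself contributes to the slowdown.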

@saurabhbhanwala
Thanks for the reply.

It’s actually a pseudo-distributed setup, and mapred.child.java.opts=-Xmx200m.
I guess the heap size it’s using is 200MB, but I’m not sure if I’m correct. Is that size sufficient, or do I need to increase it?

Also, can you give me a brief explanation of this property: how it’s used and in which config file I can increase or decrease it?

It takes some 5-8 secs when I run the hadoop fs -ls command, and it takes longer when I issue it while a MapReduce job is running in parallel.

@iamteja Ah, so it becomes slow when you are running a MapReduce job.

That is possible, since MapReduce launches additional JVMs at that time, and they consume CPU and RAM.

I believe you can set the mentioned property in mapred-site.xml; mapred-default.xml lists the default values.

https://stackoverflow.com/questions/24070557/what-is-the-relation-between-mapreduce-map-memory-mb-and-mapred-map-child-jav may be helpful!
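To make the property a bit more concrete: the -Xmx part of mapred.child.java.opts caps the maximum heap of each task JVM, so -Xmx200m means 200MB per task. The sketch below embeds a sample of what an override in mapred-site.xml might look like and reads it back. The 512m value is purely illustrative, not a recommendation, and get_property and max_heap_mb are hypothetical helper names, not Hadoop APIs.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapred-site.xml content; the 512m value is only an illustration.
SAMPLE_MAPRED_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
</configuration>
"""


def get_property(xml_text, name):
    """Return the value of a Hadoop config property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def max_heap_mb(java_opts):
    """Extract the -Xmx limit from a JVM options string, in megabytes."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    size, unit = int(match.group(1)), match.group(2).lower()
    # A bare number means bytes; k, m, and g are the usual JVM size suffixes.
    factor = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return size * factor


opts = get_property(SAMPLE_MAPRED_SITE, "mapred.child.java.opts")
print(opts, "->", max_heap_mb(opts), "MB per task JVM")
```

On a VM with only 1-2GB of RAM, each running task takes its own heap on top of the Hadoop daemons, so raising this value can make memory pressure worse rather than better.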