I have a cluster of 4 nodes: one master node and 3 worker nodes. Each node has 16 GB of RAM and 2 cores, with 50 GB of disk storage on the master and 25 GB on each worker node. I have stored 16 GB of data (CSV files) on HDFS.
I am trying to process this 16 GB of data using Pig in MapReduce mode.
When I submit a Pig job, the application master is assigned to one of the three worker nodes. The job starts executing and gets stuck at 94%. I then found that the user (application) cache, at 7.9 GB, is full on the other 2 worker nodes. No cache is created on the node where the application master is running.
Is there any configuration to automatically clear the cache for a running job?
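
For context, here is a rough sketch of the disk budget per worker node. It rests on two assumptions that are not confirmed from the cluster configuration: that HDFS uses the default replication factor of 3, and that only the 3 worker nodes act as DataNodes. The variable names are purely illustrative.

```python
# Rough per-worker disk budget.
# ASSUMPTIONS (not verified against the actual cluster config):
#   - dfs.replication is left at the Hadoop default of 3
#   - only the 3 worker nodes store HDFS blocks
DATA_GB = 16          # CSV data stored on HDFS
WORKERS = 3           # worker nodes
WORKER_DISK_GB = 25   # disk per worker node
REPLICATION = 3       # assumed replication factor

# Each block is stored REPLICATION times, spread over the workers.
hdfs_per_worker_gb = DATA_GB * REPLICATION / WORKERS

# Whatever is left must hold the OS, logs and the YARN local dirs
# (which is where the user/application cache lives).
free_per_worker_gb = WORKER_DISK_GB - hdfs_per_worker_gb

print(f"HDFS blocks per worker: {hdfs_per_worker_gb:.1f} GB")
print(f"Left for everything else: {free_per_worker_gb:.1f} GB")
```

If those assumptions hold, only about 9 GB per worker is left outside HDFS, which is in the same range as the 7.9 GB cache size I observed.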