Getting Started - YARN Quick Preview

As part of this topic, we will take a quick look at YARN.

  • In certification exams, Spark typically runs in YARN mode
  • We should be able to check the memory configuration to understand the cluster capacity
    • /etc/hadoop/conf/yarn-site.xml
    • /etc/spark/conf/spark-env.sh
  • Spark default settings
    • Number of executors – 2 (spark.executor.instances)
    • Memory per executor – 1 GB (spark.executor.memory)
  • Quite often we underutilize cluster resources. If we understand the memory settings thoroughly and map them to the size of the data we are trying to process, we can speed up job execution.
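To check the memory limits YARN enforces, the relevant properties in yarn-site.xml can also be read programmatically. The following is a minimal Python sketch. It assumes the standard Hadoop `<configuration><property><name/><value/></property>` layout and that every NodeManager shares the same setting. The function names are illustrative and not part of any Hadoop tool. Properties missing from the file fall back to Hadoop's built-in defaults, which this sketch does not fill in.

```python
import xml.etree.ElementTree as ET

# Properties that bound how much memory YARN can hand out to containers.
MEMORY_KEYS = (
    "yarn.nodemanager.resource.memory-mb",   # memory each NodeManager offers to containers
    "yarn.scheduler.minimum-allocation-mb",  # smallest container YARN will grant
    "yarn.scheduler.maximum-allocation-mb",  # largest container YARN will grant
)


def read_yarn_memory_settings(root):
    """Return {property_name: value_in_mb} for the memory keys found under a
    parsed yarn-site.xml root element (hypothetical helper)."""
    settings = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name in MEMORY_KEYS and value:
            settings[name] = int(value)
    return settings


def cluster_memory_mb(settings, node_count):
    """Total container memory across the cluster, assuming identical NodeManagers."""
    return settings["yarn.nodemanager.resource.memory-mb"] * node_count
```

On a gateway node, this could be used as `read_yarn_memory_settings(ET.parse("/etc/hadoop/conf/yarn-site.xml").getroot())`. The node count has to come from elsewhere, for example the ResourceManager web UI.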
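The arithmetic behind "mapping memory settings to data size" can be sketched in a few lines of Python. Several assumptions apply. Each executor container is assumed to request `spark.executor.memory` plus an overhead of max(384 MB, 10% of executor memory), the default in recent Spark versions. YARN is assumed to round each request up to a multiple of `yarn.scheduler.minimum-allocation-mb`, as the CapacityScheduler does by default. Each ~128 MB chunk of input is assumed to become one task. The sketch also ignores vcore limits and the ApplicationMaster container, and the helper names are hypothetical.

```python
import math

OVERHEAD_FACTOR = 0.10  # assumed default executor memory overhead factor
OVERHEAD_MIN_MB = 384   # assumed minimum executor memory overhead in MB


def container_mb(executor_memory_mb, min_allocation_mb):
    """Memory YARN reserves for one executor: heap + overhead, rounded up to a
    multiple of the scheduler's minimum allocation."""
    overhead = max(OVERHEAD_MIN_MB, int(executor_memory_mb * OVERHEAD_FACTOR))
    requested = executor_memory_mb + overhead
    return math.ceil(requested / min_allocation_mb) * min_allocation_mb


def max_executors(node_memory_mb, node_count, executor_memory_mb, min_allocation_mb):
    """Upper bound on executors the cluster can host, by memory alone."""
    per_node = node_memory_mb // container_mb(executor_memory_mb, min_allocation_mb)
    return per_node * node_count


def task_waves(data_size_mb, executors, cores_per_executor, split_mb=128):
    """Rounds of tasks needed if each split_mb chunk of input becomes one task."""
    partitions = math.ceil(data_size_mb / split_mb)
    slots = executors * cores_per_executor
    return math.ceil(partitions / slots)


# Hypothetical cluster: 3 nodes, 8 GB per NodeManager, 1 GB minimum allocation.
print(container_mb(1024, 1024))              # default 1 GB executor occupies a 2048 MB container
print(max_executors(8192, 3, 1024, 1024))    # 12 such executors fit by memory
print(task_waves(10240, 2, 1))               # 10 GB input with Spark's default 2 executors: 40 waves
print(task_waves(10240, 12, 1))              # same input using all 12 executors: 7 waves
```

With these made-up numbers, the default of 2 executors uses only a small fraction of the cluster's memory. Raising the executor count toward what YARN can host cuts the number of task waves, which is the kind of underutilization the notes above refer to.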