Apache Spark - How to determine executors for a given Spark Job?


The following is a question from one of my Self-Paced Data Engineering Bootcamp students.

How does a developer decide to pass control arguments that override the executor memory and cores in a Spark job? Is there a decision-making hierarchy in engineering teams that the developer would have to go through?

As part of this live session/pre-recorded video, I will answer the above question. Here are the details that need to be understood:

  • Cluster Capacity - YARN (or Mesos)
  • Static Allocation vs. Dynamic Allocation
  • Determining and Using Capacity Based on the Requirement
  • Setting Properties at Run Time
  • Setting Properties Programmatically
  • Overview of --num-executors, --executor-cores, --executor-memory
  • Decision Making Hierarchy
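
The capacity-based sizing above can be sketched as simple arithmetic. The cluster shape (10 nodes with 16 cores and 64 GB each), the one core/GB reserved per node for the OS and Hadoop daemons, the 5-cores-per-executor rule of thumb, and the ~7% YARN memory overhead are all assumptions for illustration, not values from the student's question:

```python
# Sketch: deriving executor settings from cluster capacity.
# All cluster numbers below are assumptions for the example.
nodes = 10
cores_per_node = 16
memory_per_node_gb = 64

# Reserve 1 core and 1 GB per node for the OS and Hadoop/YARN daemons.
usable_cores = cores_per_node - 1          # 15
usable_memory_gb = memory_per_node_gb - 1  # 63

# Common rule of thumb: about 5 cores per executor for good HDFS throughput.
executor_cores = 5
executors_per_node = usable_cores // executor_cores   # 3

# Leave one executor slot for the YARN ApplicationMaster.
num_executors = executors_per_node * nodes - 1        # 29

# Split node memory across executors, then carve out the ~7% off-heap
# overhead that YARN allocates on top of --executor-memory.
raw_memory_gb = usable_memory_gb // executors_per_node  # 21
executor_memory_gb = int(raw_memory_gb / 1.07)          # 19

print(num_executors, executor_cores, executor_memory_gb)
```

With these assumptions the job would be submitted with roughly `--num-executors 29 --executor-cores 5 --executor-memory 19g`; the same arithmetic applies to any cluster once you plug in its real capacity.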

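To connect the run-time flags with the programmatic route, here is a hedged sketch showing the same settings expressed both ways. The property names are Spark's actual configuration keys; the application path and the numeric values are placeholders for illustration:

```python
# Sketch: the same executor settings expressed two ways.

# 1) At run time, as spark-submit control arguments
#    (each flag maps to a spark.* configuration property):
spark_submit = [
    "spark-submit",
    "--num-executors", "29",      # spark.executor.instances
    "--executor-cores", "5",      # spark.executor.cores
    "--executor-memory", "19g",   # spark.executor.memory
    "app.py",                     # placeholder application
]

# 2) Programmatically, as configuration properties (with pyspark these
#    would be passed via SparkSession.builder.config(key, value)):
conf = {
    "spark.executor.instances": "29",
    "spark.executor.cores": "5",
    "spark.executor.memory": "19g",
    # With dynamic allocation, Spark scales the executor count itself
    # instead of honoring a fixed number of instances:
    # "spark.dynamicAllocation.enabled": "true",
    # "spark.shuffle.service.enabled": "true",
}

print(" ".join(spark_submit))
```

Run-time flags win over hard-coded defaults in the application, which is why teams usually prefer them: the same job can be resubmitted with different capacity without a code change.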
Demos are given using our state-of-the-art labs. If you are interested, you can sign up at https://labs.itversity.com