Difference between Spark and MapReduce


We generally explain that Spark processes data in memory and is therefore faster than MapReduce (MR).

Here is a scenario:
We need to process 500 GB of data, whereas the total executor memory is only 100 GB (let us assume 10 nodes, each with 10 GB of memory). In that case Spark will also have to use disk I/O for the remaining 400 GB. How, then, is it better than MR?


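Even when it has to work from disk, Spark is generally faster. The main reason is that Spark builds a DAG (directed acyclic graph) of the transformations and does not evaluate anything until an action is called. When the action is called, Spark first optimizes the whole DAG, for example by pipelining consecutive transformations so that they run in a single pass over the data, and only then executes it. Hadoop MapReduce does not use a DAG: each job is a fixed map step followed by a reduce step, and a multi-step computation has to be chained as separate jobs that write their intermediate output to disk.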
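
To make the lazy-evaluation point concrete, here is a rough sketch in plain Python. It is only a toy model and not the Spark API: the class `LazyDataset` and its methods are hypothetical names made up for illustration. It shows two things: transformations are only recorded until an action (`collect`) is called, and the recorded steps are then applied record by record in one pass, with no intermediate dataset stored between steps.

```python
# Toy model of lazy evaluation. This is NOT the Spark API; LazyDataset
# and its methods are hypothetical names used only for illustration.
class LazyDataset:
    def __init__(self, source, steps=()):
        self._source = source        # any iterable of records
        self._steps = tuple(steps)   # recorded transformations (the "plan")

    def map(self, fn):
        # Transformation: only record the step, run nothing yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: only record the step, run nothing yet.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: run every recorded step on each record in a single pass,
        # so no intermediate result is materialized between steps.
        out = []
        for record in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    record = fn(record)
                elif not fn(record):
                    keep = False
                    break
            if keep:
                out.append(record)
        return out


calls = []  # records every input that double() actually processes

def double(x):
    calls.append(x)
    return x * 2

plan = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
assert calls == []        # nothing has been evaluated so far

result = plan.collect()   # the action triggers the whole plan
print(result)             # [6, 8]
```

In a MapReduce-style chain, the output of the `map` step would typically be written out in full before the next job reads it back. Here the plan is known up front, so both steps are applied to each record as it streams through.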