Spark cluster configuration: daily data size and spark-submit parameters (--num-executors, --executor-memory, --driver-memory)

pyspark
spark-shell
apache-spark
scala

#1

Hi, can anyone help me with how much data is typically processed per day in Spark? If it is 10-12 TB on average, how should --num-executors, --executor-memory, and --driver-memory be set when submitting the Spark job? Also, for that data size, what would the cluster configuration look like in real-world big data Spark applications?
(Note: I have assumed 12 TB; please say how large it tends to be in the real world and what cluster configuration would suit it.)
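
For reference, this is the kind of spark-submit invocation I am asking about. The resource numbers below are only placeholder assumptions for a hypothetical ~12 TB/day batch job on YARN, not a recommendation, and daily_batch_job.py is a made-up script name:

```
# Sketch of a spark-submit call for an assumed ~12 TB/day batch job on YARN.
# Executor count, cores, and memory are placeholder values that would need to be
# tuned against the actual cluster capacity and job profile.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 50 \
  --executor-cores 4 \
  --executor-memory 16g \
  --driver-memory 8g \
  daily_batch_job.py
```

With these placeholder numbers the job would get roughly 50 x 16 GB = 800 GB of executor memory in total, so a 12 TB input would still be processed partition by partition rather than held in memory at once.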


#2

Hi team,
Any update on this?