How to reduce the task size warned about by the Spark scheduler



val orders = scala.io.Source.fromFile("/home/cloudera/Public/retail_db/orders/part-00000").getLines.toList

I have created an RDD with 6 partitions :slight_smile:

scala> val orderRdd = sc.parallelize(orders, 6)
orderRdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[1] at parallelize at <console>:29

scala> orderRdd.first
18/03/27 16:56:09 WARN scheduler.TaskSetManager: Stage 1 contains a task of very large size (503 KB). The maximum recommended task size is 100 KB.
res9: String = 1,2013-07-25 00:00:00.0,11599,CLOSED

It’s complaining that stage 1 contains a task of very large size. How can I get the task size down to the recommended 100 KB? :anguished: I had already partitioned the RDD into 6 partitions while creating orderRdd.



It’s just a ‘WARN’, not an ‘ERROR’, so you can safely ignore it. But if you still want to optimize, try the document below:
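For what it’s worth, this warning usually appears because `sc.parallelize` is called on a collection that was already loaded on the driver: the data itself gets serialized into each task closure. Two common fixes are to let Spark read the file on the executors with `sc.textFile` (the task then carries only the path and split offsets), or to raise the partition count so each task carries a smaller slice. The sketch below is plain Scala, not Spark: it approximates the per-task payload with Java serialization to show how it shrinks as the partition count grows (the 60,000 fake order lines and the 60-partition figure are illustrative, not from this thread):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Approximate what Spark ships inside one task for a parallelized collection:
// the Java-serialized size of that task's slice of the data.
def serializedSize(slice: List[String]): Int = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(slice)
  out.close()
  bytes.size
}

// Fake order lines standing in for part-00000 (real file contents elided).
val orders = List.tabulate(60000)(i => s"$i,2013-07-25 00:00:00.0,11599,CLOSED")

// Per-task payload with 6 partitions vs. 60 partitions.
val perTaskWith6  = serializedSize(orders.take(orders.size / 6))
val perTaskWith60 = serializedSize(orders.take(orders.size / 60))

println(s"~bytes per task, 6 partitions:  $perTaskWith6")
println(s"~bytes per task, 60 partitions: $perTaskWith60")
```

Raising the partition count in `sc.parallelize(orders, n)` only spreads the same driver-side data over more, smaller tasks; with `sc.textFile("/home/cloudera/Public/retail_db/orders/part-00000")` the data never rides inside the task at all, so the warning goes away regardless of file size.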


Thank you so much :blush: