val orders = scala.io.Source.fromFile("/home/cloudera/Public/retail_db/orders/part-00000").getLines.toList
I have created an RDD with 6 partitions :slight_smile:
scala> val orderRdd = sc.parallelize(orders, 6)
orderRdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD at parallelize at <console>:29
18/03/27 16:56:09 WARN scheduler.TaskSetManager: Stage 1 contains a task of very large size (503 KB). The maximum recommended task size is 100 KB.
res9: String = 1,2013-07-25 00:00:00.0,11599,CLOSED
It's complaining that Stage 1 contains a task of very large size. How can I keep the task size under the recommended 100 KB? I had already split the RDD into 6 partitions while creating orderRdd.
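In case it helps: the warning appears because `sc.parallelize` serializes each partition's data into the task itself, so the per-task size scales with the amount of driver-side data, not with any Spark setting. One way to get under the threshold is to raise the partition count so each task carries less data. A rough sketch (the `partitionsFor` helper and the ~3 MB total, derived from 6 × 503 KB in the warning, are my own illustration, not Spark API):

```scala
// Hypothetical helper: how many partitions keep each task's
// serialized data under Spark's recommended 100 KB?
def partitionsFor(totalBytes: Long, maxTaskBytes: Long = 100L * 1024): Int =
  math.ceil(totalBytes.toDouble / maxTaskBytes).toInt

// The warning reported ~503 KB per task across 6 partitions,
// so the whole dataset is roughly 6 * 503 KB (about 3 MB).
val totalBytes = 6L * 503 * 1024
val needed = partitionsFor(totalBytes)  // about 31 partitions

// With that count, each task's data stays under ~100 KB:
// val orderRdd = sc.parallelize(orders, needed)
```

Better still, read the file with `sc.textFile("/home/cloudera/Public/retail_db/orders/part-00000")` instead of loading it on the driver with `scala.io.Source` and parallelizing the result: with `textFile`, executors read their own splits and tasks carry only split metadata, so the warning goes away regardless of file size.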