Setting the number of tasks in Spark is not working

Hello Team,

I am running the code below to limit the number of tasks Spark uses for my program, but I still see 200 tasks being executed:

Code

sqlContext.sql("set spark.sql.suffle.partition = 10");

DataFrame[key: string, value: string]

TotalOrdRevPerDay = sqlContext.sql("select o.order_date, count(o.order_id), sum(oi.order_item_subtotal) from shitansu1_db.orders o, shitansu1_db.order_items oi where o.order_id = oi.order_item_order_id group by o.order_date order by o.order_date")

17/01/26 16:32:12 INFO ParseDriver: Parsing command: select o.order_date, count(o.order_id), sum(oi.order_item_subtotal) from shitansu1_db.orders o, shitansu1_db.order_items oi where o.order_id = oi.order_item_order_id group by o.order_date order by o.order_date
17/01/26 16:32:12 INFO ParseDriver: Parse Completed

for i in TotalOrdRevPerDay.take(10):
    print(i)

17/01/26 16:33:56 INFO Executor: Finished task 194.0 in stage 61.0 (TID 1287). 2167 bytes result sent to driver
17/01/26 16:33:56 INFO TaskSetManager: Finished task 194.0 in stage 61.0 (TID 1287) in 433 ms on localhost (198/200)
17/01/26 16:33:56 INFO Executor: Finished task 195.0 in stage 61.0 (TID 1288). 2167 bytes result sent to driver
17/01/26 16:33:56 INFO TaskSetManager: Finished task 195.0 in stage 61.0 (TID 1288) in 416 ms on localhost (199/200)
17/01/26 16:33:56 INFO Executor: Finished task 198.0 in stage 61.0 (TID 1291). 2167 bytes result sent to driver
17/01/26 16:33:56 INFO TaskSetManager: Finished task 198.0 in stage 61.0 (TID 1291) in 404 ms on localhost (200/200)
17/01/26 16:33:56 INFO TaskSchedulerImpl: Removed TaskSet 61.0, whose tasks have all completed, from pool
17/01/26 16:33:56 INFO DAGScheduler: ShuffleMapStage 61 (take at :1) finished in 5.924 s

17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Getting 168 non-empty blocks out of 200 blocks
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Getting 168 non-empty blocks out of 200 blocks
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Getting 168 non-empty blocks out of 200 blocks
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Getting 168 non-empty blocks out of 200 blocks
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Getting 168 non-empty blocks out of 200 blocks
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/01/26 16:33:56 INFO Executor: Finished task 4.0 in stage 65.0 (TID 1298). 2128 bytes result sent to driver
17/01/26 16:33:56 INFO TaskSetManager: Finished task 4.0 in stage 65.0 (TID 1298) in 9 ms on localhost (1/7)
17/01/26 16:33:56 INFO Executor: Finished task 0.0 in stage 65.0 (TID 1294). 2201 bytes result sent to driver
17/01/26 16:33:56 INFO TaskSetManager: Finished task 0.0 in stage 65.0 (TID 1294) in 12 ms on localhost (2/7)
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Getting 168 non-empty blocks out of 200 blocks
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/01/26 16:33:56 INFO Executor: Finished task 1.0 in stage 65.0 (TID 1295). 2201 bytes result sent to driver
17/01/26 16:33:56 INFO Executor: Finished task 3.0 in stage 65.0 (TID 1297). 2201 bytes result sent to driver
17/01/26 16:33:56 INFO Executor: Finished task 6.0 in stage 65.0 (TID 1300). 2201 bytes result sent to driver
17/01/26 16:33:56 INFO TaskSetManager: Finished task 1.0 in stage 65.0 (TID 1295) in 17 ms on localhost (3/7)
17/01/26 16:33:56 INFO Executor: Finished task 2.0 in stage 65.0 (TID 1296). 2201 bytes result sent to driver
17/01/26 16:33:56 INFO TaskSetManager: Finished task 3.0 in stage 65.0 (TID 1297) in 17 ms on localhost (4/7)
17/01/26 16:33:56 INFO TaskSetManager: Finished task 6.0 in stage 65.0 (TID 1300) in 17 ms on localhost (5/7)
17/01/26 16:33:56 INFO TaskSetManager: Finished task 2.0 in stage 65.0 (TID 1296) in 18 ms on localhost (6/7)
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Getting 168 non-empty blocks out of 200 blocks
17/01/26 16:33:56 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/01/26 16:33:56 INFO Executor: Finished task 5.0 in stage 65.0 (TID 1299). 2201 bytes result sent to driver
17/01/26 16:33:56 INFO TaskSetManager: Finished task 5.0 in stage 65.0 (TID 1299) in 24 ms on localhost (7/7)
17/01/26 16:33:56 INFO TaskSchedulerImpl: Removed TaskSet 65.0, whose tasks have all completed, from pool
17/01/26 16:33:56 INFO DAGScheduler: ResultStage 65 (take at :1) finished in 0.024 s
17/01/26 16:33:56 INFO DAGScheduler: Job 27 finished: take at :1, took 0.030155 s
Row(order_date=u'2013-07-25 00:00:00.0', _c1=339, _c2=68153.82999999997)
Row(order_date=u'2013-07-26 00:00:00.0', _c1=694, _c2=136520.1700000003)
Row(order_date=u'2013-07-27 00:00:00.0', _c1=503, _c2=101074.34000000014)
Row(order_date=u'2013-07-28 00:00:00.0', _c1=438, _c2=87123.08000000013)
Row(order_date=u'2013-07-29 00:00:00.0', _c1=666, _c2=137287.09000000032)
Row(order_date=u'2013-07-30 00:00:00.0', _c1=540, _c2=102745.62000000011)
Row(order_date=u'2013-07-31 00:00:00.0', _c1=641, _c2=131878.06000000006)
Row(order_date=u'2013-08-01 00:00:00.0', _c1=636, _c2=129001.62000000029)
Row(order_date=u'2013-08-02 00:00:00.0', _c1=558, _c2=109347.00000000013)
Row(order_date=u'2013-08-03 00:00:00.0', _c1=485, _c2=95266.89000000022)

Please suggest.

Thanks,
Shitansu.

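"shuffle" is misspelled in "set spark.sql.suffle.partition = 10", and the property name ends in "partitions" (plural). Spark accepts the unknown key without complaint, so the default of 200 shuffle partitions stays in effect.
It should be "set spark.sql.shuffle.partitions = 10".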
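
To guard against this kind of typo, one option is to set the property through the context's setConf method and read it back under the correctly spelled key, spark.sql.shuffle.partitions (plural). The sketch below is a minimal illustration only: set_shuffle_partitions and SHUFFLE_KEY are hypothetical names, not part of PySpark, and sql_context is assumed to be the sqlContext object from the pyspark shell in the question.

```python
# Correct property name: "shuffle" spelled out and "partitions" in the plural.
SHUFFLE_KEY = "spark.sql.shuffle.partitions"


def set_shuffle_partitions(sql_context, num_partitions):
    """Set the shuffle partition count, then read it back to confirm.

    `sql_context` is assumed to be the shell's SQLContext/HiveContext.
    This helper is hypothetical and not part of PySpark itself.
    """
    sql_context.setConf(SHUFFLE_KEY, str(num_partitions))
    # As in the question, a misspelled key is stored without any error,
    # so read the value back under the correct key to be sure it is set.
    # "200" is Spark's default for this property.
    applied = sql_context.getConf(SHUFFLE_KEY, "200")
    if applied != str(num_partitions):
        raise ValueError("expected %s, found %s" % (num_partitions, applied))
    return int(applied)
```

In the pyspark shell this would be called as set_shuffle_partitions(sqlContext, 10) before running the query. The stages that read shuffled data should then report 10 tasks instead of 200.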
