Sorting on List, Map, and RDD has been so confusing


#1

Hello,

Can anyone recommend a good resource that explains sorting in Scala on List, Map, and RDD?

I have been working on sorting and so far I am very frustrated with how it works in current Scala. It is confusing: it behaves differently for List, Map, and RDD, and a method that works on one does not work on another…

Here is an example:
val orders = sc.textFile("/public/retail_db/orders")

68861,2014-06-13 00:00:00.0,3031,PENDING_PAYMENT
68862,2014-06-15 00:00:00.0,7326,PROCESSING
68863,2014-06-16 00:00:00.0,3361,CLOSED
68864,2014-06-18 00:00:00.0,9634,ON_HOLD

scala> val asc = ordersm.sortBy(_._1)
asc.take(10).foreach(println)
(1,11599)
(2,256)
(3,12111)
(4,8827)
(5,11318)
(6,7130)
(7,4530)
(8,2911)
(9,5657)
(10,5648)

descending:

scala> val desc = ordersm.sortBy(_._1, false)
desc.take(10).foreach(println)
(68883,5533)
(68882,10000)
(68881,2518)
(68880,1117)
(68879,778)
(68878,6753)
(68877,9692)
(68876,4124)
(68875,10637)
(68874,1601)

Now let’s use the status column as the key. The same method fails:

val orders1 = orders.map(x=>(x.split(",")(3),x.split(",")(0)))

val orders1_w_count = orders1.countByKey

scala> orders1_w_count.take(10).foreach(println)
(PAYMENT_REVIEW,729)
(CLOSED,7556)
(SUSPECTED_FRAUD,1558)
(PROCESSING,8275)
(COMPLETE,22899)
(PENDING,7610)
(PENDING_PAYMENT,15030)
(ON_HOLD,3798)
(CANCELED,1428)

Now sorting on the count, ascending:
scala> val asc1 = orders1_w_count.sortBy(_._1)
<console>:33: error: value sortBy is not a member of scala.collection.Map[String,Long]
       val asc1 = orders1_w_count.sortBy(_._1)

It seems that orders1.countByKey does not return an RDD like orders1 but a different type (scala.collection.Map[String,Long], according to the error), and sortBy is not available on that type.

What command should I use to sort the orders1.countByKey result?

Thank you very much.


#2

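countByKey returns a Map, which has no sortBy. Convert it to a Seq, sort that, and wrap the result in an immutable ListMap, which keeps the sorted order. Please use these commands: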

import scala.collection.immutable.ListMap
ListMap(orders1_w_count.toSeq.sortBy(_._1): _*)
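
Note that `_._1` is the key (the status) and `_._2` is the value (the count), so sorting on the count needs `_._2`. Here is a minimal sketch of the same approach on a plain Scala Map, so it runs without Spark. The object name `SortCounts` and the value names are only illustrative, and `counts` is a stand-in for orders1_w_count built from a few of the pairs printed in the question.

```scala
import scala.collection.immutable.ListMap

object SortCounts {
  // Stand-in for orders1_w_count (assumption: a small subset of the
  // (status, count) pairs shown in the question, not the full result).
  val counts: Map[String, Long] = Map(
    "PAYMENT_REVIEW" -> 729L,
    "CLOSED" -> 7556L,
    "ON_HOLD" -> 3798L,
    "CANCELED" -> 1428L
  )

  // Sort by status (the key), ascending.
  val byStatus: ListMap[String, Long] =
    ListMap(counts.toSeq.sortBy(_._1): _*)

  // Sort by count (the value), ascending.
  val byCountAsc: ListMap[String, Long] =
    ListMap(counts.toSeq.sortBy(_._2): _*)

  // Sort by count, descending, by negating the sort key.
  val byCountDesc: ListMap[String, Long] =
    ListMap(counts.toSeq.sortBy(-_._2): _*)

  def main(args: Array[String]): Unit = {
    println("By status:")
    byStatus.foreach(println)
    println("By count, ascending:")
    byCountAsc.foreach(println)
    println("By count, descending:")
    byCountDesc.foreach(println)
  }
}
```

Unlike the RDD sortBy, the collection sortBy has no ascending flag, which is why the descending case negates the key here. If you only need to print the sorted pairs, the sorted Seq is enough and the ListMap wrapper can be skipped.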
