How to sort an RDD on its date field?


#1

Hello,
While working through the 96 scenarios, I have an RDD like the one below:

scala> final_result.take(20).foreach(println)
2828,2013-08-10,129.99
43399,2014-04-20,100.0
43399,2014-04-20,129.99
43399,2014-04-20,49.98
8989,2013-09-19,119.97
8989,2013-09-19,299.97
8989,2013-09-19,299.97
8989,2013-09-19,111.96
20554,2013-11-29,399.98
20554,2013-11-29,119.97
20554,2013-11-29,399.98
20554,2013-11-29,399.96
36070,2014-03-03,100.0
36070,2014-03-03,199.99
36070,2014-03-03,149.94
36070,2014-03-03,299.95
36070,2014-03-03,119.98
17422,2013-11-10,129.99
17422,2013-11-10,239.96
17422,2013-11-10,129.99

scala> final_result.sortBy(_(1)).take(20).foreach(println)

The result is incorrect.
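A likely cause, assuming `final_result` is an `RDD[String]` (the `Char` in the error message further down points that way): for a `String`, `_(1)` means `s(1)`, the character at index 1, not the second comma-separated field. So `sortBy(_(1))` orders the lines by their second character. The sketch below reproduces this on a plain Scala `Seq` built from three of the lines above; `CharKeyDemo` and `secondChar` are illustrative names, not part of Spark.

```scala
object CharKeyDemo {
  // Three lines copied from the REPL output above
  val lines: Seq[String] = Seq(
    "2828,2013-08-10,129.99",
    "43399,2014-04-20,100.0",
    "8989,2013-09-19,119.97"
  )

  // What `_(1)` evaluates to when the element is a String: its second character
  def secondChar(line: String): Char = line(1)

  def main(args: Array[String]): Unit = {
    // Keys are '8', '3' and '9', so the 2014 line (43399) comes out first
    lines.sortBy(l => secondChar(l)).foreach(println)
  }
}
```

In date order the lines would be 2828, 8989, 43399; sorting on the character key puts 43399 first instead.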

I also tried:
scala> final_result.sortBy(_(1).toDate()).take(20).foreach(println)
<console>:40: error: value toDate is not a member of Char
       final_result.sortBy(_(1).toDate()).take(20).foreach(println)
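The compiler is reporting that `toDate()` is not a method on `Char` (it is not a method on `String` in the Scala standard library either). Assuming Java 8 or later, one way to turn the second field into a real date is `java.time.LocalDate.parse`, which accepts `yyyy-MM-dd` strings by default. The sketch sorts by `toEpochDay`, a `Long`, which avoids relying on an implicit `Ordering[LocalDate]` that older Scala versions may not provide; `ParsedDateSort` and `dateOf` are hypothetical names.

```scala
import java.time.LocalDate

object ParsedDateSort {
  // Parse the second comma-separated field, e.g. "2013-08-10", as an ISO date
  def dateOf(line: String): LocalDate = LocalDate.parse(line.split(",")(1))

  // Sort by days since 1970-01-01, which follows calendar order
  def sortByDate(lines: Seq[String]): Seq[String] =
    lines.sortBy(l => dateOf(l).toEpochDay)

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "43399,2014-04-20,100.0",
      "2828,2013-08-10,129.99",
      "17422,2013-11-10,129.99"
    )
    sortByDate(lines).foreach(println)
  }
}
```

On the RDD the same key function should carry over, e.g. `final_result.sortBy(l => java.time.LocalDate.parse(l.split(",")(1)).toEpochDay)`; `RDD.sortBy` needs an implicit `Ordering` and `ClassTag` for the key type, both of which exist for `Long`.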

Can someone tell me how to sort by the second field, which is a date?

Thank you so much.
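Because the dates are in zero-padded ISO `yyyy-MM-dd` form, comparing them as strings already gives chronological order. A minimal approach, assuming every line has at least two comma-separated fields, is therefore to sort on the split field itself. The sketch shows the key on a plain `Seq`; `IsoStringSort` and `dateField` are illustrative names.

```scala
object IsoStringSort {
  // Second comma-separated field, e.g. "2013-08-10"
  def dateField(line: String): String = line.split(",")(1)

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "43399,2014-04-20,49.98",
      "20554,2013-11-29,399.98",
      "2828,2013-08-10,129.99"
    )
    // Zero-padded yyyy-MM-dd strings compare lexicographically in date order
    lines.sortBy(dateField).foreach(println)
    // Newest first: same key, reversed ordering
    lines.sortBy(dateField)(Ordering[String].reverse).foreach(println)
  }
}
```

On the RDD the equivalent would be `final_result.sortBy(_.split(",")(1)).take(20).foreach(println)`, or `final_result.sortBy(_.split(",")(1), ascending = false)` for newest first, since `ascending` is a parameter of `RDD.sortBy`.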


#2

This link might help you.
Go to the second-to-last response on that page, where I explained how to sort data in an RDD.


#3

Thank you, Divyakot,

Your solution is based on a different RDD from mine. Mine looks like:
2828,2013-08-10,129.99
while yours looks like:
((2014-07-23 00:00:00.0,Footwear),1663.8201)

So it really doesn’t work here.
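A string line can be given a keyed shape similar to the tuple RDD by pairing it with its date field, sorting on that key, and then dropping the key again. On a Spark pair RDD this would look like `final_result.map(l => (l.split(",")(1), l)).sortByKey().values`; the sketch below mirrors the same three steps on a plain `Seq`, and `KeyedSort`, `keyByDate` and `sortKeyed` are hypothetical names.

```scala
object KeyedSort {
  // Pair each line with its date field so the date becomes the key
  def keyByDate(lines: Seq[String]): Seq[(String, String)] =
    lines.map(l => (l.split(",")(1), l))

  // Sort on the key (ISO dates compare correctly as strings), then drop it
  def sortKeyed(lines: Seq[String]): Seq[String] =
    keyByDate(lines).sortBy(_._1).map(_._2)

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "36070,2014-03-03,100.0",
      "8989,2013-09-19,119.97",
      "17422,2013-11-10,239.96"
    )
    sortKeyed(lines).foreach(println)
  }
}
```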


#4

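Convert the date field to an Int and sort with the sortBy API:
// Turn each line into (id, date as Int, amount), e.g. "2013-08-10" -> 20130810, then sort on the date
sc.textFile(filename).map(x => (x.split(",")(0), x.split(",")(1).replace("-", "").toInt, x.split(",")(2))).sortBy(r => r._2)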
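The Int conversion can be checked without Spark. `2013-08-10` becomes `20130810`, and because year, month and day are zero-padded and in that order, numeric order matches date order; eight-digit values also fit easily in an `Int`. This sketch keeps each original line rather than rebuilding a tuple; `IntDateKey` and `dateAsInt` are hypothetical names.

```scala
object IntDateKey {
  // "2828,2013-08-10,129.99" -> 20130810
  def dateAsInt(line: String): Int = line.split(",")(1).replace("-", "").toInt

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "20554,2013-11-29,399.98",
      "2828,2013-08-10,129.99",
      "43399,2014-04-20,100.0"
    )
    lines.sortBy(dateAsInt).foreach(println)
  }
}
```

Applied to the existing RDD, the same key would be `final_result.sortBy(l => l.split(",")(1).replace("-", "").toInt)`, which sorts the original lines without reloading the file.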