Spark Core APIs using Scala - Top N Priced Products



I am able to get the products with the top N prices, but the other part of the exercise, i.e. the top N priced products, is not working for me.

Inputs: a Products file separated by pipes (|) and a Categories file separated by commas (,)

Here is my code:

val Products=sc.textFile("/user/cloudera/problem2/products/")
val Categories=sc.textFile("/user/cloudera/categoriesfinal")

val ProductsMap=Products.map(r=>{var d=r.split('|'); (d(1).toInt,(d(2),d(4).toDouble))})
val CategoriesMap=Categories.map(r=>{var d=r.split(','); (d(0).toInt,d(2))})

val PCJoin=ProductsMap.join(CategoriesMap)


var PCMapGBK=PCMap.groupByKey()

def topresults(a: Iterable[(String, Double)],topN: Int): Iterable[(String, Double)]={
a.toList.sortBy(r => -r._2).take(topN)
}

Not working:
def topdensedresults(a: Iterable[(String, Double)],topN: Int): Iterable[(String, Double)]={
val temp = a.toList.sortBy(r => -r._2).distinct.take(topN)
a.toList.sortBy(r => -r._2).filter(r => temp.contains(r._2.toString))
}

Kindly let me know if anyone can spot what I missed here.
Thanks in advance.


Try escaping the | in r.split('|') as r.split('\|').


It should be r.split("\\|")


Unfortunately, it's not working with either a forward slash or a backslash:


Why don't you try it with double quotes and just the pipe symbol?


This is the solution: => rec.split("\\|"))

#7 => rec.split("\\|")).first()

It should be \\| in double quotes (two backslashes). This forum has an issue displaying it.
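
For anyone following along, here is a minimal sketch of why the escaping matters. In Scala, `split` with a String argument treats it as a regular expression, where a bare `|` means alternation, while `split` with a Char argument takes the character literally. The sample record and the object name `SplitOnPipe` are made up for illustration and are not taken from the real products file.

```scala
import java.util.regex.Pattern

object SplitOnPipe {
  // Hypothetical sample record, not taken from the real products file.
  val line = "1|2|Sample Product|desc|59.98"

  // A String argument is a regex, so the pipe is escaped to match it literally.
  val escaped: Array[String] = line.split("\\|")

  // A Char argument is taken literally, so no escaping is needed.
  val byChar: Array[String] = line.split('|')

  // Pattern.quote builds a regex that matches the given text literally.
  val quoted: Array[String] = line.split(Pattern.quote("|"))

  // An unescaped "|" is an empty alternation; it matches the empty string,
  // so the line is split between characters instead of into fields.
  val unescaped: Array[String] = line.split("|")
}
```

The first three variants should give the same five fields. The last one is the failure mode: far more than five pieces, so indexes such as d(1) or d(4) point at single characters.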


So, you mean two backslashes followed by a pipe (everything in double quotes)?


Finally, this worked as suggested by @venkatwilliams

Thanks a lot.


Now the main code is not working. It's returning an empty list.
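
One possible cause of the empty list, going by the topdensedresults function posted earlier in this thread: `temp` is a list of (name, price) tuples, but the filter calls `temp.contains(r._2.toString)`, which looks for a String among tuples. That compiles because `contains` accepts any supertype, but it can never match, so every record is filtered out. A minimal sketch of one way to do a dense-ranked top N is below; the object name `DenseTopN`, the method name `topDenseResults`, and the sample data in the usage note are illustrative, not from the exercise.

```scala
object DenseTopN {
  // Keep every product whose price is among the topN distinct prices.
  def topDenseResults(a: Iterable[(String, Double)], topN: Int): Iterable[(String, Double)] = {
    // Collect the distinct prices first, highest first, and keep topN of them.
    val topPrices = a.map(_._2).toList.distinct.sortBy(p => -p).take(topN)
    // Compare Double with Double here, not a String with a tuple.
    a.toList.sortBy(r => -r._2).filter(r => topPrices.contains(r._2))
  }
}
```

With made-up data such as List(("a", 10.0), ("b", 10.0), ("c", 8.0), ("d", 5.0)) and topN = 2, this keeps a, b and c, because two products share the highest price. On the grouped RDD it could be applied with something like PCMapGBK.flatMap(r => DenseTopN.topDenseResults(r._2, 2)), assuming the grouped values are (name, price) pairs.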


@nitesh - Check that you have filtered out product id 685; it's bad data.
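
If filtering by id is awkward, a defensive parse is another option. The sketch below assumes the bad record fails while parsing (a non-numeric field or a missing column); the thread does not say what is actually wrong with product 685, so treat that as an assumption. `SafeParse` and `parseProduct` are hypothetical names, and the field positions are copied from the code in the question.

```scala
import scala.util.Try

object SafeParse {
  // Hypothetical helper: returns None for any line that cannot be parsed,
  // instead of throwing and failing the whole job.
  def parseProduct(line: String): Option[(Int, (String, Double))] = {
    val d = line.split("\\|")
    // Field positions follow the question: d(1) as the key, d(2) name, d(4) price.
    // Try captures NumberFormatException and ArrayIndexOutOfBoundsException.
    Try((d(1).toInt, (d(2), d(4).toDouble))).toOption
  }
}
```

On the RDD this could be used as Products.flatMap(line => SafeParse.parseProduct(line)), which drops the lines that return None. Note that it only helps if the bad row really fails to parse; a row that parses but carries wrong values would still need an explicit filter.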


I have posted another question about solving the same problem using DataFrames and Spark SQL.