Why don't I get the content of an RDD?


#1

I am watching CCA 175 lecture 68 and came up with this question. The script:

val l = List("Hello", "How are you doing", "Let us perform word count", "As part of the word count program", "we will see how many times each word repeat")

//create an RDD by parallelizing the list
val l_rdd = sc.parallelize(l)

scala> l_rdd.take(5).foreach(println)
Hello
How are you doing
Let us perform word count
As part of the word count program
we will see how many times each word repeat

//create a new RDD using map on the RDD
val l_map = l_rdd.map(ele => ele.split(" "))
scala> l_map.take(5).foreach(println)
[Ljava.lang.String;@52681909
[Ljava.lang.String;@f5c9316
[Ljava.lang.String;@47bd62db
[Ljava.lang.String;@4083443e
[Ljava.lang.String;@52b1bd76

My question is: how do I inspect the actual content of l_map? Since l_map is an RDD, I expected l_map.take(5).foreach(println) to print each element, as it does for l_rdd.

Thank you.


#2

@paslechoix:

Your l_rdd is below:
Hello
How are you doing
Let us perform word count
As part of the word count program
we will see how many times each word repeat

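With map() you are passing each LINE of the above text as an element, and split(" ") turns each line into an Array[String], so every element of l_map is an array. println on an array prints its type and hash code (the [Ljava.lang.String;@... lines you see), not its contents.
You should use flatMap() to find the word count.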
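
A minimal sketch of the difference, using plain Scala collections as a stand-in for the RDD so it runs in any Scala REPL without a SparkContext. The names taken, rendered, and words are made up for this example and are not from the lecture.

```scala
// Same list as in the question.
val l = List("Hello", "How are you doing", "Let us perform word count",
  "As part of the word count program",
  "we will see how many times each word repeat")

// Stand-in for l_map.take(5): an Array whose elements are Array[String].
val taken: Array[Array[String]] = l.map(ele => ele.split(" ")).toArray

// mkString renders the contents of each inner array as readable text.
val rendered: Array[String] = taken.map(arr => arr.mkString("[", ", ", "]"))
rendered.foreach(println)

// flatMap flattens the per-line arrays into one sequence of words,
// which is the shape a word count needs.
val words: List[String] = l.flatMap(ele => ele.split(" "))
println(words.size)
```

In spark-shell the corresponding calls would be l_map.take(5).foreach(arr => println(arr.mkString(", "))) and l_rdd.flatMap(ele => ele.split(" ")).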

Or convert l_rdd to Array and index on that, so you can see your text.
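
One possible reading of the indexing suggestion, sketched with a plain collection standing in for the collected RDD. In spark-shell the array would come from something like l_map.collect(), which assumes the data is small enough to fit on the driver. The names lines, collected, and secondLine are hypothetical.

```scala
// Two of the lines from the question, kept short for illustration.
val lines = List("Hello", "How are you doing")

// Stand-in for l_map.collect(): one Array[String] per input line.
val collected: Array[Array[String]] = lines.map(_.split(" ")).toArray

// Index the outer array to pick a line, then print its words.
val secondLine: Array[String] = collected(1)
println(secondLine.mkString(" | "))

// Index twice to reach a single word: third word of the second line.
println(collected(1)(2))
```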

Hope this helps.
Thanks
Venkat