Word count program

pyspark
#1

Instead of just using count(), I am not sure why this logic will not give the result:

jsonRDD.flatMap(lambda x: tuple(x.split(":"))).map(lambda x: (x,1)).reduce(lambda x,y: x[1]+y[1])


#2

To get per-word counts you need to use reduceByKey, not reduce. reduce collapses everything to a single value, so after the first step your accumulator is an int, not a (word, 1) tuple, and x[1] blows up.
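A minimal sketch of why that reduce breaks, simulating the RDD steps with plain Python (functools.reduce and a dict stand in for Spark here, so the sample line and variable names are illustrative only):

```python
from functools import reduce
from collections import defaultdict

# Simulate flatMap + map: [('a', 1), ('b', 1), ('a', 1), ('c', 1)]
line = "a:b:a:c"
pairs = [(w, 1) for w in line.split(":")]

# Why reduce fails: the first call returns an int (1 + 1), so on the
# next call x is no longer a tuple and x[1] raises a TypeError.
try:
    reduce(lambda x, y: x[1] + y[1], pairs)
    reduce_failed = False
except TypeError:
    reduce_failed = True

# What reduceByKey does instead: group by key, then reduce values per key.
counts = defaultdict(int)
for word, one in pairs:
    counts[word] += one
print(dict(counts))  # {'a': 2, 'b': 1, 'c': 1}
```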


#3

But reduceByKey only gives counts per word. If I want the total number of words, what needs to be done with the reduce function?


#4

You should use count. That is it. Why do you want to complicate things?
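For reference, a sketch of what flatMap followed by count boils down to, with a plain Python list standing in for the RDD (the sample lines are made up):

```python
# Stand-in for the RDD contents; in Spark this would be
# linesRDD.flatMap(lambda x: x.split(" ")).count()
lines = ["a b a", "c a"]
words = [w for line in lines for w in line.split(" ")]  # flatMap
print(len(words))  # count() -> 5
```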


#5

lol, just want to understand what I'm missing here… Will use count for the exam.


#6

Here is the complicated way of getting the count :slight_smile:. You should probably also practice more complex aggregations like min, max, sum, etc.

wordcountRDD.flatMap(lambda x: x.split(" ")).map(lambda x: 1).reduce(lambda x,y: x + y)
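The same map-to-1-then-reduce pattern extends to min, max, and sum. A plain-Python sketch of those aggregations (functools.reduce standing in for RDD.reduce, and the sample sentence is invented):

```python
from functools import reduce

words = "spark makes word counting easy".split(" ")

# word count: map every word to 1, then add the 1s up
total = reduce(lambda x, y: x + y, [1 for _ in words])
print(total)  # 5

# the same reduce pattern for other aggregations, over word lengths
lengths = [len(w) for w in words]
print(reduce(lambda x, y: x + y, lengths))              # sum -> 26
print(reduce(lambda x, y: x if x < y else y, lengths))  # min -> 4
print(reduce(lambda x, y: x if x > y else y, lengths))  # max -> 8
```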


#7

Thanks :slight_smile:
