Regarding Split function on Hive Data in PySpark

Hello All,

I have the data set below and I want to pick the 2nd item as the key. I used:

dataMap = Data.split(lambda rec: (rec.split(",")[1], rec))

But I am getting an error for this. Please find the data below and suggest a fix.

Row(category=u'Comedy', _c1=414)
Row(category=u'Nonprofits & Activism', _c1=42)
Row(category=u' UNA ', _c1=32)
Row(category=u'Science & Technology', _c1=80)
Row(category=u'Autos & Vehicles', _c1=77)
Row(category=u'People & Blogs', _c1=398)
Row(category=u'Music', _c1=862)
Row(category=u'Sports', _c1=251)
Row(category=u'News & Politics', _c1=333)
Row(category=u'Pets & Animals', _c1=95)
Row(category=u'Education', _c1=65)
Row(category=u'Film & Animation', _c1=260)
Row(category=u'Entertainment', _c1=908)
Row(category=None, _c1=0)
Row(category=u'Travel & Events', _c1=112)
Row(category=u'Howto & Style', _c1=137)

Shouldn't it be data.map rather than data.split?

I want to make the count the key so I can use the sortByKey() operation to find the top 5 records. That's the reason I want to use map().

Can you please help me change the data to the format below?

Autos & Vehicles,77
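Not the full Spark answer, but here is a minimal plain-Python sketch of that reformatting step (the sample rows are made up to mirror the data above). In PySpark the same idea would be an rdd.map over the Row objects.

```python
# Plain-Python sketch (no Spark); sample rows are invented for illustration.
rows = [("Autos & Vehicles", 77), ("Music", 862), ("Sports", 251)]

# Equivalent of rdd.map(lambda x: "%s,%s" % (x[0], x[1]))
formatted = ["%s,%s" % (category, count) for category, count in rows]
for line in formatted:
    print(line)
# First line printed: Autos & Vehicles,77
```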

It's coming out like this because the data was imported from Hive. Please help me with this.

@itversity Please respond to this. It's quite urgent for me.

I found the solution for this earlier today in Durga Sir's live session. Thanks, Durga Sir, for the solution.
The code below resolves the above issue.

from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)
dataHive = sqlContext.sql("select category, count(category) from youtube_da.youtubedata group by category")
for i in dataHive.map(lambda x: (x[0], x[1])).collect():
    print(i)
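To then get the top 5 categories, the sortByKey() approach from earlier in the thread swaps each pair so the count becomes the key and sorts descending. A plain-Python sketch of that logic (sample data invented; in PySpark this would be rdd.map(...).sortByKey(ascending=False).take(5)):

```python
# Plain-Python sketch of map + sortByKey(ascending=False) + take(5); no Spark needed.
rows = [("Comedy", 414), ("Music", 862), ("Sports", 251),
        ("Entertainment", 908), ("Education", 65), ("People & Blogs", 398)]

# Equivalent of rdd.map(lambda x: (x[1], x[0]))  -> count becomes the key
by_count = [(count, category) for category, count in rows]

# Equivalent of sortByKey(ascending=False).take(5)
top5 = sorted(by_count, reverse=True)[:5]
for count, category in top5:
    print("%s,%s" % (category, count))
# First line printed: Entertainment,908
```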

Regards,