Reading and Saving Data Using a DataFrame

Hello all,

How can I read the first 10 lines of data, select the first 5 of them, and save those in a particular file format?

Regards,

@amit0900 Try pasting the data you want to process and the code you have tried. That would make it easier for others to help you.


Rahul, I am looking for the syntax for this scenario, so I have not posted any code.

I can help with that. I will write a sample in Python that you can refer to.

Let's assume the data is comma-separated:
data = sc.textFile('path in hdfs').map(lambda x: x.split(',')).filter(lambda fields: len(fields) > 1)  # example condition only; replace it with one based on your data
You can apply the filter there, or, if you are comfortable passing a function to map, you can put your logic in that function instead, for example:
def lines(l):
    # l is the list of fields for one line; write the logic you want here
    value = l  # placeholder: replace with your own transformation
    return value
data = sc.textFile('path in hdfs').map(lambda x: x.split(',')).map(lines)

If you want to save those 5 filtered lines back to HDFS as a text file, call saveAsTextFile(path) on the RDD. The path is treated as an output directory, and it must not already exist.

Use the take(n) function: rdd.take(5) returns the first 5 elements to the driver as a Python list.