How to convert an Array[String] to RDD[List[String]]

Sir,
I am reading a text file in Spark and I need to convert it to RDD[List[String]], but when I call
input.map(line => line.split("\n")).collect()

type mismatch; found : Array[Array[String]] required: org.apache.spark.rdd.RDD[List[String]]

What should I do?

@sha_p Hi, I have done the same thing, but I am not getting any errors.
Can you post your complete code along with some sample data?

val rawRDD = spark.sparkContext.textFile("/Users/revanthreddy/Project/Spark-2.1/input/product")

rawRDD: org.apache.spark.rdd.RDD[String] = /Users/revanthreddy/Project/Spark-2.1/input/product MapPartitionsRDD[1] at textFile at <console>:23

rawRDD.map(line => line.split("\n")).collect()

res1: Array[Array[String]] = Array(Array(1,2,Quest Q64 10 FT. x 10 FT. Slant Leg Instant U,,59.98,http://images.acmesports.sports/Quest+Q64+10+FT.+x+10+FT.+Slant+Leg+Instant+Up+Canopy), Array(2,2,Under Armour Men's Highlight MC Football Clea,,129.99,http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat), Array(3,2,Under Armour Men's Renegade D Mid Football Cl,,89.99,http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat), Array(4,2,Under Armour Men's Renegade D Mid Football Cl,,89.99,http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat), Array(5,2,Riddell Youth Revolution Speed Custom Footbal,,199.99,http://images.acmesports.sports/Riddell+Youth+Revolution+Speed+Custom+Football+Helmet), Array(6,2,Jordan...
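Note that textFile already yields one record per line, so splitting on "\n" inside map does nothing useful, and calling collect() materializes the RDD into a local Array, which is why the result is an Array and not an RDD. To get an RDD[List[String]], split each line on the field delimiter and drop the collect(). A minimal sketch of the per-line transformation (shown as a plain Scala helper so it runs without a Spark cluster; toFields is a hypothetical name):

```scala
// Each element of the RDD is one line of the file; split it on the
// field delimiter and convert the resulting Array[String] to a List.
// On the RDD this would be: rawRDD.map(_.split(",").toList),
// whose type is RDD[List[String]].
def toFields(line: String): List[String] = line.split(",").toList

// Middle empty fields (the ",," in the product data) are preserved.
println(toFields("1,2,Quest,,59.98,http://images.acmesports.sports/Quest"))
```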

Thank you Revanth for your prompt reply, but for some reason it is not working here. I needed to do it using …collect.toList, and then it worked.

Sir, can you please help me with this?

I have the scenario below, where I need to take the records from a list and split them.

scala> var nonErroniousBidsMap = rawBids.filter(line => !(line(2).contains("ERROR_") || line(5) == null || line(5) == ""))
nonErroniousBidsMap: org.apache.spark.rdd.RDD[List[String]] = MapPartitionsRDD[108] at filter at :33

scala> nonErroniousBidsMap.take(2).foreach(println)
List(0000002, 15-04-08-2016, 0.89, 0.92, 1.32, 2.07, , 1.35)
List(0000002, 11-05-08-2016, 0.92, 1.68, 0.81, 0.68, 1.59, , 1.63, 1.77, 2.06, 0.66, 1.53, , 0.32, 0.88, 0.83, 1.01)

scala> val transposeMap = nonErroniousBidsMap.map( rec => ( rec.split(",")(0) + "," + rec.split(",")(1) + ",US" + "," + rec.split(",")(5) ) )
<console>:35: error: value split is not a member of List[String]
       val transposeMap = nonErroniousBidsMap.map( rec => ( rec.split(",")(0) + "," + rec.split(",")(1) + ",US" + "," + rec.split(",")(5) ) )
                                                                ^
I am getting the error shown above. Can you please help me solve this?

Thank you.
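The error above comes from rec already being a List[String] (nonErroniousBidsMap is an RDD[List[String]], so each record is pre-split); there is nothing left to split, and you can index the fields directly. A sketch on a plain List, mirroring what the map would do per record (transpose is a hypothetical helper name):

```scala
// rec is already a parsed record: index its fields instead of splitting.
// On the RDD this would be:
//   nonErroniousBidsMap.map(rec => rec(0) + "," + rec(1) + ",US," + rec(5))
def transpose(rec: List[String]): String =
  rec(0) + "," + rec(1) + ",US," + rec(5)

val rec = List("0000002", "15-04-08-2016", "0.89", "0.92", "1.32", "2.07", "", "1.35")
println(transpose(rec))  // 0000002,15-04-08-2016,US,2.07
```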

@sha_p Can you paste some sample data that you are using, along with your code?

Sir,
here you are:

0000006,08-05-08-2016,1.35,1.13,2.02,1.33,1.64,1.70,0.45,2.02,1.87,1.75,0.45,1.28,1.15,
0000002,11-05-08-2016,0.92,1.68,0.81,0.68,1.59,1.63,1.77,2.06,0.66,1.53,0.32,0.88,0.83,1.01
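One thing to watch with rows like the first one, which ends in a trailing comma: Java's String.split(",") drops trailing empty fields, so the two rows above can parse to different lengths than expected, which matters for index-based checks like line(5). Passing a negative limit keeps trailing empties. A small sketch (plain Scala, no Spark needed):

```scala
val row = "0000006,08-05-08-2016,1.35,1.13,2.02,1.33,1.64,1.70,0.45,2.02,1.87,1.75,0.45,1.28,1.15,"

// Default split drops the trailing empty field after the last comma...
println(row.split(",").length)      // 15

// ...while a negative limit preserves it.
println(row.split(",", -1).length)  // 16
```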

@sha_p Can you explain what your use case is and what you want to achieve?

Sir, I have my MS college assignment, described below.

@sha_p Are you trying to do something like the code below? Why do you want to hard-code the country for each record?
Can you post your complete code?

import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object Scores {
  case class Person(id: String, dob: String, country: String, score: Double)

  def getResultRecords(scoresRDD: RDD[String]): RDD[Person] = {
    scoresRDD.map(_.split(",")).map(p => Person(p(0), p(1), "US", p(2).toDouble))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Scores").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val scoresRDD = sc.textFile("input/scores")
    getResultRecords(scoresRDD)
  }
}

Thank you so much sir. As I told you, I have an assignment to submit, but unfortunately I am still getting the same error in my case :frowning:

This one: scoresRDD.map(_.split(",")).map(p => Person(p(0), p(1), "US", p(2).toDouble)) is causing the same issue: "type mismatch; found : Unit required : RDD[Person]" :frowning:

Moreover, when I say scoresRDD.map(_.split(",")).map(p => Person(p(0), p(1), "US", p(2).toDouble)).first() it gives an error like this: type mismatch; found : com.epam.hubd.spark.scala.core.homework.Person required: org.apache.spark.rdd.RDD[com.epam.hubd.spark.scala.core.homework.Person]. Sir, can you help me?
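Both errors point at the method's return value: a Scala method returns its last expression, so if the body ends in an assignment (val x = …) or a println, its type is Unit, and if it ends in .first() the type is a single Person rather than RDD[Person]. A minimal non-Spark sketch of the same mistake and its fix (words is a hypothetical helper name):

```scala
// Won't compile: the last expression is an assignment, whose type is Unit.
// def words(s: String): List[String] = {
//   val ws = s.split(" ").toList   // found: Unit, required: List[String]
// }

// Fix: make the value itself the last expression of the method body.
def words(s: String): List[String] = {
  val ws = s.split(" ").toList
  ws
}

println(words("hello spark world"))  // List(hello, spark, world)
```

Ending getResultRecords with the RDD value itself, rather than an assignment or a .first() call, resolves both mismatches.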

Can you copy/attach your complete code here?

@sha_p Try this method:

def getResultRecords(scoresRDD: RDD[String]): RDD[Person] = {
  val personRDD = scoresRDD.map(_.split(",")).map(p => Person(p(0), p(1), "US", p(2).toDouble))
  println(personRDD.first())
  personRDD
}

Sir, when I execute it in the Spark shell it gives correct results.
scala> personRDD.first()
res94: Person = 0000006,Danny,US,6.77

If I put it in the method getResultRecords, which is expected to return RDD[Person], then it gives the type mismatch error.

Sir, can I have your Skype id? I can show it.

@sha_p

Right now I don't have Skype.
Where is your Person case class? Is it in a different package?

If it is in a different package, then please copy it into your current object, as I did in the example that I sent.

If you don't mind, send your complete code so I can investigate further.

It is in another package:

homework
├── domain
│   └── Person.scala
└── Main.scala

This method is in Main.
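If the package declarations mirror those folders, Main needs an explicit import before it can see Person. The package and member names below are assumptions based on the tree above, not the actual assignment code; this is a self-contained sketch of the two-package layout:

```scala
// Hypothetical layout mirroring the folder tree above (one file for brevity).
package homework.domain {
  case class Person(id: String, dob: String, country: String, score: Double)
}

package homework {
  // Without this import, code in Main cannot resolve Person
  // from the sibling domain package.
  import homework.domain.Person

  object Main {
    def run(): Person = Person("0000006", "Danny", "US", 6.77)
  }
}
```

Moving the file is not enough on its own: what matters is the package statement at the top of Person.scala matching (or being imported into) the package Main compiles under.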

@sha_p, did you import the Person case class into your Main?

Yes, I moved it to the same folder and tried … no luck, same issue.