Scala - empty String issue

#1

The code below runs fine in Python. I only changed the syntax and tried to execute it in Scala, but I get the error below. Am I missing anything here?

Get the highest-priced product:

```scala
val productRDD = sc.textFile("/user/gnanaprakasam/sqoop_import/products")

val productPrice = productRDD.map(rec => rec)

productPrice.reduce((rec1, rec2) =>
  if ((rec1.split(",")(4) != "" & rec2.split(",")(4) != "")
      &
      (rec1.split(",")(4).toFloat >= rec2.split(",")(4).toFloat))
    rec1
  else rec2)
```

```
16/12/03 23:15:38 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 10)
java.lang.NumberFormatException: empty String
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842)
at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
at java.lang.Float.parseFloat(Float.java:451)
at scala.collection.immutable.StringLike$class.toFloat(StringLike.scala:231)
at scala.collection.immutable.StringOps.toFloat(StringOps.scala:31)
at $line31.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:35)
at $line31.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:32)
at scala.collection.TraversableOnce$$anonfun$reduceLeft$1.apply(TraversableOnce.scala:177)
at scala.collection.TraversableOnce$$anonfun$reduceLeft$1.apply(TraversableOnce.scala:172)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:172)
at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1018)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1016)
at org.apache.spark.SparkContext$$anonfun$36.apply(SparkContext.scala:1975)
at org.apache.spark.SparkContext$$anonfun$36.apply(SparkContext.scala:1975)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
```
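A likely explanation, assuming the Python version used `and`: Python's `and` short-circuits, but Scala's `&` on `Boolean` evaluates both operands, so `toFloat` still runs on the empty price field and throws `NumberFormatException: empty String`. The sketch below reproduces this with plain Scala strings instead of an RDD. The sample rows are hypothetical, assuming the bad record's product name contains a comma that shifts the columns so field 4 is empty. It also shows that switching to `&&` avoids the exception but can still let the bad row win the comparison.

```scala
import scala.util.Try

object ShortCircuitDemo {
  // Assumption: field 4 holds product_price, as in the question's code.
  def price(rec: String): String = rec.split(",")(4)

  // The question's logic: `&` evaluates BOTH operands, so toFloat runs
  // even when one of the price fields is empty.
  def maxByPriceAmp(rec1: String, rec2: String): String =
    if ((price(rec1) != "" & price(rec2) != "") &
        (price(rec1).toFloat >= price(rec2).toFloat)) rec1
    else rec2

  // Same logic with `&&`: toFloat is reached only when both fields are non-empty.
  def maxByPriceAndAnd(rec1: String, rec2: String): String =
    if (price(rec1) != "" && price(rec2) != "" &&
        price(rec1).toFloat >= price(rec2).toFloat) rec1
    else rec2

  def main(args: Array[String]): Unit = {
    // Hypothetical rows; the second has a comma inside the name.
    val good = "1,2,Widget,,19.99,img"
    val bad  = "685,31,Irons - 4-PW, AW,,899.99,img"

    // Prints a Failure wrapping the NumberFormatException.
    println(Try(maxByPriceAmp(good, bad)))
    // No exception, but the else branch returns the bad row.
    println(maxByPriceAndAnd(good, bad) == bad)
  }
}
```

Because `&&` alone only skips the comparison, the malformed row can still be returned by `reduce`, so removing it before reducing is the more reliable fix.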


#2

Yes, there is one record that has this issue. I believe I highlighted it in both the video and the course on itversity. Which one are you following?

You can apply a filter to remove that record before calling reduce.
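A minimal sketch of the filter-then-reduce approach, assuming field 4 is product_price and that the bad record is the one whose price field is empty. A local `Seq` stands in for `productRDD` so the example runs without Spark; `RDD` also has `filter` and `reduce`, so the same call chain should carry over. The names `hasPrice` and `maxByPrice` and the sample rows are hypothetical.

```scala
object FilterThenReduce {
  // Assumption: field 4 holds product_price.
  def price(rec: String): String = rec.split(",")(4)

  // Keep only rows that have at least 5 fields and a non-empty price.
  // Filtering by id instead (rec.split(",")(0) != "685") would also work.
  def hasPrice(rec: String): Boolean = {
    val fields = rec.split(",")
    fields.length > 4 && fields(4).nonEmpty
  }

  // Safe to call toFloat here because the filter already ran.
  def maxByPrice(rec1: String, rec2: String): String =
    if (price(rec1).toFloat >= price(rec2).toFloat) rec1 else rec2

  def main(args: Array[String]): Unit = {
    // With Spark: productRDD.filter(hasPrice).reduce(maxByPrice)
    val rows = Seq(
      "1,2,Widget,,19.99,img",
      "685,31,Irons - 4-PW, AW,,899.99,img",
      "3,2,Gadget,,249.50,img"
    )
    println(rows.filter(hasPrice).reduce(maxByPrice)) // 3,2,Gadget,,249.50,img
  }
}
```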


#3

@itversity - Thanks, Durga. I filtered out product_id 685 and now it runs. In a real-world scenario, how do we identify which record is causing the problem?
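One possible way to find such records, sketched under assumptions: define what a well-formed row looks like, then keep the rows that fail the check and inspect a sample. Here a valid products row is assumed to have exactly 6 comma-separated fields with a parseable price at index 4; `ExpectedFields`, `isValid`, and the sample rows are hypothetical. `scala.util.Try` catches the parse failure instead of letting it crash the job. On Spark, the same predicate could be used as `productRDD.filter(rec => !isValid(rec)).take(20)`.

```scala
import scala.util.Try

object FindBadRecords {
  // Assumption: a well-formed products row has 6 fields
  // (id, category_id, name, description, price, image).
  val ExpectedFields = 6

  def isValid(rec: String): Boolean = {
    val fields = rec.split(",", -1) // limit -1 keeps trailing empty fields
    fields.length == ExpectedFields && Try(fields(4).toFloat).isSuccess
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      "1,2,Widget,,19.99,img",
      "685,31,Irons - 4-PW, AW,,899.99,img" // extra comma: 7 fields
    )
    // Print the rows that fail validation so they can be inspected.
    rows.filterNot(isValid).foreach(println)
  }
}
```

Checking the field count as well as the parse catches rows whose columns have shifted but still happen to contain a number at index 4.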
