Scala Spark keys() transformation returns an RDD but PySpark does not

Hello All,
I would like to share a discrepancy I am seeing between Scala Spark and PySpark when I invoke the keys() API.


scala> val ordersRdd = sc.parallelize(List("100,Order1","101,Order2","103,Order3","104,Order4"))
ordersRdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> ordersRdd.map(x => x.split(",")).map(x=> (x(0).toInt,x(1))).keys
res1: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[5] at keys at <console>:27

scala> ordersRdd.map(x => x.split(",")).map(x=> (x(0).toInt,x(1))).keys.collect()
res2: Array[Int] = Array(100, 101, 103, 104)


ordersRdd = sc.parallelize(["100,Order1","101,Order2","103,Order3","104,Order4"])
output = ordersRdd.map(lambda l: l.split(",")).map(lambda l:(int(l[0]),l[1])).keys
<bound method PipelinedRDD.keys of PythonRDD[5] at RDD at PythonRDD.scala:48>

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'function' object has no attribute 'collect'

Am I missing something, or is this simply how the keys() API behaves in PySpark?
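
One likely explanation, offered as a sketch rather than a confirmed diagnosis: in Scala, keys is declared without a parameter list, so writing .keys already invokes it, whereas in Python keys is an ordinary method, so .keys without parentheses only refers to the bound method and never calls it. If that is the cause, writing .keys().collect() in PySpark should behave like the Scala version. The snippet below reproduces the effect in plain Python without a Spark installation; TinyRDD is a hypothetical stand-in written for illustration and is not part of PySpark.

```python
class TinyRDD:
    """Hypothetical stand-in for a PySpark RDD (illustration only)."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, f):
        # Apply f to every element and wrap the result in a new TinyRDD.
        return TinyRDD(f(x) for x in self.items)

    def keys(self):
        # Keep only the first element of each (key, value) pair.
        return self.map(lambda kv: kv[0])

    def collect(self):
        # Return the elements as a plain Python list.
        return list(self.items)


orders = TinyRDD(["100,Order1", "101,Order2", "103,Order3", "104,Order4"])
pairs = orders.map(lambda l: l.split(",")).map(lambda l: (int(l[0]), l[1]))

not_called = pairs.keys    # no parentheses: a bound method, not an RDD
called = pairs.keys()      # parentheses: the method runs and returns a TinyRDD

print(callable(not_called))            # the bound method can still be called later
print(hasattr(not_called, "collect"))  # a bound method has no collect attribute
print(called.collect())                # the keys as a list
```

The same reasoning would apply to any other zero-argument RDD method in PySpark: the parentheses are required for the call to happen.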

I was looking for an alternative solution in the thread "Apache Spark - PySpark - distinct() issue".