I have been learning Spark using Python as the language (PySpark).
Now, this might sound like a dumb question:
Do we manipulate RDDs using PySpark, or do we just work on DataFrames?
I was not able to find proper documentation that shows the RDD operations in PySpark.
Everyone was mentioning that you need to know reduceByKey(), JoinByKey(), etc. for the certification, but I am not sure how to use these in PySpark. Any help?