How to find the list of RDDs created

How can I find the list of RDDs created after I issue a series of Spark API commands? For example, I typed:

val removeRDD = sc.parallelize(List("a","the","an","a","with","this","these","is","are","in","for","to","and","The","of"))

val filtered = trimmedContent.subtract(removeRDD)

Is there a common command in spark-shell that shows the RDDs above, namely removeRDD and filtered?

Try pressing Tab. It is mostly an auto-complete feature, and it lists all the variables and functions defined since your session started. You still need to pick out your RDD/variable names yourself.

I am not sure what you are going to do with the autocomplete, but I do have a recommendation.

If you declare all your RDDs with a specific prefix, you can easily list all of those RDDs or variables.

For instance, I prefix my RDDs with my, and whenever I type "my" and press Tab, it lists the RDDs/variables I have defined.
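The same prefix idea can be sketched in Python for anyone working in the pyspark shell rather than spark-shell. This is only an illustration: names_with_prefix and the session dictionary are hypothetical, and the dictionary stands in for the shell's globals() so the snippet runs without Spark.

```python
def names_with_prefix(namespace, prefix):
    """Return the sorted names in `namespace` that start with `prefix`."""
    return sorted(name for name in namespace if name.startswith(prefix))


# Hypothetical stand-in for the shell's globals(); in a real session the
# values would be RDDs and other objects rather than plain object().
session = {"myWords": object(), "myFiltered": object(), "sc": object()}

print(names_with_prefix(session, "my"))
# ['myFiltered', 'myWords']
```

In an interactive session you would pass globals() instead of the stand-in dictionary.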


@Tarun_Das The following code snippet in PySpark lists all RDD variables:

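def list_rdds():
    """Return the names of all global variables that hold an RDD."""
    from pyspark import RDD
    # Scan this module's globals and keep the names bound to RDD instances.
    return [k for (k, v) in globals().items() if isinstance(v, RDD)]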

list_rdds()
# []

rdd = sc.parallelize([])
list_rdds()
# ['rdd']

Credits: The code above is taken from a Stack Overflow answer, where you can read more.
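One caveat worth knowing: globals() inside a function refers to the module where that function is defined, so if list_rdds lives in an imported helper module it will not see the shell's variables. A possible workaround, sketched below, is to pass the namespace in explicitly. The names names_of_type, FakeRDD and shell_vars are hypothetical; FakeRDD is only a stand-in so the example runs without Spark.

```python
def names_of_type(namespace, cls):
    """Return the sorted names in `namespace` whose values are instances of `cls`."""
    return sorted(
        name for name, value in namespace.items() if isinstance(value, cls)
    )


class FakeRDD:
    """Hypothetical stand-in for pyspark.RDD so this runs without Spark."""


# Hypothetical stand-in for the caller's globals().
shell_vars = {"removeRDD": FakeRDD(), "filtered": FakeRDD(), "count": 3}

print(names_of_type(shell_vars, FakeRDD))
# ['filtered', 'removeRDD']
```

In the pyspark shell, the equivalent call would be names_of_type(globals(), RDD) after from pyspark import RDD.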
