Parsing large XML files with Spark


#1

Hello, I'm trying to parse a 2 GB XML file with Spark and Python. The first step is to load the file into an RDD with the command rdd1 = sc.textFile("path"), and then to collect it with rdd1.collect().
The problem is that this fails with an out-of-memory error.
I have allocated 12 GB of RAM to my virtual machine.
Thanks in advance for your help.
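
For reference, here is a minimal sketch of the two steps described above, with take() and count() swapped in for collect(). collect() returns every element of the RDD to the driver program, so this may be one way to inspect the file without holding all of it in the driver at once. It assumes sc is the SparkContext that the pyspark shell provides; the helper names preview_lines and count_lines are hypothetical, and "path" stands for the file path as in the post.

```python
def preview_lines(sc, path, n=5):
    """Return the first n lines of a text file as a Python list.

    sc is assumed to be an existing SparkContext (for example the one
    the pyspark shell creates); it is passed in rather than built here.
    """
    rdd = sc.textFile(path)  # lazy: no data is read at this point
    return rdd.take(n)       # brings back only n lines, not the whole file


def count_lines(sc, path):
    """Count the lines of a text file; only the number is returned."""
    rdd = sc.textFile(path)
    return rdd.count()
```

In the pyspark shell this would be called as preview_lines(sc, "path") or count_lines(sc, "path"). Note that textFile splits the input by line, so each RDD element is one line of the XML file and not a parsed XML element.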