HDPCD: Spark using Python (pyspark)


Originally published at: http://kaizen.itversity.com/courses/hdpcd-spark-using-pythonpyspark/

Spark is an in-memory distributed computing engine. As part of this lesson, we will see how to get started with Spark.

Setting up the development environment – we will set up a development environment on our PC.
Using Big Data labs or virtual machine images – it is better to use a lab or virtual machines to explore…