Architecture of Spark


Originally published at:

Let us look at the architecture of Spark.

- Spark is a distributed computing engine.
- It works on many file systems, typically distributed ones.
- It uses the HDFS APIs to read files from the file system.
- It works seamlessly on HDFS, AWS S3, Azure Blob, etc.

Run a sample job

- Validate the files in HDFS:

    hadoop fs -ls /public/randomtextwriter/part-m-00000

- Launch spark…
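To make the idea of a distributed computing engine more concrete, here is a minimal pure-Python sketch of how a word-count style job can be split into partitions, processed per partition, shuffled by key and then reduced. It does not use Spark at all. The function names (split_into_partitions, map_partition, shuffle, reduce_bucket, word_count) and the partition counts are hypothetical, and the list of lines only stands in for a text file such as the one validated in HDFS. In a real Spark job the per-partition work would run in parallel on executors, and the shuffle would move data between machines.

```python
from collections import Counter, defaultdict


def split_into_partitions(lines, num_partitions):
    """Distribute input lines round-robin; stands in for file splits/blocks."""
    partitions = [[] for _ in range(num_partitions)]
    for i, line in enumerate(lines):
        partitions[i % num_partitions].append(line)
    return partitions


def map_partition(lines):
    """Map side: count words locally within one partition (a map-side combine)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def shuffle(partial_counts, num_reducers):
    """Route each (word, count) pair to a reducer chosen by hashing the word.

    This is hash partitioning, similar in spirit to how Spark groups keys
    before a reduce step; here it all happens inside one process.
    """
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for counts in partial_counts:
        for word, n in counts.items():
            buckets[hash(word) % num_reducers][word].append(n)
    return buckets


def reduce_bucket(bucket):
    """Reduce side: sum the partial counts collected for each word."""
    return {word: sum(ns) for word, ns in bucket.items()}


def word_count(lines, num_partitions=3, num_reducers=2):
    """Hypothetical end-to-end job: split, map, shuffle, reduce."""
    partitions = split_into_partitions(lines, num_partitions)
    # In Spark, each partition would be processed by a task on an executor.
    partial = [map_partition(p) for p in partitions]
    result = {}
    for bucket in shuffle(partial, num_reducers):
        result.update(reduce_bucket(bucket))
    return result


if __name__ == "__main__":
    sample = [
        "spark is a distributed engine",
        "spark reads hdfs",
        "hdfs stores blocks",
    ]
    print(sorted(word_count(sample).items()))
```

Because each word is routed to exactly one reducer, the final counts do not depend on how many partitions or reducers are used. Only the amount of parallelism changes.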