Apache Spark 1.6 - Transform, Stage and Store - Introduction to Spark

As part of this topic, we will go through an introduction to Spark.

  • Spark is a distributed computing framework
  • It provides a rich set of APIs to process data
  • It includes higher-level modules such as Data Frames/SQL, Streaming, MLlib and more
  • It is well integrated with Python, Scala, Java, etc.
  • Spark uses the HDFS API to interact with file systems
  • It can run on any distributed or cloud file system – HDFS, S3, Azure Blob Storage, etc.
  • Only Core Spark and Spark SQL (including Data Frames) are part of the curriculum for CCA Spark and Hadoop Developer. CCA also requires some knowledge of Spark Streaming.
  • Prerequisites – a programming language (Scala or Python)

Learn Spark 1.6.x or Spark 2.x on our state-of-the-art big data labs.

  • Click here for access to a state-of-the-art 13-node Hadoop and Spark cluster