Data Ingest – Real-Time, Near-Real-Time and Streaming Analytics – Spark Streaming – Get Department-Wise Traffic – Problem Statement

Spark Streaming – Problem and Solution

  • Problem Statement – Get department-wise traffic every 30 seconds.
    • Read data from the retail_db logs
    • Compute the traffic per department every 30 seconds
    • Save the output to HDFS
  • Solution
    • Use Spark Streaming
    • Publish messages from the retail_db logs to netcat
    • Create a DStream that reads from netcat
    • Process the DStream and save the output to HDFS
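
The solution steps above can be sketched in PySpark roughly as follows. This is a minimal sketch, not the course's reference solution. It assumes the log lines are already being piped into a netcat listener on localhost:9999, and that each line is an access-log record whose 7th space-separated field is the request path, e.g. /department/fitness/products. The host, port, output prefix and function names are all hypothetical.

```python
def extract_department(line):
    """Return the department name from one log line, or None.

    Assumption: the request path is the 7th space-separated field,
    e.g. '... "GET /department/fitness/products HTTP/1.1" ...'.
    """
    fields = line.split(" ")
    if len(fields) < 7:
        return None
    # '/department/fitness/products' -> ['', 'department', 'fitness', 'products']
    parts = fields[6].split("/")
    if len(parts) > 2 and parts[1] == "department" and parts[2]:
        return parts[2]
    return None


def run_streaming_job(host="localhost", port=9999,
                      output_prefix="/user/hypothetical/deptwisetraffic/cnt"):
    """Count department hits per 30-second batch and save each batch to HDFS."""
    # Imported here so the parsing helper above can be tried without Spark.
    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    conf = SparkConf().setAppName("DepartmentWiseTraffic")
    sc = SparkContext(conf=conf)
    # Batch interval of 30 seconds: one result set per 30-second window.
    ssc = StreamingContext(sc, 30)

    # DStream of raw log lines read from the netcat socket.
    lines = ssc.socketTextStream(host, port)

    traffic = (lines.map(extract_department)
                    .filter(lambda dept: dept is not None)
                    .map(lambda dept: (dept, 1))
                    .reduceByKey(lambda a, b: a + b))

    # Writes one directory per batch, named from the prefix and the batch time.
    traffic.saveAsTextFiles(output_prefix)

    ssc.start()
    ssc.awaitTermination()
```

To try it, add a call to run_streaming_job() at the bottom of the file and submit it with spark-submit while the netcat listener is running. Because the batch interval is 30 seconds and no window is defined, each output directory holds the counts for a single 30-second batch only.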
