Data Ingest - real time, near real time and streaming analytics - Flume and Spark Streaming – Department Wise Traffic – Setup Flume

Let us see how Flume and Spark Streaming can be integrated. Here are the high-level steps:

  • Define a Flume agent with one of the sinks as a Spark sink
  • Define dependencies in build.sbt
  • Develop the program
  • Run the program, passing the required jar files to --jars
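
A possible build.sbt for the dependency step above — a sketch assuming Spark 1.6.x on Scala 2.10; the project name and exact versions are placeholders, and spark-streaming-flume-sink is needed only if you use the pull-based Spark sink:

```scala
name := "StreamingFlumeDepartmentTraffic"  // placeholder project name
version := "1.0"
scalaVersion := "2.10.6"

// Core Spark and Spark Streaming (versions are assumptions)
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.2"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.6.2"

// Flume integration for Spark Streaming
libraryDependencies += "org.apache.spark" % "spark-streaming-flume_2.10" % "1.6.2"
libraryDependencies += "org.apache.spark" % "spark-streaming-flume-sink_2.10" % "1.6.2"
```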

Define Flume Agent

The Flume agent is configured to:

  • Read data from access.log using a source of type exec
  • Write one copy of the unprocessed data directly to HDFS
  • Write another copy of the data to a third-party sink of type spark
  • We will use Spark Streaming to process the data arriving at the spark sink
  • The Flume agent can be run as flume-ng agent -n sdc -f sdc.conf
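
The bullets above can be sketched as a Flume configuration file (sdc.conf). The file paths, hostname, and port below are placeholders; the spark sink assumes the pull-based SparkSink class from the spark-streaming-flume-sink package, whose jar must be on the Flume agent's classpath:

```
# Agent "sdc": one exec source fanned out to two channels, one per sink
sdc.sources = ws
sdc.channels = hdmem sparkmem
sdc.sinks = hd spark

# Source: tail the access log (log path is an assumption)
sdc.sources.ws.type = exec
sdc.sources.ws.command = tail -F /opt/gen_logs/logs/access.log
sdc.sources.ws.channels = hdmem sparkmem

# Sink 1: write the raw, unprocessed events to HDFS (path is an assumption)
sdc.sinks.hd.type = hdfs
sdc.sinks.hd.hdfs.path = hdfs:///user/flume/streaming
sdc.sinks.hd.hdfs.fileType = DataStream
sdc.sinks.hd.channel = hdmem

# Sink 2: pull-based Spark sink; Spark Streaming connects to this host/port
sdc.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
sdc.sinks.spark.hostname = localhost
sdc.sinks.spark.port = 8123
sdc.sinks.spark.channel = sparkmem

# In-memory channels buffering events for each sink
sdc.channels.hdmem.type = memory
sdc.channels.hdmem.capacity = 1000
sdc.channels.sparkmem.type = memory
sdc.channels.sparkmem.capacity = 1000
```

With this file in place, the agent is started exactly as described above: flume-ng agent -n sdc -f sdc.conf.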
