Data Ingest - Real-Time, Near Real-Time, and Streaming Analytics - Flume and Kafka in Streaming Analytics

Flume and Kafka

  • The life cycle of streaming analytics

    • Get data from the source (Flume and/or Kafka)
    • Process data
    • Store the results in the target system
  • Kafka can be used as the ingest layer for most streaming applications

  • However, existing source applications need to be refactored to publish
    messages to Kafka

  • Source applications are often mission-critical and highly sensitive to
    any changes

  • In that case, if the messages are already captured in web server logs,
    one can use Flume to read them from the logs and publish them to a
    Kafka topic

