Data Ingest - real time, near real time and streaming analytics - Flume and Kafka in Streaming analytics

Flume and Kafka

  • The life cycle of streaming analytics

    • Get data from the source (Flume and/or Kafka)
    • Process data
    • Store the results in the target
  • Kafka can be used for most applications

  • But existing source applications need to be refactored to publish
    messages to Kafka

  • Source applications are mission-critical and highly sensitive to any
    changes

  • In that case, if the messages are already captured in web server logs,
    one can use Flume to read the messages from the logs and publish them
    to Kafka
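
The flow described above (read messages from web server logs, publish them to a topic, then process and store them) can be sketched in plain Python. This is only an illustration of the pattern, not Flume or Kafka code: the in-memory queue stands in for a Kafka topic, the list stands in for the target store, and the function names and the simplified "ip method path status" log format are hypothetical.

```python
import io
import queue
from typing import Callable, Iterable, Iterator

def publish_log_lines(log: Iterable[str], publish: Callable[[str], None]) -> int:
    """Source step (the role Flume plays): forward each non-empty log line."""
    count = 0
    for line in log:
        line = line.strip()
        if line:
            publish(line)  # hand the raw line to the topic unchanged
            count += 1
    return count

def consume(topic: "queue.Queue[str]") -> Iterator[str]:
    """Drain the stand-in topic; a real consumer would poll continuously."""
    while not topic.empty():
        yield topic.get_nowait()

def parse_line(line: str) -> dict:
    """Process step: parse the assumed 'ip method path status' format."""
    ip, method, path, status = line.split()
    return {"ip": ip, "method": method, "path": path, "status": int(status)}

# Hypothetical stand-ins: a queue for the Kafka topic, a list for the target.
topic: "queue.Queue[str]" = queue.Queue()
target: list = []

# The source application only writes its log; it is not modified.
web_server_log = io.StringIO(
    "10.0.0.1 GET /index.html 200\n"
    "\n"
    "10.0.0.2 POST /login 302\n"
)

sent = publish_log_lines(web_server_log, topic.put)  # get data from the source
for message in consume(topic):
    target.append(parse_line(message))               # process, then store
```

The point of the design is that the source application is left untouched: only its log output is read, which is why this route suits applications that cannot easily be refactored to publish directly.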

