Data Ingest - real-time, near-real-time and streaming analytics - Flume and Kafka integration - Develop configuration file

Although Flume and Kafka serve different purposes, they can complement each other, so it is important to understand both technologies and how to integrate them.

  • Kafka is generally more reliable and scalable than Flume, since it persists messages in partitioned, replicated logs
  • However, to publish messages from an existing application’s web server logs directly, we would have to refactor the application to publish to a Kafka topic using the producer API
  • Some legacy applications are highly sensitive to change
  • In that case, we can use Flume and create an agent that can
    • Read from web server logs
    • Publish to Kafka topic
  • Once the data is in a Kafka topic, downstream applications that consume the messages get the benefits of scalability, reliability and agility

Flume agent to publish to Kafka
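
The agent described above (read from web server logs, publish to a Kafka topic) can be sketched as a Flume configuration file along the following lines. This is a minimal sketch, not the original course configuration: the agent name wk, the component names, the log path, the broker addresses and the topic name weblogs are all placeholders chosen for illustration. The Kafka sink property names shown are the ones used by Flume 1.7 and later; Flume 1.6 uses brokerList and topic instead.

```properties
# Hypothetical agent named "wk"; component names below are illustrative.
wk.sources = logsource
wk.channels = memchannel
wk.sinks = kafkasink

# Exec source: follow the web server log (placeholder path).
wk.sources.logsource.type = exec
wk.sources.logsource.command = tail -F /path/to/webserver/access.log
wk.sources.logsource.channels = memchannel

# Memory channel: buffers events between the source and the sink.
wk.channels.memchannel.type = memory
wk.channels.memchannel.capacity = 1000
wk.channels.memchannel.transactionCapacity = 100

# Kafka sink: publishes each event to a Kafka topic.
# Property names are for Flume 1.7+; on Flume 1.6 use
# brokerList and topic instead (broker list and topic are placeholders).
wk.sinks.kafkasink.type = org.apache.flume.sink.kafka.KafkaSink
wk.sinks.kafkasink.kafka.bootstrap.servers = broker1:9092,broker2:9092
wk.sinks.kafkasink.kafka.topic = weblogs
wk.sinks.kafkasink.channel = memchannel
```

Assuming the file is saved as wskafka.conf (a hypothetical name), the agent could be started with: flume-ng agent --name wk --conf-file wskafka.conf, usually together with --conf pointing at the Flume configuration directory. A memory channel keeps the example short but loses buffered events if the agent stops; a file channel is the usual alternative when that matters.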
