Data Ingest - real time, near real time and streaming analytics - Kafka – Anatomy of a topic

Kafka topic – Anatomy

  • A topic is a named, append-only log of messages; Kafka stores it on disk as log files, one set per partition
  • Publishers publish messages to the topic
  • Consumers subscribe to the topic
  • Topics can be partitioned for scalability
  • Each topic partition can be replicated across brokers for reliability
  • Offset – the position of a message within a partition; each consumer
    tracks the offset of the last message it has read from each partition
    of the topic
  • A consumer group can be created so that multiple consumers read
    from the same topic in a coordinated fashion (offsets are tracked at
    the group level)
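
The ideas in the list above can be sketched with a toy, in-memory model. This is not the Kafka client API. The names `Topic`, `ConsumerGroup`, `publish` and `poll` are hypothetical, and `zlib.crc32` is only a stand-in for the hash a real Kafka producer uses to pick a partition. The sketch shows that each partition is an append-only sequence, that messages with the same key land in the same partition, and that a group remembers one offset per partition.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy model of a topic: one append-only list per partition."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Same key -> same partition. crc32 is an assumption made for
        # this sketch; real Kafka clients use a different hash function.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append(value)
        # The offset is simply the message's position in its partition.
        return p, len(self.partitions[p]) - 1


class ConsumerGroup:
    """Tracks, per partition, the offset of the next message to read."""

    def __init__(self, topic):
        self.topic = topic
        self.next_offset = defaultdict(int)  # partition -> next offset

    def poll(self, partition, max_messages=10):
        start = self.next_offset[partition]
        batch = self.topic.partitions[partition][start:start + max_messages]
        # Advance the group's offset so the same messages are not re-read.
        self.next_offset[partition] = start + len(batch)
        return batch


topic = Topic("orders", num_partitions=3)
for i in range(6):
    topic.publish(key=f"customer-{i % 2}", value=f"order-{i}")

group = ConsumerGroup(topic)
```

Calling `group.poll(p)` for each partition `p` returns every message once. A second round of polls returns empty lists, because the group's offsets have already moved past the end of each partition.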

A Kafka topic has important properties such as

  • partition – for scalability
  • replication – for reliability

We can define the number of partitions and the replication factor when creating a topic. Consumers maintain offsets to track which messages they have already read in each partition.
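
To make the two settings concrete, here is a minimal sketch of how partitions and replicas might be spread over brokers. The function `assign_replicas` and the broker ids are hypothetical, and the round-robin placement is a simplification chosen for illustration; Kafka's own placement logic is more involved. The one rule taken from Kafka itself is that the replication factor cannot exceed the number of available brokers.

```python
def assign_replicas(broker_ids, num_partitions, replication_factor):
    """Return {partition: [broker ids holding a replica]} (simplified)."""
    if num_partitions < 1:
        raise ValueError("a topic needs at least one partition")
    if not 1 <= replication_factor <= len(broker_ids):
        # Each replica of a partition must live on a different broker.
        raise ValueError("replication factor must be between 1 and the broker count")

    n = len(broker_ids)
    assignment = {}
    for p in range(num_partitions):
        # Round-robin: start at broker p, then take the next brokers in order.
        assignment[p] = [broker_ids[(p + r) % n] for r in range(replication_factor)]
    return assignment


# Hypothetical 3-broker cluster, topic with 4 partitions and 2 replicas each.
plan = assign_replicas([101, 102, 103], num_partitions=4, replication_factor=2)
for partition, replicas in plan.items():
    print(f"partition {partition}: replicas on brokers {replicas}")
```

More partitions allow more consumers in a group to read in parallel, while a higher replication factor lets a partition survive the loss of more brokers.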
