Apache Kafka Fundamentals

Before we look at Kafka, let's discuss messaging systems. A messaging system allows two applications to communicate with each other so that data can be transferred between them. We could set up one-to-one messaging between two applications, but as the number of applications grows, direct communication between every pair of applications becomes complex to maintain. This is where a publish/subscribe messaging system comes to the rescue.

Publish subscribe messaging system:

In the publish/subscribe model, one application (the publisher or sender) sends data to an intermediate third entity called a broker. Each receiving application (subscriber) then gets the data from this broker. We can compare it to a TV broadcast, where customers subscribe to a channel and the channel is broadcast to all interested subscribers. So in this system, the sender does not send the message/data directly to the receiver. Instead, a central point of contact (the broker) is put in place to ease the communication between sender and receiver. Kafka is a publish/subscribe messaging system.
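To make the idea concrete, here is a minimal in-memory sketch of a broker sitting between a publisher and two subscribers. The names ToyBroker, subscribe and publish are made up for this illustration and are not Kafka's API. It also pushes messages to subscribers through callbacks to keep the example short, whereas real Kafka consumers pull data from the broker.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory broker; an illustration, not Kafka's API."""

    def __init__(self):
        # Maps a channel name to the callbacks of its subscribers.
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        # Register a receiver that is interested in this channel.
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # The sender only talks to the broker; the broker fans the
        # message out to every subscriber of the channel.
        for callback in self._subscribers[channel]:
            callback(message)


broker = ToyBroker()
inbox_a, inbox_b = [], []

# Two independent receivers subscribe to the same channel.
broker.subscribe("news", inbox_a.append)
broker.subscribe("news", inbox_b.append)

# The publisher sends once and never references the receivers directly.
broker.publish("news", "hello")
```

Both inboxes end up with the same message even though the publisher knows nothing about who is listening, which is the decoupling that the broker provides.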


Message: This is the unit of data which gets propagated. We can think of it as a record or row.


Topics: Topics are how Kafka organizes event data. We can relate them to tables in a database: we write data to topics and read data from topics. The program that writes data to a topic is called a Producer, and the program that reads the written data from a topic is called a Consumer.

Data written to a topic is immutable, which means that once a record is written it can't be changed or altered, and individual records can't be picked out and deleted. Old data is typically removed only according to the topic's retention settings.
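One way to picture a topic is as an append-only log. The sketch below is a toy model under that assumption: ToyTopic, append and read_from are hypothetical names, and the position number returned by append plays the role of what Kafka calls an offset. It is not how Kafka is implemented internally.

```python
class ToyTopic:
    """Hypothetical append-only log standing in for a topic."""

    def __init__(self, name):
        self.name = name
        self._log = []

    def append(self, record):
        # Producer side: records are only ever added at the end.
        self._log.append(record)
        return len(self._log) - 1  # position of the new record

    def read_from(self, position):
        # Consumer side: read everything from a position onwards.
        # A tuple is returned so that callers cannot modify the log.
        return tuple(self._log[position:])


orders = ToyTopic("orders")

# A producer writes two records.
first = orders.append({"id": 1, "item": "book"})
second = orders.append({"id": 2, "item": "pen"})

# A consumer reads from the beginning; reading does not remove anything.
records = orders.read_from(0)
```

Note that the class deliberately has no update or delete method, mirroring the immutability described above, and that reading the same position twice returns the same records.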

Partition: A partition is where the events (data) actually get stored. The data of one topic is split across multiple partitions. This also ensures that a topic's data does not all have to sit on a single broker in the cluster. If we want messages from a topic to be stored in the same partition, we need to give them the same key.

Since messages with the same key are stored in the same partition, and Kafka preserves order within a partition, use the same key for messages whose order matters. If we do not care about the order in which messages are stored/retrieved, we can even use a null key.
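The key-to-partition rule can be sketched in a few lines. This is a simplified model: choose_partition and send are hypothetical helpers, the CRC32 hash is only a stand-in for the hash that Kafka's own partitioner uses, and round-robin is just one simple way to spread messages that have no key.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 3  # assumed partition count for this illustration
_round_robin = count()


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition for a message (toy rule, not Kafka's partitioner)."""
    if key is None:
        # No key: spread messages across partitions, no ordering promise.
        return next(_round_robin) % num_partitions
    # Same key always hashes to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = [[] for _ in range(NUM_PARTITIONS)]


def send(key, value):
    # Append the message to the chosen partition and report which one.
    index = choose_partition(key)
    partitions[index].append((key, value))
    return index


# Messages with the same key land in the same partition, in send order.
p1 = send("user-42", "login")
p2 = send("user-42", "logout")

# Messages with a null key are spread over the partitions.
null_targets = [send(None, f"event-{i}") for i in range(3)]
```

Here p1 and p2 are always equal, so "login" is stored before "logout" in one partition, while the three null-key messages end up in three different partitions.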

A few other good links:

1) https://kafka.apache.org/quickstart

2) https://developer.confluent.io/what-is-apache-kafka/
