Kafka Course Notes

yo · October 7, 2021

Kafka Theory

Topics, Partitions, Offsets

Topics: a particular stream of data

  • similar to a table in a database (without all the constraints)
  • You can have as many topics as you want
  • A topic is identified by its name

Partitions

  • Topics are split into partitions
  • Each partition is ordered
  • Each message within a partition gets an incremental id, called an offset
  • You have to decide the number of partitions when creating a topic; of course, it can still be changed (increased) later.

Offsets

  • Offsets only have meaning within a specific partition
    • E.g. offset 3 in partition 0 doesn't represent the same data as offset 3 in partition 1
  • Order is guaranteed only within a partition (not across partitions)
  • Data is kept only for a limited time (default: one week)
  • Once the data is written to a partition, it can't be changed (immutability)
  • Data is assigned randomly to a partition unless a key is provided (more on this later)

Brokers and Topics

  • Each broker is a server
  • A Kafka cluster is composed of multiple brokers (servers)
  • Each broker is identified by its ID (an integer)
  • After connecting to any broker (called a bootstrap broker), you will be connected to the entire cluster
  • A good number to get started is 3 brokers, but some big clusters have over 100 brokers.
  • In the examples here, broker IDs start at 100 (arbitrary)

Example

(figure from the course: a topic's partitions spread across several brokers; not reproduced here)

Topic Replication

  • Replication is needed so the service can keep running even if one machine goes down.
  • Topics should have a replication factor greater than 1 (usually between 2 and 3)
  • This way, if a broker is down, another broker can serve the data
  • Example: Topic-A with 2 partitions and a replication factor of 2


(figure from the course: Topic-A's partitions replicated across the brokers; not reproduced here)

  • As the figure showed, even if broker 102 goes down, replication keeps the service running without problems.

Leader for a Partition

  • At any time, only ONE broker can be the leader for a given partition
  • Only that leader can receive and serve data for the partition
  • The other brokers just synchronize the data
  • Therefore each partition has one leader and multiple ISRs (in-sync replicas)
  • As in the course diagram, each partition has a leader (marked with a star).
  • If broker 101 goes down, partition 0 on broker 102 becomes the leader.
  • When broker 101 recovers, it will try to reclaim leadership.

Producers and Message Keys

Producers

  • Producers write data to topics (which are made of partitions)
  • Producers automatically know which broker and partition to write to
  • In case of broker failures, producers will automatically recover
  • Producers can choose to receive acknowledgement of data writes (see the sketch after this list):
    • acks=0: producer won't wait for acknowledgement (possible data loss)
    • acks=1: producer will wait for the leader acknowledgement (limited data loss)
    • acks=all: leader + replicas acknowledgement (no data loss)
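
To make the acks options concrete, here is a minimal sketch using the official Java client (kafka-clients); the broker address and topic name are assumptions for illustration, not from the course:

    // ProducerAcksDemo.java: minimal sketch; broker address and topic name are assumptions
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerAcksDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092"); // any one broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.ACKS_CONFIG, "all"); // "0", "1", or "all" as described above

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("first_topic", "hello kafka"));
                producer.flush(); // block until the broker has acknowledged per the acks setting
            }
        }
    }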

Producers: Message Keys

  • Producers can choose to send a key with the message (string, number, etc.)
  • If key=null, data is sent round-robin (broker 101, then 102, then 103, ...)
  • If a key is sent, all messages for that key will always go to the same partition
  • A key is typically sent when you need message ordering for a specific field (e.g. truck_id); see the sketch below
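
Continuing the producer sketch above, sending with and without a key might look like this (truck_123 is a made-up key for the truck_id example):

    // same key => same partition, so per-truck ordering is preserved
    producer.send(new ProducerRecord<>("first_topic", "truck_123", "position update 1"));
    producer.send(new ProducerRecord<>("first_topic", "truck_123", "position update 2"));
    // null key => the partition is picked for you (round-robin style)
    producer.send(new ProducerRecord<>("first_topic", null, "no ordering needed"));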

Consumers & Consumer Groups

Consumers

  • Consumers read data from a topic(identified by name)
  • Consumers know which broker to read from
  • In case of broker failures, consumers know how to recover
  • Data is read in order within each partition

Consumer Groups

  • A consumer is an application, for example a Django application.
  • Consumers read data in consumer groups
  • Each consumer within a group reads from exclusive partitions
  • If you have more consumers than partitions, some consumers will be inactive

What if too many consumers

  • If there are more consumers than partitions, the extra consumers will be inactive (see the sketch below)
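
A minimal consumer-group sketch with the Java client; the group id my-app is an assumption. Start two copies of this program with the same group.id and they will split the three partitions of first_topic between them; start four and one will sit idle:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ConsumerGroupDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-app"); // consumers sharing this id form one group
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // start from the beginning if no committed offset

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("first_topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }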

Consumer Offsets & Delivery Semantics

Consumer Offsets

  • Consumer offsets act like a bookmark, marking how far a consumer has read.
  • Thanks to this, even if a consumer goes down, it knows where to resume reading when it comes back up.
  • Kafka stores the offsets at which a consumer group has been reading
  • The offsets committed live in a Kafka topic named __consumer_offsets
  • When a consumer in a group has processed data received from Kafka, it should be committing the offsets
  • If a consumer dies, it will be able to read back from where it left off thanks to the committed consumer offsets!

Delivery Semantics

  • Consumers choose when to commit offsets.
  • There are 3 delivery semantics:
    • At most once:
      • offsets are committed as soon as the msg is received
      • If the processing goes wrong, the msg will be lost(it won't be read again)
    • At least once (usually preferred; sketched after this list)
      • offsets are committed after the msg is processed
      • If the processing goes wrong, the msg will be read again.
      • This can result in duplicate processing of msgs. Make sure your processing is idempotent (i.e. processing the same msg again won't impact your systems)
    • Exactly once
      • Can be achieved for Kafka -> Kafka workflows using the Kafka Streams API
      • For Kafka -> External System workflows, use an idempotent consumer
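
A hedged sketch of at-least-once, extending the consumer sketch above (process() is a hypothetical handler): turn off auto-commit and commit only after processing succeeds.

    // add to the consumer config from the sketch above:
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // we commit manually

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            process(record); // hypothetical handler; make it idempotent, duplicates are possible
        }
        consumer.commitSync(); // commit after processing: on a crash we re-read, never lose
    }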

Kafka Broker Discovery

  • Every Kafka broker is also called a "bootstrap server"
  • That means that you only need to connect to one broker,
    and you will be connected to the entire cluster.
  • Each broker knows about all brokers, topics and partitions (metadata)
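
You can watch this bootstrap behavior with the Admin client: give it a single broker address (an assumption here) and it discovers the whole cluster. A minimal sketch:

    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;

    public class ClusterDiscoveryDemo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092"); // one bootstrap broker
            try (AdminClient admin = AdminClient.create(props)) {
                // metadata for every broker comes back, not just the bootstrap one
                admin.describeCluster().nodes().get().forEach(node ->
                        System.out.println("broker id=" + node.id() + " host=" + node.host() + ":" + node.port()));
            }
        }
    }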

Zookeeper

  • Zookeeper manages brokers (keeps a list of them)
  • Zookeeper helps in performing leader election for partitions
  • Zookeeper sends notifications to Kafka in case of change (e.g. new topic, broker dies, broker comes up, delete topics, etc..)
  • Kafka can't work without Zookeeper
  • Zookeeper by design operates with an odd number of servers (3,5,7)
  • Zookeeper has a leader (handles writes); the rest of the servers are followers (handle reads)
  • (Zookeeper does NOT store consumer offsets with Kafka > v0.10)

Kafka Guarantees

  • msgs are appended to a topic-partition in the order they are sent
  • Consumers read msg in the order stored in a topic-partition
  • With a replication factor of N, producers and consumers can tolerate up to N-1 brokers being down
  • This is why a replication factor of 3 is a good idea:
    • Allows for one broker to be taken down for maintenance
    • Allows for another broker to be taken down unexpectedly
  • As long as the number of partitions remains constant for a topic (no new partitions), the same key will always go to the same partition (see the sketch below)
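
The key-to-partition stickiness comes from hashing: the Java client's default partitioner takes a murmur2 hash of the serialized key modulo the partition count. A sketch of the idea (Utils.murmur2 and Utils.toPositive are real helpers shipped in kafka-clients; the key is the hypothetical truck example from earlier):

    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.common.utils.Utils;

    public class KeyPartitionDemo {
        public static void main(String[] args) {
            byte[] key = "truck_123".getBytes(StandardCharsets.UTF_8);
            int numPartitions = 3;
            // murmur2 hash of the serialized key, modulo the number of partitions
            int partition = Utils.toPositive(Utils.murmur2(key)) % numPartitions;
            System.out.println("truck_123 always maps to partition " + partition);
            // if numPartitions changes, this result changes, which is exactly why
            // adding partitions breaks the same-key-same-partition guarantee
        }
    }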

Theory Roundup

Quiz

  • To produce data to a topic, a producer must provide the Kafka client with any broker from the cluster and the topic name

    • Very important: you only need to connect to one broker (any broker) and just provide the topic name you want to write to. Kafka Clients will route your data to the appropriate brokers and partitions for you!
  • To read data from a topic, a consumer likewise only needs to provide the Kafka client with any broker from the cluster and the topic name

    • Very important: you only need to connect to one broker (any broker) and just provide the topic name you want to read from. Kafka will route your calls to the appropriate brokers and partitions for you!

Setup Kafka

  • To run Kafka, Java has to be installed first. Let's install Java and then Kafka.
  • Note: the instructor insists on installing Java 8, not 9, 10, or 11. I don't know the reason, but let's just go with it.

install java on MacOS

  • See here
  • brew update
  • brew tap adoptopenjdk/openjdk
  • brew search jdk
  • brew install --cask adoptopenjdk8 (installs Java 8)

install kafka on MacOS

  • Download Kafka from the official site.
  • Under Binary Downloads, any Scala version works (I took 2.13).
  • Move the downloaded file to the directory you want and extract it with tar -xvf kafka_2.13-3.0.0.tgz.
  • From inside the extracted folder, run bin/kafka-topics.sh; if a usage description prints, everything is set up correctly.

start Zookeeper

  • Before starting the servers, let's change the data directories for the Zookeeper and Kafka servers.
  • mkdir data/zookeeper (from the Kafka root directory)
  • mkdir data/kafka (from the Kafka root directory)
  • In config/zookeeper.properties, set dataDir to the absolute path of the data/zookeeper directory created above.
  • In config/server.properties, set log.dirs to the absolute path of the data/kafka directory created above.
  • Start the Zookeeper server with bin/zookeeper-server-start.sh config/zookeeper.properties. If the line below appears in the terminal, it's working.

    INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)

  • In a new terminal, start the Kafka server with bin/kafka-server-start.sh config/server.properties.
    If your data/kafka path appears in the startup log like below, it's working.

    /Users/hwanil.kim/kafka_2.13-3.0.0/data/kafka

CLI

  • Since everything in Kafka starts from topics, the CLI lessons also start with topics.
  • bin/kafka-topics.sh prints the help text for the command.
  • Let's create a topic.

create topic

try 1.

  • bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic first_topic --create
  • Error (Missing required argument "[partitions]")
  • You must specify the number of partitions when creating a topic.

try 2.

  • bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic first_topic --create --partitions 3
  • Error (Missing required argument "[replication-factor]")
  • You must specify the replication factor when creating a topic.

try 3.

  • bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic first_topic --create --partitions 3 --replication-factor 2
  • Error (Error while executing topic command : Replication factor: 2 larger than available brokers: 1.)
  • Only 1 broker is running at the moment; the replication factor cannot exceed the number of brokers.

try 4.

  • bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic first_topic --create --partitions 3 --replication-factor 1
  • Success

list topic

  • bin/kafka-topics.sh --bootstrap-server localhost:9092 --list

describe topic

  • bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic first_topic --describe
  • Leader in the output refers to a broker ID.

delete topic

  • bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic first_topic --delete

Kafka Console Producer CLI

  • bin/kafka-console-producer.sh: shows the help for the console producer
  • bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic first_topic
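  • Once it's running you get a > prompt; each line you type is sent to first_topic as one message (quit with Ctrl+C).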

Kafka Console Consumer CLI

Kafka Consumers in Group

Kafka Consumer Groups CLI

Resetting Offsets

Kafka Java Programming 101

Kafka Twitter Producer & Advanced Configurations

Kafka ElasticSearch Consumer & Advanced Configurations

Kafka Extended APIs for Developers

Real World Insights and Case Studies
