Kafka Streams - Windowing

함궈 · September 8, 2023

Apache Kafka


Windowing lets you control how to group records that have the same key for stateful operations such as aggregations or joins into so-called windows. Windows are tracked per record key.

For example, in join operations, a windowing state store is used to store all the records received so far within the defined window boundary.

In aggregating operations, a windowing state store is used to store the latest aggregation results per window.

Old records in the state store are purged after the specified window retention period.

Kafka Streams guarantees to keep a window for at least this specified time; the default value is one day and can be changed via Materialized#withRetention().

The DSL supports the following types of windows:

| Window name | Behavior | Short description |
|---|---|---|
| Tumbling time window | Time-based | Fixed-size, non-overlapping, gap-less windows |
| Hopping time window | Time-based | Fixed-size, overlapping windows |
| Sliding time window | Time-based | Fixed-size, overlapping windows that work on differences between record timestamps |
| Session window | Session-based | Dynamically-sized, non-overlapping, data-driven windows |

Tumbling time windows


Tumbling time windows are windows based on time intervals.

They model fixed-size, non-overlapping, gap-less windows.

A tumbling window is defined by a single property: the window’s size.

A tumbling window is a hopping window whose window size is equal to its advance interval.

Since tumbling windows never overlap, a data record will belong to one and only one window.
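The "one and only one window" property can be sketched in plain Java. This is a minimal illustration, not Kafka Streams code: the class `TumblingWindowSketch` and its methods are hypothetical, and the sketch assumes windows are aligned to the epoch with an inclusive start and an exclusive end.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not part of the Kafka Streams API.
final class TumblingWindowSketch {

    /** Start (inclusive, in ms) of the epoch-aligned tumbling window containing the timestamp. */
    static long windowStart(long timestampMs, Duration size) {
        long sizeMs = size.toMillis();
        // Assumes a non-negative timestamp, so the remainder is the offset into the window.
        return timestampMs - (timestampMs % sizeMs);
    }

    /** End (exclusive, in ms) of that same window. */
    static long windowEnd(long timestampMs, Duration size) {
        return windowStart(timestampMs, size) + size.toMillis();
    }
}
```

Because the start is a pure function of the timestamp and the size, every record maps to exactly one window.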

Hopping time windows


Hopping time windows are windows based on time intervals.

They model fixed-size, (possibly) overlapping windows.

A hopping window is defined by two properties: the window’s size and its advance interval (aka “hop”).

The advance interval specifies by how much a window moves forward relative to the previous one.

For example, you can configure a hopping window with a size of 5 minutes and an advance interval of 1 minute.

Since hopping windows can overlap, a data record may belong to more than one such window.
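To make the overlap concrete, the following plain-Java sketch lists every hopping window that contains a given timestamp. It is only an illustration: `HoppingWindowSketch` and `windowStartsFor` are hypothetical names, and the sketch assumes epoch-aligned windows of the form [start, start + size) with no window starting before time 0.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Kafka Streams API.
final class HoppingWindowSketch {

    /** Start times (ms) of all windows [start, start + size) that contain the timestamp. */
    static List<Long> windowStartsFor(long timestampMs, Duration size, Duration advance) {
        long sizeMs = size.toMillis();
        long advanceMs = advance.toMillis();
        // Earliest window start that can still cover the timestamp, clamped at the epoch
        // and rounded down to a multiple of the advance interval.
        long start = (Math.max(0L, timestampMs - sizeMs + advanceMs) / advanceMs) * advanceMs;
        List<Long> starts = new ArrayList<>();
        for (; start <= timestampMs; start += advanceMs) {
            starts.add(start);
        }
        return starts;
    }
}
```

With a size of 5 minutes and an advance of 1 minute, a record at the 7-minute mark falls into the five windows starting at minutes 3, 4, 5, 6, and 7. Setting the advance equal to the size reduces this to the single tumbling window.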

Sliding time windows


Sliding windows are actually quite different from hopping and tumbling windows.

In Kafka Streams, sliding windows are used for join operations, specified by using the JoinWindows class, and for windowed aggregations, specified by using the SlidingWindows class.

A sliding window models a fixed-size window that slides continuously over the time axis.

In this model, two data records are said to be included in the same window if (in the case of symmetric windows) the difference of their timestamps is within the window size.

As a sliding window moves along the time axis, records may fall into multiple snapshots of the sliding window, but each unique combination of records appears only in one sliding window snapshot.

Sliding windows are aligned to the data record timestamps, not to the epoch.
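The timestamp-difference rule for symmetric windows can be written as a one-line check. The sketch below is plain Java, not Kafka Streams code; `SlidingWindowSketch` and `inSameWindow` are hypothetical names, and treating a difference exactly equal to the window size as "within" the window is an assumption of this sketch.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not part of the Kafka Streams API.
final class SlidingWindowSketch {

    /** True if the two record timestamps differ by at most the window size. */
    static boolean inSameWindow(long ts1Ms, long ts2Ms, Duration windowSize) {
        // Only the distance between the two records matters, not their position
        // relative to the epoch.
        return Math.abs(ts1Ms - ts2Ms) <= windowSize.toMillis();
    }
}
```

Note that the check never looks at absolute time, which is what "aligned to the data record timestamps, not to the epoch" means in practice.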

Session windows

Session windows are used to aggregate key-based events into so-called sessions, the process of which is referred to as sessionization.

Sessions represent a period of activity separated by a defined gap of inactivity (or “idleness”).

Any events processed that fall within the inactivity gap of any existing sessions are merged into the existing sessions.

If an event falls outside of the session gap, then a new session will be created.

Session windows are different from the other window types in that:

  • all windows are tracked independently across keys – e.g., windows of different keys typically have different start and end times

  • their window sizes vary – even windows for the same key typically have different sizes

The prime area of application for session windows is user behavior analysis.

Session-based analyses can range from simple metrics (e.g., count of user visits on a news website or social platform) to more complex metrics (e.g., customer conversion funnel and event flows).


Assume a session window with an inactivity gap of 5 minutes. When the first three records arrive (upper part of the diagram), we'd have three sessions (lower part) after having processed those records: two for the green record key, with one session starting and ending at the 0-minute mark (it only looks as if the session goes from 0 to 1 because of how it is drawn), and another starting and ending at the 6-minute mark; and one session for the blue record key, starting and ending at the 2-minute mark.


If we then receive three additional records (including two out-of-order records), the two existing sessions for the green record key are merged into a single session starting at time 0 and ending at time 6, consisting of a total of three records.

The existing session for the blue record key will be extended to end at time 5, consisting of a total of two records. And, finally, there will be a new session for the blue key starting and ending at time 11.
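The merging behavior in this walkthrough can be reproduced with a small plain-Java simulation. It is a sketch under stated assumptions, not the Kafka Streams implementation: `SessionWindowSketch` and its members are hypothetical, the inactivity gap is assumed to be 5 minutes, and a record that is exactly one gap away from a session is assumed to still join it.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation for illustration; not part of the Kafka Streams API.
final class SessionWindowSketch {

    /** One session covering [start, end] in ms, plus the number of records in it. */
    static final class Session {
        final long start;
        final long end;
        final int count;

        Session(long start, long end, int count) {
            this.start = start;
            this.end = end;
            this.count = count;
        }
    }

    private final long gapMs;
    private final Map<String, List<Session>> sessionsByKey = new HashMap<>();

    SessionWindowSketch(Duration inactivityGap) {
        this.gapMs = inactivityGap.toMillis();
    }

    /** Adds a record and merges every session of the same key that lies within the gap. */
    void add(String key, long timestampMs) {
        List<Session> kept = new ArrayList<>();
        long start = timestampMs;
        long end = timestampMs;
        int count = 1;
        for (Session s : sessionsByKey.getOrDefault(key, new ArrayList<>())) {
            boolean withinGap = s.end >= timestampMs - gapMs && s.start <= timestampMs + gapMs;
            if (withinGap) {
                // Absorb the existing session into the one being built for this record.
                start = Math.min(start, s.start);
                end = Math.max(end, s.end);
                count += s.count;
            } else {
                kept.add(s);
            }
        }
        kept.add(new Session(start, end, count));
        kept.sort((a, b) -> Long.compare(a.start, b.start));
        sessionsByKey.put(key, kept);
    }

    /** Sessions for a key, ordered by start time. */
    List<Session> sessions(String key) {
        return new ArrayList<>(sessionsByKey.getOrDefault(key, new ArrayList<>()));
    }
}
```

Feeding in green at minutes 0 and 6 and blue at minute 2 yields three sessions. Adding a late green record between them (minute 3 is an assumed value, since the text does not give its timestamp) merges the two green sessions into one spanning 0 to 6, while blue records at minutes 5 and 11 extend the first blue session and open a second one.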

Source: Confluent documentation
