6. Pattern Mining

Eunji·2026년 5월 21일

Data Mining

목록 보기

10/12

1. Pattern Mining

1.1 Frequent Patterns

Definition

frequent patterns
they reveal hidden regularities and relationship in data

Why studying frequent pattern mining?

association, correlation, casuality
sequential patterns
partial periodicity, cyclic/temporal associations

foundation for several essential data mining tasks

Type

frequent items
- items that occur together
- e.g., {milk, bread}
sequential patterns
- items that occur in a specific order
- e.g., smartphone $\rightarrow$ TV $\rightarrow$ IoT device
structed patterns
- patterns in complex data structures
- e.g., subgraphs, subtrees, fractal

2. Association Rule Mining Fundamentals

2.1 Example: Market Basket Analysis

used to understand customer purchasing behavior by analyzing transaction data
identify which items are frequently purchased together during a single shopping

Business Application

매장 배치: 자주 함께 구매되는 상품을 가까이 배치하여 매출 증대 유도 또는 멀리 배치하여 고객이 더 많은 상품을 훑어보게 함
프로모션: 관련 상품을 묶어서 할인 판매하는 교차 판매 전략 수립

2.2 Key Metrics for Association Rules

1. Support $s$

how frequently both items appear together in all transactions
$support(A \Rightarrow B) = P(A \cup B)$

합집합 기호이지만, 의미상으로는 co-occurrence를 뜻한다.

2. Confidence $c$

indicates how likely a customer who buys a computer will also buy antivirus software
$confidence(A \Rightarrow B) = P(B|A) = \displaystyle\frac{support(A \cup B)}{support(A)} = \frac{support\_count(A \cup B)}{support\_count(A)}$

2.3 Frequent Itemset

Notations

$\mathcal{I} = \{I_1, I_2, ..., I_m\}$ : itemset
$D$ : task-relevant data
$T$ : nonempty itemset such that $T\subseteq \mathcal{I}$
- each transactions is associated with an identifier, called TID

Association Rule

$A \Rightarrow B$
an implication where $A\subseteq \mathcal{I}, B\subseteq \mathcal{I}, A\subseteq \empty, B\subseteq \empty$ and $A \cap B = \empty$

두 집합이 완전히 다른 아이템들로만 구성돼야 "A를 샀을 때 B를 살 확률이 높다"는 의미 있는 인사이트가 나온다.

Two Key Measures

support
- proportion of transactions in $D$ that contain $A \cap B$
- $support(A \Rightarrow B) = P(A \cup B)$
- = relative support(ratio)
- occurrence frequency = absolute support(count)

if the relative support $\ge$ minimum support threshold, then $\mathcal{I}$ is a frequent itemset

the set of frequent 𝑘-itemsets is commonly denoted by $𝐿_𝑘$

confidence
- proportion of transactions containing $A$ that also contain $B$
- $confidence(A \Rightarrow B) = P(B|A)$
- = conditional probability of $B$ given $A$

K-itemset

an interest that contain exactly k items
e.g., {computer, antivirus} is a 2-itemset

3. Association Rule Mining

3.1 Two-Step Process

1. Find all frequent itemsets

each of these itemsets will occur at least $min\_sup$

2. Generate strong association rules

these rules must satisfy $min\_sup$ and $min\_conf$

why union is association rules?

$A \cap B = {}$ 공집합이므로 A와 B가 같이 등장하는 트랜잭션을 찾으려면 intersection $(\cap)$ 이 아니라 union $(\cup)$ 을 써야 한다.

3.2 Challenges

Combinatorial Explosion

if an itemset is frequent, all of the subsets are also frequent

a long itemset will contain a combinatorial number of shorter frequent subitemsets
- e.g., a frequent itemset of length 100
- $2^{100} - 1 \approx 1.27 \times 10^{30}$
to overcome this difficulty,
closed frequent itemset and maximal frequent itemset

폭발적인 조합을 줄이기 위해 줄일 수 있는 건 쳐내는 prunning을 해야 한다.

4. Apriori Algorithm

4.1 Strategy

Key Idea

the name "Aprior" comes from the fact the algorithm uses prior knowledge of frequent properties

Iterative Level-wise Search

where k-itemsets are used to generate (k+1)-itemsets
first, the database is scanned to count the occurrences of each item,
- and those satisfying the $min\_sup$ are collected to form the set of frequent $L_1$
- then, $L_1$ is used to generate $L_2$ , which is then used to generate $L_3$ , and so on, until no more frequent items can be found
- each iteration requires a full scan of the database

4.2 Apriori Property: Antimonotonicity

Definition

all nonempty subsets of a frequent itemset must also be frequent
this property based on the following observations

Observation

if an itemset $\mathcal{I}$ does not satisfy $min\_sup$ , then $\mathcal{I}$ is not frequent
- $P($ \mathcal{I} $) < min\_sup$
if an item $A$ is added to $\mathcal{I}$ , then the resulting itemset cannot occur more frequently than $\mathcal{I}$

아이템이 늘어날수록 조건이 더 까다로워지니까 등장 횟수는 줄어들거나 같을수밖에 없다. $\mathcal{I}$ 가 이미 not frequent하다면, 거기에 아이템을 추가해도 not frequent하다.

Antimonotonicity

if a set cannot pass a test, all of its supersets will fail the same test as well
e.g., {milk, bread} $\rightarrow$ support = 0.1 ( $min\_sup$ = 0.3) $\rightarrow$ not frequent

4.3 Core Steps of Apriori

새로운 후보군을 만들고(join), 불필요한 것을 쳐내는(prune) 과정이다.

1. Join Step $\bowtie$

generate candidate k-itemsets $C_k$ by self-joining the frequent (k-1)-itemsets $L_{k-1}$
- their first (k-2) items are identical

두 개의 셋을 합칠 때, 앞부분은 똑같고 마지막 한 아이템만 서로 달라야 한다.

key point

itemsets are sorted in lexicographic order

only valid combinations are generated

duplicate candidates are avoided

2. Prune Step

candidate set $C_k$ may be large
- many candidates are not frequent
for each candidate k-itemset (using antimonotonicity)
- check all (k-1)-subsets
- if any (k-1) subset is not frequent $\rightarrow$ remove the candidate

if a subset is not frequent $\rightarrow$ the superset cannot be frequent

Eunji

이전 포스트

5. Linear Model (4)

다음 포스트

6. Pattern Mining

Data Mining

1. Pattern Mining

1.1 Frequent Patterns

Definition

Why studying frequent pattern mining?

Type

2. Association Rule Mining Fundamentals

2.1 Example: Market Basket Analysis

Business Application

2.2 Key Metrics for Association Rules

1. Support $s$

2. Confidence $c$

2.3 Frequent Itemset

Notations

Association Rule

Two Key Measures

K-itemset

3. Association Rule Mining

3.1 Two-Step Process

1. Find all frequent itemsets

2. Generate strong association rules

3.2 Challenges

Combinatorial Explosion

4. Apriori Algorithm

4.1 Strategy

Key Idea

Iterative Level-wise Search

4.2 Apriori Property: Antimonotonicity

Definition

Observation

Antimonotonicity

4.3 Core Steps of Apriori

1. Join Step $\bowtie$

2. Prune Step

5. Linear Model (4)

7. Clustering (1)

0개의 댓글

6. Pattern Mining

Data Mining

1. Pattern Mining

1.1 Frequent Patterns

Definition

Why studying frequent pattern mining?

Type

2. Association Rule Mining Fundamentals

2.1 Example: Market Basket Analysis

Business Application

2.2 Key Metrics for Association Rules

1. Support sss

2. Confidence ccc

2.3 Frequent Itemset

Notations

Association Rule

Two Key Measures

K-itemset

3. Association Rule Mining

3.1 Two-Step Process

1. Find all frequent itemsets

2. Generate strong association rules

3.2 Challenges

Combinatorial Explosion

4. Apriori Algorithm

4.1 Strategy

Key Idea

Iterative Level-wise Search

4.2 Apriori Property: Antimonotonicity

Definition

Observation

Antimonotonicity

4.3 Core Steps of Apriori

1. Join Step ⋈\bowtie⋈

2. Prune Step

5. Linear Model (4)

7. Clustering (1)

0개의 댓글

1. Support $s$

2. Confidence $c$

1. Join Step $\bowtie$