The VC Dimension

김민재·2024년 6월 5일

ML

목록 보기

11/17

저번에는 growth function을 polynomial하게 만들수있는지와, 아주 얕게 VC Inequality에 대해 보았다.

이번에는 가장 중요한 VC dimension에 대해 보겠다.

Definition of VC dimension

The VC dimension of a hypothesis set $\mathcal{H}$ , denoted by ${\color{Red} d_{VC}}(\mathcal{H})$ , is
the largest value of $N$ for which $m_\mathcal{H}(N) = 2^N$

즉, growth function을 설명해주는 것이다.
다르게 말하면, hypothesis가 분류할수있는 가장 많은 점의 갯수를 뜻한다.

아래의 식을 보면 이해가 빠르게 될것이다.

N \leq {\color{Red} d_{VC}}(\mathcal{H}) \Rightarrow \mathcal{H} \text{ can shatter }N points\\

{\color{Blue} k} > {\color{Red} d_{VC}}(\mathcal{H}) \Rightarrow {\color{Blue} k}\text{ is a break point for }\mathcal{H}

이러면 당연하게도, $d_{VC} = k-1$ 이다.

그렇다면 저번에 우리가 growth function을 break point ${\color{Blue} k}$ 에 대해

m_\mathcal{H}(N)\leq\sum_{i=1}^{{\color{Blue} k-1}}\binom{N}{i}

표현을 하였는데, 이 식을 VC dimension에 대해 바꾸고 싶어졌다.

m_\mathcal{H}(N)\leq\sum_{i=1}^{{\color{Red} d_{VC}}}\binom{N}{i}

왜 이렇게 되냐면, 위에서 보았듯이 $k-1$ 이 $d_{VC}$ 이기 때문이다.

그러면 이제는 익숙한 그 세가지 예제를 살펴보자.

$\mathcal{H}$ is positive rays:
${\color{Red} d_{VC}} = 1$
$\mathcal{H}$ is 2D perceptrons:
${\color{Red} d_{VC}} = 3$
$\mathcal{H}$ is convex sets:
${\color{Red} d_{VC}} = \infin$

VC dimension and learning

그렇다면 VC dimension은 우리에게 어떤것을 말해주는것일까?
VC dimension을 통해 bound된다는것은

{\color{Red} d_{VC}}(\mathcal{H})\text{ is finite}\Rightarrow g\in\mathcal{H}\text{ will generalize}

다음과 같은 의미를 지닌다.

근데 VC dimnesion에 대해서 중요한 몇가지를 생각해야된다.

VC dimenstion은

Independent of the ${\color{Blue} \text{learning algorithm}}$
Independent of the ${\color{Blue} \text{input distribution}}$
Independent of the ${\color{Blue} \text{target function}}$

하고, 그림에서 진하게 표시된 부분에만 dependent하다.

VC dimension of perceptrons

이제 퍼셉트론으로 예시를 들면서 한가지 생각을 해볼것이다.

이러한 2D perceptron이 있을때, $d_{VC}$ 는 k-1이므로 3이다.
여기까지는 설명했기때문에 명확하다.

근데, 차원 $d=2$ 인데, ${\color{Red} d_{VC}}=3$ 이다.
그렇다면, 과연 ${\color{Red}d_{VC}} = d+1$ 일까?라는 생각을 할수있다.

이를 증명하기 위해서,

{\color{Red}d_{VC}}\leq d+1\\ {\color{Red}d_{VC}} \geq d+1

이 두식을 모두 만족한다면, 우리는 ${\color{Red}d_{VC}} = d+1$ 이라고 말할수있다!

이제 증명을 해보자.

먼저, d차원 벡터 x가 존재한다 가정하면, 그림과 같이 $d+1$ 개의 N이 된다.

그리고 이 $X$ 에 어떤 weight를 곱해서, 우리가 $y$ 를 다음처럼 $+1, -1$ 로 구분한다고 하자.

그러면 이 $\mathbf{w}$ 를 찾을수있고, $\mathbf{w} = X^{-1}y$ 로 나타낼수있다.

이를 통해 $d+1$ point를 나눌수있으므로, 실제로 내 ${\color{Red}d_{VC}} \geq d+1$ 이 될것이다.

그다음, ${\color{Red}d_{VC}} \leq d+1$ 를 증명해보자

${\color{Red}d_{VC}}$ 가 $d+2$ 의 점을 분류하지 못한다는 것을 증명하면, 위의 식이 당연해진다.

For any $d+2$ points,

\mathbf{x}_1,\cdots,\mathbf{x}_{d+1},\mathbf{x}_{d+2}

차원보다 더 많은 점들이 주어지면,

\mathbf{x}_j = \sum_{i\neq j}{\color{Red}a_i}\mathbf{x}_i

이고, $a_i$ are not all zeors일때, 당연하게도 모든 점들은 linearly independent하지 않는다.
따라서 $d+2$ 개의 점에 대해서는 표현할수없다!

Proof

\mathbf{x}_j = \sum_{i\neq j}a_i\mathbf{x}_i \Rightarrow {\color{Purple} \mathbf{w}^T}\mathbf{x}_j = \sum_{i\neq j}a_i{\color{Purple} \mathbf{w^T}}\mathbf{x}_i

If $y_i = \text{sign}({\color{Purple} \mathbf{w^T}}\mathbf{x}_i) = \text{sign}(a_i)$ , then $a_i{\color{Purple} \mathbf{w}^T}\mathbf{x}_i > 0$

This forces ${\color{Purple} \mathbf{w}^T}\mathbf{x}_j = \sum_{i\neq j}a_i{\color{Purple} \mathbf{w}^T}\mathbf{x}_i > 0$

Therefore, $y_j = \text{sign}({\color{Purple} \mathbf{w}^T}\mathbf{x}_j) = +1$

즉, 두 조건을 모두 만족하기때문에, ${\color{Red}d_{VC}} = d+1$ 이다.
perceptron에서 VC dimension이 $d+1$ 이라는게 무엇을 의미하는것일까?
물론 VC dimension의 내가 가를수있는 점의 갯수라는 점도 있고, 더하기
$\to$ Number of parameters!