Lecture 04. Perceptron & Generalized Linear Model

cryptnomy · November 22, 2022

CS229: Machine Learning

Lecture video link: https://youtu.be/iZTeva0WSTQ

Topics for today

  • Perceptron
  • Exponential Family
  • Generalized Linear Models
  • Softmax Regression (Multiclass Classification)

Perceptron learning algorithm

$$\theta_j \leftarrow \theta_j + \alpha\left(y^{(i)} - h_\theta(x^{(i)})\right)x^{(i)}_j$$

where $h_\theta(x)=g(\theta^T x)$ and $g(z)=\begin{cases} 1 & z\geq 0 \\ 0 & z < 0 \end{cases}$.
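This update rule can be sketched in a few lines of NumPy; the toy AND dataset, learning rate, and epoch count below are illustrative choices, not part of the lecture.

```python
import numpy as np

def perceptron_train(X, y, alpha=0.1, epochs=10):
    """Train a perceptron with the update
    theta_j <- theta_j + alpha * (y_i - h_theta(x_i)) * x_ij.
    X: (m, n) data matrix, y: (m,) labels in {0, 1}.
    """
    theta = np.zeros(X.shape[1])
    g = lambda z: (z >= 0).astype(float)   # threshold activation
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            h = g(theta @ x_i)             # prediction in {0, 1}
            theta += alpha * (y_i - h) * x_i
    return theta

# Linearly separable toy data (last column is the intercept term)
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
y = np.array([0., 0., 0., 1.])             # logical AND of the first two features
theta = perceptron_train(X, y)
preds = (X @ theta >= 0).astype(float)
```

Because the data are linearly separable, the updates stop once every example is classified correctly.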

Exponential family

A family of distributions whose pdf (or pmf) can be written as

$$p(y; \eta) = b(y)\exp\left(\eta^T T(y) - a(\eta)\right)$$

where

$y$: data

$\eta$: natural parameter

$T(y)$: sufficient statistic

$b(y)$: base measure

$a(\eta)$: log-partition function

Bernoulli

$\phi$: probability of the event ($y = 1$)

$$\begin{aligned} p(y;\phi) &= \phi^y (1-\phi)^{1-y} \\ &= \exp\left(\log\left(\phi^y (1-\phi)^{1-y}\right)\right) \\ &= 1 \cdot \exp\left(\log\left(\frac{\phi}{1-\phi}\right)y+\log(1-\phi)\right) \end{aligned}$$

where

$$\begin{aligned} b(y) &= 1 \\ T(y) &= y \\ \eta &= \log\left(\frac{\phi}{1-\phi}\right) \Rightarrow \phi = \frac{1}{1+e^{-\eta}} \\ a(\eta) &= -\log(1-\phi) = -\log\left(1-\frac{1}{1+e^{-\eta}}\right)=\log(1+e^\eta). \end{aligned}$$
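These identities are easy to check numerically. A minimal sketch (the value $\phi = 0.3$ is an arbitrary choice):

```python
import numpy as np

phi = 0.3                                  # Bernoulli parameter (arbitrary)
eta = np.log(phi / (1 - phi))              # natural parameter eta = log(phi/(1-phi))
a = np.log(1 + np.exp(eta))                # log-partition a(eta) = log(1 + e^eta)

# phi is recovered from eta by the sigmoid
assert np.isclose(1 / (1 + np.exp(-eta)), phi)

# b(y) exp(eta*T(y) - a(eta)) reproduces phi^y (1-phi)^(1-y) for y in {0, 1}
for y in (0, 1):
    direct = phi**y * (1 - phi)**(1 - y)
    ef = 1.0 * np.exp(eta * y - a)         # b(y) = 1, T(y) = y
    assert np.isclose(direct, ef)
```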

Gaussian (with fixed variance)

Assume $\sigma^2=1$.

$$\begin{aligned} p(y;\mu) &= \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(y-\mu)^2}{2}\right) \\ &= \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y^2}{2}\right)\exp\left(\mu y-\frac{1}{2}\mu^2\right)\end{aligned}$$

where

$$\begin{aligned} b(y) &= \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{y^2}{2}\right) \\ T(y) &= y \\ \eta &= \mu \\ a(\eta) &= \frac{\mu^2}{2} = \frac{\eta^2}{2}. \end{aligned}$$
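As with the Bernoulli case, this factorization can be verified numerically; $\mu = 1.5$ and $y = 0.7$ below are arbitrary choices.

```python
import numpy as np

mu = 1.5                                    # mean (sigma^2 fixed at 1)
eta = mu                                    # natural parameter eta = mu
y = 0.7                                     # arbitrary observation

b = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)  # base measure b(y)
a = eta**2 / 2                              # log-partition a(eta) = eta^2 / 2

direct = np.exp(-(y - mu)**2 / 2) / np.sqrt(2 * np.pi)  # N(y; mu, 1) density
ef = b * np.exp(eta * y - a)                # exponential-family form
assert np.isclose(direct, ef)
```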

Properties

  1. The log-likelihood is concave in $\eta$, so MLE w.r.t. $\eta$ is a concave maximization problem (equivalently, the negative log-likelihood is convex).

  2. $\mathbb{E}[y;\eta] = \frac{\partial}{\partial\eta}a(\eta)$

  3. $\mathbb{V}[y;\eta] = \frac{\partial^2}{\partial\eta^2}a(\eta)$
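Properties 2 and 3 can be sanity-checked with finite differences for the Bernoulli family, where $a(\eta)=\log(1+e^\eta)$, $\mathbb{E}[y]=\phi$, and $\mathbb{V}[y]=\phi(1-\phi)$; the point $\eta = 0.8$ and the step size are arbitrary choices.

```python
import numpy as np

def a(eta):
    """Bernoulli log-partition: a(eta) = log(1 + e^eta)."""
    return np.log(1 + np.exp(eta))

eta, h = 0.8, 1e-4
phi = 1 / (1 + np.exp(-eta))                          # canonical parameter

# Central finite differences approximate the derivatives of a
mean = (a(eta + h) - a(eta - h)) / (2 * h)            # ~ a'(eta)
var = (a(eta + h) - 2 * a(eta) + a(eta - h)) / h**2   # ~ a''(eta)

assert np.isclose(mean, phi, atol=1e-6)               # E[y; eta] = phi
assert np.isclose(var, phi * (1 - phi), atol=1e-4)    # V[y; eta] = phi(1 - phi)
```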

cf. Probability distributions

Real - Gaussian

Binary - Bernoulli

Count - Poisson

$\mathbb{R}^+$ (positive reals) - Gamma, Exponential

Distributions over probabilities - Beta, Dirichlet (used in Bayesian statistics)

GLM (Generalized linear models)

Assumptions / Design choices

  1. $y \mid x;\theta \sim \text{Exponential Family}(\eta)$

  2. $\eta=\theta^T x$, where $\theta\in\mathbb{R}^n,\ x\in\mathbb{R}^n$

  3. At test time, output $\mathbb{E}[y|x;\theta]$

    $\Rightarrow h_\theta(x) = \mathbb{E}[y|x;\theta]$

Train time: maximize the log-likelihood,

$$\max\limits_\theta\ \sum_i \log p\left(y^{(i)}; \theta^T x^{(i)}\right)$$

Test time: output the expectation,

$$h_\theta(x) = \mathbb{E}[y|x;\theta] = \mathbb{E}[y;\eta]$$

GLM Training

Learning update rule

$$\theta_j \leftarrow \theta_j + \alpha\left(y^{(i)}-h_\theta(x^{(i)})\right)x_j^{(i)}$$
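A sketch of this universal GLM update in NumPy: the same training loop works for any family once the hypothesis $h$ is set to that family's canonical response function. The toy dataset and hyperparameters are illustrative choices; plugging in the sigmoid gives logistic regression.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def glm_sgd(X, y, h, alpha=0.1, epochs=200):
    """Stochastic gradient ascent with the GLM update
    theta_j <- theta_j + alpha * (y_i - h(theta^T x_i)) * x_ij.
    h is the canonical response function of the chosen family.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            theta += alpha * (y_i - h(theta @ x_i)) * x_i
    return theta

# Bernoulli output -> h = sigmoid, i.e. logistic regression
X = np.array([[0., 1.], [1., 1.], [2., 1.], [3., 1.]])  # last column: intercept
y = np.array([0., 0., 1., 1.])
theta = glm_sgd(X, y, h=sigmoid)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
```

Swapping `h` for the identity recovers least-squares regression, and `np.exp` would give Poisson regression, without changing the loop.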

Terminology

$\eta$: natural parameter

$\mu=\mathbb{E}[y;\eta]=g(\eta)$: canonical response function

$\eta = g^{-1}(\mu)$: canonical link function

$g(\eta)=\dfrac{\partial}{\partial \eta}a(\eta)$

3 parameterizations

| Model param. | Natural param. | Canonical param. |
| --- | --- | --- |
| $\theta$ | $\eta$ | $\phi$ (Bernoulli), $(\mu, \sigma^2)$ (Gaussian), $\lambda$ (Poisson) |

Learning happens on the model parameter $\theta$; the design choice $\eta = \theta^T x$ links it to the natural parameter, and the canonical response function $g$ (with the link function $g^{-1}$ going back) maps between $\eta$ and the canonical parameter.

Logistic regression

$$h_\theta (x) = \mathbb{E}[y|x;\theta] = \phi = \frac{1}{1+e^{-\eta}} = \frac{1}{1+e^{-\theta^T x}}.$$

Softmax regression (cross entropy)

$K$: number of classes

$x^{(i)}\in\mathbb{R}^n$

$y$: one-hot vector of length $K$

Learn → Predict → Compare

How do we minimize the distance between the two distributions $p(y)$ (target) and $\hat{p}(y)$ (prediction)?

→ Minimize the cross entropy between two distributions

$$\begin{aligned} \text{CrossEnt}(p, \hat{p}) &= -\sum_{y\in\text{classes}}p(y)\log\hat{p}(y) \\ &= -\log\hat{p}(y_{\text{target}}) \\ &= -\log\frac{e^{\theta_{\text{target}}^T x}}{\sum_{c\in \text{classes}}e^{\theta_c^T x}}. \end{aligned}$$

Treat the cross entropy above as the loss and minimize it with gradient descent.
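A minimal sketch of that loss and one gradient-descent step on it, assuming $K = 3$ classes and a toy 2-dimensional input; the gradient of $-\log\hat{p}(y_{\text{target}})$ w.r.t. $\theta_c$ is $(\hat{p}_c - 1\{c = \text{target}\})\,x$, which is what the step below implements.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy_step(Theta, x, y_onehot, alpha=0.5):
    """One gradient-descent step on -log p_hat(y_target) for one example.
    Theta: (K, n) matrix with one parameter row theta_c per class."""
    p_hat = softmax(Theta @ x)
    return Theta - alpha * np.outer(p_hat - y_onehot, x)

# Toy 3-class example (bias folded into x)
K, n = 3, 2
Theta = np.zeros((K, n))
x = np.array([1.0, 1.0])
y = np.array([0.0, 1.0, 0.0])       # one-hot target: class 1

for _ in range(100):
    Theta = cross_entropy_step(Theta, x, y)
loss = -np.log(softmax(Theta @ x) @ y)   # -log p_hat(y_target), driven toward 0
```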
