Basic properties
1. Sum rule (Marginal probability)
$P(X=x_i) = \sum_{j=1}^{m} P(X=x_i, Y=y_j)$
$p(x) = \int p(x, y)\,dy$
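The sum rule can be checked numerically. A minimal sketch with a hypothetical 2×3 joint table (the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical joint distribution P(X, Y): rows index X, columns index Y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

# Sum rule: marginalise out one variable by summing over its axis.
p_x = joint.sum(axis=1)   # P(X = x_i), sums over Y
p_y = joint.sum(axis=0)   # P(Y = y_j), sums over X
print(p_x, p_y)
```

Both marginals must themselves sum to 1, since the joint does.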
2. Product rule
$P(X=x_i, Y=y_j) = P(X=x_i \mid Y=y_j)\,P(Y=y_j)$
3. Product rule (if they are independent)
$P(X=x_i, Y=y_j) = P(X=x_i)\,P(Y=y_j)$
4. Bayes theorem
$P(X=x \mid Y=y) = \dfrac{P(Y=y \mid X=x)\,P(X=x)}{P(Y=y)}$
$P(X=x \mid Y=y) = \dfrac{P(Y=y \mid X=x)\,P(X=x)}{\sum_{i=1}^{m} P(Y=y \mid X=x_i)\,P(X=x_i)}$
$\text{Posterior} = \dfrac{\text{Likelihood} \times \text{Prior}}{\text{Normalisation factor}}$
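A Bayes update is a one-liner once the prior and likelihood are arrays. A minimal sketch with hypothetical numbers (two hypotheses, one observation):

```python
import numpy as np

# Hypothetical prior over two hypotheses and likelihood of one observation y.
prior = np.array([0.5, 0.5])          # P(X = x_i)
likelihood = np.array([0.9, 0.2])     # P(Y = y | X = x_i)

evidence = (likelihood * prior).sum()         # normalisation factor P(Y = y)
posterior = likelihood * prior / evidence     # Bayes' theorem
print(posterior)
```

The denominator is exactly the sum-rule expansion above, which is why the posterior always sums to 1.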
5. Expectation
$E[f(x)] = \sum_x f(x)\,P(x)$
$E[f(x)] = \int f(x)\,p(x)\,dx$
$E[f(x)] \approx \dfrac{1}{N} \sum_{n=1}^{N} f(x_n)$
- This last approximation becomes exact as $N \to \infty$.
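The sample-mean approximation can be demonstrated directly. A sketch, assuming $x \sim \mathcal{N}(0,1)$ and $f(x) = x^2$, for which the true expectation is exactly 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# E[x^2] for x ~ N(0, 1) is exactly 1; the sample mean converges to it.
samples = rng.standard_normal(1_000_000)
estimate = np.mean(samples ** 2)
print(estimate)  # close to 1.0
```

With a million samples the Monte Carlo error is on the order of $1/\sqrt{N} \approx 10^{-3}$.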
6. Conditional Expectation
- It is important to check which random variable the expectation is taken over.
$E_x[x \mid y] = \sum_x x\,P(x \mid y)$
$E_x[f(x) \mid y] = \sum_x f(x)\,P(x \mid y)$
7. Variance
$\sigma^2[f(x)] = \mathrm{Var}[f(x)] = E[\{f(x) - E[f(x)]\}^2]$
$E[\{f(x) - E[f(x)]\}^2] = E[f(x)^2] - E[f(x)]^2$
$\sigma^2[X] = E[X^2] - E[X]^2$
8. Covariance
$\sigma[X,Y] = \mathrm{Cov}[X,Y] = E_{X,Y}[(X - E[X])(Y - E[Y])]$
$= E_{X,Y}[XY] - E[X]\,E[Y]$
- If $X, Y$ are independent, $\mathrm{Cov}[X,Y] = 0$
- $\mathrm{Cov}[X,X] = \mathrm{Var}[X]$
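Both covariance identities can be verified with samples. A sketch using a hypothetical pair $Y = 2X + \varepsilon$, so the true covariance is $2\,\mathrm{Var}[X] = 2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
x = rng.standard_normal(n)
y = 2 * x + rng.standard_normal(n)   # correlated with x by construction

# Cov[X, Y] = E[XY] - E[X] E[Y]
cov_xy = np.mean(x * y) - x.mean() * y.mean()
var_x = np.var(x)                    # Cov[X, X] = Var[X]
print(cov_xy, var_x)
```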
9. Calculation of Expectation
- when $a, b$ are constants
$E[aX+b] = a\,E[X] + b$
$E_{X,Y}[X+Y] = E[X] + E[Y]$
- If X,Y are independent.
$E_{X,Y}[XY] = E[X]\,E[Y]$
- A direct consequence of the variance identity:
$E[X^2] = \sigma^2[X] + E[X]^2$
10. Calculation of (Co)variances
$\mathrm{Var}[aX+b] = a^2\,\mathrm{Var}[X]$, i.e. $\sigma^2[aX+b] = a^2\,\sigma^2[X]$
$\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y]$
$\mathrm{Var}[XY] = E[X^2 Y^2] - E[XY]^2$
- If $X, Y$ are independent
$\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$
$\mathrm{Var}[XY] = \mathrm{Var}[X]\,\mathrm{Var}[Y] + E[X]^2\,\mathrm{Var}[Y] + E[Y]^2\,\mathrm{Var}[X]$
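The scaling and addition rules are easy to sanity-check by sampling. A sketch with hypothetical constants $a=3$, $b=5$ and two independent standard normals:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
x = rng.standard_normal(n)
y = rng.standard_normal(n)   # independent of x

a, b = 3.0, 5.0
var_scaled = np.var(a * x + b)   # should be close to a^2 * Var[X] = 9
var_sum = np.var(x + y)          # should be close to Var[X] + Var[Y] = 2
print(var_scaled, var_sum)
```

Note the constant $b$ drops out entirely: shifting a variable does not change its spread.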
11. Correlation coefficient
$\rho_{X,Y} = \dfrac{\mathrm{Cov}[X,Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \dfrac{\sigma[X,Y]}{\sigma[X]\,\sigma[Y]}$
12. Law of iterated expectation
$E_Y[E_X[X \mid Y]] = E[X]$
13. Independence
$P(X,Y) = P(X)\,P(Y)$
- Independence ⇒ uncorrelated
- Uncorrelated ⇏ independent
- Because the variables could still be non-linearly dependent.
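The classic counterexample is $X \sim \mathcal{N}(0,1)$ with $Y = X^2$: $Y$ is completely determined by $X$, yet $\mathrm{Cov}[X, X^2] = E[X^3] = 0$ by symmetry. A sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = x ** 2                     # fully determined by x, hence dependent

# Sample covariance: close to 0 even though X and Y are not independent.
cov = np.cov(x, y)[0, 1]
print(cov)
```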
Distribution properties
1. Univariate Gaussian
- x is data point (scalar)
- μ is mean (scalar)
- σ2 is variance (scalar)
$\mathcal{N}(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\dfrac{1}{2\sigma^2}(x-\mu)^2\right\}$
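The density above translates directly to code. A minimal sketch (the function name is my own):

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """N(x | mu, sigma^2) evaluated at a scalar x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# At the mean with unit variance the density is 1/sqrt(2*pi) ≈ 0.3989.
print(gaussian_pdf(0.0, 0.0, 1.0))
```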
2. Multivariate Gaussian
- $x \in \mathbb{R}^d$ is a random vector
- $\mu \in \mathbb{R}^d$ is the mean vector, e.g. of the training set $X_{d \times N} = \{x_1, x_2, \ldots, x_N\}$
- $\Sigma_{d \times d}$ is the covariance matrix
$\mathcal{N}(x \mid \mu, \Sigma) = \dfrac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left\{-\dfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right\}$
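The multivariate density follows the same pattern, with the determinant and inverse of $\Sigma$ replacing $\sigma^2$. A sketch (function name my own; `np.linalg.solve` is used instead of an explicit inverse for numerical stability):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """N(x | mu, Sigma) for a d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5 * quad) / norm

# At the mean with identity covariance in 2D the density is 1/(2*pi).
val = mvn_pdf(np.zeros(2), np.zeros(2), np.eye(2))
print(val)
```

With $d = 1$ and $\Sigma = [\sigma^2]$ this reduces exactly to the univariate formula.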
3. Bernoulli
- RV $x \in \{0, 1\}$
- $\mu$ is the probability of $x = 1$
$\mathrm{Bern}(x \mid \mu) = \mu^x (1-\mu)^{1-x}$
- $E[x] = \mu$
- $\sigma^2[x] = \mu(1-\mu)$
5. Multinoulli
- Bernoulli generalised to $C > 2$ categories
- RV $y \in \{1, \ldots, C\}$ for $C$ classes
- One-hot vector $y = [0, \ldots, y_c = 1, \ldots, 0]$ is used for convenience.
- $p_c$ is the probability $p(y = c \mid p)$
- $p = (p_1, \ldots, p_C)$ is the vector of class probabilities, $\sum_{c=1}^{C} p_c = 1$
$\mathrm{Multinoulli}(y \mid p) = \prod_{c=1}^{C} p_c^{y_c}$
5. Binomial
- RV $x \in \{0, 1\}$
- $N$: the number of observations
- $m$: RV, the number of times $x = 1$ out of $N$ trials
- $p$: the probability of $x = 1$
$\mathrm{Bin}(m \mid N, p) = \dbinom{N}{m} p^m (1-p)^{N-m}$
- $E[m] = Np$
- $\sigma^2[m] = Np(1-p)$
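The pmf and its mean can be checked against each other. A sketch with hypothetical parameters $N = 10$, $p = 0.3$, so $E[m] = Np = 3$:

```python
from math import comb

def binomial_pmf(m, N, p):
    """Bin(m | N, p) = C(N, m) p^m (1-p)^(N-m)."""
    return comb(N, m) * p ** m * (1 - p) ** (N - m)

N, p = 10, 0.3
total = sum(binomial_pmf(m, N, p) for m in range(N + 1))      # should be 1
mean = sum(m * binomial_pmf(m, N, p) for m in range(N + 1))   # should be N*p
print(total, mean)
```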
6. Multinomial
- Binomial generalised to $C > 2$ categories
- RV $y \in \{1, \ldots, C\}$ for $C$ classes
- One-hot vector $y = [0, \ldots, y_c = 1, \ldots, 0]$ is used for convenience.
- $p_c$ is the probability $p(y = c \mid p)$
- $p = (p_1, \ldots, p_C)$ is the vector of class probabilities, $\sum_{c=1}^{C} p_c = 1$
- $m_c$: RV, the number of times class $c$ occurs out of $N$ trials
$\mathrm{Multinomial}(m_1, \ldots, m_C \mid p, N) = \dbinom{N}{m_1 \ldots m_C} \prod_{c=1}^{C} p_c^{m_c}$
$\dbinom{N}{m_1 \ldots m_C} = \dfrac{N!}{m_1!\, m_2! \cdots m_C!}$
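The multinomial coefficient and pmf together fit in a few lines. A sketch (function name my own), evaluated on hypothetical counts $[2, 1, 1]$ with $p = (0.5, 0.25, 0.25)$:

```python
from math import factorial, prod

def multinomial_pmf(ms, ps):
    """Multinomial(m_1..m_C | p, N) with N = sum(ms)."""
    N = sum(ms)
    coeff = factorial(N)
    for m in ms:                       # N! / (m_1! m_2! ... m_C!)
        coeff //= factorial(m)
    return coeff * prod(p ** m for p, m in zip(ps, ms))

# coeff = 4!/(2!1!1!) = 12, probability term = 0.5^2 * 0.25 * 0.25
val = multinomial_pmf([2, 1, 1], [0.5, 0.25, 0.25])
print(val)  # 12 * 0.015625 = 0.1875
```

With $C = 2$ this reduces to the binomial pmf above.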