Lecture Week 3. Logistic Regression for Classification
Logistic Regression
: a classification algorithm, used when the target variable is categorical. Applied when the data has binary output: each example belongs to one of two classes, $y \in \{0, 1\}$.
Mathematical Representation
$0 \le h_\theta(x) \le 1$
$h_\theta(x) = g(\theta^T x) = \dfrac{1}{1 + e^{-\theta^T x}}$
$g(z) = \dfrac{1}{1 + e^{-z}}$ : logistic function (sigmoid)
$P(y \mid x; \theta) = (h_\theta(x))^y (1 - h_\theta(x))^{1-y}$ since $y = 0$ or $1$
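The hypothesis above can be sketched directly; a minimal version (the function names `sigmoid` and `hypothesis` are illustrative, not from the lecture):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); theta and x are lists of floats."""
    z = sum(t_j * x_j for t_j, x_j in zip(theta, x))
    return sigmoid(z)
```

Because $g$ squashes any real input into $(0, 1)$, $h_\theta(x)$ can be read as $P(y = 1 \mid x; \theta)$.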
Max Likelihood
The training examples are independent, so the likelihood of all the data is the product of the per-example likelihoods:
$L(\theta) = P(\vec{y} \mid X; \theta) = \prod_{i=1}^{m} P(y^{(i)} \mid x^{(i)}; \theta)$ : likelihood
$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h(x^{(i)}) + (1 - y^{(i)}) \log(1 - h(x^{(i)})) \right]$
Maximize the log likelihood until $\ell'(\theta) = 0$.
$\theta_j := \theta_j + \alpha \dfrac{\partial \ell(\theta)}{\partial \theta_j} = \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$ : gradient ascent (update for a single example $i$)
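A minimal sketch of the stochastic gradient-ascent update above, assuming plain Python lists for the data (the name `sga_epoch` and the toy step size are illustrative):

```python
import math

def sga_epoch(theta, X, y, alpha=0.1):
    """One pass of stochastic gradient ascent on the log likelihood:
    theta_j := theta_j + alpha * (y_i - h_theta(x_i)) * x_ij
    applied once per training example (X: list of feature lists, y: 0/1 labels)."""
    for x_i, y_i in zip(X, y):
        # h_theta(x_i) = sigmoid(theta^T x_i)
        z = sum(t_j * x_j for t_j, x_j in zip(theta, x_i))
        h = 1.0 / (1.0 + math.exp(-z))
        theta = [t_j + alpha * (y_i - h) * x_j for t_j, x_j in zip(theta, x_i)]
    return theta
```

Each update nudges $\theta$ in the direction that raises the likelihood of the current example, so the log likelihood of the data increases over epochs on separable toy data.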
Min Cost Function
$\mathrm{cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x)) = \begin{cases} -\log(h_\theta(x)), & \text{if } y = 1 \\ -\log(1 - h_\theta(x)), & \text{if } y = 0 \end{cases}$
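The per-example cost can be written as a one-liner; a minimal sketch (the function name `cost` is illustrative):

```python
import math

def cost(h, y):
    """Per-example logistic cost: -y*log(h) - (1-y)*log(1-h).
    h is the prediction h_theta(x) in (0, 1); y is the 0/1 label."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)
```

The cost goes to 0 as the prediction approaches the true label and blows up as it approaches the wrong one, which is exactly the negative of the per-example log likelihood above.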
Newton's Method
: a root-finding method; finds $\theta$ such that $f(\theta) = 0$
$\theta := \theta - \dfrac{f(\theta)}{f'(\theta)}$
To maximize $\ell(\theta)$: takes bigger steps and converges in fewer iterations than gradient ascent (though each iteration is more expensive).
Let $f(\theta) = \ell'(\theta)$; then $\theta^{(t+1)} = \theta^{(t)} - \dfrac{\ell'(\theta^{(t)})}{\ell''(\theta^{(t)})}$
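The update above can be sketched for a single parameter; a minimal 1-D version, tried on a hypothetical concave objective $\ell(\theta) = -(\theta - 3)^2$ (the function name `newton_max` and the example objective are assumptions, not from the lecture):

```python
def newton_max(dl, ddl, theta0, iters=10):
    """Maximize l(theta) in 1-D with Newton's method: apply the root-finding
    update theta := theta - f(theta)/f'(theta) to f = l' (so f' = l'')."""
    theta = theta0
    for _ in range(iters):
        theta = theta - dl(theta) / ddl(theta)
    return theta

# l(theta) = -(theta - 3)^2, so l'(theta) = -2(theta - 3), l''(theta) = -2.
# For a quadratic, Newton's method lands on the maximizer in one step.
theta_star = newton_max(lambda t: -2.0 * (t - 3.0), lambda t: -2.0, 0.0)  # → 3.0
```

On a quadratic the step is exact; in general each iteration fits a local quadratic to $\ell$, which is why convergence is so much faster than a fixed-step gradient method near the optimum.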