Lecture video link: https://youtu.be/MfIjxPh6Pys
Outline
- Logistic Regression
- Neural Networks
- Deep Learning
    - computational power
    - data available
    - algorithms
Logistic Regression
Goal: Find cats in an image: $\begin{cases} 1 \rightarrow \text{presence of a cat} \\ 0 \rightarrow \text{absence of a cat} \end{cases}$
(Source: https://youtu.be/MfIjxPh6Pys 7 min. 15 sec.)
$\underset{(1,\,1)}{\hat{y}} = \sigma(\theta^T x) = \sigma\big(\underset{(1,\,12288)}{w}\,\underset{(64\times 64\times 3,\,1)}{x} + b\big).$
x … flattened input.
- Initialize $w, b$ (weights and bias).
- Find the optimal $w, b$.
  → $\mathcal{L} = -\left[ y \log \hat{y} + (1-y) \log (1-\hat{y}) \right]$
  $\begin{cases} w \leftarrow w - \alpha \frac{\partial \mathcal{L}}{\partial w} \\ b \leftarrow b - \alpha \frac{\partial \mathcal{L}}{\partial b} \end{cases}$
- Use $\hat{y} = \sigma(wx + b)$ to predict.
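A minimal NumPy sketch of this training loop (the learning rate, iteration count, and variable names are assumptions for illustration; the input size $64 \times 64 \times 3 = 12288$ matches the flattened image above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed toy setup: m flattened 64x64x3 images and binary cat / no-cat labels.
m, n = 100, 64 * 64 * 3
X = np.random.rand(n, m)                 # inputs, shape (12288, m)
y = np.random.randint(0, 2, (1, m))      # labels: 1 = cat, 0 = no cat

w = np.zeros((1, n))                     # initialize weights, shape (1, 12288)
b = 0.0                                  # initialize bias
alpha = 0.01                             # learning rate (assumed)

for _ in range(1000):
    y_hat = sigmoid(w @ X + b)           # forward pass, shape (1, m)
    dz = y_hat - y                       # dL/dz for the logistic loss
    dw = (dz @ X.T) / m                  # dL/dw, averaged over the batch
    db = np.sum(dz) / m                  # dL/db, averaged over the batch
    w -= alpha * dw                      # gradient descent update
    b -= alpha * db

predictions = sigmoid(w @ X + b) > 0.5   # use y_hat = sigmoid(wx + b) to predict
```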
To remember
- neuron = linear + activation.
- model = architecture + parameters.
Goal 2.0: Find cat / lion / iguana in images.
(Source: https://youtu.be/MfIjxPh6Pys 17 min. 56 sec.)
Notation
$a^{[\cdot]}$ - a layer number
$a^{(\cdot)}$ - a neuron
Q. What dataset do you need when you train this logistic regression?
A. Images and labels in column-vector form, e.g. $\begin{bmatrix} \text{cat} \\ \text{lion} \\ \text{iguana} \end{bmatrix} \rightarrow \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$.
Q. Is this network robust if different animals are present in the same picture?
A. Yes. The three output neurons don't communicate with each other, so we can train them completely independently of one another.
You don’t need to tell them everything. If you have enough data, they’re going to figure it out.
MyQ. Can images overlap each other?
A. Probably yes.
Goal 3.0: Add the constraint that there is exactly one animal in each image.
(Source: https://youtu.be/MfIjxPh6Pys 27 min. 44 sec.)
(called “softmax multi-class regression.”)
The loss function:
$\mathcal{L}_{3N} = -\sum_{k=1}^{3} \left[ y_k \log \hat{y}_k + (1-y_k) \log (1-\hat{y}_k) \right].$
Note. The softmax regression needs a different loss function and a different derivative.
Cross-entropy loss:
$\mathcal{L}_{CE} = -\sum_{k=1}^{3} y_k \log \hat{y}_k.$
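A small sketch of the softmax output and this cross-entropy loss, assuming 3 classes and a one-hot label (the logit values are made up for illustration):

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])       # logits for cat / lion / iguana (assumed values)
y = np.array([1.0, 0.0, 0.0])       # one-hot label: the image contains a cat

y_hat = softmax(z)                  # predicted probabilities, they sum to 1
loss = -np.sum(y * np.log(y_hat))   # cross-entropy loss L_CE
print(y_hat, loss)
```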
Neural Networks
Goal: image → cat (1) vs. no cat (0)
(Source: https://youtu.be/MfIjxPh6Pys 42 min. 37 sec.)
Q. How many parameters does this network have?
A. $(3N + 3) + (2 \times 3 + 2) + (2 \times 1 + 1)$, where $N$ is the input dimension.
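A quick sketch of that count (assuming, as in the picture, an $N$-dimensional input followed by layers of 3, 2, and 1 neurons):

```python
# Count weights and biases for a fully-connected network with the
# assumed layer sizes N -> 3 -> 2 -> 1.
def count_params(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # weights + biases of one layer
    return total

N = 64 * 64 * 3                         # assumed flattened image input
print(count_params([N, 3, 2, 1]))       # (3N + 3) + (2*3 + 2) + (2*1 + 1)
```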
Definition
Layer … a cluster of neurons that are not connected to each other.
Hidden layer (2nd layer in the picture above)
Q. Why the word “hidden”?
A. Its values are hidden from the training data: the dataset specifies the network’s inputs and outputs, but not the activations of this middle layer.
Interpretation of layers
- Neurons in the 1st layer … understand fundamental features of the image, such as edges.
- Neurons in the 2nd layer … use the edges from the previous layer to detect structurally more complex parts, such as ears or a mouth.
- Neuron in the 3rd layer … decide whether the image contains a cat.
House price prediction
- number of bedrooms
- size
- zip code
- wealth
(Source: https://youtu.be/MfIjxPh6Pys 48 min. 46 sec.)
Rather than explicitly representing relations between features, we construct the first layer as a fully-connected layer.
(Source: https://youtu.be/MfIjxPh6Pys 50 min. 12 sec.)
cf. neural network ~ black box model ~ end-to-end learning
Propagation equations
$$\begin{aligned}
z^{[1]} &= w^{[1]} x + b^{[1]} & a^{[1]} &= \sigma(z^{[1]}) \\
z^{[2]} &= w^{[2]} a^{[1]} + b^{[2]} & a^{[2]} &= \sigma(z^{[2]}) \\
z^{[3]} &= w^{[3]} a^{[2]} + b^{[3]} & a^{[3]} &= \sigma(z^{[3]})
\end{aligned}$$
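A minimal NumPy sketch of this forward pass for one example, assuming (as in the picture) layers of 3, 2, and 1 neurons; the sizes and variable names are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 64 * 64 * 3                        # assumed flattened input size
x = np.random.rand(n, 1)               # one input column vector

# Randomly initialized parameters for an assumed n -> 3 -> 2 -> 1 architecture.
w1, b1 = np.random.randn(3, n), np.zeros((3, 1))
w2, b2 = np.random.randn(2, 3), np.zeros((2, 1))
w3, b3 = np.random.randn(1, 2), np.zeros((1, 1))

z1 = w1 @ x + b1;  a1 = sigmoid(z1)    # layer 1
z2 = w2 @ a1 + b2; a2 = sigmoid(z2)    # layer 2
z3 = w3 @ a2 + b3; a3 = sigmoid(z3)    # layer 3, a3 is y_hat
```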
Q. What happens for an input batch of m examples?
$X = \begin{pmatrix} | & | & & | \\ x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ | & | & & | \end{pmatrix}$
→ Parallelize the equations over the batch:
$\underset{(3,\,m)}{z^{[1]}} = w^{[1]} X + \underset{(3,\,1)}{b^{[1]}}$
→ Problem: size mismatch between $(3, m)$ and $(3, 1)$.
→ Solution? Broadcasting: duplicate $b^{[1]}$ column-wise $m$ times.
cf. The NumPy library supports broadcasting automatically (see the sketch below).
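A short illustration of that broadcasting behavior in NumPy (the shapes match the equation above; the batch size m = 4 is assumed):

```python
import numpy as np

m = 4                              # assumed batch size
W1 = np.random.randn(3, 12288)     # first-layer weights, shape (3, 12288)
X = np.random.rand(12288, m)       # batch of m flattened images
b1 = np.zeros((3, 1))              # bias, shape (3, 1)

Z1 = W1 @ X + b1                   # b1 is broadcast column-wise to shape (3, m)
print(Z1.shape)                    # (3, 4)
```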
Q. How is this network different from principal component analysis?
A. This is a kind of supervised learning algorithm used to predict housing prices, whereas principal component analysis doesn’t predict anything.
Q. Day-night classification vs. cat classification. Which one is harder?
A. Cat. Because there are many breeds of cats while there are not many breeds of nights. 😝
cf. A challenge in day-and-night classification: pictures taken indoors. Imagine there is only a tiny window somewhere in the picture, and the model should still be able to tell whether it is day or night.
→ The more data you need in order to figure out the output, the deeper the network should be.
Optimizing $w^{[1]}, w^{[2]}, w^{[3]}, b^{[1]}, b^{[2]}, b^{[3]}$.
Define the loss/cost function.
$J(\hat{y}, y) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}^{(i)} \quad \text{with} \quad \mathcal{L}^{(i)} = -\left[ y^{(i)} \log \hat{y}^{(i)} + (1-y^{(i)}) \log (1-\hat{y}^{(i)}) \right].$
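A small sketch of this cost computed over a batch with NumPy (variable names and values are assumptions):

```python
import numpy as np

def cost(y_hat, y):
    # Average the per-example logistic loss over the m examples.
    m = y.shape[1]
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return np.sum(losses) / m

y = np.array([[1, 0, 1, 1]])               # true labels, shape (1, m)
y_hat = np.array([[0.9, 0.2, 0.7, 0.6]])   # network predictions
print(cost(y_hat, y))
```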
Backward Propagation
$\forall\, l = 1, \dots, 3: \quad \begin{cases} w^{[l]} \leftarrow w^{[l]} - \alpha \frac{\partial J}{\partial w^{[l]}} \\ b^{[l]} \leftarrow b^{[l]} - \alpha \frac{\partial J}{\partial b^{[l]}} \end{cases}$
e.g., by the chain rule,
$$\begin{aligned}
\frac{\partial J}{\partial w^{[3]}} &= \frac{\partial J}{\partial a^{[3]}} \frac{\partial a^{[3]}}{\partial z^{[3]}} \frac{\partial z^{[3]}}{\partial w^{[3]}} \\
\frac{\partial J}{\partial w^{[2]}} &= \frac{\partial J}{\partial z^{[3]}} \frac{\partial z^{[3]}}{\partial a^{[2]}} \frac{\partial a^{[2]}}{\partial z^{[2]}} \frac{\partial z^{[2]}}{\partial w^{[2]}} \\
\frac{\partial J}{\partial w^{[1]}} &= \frac{\partial J}{\partial z^{[2]}} \frac{\partial z^{[2]}}{\partial a^{[1]}} \frac{\partial a^{[1]}}{\partial z^{[1]}} \frac{\partial z^{[1]}}{\partial w^{[1]}}.
\end{aligned}$$
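A minimal backward-pass sketch for a 3-layer sigmoid network in NumPy, showing these chain-rule products layer by layer (layer sizes, variable names, and the learning rate are assumptions, and a single example is used for clarity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed tiny network (sizes 4 -> 3 -> 2 -> 1) and a single training example.
x = np.random.rand(4, 1)
y = np.array([[1.0]])
w1, b1 = np.random.randn(3, 4), np.zeros((3, 1))
w2, b2 = np.random.randn(2, 3), np.zeros((2, 1))
w3, b3 = np.random.randn(1, 2), np.zeros((1, 1))

# Forward pass.
z1 = w1 @ x + b1;  a1 = sigmoid(z1)
z2 = w2 @ a1 + b2; a2 = sigmoid(z2)
z3 = w3 @ a2 + b3; a3 = sigmoid(z3)

# Backward pass (chain rule, from the last layer to the first).
dz3 = a3 - y                           # dJ/dz3 for sigmoid output + logistic loss
dw3, db3 = dz3 @ a2.T, dz3             # dJ/dw3, dJ/db3
dz2 = (w3.T @ dz3) * a2 * (1 - a2)     # dJ/dz2
dw2, db2 = dz2 @ a1.T, dz2
dz1 = (w2.T @ dz2) * a1 * (1 - a1)     # dJ/dz1
dw1, db1 = dz1 @ x.T, dz1

# Gradient descent step with an assumed learning rate.
alpha = 0.01
w3 -= alpha * dw3; b3 -= alpha * db3
w2 -= alpha * dw2; b2 -= alpha * db2
w1 -= alpha * dw1; b1 -= alpha * db1
```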