Lecture 11. Introduction to Neural Networks

cryptnomy · November 24, 2022

CS229: Machine Learning

Lecture video link: https://youtu.be/MfIjxPh6Pys

Outline

  • Logistic Regression
  • Neural Networks

Deep Learning

Why deep learning has taken off recently:

  • computational power
  • data availability
  • algorithms

Logistic Regression

Goal: Find cats in an image: $\begin{cases}1\rightarrow\text{presence of a cat}\\0\rightarrow\text{absence of a cat}\end{cases}$

(Source: https://youtu.be/MfIjxPh6Pys 7 min. 15 sec.)

$$\hat y=\sigma(\theta^Tx)=\sigma(wx+b),$$

where $\hat y$ has shape $(1,1)$, $w$ has shape $(1,12288)$, and $x$ has shape $(64\times64\times3,1)=(12288,1)$.

$x$ … the flattened input.
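
A minimal NumPy sketch of these shapes (the image here is a random placeholder and the zero-initialized parameters are just for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder 64x64 RGB image standing in for a real photo.
image = np.random.rand(64, 64, 3)

# Flatten into a column vector x of shape (64*64*3, 1) = (12288, 1).
x = image.reshape(-1, 1)

# Parameters: w has shape (1, 12288), b is a scalar.
w = np.zeros((1, x.shape[0]))
b = 0.0

# Forward pass: y_hat has shape (1, 1), interpreted as P(cat).
y_hat = sigmoid(w @ x + b)
print(y_hat.shape)  # (1, 1)
```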

  1. Initialize $w, b$ (weights and bias).

  2. Find the optimal $w, b$.

    $$\mathcal{L}=-\left[y\log\hat y+(1-y)\log(1-\hat y)\right]$$

    $$\begin{cases}w\leftarrow w-\alpha\frac{\partial\mathcal{L}}{\partial w}\\b\leftarrow b-\alpha\frac{\partial\mathcal{L}}{\partial b}\end{cases}$$

  3. Use $\hat y=\sigma(wx+b)$ to predict.
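
A minimal sketch of these three steps in NumPy, assuming a toy random dataset `X` (one example per column) and labels `y`; the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: n features, m examples stored as columns, binary labels.
n, m = 20, 100
X = np.random.randn(n, m)
y = (np.random.rand(1, m) > 0.5).astype(float)

# 1. Initialize w, b.
w = np.zeros((1, n))
b = 0.0
alpha = 0.1

# 2. Gradient descent on the logistic loss L.
for _ in range(1000):
    y_hat = sigmoid(w @ X + b)        # shape (1, m)
    dz = y_hat - y                    # dL/dz for sigmoid + logistic loss
    dw = (dz @ X.T) / m               # average gradient, shape (1, n)
    db = dz.sum() / m
    w -= alpha * dw
    b -= alpha * db

# 3. Predict with y_hat = sigmoid(w x + b).
predictions = (sigmoid(w @ X + b) > 0.5).astype(float)
```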

To remember

  1. neuron = linear + activation.
  2. model = architecture + parameters.

Goal 2.0: Find cat / lion / iguana in images.

(Source: https://youtu.be/MfIjxPh6Pys 17 min. 56 sec.)

Notation

$a^{[\cdot]}$ … layer number (superscript in square brackets)

$a_{(\cdot)}$ … a neuron (subscript)

Q. What dataset do you need when you train this logistic regression?

A. Images and labels in column-vector form, e.g. $\begin{bmatrix}\text{cat}\\\text{lion}\\\text{iguana}\end{bmatrix}\rightarrow\begin{bmatrix}1\\1\\0\end{bmatrix}$.

Q. Is this network robust if different animals are present in the same picture?

A. Yes. The three neurons don’t communicate with each other, so we can train them completely independently of one another.

You don’t need to tell them everything. If you have enough data, they’re going to figure it out.

MyQ. Can images overlap each other?

A. Probably yes.

Goal 3.0: Add the constraint that there is exactly one animal in each image.

(Source: https://youtu.be/MfIjxPh6Pys 27 min. 44 sec.)

(called “softmax multi-class regression.”)

The loss function:

$$\mathcal{L}_{3N}=-\sum_{k=1}^3\left[y_k\log\hat y_k+(1-y_k)\log(1-\hat y_k)\right].$$

Note. The softmax regression needs a different loss function and a different derivative.

Cross-entropy loss:

$$\mathcal{L}_{CE}=-\sum_{k=1}^3y_k\log\hat y_k.$$
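
A small sketch contrasting the two losses, with illustrative scores `z` and a one-hot label `y` (it’s a cat):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the column sums to 1.
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

z = np.array([[2.0], [1.0], [-1.0]])   # scores for cat / lion / iguana
y = np.array([[1.0], [0.0], [0.0]])    # one-hot label: cat

# Softmax + cross-entropy: only the true class contributes to the loss.
y_hat = softmax(z)
loss_ce = float(-np.sum(y * np.log(y_hat)))

# Contrast: L_3N treats each class as an independent sigmoid unit and
# sums a separate logistic loss per class.
s = 1.0 / (1.0 + np.exp(-z))
loss_3n = float(-np.sum(y * np.log(s) + (1 - y) * np.log(1 - s)))

print(loss_ce, loss_3n)
```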

Neural Networks

Goal: image → cat (1) vs. no cat (0)

(Source: https://youtu.be/MfIjxPh6Pys 42 min. 37 sec.)

Q. How many parameters does this network have?

A. $(3N+3)+(2\times3+2)+(2\times1+1)$.
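
For a concrete number, assuming the flattened $64\times64\times3$ input from earlier so that $N=12288$: the first layer has 3 neurons with $N$ weights and one bias each, the second has 2 neurons with 3 inputs each, and the output neuron takes 2 inputs, giving

$$(3\cdot12288+3)+(2\cdot3+2)+(2\cdot1+1)=36867+8+3=36878.$$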

Definition

Layer … a cluster of neurons that are not connected to each other.

Hidden layer (2nd layer in the picture above)

Q. Why the word “hidden”?

A. Its values are not given in the training data: we observe the inputs and the outputs, but not what the layer in between computes.

Interpretation of layers

  • Neurons in the 1st layer … to understand low-level features of the image, such as edges.
  • Neurons in the 2nd layer … to use the edges from the previous layer to figure out more structurally complex parts such as ears or a mouth.
  • Neuron in the 3rd layer … to identify whether the image contains a cat.

House price prediction

  • number of bedrooms
  • size
  • zip code
  • wealth

(Source: https://youtu.be/MfIjxPh6Pys 48 min. 46 sec.)

Rather than explicitly representing relations between features, we construct the first layer as a fully-connected layer and let the network learn those relations.

(Source: https://youtu.be/MfIjxPh6Pys 50 min. 12 sec.)

cf. neural network ~ black box model ~ end-to-end learning

Propagation equations

$$\begin{aligned}z^{[1]}&=w^{[1]}x+b^{[1]}\\a^{[1]}&=\sigma\left(z^{[1]}\right)\\z^{[2]}&=w^{[2]}a^{[1]}+b^{[2]}\\a^{[2]}&=\sigma\left(z^{[2]}\right)\\z^{[3]}&=w^{[3]}a^{[2]}+b^{[3]}\\a^{[3]}&=\sigma\left(z^{[3]}\right)\end{aligned}$$
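
A sketch of these equations for a 3–2–1 architecture like the one in the picture, with randomly initialized parameters (the input size `n` and the scale 0.01 are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 12288                      # flattened input size (64*64*3)
rng = np.random.default_rng(0)

# Layer sizes: n -> 3 -> 2 -> 1, small random weights, zero biases.
w1, b1 = 0.01 * rng.standard_normal((3, n)), np.zeros((3, 1))
w2, b2 = 0.01 * rng.standard_normal((2, 3)), np.zeros((2, 1))
w3, b3 = 0.01 * rng.standard_normal((1, 2)), np.zeros((1, 1))

x = rng.random((n, 1))         # one flattened example

# Forward propagation, mirroring the equations above.
z1 = w1 @ x + b1;  a1 = sigmoid(z1)    # (3, 1)
z2 = w2 @ a1 + b2; a2 = sigmoid(z2)    # (2, 1)
z3 = w3 @ a2 + b3; a3 = sigmoid(z3)    # (1, 1), P(cat)
```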

Q. What happens for an input batch of $m$ examples?

$$X=\begin{pmatrix}\vert&\vert&&\vert\\x^{(1)}&x^{(2)}&\cdots&x^{(m)}\\\vert&\vert&&\vert\end{pmatrix}$$

→ Parallelize the equations across the batch.

$z^{[1]}=w^{[1]}X+b^{[1]}$, where $z^{[1]}$ has shape $(3,m)$ but $b^{[1]}$ has shape $(3,1)$.

→ Problem: Size mismatch

→ Solution? Broadcasting: duplicate $b^{[1]}$ column-wise $m$ times.

cf. The NumPy library supports broadcasting automatically.
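
A quick illustration of the broadcasting fix (shapes only; the numbers are arbitrary):

```python
import numpy as np

m = 5
w1 = np.random.randn(3, 4)          # layer-1 weights: 3 neurons, 4 inputs
b1 = np.random.randn(3, 1)          # bias: shape (3, 1)
X = np.random.randn(4, m)           # batch of m examples as columns

# (3, 4) @ (4, m) -> (3, m); adding (3, 1) broadcasts b1 across the
# m columns, i.e. NumPy implicitly duplicates it column-wise m times.
Z1 = w1 @ X + b1
print(Z1.shape)                     # (3, m)
```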

Q. How is this network different from principal component analysis?

A. This is a supervised learning algorithm used to predict housing prices, whereas principal component analysis is unsupervised and doesn’t predict anything.

Q. Day-night classification vs. cat classification. Which one is harder?

A. Cat. Because there are many breeds of cats while there are not many breeds of nights. 😝

cf. What is challenging in day/night classification? Figuring it out for indoor pictures. Imagine there’s a tiny window somewhere in the picture; the model should still be able to tell whether it is day or night.

→ The more data you need in order to figure out the output, the deeper the network should be.

Optimizing $w^{[1]},w^{[2]},w^{[3]},b^{[1]},b^{[2]},b^{[3]}$.

Define loss/cost function.

$$J(\hat y,y)=\frac{1}{m}\sum_{i=1}^m\mathcal{L}^{(i)}\quad\text{with}\quad\mathcal{L}^{(i)}=-\left[y^{(i)}\log\hat y^{(i)}+\left(1-y^{(i)}\right)\log\left(1-\hat y^{(i)}\right)\right].$$
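
As a sketch, this cost can be computed over a whole batch in one vectorized line (here `y_hat` and `y` are illustrative row vectors of shape `(1, m)`):

```python
import numpy as np

y     = np.array([[1.0, 0.0, 1.0, 1.0]])   # labels, shape (1, m)
y_hat = np.array([[0.9, 0.2, 0.7, 0.6]])   # predictions, shape (1, m)

# J = (1/m) * sum of the per-example logistic losses.
m = y.shape[1]
J = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m
print(J)
```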

Backward Propagation

$$\forall\,l=1,2,3:\quad\begin{cases}w^{[l]}\leftarrow w^{[l]}-\alpha\frac{\partial J}{\partial w^{[l]}}\\b^{[l]}\leftarrow b^{[l]}-\alpha\frac{\partial J}{\partial b^{[l]}}\end{cases}$$

e.g.,

$$\begin{aligned}\frac{\partial J}{\partial w^{[3]}}&=\frac{\partial J}{\partial a^{[3]}}\frac{\partial a^{[3]}}{\partial z^{[3]}}\frac{\partial z^{[3]}}{\partial w^{[3]}}\\\frac{\partial J}{\partial w^{[2]}}&=\frac{\partial J}{\partial z^{[3]}}\frac{\partial z^{[3]}}{\partial a^{[2]}}\frac{\partial a^{[2]}}{\partial z^{[2]}}\frac{\partial z^{[2]}}{\partial w^{[2]}}\\\frac{\partial J}{\partial w^{[1]}}&=\frac{\partial J}{\partial z^{[2]}}\frac{\partial z^{[2]}}{\partial a^{[1]}}\frac{\partial a^{[1]}}{\partial z^{[1]}}\frac{\partial z^{[1]}}{\partial w^{[1]}}.\end{aligned}$$
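
For the sigmoid output and the logistic loss used above, the first chain can be carried out explicitly (a standard derivation, written here for a single example):

$$\frac{\partial\mathcal{L}}{\partial a^{[3]}}=-\frac{y}{a^{[3]}}+\frac{1-y}{1-a^{[3]}},\qquad\frac{\partial a^{[3]}}{\partial z^{[3]}}=a^{[3]}\left(1-a^{[3]}\right),\qquad\frac{\partial z^{[3]}}{\partial w^{[3]}}=a^{[2]T},$$

so the product telescopes to

$$\frac{\partial\mathcal{L}}{\partial w^{[3]}}=\left(a^{[3]}-y\right)a^{[2]T},$$

and averaging over the $m$ examples gives $\frac{\partial J}{\partial w^{[3]}}=\frac{1}{m}\sum_{i=1}^m\left(a^{[3](i)}-y^{(i)}\right)a^{[2](i)T}$.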
