Logistic regression is a very "shallow" model; a network with 5 hidden layers is a much "deeper" model.
One of the ways to increase your odds of having a bug-free implementation is to think very systematically and carefully about the matrix dimensions you're working with.
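For example, here is a minimal numpy sketch (the variable names are my own, not from the lecture) of the dimension rules: W[l] has shape (n[l], n[l-1]), b[l] has shape (n[l], 1), and Z[l] and A[l] have shape (n[l], m).

import numpy as np

layer_dims = [4, 3, 2, 1]   # n[0] = 4 input features, then three layers

params = {}
for l in range(1, len(layer_dims)):
    params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params["b" + str(l)] = np.zeros((layer_dims[l], 1))

m = 5                                    # number of training examples
A = np.random.randn(layer_dims[0], m)    # A[0] is the input X
for l in range(1, len(layer_dims)):
    Z = params["W" + str(l)] @ A + params["b" + str(l)]  # b broadcasts over the m columns
    assert Z.shape == (layer_dims[l], m)                 # the dimension check
    A = np.tanh(Z)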
So first,
what is a deep network computing?
If you're building a system for face recognition,
here's what a deep neural network could be doing.
Perhaps you input a picture of a face
Then the first layer of the neural network
you can think of as maybe being a feature detector or an edge detector.
The network finds where the edges are in this picture by grouping together pixels to form edges.
It can then group edges together to form parts of faces.
And then, finally, by putting together different parts of faces, like an eye or a nose or an ear or a chin,
it can then try to recognize or detect different types of faces.
So intuitively, you can think of the earlier layers of the neural network
as detecting simple functions, like edges.
And then composing them together in the later layers of a neural network
so that it can learn more and more complex functions.
It's hard to visualize speech, but the same intuition applies to audio.
So a deep neural network with multiple hidden layers
might be able to have the earlier layers learn low-level simple features of the input, such as where the edges are,
and then have the later, deeper layers put together the simpler things it has detected
in order to detect more complex things, such as faces, or words, phrases, and sentences.
Informally: there are functions you can compute with a "small" L-layer deep neural network (small meaning the number of hidden units is small)
that a shallower network cannot compute as compactly.
So if there aren't enough hidden layers,
then you might require exponentially more hidden units to compute the same function.
Let's say you're trying to compute the exclusive OR (XOR) of n input bits.
With multiple hidden layers, you can compute it as a tree of pairwise XORs, so the depth is O(log n) and the total number of units is only O(n).
But now, if you are not allowed to use a neural network with multiple hidden layers,
if you're forced to compute this function with just one hidden layer,
then in order to compute this XOR function, the hidden layer will need to be exponentially large (on the order of 2^(n-1) units), because essentially you need to exhaustively enumerate all possible input configurations.
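As a toy illustration of the depth argument (my own code, not the lecture's), the deep version is just a balanced tree of pairwise XORs:

def xor_tree(bits):
    while len(bits) > 1:                      # each pass through the loop is one "layer"
        paired = [a ^ b for a, b in zip(bits[0::2], bits[1::2])]
        if len(bits) % 2:                     # carry an unpaired bit forward
            paired.append(bits[-1])
        bits = paired
    return bits[0]

x = [1, 0, 1, 1, 0, 1, 0, 0]
print(xor_tree(x))   # parity of x; a single-hidden-layer net would need on the
                     # order of 2**(len(x) - 1) hidden units to do the same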
We've already seen the basic building blocks of forward propagation and back propagation,
the key components you need to implement a deep neural network.
Let's see how you can put these components together to build a deep net.
Here's a network of a few layers.
Let's pick one layer.
And look into the computations focusing on just that layer for now.
So just to summarize:
if you can implement these two functions (a forward function and a backward function for each layer),
then the basic computation of the neural network will be as follows.
So one iteration of gradient descent for our neural network involves:
forward propagation to compute y-hat, computing the cost J(y-hat, y),
back propagation to compute the gradients dW[l] and db[l],
and then updating the parameters: W[l] := W[l] - α·dW[l], b[l] := b[l] - α·db[l].
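Here is a minimal sketch of one such iteration for a 1-hidden-layer network with a tanh hidden layer and sigmoid output (my own illustration; the programming exercise generalizes this to L layers):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_iteration(X, Y, params, lr=0.5):
    W1, b1 = params["W1"], params["b1"]
    W2, b2 = params["W2"], params["b2"]
    m = X.shape[1]
    # forward propagation: compute y-hat
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)
    # cross-entropy cost
    cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    # back propagation: compute gradients
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = (dZ1 @ X.T) / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    # gradient-descent parameter update
    params["W1"] -= lr * dW1; params["b1"] -= lr * db1
    params["W2"] -= lr * dW2; params["b2"] -= lr * db2
    return cost

np.random.seed(0)
params = {"W1": np.random.randn(4, 2) * 0.01, "b1": np.zeros((4, 1)),
          "W2": np.random.randn(1, 4) * 0.01, "b2": np.zeros((1, 1))}
X = np.random.randn(2, 200)
Y = (X[0:1] * X[1:2] > 0).astype(float)   # a toy nonlinear target
for i in range(1000):
    cost = one_iteration(X, Y, params)
print("final cost: %.4f" % cost)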
One more implementational detail.
Conceptually, it will be useful to think of the cache
as storing the value of Z[l] for the backward functions.
When you implement this,
you'll find that the cache is also a convenient way to get the values of the parameters W[l] and b[l] into the backward function.
In the programming exercise, you actually store Z[l] in your cache, as well as W[l] and b[l].
Ng just finds it a convenient way to get the parameters
copied to where you need to use them later when you're computing back propagation.
(That's an implementational detail that we'll see when we do the programming exercise.)
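A minimal sketch of the cache idea, in the style of the exercise's linear forward/backward steps (the exact signatures here are my own):

import numpy as np

def linear_forward(A_prev, W, b):
    Z = W @ A_prev + b
    cache = (A_prev, W, b, Z)       # stash Z plus the parameters W and b
    return Z, cache

def linear_backward(dZ, cache):
    A_prev, W, b, Z = cache         # the cache hands W straight to backward
    m = A_prev.shape[1]
    dW = (dZ @ A_prev.T) / m
    db = dZ.sum(axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ              # gradient to propagate to layer l-1
    return dA_prev, dW, db

A_prev = np.random.randn(3, 5)
W, b = np.random.randn(2, 3), np.zeros((2, 1))
Z, cache = linear_forward(A_prev, W, b)
dA_prev, dW, db = linear_backward(np.ones_like(Z), cache)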
Parameters: W[1], b[1], W[2], b[2], ..., i.e., W[l] and b[l] for each layer.
Hyperparameters: the learning rate α, the number of iterations, the number of hidden layers L, the number of hidden units n[l], and the choice of activation function.
So when you're training a deep net for your own application, you'll find that there may be a lot of possible settings for the hyperparameters that you need to just try out.
So applying deep learning today is a very empirical process: often you might have an idea, implement it, and see how it works.
It turns out that when you're starting on a new application, you may find it very difficult to know in advance exactly what the best values of the hyperparameters are.
So what often happens is that you just have to try out many different values and go around this cycle: try out some values, say five hidden layers
with this many hidden units, implement that, see if it works,
and then iterate.
This is one area where deep learning research is still advancing.
(In the second course, we'll also give some suggestions for how to systematically explore the space of hyperparameters.)
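As a toy sketch of that try-and-iterate cycle (train_and_evaluate is a hypothetical placeholder of my own, not the lecture's code), a simple search over a few settings might look like:

import itertools

def train_and_evaluate(n_hidden, learning_rate):
    # placeholder score; a real version would train the net and return dev-set accuracy
    return 1.0 - abs(n_hidden - 20) / 100.0 - abs(learning_rate - 0.01)

best = None
for n_hidden, lr in itertools.product([5, 10, 20, 50], [0.003, 0.01, 0.1]):
    acc = train_and_evaluate(n_hidden, lr)
    if best is None or acc > best[0]:
        best = (acc, n_hidden, lr)
print("best dev score %.3f with n_hidden=%d, lr=%g" % best)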
In this picture of a biological neuron,
the neuron, which is a cell in your brain,
receives electrical signals from other neurons,
does a simple thresholding computation,
and then, if the neuron fires,
it sends a pulse of electricity down the axon, perhaps to other neurons.
There is a very simplistic analogy between a single neuron in a neural network,
and a biological neuron.
But Ng thinks that today even neuroscientists have almost no idea
what even a single neuron is doing.
A single neuron appears to be much more complex than we are able to characterize with neuroscience,
and while some of what it's doing is a little bit like logistic regression,
there's still a lot about what even a single neuron does that no one human today understands.
Why is logistic regression's accuracy so low here?
Because logistic regression can only draw a straight decision boundary, but to separate this planar data the boundary would have to be drawn as a curve.
With two or more hidden layers plus a nonlinear activation function,
the NN can become much more powerful,
== it can classify the planar data much better.
np.random.rand vs np.random.randn
If you use np.random.rand() * 0.01 for weight initialization,
you will only get values between 0 and 0.01, so learning will not proceed properly.
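A quick check of the difference (my own snippet): rand draws uniformly from [0, 1), so the scaled weights are all tiny non-negative numbers, while randn draws zero-mean Gaussian values with both signs:

import numpy as np

np.random.seed(1)
w_rand  = np.random.rand(3, 2) * 0.01    # all values in [0, 0.01)
w_randn = np.random.randn(3, 2) * 0.01   # zero-mean, positive and negative
print(w_rand.min() >= 0)                 # True: no negative weights
print((w_randn < 0).any())               # True: signs are mixed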
Why did the classification boundary come out as a straight line even though a nonlinear activation function was used?