Dimensions of Dense Layer
Cascaded structure.
MXNet is a framework that can control multiple GPUs at once. Because MXNet exposes an include-top parameter for the classifier, it is clearer to draw the Dense layers from bottom to top.
The cascaded Dense layers, drawn from bottom to top, are:
$$
(\overrightarrow{a}^{[l_L]})^{T} \in \R^{1 \times l_{\nu^{[l_L]}}}\\
\vdots\\
(\overrightarrow{a}^{[2]})^{T} \in \R^{1 \times l_{\nu^{[2]}}}\\
\uparrow\\
L_2 = (\dots \; \nu^{[2]}_{i} \; \dots)\\
|\\
(\overrightarrow{a}^{[1]})^{T} \in \R^{1 \times l_{\nu^{[1]}}}\\
\uparrow\\
L_1 = (\dots \; \nu^{[1]}_{i} \; \dots)\\
|\\
(\overrightarrow{x})^{T} \in \R^{1 \times l_x}
$$
The number of parameters of one Dense layer (weight matrix plus bias vector) is
$$
(l_x \times l_{\nu}) + (1 \times l_{\nu}) = (l_x + 1) \times l_{\nu}
$$
In a Dense network, as the signal keeps passing through the layers, the number of parameters increases considerably.
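The count $(l_x + 1) \times l_{\nu}$ can be checked with a minimal sketch; the layer widths below are hypothetical, chosen only to show how quickly the total grows when layers are stacked:

```python
def dense_params(l_x: int, l_nu: int) -> int:
    """Parameters of a Dense layer mapping l_x inputs to l_nu units:
    l_x * l_nu weights plus l_nu biases, i.e. (l_x + 1) * l_nu."""
    return (l_x + 1) * l_nu

# Hypothetical widths for a small cascaded network.
layer_widths = [784, 512, 256, 10]
total = sum(dense_params(l_in, l_out)
            for l_in, l_out in zip(layer_widths, layer_widths[1:]))
print(total)  # 535818 -- the first layer alone contributes 401920
```

Most of the parameters sit in the earliest, widest layer, which is why the count grows so fast in Dense networks.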
Forward Propagation of the Second Dense Layer
The Second Dense Layer
Generalized Dense Layer
$$
\vdots\\
(\overrightarrow{a}^{[i]})^{T} \in \R^{1 \times l_{\nu^{[i]}}}\\
\uparrow\\
L_i = (\dots \; \nu^{[i]}_{i} \; \dots)\\
|\\
(\overrightarrow{a}^{[i-1]})^{T} \in \R^{1 \times l_{\nu^{[i-1]}}}\\
\vdots
$$
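The generalized layer $L_i$ above can be sketched as one forward-propagation step, $a^{[i]} = g(a^{[i-1]} W^{[i]} + b^{[i]})$, using the same row-vector shapes as the diagram. The ReLU activation and the widths are assumptions for illustration:

```python
import numpy as np

def relu(z):
    """Assumed activation g for the sketch."""
    return np.maximum(z, 0.0)

def dense_forward(a_prev, W, b):
    """One generalized Dense layer: (1, l_prev) @ (l_prev, l_nu) + (1, l_nu)."""
    return relu(a_prev @ W + b)

rng = np.random.default_rng(0)
widths = [4, 3, 2]                    # hypothetical l_x, l_nu[1], l_nu[2]
a = rng.normal(size=(1, widths[0]))   # (x)^T in R^{1 x l_x}
for l_prev, l_nu in zip(widths, widths[1:]):
    W = rng.normal(size=(l_prev, l_nu))
    b = np.zeros((1, l_nu))
    a = dense_forward(a, W, b)        # a^[i] from a^[i-1]
print(a.shape)  # (1, 2), matching (a^[l_L])^T in R^{1 x l_nu[l_L]}
```

Each pass through the loop is one layer $L_i$ of the cascade, so the row vector keeps its $1 \times l_{\nu^{[i]}}$ shape from the diagram.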