Parameters of Dense Layer
The input vector $(\overrightarrow{x})^T = (x_1 \dots x_i)$ goes through the layers $\dots \: L_i \: \dots$, each of which is composed of neurons $\dots \: \nu_{i}^{\:[i_L]} \: \dots$, and each neuron is in turn composed of a weight column vector $\overrightarrow{w}_{i,\: i_{\nu}}^{[i_L]} \in \R^{l_w \times 1}$ and a bias $b_{\: i_{\nu}}^{[i_L]} \in \R^{1 \times 1}$.
There are two reasons why the weights are arranged as a column vector: one is that linear algebra takes column vectors as the default; the other is that a dense layer in particular reads its weight vectors in column form.
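As a rough sketch of a single neuron's parameters (the use of NumPy and the toy sizes below are illustrative assumptions, not taken from this post), the weight column vector and the scalar bias look like this:

```python
# A minimal sketch: one neuron of a dense layer holds a weight column vector
# w in R^{l_w x 1} and a bias b in R^{1 x 1}. Sizes here are arbitrary examples.
import numpy as np

l_w = 4                            # length of the input (and of each weight vector)
w = np.random.randn(l_w, 1)        # weight column vector, shape (l_w, 1)
b = np.random.randn(1, 1)          # bias, shape (1, 1)

x_T = np.random.randn(1, l_w)      # input as a row vector (x)^T, shape (1, l_w)
a = x_T @ w + b                    # this neuron's output, shape (1, 1)
print(w.shape, b.shape, a.shape)   # (4, 1) (1, 1) (1, 1)
```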
Weight Matrix and Bias Vector
Combine the column vectors of the weights into a matrix.
The shape of the weight matrix is (length of input) × (length of output).
$${W}^{[i_L]} \in \R^{l_w \times l_{\nu}}$$
$$\overrightarrow{b}^{[i_L]} \in \R^{1 \times l_{\nu}}$$
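A minimal sketch of how the weight matrix and bias vector can be assembled from the per-neuron columns (again, NumPy and the sizes are assumptions for illustration):

```python
# Stacking the per-neuron weight columns side by side gives W in R^{l_w x l_nu};
# the biases form a (1, l_nu) row vector.
import numpy as np

l_w, l_nu = 4, 3                                           # input length, number of neurons (output length)
columns = [np.random.randn(l_w, 1) for _ in range(l_nu)]   # one weight column per neuron
W = np.hstack(columns)                                     # weight matrix, shape (l_w, l_nu)
b_T = np.random.randn(1, l_nu)                             # bias row vector, shape (1, l_nu)
print(W.shape, b_T.shape)                                  # (4, 3) (1, 3)
```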
Forward Propagation of Dense Layer
$${a}_i^{[i_L]} = \nu_i^{\:[i_L]}\big((\overrightarrow{x})^T; \: \overrightarrow{w}_{i,\: i_{\nu}}^{[i_L]}, \: b_{\: i_{\nu}}^{[i_L]}\big)$$
$$(\overrightarrow{a}^{[i_L]})^T = (\overrightarrow{x})^T \cdot {W}^{[i_L]} + \overrightarrow{b}^{[i_L]}$$
$(\overrightarrow{x})^T \in {\R}^{1 \times {l_x}}$ becomes $(\overrightarrow{a})^T \in {\R}^{1 \times {l_{\nu}}}$.
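A short sketch of the forward propagation formula above (NumPy and the sizes are illustrative assumptions), confirming that a $(1, l_x)$ row times a $(l_x, l_{\nu})$ matrix plus a $(1, l_{\nu})$ bias row yields a $(1, l_{\nu})$ output row:

```python
# Forward propagation of one dense layer: (a)^T = (x)^T . W + (b)^T.
import numpy as np

l_x, l_nu = 4, 3
x_T = np.random.randn(1, l_x)        # (x)^T, shape (1, l_x)
W = np.random.randn(l_x, l_nu)       # weight matrix, shape (l_x, l_nu)
b_T = np.random.randn(1, l_nu)       # bias row vector, shape (1, l_nu)

a_T = x_T @ W + b_T                  # (a)^T, shape (1, l_nu)
print(a_T.shape)                     # (1, 3)
```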