Convolutional Neural Networks (CNNs) are a class of deep neural networks primarily used to analyze visual imagery. They employ a mathematical operation known as convolution and have proven extremely effective in tasks such as image and video recognition, image classification, recommender systems, medical image analysis, and natural language processing.
The core building block of a CNN is the convolutional layer, which applies a convolution operation to the input and passes the result to the next layer. Mathematically, a convolution combines two functions and expresses how the shape of one is modified by the other. In the context of a CNN, it is used to blend the input image with a filter (or kernel) to extract features such as edges:

$$
S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n) \, K(m, n)
$$

Here, $I$ is the input image and $K$ is the kernel. The output $S$ is known as the feature map.
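As an illustration, here is a minimal NumPy sketch of the operation above (the cross-correlation form that deep learning libraries typically implement under the name "convolution"); the function name `conv2d_valid` and the example kernel are ours, not part of any particular library:

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # S(i, j) = sum_m sum_n I(i + m, j + n) * K(m, n)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Example: detect vertical edges with a Sobel-like kernel
image = np.random.rand(6, 6)
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])
feature_map = conv2d_valid(image, kernel)   # shape (4, 4)
```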
Following convolution, an activation function like ReLU (Rectified Linear Unit) is applied to introduce non-linearity into the model, allowing it to learn more complex patterns. ReLU is defined as:

$$
f(x) = \max(0, x)
$$
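In NumPy this is a one-liner; the helper name `relu` is ours:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # Element-wise f(x) = max(0, x): negative values are clamped to zero
    return np.maximum(0.0, x)

relu(np.array([-2.0, 3.0]))  # -> array([0., 3.])
```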
Pooling (also called subsampling or downsampling) reduces the dimensionality of each feature map while retaining the most important information. Max pooling, for example, outputs the maximum value of the portion of the feature map covered by the pooling window, a simple way to down-sample the input:

$$
y = \max_{(i, j) \in R} x_{ij}
$$

where $R$ is the window and $x_{ij}$ are the elements of the input feature map within that window.
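A minimal sketch of non-overlapping max pooling (stride equal to the window size); the name `max_pool2d` is ours:

```python
import numpy as np

def max_pool2d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling: stride equals the window size."""
    h, w = x.shape
    h, w = h - h % size, w - w % size   # trim edges that don't fit a full window
    windows = x[:h, :w].reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))     # max over each size x size window

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(x))  # [[ 5.  7.]
                      #  [13. 15.]]
```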
Finally, CNNs end with fully connected layers, in which neurons have full connections to all activations in the previous layer, as in regular neural networks. Their outputs are computed via a matrix multiplication with the input plus a bias offset:

$$
y = W x + b
$$
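A sketch of a single fully connected layer; `dense` and the shapes below are illustrative:

```python
import numpy as np

def dense(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    # y = W x + b
    return W @ x + b

rng = np.random.default_rng(0)
x = rng.standard_normal(8)           # flattened input activations
W = rng.standard_normal((10, 8))     # 10 output neurons, 8 inputs each
b = np.zeros(10)                     # bias offset
y = dense(x, W, b)                   # shape (10,)
```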
In the forward pass, each convolutional, activation, and pooling layer processes the input and passes it to the next layer. The final output is obtained after the fully connected layer.
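Putting the pieces together, a forward pass through one conv → ReLU → pool stage followed by a fully connected layer might look like this (reusing the sketch functions defined above; all names and shapes are illustrative):

```python
import numpy as np

# Reuses conv2d_valid, relu, max_pool2d, and dense from the sketches above.
rng = np.random.default_rng(42)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

h = conv2d_valid(image, kernel)      # (6, 6) feature map
h = relu(h)                          # non-linearity
h = max_pool2d(h, size=2)            # (3, 3) after 2x2 max pooling
h = h.reshape(-1)                    # flatten to a vector of 9 values

W = rng.standard_normal((10, 9))     # e.g. 10 class scores
b = np.zeros(10)
scores = dense(h, W, b)              # final output of the network
```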
During backpropagation, the network adjusts its parameters in proportion to their contribution to the prediction error. The process involves:

1. Loss Function Calculation: Compute the loss, which measures the prediction error. Common loss functions include mean squared error for regression tasks and cross-entropy loss for classification tasks.
2. Gradient Calculation: Use the chain rule to recursively calculate the derivatives of the loss function with respect to each weight in the network, moving backward from the output layer to the input layer.
3. Parameter Update: Update the weights using a learning rate $\eta$. A typical update rule is:
   $$
   w \leftarrow w - \eta \frac{\partial L}{\partial w}
   $$
4. Iteration: Repeat the forward and backward pass for each batch of training data until the network performs satisfactorily; a minimal sketch of this loop follows the list.
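As a concrete illustration of steps 1–4, here is a minimal NumPy sketch that trains a single fully connected layer with mean squared error and plain gradient descent on toy data (all names and data are ours; a real CNN backpropagates through its convolutional and pooling layers as well):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))       # toy inputs
true_W = rng.standard_normal((1, 8))
y = X @ true_W.T                        # toy regression targets

W = np.zeros((1, 8))                    # weights to learn
b = np.zeros(1)                         # bias to learn
eta = 0.05                              # learning rate

for epoch in range(200):                # 4. iterate over the training data
    pred = X @ W.T + b                  # forward pass: y_hat = x W^T + b
    err = pred - y
    loss = np.mean(err ** 2)            # 1. loss: mean squared error

    # 2. gradients via the chain rule: dL/dW = (2/N) err^T X, dL/db = (2/N) mean(err)
    grad_W = 2 * err.T @ X / len(X)
    grad_b = 2 * err.mean(axis=0)

    W -= eta * grad_W                   # 3. update: w <- w - eta * dL/dw
    b -= eta * grad_b

print(f"final loss: {loss:.6f}")        # should be close to zero
```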
The network is structured as follows:

```
ConvBlock2D -> Flatten -> DenseBlock -> Dense
```
The constructor accepts the following parameters:

| Parameter | Type | Default |
| --- | --- | --- |
| `in_channels_list` | `list[int] \| int` | – |
| `in_features_list` | `list[int] \| int` | – |
| `out_channels` | `int` | – |
| `out_features` | `int` | – |
| `filter_size` | `int` | – |
| `activation` | `Activation` | – |
| `optimizer` | `Optimizer` | – |
| `loss` | `Loss` | – |
| `initializer` | `InitStr` | `None` |
| `padding` | `Literal['same', 'valid']` | `'same'` |
| `stride` | `int` | `1` |
| `do_batch_norm` | `bool` | `True` |
| `momentum` | `float` | `0.9` |
| `do_pooling` | `bool` | `True` |
| `pool_filter_size` | `int` | `2` |
| `pool_stride` | `int` | `2` |
| `pool_mode` | `Literal['max', 'avg']` | `'max'` |
| `do_dropout` | `bool` | `True` |
| `dropout_rate` | `float` | `0.5` |
| `batch_size` | `int` | `100` |
| `n_epochs` | `int` | `100` |
| `learning_rate` | `float` | `0.001` |
| `valid_size` | `float` | `0.1` |
| `lambda_` | `float` | `0.0` |
| `early_stopping` | `bool` | `False` |
| `patience` | `int` | `10` |
| `shuffle` | `bool` | `True` |

CNNs are applied in numerous domains, including image and video recognition, recommender systems, medical image analysis, and natural language processing.
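Assuming the model is exposed as a single class whose constructor takes the parameters above (the class name `CNN`, the commented import path, and the string argument values below are hypothetical placeholders, not a confirmed API), instantiation might look like:

```python
# Hypothetical import path and class name; substitute the library's actual ones.
# from some_library import CNN

model = CNN(
    in_channels_list=[3, 32],        # channels entering each conv block
    in_features_list=[512, 256],     # features entering each dense block
    out_channels=64,
    out_features=10,                 # e.g. 10 output classes
    filter_size=3,
    activation="relu",               # placeholder; actual Activation values may differ
    optimizer="adam",                # placeholder; actual Optimizer values may differ
    loss="cross_entropy",            # placeholder; actual Loss values may differ
    learning_rate=0.001,
    batch_size=100,
    n_epochs=100,
)
```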