R-CNN consists of three modules.
1) Region proposal: generation of category-independent region proposals
2) Feature extraction: a large convolutional neural network that extracts a fixed-length feature vector from each region
3) Classification: a set of class-specific linear SVMs
2. Object detection with R-CNN
Region proposals : methods for generating category-independent region proposals
Feature extraction : the CNN requires a fixed 227x227 pixel input. Regardless of the size or aspect ratio of the candidate region, we warp all pixels in a tight bounding box around it to the required size
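The warping step can be sketched as below. This is a minimal illustration, not the paper's implementation: the function name `warp_region`, the `(x1, y1, x2, y2)` box layout, and the nearest-neighbour sampling are all assumptions made for the example, and it needs only NumPy.

```python
import numpy as np

def warp_region(image, box, size=227):
    """Crop box = (x1, y1, x2, y2) from an H x W x C image and warp it to size x size.

    Hypothetical helper: the aspect ratio is deliberately not preserved.
    """
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    # Nearest-neighbour sampling: map every output row/column back to a source index.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return crop[rows[:, None], cols[None, :]]

image = np.zeros((300, 500, 3), dtype=np.uint8)
patch = warp_region(image, (40, 60, 340, 160))  # a wide 300 x 100 region
print(patch.shape)  # (227, 227, 3)
```

A wide region and a tall region both come out as 227x227, which is what lets a single network with a fixed input size process every proposal.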
For each class, score each extracted feature vector using the SVM trained for that class.
Then apply greedy non-maximum suppression (for each class independently), which rejects a region if its intersection-over-union (IoU) overlap with a higher-scoring selected region is larger than a learned threshold
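Greedy non-maximum suppression for a single class can be sketched as follows. The names `iou` and `greedy_nms` are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` with positive area, and the threshold of 0.3 is only a placeholder for the learned value.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Clip at zero so non-overlapping boxes get an empty intersection.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, threshold=0.3):
    """Return indices of the regions kept for one class, highest score first."""
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Reject every remaining region that overlaps the selected one too much.
        order = rest[iou(boxes[best], boxes[rest]) <= threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))  # [0, 2]
```

In the usage example the second box overlaps the first with IoU 81/119, about 0.68, so it is rejected; the third box does not overlap anything and is kept. To cover all classes, the same function would be called once per class on that class's scores.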
Run-time analysis
1) All CNN parameters are shared across all categories
2) The feature vectors computed by the CNN are low-dimensional when compared to other common approaches, such as spatial pyramids with bag-of-visual-word encodings
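One consequence of these two properties is that, once the shared features are computed, the per-class work is small: scoring every region against every class SVM can be written as one matrix product. The sketch below uses random data, and the sizes (2000 regions, 4096-dimensional features, 20 classes) and the name `score_regions` are assumptions chosen for illustration, not values taken from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
num_regions, feat_dim, num_classes = 2000, 4096, 20  # illustrative sizes only

# Stand-ins for CNN features (one row per region) and SVM parameters (one column per class).
features = rng.standard_normal((num_regions, feat_dim)).astype(np.float32)
weights = rng.standard_normal((feat_dim, num_classes)).astype(np.float32)
biases = np.zeros(num_classes, dtype=np.float32)

def score_regions(features, weights, biases):
    """Linear SVM scores: entry (i, c) is the score of region i for class c."""
    return features @ weights + biases

scores = score_regions(features, weights, biases)
print(scores.shape)  # (2000, 20)
```

Adding a class only adds a column to `weights`; the feature matrix is computed once and reused, which is why sharing the CNN across categories keeps the cost per class low.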