-
Network in network
-
Using 1x1 convolutions
- Usage: to increase or decrease the channel dimension (depth) of a feature map without changing its spatial size
-
Bottleneck layer
- Advantages: saves computational cost
- Computational cost is measured as the number of multiplications
- Example: 28x28x192 input, 5x5 conv with 32 filters (same padding) => 28x28x32 output
- Without a bottleneck layer, total number of multiplications = (28x28x32) x (5x5x192) ≈ 120M
- With a bottleneck layer (a 1x1 conv with 16 filters applied first), total number of multiplications ≈ 2.4M + 10M = 12.4M
(around 10% of the above)
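The cost comparison above can be verified with a small helper, assuming the commonly used setup (28x28x192 input, a 5x5 conv with 32 filters and same padding, and a 16-filter 1x1 bottleneck); `conv_mults` is an illustrative helper name, not an API from any library.

```python
def conv_mults(out_h, out_w, out_c, k_h, k_w, in_c):
    """Multiplications for one conv layer: each of the out_h*out_w*out_c
    output values needs k_h*k_w*in_c multiplications (one per filter weight)."""
    return out_h * out_w * out_c * (k_h * k_w * in_c)

# Direct 5x5 conv: 28x28x192 -> 28x28x32 (same padding)
direct = conv_mults(28, 28, 32, 5, 5, 192)   # 120,422,400 ≈ 120M

# Bottleneck: 1x1 conv down to 16 channels, then 5x5 conv up to 32 channels
reduce_ = conv_mults(28, 28, 16, 1, 1, 192)  # 2,408,448  ≈ 2.4M
expand = conv_mults(28, 28, 32, 5, 5, 16)    # 10,035,200 ≈ 10M
bottleneck = reduce_ + expand                # 12,443,648 ≈ 12.4M

print(direct, bottleneck, bottleneck / direct)  # ratio ≈ 0.10
```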
-
Inception V1
-
Background
-
CNNs have a standard structure
- stacked convolutional layers (optionally followed by contrast normalization and max-pooling) are followed by one or more fully-connected layers
- Network in network
- Object Detection: R-CNN
-
Problem
-
Increasing the model's size for better performance => two major drawbacks
-
- 1. The model becomes more prone to overfitting, especially when the amount of labeled data is limited
- Some of the 1000 classes in the ILSVRC 2014 classification challenge are visually very similar, yet distinct
- e.g. Siberian husky & Eskimo dog
-
- 2. Dramatically increased use of computational resources
- If the added capacity is used inefficiently (for example, if most weights end up close to zero), then a lot of computation is wasted
-
Contribution
-
Classification performance (top-5 error, ILSVRC 2014)
- GoogLeNet (1st place): 6.67%
- VGG (2nd place): 7.32%
-
Dataset
-
ImageNet 1000 classes
- training: 1.28 million images
- validation: 50k images
- test: 100k images
-
Architecture
- # Parameters: around 6.8M
- Inception module with dimension reductions
- Graph visualization of inception (3a)
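As a sketch of the dimension-reduction idea, the channel bookkeeping of the inception (3a) module can be written out explicitly; the filter counts below are those reported in the GoogLeNet paper, and the dictionary is just an illustrative representation, not actual model code.

```python
# Inception (3a): input 28x28x192. Every branch uses "same" padding,
# so all outputs keep the 28x28 spatial size and can be concatenated
# along the channel axis.
IN_C = 192

branch_out = {
    "1x1":       64,   # plain 1x1 conv
    "3x3":       128,  # 1x1 reduce (to 96 ch) -> 3x3 conv
    "5x5":       32,   # 1x1 reduce (to 16 ch) -> 5x5 conv
    "pool_proj": 32,   # 3x3 max-pool -> 1x1 conv
}

out_c = sum(branch_out.values())
print(f"inception (3a): 28x28x{IN_C} -> 28x28x{out_c}")  # -> 28x28x256
```

The 1x1 "reduce" convolutions in the 3x3 and 5x5 branches are exactly the bottleneck layers discussed earlier: they shrink the channel dimension before the expensive larger convolutions.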
-
Auxiliary classifiers
-
Connected to intermediate layers
- 1. Encourage discrimination in the lower stages of the network
- 2. Increase the gradient signal that gets propagated back
- 3. Provide additional regularization
- Used only during training to obtain a better model; the auxiliary classifiers are discarded at test time
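A minimal sketch of how the auxiliary losses could be combined during training; the 0.3 weight is the value reported in the GoogLeNet paper, and the function name and signature are illustrative assumptions, not framework APIs.

```python
AUX_WEIGHT = 0.3  # per-auxiliary-loss weight, as reported in the GoogLeNet paper

def total_loss(main_loss, aux_losses, training=True):
    """Combine the main classifier loss with the auxiliary classifier losses.
    The auxiliary branches are discarded at test time, so only the main
    loss is used when training is False."""
    if not training:
        return main_loss
    return main_loss + AUX_WEIGHT * sum(aux_losses)

print(total_loss(1.0, [0.5, 0.5]))                  # 1.3 during training
print(total_loss(1.0, [0.5, 0.5], training=False))  # 1.0 at test time
```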
-
Question
-
How to count the number of multiplications in a CNN layer?
- Each value in the 28x28x32 output feature map is computed with 5x5x192 multiplications (one per weight in the filter)
- So the total number of multiplications is (28x28x32) x (5x5x192) ≈ 120M, where 28x28x32 is the shape of the output and 5x5x192 is the shape of one filter