1. Network in network
    1. Using 1x1 convolutions
    2. Usage: to increase or decrease the number of channels (the depth of a feature map) without changing its spatial size
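A 1x1 convolution is just a per-pixel linear map across channels, which is why it can change the depth but not the spatial size. A minimal numpy sketch (the shapes here are illustrative assumptions):

```python
import numpy as np

# Input feature map (H, W, C_in) and 32 filters of shape 1x1x192,
# written as a single (C_in, C_out) matrix.
x = np.random.rand(28, 28, 192)
w = np.random.rand(192, 32)

# The same linear map is applied independently at every pixel,
# reducing the channel dimension from 192 to 32.
y = x @ w
print(y.shape)  # (28, 28, 32)
```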
  2. Bottleneck layer
    1. Advantages: saves computational cost
    2. Computational cost: measured by the number of multiplications
    3. Without a bottleneck layer, total number of multiplications = 120M
    4. With a bottleneck layer, total number of multiplications = 2.4M + 10M = 12.4M (around 10% of the above)
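The 120M and 12.4M figures can be reproduced directly. This sketch assumes the standard example shapes (a 28x28x192 input, a 1x1 bottleneck down to 16 channels, then a 5x5 convolution up to 32 channels), which are the shapes consistent with the 2.4M + 10M breakdown above:

```python
# Multiplications = (output volume) x (multiplications per output value).

# Direct 5x5 convolution: 28x28x192 input -> 28x28x32 output.
direct = (28 * 28 * 32) * (5 * 5 * 192)
print(direct)  # 120422400, ~120M

# With a bottleneck: 1x1 conv down to 16 channels (~2.4M),
# then a 5x5 conv on the reduced volume up to 32 channels (~10M).
reduce_step = (28 * 28 * 16) * (1 * 1 * 192)
conv_step = (28 * 28 * 32) * (5 * 5 * 16)
print(reduce_step + conv_step)  # 12443648, ~12.4M
```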
  3. Inception V1
    1. Background
      1. CNN has a standard structure
        1. stacked convolutional layers (optionally followed by contrast normalization and maxpooling) are followed by one or more fully-connected layers
      2. Network in network
      3. Object Detection: R-CNN
    2. Problem
      1. Increasing the model's size for better performance => two major drawbacks
        1. The model is more prone to overfitting
          1. Some of the 1000 classes of the ILSVRC 2014 classification challenge are visually very similar and hard to distinguish
          2. e.g. Siberian husky vs. Eskimo dog
        2. Dramatically increased use of computational resources
          1. If the added capacity is used inefficiently (for example, if most weights end up close to zero), then a lot of computation is wasted.
    3. Contribution
      1. Classification performance on ILSVRC 2014 (top-5 error)
        1. GoogLeNet (1st place): 6.67%
        2. VGG (2nd place): 7.32%
    4. Dataset
      1. ImageNet 1000 classes
        1. training: 1.28 million images
        2. validation: 50k images
        3. test: 100k images
    5. Architecture
      1. # Parameters: around 6.8M
      2. Inception module with dimension reductions
      3. Graph visualization of inception (3a)
      4. Auxiliary classifiers
        1. Connected to intermediate layers
          1. Encourage discrimination in the lower stages of the network
          2. Increase the gradient signal that gets propagated back
          3. Provide additional regularization
        2. Used only during training to help learn a better model; removed at test time
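During training, the auxiliary classifiers' losses are added to the main loss with a small weight (0.3 in the GoogLeNet paper); at test time only the main classifier remains. A minimal sketch with hypothetical loss values:

```python
def total_loss(main_loss, aux_losses, aux_weight=0.3):
    """Training objective with auxiliary classifiers: the auxiliary
    losses are added with a small weight; at test time they are
    simply dropped and only main_loss is used."""
    return main_loss + aux_weight * sum(aux_losses)

# Hypothetical losses for the main head and two auxiliary heads.
print(total_loss(2.0, [1.0, 1.0]))  # 2.6
```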
  4. Question
    1. How to compute the number of multiplications in a CNN?
      1. Each value in the 28x28x32 output (feature map) is computed with 5x5x192 multiplications.
      2. So the total number of multiplications is (28x28x32) x (5x5x192), where 28x28x32 is the shape of the output and 5x5x192 is the shape of one filter.
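The rule above generalizes to any convolutional layer; as a small helper (the function name is my own):

```python
def conv_multiplications(out_h, out_w, out_c, k_h, k_w, in_c):
    # Each of the out_h * out_w * out_c output values costs
    # k_h * k_w * in_c multiplications (one per filter weight).
    return (out_h * out_w * out_c) * (k_h * k_w * in_c)

# The 5x5 conv example from the bottleneck discussion above.
print(conv_multiplications(28, 28, 32, 5, 5, 192))  # 120422400, ~120M
```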