  1. Applications
    1. Images
      1. Image classification
      2. Image semantic segmentation
      3. Image retrieval
      4. Object detection
    2. Language
      1. Text classification
    3. Software mining
      1. Software flaw detection
  2. History
    1. Receptive field in a single neuron, 1959
    2. Neocognitron, about 1980
    3. Gradient-based CNN for hand-written character recognition, 1998
    4. AlexNet in ImageNet competition, 2012
  3. Hierarchical structure
    1. Physical components
      1. Layers of neurons
      2. Learnable weights
      3. Learnable biases
    2. Input: raw data
      1. Types
        1. RGB images
        2. Raw audio data
      2. Properties
        1. 3D tensor (see the sketch below)
          1. H: height (rows)
          2. W: width (columns)
          3. 3: color channels
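A minimal NumPy sketch of the input convention above; the 224x224 resolution and the random values are placeholders, not part of the notes:

```python
import numpy as np

# An RGB image is a 3D tensor of shape (H, W, 3):
# H rows (height), W columns (width), and 3 color channels.
H, W = 224, 224                   # hypothetical input resolution
image = np.random.rand(H, W, 3)   # stand-in for real pixel data in [0, 1]
print(image.shape)                # (224, 224, 3)
```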
    3. Feed-forward
      1. ConvNet layers
        1. Convolutional layer
          1. Objective: Convolve the filters with the input
          2. Physical components
            1. Input: 3D volume (W * W * depth)
            2. Parameters: a set of learnable 3D filters/kernels
            3. Output: 3D volume
              1. General size: [(W - F + 2P)/S + 1] per side * depth_out
              2. Unchanged spatial size: P = (F - 1)/2 and S = 1
          3. Jobs (see the sketch after this list)
            1. Local connectivity
              1. Receptive field (w.r.t. one neuron)
              2. Filters are 3D: width * height * depth (depth = input depth)
              3. Filter size (F) = receptive field size
              4. Entries in the filter + 1 bias = params for one neuron
            2. Spatial arrangement
              1. Depth: number of filters
              2. Stride (S): how we slide the filter
              3. Zero-padding (P): pad the input volume with zeros around the border
            3. Parameter sharing
              1. Objective: reduce the number of params
              2. Depth slice
              3. Same weights and bias shared per depth slice
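A minimal NumPy sketch tying the convolutional-layer items above together: each filter is applied over local receptive fields (local connectivity), the same weights and bias are reused at every spatial position (parameter sharing), and the output size follows (W - F + 2P)/S + 1. The function names and the 32x32x3 example are illustrative assumptions, not a definitive implementation:

```python
import numpy as np

def conv_output_size(W, F, P, S):
    """Spatial output size from the formula above: (W - F + 2P)/S + 1."""
    return (W - F + 2 * P) // S + 1

def conv2d(x, filters, biases, stride=1, pad=0):
    """Naive convolutional layer over a 3D input volume.

    x:       input volume of shape (W, W, depth_in)
    filters: learnable kernels of shape (K, F, F, depth_in),
             one F*F*depth_in filter per output depth slice
    biases:  one bias per filter, shape (K,)
    """
    K, F, _, depth_in = filters.shape
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out_w = conv_output_size(x.shape[0] - 2 * pad, F, pad, stride)
    out = np.zeros((out_w, out_w, K))
    for k in range(K):                          # each filter -> one depth slice
        for i in range(out_w):
            for j in range(out_w):
                r, c = i * stride, j * stride
                window = x[r:r + F, c:c + F, :]  # local receptive field
                # same weights/bias at every position: parameter sharing
                out[i, j, k] = np.sum(window * filters[k]) + biases[k]
    return out

# Example: 32x32x3 input, four 5x5 filters; P = (F-1)/2 with S = 1
# keeps the spatial size unchanged at 32x32.
x = np.random.rand(32, 32, 3)
filters = np.random.randn(4, 5, 5, 3)
out = conv2d(x, filters, np.zeros(4), stride=1, pad=2)
print(out.shape)   # (32, 32, 4)
```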
        2. Pooling layer
          1. Objectives
            1. Progressively reduce the spatial size of the representation
            2. Reduce the amount of params
            3. Reduce the computation in the network
            4. Control overfitting
          2. Job (see the sketch after this list)
            1. Accept a volume of size (W1 * H1 * D1)
            2. Spatial extent: F
            3. Stride: S
            4. Produce a volume of size (W2 * H2 * D2)
              1. W2 = (W1 - F)/S + 1
              2. H2 = (H1 - F)/S + 1
              3. D2 = D1
          3. Types
            1. (Common) Max pooling
              1. (Common) F = 2, S = 2
              2. F = 3, S = 2: overlapping pooling
            2. Average pooling
            3. L2-norm pooling
          4. Features
            1. No additional params introduced
            2. Not common to pad the input with zero-padding
            3. Gradient routing is efficient in backprop (only the max element gets the gradient)
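A minimal NumPy sketch of max pooling under the common F = 2, S = 2 setting; the function name and shapes are illustrative. Note that no learnable parameters are introduced, and in backprop the gradient is routed only to the max element of each window:

```python
import numpy as np

def max_pool(x, F=2, S=2):
    """Max pooling over each depth slice independently (D2 = D1).

    Output spatial size follows the formulas above:
    W2 = (W1 - F)/S + 1, H2 = (H1 - F)/S + 1.
    """
    W1, H1, D1 = x.shape
    W2, H2 = (W1 - F) // S + 1, (H1 - F) // S + 1
    out = np.zeros((W2, H2, D1))
    for i in range(W2):
        for j in range(H2):
            window = x[i * S:i * S + F, j * S:j * S + F, :]
            out[i, j] = window.max(axis=(0, 1))   # max per depth slice
    return out

x = np.random.rand(32, 32, 4)
print(max_pool(x, F=2, S=2).shape)   # (16, 16, 4) with the common F=2, S=2
```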
        3. Last layer (fully-connected): objective function
        4. Non-linear activation function (see the sketch below)
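A minimal sketch of the two items above, assuming ReLU as the non-linearity and softmax cross-entropy as the last-layer objective (one common choice, not the only one); all names and shapes here are illustrative:

```python
import numpy as np

def relu(x):
    """Common non-linear activation: elementwise max(0, x)."""
    return np.maximum(0, x)

def fc_softmax_loss(features, W, b, label):
    """Last fully-connected layer feeding a softmax cross-entropy objective.

    features: flattened activations from the conv/pool stack, shape (D,)
    W, b:     learnable weights (D, C) and biases (C,) for C classes
    label:    index of the correct class
    """
    scores = features @ W + b
    scores -= scores.max()                    # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[label])              # cross-entropy loss

feats = relu(np.random.randn(128))
W, b = np.random.randn(128, 10) * 0.01, np.zeros(10)
print(fc_softmax_loss(feats, W, b, label=3))
```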
      2. ConvNet architectures
    4. Backpropagation
  4. Optimizing methods (see the update-rule sketch after this list)
    1. Gradient descent
    2. Batch gradient descent
    3. Stochastic gradient descent
    4. (Common) Mini-batch SGD
    5. Momentum
    6. RMSProp
    7. Adam
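Single-step update rules for the methods above, as a minimal NumPy sketch; the hyperparameter values (learning rate, mu, decay, betas) are common defaults, not prescriptions. In mini-batch SGD, g would be the gradient averaged over one mini-batch rather than the full dataset (batch GD) or a single example (SGD):

```python
import numpy as np

lr = 1e-3   # learning rate, a hypothetical default

def sgd_step(w, g):
    """Vanilla gradient descent: step against the gradient."""
    return w - lr * g

def momentum_step(w, g, v, mu=0.9):
    """Momentum: velocity accumulates an exponential average of past gradients."""
    v = mu * v - lr * g
    return w + v, v

def rmsprop_step(w, g, cache, decay=0.99, eps=1e-8):
    """RMSProp: scale the step by a running average of squared gradients."""
    cache = decay * cache + (1 - decay) * g**2
    return w - lr * g / (np.sqrt(cache) + eps), cache

def adam_step(w, g, m, v, t, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: momentum-like first moment + RMSProp-like second moment."""
    m = b1 * m + (1 - b1) * g            # first moment
    v = b2 * v + (1 - b2) * g**2         # second moment
    m_hat = m / (1 - b1**t)              # bias correction (t = step count)
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Usage example with dummy values.
w, g = np.zeros(3), np.array([0.1, -0.2, 0.3])
m, v = np.zeros(3), np.zeros(3)
w, m, v = adam_step(w, g, m, v, t=1)
print(w)
```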
  5. Hyperparameters