  1. Applications
    1. Images
      1. Image classification
      2. Image semantic segmentation
      3. Image retrieval
      4. Object detection
    2. Language
      1. Text classification
    3. Software mining
      1. Software flaw detection
  2. History
    1. Receptive field in a single neuron, 1959
    2. Neocognitron, about 1980
    3. Gradient-based CNN for hand-written character recognition, 1998
    4. AlexNet in ImageNet competition, 2012
  3. Hierarchical structure
    1. Physical components
      1. Layers of neurons
      2. Learnable weights
      3. Learnable biases
    2. Input: raw data
      1. Types
        1. RGB images
        2. Raw audio data
      2. Properties
        1. 3D tensor (see the sketch below)
          1. H: height (rows)
          2. W: width (columns)
          3. 3: color channels
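A minimal NumPy sketch of the input convention above; the 224x224 resolution and the random values are placeholders, not part of the notes:

```python
import numpy as np

# An RGB image is a 3D tensor of shape (H, W, 3):
# H rows (height), W columns (width), and 3 color channels.
H, W = 224, 224                   # hypothetical input resolution
image = np.random.rand(H, W, 3)   # stand-in for real pixel data in [0, 1]
print(image.shape)                # (224, 224, 3)
```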
    3. Feed-forward
      1. ConvNet layers
        1. Convolutional layer
          1. Objective: Convolve the filters with the input
          2. Physical components
            1. Input: 3D volume (W * W * depth)
            2. Parameters: a set of learnable 3D filters/kernels
            3. Output: 3D volume
              1. General size: [(W - F + 2P)/S + 1] per side * depth_out
              2. Unchanged spatial size: P = (F - 1)/2 and S = 1
          3. Jobs (see the sketch after this list)
            1. Local connectivity
              1. Receptive field (w.r.t. one neuron)
              2. Filters are 3D: width * height * depth (depth = input depth)
              3. Filter size (F) = receptive field size
              4. Entries in the filter + 1 bias = params for one neuron
            2. Spatial arrangement
              1. Depth: number of filters
              2. Stride (S): how we slide the filter
              3. Zero-padding (P): pad the input volume with zeros around the border
            3. Parameter sharing
              1. Objective: reduce the number of params
              2. Depth slice
              3. Same weights and bias shared per depth slice
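A minimal NumPy sketch tying the convolutional-layer items above together: each filter is applied over local receptive fields (local connectivity), the same weights and bias are reused at every spatial position (parameter sharing), and the output size follows (W - F + 2P)/S + 1. The function names and the 32x32x3 example are illustrative assumptions, not a definitive implementation:

```python
import numpy as np

def conv_output_size(W, F, P, S):
    """Spatial output size from the formula above: (W - F + 2P)/S + 1."""
    return (W - F + 2 * P) // S + 1

def conv2d(x, filters, biases, stride=1, pad=0):
    """Naive convolutional layer over a 3D input volume.

    x:       input volume of shape (W, W, depth_in)
    filters: learnable kernels of shape (K, F, F, depth_in),
             one F*F*depth_in filter per output depth slice
    biases:  one bias per filter, shape (K,)
    """
    K, F, _, depth_in = filters.shape
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out_w = conv_output_size(x.shape[0] - 2 * pad, F, pad, stride)
    out = np.zeros((out_w, out_w, K))
    for k in range(K):                          # each filter -> one depth slice
        for i in range(out_w):
            for j in range(out_w):
                r, c = i * stride, j * stride
                window = x[r:r + F, c:c + F, :]  # local receptive field
                # same weights/bias at every position: parameter sharing
                out[i, j, k] = np.sum(window * filters[k]) + biases[k]
    return out

# Example: 32x32x3 input, four 5x5 filters; P = (F-1)/2 with S = 1
# keeps the spatial size unchanged at 32x32.
x = np.random.rand(32, 32, 3)
filters = np.random.randn(4, 5, 5, 3)
out = conv2d(x, filters, np.zeros(4), stride=1, pad=2)
print(out.shape)   # (32, 32, 4)
```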
        2. Pooling layer
          1. Objectives
            1. Progressively reduce the spatial size of the representation
            2. Reduce the amount of params
            3. Reduce the computation in the network
            4. Control overfitting
          2. Job (see the sketch after this list)
            1. Accept a volume of size (W1 * H1 * D1)
            2. Spatial extent: F
            3. Stride: S
            4. Produce a volume of size (W2 * H2 * D2)
              1. W2 = (W1 - F)/S + 1
              2. H2 = (H1 - F)/S + 1
              3. D2 = D1
          3. Types
            1. (Common) Max pooling
              1. (Common) F = 2, S = 2
              2. F = 3, S = 2: overlapping pooling
            2. Average pooling
            3. L2-norm pooling
          4. Features
            1. No additional params introduced
            2. Not common to pad the input with zero-padding
            3. Gradient routing is efficient in backprop (only the max element gets the gradient)
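A minimal NumPy sketch of max pooling under the common F = 2, S = 2 setting; the function name and shapes are illustrative. Note that no learnable parameters are introduced, and in backprop the gradient is routed only to the max element of each window:

```python
import numpy as np

def max_pool(x, F=2, S=2):
    """Max pooling over each depth slice independently (D2 = D1).

    Output spatial size follows the formulas above:
    W2 = (W1 - F)/S + 1, H2 = (H1 - F)/S + 1.
    """
    W1, H1, D1 = x.shape
    W2, H2 = (W1 - F) // S + 1, (H1 - F) // S + 1
    out = np.zeros((W2, H2, D1))
    for i in range(W2):
        for j in range(H2):
            window = x[i * S:i * S + F, j * S:j * S + F, :]
            out[i, j] = window.max(axis=(0, 1))   # max per depth slice
    return out

x = np.random.rand(32, 32, 4)
print(max_pool(x, F=2, S=2).shape)   # (16, 16, 4) with the common F=2, S=2
```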
        3. Last layer (fully-connected): objective function
        4. Non-linear activation function (see the sketch below)
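A minimal sketch of the two items above, assuming ReLU as the non-linearity and softmax cross-entropy as the last-layer objective (one common choice, not the only one); all names and shapes here are illustrative:

```python
import numpy as np

def relu(x):
    """Common non-linear activation: elementwise max(0, x)."""
    return np.maximum(0, x)

def fc_softmax_loss(features, W, b, label):
    """Last fully-connected layer feeding a softmax cross-entropy objective.

    features: flattened activations from the conv/pool stack, shape (D,)
    W, b:     learnable weights (D, C) and biases (C,) for C classes
    label:    index of the correct class
    """
    scores = features @ W + b
    scores -= scores.max()                    # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[label])              # cross-entropy loss

feats = relu(np.random.randn(128))
W, b = np.random.randn(128, 10) * 0.01, np.zeros(10)
print(fc_softmax_loss(feats, W, b, label=3))
```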
      2. ConvNet architectures
    4. Backpropagation
  4. Optimizing methods (see the update-rule sketch after this list)
    1. Gradient descent
    2. Batch gradient descent
    3. Stochastic gradient descent
    4. (Common) Mini-batch SGD
    5. Momentum
    6. RMSProp
    7. Adam
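Single-step update rules for the methods above, as a minimal NumPy sketch; the hyperparameter values (learning rate, mu, decay, betas) are common defaults, not prescriptions. In mini-batch SGD, g would be the gradient averaged over one mini-batch rather than the full dataset (batch GD) or a single example (SGD):

```python
import numpy as np

lr = 1e-3   # learning rate, a hypothetical default

def sgd_step(w, g):
    """Vanilla gradient descent: step against the gradient."""
    return w - lr * g

def momentum_step(w, g, v, mu=0.9):
    """Momentum: velocity accumulates an exponential average of past gradients."""
    v = mu * v - lr * g
    return w + v, v

def rmsprop_step(w, g, cache, decay=0.99, eps=1e-8):
    """RMSProp: scale the step by a running average of squared gradients."""
    cache = decay * cache + (1 - decay) * g**2
    return w - lr * g / (np.sqrt(cache) + eps), cache

def adam_step(w, g, m, v, t, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: momentum-like first moment + RMSProp-like second moment."""
    m = b1 * m + (1 - b1) * g            # first moment
    v = b2 * v + (1 - b2) * g**2         # second moment
    m_hat = m / (1 - b1**t)              # bias correction (t = step count)
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Usage example with dummy values.
w, g = np.zeros(3), np.array([0.1, -0.2, 0.3])
m, v = np.zeros(3), np.zeros(3)
w, m, v = adam_step(w, g, m, v, t=1)
print(w)
```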
  5. Hyperparameters