Single-Stage Object Detection

Head
1. one-stage
  1. anchor-based
    1. (2016) YOLOv2 and later
      1. [[tx, ty, tw, th, to] + C] * K per grid cell (K anchors, C classes)
    2. (2016) SSD
      1. Anchor boxes on multi-scale feature maps
    3. (2017) RetinaNet
      1. Focal Loss + FPN
  2. anchor-free
    1. center-based
      1. (2015) YOLOv1
        1. [[x, y, w, h, o] * B + C] * cell_number
      2. (2019) FCOS
        1. [l, t, r, b] + [C, centerness]
    2. keypoint-based
      1. (2019) CenterNet
      2. (2018) CornerNet
2. two-stage
  1. (2015) Faster R-CNN (anchor-based)
  2. (2019) RepPoints (anchor-free)
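
The output layouts listed for the heads above can be checked with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the example settings (a 7x7 grid with B=2 and C=20 for YOLOv1, K=5 anchors with C=20 for YOLOv2, C=80 for FCOS) are assumed typical configurations, not values stated in this outline.

```python
def yolov1_output_size(grid_size: int, boxes_per_cell: int, num_classes: int) -> int:
    """Total outputs of a YOLOv1-style head: [[x, y, w, h, o] * B + C] * cell_number."""
    cell_number = grid_size * grid_size
    return (5 * boxes_per_cell + num_classes) * cell_number


def yolov2_channels_per_cell(num_anchors: int, num_classes: int) -> int:
    """Channels per grid cell of a YOLOv2-style head: [[tx, ty, tw, th, to] + C] * K."""
    return (5 + num_classes) * num_anchors


def fcos_channels_per_location(num_classes: int) -> int:
    """Channels per location of an FCOS-style head: 4 box distances + C scores + 1 centerness."""
    return 4 + num_classes + 1


print(yolov1_output_size(7, 2, 20))      # 1470
print(yolov2_channels_per_cell(5, 20))   # 125
print(fcos_channels_per_location(80))    # 85
```
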
Neck
1. (2016) FPN
  1. multi-scale feature fusion
  2. divide-and-conquer
2. (2018) PANet
  1. all scales matter for objects of different sizes
3. (2019) NAS-FPN, BiFPN
  1. repeated feature fusion
4. (2021) YOLOF
  1. Uniform Matching
  2. dilated convolutions
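
As a rough illustration of the multi-scale feature fusion listed under FPN, the sketch below runs a top-down pathway on single-channel maps stored as nested lists. It is a simplification under stated assumptions: the 1x1 lateral and 3x3 output convolutions of a real FPN are omitted, each level is assumed to be exactly half the size of the previous one, and all function names are hypothetical.

```python
def upsample_nearest_2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2-D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fusion(levels):
    """Fuse maps ordered fine -> coarse; returns fused maps in the same order."""
    fused = [levels[-1]]                           # coarsest level passes through
    for lateral in reversed(levels[:-1]):
        fused.append(add_maps(lateral, upsample_nearest_2x(fused[-1])))
    return fused[::-1]


c3 = [[1] * 4 for _ in range(4)]   # fine level, 4x4
c4 = [[10] * 2 for _ in range(2)]  # middle level, 2x2
c5 = [[100]]                       # coarse level, 1x1
p3, p4, p5 = top_down_fusion([c3, c4, c5])
print(p4)        # [[110, 110], [110, 110]]
print(p3[0])     # [111, 111, 111, 111]
```
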
Backbone
1. ResNet, ResNeXt
  1. first choice as baseline
2. EfficientNet
  1. model scaling
3. HRNet
  1. high-resolution features for localization tasks
4. MobileNet, ShuffleNet
  1. lightweight, for mobile devices
5. Details
  1. Normalization
    1. Batch Norm.
      1. CmBN
      2. Frozen BN
      3. SyncBN
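      4. NFNet: removes BN entirely (normalizer-free)
    2. Layer Norm.
      1. normalizes over the feature dimensions of each sample
      2. removes the dependency on batch statistics
    3. Group Norm.
      1. typically performs better than Layer Norm. on vision tasks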
    4. Weight Standardization
  2. Activation
    1. ReLU and variants
      1. Leaky ReLU, Swish (SiLU), GELU
  3. Convolution operators
    1. dilated convolution
      1. enlarges the receptive field while preserving the feature-map size
    2. depth-wise separable convolution
    3. 1x1 convolution
    4. deconvolution (transposed convolution)
    5. stride
    6. padding
6. speed-accuracy trade-off
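
The interplay of kernel size, stride, padding, and dilation listed under the convolution operators can be made concrete with the standard output-size arithmetic. The sketch below is a minimal helper with hypothetical function names; it shows that a dilated 3x3 convolution keeps the feature-map size when the padding equals the dilation, and that a transposed convolution can undo a stride-2 downsampling.

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    effective_kernel = dilation * (kernel - 1) + 1  # footprint of the dilated kernel
    return (size + 2 * padding - effective_kernel) // stride + 1


def deconv_output_size(size, kernel, stride=1, padding=0, dilation=1, output_padding=0):
    """Spatial output size of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


print(conv_output_size(32, 3, padding=1))               # 32 (plain 3x3, "same" padding)
print(conv_output_size(32, 3, padding=2, dilation=2))   # 32 (dilated, size preserved)
print(conv_output_size(32, 3, stride=2, padding=1))     # 16 (strided downsampling)
print(conv_output_size(32, 1))                          # 32 (1x1 convolution)
print(deconv_output_size(16, 4, stride=2, padding=1))   # 32 (2x upsampling)
```
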
Bag of Specials
1. attention mechanism
  1. channel-wise
    1. Squeeze and Excitation
  2. point-wise
    1. Spatial Attention Module
2. feature integration
  1. addition
    1. skip-connection
  2. concatenation
3. enlarge receptive field
  1. dilated convolutions
  2. downsampling
  3. large kernel
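
To illustrate the channel-wise attention entry above, here is a minimal sketch of a Squeeze-and-Excitation style gate on a feature map stored as nested lists. The weights `w1` and `w2` are made-up illustrative values, biases are omitted, and the function name is hypothetical; a real block would learn these parameters.

```python
import math


def squeeze_excite(x, w1, w2):
    """x: feature map as [C][H][W]; w1: [C_reduced][C]; w2: [C][C_reduced]."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: reweight every channel by its gate.
    out = [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]
    return out, gates


x = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0
     [[3.0, 3.0], [3.0, 3.0]]]   # channel 1
w1 = [[0.5, 0.5]]                # reduce 2 channels to 1 (illustrative values)
w2 = [[1.0], [-1.0]]             # expand back to 2 channels (illustrative values)
out, gates = squeeze_excite(x, w1, w2)
print(gates)  # channel 0 is emphasized, channel 1 is suppressed
```
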
Bag of Freebies
1. style transfer
2. MixUp, CutMix
3. photometric distortion
4. geometric distortion
5. Dropout / DropBlock
6. label smoothing
7. training tricks
8. Augmentation
9. Regularization
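
Two of the freebies above, MixUp and label smoothing, reduce to a few lines of arithmetic. The sketch below uses the classification-style formulation on plain lists; the function names are hypothetical, `eps=0.1` is an assumed common default, and the mixing weight is fixed for reproducibility (in practice it is usually sampled, for example with `random.betavariate(alpha, alpha)`).

```python
def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: move eps of the probability mass to a uniform distribution."""
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]


def mixup(x1, y1, x2, y2, lam):
    """MixUp: convex combination of two inputs and of their label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


print(smooth_labels([0, 1, 0, 0]))  # approximately [0.025, 0.925, 0.025, 0.025]

x, y = mixup([0.0, 2.0], [1, 0], [4.0, 6.0], [0, 1], lam=0.75)
print(x, y)  # inputs and labels are blended with the same weight
```
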
Multi-task Loss
1. classification
  1. cross-entropy (CE)
    1. BCE (+ sigmoid)
      1. H(P, Q) = -(P(class0) * log(Q(class0)) + P(class1) * log(Q(class1)))
    2. softmax + NLL
      1. not suitable for open-set problems; see AM-Softmax
  2. KL-divergence
  3. Focal Loss
    1. GFL
  4. Seesaw Loss
  5. class imbalance
2. localization
  1. MSE, SSE
  2. IoU (GIoU, DIoU, CIoU)
3. combination
  1. (weighted) sum
  2. Generalized Focal Loss (GFL)
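
The loss terms in this section can be sketched for a single prediction as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, `alpha=0.25` and `gamma=2.0` are the commonly used Focal Loss defaults, boxes are assumed to be `(x1, y1, x2, y2)` tuples with positive area, and `box_weight` is an assumed hyperparameter of the weighted sum.

```python
import math


def binary_cross_entropy(p, y):
    """BCE for one prediction: p is the predicted probability of class 1, y is 0 or 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal Loss: BCE scaled by (1 - p_t)^gamma so easy examples contribute less."""
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return iou, iou - (enclose - union) / enclose


def detection_loss(cls_loss, box_loss, box_weight=1.0):
    """Weighted sum of the classification and localization terms."""
    return cls_loss + box_weight * box_loss


iou, giou = iou_and_giou((0, 0, 2, 2), (1, 1, 3, 3))
total = detection_loss(focal_loss(0.9, 1), 1.0 - giou, box_weight=2.0)
print(iou, giou, total)
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes: it becomes negative and approaches -1 as the boxes move apart, so `1 - GIoU` still provides a gradient signal.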
Positive/Negative samples
1. ATSS