  1. Introduction
    1. Types
      1. Supervised Learning
        1. Classification
          1. binary classification
          2. multiclass classification
        2. Regression
      2. Unsupervised Learning
      3. Reinforcement Learning
    2. Concepts
      1. Parametric vs non-parametric models
      2. The curse of dimensionality
      3. Overfitting
      4. Model selection
        1. cross validation (CV); see the code sketch below
      5. No free lunch theorem
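    Note: a minimal K-fold cross-validation sketch in Python (the synthetic data and the scikit-learn LinearRegression model are illustrative placeholders, not from the outline):

      import numpy as np
      from sklearn.linear_model import LinearRegression

      def kfold_cv_mse(X, y, k=5, seed=0):
          """Estimate generalization error (MSE) by K-fold cross validation."""
          rng = np.random.default_rng(seed)
          folds = np.array_split(rng.permutation(len(X)), k)
          errors = []
          for i in range(k):
              test = folds[i]
              train = np.concatenate([folds[j] for j in range(k) if j != i])
              model = LinearRegression().fit(X[train], y[train])
              errors.append(np.mean((model.predict(X[test]) - y[test]) ** 2))
          return np.mean(errors)  # average held-out error over the k folds

      X = np.random.randn(100, 3)
      y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)
      print(kfold_cv_mse(X, y))
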
  2. Probability
    1. Interpretations
      1. Frequentist
        1. probabilities represent long run frequencies of events
      2. Bayesian
        1. probability is used to quantify our uncertainty about something
        2. can quantify uncertainty about events that do not have long-run frequencies
    2. Concepts
      1. Discrete random variables
        1. Probability mass function, pmf
        2. state space
        3. indicator function
      2. Fundamental rules
        1. product rule
        2. sum rule
        3. Bayes rule
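        Note: the three rules written out in standard notation, for reference:

          product rule:  p(x, y) = p(x) p(y|x)
          sum rule:      p(x) = Σ_y p(x, y)
          Bayes rule:    p(y|x) = p(x|y) p(y) / Σ_y' p(x|y') p(y')
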
      3. Independence and conditional independence
      4. Continuous random variables
        1. cumulative distribution function, cdf
        2. probability density function, pdf
      5. Quantiles
      6. Mean and variance
    3. Some common discrete distributions
      1. Binomial
        1. Bin(n, θ)
      2. Bernoulli
        1. Ber(θ)
      3. Multinomial
        1. Mu(n, θ)
      4. Multinoulli
        1. Cat(θ)
      5. The empirical distribution
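      Note: a quick numerical check of the pmfs above using scipy.stats (scipy is an assumed dependency; parameter names follow scipy, not the book):

        from scipy.stats import bernoulli, binom, multinomial

        theta = 0.3
        print(bernoulli.pmf(1, theta))      # Ber(θ): P(X = 1) = 0.3
        print(binom.pmf(2, n=10, p=theta))  # Bin(n, θ): P(X = 2) for n = 10
        print(multinomial.pmf([2, 3, 5], n=10, p=[0.2, 0.3, 0.5]))  # Mu(n, θ)
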
    4. Some common continuous distributions
      1. Gaussian (normal) distribution
        1. N(μ, σ²)
      2. Laplace distribution
        1. Lap(μ, b)
      3. The gamma distribution
        1. Ga(a,b)
        2. gamma function, Γ(a)
      4. The beta distribution
        1. Beta(a, b)
      5. Pareto distribution
        1. Pareto(k, m)
        2. long tails
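      Note: the two densities used most later, written out for reference:

        Gaussian:  N(x|μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))
        Laplace:   Lap(x|μ, b) = (1 / 2b) exp(−|x − μ| / b)   (heavier tails than the Gaussian)
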
    5. Joint probability distributions
      1. Covariance and correlation
      2. Multivariate Gaussian, Multivariate Normal (MVN)
      3. Multivariate Student t distribution
      4. Dirichlet distribution
        1. Dir(x|α)
    6. Transformations of random variables
    7. Monte Carlo approximation
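      Note: a minimal Monte Carlo approximation in Python — estimate E[f(X)] by averaging f over samples; here f(x) = x² with X ~ N(0, 1), so the true value is 1:

        import numpy as np

        rng = np.random.default_rng(0)
        samples = rng.standard_normal(100_000)  # draws from N(0, 1)
        print(np.mean(samples ** 2))            # ≈ 1.0; error shrinks as O(1/√S) in the sample count S
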
    8. Information theory
      1. Entropy
        1. a measure of the random variable's uncertainty
      2. KL divergence/Relative Entropy
        1. a measure of the dissimilarity of two probability distributions
        2. Cross Entropy
      3. Mutual information
        1. Conditional Entropy
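      Note: entropy and KL divergence for discrete distributions, as a short numpy sketch (illustrative only):

        import numpy as np

        def entropy(p):
            p = np.asarray(p, float)
            p = p[p > 0]                    # treat 0 log 0 as 0
            return -np.sum(p * np.log2(p))  # in bits

        def kl(p, q):
            p, q = np.asarray(p, float), np.asarray(q, float)
            m = p > 0
            return np.sum(p[m] * np.log2(p[m] / q[m]))

        print(entropy([0.5, 0.5]))         # 1.0 bit: a fair coin is maximally uncertain
        print(kl([0.5, 0.5], [0.9, 0.1]))  # > 0; note KL(p‖q) ≠ KL(q‖p) in general
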
  3. Generative Models for Discrete Data
    1. Bayesian concept learning
      1. Likelihood
      2. Prior
      3. Posterior
      4. MLE
      5. MAP
    2. The beta-binomial model (update formula below)
    3. The Dirichlet-multinomial model
    4. Naive Bayes classifiers
      1. Feature selection using mutual information
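    Note: the conjugate update at the heart of the beta-binomial model, written out — with prior Beta(θ|a, b) and N₁ heads / N₀ tails observed:

      posterior:       p(θ|D) = Beta(θ | a + N₁, b + N₀)
      posterior mean:  E[θ|D] = (a + N₁) / (a + b + N₁ + N₀)
      MAP estimate:    θ̂ = (a + N₁ − 1) / (a + b + N₁ + N₀ − 2)
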
  4. Gaussian models
  5. Bayesian statistics
  6. Frequentist statistics
  7. Linear regression
  8. Logistic Regression
  9. Generalized linear models and the exponential family
  10. Directed graphical models (Bayes nets)
  11. Mixture models and the EM algorithm
  12. Latent linear models
  13. Sparse linear models
    1. feature selection / sparsity
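    Note: sparsity via ℓ1 regularization, as a short scikit-learn sketch (the synthetic data and the alpha value are illustrative):

      import numpy as np
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(0)
      X = rng.standard_normal((200, 10))
      w_true = np.zeros(10)
      w_true[:3] = [2.0, -1.5, 0.8]  # only 3 of 10 features matter
      y = X @ w_true + 0.1 * rng.standard_normal(200)

      model = Lasso(alpha=0.1).fit(X, y)
      print(model.coef_)  # the 7 irrelevant coefficients are driven to ~0
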
  14. Kernels
    1. Introduction
      1. it is not always clear how best to represent some kinds of objects as fixed-size feature vectors
      2. deep learning
        1. define a generative model for the data, and use the inferred latent representation and/or the parameters of the model as features
      3. kernel function
        1. measures the similarity between objects without requiring them to be preprocessed into feature-vector form
    2. Support vector machines (SVMs)
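    Note: a kernel in code — the RBF kernel compares objects purely through a similarity function k(x, x') (numpy-only sketch):

      import numpy as np

      def rbf_kernel(X1, X2, ell=1.0):
          """k(x, x') = exp(−‖x − x'‖² / (2 ell²))"""
          sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
          return np.exp(-sq / (2 * ell ** 2))

      X = np.random.randn(5, 3)
      K = rbf_kernel(X, X)  # 5×5 Gram matrix: symmetric, PSD, ones on the diagonal
      print(K.round(2))
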
  15. Gaussian processes
    1. Introduction
      1. previously we inferred p(θ|D); GPs infer p(f|D) directly
      2. Bayesian inference over functions themselves
      3. Gaussian processes or GPs
        1. defines a prior over functions, which can be converted into a posterior over functions once we have seen some data
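    Note: the "prior over functions" idea in code — under a GP prior, any finite set of function values is jointly Gaussian with covariance given by the kernel (numpy-only sketch):

      import numpy as np

      def rbf(x1, x2, ell=1.0):
          return np.exp(-(x1[:, None] - x2[None, :]) ** 2 / (2 * ell ** 2))

      x = np.linspace(-3, 3, 50)
      K = rbf(x, x) + 1e-8 * np.eye(50)  # jitter for numerical stability
      rng = np.random.default_rng(0)
      f = rng.multivariate_normal(np.zeros(50), K, size=3)
      print(f.shape)  # (3, 50): three random smooth functions drawn from the prior
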
  16. Adaptive basis function models
    1. adaptive basis-function model (ABM)
      1. dispense with kernels altogether, and try to learn useful features φ(x) directly from the input data
    2. Boosting
    3. Ensemble learning
  17. Markov and hidden Markov models
    1. probabilistic models for sequences of observations
    2. Markov models
    3. Hidden Markov models
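    Note: a first-order Markov chain in a few lines — the next state depends only on the current one (numpy-only sketch):

      import numpy as np

      # A[i, j] = p(z_t = j | z_{t-1} = i)
      A = np.array([[0.9, 0.1],
                    [0.3, 0.7]])
      rng = np.random.default_rng(0)
      z = [0]
      for _ in range(20):
          z.append(rng.choice(2, p=A[z[-1]]))
      print(z)

      # stationary distribution: left eigenvector of A for eigenvalue 1
      evals, evecs = np.linalg.eig(A.T)
      pi = np.real(evecs[:, np.argmax(np.real(evals))])
      print(pi / pi.sum())  # ≈ [0.75, 0.25]
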
  18. State space models
    1. state space model or SSM
      1. just like an HMM, except the hidden states are continuous
  19. Undirected graphical models (Markov random fields)
    1. Introduction
      1. undirected graphical model (UGM), also called a Markov random field (MRF) or Markov network
      2. Advantages
        1. they are symmetric and therefore more “natural” for certain domains
        2. discriminative UGMs, which define conditional densities of the form p(y|x), work better than discriminative DGMs
      3. Disadvantages
        1. the parameters are less interpretable and less modular
        2. parameter estimation is computationally more expensive
    2. Markov random field (MRF)
    3. Conditional random fields (CRFs)
    4. Structural SVMs
  20. Exact inference for graphical models
    1. Introduction
      1. forwards-backwards algorithm
      2. generalize these exact inference algorithms to arbitrary graphs
  21. Variational inference
    1. Introduction
      1. approximate inference methods
      2. variational inference
        1. reduces inference to an optimization problem
        2. often gives us the speed benefits of MAP estimation but the statistical benefits of the Bayesian approach
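    Note: the optimization problem, written out — variational inference picks a tractable q(z) maximizing the evidence lower bound (ELBO):

      log p(D) ≥ E_q[log p(D, z)] − E_q[log q(z)] = L(q)

    Maximizing L(q) is equivalent to minimizing KL(q(z) ‖ p(z|D)).
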
  22. More variational inference
  23. Monte Carlo inference
    1. Introduction
      1. Monte Carlo approximation
        1. generate some (unweighted) samples from the posterior
        2. compute any quantity of interest
      2. non-iterative methods (e.g., rejection and importance sampling)
      3. iterative methods (MCMC; next chapter)
  24. Markov chain Monte Carlo (MCMC) inference
    1. Gibbs sampling
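    Note: Gibbs sampling on a bivariate Gaussian with correlation ρ = 0.8, alternately sampling each coordinate from its exact conditional (numpy-only sketch):

      import numpy as np

      rho = 0.8
      rng = np.random.default_rng(0)
      x, y, samples = 0.0, 0.0, []
      for _ in range(10_000):
          x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))  # p(x | y)
          y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))  # p(y | x)
          samples.append((x, y))
      samples = np.array(samples)[1000:]   # discard burn-in
      print(np.corrcoef(samples.T)[0, 1])  # ≈ 0.8
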
  25. Clustering
    1. Introduction
      1. Clustering
        1. the process of grouping similar objects together
      2. flat clustering, also called partitional clustering
      3. hierarchical clustering
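    Note: flat clustering via Lloyd's k-means iterations, as a numpy sketch (ignores the empty-cluster edge case):

      import numpy as np

      def kmeans(X, k, iters=50, seed=0):
          rng = np.random.default_rng(seed)
          centers = X[rng.choice(len(X), k, replace=False)]
          for _ in range(iters):
              # assign each point to its nearest center ...
              d = ((X[:, None] - centers[None]) ** 2).sum(-1)
              labels = d.argmin(1)
              # ... then move each center to the mean of its points
              centers = np.array([X[labels == j].mean(0) for j in range(k)])
          return labels, centers

      X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
      labels, centers = kmeans(X, 2)
      print(centers.round(1))  # one center near (0, 0), one near (5, 5)
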
  26. Graphical model structure learning
  27. Latent variable models for discrete data
    1. Introduction
      1. symbols or tokens
      2. bag of words
    2. Distributed state LVMs for discrete data
    3. Latent Dirichlet allocation (LDA)
      1. Quantitatively evaluating LDA as a language model
        1. Perplexity (formula after this branch)
      2. Fitting using (collapsed) Gibbs sampling
      3. Fitting using batch variational inference
      4. Fitting using online variational inference
      5. Determining the number of topics
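      Note: the perplexity used to evaluate LDA as a language model, written out — for N held-out words w₁..w_N:

        perplexity = exp(−(1/N) Σ_i log p(w_i))

      Lower is better; a model that predicts uniformly over a vocabulary of size V has perplexity V.
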
    4. Extensions of LDA
      1. Correlated topic model
      2. Dynamic topic model
      3. LDA-HMM
      4. Supervised LDA
  28. Deep Learning
    1. Introduction
    2. Deep generative models
    3. Deep neural networks