1. View
    1. Control View
      1. Resource Reallocation
        1. Reinforcement Learning and Dynamic Programming Using Function Approximators
      2. Finite Horizon & Limited Lookahead Scheme
      3. Sensitivity & Policy Gradient
      4. BeT96
    2. Convergence
      1. Bor08
        1. Stochastic Approximation: A Dynamical Systems Viewpoint
    3. AI View
      1. SuB98
      2. Pow07
        1. Approximate Dynamic Programming: Solving the Curses of Dimensionality
  2. Method
    1. Direct Approximation
      1. Gor95
        1. Stable Function Approximation in Dynamic Programming
      2. LoS01
        1. Valuing American Options by Simulation: A Simple Least-Squares Approach
      3. OrS02
        1. Kernel-Based Reinforcement Learning
      4. EGW06
        1. Tree-Based Batch Mode Reinforcement Learning
    2. Simulation-Based DP
      1. TD(\lambda)
        1. BSA83
          1. Neuronlike Adaptive Elements that Can Solve Difficult Learning Control Problems
        2. Sut88
          1. Learning to Predict by the Methods of Temporal Differences
        3. Convergence
          1. GLH94
            1. Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems
          2. Day92
            1. The Convergence of TD(λ) for General λ
          3. JJS94
            1. On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
          4. Pin97
            1. Mean-Field Analysis for Batched TD(λ)
          5. TsV97
            1. Based on Contraction Property
      2. TD
        1. Sam59
          1. Some Studies in Machine Learning Using the Game of Checkers
        2. Sam67
          1. Some Studies in Machine Learning Using the Game of Checkers. II – Recent Progress
      3. LSTD(\lambda)
        1. BrB96
          1. Linear Least-Squares Algorithms for Temporal Difference Learning
        2. Boy02
          1. Technical Update: Least-Squares Temporal Difference Learning
        3. Convergence
          1. NeB03
            1. Least-Squares Policy Evaluation Algorithms with Linear Function Approximation
        4. regression/regularization
          1. WPB09
            1. Approximate Simulation-Based Solution of Large-Scale Least Squares Problems
      4. LSPE
        1. For SSP (Stochastic Shortest Path) Problems
        2. Tetris
          1. BeI96
        3. Convergence
          1. NeB03
            1. Least-Squares Policy Evaluation Algorithms with Linear Function Approximation
      5. Q-learning
        1. Wat89
          1. Learning from Delayed Rewards
        2. WaD92
        3. Convergence
          1. Tsi94
          2. Asynchronous Stochastic Approximation and Q-Learning
          3. ABB02
          4. Stochastic Approximation for Non-Expansive Maps: Q-Learning Algorithms
        4. BBS95
      6. Advantage Updating
        1. Bai93
          1. Advantage Updating
        2. Bai94
          1. Reinforcement Learning in Continuous Time: Advantage Updating
        3. Bai95
          1. Residual Algorithms: Reinforcement Learning with Function Approximation
        4. HBK94
          1. Advantage Updating Applied to a Differential Game
      7. Differential Training
        1. BerT97
      8. Comparison
        1. BBN04
          1. Improved Temporal Difference Methods with Linear Function Approximation
        2. YuB06b
          1. Convergence Results for Some Temporal Difference Methods Based on Least Squares
    3. Bellman Equation Error Approach
      1. ScS85
        1. Generalized Polynomial Approximations in Markovian Decision Problems
      2. HBK94
        1. A Reinforcement Learning Method for Maximizing Undiscounted Rewards
      3. OrS02
        1. Kernel-Based Reinforcement Learning
      4. SzS04
        1. Interpolation-Based Q-Learning
      5. ASM08
        1. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
      6. BHO08
        1. Approximate Dynamic Programming Using Support Vector Regression
      7. Sch10
        1. Should One Compute the Temporal Difference Fix Point or Minimize the Bellman Residual? The Unified Oblique Projection View
    4. Policy Gradient Method
      1. CaC97
        1. Perturbation Realization Potentials and Sensitivity Analysis of Markov Processes
      2. CaW98
        1. Algorithms for Sensitivity Analysis of Markov Systems Through Potentials and Perturbation Realization
      3. Cao99
        1. Single Sample Path Based Optimization of Markov Chains
      4. Cao05
        1. A Basic Formula for Online Policy Gradient Algorithms
      5. FuH94
        1. Smoothed Perturbation Analysis Derivative Estimation for Markov Chains
      6. Gly87
        1. Likelihood Ratio Gradient Estimation: An Overview
      7. JSJ95
        1. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems
      8. L'Ec91
      9. Wil92
        1. Simple Statistical Gradient Following Algorithms for Connectionist Reinforcement Learning
      10. MaT01
        1. Simulation-Based Optimization of Markov Reward Processes
      11. KoT99
        1. Actor-Critic Algorithms
      12. KoT03
        1. Actor-Critic Algorithms
      13. SMS99
        1. Policy Gradient Methods for Reinforcement Learning with Function Approximation
      14. Cost Approximation
        1. Cao04
          1. Learning and Optimization from a System Theoretic Perspective
        2. GrU04
          1. Reinforcement Learning in Large, High-Dimensional State Spaces
        3. He02
          1. Simulation-Based Algorithms for Markov Decision Processes
        4. HFM05
          1. A Two-Timescale Simulation-Based Gradient Algorithm for Weighted Cost Markov Decision Processes
        5. Kak02
          1. A Natural Policy Gradient
        6. Kon02
          1. Actor-Critic Algorithms
        7. KoB99
          1. Actor-Critic Like Learning Algorithms for Markov Decision Processes
        8. KoT99
          1. Actor-Critic Algorithms
        9. KoT03
          1. Actor-Critic Algorithms
        10. MaT01
          1. Simulation-Based Optimization of Markov Reward Processes
        11. MaT03
          1. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes
        12. SMS99
          1. Policy Gradient Methods for Reinforcement Learning with Function Approximation
        13. Wil92
          1. Simple Statistical Gradient Following Algorithms for Connectionist Reinforcement Learning
    5. Random Search
      1. cross-entropy method
        1. RuK04
          1. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization
        2. RuK08
          1. Simulation and the Monte Carlo Method
        3. BKM05
          1. A Tutorial on the Cross-Entropy Method
        4. Tetris
          1. SzL06
            1. Learning Tetris Using the Noisy Cross-Entropy Method
          2. ThS09
      2. CFH07
        1. Simulation-Based Algorithms for Markov Decision Processes
    6. Statistical Inference Method
      1. Att03
        1. Planning by Probabilistic Inference
      2. ToS06
        1. Probabilistic Inference for Solving Discrete and Continuous State Markov Decision Processes
      3. VeR06
        1. Planning and Acting in Uncertain Environments Using Probabilistic Inference
  3. Application
    1. VBL97
  4. POMDP
    1. Aggregation/Interpolation
      1. ZhL97
      2. ZhH01
      3. YuB04
    2. Finite State Controller
      1. Hau00
      2. PoB04
      3. YuB06a
      4. BaB01
      5. AbB00
    3. Actor-Critic
      1. Yu05
      2. SJJ94