View
Control View
Resource Reallocation
Reinforcement Learning and Dynamic Programming Using Function Approximators
Finite Horizon & Limited Lookahead Scheme
Sensitivity & Policy Gradient
BeT96
Neuro-Dynamic Programming
Convergence
Bor08
Stochastic Approximation: A Dynamical Systems Viewpoint
AI View
SuB98
Reinforcement Learning: An Introduction
Pow07
Approximate Dynamic Programming: Solving the Curses of Dimensionality
Method
Direct Approximation
Gor95
Stable Function Approximation in Dynamic Programming
LoS01
Valuing American Options by Simulation: A Simple Least-Squares Approach
OrS02
Kernel-Based Reinforcement Learning
EGW06
Tree-Based Batch Mode Reinforcement Learning
Simulation-Based DP
TD(λ) (sketch below)
BSA83
Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems
Sut88
Learning to Predict by the Methods of Temporal Differences
Convergence
GLH94
Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems
Day92
The Convergence of TD(λ) for General λ
JJS94
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Pin97
Mean-Field Theory for Batched TD(λ)
TsV97
Based on Contraction Property
TD
Sam59
Some Studies in Machine Learning Using the Game of Checkers
Sam67
Some Studies in Machine Learning Using the Game of Checkers. II – Recent Progress
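Since this node only names the method, here is a minimal Python sketch of tabular TD(λ) policy evaluation with accumulating eligibility traces, in the spirit of Sut88; the 5-state random-walk chain and all step-size/trace constants are illustrative assumptions, not taken from the cited papers.

Sketch (Python):

import random

def td_lambda(n_states=5, episodes=2000, alpha=0.05, gamma=1.0, lam=0.8):
    V = [0.0] * n_states                     # value estimates
    for _ in range(episodes):
        e = [0.0] * n_states                 # eligibility traces
        s = n_states // 2                    # start in the middle
        while True:
            s2 = s + random.choice([-1, 1])  # random-walk policy
            done = s2 < 0 or s2 >= n_states  # both edges terminate
            r = 1.0 if s2 >= n_states else 0.0   # reward only at the right edge
            delta = r + (0.0 if done else gamma * V[s2]) - V[s]  # TD error
            e[s] += 1.0                      # accumulating trace
            for i in range(n_states):
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam          # decay all traces
            if done:
                break
            s = s2
    return V

print([round(v, 2) for v in td_lambda()])    # roughly (i+1)/6 for state i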
LSTD(λ) (sketch below)
BrB96
Linear Least-Squares Algorithms for Temporal Difference Learning
Boy02
Technical Update: Least-Squares Temporal Difference Learning
Convergence
NeB03
Least-Squares Policy Evaluation Algorithms with Linear Function Approximation
regression/regularization
WPB09
Approximate Simulation-Based Solution of Large-Scale Least Squares Problems
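A minimal sketch of batch LSTD(0) with linear features, in the spirit of BrB96 and Boy02: accumulate A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ r·φ(s) from sample transitions, then solve A w = b. The toy 3-state cycle, one-hot features, and the small ridge term (a nod to the regression/regularization node above) are illustrative assumptions.

Sketch (Python):

import numpy as np

def lstd(transitions, phi, gamma=0.95, reg=1e-6):
    k = phi(transitions[0][0]).shape[0]
    A = reg * np.eye(k)                  # small ridge term for invertibility
    b = np.zeros(k)
    for s, r, s2 in transitions:
        f, f2 = phi(s), phi(s2)
        A += np.outer(f, f - gamma * f2)
        b += r * f
    return np.linalg.solve(A, b)

phi = lambda s: np.eye(3)[s]             # one-hot features on a 3-state cycle
data = [(s, 1.0 if s == 2 else 0.0, (s + 1) % 3) for s in (0, 1, 2)] * 200
print(lstd(data, phi))                   # approximate state values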
LSPE
For SSP Problems
Tetris
BeI96
Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming
Convergence
NeB03
Least-Squares Policy Evaluation Algorithms with Linear Function Approximation
Q-learning (sketch below)
Wat89
Learning from Delayed Rewards
WaD92
Q-Learning
Convergence
Tsi94
Asynchronous Stochastic Approximation and Q-Learning
ABB02
Stochastic Approximation for Non-Expansive Maps: Q-Learning Algorithms
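A minimal sketch of tabular Q-learning, the algorithm of Wat89/WaD92; the chain MDP, ε-greedy exploration, and constant step size are illustrative assumptions rather than anything from the cited papers.

Sketch (Python):

import random

def greedy(q):
    # argmax with random tie-breaking so all-zero initial values still explore
    return random.choice([a for a, v in enumerate(q) if v == max(q)])

def q_learning(n=6, episodes=3000, alpha=0.1, gamma=0.9, eps=0.1):
    Q = [[0.0, 0.0] for _ in range(n)]   # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n - 1:                # state n-1 is the terminal goal
            a = random.randrange(2) if random.random() < eps else greedy(Q[s])
            s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n - 1 else 0.0
            # off-policy update: bootstrap on the greedy successor value
            target = r if s2 == n - 1 else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

for s, q in enumerate(q_learning()):
    print(s, [round(v, 2) for v in q])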
BBS95
Advantage Updating
Bai93
Advantage Updating
Bai94
Reinforcement Learning in Continuous Time: Advantage Updating
Bai95
Residual Algorithms: Reinforcement Learning with Function Approximation
HBK94
Advantage Updating Applied to a Differential Game
Differential Training
BerT97
Comparison
BBN04
Improved Temporal Difference Methods with Linear Function Approximation
YuB06b
Convergence Results for Some Temporal Difference Methods Based on Least Squares
Bellman Equation Error Approach (sketch below)
ScS85
Generalized Polynomial Approximations in Markovian Decision Problems
HBK94
A Reinforcement Learning Method for Maximizing Undiscounted Rewards
OrS02
Kernel-Based Reinforcement Learning
SzS04
Interpolation-Based Q-Learning
ASM08
Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
BHO08
Approximate Dynamic Programming Using Support Vector Regression
Sch10
Should One Compute the Temporal Difference Fix Point or Minimize the Bellman Residual? The Unified Oblique Projection View
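A minimal sketch of the Bellman-equation-error idea (cf. ScS85; Sch10 contrasts it with the TD fixed point): with a known model (P, r) and linear features Φ, minimize the residual ‖Φw − (r + γPΦw)‖² by ordinary least squares. The 3-state chain and hand-picked features are made up for illustration.

Sketch (Python):

import numpy as np

gamma = 0.9
P = np.array([[0.0, 1.0, 0.0],   # hypothetical 3-state deterministic cycle
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
r = np.array([0.0, 0.0, 1.0])    # reward earned when leaving state 2
Phi = np.array([[1.0, 0.0],      # hand-picked 2-dimensional features
                [1.0, 1.0],
                [0.0, 1.0]])

# Bellman residual: Phi w - (r + gamma P Phi w) = (Phi - gamma P Phi) w - r
M = Phi - gamma * P @ Phi
w, *_ = np.linalg.lstsq(M, r, rcond=None)
print("weights:", w)
print("fitted values:", Phi @ w)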
Policy Gradient Method (sketch below)
CaC97
Perturbation Realization Potentials and Sensitivity Analysis of Markov Processes
CaW98
Algorithms for Sensitivity Analysis of Markov Systems Through Potentials and Perturbation Realization
Cao99
Single Sample Path Based Optimization of Markov Chains
Cao05
A Basic Formula for Online Policy Gradient Algorithms
FuH94
Smoothed Perturbation Analysis Derivative Estimation for Markov Chains
Gly87
Likelihood Ratio Gradient Estimation: An Overview
JSJ95
Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems
L'Ec91
Wil92
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
MaT01
Simulation-Based Optimization of Markov Reward Processes
KoT99
Actor-Critic Algorithms
KoT03
Actor-Critic Algorithms
SMS99
Policy Gradient Methods for Reinforcement Learning with Function Approximation
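A minimal sketch of the likelihood-ratio policy gradient (REINFORCE, cf. Wil92) with a running-average baseline, on a hypothetical two-armed bandit with a softmax policy; arm means, step sizes, and baseline rate are illustrative assumptions.

Sketch (Python):

import math, random

theta = [0.0, 0.0]                 # softmax preferences, one per arm
means = [0.3, 0.7]                 # hypothetical arm reward means
alpha, baseline = 0.05, 0.0

for _ in range(5000):
    z = [math.exp(v) for v in theta]
    p = [v / sum(z) for v in z]    # softmax action probabilities
    a = 0 if random.random() < p[0] else 1
    r = random.gauss(means[a], 0.1)
    baseline += 0.01 * (r - baseline)   # running-average baseline
    for i in (0, 1):               # grad of log pi(a): 1[a == i] - p[i]
        theta[i] += alpha * (r - baseline) * ((1.0 if i == a else 0.0) - p[i])

print("final action probabilities:", [round(v, 3) for v in p])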
Cost Approximation
Cao04
Learning and Optimization from a System Theoretic Perspective
GrU04
Reinforcement Learning in Large, High-Dimensional State Spaces
He02
Simulation-Based Algorithms for Markov Decision Processes
HFM05
A Two-Timescale Simulation-Based Gradient Algorithm for Weighted Cost Markov Decision Processes
Kak02
A Natural Policy Gradient
Kon02
Actor-Critic Algorithms
KoB99
Actor-Critic-Type Learning Algorithms for Markov Decision Processes
KoT99
Actor-Critic Algorithms
KoT03
Actor-Critic Algorithms
MaT01
Simulation-Based Optimization of Markov Reward Processes
MaT03
Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes
SMS99
Policy Gradient Methods for Reinforcement Learning with Function Approximation
Wil92
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
Random Search
Cross-Entropy Method (sketch below)
RuK04
The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization
RuK08
Simulation and the Monte Carlo Method
BKM05
A Tutorial on the Cross-Entropy Method
Tetris
SzL06
Learning Tetris Using the Noisy Cross-Entropy Method
ThS09
CFH07
Simulation-Based Algorithms for Markov Decision Processes
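A minimal sketch of the cross-entropy method (cf. RuK04, BKM05) on a toy continuous maximization: sample from a Gaussian, keep the elite fraction, refit mean and variance, repeat. SzL06 applies the same scheme, with extra noise injected into the variance, to tune Tetris feature weights; all constants below are illustrative assumptions.

Sketch (Python):

import random

def cem(f, dim=2, pop=50, n_elite=10, iters=30):
    mu, sigma = [0.0] * dim, [1.0] * dim
    for _ in range(iters):
        xs = [[random.gauss(mu[i], sigma[i]) for i in range(dim)]
              for _ in range(pop)]
        xs.sort(key=f, reverse=True)       # rank samples by performance
        elite = xs[:n_elite]               # keep the top fraction
        for i in range(dim):               # refit the sampling distribution
            mu[i] = sum(x[i] for x in elite) / n_elite
            var = sum((x[i] - mu[i]) ** 2 for x in elite) / n_elite
            sigma[i] = max(var ** 0.5, 1e-3)   # floor; SzL06 instead adds noise
    return mu

# maximize a toy concave objective with optimum at (1, -2)
print(cem(lambda x: -(x[0] - 1) ** 2 - (x[1] + 2) ** 2))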
Statistical Inference Method
Att03
Planning by Probabilistic Inference
ToS06
Probabilistic Inference for Solving Discrete and Continuous State Markov Decision Processes
VeR06
Planning and Acting in Uncertain Environments Using Probabilistic Inference
Application
VBL97
POMDP
Aggregation/Interpolation
ZhL97
ZhH01
YuB04
Finite State Controller
Hau00
PoB04
YuB06a
BaB01
AbB00
Actor-Critic
Yu05
SJJ94