WEEK 1
Review of ML fundamentals – Classification, Regression. Review of probability theory and optimization concepts.
WEEK 2
RL Framework; Supervised learning vs. RL; Explore-Exploit Dilemma; Examples.
WEEK 3
Multi-Armed Bandits (MAB): Definition, Uses, Algorithms, Contextual Bandits; Transition to the full RL problem.
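
For reference, a minimal sketch of one bandit algorithm from this week, ε-greedy on Bernoulli arms; the arm probabilities, step count, and ε below are illustrative choices, not values from the course.

```python
import random

def eps_greedy_bandit(arm_probs, steps=10_000, eps=0.1):
    """epsilon-greedy on a Bernoulli bandit with incremental mean estimates."""
    n_arms = len(arm_probs)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if random.random() < eps:                       # explore
            a = random.randrange(n_arms)
        else:                                           # exploit current estimates
            a = max(range(n_arms), key=values.__getitem__)
        r = 1.0 if random.random() < arm_probs[a] else 0.0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]        # incremental mean update
        total_reward += r
    return values, total_reward

# Illustrative run: three arms whose success probabilities the agent must learn.
print(eps_greedy_bandit([0.3, 0.5, 0.7]))
```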
WEEK 4
Intro to MDPs: Definitions, Returns, Value function, Q-function.
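
The week's core definitions, written out in standard notation: the discounted return, the state-value function, and the action-value function, with discount factor gamma in [0, 1).

```latex
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad
V^{\pi}(s) = \mathbb{E}_{\pi}\left[\, G_t \mid S_t = s \,\right], \qquad
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\, G_t \mid S_t = s,\ A_t = a \,\right]
```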
WEEK 5
Bellman Equation, Dynamic Programming (DP), Value Iteration, Policy Iteration, Generalized Policy Iteration.
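
A compact value-iteration sketch, assuming the MDP is handed over as explicit tables; the dictionary layout (P[s][a] as a list of (prob, next_state) pairs, R[s][a] as expected reward) is an assumption for illustration.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """P[s][a]: list of (prob, next_state); R[s][a]: expected immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup: best one-step lookahead over actions.
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:        # stop once no state changes by more than tol
            return V
```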
WEEK 6
Evaluation and Control: TD learning, SARSA, Q-learning, Monte Carlo, TD(λ), Eligibility Traces.
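
The two tabular control updates at the heart of this week, side by side; the dict-of-dicts Q-table layout and the step sizes are assumptions for illustration.

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    # On-policy TD control: bootstrap from the action a2 actually taken next.
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
    # Off-policy TD control: bootstrap from the greedy action in s2.
    Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
```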
WEEK 7
Maximization Bias & Representations: Double Q-learning, Tabular vs. Parameterized representations, Q-learning with Neural Networks.
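
A sketch of the Double Q-learning update, which tackles maximization bias by letting one table select the greedy action while the other evaluates it (same assumed Q-table layout as above).

```python
import random

def double_q_update(QA, QB, s, a, r, s2, alpha=0.1, gamma=0.99):
    # Flip a coin over which table to update; the other table scores the
    # argmax, so the same noisy estimate never both selects and evaluates.
    if random.random() < 0.5:
        a_star = max(QA[s2], key=QA[s2].get)                          # select with QA
        QA[s][a] += alpha * (r + gamma * QB[s2][a_star] - QA[s][a])   # evaluate with QB
    else:
        a_star = max(QB[s2], key=QB[s2].get)                          # select with QB
        QB[s][a] += alpha * (r + gamma * QA[s2][a_star] - QB[s][a])   # evaluate with QA
```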
WEEK 8
Function approximation: Semi-gradient methods, SGD, DQNs, Replay Buffer.
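
A minimal experience replay buffer of the kind DQN samples minibatches from; fixed capacity with uniform sampling is the standard scheme, and the sizes below are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (s, a, r, s2, done) transitions and samples uniform minibatches,
    breaking the temporal correlation of consecutive experience."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)    # oldest transitions are evicted

    def push(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        return [list(column) for column in zip(*batch)]   # columns: s, a, r, s2, done

    def __len__(self):
        return len(self.buffer)
```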
WEEK 9
Policy Gradients: Introduction, Motivation, REINFORCE, the Policy Gradient theorem, Introduction to Actor-Critic (AC) methods.
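
The policy gradient theorem and the REINFORCE estimator it justifies, in standard form (G_t is the Week 4 return, sampled along trajectories):

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[\, \nabla_\theta \log \pi_\theta(A_t \mid S_t)\, Q^{\pi_\theta}(S_t, A_t) \,\right]
  \;\approx\; \frac{1}{N} \sum_{i=1}^{N} \sum_{t} \nabla_\theta \log \pi_\theta(a^i_t \mid s^i_t)\, G^i_t
```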
WEEK 10
Actor-Critic Methods: Baselines, Advantage Actor-Critic (A2C), A3C.
Advanced Value-Based Methods: Double DQN, Prioritized Experience Replay, Dueling Architectures, Expected SARSA.
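
A sketch of the Double DQN target, which carries the Week 7 decoupling idea over to DQN: the online network selects the next action, the target network evaluates it. The names q_online and q_target and the tensor shapes are assumptions; this is not a full training loop.

```python
import torch

def double_dqn_targets(q_online, q_target, r, s2, done, gamma=0.99):
    """r, done: float tensors of shape [B]; s2: batch of next states.
    q_online / q_target map a state batch to Q-values of shape [B, n_actions]."""
    with torch.no_grad():
        a_star = q_online(s2).argmax(dim=1, keepdim=True)     # online net selects
        q_next = q_target(s2).gather(1, a_star).squeeze(1)    # target net evaluates
        return r + gamma * (1.0 - done) * q_next              # zero bootstrap at terminals
```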
WEEK 11
Advanced PG/Actor-Critic methods: Deterministic Policy Gradients and DDPG, Soft Actor-Critic (SAC).
Hierarchical RL (HRL): Introduction to hierarchies, types of optimality, SMDPs, Options, HRL algorithms.
POMDPs: Intro, Definitions, Belief states, Solution Methods; History-based methods, LSTMs, Q-MDPs, Direct Solutions, Predictive State Representations (PSRs).
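
A sketch of the discrete belief-state update: after taking action a and observing o, the belief is pushed through the transition model and reweighted by the observation likelihood. The nested-dict layout of T and O is assumed for illustration.

```python
def belief_update(b, a, o, T, O, states):
    """b: dict state -> prob; T[s][a][s2]: transition prob; O[s2][a][o]: obs prob."""
    b2 = {}
    for s2 in states:
        # Predict with the transition model, then correct with the
        # observation likelihood (one step of a discrete Bayes filter).
        b2[s2] = O[s2][a][o] * sum(T[s][a][s2] * b[s] for s in states)
    z = sum(b2.values())                 # normalizer = Pr(o | b, a)
    return {s2: p / z for s2, p in b2.items()}
```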
WEEK 12
Model-Based RL: Introduction, Motivation, Connections to Planning, Types of MBRL, Benefits, RL with a Learnt Model, Dyna-style models, Latent variable models, Examples, Implicit MBRL.
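
A Dyna-Q style step, illustrating "RL with a learnt model": each real transition updates Q directly, updates a one-step model, and then fuels n imagined planning updates (a deterministic tabular model is assumed for brevity).

```python
import random

def dyna_q_step(Q, model, s, a, r, s2, n_planning=10, alpha=0.1, gamma=0.99):
    # Direct RL: ordinary Q-learning update from the real transition.
    Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
    # Model learning: remember the last observed outcome of (s, a).
    model[(s, a)] = (r, s2)
    # Planning: replay imagined transitions drawn from the learnt model.
    for _ in range(n_planning):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2].values()) - Q[ps][pa])
```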
Case study on the design of RL solutions for real-world problems.