Reinforcement Learning

AI
Author

司馬 博文

Published

2/06/2024

Abstract
Understanding the ideas behind reinforcement learning mathematically.

From Trust Region Policy Optimization (TRPO) (Schulman et al., 2015) to the Proximal Policy Optimization (PPO) algorithm (Schulman et al., 2017)
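As a rough illustration of the step from TRPO to PPO: TRPO maximizes the surrogate E[r_t(θ) Â_t] under a KL-divergence constraint between the old and new policies, while PPO replaces the constraint with the clipped surrogate E[min(r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t)], where r_t(θ) is the probability ratio between the new and old policies and Â_t is an advantage estimate (Schulman et al., 2017). The sketch below evaluates that clipped objective on a small batch. The function name `ppo_clip_objective`, its argument names, and the default `eps=0.2` are assumptions made for illustration, not something taken from this post.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    logp_new, logp_old: log-probabilities of the taken actions under the
    new and old policies (hypothetical inputs, plain lists of floats).
    advantages: advantage estimates for the same samples.
    eps: clipping range (0.2 is only an assumed default here).
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(ln - lo)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # The new policy doubles the probability of an action with positive advantage;
    # the clipped term caps the gain from moving that far.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

When the two policies coincide, every ratio equals 1 and the objective reduces to the mean advantage. The clip only takes effect once the new policy moves away from the old one in the direction that would increase the objective, which is how PPO approximates a trust region without an explicit KL constraint.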

Model-based RL

References

Schulman, J., Levine, S., Moritz, P., Jordan, M. I., and Abbeel, P. (2015). Trust region policy optimization.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms.