A Blog Entry on Bayesian Computation by an Applied Mathematician
From Trust Region Policy Optimization (TRPO) (Schulman et al., 2015) to the PPO Algorithm (Schulman et al., 2017)
Model-based RL
References
Schulman, J., Levine, S., Moritz, P., Jordan, M. I., and Abbeel, P. (2015). Trust region policy optimization.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms.