Hikaru Sasaki
Ikoma : Nara Institute of Science and Technology (NAIST), June 2022
Lecture Archive
No. | Printing year | Location | Call Number | Material ID | Circulation class | Status | Waiting
---|---|---|---|---|---|---|---
1 | | | | M020770 | | |
Policy search reinforcement learning has attracted much attention as a method for learning robot control. In particular, policy search using Gaussian process regression as the policy model can learn optimal actions. However, naively applying Gaussian process policy search to real-world tasks is difficult because real-world environments, such as those in robotics, often involve various uncertainties, and such uncertainty typically requires a very complex state-to-action mapping to obtain high-performance actions. To handle this complexity, this study focuses on latent variable modeling: it explores a latent variable modeling approach in Gaussian process policy search to capture the complexity of data sampled in uncertain environments, and derives an algorithm that performs latent variable inference and policy learning simultaneously, aiming to make policy search applicable to a variety of real-world tasks with uncertainty. We focus on two sources of complexity: 1) multiple optimal actions that emerge from an ambiguously specified reward function, and 2) weak observations from the environment that contain little information about the state. For each, we design a policy model by introducing latent variables into the Gaussian process and derive policy update schemes based on variational Bayesian learning. The performance of the proposed policy search methods was verified in simulation and in a task using a robot manipulator. Finally, a policy learning framework based on Bayesian optimization with latent variables is proposed for application to actual heavy machinery, and its performance was verified on a real waste crane.
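As background for the abstract's core idea of using Gaussian process regression as a policy model, the following is a minimal sketch of plain GP regression mapping states to actions. It is not the thesis's latent-variable method; the function names (`rbf_kernel`, `gp_policy`), the squared-exponential kernel, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.2, variance=1.0):
    # Squared-exponential kernel between row-stacked state sets A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq / length_scale**2)

def gp_policy(train_states, train_actions, query_states, noise=1e-2):
    # GP posterior mean and variance of the action at each query state.
    K = rbf_kernel(train_states, train_states) + noise * np.eye(len(train_states))
    Ks = rbf_kernel(query_states, train_states)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_actions))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(rbf_kernel(query_states, query_states)) - np.sum(v**2, axis=0)
    return mean, var

# Toy 1-D task: assume the "optimal" action at state s is sin(2*pi*s),
# observed at 8 training states; query the policy at s = 0.25.
S = np.linspace(0.0, 1.0, 8)[:, None]
A = np.sin(2.0 * np.pi * S)
mean, var = gp_policy(S, A, np.array([[0.25]]))
```

The posterior variance is what makes such a policy model attractive for uncertain environments: it quantifies how confident the policy is at each state, which the thesis extends with latent variables to cope with multimodal actions and weak observations.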
2022
Digitized video material (min:sec)
Division of Information Science Colloquium ; AY2022
Speaker affiliation: Division of Information Science
Lecture date: June 24, 2022, 3rd period
Lecture venue: Information Science Building, AI Large Lecture Hall (L1)
Japan
English (eng)