By Aviv Tamar, Dotan Di Castro, Ron Meir | published 2012-06-01 |
In reinforcement learning, an agent uses online feedback from the environment to adaptively select an effective policy. Model-free approaches address this task by directly mapping environmental states to actions, while model-based methods first attempt to construct a model of the environment and then select optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel procedure which augments a model-free algorithm with a partial model. The resulting hybrid algorithm switches between a model-based and a model-free mode, depending on the current state and the agent's knowledge. Our method relies on a novel definition for a partially
... http://jmlr.csail.mit.edu/papers/volume13/tamar12a/tamar12a.pdf
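
To make the idea of switching between a model-based and a model-free mode concrete, here is a minimal tabular sketch in Python. It is an illustration only and not the algorithm from the paper: the abstract is truncated before the authors' definition of a partially known model, so the sketch assumes a simple visit-count rule instead. The class name `HybridAgent`, the `known_threshold` parameter, and the empirical-model backup are all hypothetical choices made for this example.

```python
from collections import defaultdict


class HybridAgent:
    """Toy tabular agent that switches update modes per (state, action) pair.

    Assumption (not from the paper): a pair counts as "known" once it has
    been visited at least `known_threshold` times.
    """

    def __init__(self, actions, known_threshold=5, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.known_threshold = known_threshold
        self.alpha = alpha  # step size for the model-free update
        self.gamma = gamma  # discount factor
        self.q = defaultdict(float)  # Q[(state, action)]
        self.counts = defaultdict(int)  # visits of (state, action)
        self.next_counts = defaultdict(lambda: defaultdict(int))
        self.reward_sum = defaultdict(float)

    def is_known(self, state, action):
        return self.counts[(state, action)] >= self.known_threshold

    def _value(self, state):
        # Greedy state value under the current Q estimates.
        return max(self.q[(state, a)] for a in self.actions)

    def update(self, state, action, reward, next_state):
        """Record one transition and return the mode used for the update."""
        key = (state, action)
        self.counts[key] += 1
        self.next_counts[key][next_state] += 1
        self.reward_sum[key] += reward

        if self.is_known(state, action):
            # Model-based mode: one-step backup through the empirical model.
            n = self.counts[key]
            expected_next = sum(
                (c / n) * self._value(s2)
                for s2, c in self.next_counts[key].items()
            )
            self.q[key] = self.reward_sum[key] / n + self.gamma * expected_next
            return "model_based"

        # Model-free mode: standard Q-learning step on the sampled transition.
        target = reward + self.gamma * self._value(next_state)
        self.q[key] += self.alpha * (target - self.q[key])
        return "model_free"
```

With `known_threshold=3`, repeated calls to `update` on the same state-action pair use the model-free branch until the third visit, after which the estimate is computed from the empirical transition counts and mean reward. The threshold rule is only one possible notion of "knowledge"; the paper's own criterion may differ substantially.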