Abstract
Reinforcement learning has achieved great successes in increasingly complex environments, such as AlphaStar in StarCraft II, and in robotics. However, these agents are not very robust to changes in the environment or in their objectives. To work towards lifelong learning for an RL agent, we propose a hierarchical architecture combined with a human input interface. The interface introduces expert knowledge, based on human observation of changes to the environment or the objectives, to guide the trained agent towards better planning decisions; the guided planning experience can in turn be fed back into the learning process. The hierarchical architecture isolates the subtasks which the agent has already learnt, while the human input interface allows an expert to provide guidance on how best to combine the agent's skills on those subtasks. The RL task is modelled as a (PO)MDP, and we explore both adopting conventional RL methods and designing new ones better suited to this setting.
Overview of Approach
Hierarchical architecture
The Command Center (blue box in the figure) is the module responsible for the high-level decision of how to combine and schedule the actions proposed by the individual submodules (yellow boxes). The submodules are responsible for lower-level decision making, such as which unit to produce or build (Build Order), whether to advance or retreat (Tactics), or the precise manoeuvre of units (Micro-management).
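As an illustration, the following Python sketch shows one possible shape of this hierarchy. The class and method names (Submodule, CommandCenter, propose, schedule) and the fixed scheduling order are placeholders of our own, not the actual implementation.

from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Submodule(ABC):
    """A lower-level decision maker, e.g. Build Order, Tactics or Micro-management."""

    @abstractmethod
    def propose(self, observation: Dict[str, Any]) -> List[Any]:
        """Return candidate low-level actions for the current observation."""


class CommandCenter:
    """High-level controller that combines and schedules submodule proposals."""

    def __init__(self, submodules: Dict[str, Submodule]):
        self.submodules = submodules

    def decide(self, observation: Dict[str, Any]) -> List[Any]:
        # Gather candidate actions from every submodule, then schedule them.
        proposals = {name: m.propose(observation)
                     for name, m in self.submodules.items()}
        return self.schedule(proposals, observation)

    def schedule(self, proposals: Dict[str, List[Any]],
                 observation: Dict[str, Any]) -> List[Any]:
        # Placeholder policy: issue proposals in a fixed priority order.
        # In practice this is where the high-level (possibly learned) policy lives.
        actions: List[Any] = []
        for name in ("micro", "tactics", "build_order"):
            actions.extend(proposals.get(name, []))
        return actions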
Human command interface
We introduce the human command interface to allow human input. Its purpose is to provide guidance and human intuition to the agent in completely unseen scenarios, helping the agent both to take the specified decision in the unfamiliar scenario and to learn the desirability of such a decision by explicitly exploiting it.
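The sketch below illustrates one way such an interface could wrap the Command Center from the previous sketch: an expert-supplied callable may override the agent's decision, and every guided decision is recorded so it can later be reused in learning. The names (HumanCommandInterface, guided_buffer) and the override logic are assumptions for illustration, not the paper's design.

from typing import Any, Callable, Dict, List, Optional, Tuple


class HumanCommandInterface:
    def __init__(self, command_center,
                 get_human_command: Callable[[Dict[str, Any]], Optional[List[Any]]]):
        self.command_center = command_center
        self.get_human_command = get_human_command
        # Guided experience collected here can later be replayed during training.
        self.guided_buffer: List[Tuple[Dict[str, Any], List[Any]]] = []

    def decide(self, observation: Dict[str, Any]) -> List[Any]:
        command = self.get_human_command(observation)
        if command is not None:
            # The expert recognises an unfamiliar scenario and overrides the agent,
            # and the guided transition is stored for later learning.
            self.guided_buffer.append((observation, command))
            return command
        # No guidance: fall back to the agent's own high-level decision.
        return self.command_center.decide(observation)

Here get_human_command could be backed by a console prompt, a GUI, or a scripted trigger; returning None means no guidance is given and the agent decides on its own.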
Heterogeneous
The submodules can be constructed from scripts, learning agents, or a hybrid of both. Not only do they serve different functions, they can also be designed using different constructs. For instance, in typical RTS games the Build Order is usually well studied and a well-written script may be sufficient for this submodule. Tactics, however, is much harder to capture in well-contained logic and might instead be implemented as a reinforcement learning agent.
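For example, under the interface assumed in the earlier sketch, a scripted Build Order module and a learned Tactics module can coexist behind the same propose method; the opening script and the policy callable below are hypothetical placeholders.

from typing import Any, Callable, Dict, List


class ScriptedBuildOrder:
    """Follows a fixed, hand-written opening script."""

    def __init__(self, script: List[str]):
        self.script = list(script)

    def propose(self, observation: Dict[str, Any]) -> List[Any]:
        # Emit the next scripted build step, if any remain.
        return [self.script.pop(0)] if self.script else []


class LearnedTactics:
    """Delegates to a trained RL policy (any callable mapping observation to actions)."""

    def __init__(self, policy: Callable[[Dict[str, Any]], List[Any]]):
        self.policy = policy

    def propose(self, observation: Dict[str, Any]) -> List[Any]:
        return self.policy(observation)

Both kinds of submodule can then be registered with the same Command Center, e.g. CommandCenter({"build_order": ScriptedBuildOrder(opening), "tactics": LearnedTactics(trained_policy)}).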