Abstract
Reinforcement learning has achieved great successes in increasingly complex environments, such as AlphaStar in StarCraft II, and in robotics. However, these agents are not very robust to changes in the environment or in their objectives. To work towards lifelong learning for an RL agent, we propose a hierarchical architecture combined with a human input interface. The interface introduces expert knowledge, based on human observation of changes to the environment or the objectives, to guide the trained agent towards better planning decisions; the guided planning experience can in turn be fed back into the learning process. The hierarchical architecture isolates the subtasks which the agent has already learnt, while the human input interface allows an expert to provide guidance on how best to combine the agent's skills on those subtasks. The RL task is modelled as a (PO)MDP, and we explore both adopting conventional RL methods and designing new ones better suited to this setting.
Overview of Approach
Hierarchical architecture
The Command Center (blue box in the figure) is the module responsible for the high-level decision of how to combine and schedule the actions proposed by the individual submodules (yellow boxes). The submodules are responsible for lower-level decision making, such as which unit to produce or build (Build Order), whether to advance or retreat (Tactics), or the precise manoeuvre of units (Micro-management).
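As an illustration, the following Python sketch shows one possible shape of this hierarchy. The class and method names (Submodule, CommandCenter, propose, schedule) and the fixed scheduling order are placeholders of our own, not the actual implementation.

from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Submodule(ABC):
    """A lower-level decision maker, e.g. Build Order, Tactics or Micro-management."""

    @abstractmethod
    def propose(self, observation: Dict[str, Any]) -> List[Any]:
        """Return candidate low-level actions for the current observation."""


class CommandCenter:
    """High-level controller that combines and schedules submodule proposals."""

    def __init__(self, submodules: Dict[str, Submodule]):
        self.submodules = submodules

    def decide(self, observation: Dict[str, Any]) -> List[Any]:
        # Gather candidate actions from every submodule, then schedule them.
        proposals = {name: m.propose(observation)
                     for name, m in self.submodules.items()}
        return self.schedule(proposals, observation)

    def schedule(self, proposals: Dict[str, List[Any]],
                 observation: Dict[str, Any]) -> List[Any]:
        # Placeholder policy: issue proposals in a fixed priority order.
        # In practice this is where the high-level (possibly learned) policy lives.
        actions: List[Any] = []
        for name in ("micro", "tactics", "build_order"):
            actions.extend(proposals.get(name, []))
        return actions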
Human command interface
We introduce the human command interface to allow human input. Its purpose is to provide guidance and human intuition to the agent in completely unseen scenarios, helping the agent both to take the specified decision in the unfamiliar scenario and to learn the desirability of such a decision by explicitly exploiting it.
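The sketch below illustrates one way such an interface could wrap the Command Center from the previous sketch: an expert-supplied callable may override the agent's decision, and every guided decision is recorded so it can later be reused in learning. The names (HumanCommandInterface, guided_buffer) and the override logic are assumptions for illustration, not the paper's design.

from typing import Any, Callable, Dict, List, Optional, Tuple


class HumanCommandInterface:
    def __init__(self, command_center,
                 get_human_command: Callable[[Dict[str, Any]], Optional[List[Any]]]):
        self.command_center = command_center
        self.get_human_command = get_human_command
        # Guided experience collected here can later be replayed during training.
        self.guided_buffer: List[Tuple[Dict[str, Any], List[Any]]] = []

    def decide(self, observation: Dict[str, Any]) -> List[Any]:
        command = self.get_human_command(observation)
        if command is not None:
            # The expert recognises an unfamiliar scenario and overrides the agent,
            # and the guided transition is stored for later learning.
            self.guided_buffer.append((observation, command))
            return command
        # No guidance: fall back to the agent's own high-level decision.
        return self.command_center.decide(observation)

Here get_human_command could be backed by a console prompt, a GUI, or a scripted trigger; returning None means no guidance is given and the agent decides on its own.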
Heterogeneous
The submodules can be constructed from scripts, learning agents, or a hybrid of both. Not only do they serve different functions, they can also be designed using different constructs. For instance, in typical RTS games the Build Order is usually well studied and a well-written script may be sufficient for this submodule. Tactics, however, is much harder to capture in well-contained logic and might instead be implemented as a reinforcement learning agent.
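For example, under the interface assumed in the earlier sketch, a scripted Build Order module and a learned Tactics module can coexist behind the same propose method; the opening script and the policy callable below are hypothetical placeholders.

from typing import Any, Callable, Dict, List


class ScriptedBuildOrder:
    """Follows a fixed, hand-written opening script."""

    def __init__(self, script: List[str]):
        self.script = list(script)

    def propose(self, observation: Dict[str, Any]) -> List[Any]:
        # Emit the next scripted build step, if any remain.
        return [self.script.pop(0)] if self.script else []


class LearnedTactics:
    """Delegates to a trained RL policy (any callable mapping observation to actions)."""

    def __init__(self, policy: Callable[[Dict[str, Any]], List[Any]]):
        self.policy = policy

    def propose(self, observation: Dict[str, Any]) -> List[Any]:
        return self.policy(observation)

Both kinds of submodule can then be registered with the same Command Center, e.g. CommandCenter({"build_order": ScriptedBuildOrder(opening), "tactics": LearnedTactics(trained_policy)}).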