Welsh, Noel (2011)
Ph.D. thesis, University of Birmingham.
This dissertation addresses the problem of learning to act in an unknown and uncertain world. This is a difficult problem. Even if a world model is available (an assumption not made here), finding an optimal policy for controlling behaviour is known to be intractable (Littman 1996). Without a known world model there are two broad approaches: model-free learning, which attempts to learn to act without a model of the environment, and model learning, which attempts to learn a model of the environment from interactions with the world. Most earlier approaches make a priori assumptions about the complexity of the required model or policy, with the upshot that the agent has a fixed amount of memory. It is well known that in a noisy environment, the type assumed here, the amount of memory required to act optimally depends on the environment. Fixing the memory capacity before any interactions have occurred is thus a limiting assumption. The theme of this dissertation is that representing multiple policies or environment models of varying size enables us to address this problem.

Both model-free learning and model learning are investigated. For the former, I present a policy search method (usable with a wide range of algorithms) that maintains a population of policies of varying size. I show that, by sharing information between policies, it can learn near-optimal policies for a variety of challenging problems, and that its performance is significantly better than that obtained with the same amount of computation but without information sharing.

I investigate two approaches to model learning. The first is a variational Bayesian method for learning POMDPs. I show that it achieves better results than the Bayes-adaptive algorithm (Ross, Chaib-draa and Pineau 2007) using their experimental setup. However, this experimental setup makes strong assumptions about prior information, and I show that weakening these assumptions leads to poor performance.
I then address model learning for a simpler model, a topological map. I develop a novel non-parametric Bayesian map that places no limit on the model size, and show experimentally that maps can be learned from robot data with weak prior knowledge.
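The abstract does not state which non-parametric prior the map uses. As an assumed illustration only, the sketch below samples from a Chinese restaurant process (CRP), a standard non-parametric prior under which the number of components is not fixed in advance but grows with the data. Reading each component as a distinct place node in a topological map is an analogy assumed here, not necessarily the thesis's construction, and the function name `crp_assignments` and its parameters are hypothetical.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample component assignments from a Chinese restaurant process (CRP).

    Item i (0-indexed) joins existing component k with probability
    counts[k] / (i + alpha) and opens a new component with probability
    alpha / (i + alpha). No upper bound on the number of components is
    fixed in advance; it grows with the data.
    """
    rng = random.Random(seed)
    counts = []       # counts[k]: number of items assigned to component k
    assignments = []  # assignments[i]: component index of item i
    for _ in range(n_items):
        # The final weight is the probability mass for a brand-new component.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new component (by analogy, a new place node)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    # Hypothetical demo: the number of components keeps growing with n
    # (roughly logarithmically) rather than being capped up front.
    for n in (10, 100, 1000, 10000):
        _, counts = crp_assignments(n, alpha=2.0, seed=42)
        print(n, "observations ->", len(counts), "components")
```

The concentration parameter `alpha` controls how readily new components appear: larger values yield more components for the same amount of data, while the model size itself is never set a priori.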
This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder.