Wang, Xiaoxia (2010)
Ph.D. thesis, University of Birmingham.
With the advent of information technology, the amount of data we face today is growing dramatically in both scale and dimensionality. This raises new challenges for some traditional machine learning tasks. This thesis is mainly concerned with manifold-aligned density estimation problems. In particular, the work presented includes efficiently learning the density distribution of very large-scale datasets and estimating the manifold-aligned density through explicit manifold modeling. First, we propose an efficient and sparse density estimator, Fast Parzen Windows (FPW), which represents the density of a large-scale dataset by a mixture of locally fitted Gaussian components. The Gaussian components in the model are estimated in a "sloppy" way, which avoids very time-consuming "global" optimization, keeps the density estimator simple, and still ensures estimation accuracy. Preliminary theoretical work shows that the performance of the locally fitted Gaussian components is related to the curvature of the true density and the characteristics of the Gaussian model itself. A successful application of FPW to the principled calibration of galaxy simulations is also demonstrated in the thesis. Then, we investigate the problem of manifold-aligned (i.e., aligned with low-dimensional structure) density estimation through explicit manifold modeling, which aims to obtain the embedded manifold and the density distribution simultaneously. A new manifold learning algorithm is proposed to capture the non-linear low-dimensional structure and to provide an improved initialization for the Generative Topographic Mapping (GTM) model. The GTM models are then employed in our proposed hierarchical mixture model to estimate the density of data aligned along multiple manifolds. Extensive experiments verify the effectiveness of the presented work.
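As background to the FPW description above, the following is a minimal sketch of the classical Parzen window estimator with a Gaussian kernel in one dimension. It is not the FPW algorithm, whose details this abstract does not give. The function names, the restriction to 1-D data, and the single fixed bandwidth are illustrative assumptions. The sketch shows that the classical estimator is a Gaussian mixture with one equal-weight component per sample, so each query costs time linear in the dataset size; a sparse mixture with far fewer components has the same form but is cheaper to evaluate.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_density(x, components):
    """Evaluate a Gaussian mixture at x.

    components is a list of (weight, mean, std) tuples; the weights are
    assumed to be non-negative and to sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def parzen_density(x, samples, bandwidth):
    """Classical Parzen window estimate at x (Gaussian kernel, 1-D).

    Every sample contributes one kernel, so a query costs O(len(samples)).
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weight = 1.0 / len(samples)
    # One equal-weight component centred on each sample.
    components = [(weight, s, bandwidth) for s in samples]
    return mixture_density(x, components)


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0]  # hypothetical toy samples
    print(parzen_density(1.0, data, bandwidth=0.5))
```

Because `parzen_density` delegates to `mixture_density`, any sparser set of `(weight, mean, std)` components can be passed to the same evaluator, with the query cost then depending on the number of components rather than the number of samples.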
This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder.