Cluster-based semi-supervised ensemble learning

Soares, Rodrigo Gabriel Ferreira (2014). Cluster-based semi-supervised ensemble learning. University of Birmingham. Ph.D.

PDF - Accepted Version

Download (2MB)


Semi-supervised classification consists of acquiring knowledge from both labelled and unlabelled data to classify test instances. The cluster assumption represents one of the potential relationships between true classes and data distribution that semi-supervised algorithms assume in order to use unlabelled data. Ensemble algorithms have been widely and successfully employed in both supervised and semi-supervised contexts. In this Thesis, we focus on the cluster assumption to study ensemble learning based on a new cluster regularisation technique for multi-class semi-supervised classification. Firstly, we introduce a multi-class cluster-based classifier, the Cluster-based Regularisation (Cluster- Reg) algorithm. ClusterReg employs a new regularisation mechanism based on posterior probabilities generated by a clustering algorithm in order to avoid generating decision boundaries that traverses high-density regions. Such a method possesses robustness to overlapping classes and to scarce labelled instances on uncertain and low-density regions, when data follows the cluster assumption. Secondly, we propose a robust multi-class boosting technique, Cluster-based Boosting (CBoost), which implements the proposed cluster regularisation for ensemble learning and uses ClusterReg as base learner. CBoost is able to overcome possible incorrect pseudo-labels and produces better generalisation than existing classifiers. And, finally, since there are often datasets with a large number of unlabelled instances, we propose the Efficient Cluster-based Boosting (ECB) for large multi-class datasets. ECB extends CBoost and has lower time and memory complexities than state-of-the-art algorithms. Such a method employs a sampling procedure to reduce the training set of base learners, an efficient clustering algorithm, and an approximation technique for nearest neighbours to avoid the computation of pairwise distance matrix. Hence, ECB enables semi-supervised classification for large-scale datasets.

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
College/Faculty: Colleges (2008 onwards) > College of Engineering & Physical Sciences
School or Department: School of Computer Science
Funders: Other, European Commission
Other Funders: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brazil
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science


Request a Correction Request a Correction
View Item View Item


Downloads per month over past year