Chiu, Chun Wai ORCID: 0000-0002-3157-8943 (2022). Online data stream classification in the presence of Concept Drift and Class Imbalance. University of Birmingham. Ph.D.
Chiu2022PhD.pdf
Text - Accepted Version. Available under licence: All rights reserved. (35MB)
Abstract
Online data stream learning requires each training example to be processed upon arrival. Machine learning algorithms designed to process data in this manner are particularly suitable for the digital era, as the volume and incoming speed of data are growing every day. However, the environments of most real-world applications of online data stream learning are non-stationary, meaning that their underlying distribution may change over time (concept drift). Also, in real-world classification tasks, the numbers of examples from different classes are unlikely to be equal (class imbalance). The literature on concept drift adaptation and on class imbalanced learning has become well established in recent years, yet little work addresses the two challenges jointly. This thesis contributes to this area of study by investigating how to deal with concept drift from a memory perspective, and how to create synthetic examples that aid in learning the skewed class while remaining able to adapt to concept drift in class imbalanced data streams. The main contributions of this thesis are:
- A study of the impact of different memory management strategies on predictive performance.
- A novel diversity-based memory management strategy that can maximise the chance of having relevant past knowledge to exploit in dealing with concept drift.
- A novel framework called Concept Drift Handling Based on Clustering in the Model Space (CDCMS), which can deal with multiple types of concept drift in class balanced data streams by exploiting relevant past knowledge.
- A set of novel memory strategies for class imbalanced learning that can maintain and recover relevant past knowledge to deal with concept drift in class imbalanced data streams. Building on these, a novel framework called Concept Drift Handling Based on Clustering in the Model Space for Class Imbalanced Learning (CDCMS.CIL) is proposed.
- A novel approach called Synthetic Minority Oversampling based on Stream Clustering (SMOClust), which creates synthetic examples based on stream clustering to aid in learning the skewed class while remaining able to adapt to concept drift.
| Type of Work: | Thesis (Doctorates > Ph.D.) |
|---|---|
| Award Type: | Doctorates > Ph.D. |
| Supervisor(s): | |
| Licence: | All rights reserved |
| College/Faculty: | Colleges (2008 onwards) > College of Engineering & Physical Sciences |
| School or Department: | School of Computer Science |
| Funders: | Other |
| Other Funders: | PhD scholarship, School of Computer Science, University of Birmingham |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| URI: | http://etheses.bham.ac.uk/id/eprint/12853 |