Online data stream classification in the presence of Concept Drift and Class Imbalance

Chiu, Chun Wai ORCID: 0000-0002-3157-8943 (2022). Online data stream classification in the presence of Concept Drift and Class Imbalance. University of Birmingham. Ph.D.

Chiu2022PhD.pdf
Text - Accepted Version
Available under License All rights reserved.

File size: 35MB

Abstract

Online data stream learning requires each training example to be processed upon arrival. Machine learning algorithms designed to process data in this manner are particularly suitable for the digital era, as the volume and arrival speed of data grow every day. However, the environments of most real-world applications of online data stream learning are non-stationary, meaning that their underlying distribution may change over time (concept drift). Also, in real-world classification tasks, the numbers of examples from different classes are unlikely to be equal (class imbalance). The literature on concept drift adaptation and on class imbalanced learning has become well established in recent years. Yet little work addresses the two challenges jointly. This thesis contributes to this area of study by investigating how to deal with concept drift from a memory perspective, and how to create synthetic examples that aid in learning the skewed class while remaining able to adapt to concept drift in class imbalanced data streams. The main contributions of this thesis are:

- A study of the impact of different memory management strategies on predictive performance.
- A novel diversity-based memory management strategy that can maximise the chance of having relevant past knowledge to exploit in dealing with concept drift.
- A novel framework called Concept Drift Handling Based on Clustering in the Model Space (CDCMS), which can deal with multiple types of concept drift in class balanced data streams by exploiting relevant past knowledge.
- A set of novel memory strategies for class imbalanced learning that can maintain and recover relevant past knowledge to deal with concept drift in class imbalanced data streams. Building on these, a novel framework called Concept Drift Handling Based on Clustering in the Model Space for Class Imbalanced Learning (CDCMS.CIL) is proposed.
- A novel approach called Synthetic Minority Oversampling based on Stream Clustering (SMOClust), which creates synthetic examples based on stream clustering to aid in learning the skewed class while remaining able to adapt to concept drift.

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Supervisor(s): Minku, Leandro Lei (Email: UNSPECIFIED; ORCID: orcid.org/0000-0002-2639-0671)
Licence: All rights reserved
College/Faculty: Colleges (2008 onwards) > College of Engineering & Physical Sciences
School or Department: School of Computer Science
Funders: Other
Other Funders: PhD scholarship, School of Computer Science, University of Birmingham
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
URI: http://etheses.bham.ac.uk/id/eprint/12853
