Detecting changes in offline and online classification tasks

Zhang, Shuyi (2023). Detecting changes in offline and online classification tasks. University of Birmingham. Ph.D.

Zhang2023PhD.pdf
Text - Accepted Version
Available under License All rights reserved.


Abstract

In machine learning, an essential assumption for building a well-performing classification model is that it is trained and tested on data drawn from the same distribution. However, in the real world, once a model is deployed, control over incoming data is limited. Accurately and efficiently detecting changes that violate this fundamental assumption is crucial to ensuring the reliability and performance of artificial intelligence systems.

Different types of changes can arise in offline and online classification tasks, and the goals and methods for change detection differ between the two scenarios. As a starting point, this thesis first focuses on detecting out-of-distribution examples in the test data set in offline classification tasks. A purely unsupervised detector, the Label-Assisted Memory Auto-Encoder (LAMAE), and its refined version, LAMAE+, are proposed to improve the detection of a wider range of out-of-distribution examples. The thesis then progresses to the online classification scenario. In a streaming data environment, concept drift, which is a change in the underlying data distribution, may occur. Instead of detecting single examples as in the offline scenario, the online scenario requires sophisticated algorithms to identify if and when a change occurs in the underlying data distribution. This thesis proposes a novel concept drift detection framework, the Hierarchical Reduced-space Drift Detection framework (HRDD), to meet this goal. HRDD not only recognizes a wider range of drifts regardless of their effects on classification performance, but also does so more efficiently than existing methods. Another challenge faced by existing concept drift detectors is their assumption that the data in a stream are independent. To better approximate reality, this thesis also investigates the new challenges brought by relaxing the independence assumption. A novel problem formulation that takes temporal dependency into account is constructed, under which a greater variety of drift forms can emerge. Finally, a simple and effective solution named Concept Drift detection for Temporally Dependent data streams (CDTD) is presented to detect drifts, especially those neglected by existing detectors.

In summary, this thesis tackles the detection of change in offline and online classification tasks. The approaches taken in the thesis are both efficient and effective, and are significant for minimizing the disparity between the simulated environment and physical reality.

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Supervisor(s): Tino, Peter; Yao, Xin (email and ORCID unspecified for both)
Licence: All rights reserved
College/Faculty: Colleges (2008 onwards) > College of Engineering & Physical Sciences
School or Department: School of Computer Science
Funders: Other
Other Funders: Southern University of Science and Technology
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
URI: http://etheses.bham.ac.uk/id/eprint/13897
