Machine learning in the model space for metabolomics time series in an adrenal steroid hormone study

Chen, Xinyue (2023). Machine learning in the model space for metabolomics time series in an adrenal steroid hormone study. University of Birmingham. Ph.D.

Chen2023PhD.pdf
Text - Accepted Version
Available under License All rights reserved.

Abstract

Learning in the model space (LiMS) represents each complex data item, such as a sparse and/or noisy time series, by an appropriate model or by a full posterior distribution over models. LiMS approaches incorporate mechanistic information on how the data are generated into the model-building stage of machine learning, and can therefore improve the interpretability of the chosen machine learning tools. This thesis proposes a new topographic mapping approach and a time series classification application in the model space. Both are demonstrated on a real-world data set of measurements taken on subjects in an adrenal steroid hormone study.
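
As a rough illustration of the model-space idea, the sketch below replaces each irregularly sampled time series by the parameters of a model fitted to it. The exponential decay model, the subject names and all numerical values are assumptions chosen for brevity; the thesis itself works with mechanistic models of adrenal steroid metabolism, which are not reproduced here.

```python
import math

def fit_exponential_decay(times, values):
    """Fit y = a * exp(-k * t) by least squares on log(y); return (a, k)."""
    logs = [math.log(v) for v in values]
    n = len(times)
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_log) for t, l in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_t
    return math.exp(intercept), -slope

def simulate(a, k, times):
    """Generate noise-free toy measurements from the assumed model."""
    return [a * math.exp(-k * t) for t in times]

# Two hypothetical subjects, sampled at different, irregular time points.
times_a = [0.0, 1.0, 3.0]
times_b = [0.5, 2.0, 4.0, 6.0]
series = {
    "subject_a": (times_a, simulate(2.0, 0.5, times_a)),
    "subject_b": (times_b, simulate(0.5, 1.5, times_b)),
}

# Model-space representation: one fixed-length parameter vector per subject,
# regardless of how many measurements that subject has.
model_space = {name: fit_exponential_decay(t, y) for name, (t, y) in series.items()}
for name, (a, k) in model_space.items():
    print(f"{name}: a={a:.3f}, k={k:.3f}")
```

Series of different lengths and sampling times end up as comparable vectors, which is what lets downstream tools such as maps and classifiers operate on them.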

Topographic visualisation methods such as self-organising maps are important tools in data mining. To cluster and visualise sparse and/or noisy time series data, a novel self-organising map formulated directly in the model space, termed SOMiMS, is proposed, together with an extension of generative topographic mapping (GTM) to the model space. Both maps are demonstrated on the adrenal steroid hormone data set, where they achieve a good degree of separation between conditions. Compared to classic approaches in the signal space, they take the mechanistic information into account by providing readily interpretable data visualisations and parameter plots in the form of heat maps.
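
To give a feel for topographic mapping over model parameters, here is a minimal sketch of a classic one-dimensional self-organising map trained on parameter vectors. It is not the SOMiMS formulation, whose details are not given in this abstract; the two-parameter vectors, the grid size and the learning schedule are all hypothetical.

```python
import math

def best_matching_unit(nodes, point):
    """Index of the prototype closest to the point (squared Euclidean distance)."""
    return min(
        range(len(nodes)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(nodes[i], point)),
    )

def train_som(points, n_nodes=4, epochs=50, lr0=0.5, sigma0=1.5):
    """Train a 1-D SOM with a Gaussian neighbourhood that shrinks over time."""
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    # Deterministic start: prototypes spread along the bounding-box diagonal.
    nodes = [
        [lo[d] + (hi[d] - lo[d]) * i / (n_nodes - 1) for d in range(dim)]
        for i in range(n_nodes)
    ]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.3)
        for p in points:
            bmu = best_matching_unit(nodes, p)
            for i, node in enumerate(nodes):
                # Grid neighbours of the winner are pulled along with it.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    node[d] += lr * h * (p[d] - node[d])
    return nodes

# Hypothetical fitted parameter vectors (k, a) for two conditions.
points = [
    (0.50, 2.0), (0.55, 2.1), (0.45, 1.9),   # condition 1
    (1.50, 0.5), (1.55, 0.6), (1.45, 0.4),   # condition 2
]
nodes = train_som(points)
bmus = [best_matching_unit(nodes, p) for p in points]
print(bmus)
```

Because each prototype is itself a parameter vector, plotting one parameter per grid node yields heat maps of the kind mentioned above.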

In biomedical settings, time series classification is one of the most important techniques for improving the accuracy of disease diagnosis. Time series classification in the model space is developed not only to improve diagnostic accuracy but also to provide mechanistic and biomedical model interpretability. It is applied to the adrenal steroid hormone data set and shows satisfactory classification performance in both the signal space and the model space. Two classifiers, a support vector machine (SVM) and logistic regression, are employed. In addition, a hybrid model that significantly improves the accuracy is created. Feature selection extracts important time periods (signal space) and model parameters (model space), which are valuable information from a biomedical point of view. In the data preprocessing stage, missing values and initial values, two common problems of biomedical data, are handled using a univariate Gaussian process and the adjoint method. The analyses and evaluations are discussed together with mechanistic and biomedical knowledge and case studies of some additional subjects.
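
Classification in the model space can be sketched as an ordinary classifier trained on fitted model parameters rather than on raw measurements. The example below uses plain gradient-descent logistic regression; the parameter vectors, labels, learning rate and iteration count are hypothetical, and the thesis's SVM, hybrid model and feature selection steps are not reproduced.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(features, labels, lr=0.5, n_iter=2000):
    """Minimise the mean logistic loss by batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err / n
            for d in range(dim):
                grad_w[d] += err * x[d] / n
        bias -= lr * grad_b
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return weights, bias

def predict(weights, bias, x):
    """Return 1 if the linear score is positive, else 0."""
    return int(bias + sum(w * v for w, v in zip(weights, x)) > 0.0)

# Hypothetical model parameters (k, a) per subject; label 1 marks the condition.
features = [
    (0.50, 2.0), (0.55, 2.1), (0.45, 1.9),
    (1.50, 0.5), (1.55, 0.6), (1.45, 0.4),
]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(features, labels)
predictions = [predict(weights, bias, x) for x in features]
print(predictions)
```

Each learned weight attaches to a named model parameter, so the sign and size of a weight can be read against the mechanism it describes, which is the interpretability argument made in the abstract.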

Type of Work:

Thesis (Doctorates > Ph.D.)

Award Type:

Doctorates > Ph.D.

Supervisor(s):