Investigating prediction performance on regressions models using methods with multiple decision trees

Liao, Yuyue (2023). Investigating prediction performance on regressions models using methods with multiple decision trees. University of Birmingham. Ph.D.

Text - Accepted Version
Available under License All rights reserved.



With the development of machine learning techniques, the prediction power of classical regression methods has been challenged. This thesis applies decision tree learning to design new algorithms for regression analysis. The objective was to improve the prediction power of regression models by adding interaction terms generated by decision trees. More specifically, we designed new algorithms that allow multiple decision trees to be created and applied in constructing a regression model. These new algorithms were applied to analyse data from three different research fields.
Since the CART algorithm was developed in 1984 (Breiman et al., 2017), decision trees have been widely applied in both classification analysis and regression analysis. An early hybrid tree-logit regression method was designed by Steinberg et al. (1998), followed by other attempts to design hybrid tree-regression methods. In Chapter 1, we introduced both decision trees and existing hybrid tree-regression methods. We also introduced the overall research plan and the datasets applied in the study.
In Chapter 2, we applied hybrid tree-regression methods in meta-regression analysis and compared them with linear meta-regression. The model comparison demonstrates that decision trees can improve the prediction performance of regression models when all independent variables are binary. Random-effects meta-regression and weighted least squares (WLS) meta-regression are used in the comparison. From both an analysis of the results of the distinct models and the results of previous studies, we conclude that trade openness is beneficial for economic growth.
In Chapter 3, we applied linear and hybrid regression methods to analyse factors that affect the fundraising performance of crowdfunding projects. A new hybrid tree-regression method, called hybrid forest-linear regression (HFLR), is found to have much higher prediction power than the other models applied in the study. By analysing both Monte Carlo simulations and the crowdfunding data, it is shown that the HFLR method, which uses only categorical variables to construct decision trees, is able to deal with datasets that include both continuous and binary variables. From the results of the models, we identified various factors that influence crowdfunding success.
In Chapter 4, we applied linear and hybrid regression methods to analyse survey data on people's willingness to pay (WTP) for an environmental project. The HFLR method is shown to be capable of discovering joint effects not only between binary variables, but also between non-binary ordinal variables. In addition, a multi-bounded model for the contingent valuation (CV) method is designed and compared with the single-bounded model. From the results of the models, we estimated the scale of the starting-point bias of the CV method, as well as the median WTP.
Chapters 2, 3, and 4 each contain a summary of their respective study. Chapter 5 presents the overall summary and discusses ideas for possible future developments of hybrid tree-regression methods.
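
The general idea of adding a tree-generated interaction term to a linear regression with binary predictors can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify. It assumes a depth-two regression tree grown by greedy variance reduction, a single product term taken from the root and child split variables, and ordinary least squares. The function names (sse, best_split, tree_interaction, ols) and the toy data are hypothetical.

```python
from itertools import product

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(X, y, rows, features):
    """Return (gain, feature) for the binary feature that most reduces SSE."""
    parent = sse([y[i] for i in rows])
    best = (0.0, None)
    for f in features:
        left = [y[i] for i in rows if X[i][f] == 0]
        right = [y[i] for i in rows if X[i][f] == 1]
        gain = parent - sse(left) - sse(right)
        if gain > best[0] + 1e-12:
            best = (gain, f)
    return best

def tree_interaction(X, y):
    """Grow a depth-two tree and return the (root, child) split pair, or None."""
    n_features = len(X[0])
    rows = list(range(len(X)))
    _, root = best_split(X, y, rows, range(n_features))
    if root is None:
        return None
    others = [f for f in range(n_features) if f != root]
    best = (0.0, None)
    for value in (0, 1):
        child_rows = [i for i in rows if X[i][root] == value]
        gain, f = best_split(X, y, child_rows, others)
        if f is not None and gain > best[0]:
            best = (gain, f)
    return None if best[1] is None else (root, best[1])

def ols(design, y):
    """Solve the normal equations (D'D) b = D'y by Gauss-Jordan elimination."""
    k = len(design[0])
    a = [[sum(r[i] * r[j] for r in design) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(design, y))]
         for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [u - factor * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Synthetic toy data (an assumption for illustration): three binary
# predictors, with a joint effect between the first two.
X = [list(bits) for bits in product((0, 1), repeat=3)]
y = [1 + 2 * x[0] + 3 * x[0] * x[1] for x in X]

pair = tree_interaction(X, y)              # variables the tree splits on in sequence
a, b = pair
design = [[1] + x + [x[a] * x[b]] for x in X]  # main effects plus one product term
coef = ols(design, y)                      # intercept, x0, x1, x2, interaction
print(pair, [round(c, 6) for c in coef])
```

In this sketch the tree only selects which variables to multiply, and the linear model estimates the size of the joint effect alongside the main effects.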

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Licence: All rights reserved
College/Faculty: Colleges (2008 onwards) > College of Engineering & Physical Sciences
School or Department: School of Mathematics
Funders: None/not applicable
Subjects: Q Science > QA Mathematics

