Investigation of over-fitting and optimism in prognostic models

Richardson, Matthew (2010). Investigation of over-fitting and optimism in prognostic models. University of Birmingham. Ph.D.


This work seeks to develop a high-quality prognostic model for the CARE-HF data (see Richardson et al. 2007). The CARE-HF trial was a major study into the effects of cardiac resynchronization. Cardiac resynchronization has been shown to reduce mortality in patients suffering from heart failure due to electrical problems in the heart. The prognostic model presented in this work was motivated by the question of which patient characteristics may modify the effect of cardiac resynchronization. This question is of great importance to clinicians. Efforts are made to produce a high-quality prognostic model partly by applying methods that reduce the risk of over-fitting. One method discussed in this work is the strategy proposed by Frank Harrell Jr., and the various aspects of Harrell's approach are discussed. An attempt is made to extend Harrell's strategy to frailty models. Key issues are examined in relation to the prognostic model for the CARE-HF data: missing data and imputation, specification of the functional form of the model, and validation. Material is presented covering survival analysis, maximum likelihood methods, model selection criteria (AIC, BIC), specification of functional form (cubic splines and fractional polynomials) and validation methods (cross-validation, bootstrap methods). The concepts of over-fitting and optimism are examined. The author concludes that, whilst Harrell's strategy is valuable, it is still quite possible to produce models that are over-fitted. MDL (Minimum Description Length) methods are suggested as a potentially useful means of obtaining statistical models with a built-in resistance to over-fitting. The author also recommends that concepts such as over-fitting, optimism and model validation be introduced earlier, in more elementary courses on statistical modelling.
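The notion of optimism and its bootstrap estimate can be illustrated with a short sketch. The model's apparent performance is measured on the data used to fit it. The optimism is the average, over bootstrap samples, of the performance of a model refitted on each bootstrap sample, evaluated on that same sample, minus its performance on the original data. The optimism-corrected estimate is the apparent performance minus this optimism. This is a minimal, hypothetical illustration, not the thesis's own code. To keep it self-contained it assumes ordinary least squares with R² as the performance measure, not the survival and frailty models studied in the thesis. It uses simulated data in which only one of ten predictors is related to the outcome. The function names `fit_ols`, `r_squared` and `optimism_corrected_r2` are invented for this example.

```python
import random


def fit_ols(X, y):
    """Least squares via the normal equations; X must include an intercept column."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on X'X beta = X'y
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back-substitution on the upper-triangular system
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta


def r_squared(X, y, beta):
    """R^2 of coefficients beta evaluated on data (X, y)."""
    preds = [sum(xi * bi for xi, bi in zip(row, beta)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def optimism_corrected_r2(X, y, n_boot=200, seed=1):
    """Return (apparent R^2, bootstrap estimate of optimism, corrected R^2)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = r_squared(X, y, fit_ols(X, y))
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        beta_b = fit_ols(Xb, yb)
        # Performance on the bootstrap sample minus performance on the original data
        optimism += r_squared(Xb, yb, beta_b) - r_squared(X, y, beta_b)
    optimism /= n_boot
    return apparent, optimism, apparent - optimism


# Simulated data (hypothetical): y depends on x1 only; 9 further predictors are pure noise.
rng = random.Random(42)
n, n_noise = 60, 9
X, y = [], []
for _ in range(n):
    x1 = rng.gauss(0, 1)
    noise = [rng.gauss(0, 1) for _ in range(n_noise)]
    X.append([1.0, x1] + noise)
    y.append(1.0 + 0.5 * x1 + rng.gauss(0, 1))

apparent, optimism, corrected = optimism_corrected_r2(X, y)
print(f"apparent R^2 = {apparent:.3f}, optimism = {optimism:.3f}, corrected R^2 = {corrected:.3f}")
```

The noise predictors inflate the apparent R², and the bootstrap estimate of optimism measures roughly how much of that fit would not carry over to new data. The same resampling loop applies to other performance measures, such as a concordance index for a survival model. Only the fitting and scoring functions would change.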

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
College/Faculty: Colleges (2008 onwards) > College of Medical & Dental Sciences
School or Department: School of Health and Population Sciences
Funders: None/not applicable
Subjects: R Medicine > R Medicine (General)
R Medicine > RC Internal medicine

