Efficient strategies for epistasis detection in genome-wide data

Russ, Dominic (2023). Efficient strategies for epistasis detection in genome-wide data. University of Birmingham. Ph.D.

Russ2023PhD.pdf
Text - Accepted Version
Available under License All rights reserved.


Abstract

Genome-wide association studies (GWAS) have been carried out with SNP array technology since 2005, identifying thousands of loci for a great many traits and diseases. Large data sources, such as UK Biobank, now provide medical and genetic data for hundreds of thousands of people. However, there is a shortfall in the heritability explained for the phenotypes that have been assessed. One explanation for this deficit is interactions between genes, called epistasis, which go undetected, so that part of the causation is missed. In this thesis, I carry out a comprehensive review of the large number of epistasis detection tools available in the literature. This is followed by a simulation benchmarking study to assess the ability of a representative group of these tools to detect epistatic interactions. Of these tools, BOOST, MDR and MPI3SNP found the most interactions in this simulation study.

Next, I set out three possible strategies for searching biobank-scale data in order to find a best-practice workflow. These were exhaustive searching, an approach tailored to the tools' strengths, and splitting the data into linkage disequilibrium-based haplotype blocks, thereby reducing the computational load. A simulation study was devised, which led to a mixed approach that uses BOOST and MDR for different types of interactions. The final pipeline first uses the BOOST algorithm to find pure epistatic interactions and filter out insignificant pairs of SNPs. The remaining variants with large single-locus effect sizes are then assessed with MDR for impure interactions. The interactions identified are assessed for significance, effect size and heritability explained. Finally, each interacting pair is validated by incorporating numerous sources of a priori knowledge. The pipeline was applied to Atrial Fibrillation, Alzheimer's Disease and Parkinson's Disease, three diseases that have previously been assessed for interactions. Although no statistically significant results were identified, the approach increased the amount of heritability explained, showing that some of the missing heritability could be accounted for in this way. A downstream analysis method was also devised, which finds genes in linkage with the interacting loci, applies a number of functional annotations and searches STRING-db for evidence of known interactions.
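To make the idea of an exhaustive pairwise screen concrete, the following is a minimal sketch of a generic likelihood-ratio test for interaction between one pair of SNPs in case-control data. It is not the BOOST implementation and not the thesis pipeline: the function names, the 3 x 3 x 2 table layout (genotype of SNP 1, genotype of SNP 2, control/case) and the fixed number of fitting sweeps are all assumptions made for illustration. The test compares the observed genotype-by-phenotype counts with the best-fitting log-linear model that contains every two-way association but no three-way (interaction) term.

```python
import math


def fit_no_interaction(table, sweeps=100):
    """Fit the no-three-way-interaction log-linear model by iterative
    proportional fitting (IPF).

    table[a][b][y] holds counts for genotype a of SNP 1, genotype b of
    SNP 2 and phenotype y (0 = control, 1 = case). This layout is an
    assumption of the sketch.
    """
    fitted = [[[1.0, 1.0] for _ in range(3)] for _ in range(3)]
    for _ in range(sweeps):
        # Match the SNP1 x SNP2 margin.
        for a in range(3):
            for b in range(3):
                cur = fitted[a][b][0] + fitted[a][b][1]
                obs = table[a][b][0] + table[a][b][1]
                if cur > 0:
                    for y in range(2):
                        fitted[a][b][y] *= obs / cur
        # Match the SNP1 x phenotype margin.
        for a in range(3):
            for y in range(2):
                cur = sum(fitted[a][b][y] for b in range(3))
                obs = sum(table[a][b][y] for b in range(3))
                if cur > 0:
                    for b in range(3):
                        fitted[a][b][y] *= obs / cur
        # Match the SNP2 x phenotype margin.
        for b in range(3):
            for y in range(2):
                cur = sum(fitted[a][b][y] for a in range(3))
                obs = sum(table[a][b][y] for a in range(3))
                if cur > 0:
                    for a in range(3):
                        fitted[a][b][y] *= obs / cur
    return fitted


def interaction_lrt(table):
    """Return (G, p) for the likelihood-ratio test of the interaction term.

    G is referred to a chi-square distribution with (3-1)*(3-1)*(2-1) = 4
    degrees of freedom, which assumes no genotype margin is empty.
    """
    fitted = fit_no_interaction(table)
    g = 0.0
    for a in range(3):
        for b in range(3):
            for y in range(2):
                obs = table[a][b][y]
                if obs > 0:
                    g += 2.0 * obs * math.log(obs / fitted[a][b][y])
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Closed-form chi-square survival function for 4 degrees of freedom.
    p = math.exp(-g / 2.0) * (1.0 + g / 2.0)
    return g, p
```

In a genome-wide setting a statistic of this kind would be computed for every SNP pair and corrected for the number of pairs tested, which is why filtering and block-splitting strategies matter at biobank scale.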

Finally, the study was extended to examine rare variants in the rare disease congenital hypothyroidism. As a systemic disorder, it could potentially involve pathological interacting mutations. After variant calling, four de novo variants were identified that potentially explain the condition. Six related interactions were found, one of which was not present in the parents and so possibly explains the condition. The mutations, present in TG and PDIA4, have evidence of an interaction in STRING-db, and both genes are involved in thyroid hormone synthesis according to the KEGG database.

These contributions provide a novel, tested pipeline for identifying epistasis from GWAS data, as well as a corpus of simulated data for future researchers. A robust methodology is applied for testing resulting interactions statistically, as well as an approach for validating interactions by incorporating numerous data sources to find significant commonalities between variants.

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Supervisor(s):
Gkoutos, Georgios (Email: unspecified; ORCID: unspecified)
Brown, James Bentley (Email: unspecified; ORCID: unspecified)
Licence: All rights reserved
College/Faculty: Colleges (2008 onwards) > College of Medical & Dental Sciences
School or Department: Institute of Cancer and Genomic Sciences
Funders: Other
Other Funders: University of Birmingham
Subjects: Q Science > Q Science (General)
Q Science > QH Natural history > QH426 Genetics
URI: http://etheses.bham.ac.uk/id/eprint/13968
