Improving protein structure prediction using amino acid contact & distance prediction

Ji, Shuangxi ORCID: 0000-0001-6554-1041 (2019). Improving protein structure prediction using amino acid contact & distance prediction. University of Birmingham. Ph.D.

Ji2019PhD.pdf: Text, Accepted Version (10MB). Available under License All rights reserved.

Abstract

As more and more protein sequences are generated, interpreting these data has become one of the most pressing tasks in bioinformatics. This thesis concerns how to predict the 3D structure of a protein from its sequence alone, a long-standing problem in computational biology. A commonly adopted intermediate step is to predict pairwise amino acid contacts from the query sequence. Because current algorithms, which include statistical models and machine learning techniques, are relatively simple, the accuracy of contact prediction is still low for many proteins. These algorithms are also unable to predict amino acid distances (distances longer than a contact). The resulting lack of sufficient, high-quality geometric constraints makes 3D structure prediction difficult for many proteins. To address these limitations in amino acid constraint and structure prediction, this thesis proposes DeepCDpred, a state-of-the-art amino acid contact and distance prediction algorithm based on deep neural networks. For a given query protein sequence, the geometric constraints predicted by DeepCDpred are fed into a Rosetta ab initio modelling protocol for protein structure prediction. In addition, a neural network-based method is proposed to evaluate the quality of the predicted structures.
The accuracy of the amino acid contact and distance predictions, the quality of the structure predictions and the accuracy of the confidence score predictions were evaluated on a test set of 108 protein chains with known experimental structures. No sequence in the test set shares more than 25% sequence identity with any sequence in the training set used to train DeepCDpred. The contact prediction accuracy of DeepCDpred is only slightly lower than that of a recently published method, RaptorX, and exceeds that of all other methods considered in this thesis. Thanks to the additional predicted distance constraints and the Rosetta ab initio modelling protocol, the quality of structure predictions from the algorithms proposed in this study is better than that from the RaptorX server. A blind test on a protein that had not yet been released was also used to validate the effectiveness of DeepCDpred. The protein classes of structures predicted with amino acid contact constraints from MetaPSICOV (the amino acid contact predictor with which DeepCDpred is most often compared in this thesis) are analysed and compared with the predictions based on contact constraints from DeepCDpred, and with the predictions based on both contact and distance constraints from DeepCDpred. An online server, http://proteincoevolution.bham.ac.uk, has been implemented and released to make the proposed methods for amino acid contact and distance prediction, structure prediction and structure confidence prediction accessible to general users, and it is expected to benefit the research community.
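As a rough illustration of how contact prediction accuracy of the kind described above can be scored, the following minimal sketch marks residue pairs as contacts using a distance cutoff and measures the precision of the highest-scoring predicted pairs. It is not the thesis's evaluation code: the 8.0 Å cutoff, the minimum sequence separation of 6, the function names `contact_map` and `top_k_precision`, and the toy data are all assumptions chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def contact_map(dist: Matrix, threshold: float = 8.0) -> List[List[bool]]:
    """Mark residue pairs closer than `threshold` as contacts.

    The 8.0 cutoff is an assumed, commonly used convention,
    not a value taken from the abstract.
    """
    return [[d < threshold for d in row] for row in dist]


def top_k_precision(scores: Matrix, truth: List[List[bool]],
                    k: int, min_sep: int = 6) -> float:
    """Fraction of the k highest-scoring pairs that are true contacts.

    Only pairs (i, j) with j - i >= min_sep are considered, so that
    trivially close sequence neighbours are ignored (assumed setting).
    """
    n = len(scores)
    pairs = [(scores[i][j], i, j)
             for i in range(n) for j in range(i + min_sep, n)]
    pairs.sort(key=lambda t: t[0], reverse=True)
    top = pairs[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if truth[i][j])
    return hits / len(top)


# Toy 8-residue chain: an extended backbone (3.8 per residue step)
# with one made-up long-range contact between residues 0 and 7.
n = 8
dist = [[3.8 * abs(i - j) for j in range(n)] for i in range(n)]
dist[0][7] = dist[7][0] = 6.0
truth = contact_map(dist)

# Hypothetical predictor scores for the three pairs with separation >= 6.
scores = [[0.0] * n for _ in range(n)]
scores[0][7] = 0.9
scores[0][6] = 0.8
scores[1][7] = 0.1

p1 = top_k_precision(scores, truth, k=1)
p2 = top_k_precision(scores, truth, k=2)
```

In this toy case the top-ranked pair (0, 7) is a true contact and the second-ranked pair (0, 6) is not, so the precision drops from the top-1 to the top-2 evaluation. Real evaluations typically choose k in proportion to the sequence length.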

Type of Work: Thesis (Doctorates > Ph.D.)
Award Type: Doctorates > Ph.D.
Supervisor(s):
Supervisor(s) | Email | ORCID
Winn, Peter J | P.J.Winn@bham.ac.uk | UNSPECIFIED
Butterworth, Sam | sam.butterworth@manchester.ac.uk | UNSPECIFIED
Licence: All rights reserved
College/Faculty: Colleges (2008 onwards) > College of Life & Environmental Sciences
School or Department: School of Biosciences
Funders: None/not applicable
Subjects: Q Science > QH Natural history > QH301 Biology
URI: http://etheses.bham.ac.uk/id/eprint/9044
