Abstract:
Protein structures provide valuable insights into the roles and functions of proteins inside living organisms. However, experimental approaches to determining protein structures are time-consuming and expensive, which has motivated the development of computational methods to predict them. In the post-AlphaFold2 era, single-sequence-based protein structure prediction has emerged as a new challenge: reliably estimating a protein's 3D structure solely from its primary sequence, without depending on multiple sequence alignments (MSAs) of its sequence homologs. Accurate single-sequence-based prediction of protein structural properties, such as 8-state (Q8) secondary structure and the backbone torsion angles ϕ and ψ, will pave the way for highly precise sequence-based prediction of protein structures. We present two multitask learning-based methods:
(i) evolutionary-feature-based SAINT-Evolve and (ii) single-sequence-based SAINT-Single, to accurately predict protein Q8 secondary structure (SS) and backbone torsion angles (ϕ and ψ). We developed them from the previously proposed single-task learning-based methods SAINT and SAINT-Angle, which predict Q8-SS and backbone torsion angles, respectively. We aimed to leverage simultaneous learning of Q8-SS and backbone torsion angle prediction to boost predictive performance. In addition, we took advantage of sequence embeddings extracted from state-of-the-art protein language models to obtain better prediction results, particularly for the single-sequence-based model. We extensively compared the predictions of our methods against those of other competing protein structural property predictors on a wide range of benchmark datasets. The experimental results indicate that our proposed methods produce reliable predictions for proteins regardless of whether they have few or abundant homologous sequences.
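To make the multitask idea concrete, the following is a minimal sketch of one common way to combine the two tasks in a single objective. It is not the SAINT-Evolve or SAINT-Single training objective, whose exact form is not given in this abstract. It assumes a per-residue cross-entropy over 8 secondary-structure classes plus a mean squared error on a (sin, cos) encoding of ϕ and ψ, a frequently used way to handle angle periodicity. The names `multitask_loss`, `decode_angle`, the weight `angle_weight`, and the `Q8_STATES` ordering are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical ordering of the 8 DSSP states (C = coil); an illustrative assumption.
Q8_STATES = "GHIEBTSC"


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable cross-entropy for one residue's 8 class logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def angle_loss(pred_sin_cos: Tuple[float, float], true_deg: float) -> float:
    """MSE between a predicted (sin, cos) pair and the true angle encoded as (sin, cos)."""
    rad = math.radians(true_deg)
    s, c = pred_sin_cos
    return ((s - math.sin(rad)) ** 2 + (c - math.cos(rad)) ** 2) / 2


def decode_angle(pred_sin_cos: Tuple[float, float]) -> float:
    """Recover an angle in degrees, in (-180, 180], from a (sin, cos) prediction."""
    s, c = pred_sin_cos
    return math.degrees(math.atan2(s, c))


def multitask_loss(
    ss_logits: List[Sequence[float]],
    ss_targets: List[int],
    phi_pred: List[Tuple[float, float]],
    psi_pred: List[Tuple[float, float]],
    phi_true: List[float],
    psi_true: List[float],
    angle_weight: float = 1.0,  # hypothetical task-balancing weight
) -> float:
    """Joint loss: mean Q8 cross-entropy plus weighted mean phi/psi angle losses."""
    n = len(ss_targets)
    ss_term = sum(softmax_cross_entropy(l, t) for l, t in zip(ss_logits, ss_targets)) / n
    phi_term = sum(angle_loss(p, t) for p, t in zip(phi_pred, phi_true)) / n
    psi_term = sum(angle_loss(p, t) for p, t in zip(psi_pred, psi_true)) / n
    return ss_term + angle_weight * (phi_term + psi_term)


def encode(deg: float) -> Tuple[float, float]:
    """Encode an angle in degrees as (sin, cos)."""
    return (math.sin(math.radians(deg)), math.cos(math.radians(deg)))


if __name__ == "__main__":
    # One residue with uniform Q8 logits and exact (sin, cos) angle predictions.
    loss = multitask_loss([[0.0] * 8], [Q8_STATES.index("H")],
                          [encode(-60.0)], [encode(-45.0)], [-60.0], [-45.0])
    print(round(loss, 4), round(decode_angle(encode(-60.0)), 4))
```

With uniform logits the secondary-structure term equals ln 8 and exact angle predictions contribute zero, so the script prints the Q8 term alone. Sharing one loss lets gradients from both tasks update a common encoder, which is the intuition behind simultaneous learning; the relative weighting is a design choice that would need tuning.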