Abstract:
CRISPR-Cas9 is a groundbreaking gene-editing tool that harnesses a bacterial defense system to alter DNA sequences with high precision. This technology holds vast promise in multiple domains, including biotechnology, agriculture, and medicine. However, such power comes with its own perils, one of which is the potential for unintended modifications at sites other than the intended target (Off-Target effects), which highlights the need for accurate prediction and mitigation strategies. Previous studies have shown that deep learning improves Off-Target prediction. Despite their improved accuracy, these models often struggle with the precision-recall trade-off, which limits their effectiveness. Moreover, the complex decision-making processes of these models have not been thoroughly interpreted.
To address these limitations, we thoroughly explored recurrent neural network (RNN) and transformer-based models, leveraging their established success in handling sequence data. Our experiments included RNNs and their variants, such as LSTM and GRU, along with their bidirectional and stacked compositions. Furthermore, we employed a genetic algorithm for hyperparameter tuning to optimize these models' performance. We also pretrained a transformer-based ELECTRA model on the whole human genome sequence and fine-tuned it for Off-Target prediction. Finally, we used integrated gradients to gain a deeper understanding of the decision-making processes of the best-performing models and to interpret their predictions.
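Integrated gradients attributes a prediction to each input feature by averaging the model's gradient along a straight path from a baseline input to the actual input, then scaling by the difference between the two. The following is a minimal NumPy sketch of that idea, not the implementation used in this study: it assumes a toy logistic-regression scorer as a stand-in for a trained RNN, an all-zero baseline, and a hypothetical flattened 8-feature binary encoding. The names `model`, `model_grad`, and `integrated_gradients` are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Hypothetical stand-in for an Off-Target classifier: logistic regression on a flattened encoding."""
    return sigmoid(np.dot(x, w) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to its input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of integrated gradients along the path baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        # Gradient evaluated at an interpolated point between the baseline and the input
        total += model_grad(baseline + a * (x - baseline), w, b)
    avg_grad = total / steps
    # Scale the path-averaged gradient by the input difference
    return (x - baseline) * avg_grad

# Usage with random weights and a hypothetical binary input (not real sgRNA/DNA data)
rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = -0.5
x = rng.integers(0, 2, size=8).astype(float)
baseline = np.zeros_like(x)
attributions = integrated_gradients(x, baseline, w, b)
print(attributions.round(3))
# Completeness check: attributions should sum to approximately f(x) - f(baseline)
print(attributions.sum(), model(x, w, b) - model(baseline, w, b))
```

Features equal to the baseline receive zero attribution, which is why the choice of baseline matters when interpreting per-position scores.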
The results of our experiments demonstrate improved performance over the existing best-performing Off-Target prediction studies, highlighting the efficacy of our approach. The genetic algorithm yielded better-performing models while intelligently exploring only a fraction of the whole hyperparameter search space. The ELECTRA model, though pretrained in a resource-constrained environment, achieved performance comparable to that of previous studies.
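A genetic algorithm can search a hyperparameter space by evolving a population of configurations through selection, crossover, and mutation, evaluating only the configurations it actually visits. The sketch below illustrates this under stated assumptions and is not the authors' setup: `SEARCH_SPACE` is a hypothetical grid of RNN hyperparameters, and `toy_fitness` is a placeholder objective standing in for training a model and measuring its validation score. It uses tournament selection, uniform crossover, per-gene mutation, and elitism, and reports how many distinct configurations were evaluated out of the full grid.

```python
import math
import random

# Hypothetical search space; the study's actual hyperparameters and ranges may differ.
SEARCH_SPACE = {
    "cell": ["rnn", "lstm", "gru"],
    "bidirectional": [False, True],
    "num_layers": [1, 2, 3],
    "hidden_units": [32, 64, 128, 256],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
}

def toy_fitness(cfg):
    """Placeholder objective; in practice this would train a model and return a validation metric."""
    score = {"rnn": 0.0, "lstm": 0.3, "gru": 0.25}[cfg["cell"]]
    score += 0.2 if cfg["bidirectional"] else 0.0
    score -= 0.1 * abs(cfg["num_layers"] - 2)
    score -= abs(cfg["hidden_units"] - 128) / 1280
    score -= abs(cfg["dropout"] - 0.2)
    score -= abs(cfg["learning_rate"] - 1e-3) * 100
    return score

def random_config(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each gene is taken from either parent with equal probability
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}

def mutate(cfg, rng, rate=0.2):
    # Each gene is resampled from its allowed values with probability `rate`
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v) for k, v in cfg.items()}

def genetic_search(fitness, pop_size=12, generations=15, elite=2, seed=0):
    rng = random.Random(seed)
    cache = {}  # memoize so each distinct configuration is evaluated only once

    def evaluate(cfg):
        key = tuple(sorted(cfg.items()))
        if key not in cache:
            cache[key] = fitness(cfg)
        return cache[key]

    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        next_gen = ranked[:elite]  # elitism: keep the best configurations unchanged
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) for each parent
            p1 = max(rng.sample(ranked, 3), key=evaluate)
            p2 = max(rng.sample(ranked, 3), key=evaluate)
            next_gen.append(mutate(crossover(p1, p2, rng), rng))
        population = next_gen
    best = max(population, key=evaluate)
    return best, evaluate(best), len(cache)

best, score, n_evals = genetic_search(toy_fitness)
space_size = math.prod(len(v) for v in SEARCH_SPACE.values())
print(best, round(score, 3), f"{n_evals}/{space_size} configurations evaluated")
```

With these settings at most 162 of the 1,152 grid points can be evaluated, which illustrates how a genetic algorithm can cover only a fraction of the space; whether it finds the optimum depends on the objective, population size, and number of generations.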
One notable aspect of our research is the comprehensive interpretation of our models, which provides a detailed analysis of the underlying factors that contribute to Off-Target predictions. This in-depth interpretation enhances the transparency and reliability of our models and also reveals several interesting observations that extend the established biological hypothesis of Off-Target effects. We expect our study to offer researchers and practitioners valuable insights and future directions for advancing CRISPR-Cas9 technology.