Abstract:
DNA (deoxyribonucleic acid) is a double-helix polymer that carries the genetic information of all organisms and directs their biochemical processes. By altering DNA artificially, it is possible to correct genetic defects and to treat, or even eradicate, diseases. Genome editing technologies have been developed for this purpose. They typically work by cutting the DNA double helix at an intended location; the resulting double-strand break is then repaired in a way that modifies the sequence at that site. CRISPR/Cas9 is one such genome editing technology, recognized as highly specific, cost-effective, and less time-consuming than alternatives such as ZFN and TALEN. The CRISPR/Cas9 system cleaves the double helix with the Cas9 enzyme, which is guided to a particular genomic location by a single guide RNA (sgRNA), or simply guide RNA (gRNA). Genome editing with CRISPR/Cas9 therefore requires designing sgRNAs that are efficient and specific. sgRNAs are usually designed against a reference genome, by scanning it for locations where a double-strand break could be introduced. The need to scan a reference genome limits their use in organisms with incomplete reference genomes. We show that it is possible to design sgRNAs without a reference genome. We do this by working directly with genome sequencing reads and estimating the number of cuts a particular sgRNA would introduce by counting k-mers in the reads. Using this estimate, we give an alternative definition of the specificity score of an sgRNA. By separately scanning the reference, we also show that sgRNAs filtered and sorted using this score are highly specific. We further show that our list of sgRNAs is similar to the lists produced by other sgRNA design tools that work with a reference genome.
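
As a rough illustration of the read-based estimate described above, the following Python sketch counts k-mers in a set of reads and divides the count of a candidate target site by the sequencing coverage. It is a minimal sketch under stated assumptions, not the procedure or the specificity score defined in this work: the function names are hypothetical, `coverage` is assumed to be a known average k-mer coverage of the read set, the target is assumed to be the protospacer together with its PAM, and only exact matches are counted, whereas a real off-target analysis would also have to consider mismatches.

```python
from collections import Counter

# Translation table for complementing DNA bases (assumes upper-case A/C/G/T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(reads, k):
    """Count every k-mer of length k in the reads, skipping k-mers with N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:
                continue
            counts[kmer] += 1
    return counts


def estimated_sites(target, counts, coverage):
    """Estimate how many genomic sites exactly match `target`.

    Reads come from both strands, so the counts of the target and of its
    reverse complement are added before dividing by the (assumed known)
    average k-mer coverage. `counts` must hold k-mers of len(target).
    """
    rc = reverse_complement(target)
    observed = counts[target]
    if rc != target:  # avoid double counting a self-complementary target
        observed += counts[rc]
    return observed / coverage


# Toy example: two reads covering the same site, one from each strand.
toy_reads = ["TTACGGAT", "ATCCGTAA"]
toy_counts = count_kmers(toy_reads, k=4)
print(estimated_sites("ACGG", toy_counts, coverage=2.0))  # prints 1.0
```

In this toy example the 4-mer `ACGG` is seen once directly and once as its reverse complement `CCGT`, so with an assumed coverage of 2 the estimate is one genomic site. For SpCas9, the target would more typically be a 23-mer (a 20-nt protospacer followed by an NGG PAM), and an estimate close to 1 would suggest a unique cut site.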