Burak Demiralay, Efficient Primer Design for Genotype and Subtype Detection of Highly Divergent Viruses in Large Scale Genome Datasets

Ph.D. Candidate: Burak Demiralay
Program: Health Informatics
Date: 11.09.2023 / 17:00
Place: A-212

Abstract: Identification of microorganisms is a crucial step in diagnostics, pathogen screening, biomedical research, evolutionary studies, agriculture, and biological threat assessment. While progress has been made in studying larger organisms, there is a need for an efficient and scalable method that can handle thousands of whole genomes for organisms with high mutation rates and genetic diversity, such as single-stranded viruses. In this study, we developed a method to extract sequences that detect the presence of a given species or subspecies by PCR. Species detection in any analysis depends strongly on the measurement method, and since thermodynamic interactions are critical in PCR, thermodynamics is the main driving force in the proposed methodology. We applied our method to three highly divergent viruses: 1) HCV, whose subtypes differ in 31%-33% of nucleotide sites on average; 2) HIV, for which 25-35% between-subtype and 15-20% within-subtype variation is observed; and 3) the Dengue virus, whose genomes (DENV 1–4 only) share 60% sequence identity with each other. Using our method, we were able to select oligonucleotides that identify in silico 99.9% of 1657 HCV genomes, 99.7% of 11838 HIV genomes, and 95.4% of 4016 Dengue genomes. We also show subspecies identification on genotypes 1-6 of HCV and genotypes 1-4 of the Dengue virus with a >99.5% true positive rate and a <0.05% false positive rate, on average. None of the state-of-the-art methods can produce oligonucleotides with this specificity and sensitivity on highly divergent viral genomes like the ones we studied in this thesis.
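
To make the reported in silico true positive and false positive rates concrete, the following is a minimal sketch of how such rates could be tallied for one candidate oligonucleotide. It is not the method of the thesis: the function names (`binds`, `detection_rates`), the toy sequences, and the mismatch-count test are all assumptions for illustration, and the mismatch count is only a crude stand-in for the thermodynamic binding criterion the abstract describes.

```python
from typing import List, Tuple


def binds(oligo: str, genome: str, max_mismatches: int = 1) -> bool:
    """Hypothetical binding test: True if the oligo matches some window of
    the genome with at most max_mismatches substitutions. A real pipeline
    would use a thermodynamic model here instead of counting mismatches."""
    k = len(oligo)
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        mismatches = sum(1 for a, b in zip(oligo, window) if a != b)
        if mismatches <= max_mismatches:
            return True
    return False


def detection_rates(
    oligo: str,
    genomes: List[Tuple[str, str]],
    target: str,
    max_mismatches: int = 1,
) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) of an oligo for one
    target label, given (label, sequence) pairs."""
    tp = fn = fp = tn = 0
    for label, seq in genomes:
        hit = binds(oligo, seq, max_mismatches)
        if label == target:
            if hit:
                tp += 1
            else:
                fn += 1
        else:
            if hit:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy data (made up for illustration, not real viral sequences).
genomes = [
    ("1", "ACGTACGTTAGC"),
    ("1", "ACGTACCTTAGC"),
    ("2", "TTTTGGGGCCCC"),
    ("2", "GGGGCCCCAAAA"),
]
oligo = "ACGTACGT"

tpr, fpr = detection_rates(oligo, genomes, target="1")
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

In this toy setting, tightening `max_mismatches` to 0 lowers the true positive rate, which illustrates why tolerance to sequence variation matters for highly divergent genomes.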