Onur Erdoğan, EnSCAN: “En”semble “S”coring for Prioritizing “CA”usative Varia“N”ts Across Multi-Platform GWAS for Late-Onset Alzheimer's Disease

Ph.D. Candidate: Onur Erdoğan
Program: Medical Informatics
Date: 06.09.2024 / 16:00
Place: B-116

Abstract: Late-Onset Alzheimer's Disease (LOAD) is a progressive and complex neurodegenerative condition prevalent among the elderly. It manifests as cognitive deterioration, including memory impairment and diminished intellectual faculties, and its etiology is often intertwined with traumatic brain injuries. The genetic underpinnings of Alzheimer's Disease (AD) remain elusive, impeding early and differential diagnosis of LOAD. While Genome-Wide Association Studies (GWAS) enable the examination of statistical interactions among individual genetic variants within specific loci, traditional univariate analysis may overlook intricate relationships among these genetic elements. In contrast, machine learning (ML) algorithms are well suited to uncovering latent, novel, and clinically relevant patterns because they accommodate nonlinear interactions among genetic variants, thereby improving our understanding of the genetic predisposition underlying complex disorders. Nevertheless, conventional majority voting is inapplicable across diverse genotyping platforms because the attribute sets of their Single Nucleotide Variants (SNVs) differ. Hence, a novel post-ML ensemble methodology is devised to discern significant SNVs across multiple genotyping platforms. With the EnSCAN framework, we propose an algorithm that consolidates selected variants even across distinct platforms, thereby prioritizing candidate causative loci and enhancing ML outcomes by combining the prior information captured from the multi-model of each dataset. The proposed ensemble algorithm uses the chromosomal locations of SNVs, mapped to cytogenetic bands, along with the proximities between pairs and multi-model Random Forest validations, to prioritize SNVs and candidate causative genes for Alzheimer's Disease. The scoring method is scalable and can be applied to any multi-platform genotyping study.
We present how the proposed EnSCAN scoring algorithm prioritizes candidate causative variants related to LOAD across three GWAS datasets.
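
As a rough illustration of why mapping SNVs to cytogenetic bands makes selections from different platforms comparable, the minimal Python sketch below counts how many platforms selected at least one SNV in each band. It is not the EnSCAN scoring itself: the band table, coordinates, platform names, and the functions band_of and band_support are all hypothetical, and the pairwise proximities and Random Forest validations mentioned in the abstract are not modeled.

```python
from collections import defaultdict

# Toy band table (hypothetical, not real cytoband coordinates):
# chromosome -> list of (start, end, band label), half-open intervals.
TOY_BANDS = {
    "1": [(0, 1000, "1:bandA"), (1000, 2000, "1:bandB")],
    "2": [(0, 1500, "2:bandA")],
}


def band_of(chrom, pos):
    """Return the toy band label containing pos, or None if unmapped."""
    for start, end, label in TOY_BANDS.get(chrom, []):
        if start <= pos < end:
            return label
    return None


def band_support(selected_by_platform):
    """For each band, count the platforms that selected an SNV inside it."""
    platforms_per_band = defaultdict(set)
    for platform, snvs in selected_by_platform.items():
        for chrom, pos in snvs:
            label = band_of(chrom, pos)
            if label is not None:
                platforms_per_band[label].add(platform)
    return {label: len(names) for label, names in platforms_per_band.items()}


# Hypothetical ML-selected SNVs as (chromosome, position) pairs.
# No position is shared between platforms, so SNV-level voting finds no overlap.
selected = {
    "platform_1": [("1", 120), ("2", 900)],
    "platform_2": [("1", 450), ("1", 1700)],
    "platform_3": [("1", 800)],
}

support = band_support(selected)
# Rank bands by number of supporting platforms, ties broken by label.
ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
```

In this toy input no two platforms share an identical SNV, yet all three select a variant inside the same band, so band-level aggregation recovers an agreement that SNV-level majority voting would miss.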