Ph.D. Candidate: Elif Güney Tamer
Program: Medical Informatics
Date: 15.01.2026 / 11:00
Place: A-212
Abstract: Accurate identification of splice-altering genetic variants is critical for understanding disease mechanisms and improving clinical variant interpretation. Deep learning–based splice prediction tools perform well for canonical splice-site variants, but their ability to detect exonic splice-altering variants remains limited. This limitation stems primarily from two factors. First, experimentally validated exonic variants are scarce. Second, model architectures are optimized for canonical splice motifs rather than for exonic regulatory regions. In this thesis, we systematically evaluate state-of-the-art splice prediction tools on independent, experimentally validated datasets, with a specific focus on performance for exonic variants. We curated the largest dataset of validated exonic splice-altering variants reported to date, including both pathogenic and benign variants. Benchmarking analyses revealed that performance degrades consistently across tools for exonic variants compared with canonical splice-site variants. To address this gap, we retrained the Pangolin deep learning model with validated exonic splice variants explicitly incorporated into its training data. The retrained model did not surpass the overall performance of the original Pangolin model. However, it showed improved sensitivity and stability and a lower false-negative rate for exonic splice-altering variants, particularly at higher prediction thresholds. Notably, the retrained model better detected variants located in exonic splicing enhancer (ESE) and exonic splicing silencer (ESS) regions. Overall, this study provides a comprehensive evaluation of current splice prediction tools and demonstrates the benefit of targeted retraining for exonic variant detection. It also establishes a foundation for developing more accurate and clinically relevant models for predicting splice-altering variants.
