Sana Basharat, Prediction of Non-coding Driver Mutations Using Ensemble Learning
We employ the XGBoost algorithm to predict non-coding driver mutations from multiple engineered features, augmented with features derived from existing annotation and effect prediction tools. The resulting dataset is passed through a feature selection and engineering pipeline, and a model is then trained on it to distinguish driver from passenger non-coding mutations. We also integrate this model into the architecture of an established driver discovery model from the literature. We then apply our models to non-coding driver mutations reported in previous studies to predict their driver status. Finally, we use Explainable AI methods to perform an in-depth analysis of the resulting predictions.
Date: 07.06.2024 / 10:00 Place: A-212