Global climate change induces lake level fluctuations, which are also shaped by evolving meteorological factors and water use. Changes in inputs or outputs quickly affect the water balance equation. This study explores predictive models for climatic and hydrologic variables and assesses their correlations with lake water level and water quality. Among the algorithms compared (Naïve Method, ANN, and RNN), LSTM is the most accurate as measured by RMSE. Comparisons with the Naïve Method confirm the predictive strength of ANN and RNN, especially over extended horizons. Correlations with temperature and evaporation highlight impacts on lake water quality. Together, the models and metrics form a decision support tool for water managers.
Date: 19.12.2023 / 13:30 Place: A-212
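The comparison against the Naïve Method by RMSE, mentioned in the lake-level abstract above, can be illustrated with a minimal sketch. It assumes the naive baseline is a persistence forecast (the value `horizon` steps ahead equals the current value), which the abstract does not specify. The function names and the sample lake levels are hypothetical and are not taken from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def naive_forecast(series, horizon):
    """Persistence forecast (assumed baseline): predict series[t] for time t + horizon."""
    return series[:-horizon]

# Hypothetical lake levels in metres; illustrative only, not data from the study.
levels = [10.0, 10.2, 10.1, 10.4, 10.3, 10.6]
horizon = 1

predicted = naive_forecast(levels, horizon)  # forecasts aligned with levels[horizon:]
actual = levels[horizon:]
baseline_rmse = rmse(actual, predicted)
print(f"Naive baseline RMSE at horizon {horizon}: {baseline_rmse:.4f}")
```

A learned model would be scored with the same `rmse` function on the same `actual` values. An RMSE below `baseline_rmse` would indicate that it improves on the naive baseline at that horizon.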
This thesis applies machine learning to predict the outcomes of men’s singles tennis matches from 2009 to 2022, using a standardized data mining framework, SRP-CRISP-DM, for replicable results. Six feature extraction techniques, three models, and two feature selection methods were evaluated with time-based cross-validation and hyperparameter tuning. The Extreme Gradient Boosting model emerged as the top performer, achieving a Brier score of 0.1913 and an accuracy of 70.5%, with bookmakers' odds as the top predictive feature.
Date: 07.12.2023 Place: A-212
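The two metrics reported in the tennis abstract above, Brier score and accuracy, can be sketched as follows. This is a minimal illustration that assumes binary outcomes (1 if the first-listed player wins) and a 0.5 decision threshold. The sample probabilities and results are hypothetical and unrelated to the thesis data.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted win probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)

def accuracy(probabilities, outcomes, threshold=0.5):
    """Share of matches where the thresholded prediction equals the outcome."""
    hits = sum((p >= threshold) == bool(o) for p, o in zip(probabilities, outcomes))
    return hits / len(outcomes)

# Hypothetical predicted win probabilities and actual results (1 = win, 0 = loss).
probs = [0.8, 0.6, 0.3, 0.9]
results = [1, 0, 0, 1]

print(f"Brier score: {brier_score(probs, results):.4f}")
print(f"Accuracy:    {accuracy(probs, results):.2%}")
```

Lower Brier scores are better. A model that always predicts 0.5 scores 0.25, which gives a reference point for the reported 0.1913.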
We developed an efficient and scalable method for identification of signature sequences that can handle thousands of whole genomes for organisms with high mutation rates and genetic diversity. Thermodynamics is the main driving force in our method, which is tested on three highly divergent viruses. The oligonucleotides found can identify 99.9% of 1657 HCV genomes, 99.7% of 11838 HIV genomes, and 95.4% of 4016 Dengue genomes. We also show subspecies identification on genotypes 1-6 of HCV and genotypes 1-4 of the Dengue virus with >99.5% true positive and <0.05% false positive rate. None of the state-of-the-art methods achieve this performance.
Date: 11.09.2023 / 17:00 Place: A-212
With the help of large language models, prompt engineering enables easy access to vast knowledge for various applications. However, limited research has been done on multi-hop question answering using this approach. This thesis introduces a new semi-automatic prompting method for answering two-hop questions. The method involves creating a prompt with automatically selected examples by grouping answer-named entities from the training set and using a chain-of-thought principle. The results demonstrate comparable performance to fine-tuned models on the MuSiQue dataset. Ablation studies further validate the effectiveness of each component in the proposed method. The approach has the potential to be applied to more complex multi-hop question-answering systems while upholding performance on par with other state-of-the-art techniques.
Date: 07.09.2023 / 13:00 Place: A-212
Most variants in the genome lie in non-coding regions. While variations in coding regions affect the protein, variations in non-coding regions affect regulatory mechanisms. Therefore, examining non-coding variations may help identify variations that affect gene expression. eQTL analysis is a popular method for determining the SNPs that affect gene expression. We have implemented a Python-based, easy-to-use tool to understand the relationship between somatic SNPs and gene expression based on eQTL analysis.
Date: 11.09.2023 / 14:00 Place: A-212
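The core idea behind the eQTL analysis in the abstract above can be sketched as a simple linear regression of gene expression on SNP genotype. This is not the tool described in the abstract. It assumes an additive coding of genotypes (0, 1, or 2 copies of the alternate allele), and the function name and sample values are hypothetical.

```python
def eqtl_fit(genotypes, expression):
    """Least-squares fit of expression = intercept + slope * genotype."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    covariance = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    variance = sum((g - mean_g) ** 2 for g in genotypes)
    if variance == 0:
        raise ValueError("genotype does not vary across samples")
    slope = covariance / variance
    intercept = mean_e - slope * mean_g
    return slope, intercept

# Hypothetical samples: alternate-allele counts and normalized expression values.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope, intercept = eqtl_fit(genotypes, expression)
print(f"Estimated effect per allele copy: {slope:.3f} (intercept {intercept:.3f})")
```

A nonzero slope suggests that the SNP is associated with expression of the gene. A real analysis would also test the slope for significance and correct for multiple testing across SNP-gene pairs.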