Mustafa Erolcan Er, A Modular Framework for PDTB-Style Multilingual Discourse Parsing
This thesis addresses the inherent complexity of discourse parsing in Natural Language Processing (NLP) by developing a multilingual framework implemented for Penn Discourse TreeBank (PDTB) datasets. Leveraging advances in Large Language Models (LLMs) and transformer architectures, the thesis proposes a hybrid methodology that integrates fine-tuned BERT models for Discourse Connective (DC) detection and argument span labeling with in-context learning strategies for Discourse Relation Recognition (DRR). The study bridges the gap between isolated sub-tasks and end-to-end processing by defining interconnected modules that link the detection, labeling, and recognition phases. Evaluated across seven datasets in English, Portuguese, and Turkish, the framework achieves performance on par with state-of-the-art models. Additionally, the thesis contributes a novel lightweight DC detection model and introduces a method to enhance implicit discourse relation recognition using machine translation techniques, demonstrating the efficacy of these approaches in both high- and low-resource linguistic contexts.
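As a rough illustration of the modular pipeline the abstract describes (connective detection, then argument span labeling, then relation sense recognition), the Python sketch below wires three placeholder components together. All class names and the toy rule-based logic are illustrative assumptions, not the thesis implementation: the actual modules are fine-tuned BERT models and LLM in-context learning, whereas this sketch only shows how the stages could hand data to one another.

```python
# Minimal structural sketch of a PDTB-style modular discourse parser.
# Every name and heuristic here is a hypothetical stand-in for illustration.

from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscourseRelation:
    connective: Optional[str]    # None for implicit relations
    arg1: str                    # first argument span
    arg2: str                    # second argument span
    sense: Optional[str] = None  # PDTB-style sense label


class ConnectiveDetector:
    """Stand-in for the fine-tuned BERT DC-detection module."""

    EXPLICIT_CONNECTIVES = {"because", "but", "however", "although"}

    def detect(self, sentence: str) -> Optional[str]:
        for token in sentence.lower().split():
            if token.strip(",.") in self.EXPLICIT_CONNECTIVES:
                return token.strip(",.")
        return None  # treat as an implicit relation


class ArgumentLabeler:
    """Stand-in for the BERT-based argument span labeling module."""

    def label(self, sentence: str, connective: Optional[str]) -> DiscourseRelation:
        if connective:
            left, _, right = sentence.partition(connective)
            return DiscourseRelation(connective, left.strip(" ,."), right.strip(" ,."))
        # Implicit case: naive split at the first comma, purely for illustration.
        left, _, right = sentence.partition(",")
        return DiscourseRelation(None, left.strip(), right.strip(" ."))


class SenseClassifier:
    """Stand-in for the in-context-learning DRR module (LLM prompting)."""

    def classify(self, relation: DiscourseRelation) -> DiscourseRelation:
        # A real module would prompt an LLM with few-shot examples here.
        concessive = {"but", "however", "although"}
        relation.sense = (
            "Comparison.Concession"
            if relation.connective in concessive
            else "Contingency.Cause"
        )
        return relation


def parse(sentence: str) -> DiscourseRelation:
    """Chain the three modules: detection -> labeling -> recognition."""
    connective = ConnectiveDetector().detect(sentence)
    relation = ArgumentLabeler().label(sentence, connective)
    return SenseClassifier().classify(relation)


if __name__ == "__main__":
    print(parse("The match was cancelled because it rained all day."))
```

The point of the sketch is the interface between stages: each module consumes the previous module's output, which is what lets the individual sub-tasks be swapped or evaluated independently while still forming an end-to-end parser.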
Date: 23.12.2025 / 15:00 Place: A-212