Elif Beril Şayli, An LLM-Powered Conversational Analytic System for Intelligent Data Discovery Across Mesh-Fabric Data Environments

M.S. Candidate: Elif Beril Şayli
Program: Information Systems
Date: 18.06.2026 / 15:00
Place: A-212

Abstract: Modern organizations operate in increasingly complex data environments where traditional centralized architectures struggle to meet demands for scalability and agility. A persistent challenge is effective metadata management and intuitive data discovery, particularly when natural-language questions must be translated into executable queries over raw lakehouse storage where foreign-key constraints are not explicitly declared. This thesis presents an approach that uses LLM-based metadata agents to support data discovery, enrichment, cataloging, and structuring across domains, and that translates natural-language questions into SQL queries grounded in inferred metadata. The goal is to reduce the manual effort required for cataloging and schema exploration within an architecture informed by Data Mesh and Data Fabric principles. The system uses LLM-assisted metadata extraction and relationship inference to generate structured artifacts, which are stored in versioned, machine-readable formats so that they can be inspected, reused, and updated as datasets evolve. This approach is useful in environments that require both decentralized ownership and cross-system interoperability while supporting consistency and reproducibility. The proposed system gathers schema and table metadata through the catalog and query layers and uses it for metadata enrichment and LLM-assisted SQL generation. The approach is evaluated on a controlled multi-domain benchmark by comparing configurations with and without inferred relationship metadata. The relation-aware configuration achieves a statistically significant correctness improvement on a specific set of analytical query patterns. The results show that versioned, inspectable relationship metadata can support NL-to-SQL generation in lakehouse environments under the tested conditions.