M.S. Candidate: Hasan Can Öztürk
Program: Cognitive Science
Date: 22.01.2024 / 13:00
Place: B-116
Abstract: Scholars around the world study human language using a variety of methods. The best technique for investigating a linguistic phenomenon is still debated, although TED Talks, which contain real-world linguistic data, are thought to be a sufficient source. Discourse coherence analysis has long been applied to TED Talks, but the literature on the Turkish context is scarce to nonexistent. The aim of this work is to investigate the global discourse structure of Turkish TED talks. The thesis is based on an annotation study that aims to capture how the talks are composed of specific discourse segments that motivate listeners. Seventy TEDx Talks in Turkish with reliable, human-generated transcriptions were chosen for annotation. These were collected as subtitle files and manually annotated to map out significant discourse segments. The study uses three basic units based on the nature of motivational talks: (1) Awareness, (2) Task and (3) Calling. For every talk, features such as the total number of words, specific transition words, duration (in seconds), speed, average embedding and the ending percentile of each sentence were used to train Machine Learning (ML) models. The results indicate that, by taking all these features into account, the transitions between motivational discourse segments can be predicted with an F1-score of 0.78.
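
As an illustration only, some of the timing-based features named in the abstract could be derived from subtitle cues roughly as follows. This is a minimal sketch, not the procedure used in the thesis: the `Cue` type, the `sentence_features` function, the treatment of each cue as one sentence, the definition of speed as words per second, and the definition of ending percentile as the cue's end time relative to the end of the talk are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue (hypothetical type; times are in seconds)."""
    start: float
    end: float
    text: str


def sentence_features(cues):
    """Compute simple per-cue features, treating each cue as one sentence.

    Assumed definitions (not taken from the thesis):
      speed_wps      = words / duration in seconds
      end_percentile = 100 * cue end time / end time of the last cue
    """
    if not cues:
        return []
    talk_end = cues[-1].end
    rows = []
    for cue in cues:
        n_words = len(cue.text.split())
        duration = cue.end - cue.start
        # Guard against zero-length cues to avoid division by zero.
        speed = n_words / duration if duration > 0 else 0.0
        rows.append({
            "n_words": n_words,
            "duration_s": duration,
            "speed_wps": speed,
            "end_percentile": 100.0 * cue.end / talk_end,
        })
    return rows


# Made-up example cues, for illustration only.
cues = [
    Cue(0.0, 2.0, "Merhaba herkese hoş geldiniz"),
    Cue(2.0, 5.0, "Bugün size bir hikaye anlatacağım"),
    Cue(5.0, 10.0, "Haydi başlayalım"),
]
features = sentence_features(cues)
```

Feature rows of this kind, together with transition-word indicators and sentence embeddings, would then be passed to a classifier; the choice of model and the exact feature definitions are left to the thesis itself.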