METU Turkish Corpus Project

Principal Investigator:

Bilge Say



The goal of the METU Turkish Corpus Project was to develop a corpus of Turkish texts. The project was led by Asst. Prof. Bilge Say and has been successfully completed. The METU Turkish Corpus is a collection of 2 million words of post-1990 written Turkish samples representing 10 genres. At most two samples were taken from any one source, and each sample contains 2000 words. The corpus is XCES-tagged at the typographical level. A subset of the corpus has been treebanked through the joint efforts of METU and Sabancı University. The complete METU Turkish Corpus is available free of charge to researchers around the world, for research purposes only.


You may find information on obtaining the corpus and the treebank here.


As part of a separate project (the METU-Turkish Discourse Bank Project), discourse annotation has been carried out on part of the corpus. The METU-Turkish Discourse Bank Project site can be found here.

Group Members: