
Introduction

The Finnish UD treebank is based on the Turku Dependency Treebank (TDT), created at the University of Turku. The treebank consists of 15,000 sentences (200,000 tokens) and covers 10 different genres ranging from news to fiction and blog entries.

The morphological and syntactic annotation of the Finnish UD treebank is produced by converting the TDT data, and much of the Finnish UD documentation draws directly on the TDT annotation guidelines (Haverinen et al. 2013).

Acknowledgments

We wish to thank all of the contributors to the original TDT annotation effort, including Katri Haverinen, who led the annotation, Jenna Kanerva, Filip Ginter, Timo Viljanen, Veronika Laippala, Samuel Kohonen, Anna Missilä, Stina Ojala, and Tapio Salakoski. We also thank the University of Turku, the Turku Centre for Computer Science, the Finnish Academy, and the Turku University Foundation for supporting that effort.

See also

The University of Helsinki provides a different Finnish treebank, converted to the UD notation from a newly revised FinnTreeBank 1 (ftb1-2014.zip, beta). Its 19,089 sentences and fragments originate as grammatical examples in the VISK Finnish grammar reference (161,906 tokens; sentence lengths range from 1 to 72 tokens, with quartiles 5, 7, and 11). This treebank is distributed as UD_Finnish-FTB.
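Size figures of this kind (sentence count, token count, sentence-length quartiles) can be recomputed from a treebank in CoNLL-U format. The following is a minimal sketch, not the method used to produce the numbers above: the function names `sentence_lengths` and `summarize` are hypothetical, multiword-token ranges and empty nodes are assumed not to count as tokens, and the quartile definition (the `inclusive` method of Python's `statistics.quantiles`) is an assumption that may differ from the one behind the reported values.

```python
import statistics


def sentence_lengths(conllu_text):
    """Return the number of syntactic words in each sentence of a CoNLL-U string."""
    lengths = []
    count = 0
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if count:
                lengths.append(count)
            count = 0
        elif line.startswith("#"):
            # Sentence-level comment lines carry no tokens.
            continue
        else:
            token_id = line.split("\t", 1)[0]
            # Count plain integer IDs only; skip multiword-token
            # ranges such as "1-2" and empty nodes such as "1.1".
            if token_id.isdigit():
                count += 1
    if count:
        # The last sentence may lack a trailing blank line.
        lengths.append(count)
    return lengths


def summarize(lengths):
    """Summarize sentence lengths; requires at least two sentences."""
    q1, q2, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    return {
        "sentences": len(lengths),
        "tokens": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "quartiles": (q1, q2, q3),
    }
```

To apply it, read a `.conllu` file as UTF-8 text and pass the contents to `sentence_lengths`, then pass the result to `summarize`.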

References