
Tokenization

Tokenization in the Hungarian UD treebank follows the principles of the Szeged Dependency Treebank (Vincze et al. 2010). The treebank does not contain multiword tokens.
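The absence of multiword tokens can be checked mechanically. In the CoNLL-U format, a multiword token is written on a line whose ID is a range such as `1-2`, so a file without such lines has none. The following is a minimal sketch of that check, not part of the treebank's own tooling: the function name `find_multiword_tokens` and the sample sentence are hypothetical, and all annotation fields other than ID and FORM are left as `_` placeholders.

```python
def find_multiword_tokens(conllu_text):
    """Return (line_number, id) pairs for multiword token lines in CoNLL-U text."""
    hits = []
    for line_number, line in enumerate(conllu_text.splitlines(), start=1):
        # Skip blank lines (sentence boundaries) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        # A multiword token has a range ID such as "1-2".
        # Empty nodes use a decimal ID such as "1.1" and are not counted.
        if "-" in token_id:
            hits.append((line_number, token_id))
    return hits


# Hypothetical sample: only ID and FORM are filled in, the rest are placeholders.
SAMPLE = "\n".join([
    "# text = A kutya ugat.",
    "\t".join(["1", "A"] + ["_"] * 8),
    "\t".join(["2", "kutya"] + ["_"] * 8),
    "\t".join(["3", "ugat"] + ["_"] * 8),
    "\t".join(["4", "."] + ["_"] * 8),
    "",
])

print(find_multiword_tokens(SAMPLE))  # prints []
```

To run the same check on a treebank file, read it as UTF-8 text and pass the contents to the function; an empty list means no range IDs were found.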

References

Vincze, Veronika; Szauter, Dóra; Almási, Attila; Móra, György; Alexin, Zoltán; Csirik, János 2010: Hungarian Dependency Treebank. In: Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta.