Tokenization

The UD Basque treebank follows the tokenization of the Basque Dependency Treebank (BDT): a straightforward whitespace-based tokenization with conventional separation of punctuation. The UD Basque treebank does not contain multiword tokens.
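
As a rough illustration of what whitespace-based tokenization with punctuation separation can look like, the sketch below splits text on whitespace and then detaches punctuation from the edges of each chunk. It is not the tokenizer used to build the BDT; the function names `tokenize` and `has_multiword_tokens` are hypothetical, and treating every leading or trailing non-word character as its own token is an assumption. The second function checks a CoNLL-U string for multiword token lines, which the CoNLL-U format marks with a range ID such as `1-2`.

```python
import re

# Assumption: punctuation is any non-word character at the edge of a
# whitespace-delimited chunk; word-internal characters are left untouched.
EDGE_RE = re.compile(r"^(\W*)(.*?)(\W*)$")


def tokenize(text):
    """Split on whitespace, then separate leading/trailing punctuation."""
    tokens = []
    for chunk in text.split():
        lead, core, trail = EDGE_RE.match(chunk).groups()
        tokens.extend(lead)       # each leading punctuation mark is a token
        if core:
            tokens.append(core)
        tokens.extend(trail)      # each trailing punctuation mark is a token
    return tokens


def has_multiword_tokens(conllu_text):
    """Return True if any token line has a range ID such as '1-2'."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue              # skip blank lines and sentence comments
        token_id = line.split("\t", 1)[0]
        if "-" in token_id:
            return True
    return False
```

For example, `tokenize("Kaixo, zer moduz?")` yields `["Kaixo", ",", "zer", "moduz", "?"]`, and running `has_multiword_tokens` over a file from a treebank without multiword tokens would be expected to return `False`.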