Tokenization

Whitespace always indicates a token boundary, and punctuation marks constitute separate tokens, except:

The treebank does not contain multiword tokens.
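As a rough illustration of the two rules above, here is a minimal Python sketch. It assumes a simplified reading of the guidelines: text is split at whitespace, and every punctuation character becomes its own token. The exceptions referred to above are not modeled, so a form such as "don't" would be split into three tokens. The helper names `tokenize` and `has_multiword_tokens` are hypothetical, not part of any treebank tooling. The second function relies on the CoNLL-U convention that a multiword token appears as a line whose ID is a range such as `1-2`. Since this treebank has no multiword tokens, no such lines should occur in its files.

```python
import re

# Hypothetical sketch: a run of word characters is one token, and every
# single non-word, non-space character (punctuation) is a token of its own.
# Whitespace is never part of a token, so it always acts as a boundary.
# The guideline exceptions are NOT handled here.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and separate punctuation tokens."""
    return _TOKEN_RE.findall(text)


def has_multiword_tokens(conllu: str) -> bool:
    """Return True if any CoNLL-U token line uses a range ID such as '1-2'."""
    for line in conllu.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        if "-" in token_id:
            return True
    return False


if __name__ == "__main__":
    print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

A regular expression keeps the sketch short. A real tokenizer for this treebank would also need to encode the listed exceptions, which this version deliberately leaves out.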