
Morphology

The BulTreeBank tagset is described in detail, in English, in the stylebook by Simov, Osenova and Slavcheva (2004): http://www.bultreebank.org/TechRep/BTB-TR03.pdf

The tagset is positional: each tag encodes the part of speech together with its grammatical features (when available). Because Bulgarian is a morphologically rich language, the tagset contains nearly 700 tags.

Note that the symbol '#', used in the Universal POS section, is a placeholder for an arbitrary number of features. These features are present in the BulTreeBank tag but are irrelevant to the mapping to the Universal tagset, so they are suppressed in the respective tag.
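The role of the '#' placeholder can be illustrated with a small sketch. It assumes that '#' stands for zero or more feature characters, and it uses a hypothetical two-entry mapping table (`EXAMPLE_MAPPING`) whose patterns and example tags are illustrative only; the authoritative tags are in the stylebook and the actual mapping is in the Universal POS section. The function names are likewise invented for this example.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a tag pattern containing '#' into a compiled regular expression.

    Assumption: '#' stands for zero or more feature characters.
    """
    literal_parts = [re.escape(part) for part in pattern.split("#")]
    return re.compile(".*".join(literal_parts))

def matches(pattern: str, tag: str) -> bool:
    """Return True if the full positional tag fits the pattern."""
    return pattern_to_regex(pattern).fullmatch(tag) is not None

# Hypothetical mapping entries, for illustration only.
EXAMPLE_MAPPING = [
    ("Nc#", "NOUN"),
    ("Np#", "PROPN"),
]

def to_universal(tag: str):
    """Return the Universal POS of the first matching pattern, or None."""
    for pattern, upos in EXAMPLE_MAPPING:
        if matches(pattern, tag):
            return upos
    return None
```

Under these assumptions, `to_universal("Ncmsi")` yields `"NOUN"`: the features after the first two positions are absorbed by '#' because they play no part in choosing the Universal tag.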