Variant
: alternative form of word
Sometimes there are multiple word forms for the same lemma and set of features.
The Variant feature helps distinguish alternative forms.
In Czech there are two groups of words where double forms are regular and worth capturing:
short forms of adjectives and short (clitic) forms of personal pronouns.
This feature marks only the non-standard short forms; hence it has only one value, Short.
For the long standard forms, the Variant feature remains unspecified.
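In the CoNLL-U files used by UD treebanks, the feature appears in the FEATS column as Variant=Short, while long forms simply have no Variant entry. A minimal Python sketch of that check follows; the helper names parse_feats and is_short_variant and the sample FEATS strings are hypothetical illustrations, not taken from the treebank or from any UD tool.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS column into a {feature: value} dict ('_' means no features)."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def is_short_variant(feats: str) -> bool:
    """True for Variant=Short; long standard forms leave Variant unspecified."""
    return parse_feats(feats).get("Variant") == "Short"


# Hypothetical FEATS strings, not copied from the treebank.
print(is_short_variant("Case=Acc|PronType=Prs|Reflex=Yes|Variant=Short"))  # True, e.g. "se"
print(is_short_variant("Case=Acc|PronType=Prs|Reflex=Yes"))                # False, e.g. "sebe"
```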
Short
: short form of adjectives
The short form is called the nominal form of the adjective (jmenný tvar přídavného jména), as opposed to the long form, which is called pronominal because it originated as a combination of a nominal form and a personal pronoun. This, however, belongs to the ancient history of the language. In modern Czech, only a subset of the nominal forms survives, and using them sometimes sounds slightly archaic. They are used as nominal predicates with a copula, but they do not appear as premodifiers of nouns. The pronominal forms are considered standard, except for two frequent adjectives that have no pronominal form: třeba “necessary” and rád “glad”.
Examples
- Short forms: možno “possible”, schopen “able”, nutno “necessary”, znám “known”, spokojen “satisfied”, povinen “supposed to”, ochoten “willing”, jist “sure”, vědom “knowing”, přítomen “present”, roven “equal”, patrno “apparent”, hotov “finished”, spjat “connected”, vinen “guilty”
- Long equivalents: možné, schopný, nutné, známý, spokojený, povinný, ochotný, jistý, vědomý, přítomný, rovný, patrné, hotový, spjatý, vinný
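The two example lists are parallel: the n-th short form corresponds to the n-th long equivalent (the neuter short forms možno, nutno and patrno pair with the neuter long forms možné, nutné and patrné). The sketch below pairs them in Python; the names SHORT, LONG, LONG_OF and long_equivalent are hypothetical, and the lookup covers only the listed forms, not inflected ones such as schopna or schopni.

```python
from typing import Optional

# Short (nominal) forms and their long (pronominal) equivalents,
# copied in order from the two example lists above.
SHORT = ("možno schopen nutno znám spokojen povinen ochoten jist vědom "
         "přítomen roven patrno hotov spjat vinen").split()
LONG = ("možné schopný nutné známý spokojený povinný ochotný jistý vědomý "
        "přítomný rovný patrné hotový spjatý vinný").split()
assert len(SHORT) == len(LONG), "the two lists must stay parallel"
LONG_OF = dict(zip(SHORT, LONG))


def long_equivalent(form: str) -> Optional[str]:
    """Return the listed long form for a short form, or None if it is not listed."""
    return LONG_OF.get(form)


print(long_equivalent("schopen"))  # schopný
print(long_equivalent("rád"))      # None (rád has no long form)
```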
Short
: short (clitic) form of personal pronouns
Some personal pronouns have double forms in the dative and accusative Case. The normal (long) form is freer in the word-order positions it can take. The short forms are clitics (http://cs.wikipedia.org/wiki/P%C5%99%C3%ADklonka). Unlike clitics in some other languages, they are written as separate words, but in the word order they usually occupy the second position in the clause.
- Short forms: mi, mě, ti, tě, mu, ho, si, se
- Long forms: mně, mne, tobě, tebe, jemu, jeho, sobě, sebe
- English: “me, me, you, you, him, him, oneself, oneself”
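The three pronoun lists are aligned position by position, with the pairs alternating between dative (mi/mně, ti/tobě, mu/jemu, si/sobě) and accusative (mě/mne, tě/tebe, ho/jeho, se/sebe). As a sketch, the pairs can be kept in a small table; the CASE column is added here for illustration and is not part of the lists above (mě and ho also serve as genitive forms, which the table ignores), and the names PAIRS and short_of are hypothetical.

```python
from typing import Optional

# Aligned columns from the three lists above; CASE is an illustrative addition.
SHORT = "mi mě ti tě mu ho si se".split()
LONG = "mně mne tobě tebe jemu jeho sobě sebe".split()
GLOSS = "me me you you him him oneself oneself".split()
CASE = "Dat Acc Dat Acc Dat Acc Dat Acc".split()

PAIRS = list(zip(SHORT, LONG, GLOSS, CASE))


def short_of(long_form: str) -> Optional[str]:
    """Return the clitic counterpart of a long pronoun form, or None if unlisted."""
    for short, long_, _, _ in PAIRS:
        if long_ == long_form:
            return short
    return None


for short, long_, gloss, case in PAIRS:
    print(f"{case}: {short} / {long_} ({gloss})")
```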
Treebank Statistics (UD_Czech)
This feature is language-specific.
It occurs with only one value: Short.
29070 tokens (2%) have a non-empty value of Variant.
159 types (0%) occur at least once with a non-empty value of Variant.
57 lemmas (0%) occur at least once with a non-empty value of Variant.
The feature is used with 2 part-of-speech tags: cs-pos/PRON (27181; 2% of instances), cs-pos/ADJ (1889; 0% of instances).
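Summary figures like these (tokens, types, lemmas and part-of-speech tags carrying a non-empty Variant) can be approximated from a CoNLL-U file. The following is a sketch, not the script that generated these statistics: it assumes types are distinct case-sensitive word forms, skips multiword-token ranges and empty nodes, and runs on a made-up four-token sentence whose syntactic columns are left as "_".

```python
from collections import Counter

# Hypothetical CoNLL-U fragment; HEAD and DEPREL are left as "_" for brevity.
SAMPLE = (
    "# text = Rád se učím.\n"
    "1\tRád\trád\tADJ\t_\tGender=Masc|Negative=Pos|Number=Sing|Variant=Short\t_\t_\t_\t_\n"
    "2\tse\tse\tPRON\t_\tCase=Acc|PronType=Prs|Reflex=Yes|Variant=Short\t_\t_\t_\t_\n"
    "3\tučím\tučit\tVERB\t_\tNumber=Sing|Person=1\t_\t_\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_\n"
)


def variant_summary(lines, feature="Variant"):
    """Return (marked tokens, all tokens, distinct forms, distinct lemmas, UPOS counts)."""
    marked = total = 0
    forms, lemmas = set(), set()
    by_upos = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line ends a sentence; '#' starts a comment
        cols = line.rstrip("\n").split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range or empty node, not a syntactic word
        total += 1
        form, lemma, upos, feats = cols[1], cols[2], cols[3], cols[5]
        names = {item.split("=", 1)[0] for item in feats.split("|")} if feats != "_" else set()
        if feature in names:
            marked += 1
            forms.add(form)
            lemmas.add(lemma)
            by_upos[upos] += 1
    return marked, total, len(forms), len(lemmas), by_upos


marked, total, n_types, n_lemmas, by_upos = variant_summary(SAMPLE.splitlines())
print(f"{marked} tokens ({100 * marked / total:.0f}%) have a non-empty value of Variant")
# 2 tokens (50%) have a non-empty value of Variant
```

On a full treebank, the percentages would be taken over all syntactic words, which is presumably how figures such as 29070 tokens (2%) arise.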
PRON
27181 cs-pos/PRON tokens (38% of all PRON tokens) have a non-empty value of Variant.
The most frequent other feature values with which PRON and Variant co-occurred: PronType=Prs (27181; 100%), Gender=EMPTY (25948; 95%), Reflex=Yes (25163; 93%), Person=EMPTY (25163; 93%), Number=EMPTY (25163; 93%), Case=Acc (22246; 82%).
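The co-occurrence line reports, for every other feature, how often each value (or EMPTY, when the feature is absent) appears on the marked tokens. A possible reconstruction in Python follows; it assumes the candidate features are those seen on the marked tokens themselves, which may differ from how these statistics were actually generated, and the token dictionaries are invented for illustration.

```python
from collections import Counter


def cooccurrence(feature_dicts, target="Variant", top=6):
    """Rank other FEATURE=value pairs on tokens carrying `target`.

    A feature present on some marked tokens but missing on others is
    counted as FEATURE=EMPTY on the tokens that lack it.
    """
    marked = [fd for fd in feature_dicts if target in fd]
    if not marked:
        return []
    names = {name for fd in marked for name in fd if name != target}
    counts = Counter(f"{name}={fd.get(name, 'EMPTY')}" for fd in marked for name in names)
    total = len(marked)
    return [(pair, n, round(100 * n / total)) for pair, n in counts.most_common(top)]


tokens = [
    {"Case": "Acc", "PronType": "Prs", "Reflex": "Yes", "Variant": "Short"},  # se
    {"Case": "Dat", "PronType": "Prs", "Reflex": "Yes", "Variant": "Short"},  # si
    {"Case": "Dat", "Gender": "Masc", "Number": "Sing", "Person": "3",
     "PronType": "Prs", "Variant": "Short"},                                  # mu
    {"Case": "Acc", "PronType": "Prs", "Reflex": "Yes"},                      # sebe: no Variant
]
for pair, n, pct in cooccurrence(tokens):
    print(f"{pair} ({n}; {pct}%)")
```

Pairs with equal counts may print in any order, since the candidate feature names are collected in a set.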
PRON tokens may have the following values of Variant:
Short (27181; 100% of non-empty Variant): se, si, mu, ho, mi, mě, tě, ti, sa
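The forms in such value lists appear to be ordered from most to least frequent. A sketch of how a list like this might be produced, assuming forms are lower-cased before counting (so sentence-initial Se and se are merged) and that each token is given as a hypothetical (form, UPOS, features) tuple; the function name top_forms is likewise hypothetical.

```python
from collections import Counter


def top_forms(tokens, upos, value="Short", n=10):
    """Most frequent lower-cased forms of `upos` tokens with Variant=`value`."""
    counts = Counter(form.lower() for form, pos, feats in tokens
                     if pos == upos and feats.get("Variant") == value)
    return [form for form, _ in counts.most_common(n)]


sample = [
    ("se", "PRON", {"Variant": "Short"}),
    ("Se", "PRON", {"Variant": "Short"}),
    ("se", "PRON", {"Variant": "Short"}),
    ("si", "PRON", {"Variant": "Short"}),
    ("si", "PRON", {"Variant": "Short"}),
    ("mu", "PRON", {"Variant": "Short"}),
    ("sebe", "PRON", {}),                  # long form: not counted
    ("rád", "ADJ", {"Variant": "Short"}),  # other POS: not counted for PRON
]
print(top_forms(sample, "PRON"))  # ['se', 'si', 'mu']
```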
ADJ
1889 cs-pos/ADJ tokens (1% of all ADJ tokens) have a non-empty value of Variant.
The most frequent other feature values with which ADJ and Variant co-occurred: Degree=EMPTY (1889; 100%), Case=EMPTY (1889; 100%), Negative=Pos (1844; 98%), Animacy=EMPTY (1529; 81%), Number=Sing (1333; 71%).
ADJ tokens may have the following values of Variant:
Short (1889; 100% of non-empty Variant): třeba, možno, rád, schopen, nutno, schopni, známo, schopna, rádi, spokojen
Variant seems to be a lexical feature of ADJ. 100% of lemmas (51) occur with only one value of Variant.
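The "lexical feature" statement can be checked by collecting, for each lemma, the set of Variant values it occurs with. A sketch follows; it assumes that only lemmas seen at least once with a non-empty value are counted and, by default, that the unspecified value is ignored. Since Short is the only value, the check could fail only if the empty value were counted, for example when a short form such as schopen shares its lemma with the long form schopný (also an assumption about lemmatization). The function lexical_share and the sample tuples are hypothetical.

```python
from collections import defaultdict


def lexical_share(tokens, upos, feature="Variant", count_empty=False):
    """Return (percentage, lemma count) of lemmas that take a single value of `feature`.

    Only lemmas seen at least once with a non-empty value are considered.
    With count_empty=False (an assumption), a missing feature is ignored.
    """
    values = defaultdict(set)
    for lemma, pos, feats in tokens:
        if pos == upos:
            values[lemma].add(feats.get(feature, "EMPTY"))
    relevant = [vals for vals in values.values() if vals - {"EMPTY"}]
    if not relevant:
        return 0.0, 0
    if not count_empty:
        relevant = [vals - {"EMPTY"} for vals in relevant]
    single = sum(1 for vals in relevant if len(vals) == 1)
    return 100 * single / len(relevant), len(relevant)


sample = [
    ("schopný", "ADJ", {"Variant": "Short"}),  # form schopen
    ("schopný", "ADJ", {}),                    # form schopný
    ("rád", "ADJ", {"Variant": "Short"}),      # form rád
    ("nový", "ADJ", {}),                       # never short: ignored
]
print(lexical_share(sample, "ADJ"))                    # (100.0, 2)
print(lexical_share(sample, "ADJ", count_empty=True))  # (50.0, 2)
```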
Treebank Statistics (UD_Czech-CAC)
This feature is language-specific.
It occurs with only one value: Short.
9993 tokens (2%) have a non-empty value of Variant.
104 types (0%) occur at least once with a non-empty value of Variant.
42 lemmas (0%) occur at least once with a non-empty value of Variant.
The feature is used with 2 part-of-speech tags: cs-pos/PRON (9195; 2% of instances), cs-pos/ADJ (798; 0% of instances).
PRON
9195 cs-pos/PRON tokens (38% of all PRON tokens) have a non-empty value of Variant.
The most frequent other feature values with which PRON and Variant co-occurred: PronType=Prs (9195; 100%), Gender=EMPTY (8883; 97%), Reflex=Yes (8706; 95%), Number=EMPTY (8705; 95%), Person=EMPTY (8705; 95%), Case=Acc (7929; 86%).
PRON tokens may have the following values of Variant:
Short (9195; 100% of non-empty Variant): se, si, mu, ho, mi, mě, ti, tě, mně, sis
ADJ
798 cs-pos/ADJ tokens (1% of all ADJ tokens) have a non-empty value of Variant.
The most frequent other feature values with which ADJ and Variant co-occurred: Degree=EMPTY (798; 100%), Case=EMPTY (795; 100%), Negative=Pos (790; 99%), Animacy=EMPTY (647; 81%), Number=Sing (588; 74%), Gender=Neut (431; 54%).
ADJ tokens may have the following values of Variant:
Short (798; 100% of non-empty Variant): možno, nutno, povinen, známo, rád, rádi, povinna, potřeba, povinni, schopen
Variant seems to be a lexical feature of ADJ. 100% of lemmas (38) occur with only one value of Variant.
Treebank Statistics (UD_Czech-CLTT)
This feature is language-specific.
It occurs with only one value: Short.
572 tokens (2%) have a non-empty value of Variant.
15 types (0%) occur at least once with a non-empty value of Variant.
8 lemmas (0%) occur at least once with a non-empty value of Variant.
The feature is used with 2 part-of-speech tags: cs-pos/PRON (469; 1% of instances), cs-pos/ADJ (103; 0% of instances).
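As a quick consistency check on the treebank summaries in this section, the PRON and ADJ counts in each treebank add up to the number of tokens with a non-empty Variant, so no other part of speech carries the feature. The arithmetic below uses only numbers copied from those summaries; the helper name pron_shares is hypothetical.

```python
# Counts copied from the treebank statistics in this section.
STATS = {
    "UD_Czech": {"total": 29070, "PRON": 27181, "ADJ": 1889},
    "UD_Czech-CAC": {"total": 9993, "PRON": 9195, "ADJ": 798},
    "UD_Czech-CLTT": {"total": 572, "PRON": 469, "ADJ": 103},
}


def pron_shares(stats):
    """Check PRON + ADJ == total and return each treebank's pronoun share in percent."""
    shares = {}
    for name, s in stats.items():
        assert s["PRON"] + s["ADJ"] == s["total"], f"{name}: counts do not add up"
        shares[name] = round(100 * s["PRON"] / s["total"], 1)
    return shares


for name, share in pron_shares(STATS).items():
    print(f"{name}: {share}% of Variant tokens are pronouns")
# UD_Czech: 93.5% of Variant tokens are pronouns
# UD_Czech-CAC: 92.0% of Variant tokens are pronouns
# UD_Czech-CLTT: 82.0% of Variant tokens are pronouns
```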
PRON
469 cs-pos/PRON tokens (39% of all PRON tokens) have a non-empty value of Variant.
The most frequent other feature values with which PRON and Variant co-occurred: PronType=Prs (469; 100%), Gender=EMPTY (468; 100%), Case=Acc (468; 100%), Number=EMPTY (468; 100%), Reflex=Yes (468; 100%).
PRON tokens may have the following values of Variant:
Short (469; 100% of non-empty Variant): se, ho, si
ADJ
103 cs-pos/ADJ tokens (2% of all ADJ tokens) have a non-empty value of Variant.
The most frequent other feature values with which ADJ and Variant co-occurred: Degree=EMPTY (103; 100%), Case=EMPTY (103; 100%), Negative=Pos (103; 100%), Gender=Fem,Masc (58; 56%), Animacy=Inan (58; 56%), Number=Plur (58; 56%).
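The value Gender=Fem,Masc above is a multivalue: in CoNLL-U, several values of one feature are joined by commas, here because a plural short form such as povinny serves both feminine and masculine inanimate nouns. A sketch that parses such values into sets follows; the helper name parse_feats_multi and the sample FEATS string are hypothetical.

```python
def parse_feats_multi(feats: str) -> dict[str, set[str]]:
    """Parse a CoNLL-U FEATS string, splitting comma-separated multivalues."""
    if feats == "_":
        return {}
    parsed = {}
    for item in feats.split("|"):
        name, values = item.split("=", 1)
        parsed[name] = set(values.split(","))
    return parsed


# Hypothetical FEATS for the short form "povinny" (feminine or masculine inanimate plural).
feats = parse_feats_multi("Animacy=Inan|Gender=Fem,Masc|Negative=Pos|Number=Plur|Variant=Short")
print(sorted(feats["Gender"]))   # ['Fem', 'Masc']
print("Fem" in feats["Gender"])  # True
```

Counting raw strings, as the statistics line appears to do, keeps Fem,Masc as a value of its own; splitting into sets is more convenient when asking whether a token can be feminine at all.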
ADJ tokens may have the following values of Variant:
Short (103; 100% of non-empty Variant): povinny, povinna, možno, známa, známy, schopna, znám, nutno, povinen, rovny