NumType
: numeral type
Czech has a complex system of numerals. In Czech school grammar, the part of speech “numeral” covers almost everything that involves counting, and it has various subtypes. It also includes interrogative, relative, indefinite and demonstrative quantifiers (words like kolik “how many”, tolik “so many”, několik “several”), so such a word may also have a non-empty value of PronType.
From the syntactic point of view, some numerals behave like adjectives and some behave like adverbs. We tag them cs-pos/ADJ and cs-pos/ADV respectively. Thus the NumType feature applies to several different parts of speech:
- cs-pos/NUM: cardinal numerals
- cs-pos/DET: quantifiers
- cs-pos/ADJ: adjectival ordinal and some generic numerals
- cs-pos/ADV: adverbial (e.g. ordinal and multiplicative) numerals
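As a minimal sketch of how this looks in annotated data, the snippet below reads the NumType value from a CoNLL-U style FEATS string and groups a few of the example words by part of speech. The helper names `parse_feats` and `group_by_upos` and the sample token list are illustrative assumptions, not part of any treebank tooling; the tags follow the list above, and the `PronType=Ind` and `Degree=Pos` values are assumed for illustration only.

```python
from collections import defaultdict


def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string such as 'NumType=Card|PronType=Ind'."""
    if feats in ("", "_"):  # "_" marks an empty feature set in CoNLL-U
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Hypothetical (form, UPOS, FEATS) triples built from examples on this page.
SAMPLE = [
    ("dva", "NUM", "NumType=Card"),
    ("několik", "DET", "NumType=Card|PronType=Ind"),
    ("první", "ADJ", "NumType=Ord"),
    ("dvakrát", "ADV", "NumType=Mult"),
    ("nové", "ADJ", "Degree=Pos"),  # an ordinary adjective: no NumType
]


def group_by_upos(tokens):
    """Map each UPOS tag to the set of NumType values seen with it."""
    seen = defaultdict(set)
    for _form, upos, feats in tokens:
        numtype = parse_feats(feats).get("NumType")
        if numtype is not None:
            seen[upos].add(numtype)
    return dict(seen)
```

Calling `group_by_upos(SAMPLE)` returns one entry per tag that carries the feature; tokens without NumType, such as the ordinary adjective in the sample, contribute nothing.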
Card
: cardinal number or corresponding interrogative / relative / indefinite / demonstrative word
Examples
- jeden, dva, tři “one, two, three”
- kolik “how many”
- několik “several”, mnoho “many”, málo “few”
- tolik “so many”
Ord
: ordinal number or corresponding interrogative / relative / indefinite / demonstrative word
This is a subtype of adjective or adverb.
Adjectival examples
- první “first”; druhý “second”, třetí “third”
- kolikátý “which rank” (lit. “how manieth”)
- několikátý “some rank”
- tolikátý “this/that rank”
Adverbial examples
- poprvé “for the first time”; podruhé “for the second time”; potřetí “for the third time”
- pokolikáté “for which time”
- poněkolikáté “for the x-th time”
- potolikáté “it has been so many times”
Mult
: multiplicative numeral or corresponding interrogative / relative / indefinite / demonstrative word
This is a subtype of adverb.
Examples
- jednou “once”; dvakrát “twice”; třikrát “three times”
- kolikrát “how many times”
- několikrát “several times”
- tolikrát “so many times”
Frac
: fraction
This is a subtype of cardinal numbers. It may denote a fraction or just the denominator of the fraction.
Examples
- půl / polovina “half”; třetina “one third”; čtvrt / čtvrtina “quarter”
Sets
: number of sets of things
Morphologically distinct class of numerals used to count sets of things, or nouns that are pluralia tantum.
Examples
- dvoje / troje boty “two / three [pairs of] shoes”; as opposed to normal cardinal numbers: dvě / tři boty “two / three shoes”
Gen
: generic numeral, i.e. a numeral that is none of the above
Czech school grammar distinguishes this subclass, which is why it appears in Czech tagsets. (Note that “generic numerals” in Czech grammar also include the Sets subclass mentioned above.)
Examples
- čtvero, patero, desatero (specific forms of four, five, ten; they are morphologically, syntactically and stylistically distinct from the default forms čtyři, pět, deset)
- dvojí, trojí, čtverý (twofold, threefold, fourfold; these are morphologically and syntactically adjectives)
Treebank Statistics (UD_Czech)
This feature is universal.
It occurs with 6 different values: Card, Frac, Gen, Mult, Ord, Sets.
49212 tokens (3%) have a non-empty value of NumType.
4024 types (3%) occur at least once with a non-empty value of NumType.
3572 lemmas (6%) occur at least once with a non-empty value of NumType.
The feature is used with 5 part-of-speech tags: cs-pos/NUM (41510; 3% instances), cs-pos/ADJ (4990; 0% instances), cs-pos/DET (1553; 0% instances), cs-pos/ADV (864; 0% instances), cs-pos/PRON (295; 0% instances).
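Counts of this kind can be approximated directly from a CoNLL-U file. The following is a minimal sketch, assuming the standard 10-column tab-separated CoNLL-U layout; the function names `numtype_counts` and `make_line` and the two-word sample are hypothetical and are not the script that generated the figures on this page.

```python
from collections import Counter


def numtype_counts(conllu_text: str) -> Counter:
    """Count tokens with a non-empty NumType, keyed by (UPOS, NumType)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        upos, feats = cols[3], cols[5]
        for pair in feats.split("|"):
            name, _, value = pair.partition("=")
            if name == "NumType":
                counts[(upos, value)] += 1
    return counts


def make_line(idx, form, lemma, upos, feats, head, deprel):
    """Build one 10-column CoNLL-U line; unused columns are left as '_'."""
    return "\t".join(
        [str(idx), form, lemma, upos, "_", feats, str(head), deprel, "_", "_"]
    )


# Hypothetical fragment "dvě boty" ("two shoes"), not taken from a treebank.
SAMPLE = "\n".join([
    "# text = dvě boty",
    make_line(1, "dvě", "dva", "NUM", "NumType=Card", 2, "nummod"),
    make_line(2, "boty", "bota", "NOUN", "_", 0, "root"),
])
```

Summing the counter per tag and dividing by the number of tokens with that tag would give shares comparable to the percentages in parentheses, assuming that is how they were derived.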
NUM
41510 cs-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType.
The most frequent other feature values with which NUM and NumType co-occurred: Gender=EMPTY (36751; 89%), NumValue=EMPTY (33460; 81%), Case=EMPTY (29887; 72%), Number=EMPTY (29861; 72%), NumForm=Digit (29484; 71%).
NUM tokens may have the following values of NumType:
- Card (41168; 99% of non-empty NumType): 1, 2, 3, dva, tři, 4, jeden, 6, dvě, tisíc
- Frac (342; 1% of non-empty NumType): třetiny, třetinu, třetina, třetině, čtvrtinu, čtvrtina, desetinu, čtvrtiny, pětinu, desetina
NumType seems to be a lexical feature of NUM. 100% of lemmas (3436) occur with only one value of NumType.
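The “lexical feature” observation rests on the share of lemmas that occur with exactly one NumType value. A minimal sketch of that check follows; the function name `single_value_lemma_share` and the sample pairs are hypothetical, and the input is assumed to be already reduced to (lemma, NumType) pairs.

```python
from collections import defaultdict


def single_value_lemma_share(tokens):
    """Return (share, n_lemmas) over lemmas seen with a non-empty NumType.

    `tokens` is an iterable of (lemma, numtype) pairs; numtype is None or ""
    when the feature is absent on that token.
    """
    values = defaultdict(set)
    for lemma, numtype in tokens:
        if numtype:
            values[lemma].add(numtype)
    if not values:
        return 0.0, 0
    single = sum(1 for seen in values.values() if len(seen) == 1)
    return single / len(values), len(values)


# Hypothetical pairs: "dvojí" is given two values, the other lemmas one each.
SAMPLE = [
    ("dva", "Card"),
    ("dva", "Card"),
    ("třetina", "Frac"),
    ("dvojí", "Gen"),
    ("dvojí", "Sets"),
    ("bota", None),  # no NumType, so this lemma is not counted
]
```

A share of 1.0 corresponds to the “100%” reported for NUM; a lemma that appears with two values lowers the share.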
ADJ
4990 cs-pos/ADJ tokens (3% of all ADJ tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADJ and NumType co-occurred: Negative=EMPTY (4990; 100%), Degree=EMPTY (4990; 100%), Number=Sing (4215; 84%), Animacy=EMPTY (3246; 65%).
ADJ tokens may have the following values of NumType:
- Gen (62; 1% of non-empty NumType): dvojí, obojí, dvojím, dvojího, obojím, trojí, dvojími, obého
- Ord (4889; 98% of non-empty NumType): první, druhé, prvním, třetí, druhý, druhou, prvních, prvního, druhá, druhém
- Sets (39; 1% of non-empty NumType): jedny, jedni, dvoje, jedněch, jedněm, oboje, jedněmi, obé, troje
- EMPTY (175821): další, české, nové, poslední, státní, dalších, možné, vlastní, jiné, každý
Paradigm dvojí | Sets | Gen |
---|---|---|
Animacy=Inan\|Case=Acc\|Gender=Masc\|Number=Sing | | dvojí |
Case=Acc\|Gender=Fem\|Number=Sing | | dvojí |
Case=Acc\|Gender=Neut\|Number=Sing | | dvojí |
Case=Acc\|Number=Plur | dvoje | |
Case=Gen\|Gender=Fem\|Number=Sing | | dvojí |
Case=Gen\|Gender=Neut\|Number=Sing | | dvojího |
Case=Ins\|Gender=Masc\|Number=Sing | | dvojím |
Case=Ins\|Gender=Fem\|Number=Sing | | dvojí |
Case=Ins\|Gender=Neut\|Number=Sing | | dvojím |
Case=Ins\|Number=Plur | | dvojími |
Case=Loc\|Gender=Neut\|Number=Sing | | dvojím |
Case=Nom\|Number=Sing | | dvojí |
NumType seems to be a lexical feature of ADJ. 96% of lemmas (64) occur with only one value of NumType.
DET
1553 cs-pos/DET tokens (6% of all DET tokens) have a non-empty value of NumType.
The most frequent other feature values with which DET and NumType co-occurred: Reflex=EMPTY (1553; 100%), Person=EMPTY (1553; 100%), Poss=EMPTY (1553; 100%), Number[psor]=EMPTY (1553; 100%), Gender[psor]=EMPTY (1553; 100%), Gender=EMPTY (1543; 99%), Number=EMPTY (1543; 99%), PronType=Dem,Ind (1455; 94%).
DET tokens may have the following values of NumType:
- Card (1552; 100% of non-empty NumType): několik, několika, mnoho, mnoha, kolik, málo, tolik, mála, moc, tolika
- Ord (1; 0% of non-empty NumType): několikátý
- EMPTY (26266): jeho, jejich, své, této, její, tento, tohoto, svou, tato, těchto
NumType seems to be a lexical feature of DET. 100% of lemmas (13) occur with only one value of NumType.
ADV
864 cs-pos/ADV tokens (1% of all ADV tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADV and NumType co-occurred: Degree=EMPTY (864; 100%), Negative=EMPTY (864; 100%).
ADV tokens may have the following values of NumType:
- Mult (536; 62% of non-empty NumType): dvakrát, jednou, třikrát, několikrát, pětkrát, desetkrát, čtyřikrát, nejednou, šestkrát, mnohokrát
- Ord (328; 38% of non-empty NumType): poprvé, podruhé, potřetí, počtvrté, pošesté, podvanácté, popáté, Pošestnácté, podesáté, podvaadvacáté
- EMPTY (79133): tak, už, také, jak, včera, ještě, již, tedy, dnes, pak
NumType seems to be a lexical feature of ADV. 100% of lemmas (56) occur with only one value of NumType.
PRON
295 cs-pos/PRON tokens (0% of all PRON tokens) have a non-empty value of NumType.
The most frequent other feature values with which PRON and NumType co-occurred: Variant=EMPTY (295; 100%), Reflex=EMPTY (295; 100%), Person=EMPTY (295; 100%), Number=EMPTY (289; 98%), Gender=EMPTY (289; 98%), PronType=Dem,Ind (197; 67%), Case=Acc (181; 61%).
PRON tokens may have the following values of NumType:
- Card (294; 100% of non-empty NumType): kolik, mnoho, tolik, málo, moc, několik, několika, mnoha, nejeden, nemálo
- Ord (1; 0% of non-empty NumType): několikáté
- EMPTY (72124): se, to, si, které, který, která, co, tím, kteří, tom
NumType seems to be a lexical feature of PRON. 100% of lemmas (12) occur with only one value of NumType.
Relations with Agreement in NumType
The 10 most frequent relations where parent and child node agree in NumType:
NUM –[conj]–> NUM (3373; 100%),
NUM –[compound]–> NUM (2801; 100%),
ADJ –[conj]–> ADJ (75; 56%),
NUM –[dep]–> NUM (52; 100%),
NUM –[det:nummod]–> DET (16; 100%),
DET –[appos]–> NUM (4; 100%),
DET –[conj]–> PRON (4; 80%),
PRON –[conj]–> PRON (3; 100%),
DET –[det:nummod]–> DET (2; 100%),
DET –[dep]–> NUM (1; 100%).
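Agreement counts like those listed above can be sketched as follows. This is an illustrative approximation only: the helper names, the dict-based token representation, and the sample fragment are hypothetical, and the choice of denominator (relations where both parent and child carry a NumType value) is an assumption about how the percentages may have been computed.

```python
from collections import Counter


def numtype_of(feats: str):
    """Return the NumType value from a FEATS string, or None if absent."""
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if name == "NumType":
            return value
    return None


def agreement_counts(sentence):
    """Count relations whose parent and child both carry NumType.

    `sentence` is a list of dicts with keys id, head, upos, feats, deprel.
    Returns two Counters keyed by (parent UPOS, deprel, child UPOS):
    relations that agree, and all relations where both values are non-empty.
    """
    by_id = {tok["id"]: tok for tok in sentence}
    agree, both = Counter(), Counter()
    for child in sentence:
        parent = by_id.get(child["head"])  # head 0 (root) has no parent token
        if parent is None:
            continue
        p_val = numtype_of(parent["feats"])
        c_val = numtype_of(child["feats"])
        if p_val is None or c_val is None:
            continue
        key = (parent["upos"], child["deprel"], child["upos"])
        both[key] += 1
        if p_val == c_val:
            agree[key] += 1
    return agree, both


# Hypothetical fragment "dva nebo tři" ("two or three"), not from a treebank.
SAMPLE = [
    {"id": 1, "head": 0, "upos": "NUM", "feats": "NumType=Card", "deprel": "root"},
    {"id": 2, "head": 3, "upos": "CCONJ", "feats": "_", "deprel": "cc"},
    {"id": 3, "head": 1, "upos": "NUM", "feats": "NumType=Card", "deprel": "conj"},
]
```

For each key, `agree[key] / both[key]` gives an agreement rate under the assumed denominator.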
Treebank Statistics (UD_Czech-CAC)
This feature is universal.
It occurs with 6 different values: Card, Frac, Gen, Mult, Ord, Sets.
8992 tokens (2%) have a non-empty value of NumType.
345 types (1%) occur at least once with a non-empty value of NumType.
139 lemmas (0%) occur at least once with a non-empty value of NumType.
The feature is used with 5 part-of-speech tags: cs-pos/NUM (7307; 1% instances), cs-pos/ADJ (863; 0% instances), cs-pos/DET (572; 0% instances), cs-pos/ADV (168; 0% instances), cs-pos/PRON (82; 0% instances).
NUM
7307 cs-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType.
The most frequent other feature values with which NUM and NumType co-occurred: Gender=EMPTY (6108; 84%), NumValue=EMPTY (5345; 73%), Case=EMPTY (4836; 66%), Number=EMPTY (4836; 66%), NumForm=Digit (4836; 66%).
NUM tokens may have the following values of NumType:
- Card (7247; 99% of non-empty NumType): #, dvou, jeden, dvě, tři, dva, obou, jedné, jednoho, jedním
- Frac (60; 1% of non-empty NumType): třetinu, třetina, třetiny, čtvrtiny, dvanáctinu, třetinou, třetině, šestině, desetin, desetinu
NumType seems to be a lexical feature of NUM. 100% of lemmas (59) occur with only one value of NumType.
ADJ
863 cs-pos/ADJ tokens (1% of all ADJ tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADJ and NumType co-occurred: Negative=EMPTY (863; 100%), Degree=EMPTY (863; 100%), Number=Sing (607; 70%), Animacy=EMPTY (566; 66%).
ADJ tokens may have the following values of NumType:
- Gen (33; 4% of non-empty NumType): dvojí, obojí, dvojím, trojí, dvojího, trojím, dvojímu, obojího, obojím
- Ord (819; 95% of non-empty NumType): první, prvním, třetí, prvních, prvního, šedesátých, třetího, třicátých, dvacátých, páté
- Sets (11; 1% of non-empty NumType): jedněch, jedni, jedny, oboje
- EMPTY (69665): další, pracovní, jednotlivých, základní, nové, možno, socialistické, různých, každý, dalších
Paradigm obojí | Sets | Gen |
---|---|---|
Case=Acc\|Number=Plur | | obojí |
Case=Gen\|Gender=Neut\|Number=Sing | | obojího |
Case=Loc\|Gender=Masc\|Number=Sing | | obojím |
Case=Nom\|Gender=Neut\|Number=Sing | oboje | |
Case=Nom\|Number=Sing | | obojí |
Case=Nom\|Number=Plur | | Obojí |
NumType seems to be a lexical feature of ADJ. 97% of lemmas (38) occur with only one value of NumType.
DET
572 cs-pos/DET tokens (5% of all DET tokens) have a non-empty value of NumType.
The most frequent other feature values with which DET and NumType co-occurred: Reflex=EMPTY (572; 100%), Person=EMPTY (572; 100%), Poss=EMPTY (572; 100%), Number[psor]=EMPTY (572; 100%), Gender[psor]=EMPTY (572; 100%), Gender=EMPTY (557; 97%), Number=EMPTY (557; 97%), PronType=Dem,Ind (521; 91%).
DET tokens may have the following values of NumType:
- Card (569; 99% of non-empty NumType): několik, mnoho, několika, mnoha, kolik, málo, tolik, nejeden, mála, nejednom
- Ord (3; 1% of non-empty NumType): Kolikátý, kolikátá, kolikátém
- EMPTY (10516): jejich, jeho, této, své, těchto, tyto, tento, tohoto, její, tato
ADV
168 cs-pos/ADV tokens (1% of all ADV tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADV and NumType co-occurred: Degree=EMPTY (168; 100%), Negative=EMPTY (168; 100%).
ADV tokens may have the following values of NumType:
- Mult (119; 71% of non-empty NumType): dvakrát, nejednou, několikrát, třikrát, mnohokrát, kolikrát, desetkrát, stokrát, čtyřikrát, dvanáctkrát
- Ord (49; 29% of non-empty NumType): poprvé, podruhé, potřetí, potřinácté
- EMPTY (27322): tak, také, jak, již, už, ještě, pak, kde, tedy, velmi
NumType seems to be a lexical feature of ADV. 100% of lemmas (33) occur with only one value of NumType.
PRON
82 cs-pos/PRON tokens (0% of all PRON tokens) have a non-empty value of NumType.
The most frequent other feature values with which PRON and NumType co-occurred: Variant=EMPTY (82; 100%), Reflex=EMPTY (82; 100%), Person=EMPTY (82; 100%), Number=EMPTY (81; 99%), Gender=EMPTY (81; 99%), PronType=Dem,Ind (59; 72%), Case=Acc (49; 60%).
PRON tokens may have the following values of NumType:
- Card (82; 100% of non-empty NumType): mnoho, kolik, tolik, několik, mnoha, málo, mála, nejeden, několika
- EMPTY (24307): se, to, které, si, který, která, tím, co, všech, všechny
Relations with Agreement in NumType
The 10 most frequent relations where parent and child node agree in NumType:
NUM –[conj]–> NUM (317; 100%),
NUM –[compound]–> NUM (42; 100%),
ADV –[conj]–> ADV (6; 55%),
NUM –[det:nummod]–> DET (5; 100%),
PRON –[conj]–> PRON (1; 100%),
NUM –[appos]–> PRON (1; 100%).
Treebank Statistics (UD_Czech-CLTT)
This feature is universal.
It occurs with 3 different values: Card, Mult, Ord.
499 tokens (1%) have a non-empty value of NumType.
112 types (2%) occur at least once with a non-empty value of NumType.
93 lemmas (3%) occur at least once with a non-empty value of NumType.
The feature is used with 5 part-of-speech tags: cs-pos/NUM (440; 1% instances), cs-pos/ADJ (43; 0% instances), cs-pos/ADV (14; 0% instances), cs-pos/DET (1; 0% instances), cs-pos/PRON (1; 0% instances).
NUM
440 cs-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType.
The most frequent other feature values with which NUM and NumType co-occurred: Gender=EMPTY (394; 90%), NumValue=EMPTY (382; 87%), NumForm=Roman (371; 84%), Case=EMPTY (371; 84%), Number=EMPTY (371; 84%).
NUM tokens may have the following values of NumType:
- Card (440; 100% of non-empty NumType): 1, 3, 2, 4, jeden, 5, 41, 7, jedné, tří
NumType seems to be a lexical feature of NUM. 100% of lemmas (83) occur with only one value of NumType.
ADJ
43 cs-pos/ADJ tokens (1% of all ADJ tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADJ and NumType co-occurred: Negative=EMPTY (43; 100%), Degree=EMPTY (43; 100%), Number=Sing (43; 100%), Animacy=EMPTY (24; 56%).
ADJ tokens may have the following values of NumType:
- Ord (43; 100% of non-empty NumType): prvním, prvnímu, prvního, první, třetí, PÁTÁ, ČTVRTÁ, ŠESTÁ, SEDMÁ, druhé
- EMPTY (6496): účetní, účetních, účetního, konsolidované, konsolidující, finanční, účetním, povinny, výroční, právní
ADV
14 cs-pos/ADV tokens (2% of all ADV tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADV and NumType co-occurred: Negative=EMPTY (14; 100%), Degree=EMPTY (14; 100%).
ADV tokens may have the following values of NumType:
- Mult (3; 21% of non-empty NumType): jednou
- Ord (11; 79% of non-empty NumType): poprvé
- EMPTY (773): dále, zejména, popřípadě, jinak, pouze, kdy, též, například, tak, více
DET
1 cs-pos/DET token (0% of all DET tokens) has a non-empty value of NumType.
The most frequent other feature values with which DET and NumType co-occurred: Number[psor]=EMPTY (1; 100%), Number=EMPTY (1; 100%), Case=Ins (1; 100%), Gender=EMPTY (1; 100%), Gender[psor]=EMPTY (1; 100%), Poss=EMPTY (1; 100%), PronType=Dem,Ind (1; 100%), Person=EMPTY (1; 100%).
DET tokens may have the following values of NumType:
- Card (1; 100% of non-empty NumType): několika
- EMPTY (594): jejich, jeho, této, tohoto, těchto, tyto, tato, tento, její, tomto
PRON
1 cs-pos/PRON token (0% of all PRON tokens) has a non-empty value of NumType.
The most frequent other feature values with which PRON and NumType co-occurred: PronType=Dem,Ind (1; 100%), Reflex=EMPTY (1; 100%), Number=EMPTY (1; 100%), Case=Ins (1; 100%), Variant=EMPTY (1; 100%), Gender=EMPTY (1; 100%).
PRON tokens may have the following values of NumType:
- Card (1; 100% of non-empty NumType): několika
- EMPTY (1211): se, které, která, který, to, kterých, kterým, kterém, kterému, nichž
Relations with Agreement in NumType
The 10 most frequent relations where parent and child node agree in NumType:
NUM –[conj]–> NUM (39; 100%),
NUM –[compound]–> NUM (1; 100%),
NUM –[conj]–> PRON (1; 100%).
NumType in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]