NumType: numeral type
Czech has a complex system of numerals. For example, in the Czech school grammar, “numeral” is one of the main parts of speech: it includes almost everything that involves counting, and it has various subtypes. It also includes interrogative, relative, indefinite and demonstrative quantifiers (words like kolik “how many”, tolik “so many”, několik “several”), so such words may also have a non-empty value of PronType.
From the syntactic point of view, some numeral types behave like adjectives and some behave like adverbs. We tag them cs-pos/ADJ and cs-pos/ADV respectively. Thus the NumType feature applies to several different parts of speech:
- cs-pos/NUM: cardinal numerals
- cs-pos/DET: quantifiers
- cs-pos/ADJ: adjectival ordinal and some generic numerals
- cs-pos/ADV: adverbial (e.g. ordinal and multiplicative) numerals
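In CoNLL-U files, which is the format UD treebanks use, NumType is stored in the FEATS column as one Name=Value pair among others separated by "|". When a token has no features, the column is "_". Because the feature can appear on several parts of speech, the UPOS tag alone does not tell you whether NumType is present. The sketch below reads it from a FEATS value. The function names and the toy feature bundles are hypothetical illustrations and are not taken from the treebanks.

```python
from __future__ import annotations


def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS value such as "Case=Nom|NumType=Card" into a dict."""
    if feats == "_":  # "_" marks an empty feature column
        return {}
    parsed = {}
    for pair in feats.split("|"):
        name, value = pair.split("=", 1)
        parsed[name] = value
    return parsed


def num_type(feats: str) -> str | None:
    """Return the NumType value, or None when the token has no NumType."""
    return parse_feats(feats).get("NumType")


# Hypothetical (lemma, UPOS, FEATS) triples, roughly one per part of speech listed above.
toy = [
    ("dva", "NUM", "NumType=Card"),
    ("několik", "DET", "NumType=Card|PronType=Ind"),
    ("první", "ADJ", "Case=Nom|NumType=Ord"),
    ("dvakrát", "ADV", "NumType=Mult"),
    ("nový", "ADJ", "Degree=Pos"),
]
for lemma, upos, feats in toy:
    print(lemma, upos, num_type(feats))
```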
Card: cardinal number or corresponding interrogative / relative / indefinite / demonstrative word
- jeden, dva, tři “one, two, three”
- kolik “how many”
- několik “several”, mnoho “many”, málo “few”
- tolik “so many”
Ord: ordinal number or corresponding interrogative / relative / indefinite / demonstrative word
This is a subtype of adjective or adverb.
Adjectival examples:
- první “first”; druhý “second”, třetí “third”
- kolikátý (lit. “how-many-th”) “which rank”
- několikátý “some rank”
- tolikátý “this/that rank”
Adverbial examples:
- poprvé “for the first time”; podruhé “for the second time”; potřetí “for the third time”
- pokolikáté “for which time”
- poněkolikáté “for the x-th time”
- potolikáté “it has been so many times”
Mult: multiplicative numeral or corresponding interrogative / relative / indefinite / demonstrative word
This is a subtype of adverb.
- jednou “once”; dvakrát “twice”; třikrát “three times”
- kolikrát “how many times”
- několikrát “several times”
- tolikrát “so many times”
Frac: fraction
This is a subtype of cardinal numbers. It may denote a fraction or just the denominator of the fraction.
- půl / polovina “half”; třetina “one third”; čtvrt / čtvrtina “quarter”
Sets: number of sets of things
A morphologically distinct class of numerals used to count sets of things, or to count nouns that are pluralia tantum.
- dvoje / troje boty “two / three [pairs of] shoes”; as opposed to normal cardinal numbers: dvě / tři boty “two / three shoes”
Gen: generic numeral, i.e. a numeral that is none of the above
Czech school grammar distinguishes this subclass, which is why it appears in Czech tagsets. (Note that “generic numerals” in Czech grammar also include the Sets mentioned above.)
- čtvero, patero, desatero (specific forms of four, five, ten; they are morphologically, syntactically and stylistically distinct from the default forms čtyři, pět, deset)
- dvojí, trojí, čtverý (twofold, threefold, fourfold; these are morphologically and syntactically adjectives)
Treebank Statistics (UD_Czech)
This feature is universal.
It occurs with 6 different values: Card, Frac, Gen, Mult, Ord, Sets.
49212 tokens (3%) have a non-empty value of NumType
4024 types (3%) occur at least once with a non-empty value of NumType
3572 lemmas (6%) occur at least once with a non-empty value of NumType
The feature is used with 5 part-of-speech tags: cs-pos/NUM (41510; 3% instances), cs-pos/ADJ (4990; 0% instances), cs-pos/DET (1553; 0% instances), cs-pos/ADV (864; 0% instances), cs-pos/PRON (295; 0% instances).
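A per-part-of-speech breakdown like the one above can be approximated by scanning the UPOS (4th) and FEATS (6th) columns of a CoNLL-U file. The sketch below is only an assumption about how such a tally might be done. It is not the script that produced these statistics. It skips multiword-token ranges and empty nodes. The one-sentence input is a hand-made toy example with simplified features, and the function names are hypothetical.

```python
from collections import Counter


def iter_token_columns(conllu: str):
    """Yield the 10 columns of every syntactic-word line in a CoNLL-U string."""
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines end sentences; "#" lines are comments
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 5.1)
        yield cols


def numtype_counts_by_upos(conllu: str) -> Counter:
    """Count tokens with a non-empty NumType per UPOS tag."""
    counts = Counter()
    for cols in iter_token_columns(conllu):
        upos, feats = cols[3], cols[5]
        if any(pair.startswith("NumType=") for pair in feats.split("|")):
            counts[upos] += 1
    return counts


# Hand-made toy sentence with simplified features (not taken from a treebank).
rows = [
    ["1", "Poprvé", "poprvé", "ADV", "_", "NumType=Ord", "2", "advmod", "_", "_"],
    ["2", "přišli", "přijít", "VERB", "_", "Number=Plur", "0", "root", "_", "_"],
    ["3", "dva", "dva", "NUM", "_", "NumType=Card", "4", "nummod", "_", "_"],
    ["4", "studenti", "student", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"],
]
TOY = "# text = Poprvé přišli dva studenti\n" + "\n".join("\t".join(r) for r in rows) + "\n"
print(numtype_counts_by_upos(TOY))
```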
41510 cs-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType
The most frequent other feature values with which NUM and NumType co-occurred: Gender=EMPTY (36751; 89%), NumValue=EMPTY (33460; 81%), Case=EMPTY (29887; 72%), Number=EMPTY (29861; 72%), NumForm=Digit (29484; 71%).
NUM tokens may have the following values of NumType:
- Card (41168; 99% of non-empty NumType): 1, 2, 3, dva, tři, 4, jeden, 6, dvě, tisíc
- Frac (342; 1% of non-empty NumType): třetiny, třetinu, třetina, třetině, čtvrtinu, čtvrtina, desetinu, čtvrtiny, pětinu, desetina
NumType seems to be a lexical feature of NUM: 100% of lemmas (3436) occur with only one value of NumType.
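The statement that NumType is lexical for a part of speech rests on the share of lemmas that occur with a single NumType value. Below is a minimal sketch of that ratio. It assumes that (lemma, NumType) pairs have already been extracted. The function name and the toy pairs are hypothetical. The toy lemma dvojí gets two values because the dvojí paradigm in this section lists forms under both Sets and Gen.

```python
from collections import defaultdict


def single_value_share(pairs) -> float:
    """Fraction of lemmas seen with exactly one non-empty NumType value.

    pairs: iterable of (lemma, numtype) tuples; numtype is None when empty.
    """
    values_per_lemma = defaultdict(set)
    for lemma, numtype in pairs:
        if numtype is not None:  # tokens without NumType do not count
            values_per_lemma[lemma].add(numtype)
    if not values_per_lemma:
        return 0.0
    single = sum(1 for values in values_per_lemma.values() if len(values) == 1)
    return single / len(values_per_lemma)


toy_pairs = [
    ("dva", "Card"),
    ("dva", "Card"),
    ("třetina", "Frac"),
    ("dvojí", "Gen"),
    ("dvojí", "Sets"),
    ("nový", None),
]
print(f"{single_value_share(toy_pairs):.0%}")  # 2 of 3 lemmas -> 67%
```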
4990 cs-pos/ADJ tokens (3% of all ADJ tokens) have a non-empty value of NumType
The most frequent other feature values with which ADJ and NumType co-occurred: Negative=EMPTY (4990; 100%), Degree=EMPTY (4990; 100%), Number=Sing (4215; 84%), Animacy=EMPTY (3246; 65%).
ADJ tokens may have the following values of NumType:
- Gen (62; 1% of non-empty NumType): dvojí, obojí, dvojím, dvojího, obojím, trojí, dvojími, obého
- Ord (4889; 98% of non-empty NumType): první, druhé, prvním, třetí, druhý, druhou, prvních, prvního, druhá, druhém
- Sets (39; 1% of non-empty NumType): jedny, jedni, dvoje, jedněch, jedněm, oboje, jedněmi, obé, troje
- EMPTY (175821): další, české, nové, poslední, státní, dalších, možné, vlastní, jiné, každý
Paradigm of dvojí (forms by feature bundle, with their NumType value, Sets or Gen):
- Animacy=Inan|Case=Acc|Gender=Masc|Number=Sing: dvojí (Gen)
- Case=Acc|Gender=Fem|Number=Sing: dvojí (Gen)
- Case=Acc|Gender=Neut|Number=Sing: dvojí (Gen)
- Case=Acc|Number=Plur: dvoje (Sets)
- Case=Gen|Gender=Fem|Number=Sing: dvojí (Gen)
- Case=Gen|Gender=Neut|Number=Sing: dvojího (Gen)
- Case=Ins|Gender=Masc|Number=Sing: dvojím (Gen)
- Case=Ins|Gender=Fem|Number=Sing: dvojí (Gen)
- Case=Ins|Gender=Neut|Number=Sing: dvojím (Gen)
- Case=Ins|Number=Plur: dvojími (Gen)
- Case=Loc|Gender=Neut|Number=Sing: dvojím (Gen)
- Case=Nom|Number=Sing: dvojí (Gen)
NumType seems to be a lexical feature of ADJ: 96% of lemmas (64) occur with only one value of NumType.
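A paradigm like the dvojí paradigm above groups the forms of one lemma by their remaining features and by their NumType value. The sketch below builds such a grouping. It assumes the tokens of a single lemma are given as (form, features) pairs. build_paradigm is a hypothetical helper, and the three toy tokens reuse feature bundles from the dvojí paradigm.

```python
from collections import defaultdict


def build_paradigm(tokens):
    """Group forms of one lemma by feature bundle (without NumType) and NumType value.

    tokens: iterable of (form, feats) where feats maps feature names to values.
    Returns {bundle string: {NumType value: set of forms}}.
    """
    table = defaultdict(lambda: defaultdict(set))
    for form, feats in tokens:
        value = feats.get("NumType")
        if value is None:
            continue  # tokens without NumType do not enter the paradigm
        bundle = "|".join(
            f"{name}={val}" for name, val in sorted(feats.items()) if name != "NumType"
        )
        table[bundle][value].add(form)
    return table


toy_tokens = [
    ("dvojí", {"Case": "Nom", "Number": "Sing", "NumType": "Gen"}),
    ("dvoje", {"Case": "Acc", "Number": "Plur", "NumType": "Sets"}),
    ("dvojího", {"Case": "Gen", "Gender": "Neut", "Number": "Sing", "NumType": "Gen"}),
]
for bundle, cells in sorted(build_paradigm(toy_tokens).items()):
    print(bundle, dict(cells))
```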
1553 cs-pos/DET tokens (6% of all DET tokens) have a non-empty value of NumType
The most frequent other feature values with which DET and NumType co-occurred: Reflex=EMPTY (1553; 100%), Person=EMPTY (1553; 100%), Poss=EMPTY (1553; 100%), Number[psor]=EMPTY (1553; 100%), Gender[psor]=EMPTY (1553; 100%), Gender=EMPTY (1543; 99%), Number=EMPTY (1543; 99%), PronType=Dem,Ind (1455; 94%).
DET tokens may have the following values of NumType:
- Card (1552; 100% of non-empty NumType): několik, několika, mnoho, mnoha, kolik, málo, tolik, mála, moc, tolika
- Ord (1; 0% of non-empty NumType): několikátý
- EMPTY (26266): jeho, jejich, své, této, její, tento, tohoto, svou, tato, těchto
NumType seems to be a lexical feature of DET: 100% of lemmas (13) occur with only one value of NumType.
864 cs-pos/ADV tokens (1% of all ADV tokens) have a non-empty value of NumType
The most frequent other feature values with which ADV and NumType co-occurred: Degree=EMPTY (864; 100%), Negative=EMPTY (864; 100%).
ADV tokens may have the following values of NumType:
- Mult (536; 62% of non-empty NumType): dvakrát, jednou, třikrát, několikrát, pětkrát, desetkrát, čtyřikrát, nejednou, šestkrát, mnohokrát
- Ord (328; 38% of non-empty NumType): poprvé, podruhé, potřetí, počtvrté, pošesté, podvanácté, popáté, Pošestnácté, podesáté, podvaadvacáté
- EMPTY (79133): tak, už, také, jak, včera, ještě, již, tedy, dnes, pak
NumType seems to be a lexical feature of ADV: 100% of lemmas (56) occur with only one value of NumType.
295 cs-pos/PRON tokens (0% of all PRON tokens) have a non-empty value of NumType
The most frequent other feature values with which PRON and NumType co-occurred: Variant=EMPTY (295; 100%), Reflex=EMPTY (295; 100%), Person=EMPTY (295; 100%), Number=EMPTY (289; 98%), Gender=EMPTY (289; 98%), PronType=Dem,Ind (197; 67%), Case=Acc (181; 61%).
PRON tokens may have the following values of NumType:
- Card (294; 100% of non-empty NumType): kolik, mnoho, tolik, málo, moc, několik, několika, mnoha, nejeden, nemálo
- Ord (1; 0% of non-empty NumType): několikáté
- EMPTY (72124): se, to, si, které, který, která, co, tím, kteří, tom
NumType seems to be a lexical feature of PRON: 100% of lemmas (12) occur with only one value of NumType.
Relations with Agreement in NumType
The 10 most frequent relations where parent and child node agree in NumType:
- NUM –[conj]–> NUM (3373; 100%)
- NUM –[compound]–> NUM (2801; 100%)
- ADJ –[conj]–> ADJ (75; 56%)
- NUM –[dep]–> NUM (52; 100%)
- NUM –[det:nummod]–> DET (16; 100%)
- DET –[appos]–> NUM (4; 100%)
- DET –[conj]–> PRON (4; 80%)
- PRON –[conj]–> PRON (3; 100%)
- DET –[det:nummod]–> DET (2; 100%)
- DET –[dep]–> NUM (1; 100%)
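Agreement counts like those above can be sketched by comparing the NumType of each dependent with that of its head and grouping the results by (parent UPOS, relation, child UPOS). The exact denominator behind the percentages is not stated here. The sketch assumes, as a labeled guess, that only edges where both nodes carry a NumType value are counted. The function name is hypothetical. The toy input is a coordination of three adjectival numerals, first, second and dvojí. It is not a natural sentence, and it shows how a figure below 100% can arise.

```python
from collections import Counter


def numtype_agreement(sentence):
    """Return {(parent UPOS, deprel, child UPOS): (agreeing edges, agreement rate)}.

    sentence: list of dicts with keys "id", "upos", "head", "deprel", "numtype"
    (numtype is None when the token has no NumType). Assumption: only edges whose
    parent and child both carry a NumType value enter the denominator.
    """
    by_id = {tok["id"]: tok for tok in sentence}
    agree, total = Counter(), Counter()
    for child in sentence:
        parent = by_id.get(child["head"])  # head 0 (root) has no parent token
        if parent is None or parent["numtype"] is None or child["numtype"] is None:
            continue
        key = (parent["upos"], child["deprel"], child["upos"])
        total[key] += 1
        if parent["numtype"] == child["numtype"]:
            agree[key] += 1
    return {key: (agree[key], agree[key] / total[key]) for key in agree}


# Toy coordination "první a druhý a dvojí" (illustrative only).
toy_sentence = [
    {"id": 1, "upos": "ADJ", "head": 0, "deprel": "root", "numtype": "Ord"},
    {"id": 2, "upos": "CCONJ", "head": 3, "deprel": "cc", "numtype": None},
    {"id": 3, "upos": "ADJ", "head": 1, "deprel": "conj", "numtype": "Ord"},
    {"id": 4, "upos": "CCONJ", "head": 5, "deprel": "cc", "numtype": None},
    {"id": 5, "upos": "ADJ", "head": 1, "deprel": "conj", "numtype": "Gen"},
]
print(numtype_agreement(toy_sentence))
```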
Treebank Statistics (UD_Czech-CAC)
This feature is universal.
It occurs with 6 different values: Card, Frac, Gen, Mult, Ord, Sets.
8992 tokens (2%) have a non-empty value of NumType
345 types (1%) occur at least once with a non-empty value of NumType
139 lemmas (0%) occur at least once with a non-empty value of NumType
The feature is used with 5 part-of-speech tags: cs-pos/NUM (7307; 1% instances), cs-pos/ADJ (863; 0% instances), cs-pos/DET (572; 0% instances), cs-pos/ADV (168; 0% instances), cs-pos/PRON (82; 0% instances).
7307 cs-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType
The most frequent other feature values with which NUM and NumType co-occurred: Gender=EMPTY (6108; 84%), NumValue=EMPTY (5345; 73%), Case=EMPTY (4836; 66%), Number=EMPTY (4836; 66%), NumForm=Digit (4836; 66%).
NUM tokens may have the following values of NumType:
- Card (7247; 99% of non-empty NumType): #, dvou, jeden, dvě, tři, dva, obou, jedné, jednoho, jedním
- Frac (60; 1% of non-empty NumType): třetinu, třetina, třetiny, čtvrtiny, dvanáctinu, třetinou, třetině, šestině, desetin, desetinu
NumType seems to be a lexical feature of NUM: 100% of lemmas (59) occur with only one value of NumType.
863 cs-pos/ADJ tokens (1% of all ADJ tokens) have a non-empty value of NumType
The most frequent other feature values with which ADJ and NumType co-occurred: Negative=EMPTY (863; 100%), Degree=EMPTY (863; 100%), Number=Sing (607; 70%), Animacy=EMPTY (566; 66%).
ADJ tokens may have the following values of NumType:
- Gen (33; 4% of non-empty NumType): dvojí, obojí, dvojím, trojí, dvojího, trojím, dvojímu, obojího, obojím
- Ord (819; 95% of non-empty NumType): první, prvním, třetí, prvních, prvního, šedesátých, třetího, třicátých, dvacátých, páté
- Sets (11; 1% of non-empty NumType): jedněch, jedni, jedny, oboje
- EMPTY (69665): další, pracovní, jednotlivých, základní, nové, možno, socialistické, různých, každý, dalších
Paradigm of obojí (forms by feature bundle, with their NumType value, Sets or Gen):
- Case=Acc|Number=Plur: obojí (Gen)
- Case=Gen|Gender=Neut|Number=Sing: obojího (Gen)
- Case=Loc|Gender=Masc|Number=Sing: obojím (Gen)
- Case=Nom|Gender=Neut|Number=Sing: oboje (Sets)
- Case=Nom|Number=Sing: obojí (Gen)
- Case=Nom|Number=Plur: Obojí (Gen)
NumType seems to be a lexical feature of ADJ: 97% of lemmas (38) occur with only one value of NumType.
572 cs-pos/DET tokens (5% of all DET tokens) have a non-empty value of NumType
The most frequent other feature values with which DET and NumType co-occurred: Reflex=EMPTY (572; 100%), Person=EMPTY (572; 100%), Poss=EMPTY (572; 100%), Number[psor]=EMPTY (572; 100%), Gender[psor]=EMPTY (572; 100%), Gender=EMPTY (557; 97%), Number=EMPTY (557; 97%), PronType=Dem,Ind (521; 91%).
DET tokens may have the following values of NumType:
- Card (569; 99% of non-empty NumType): několik, mnoho, několika, mnoha, kolik, málo, tolik, nejeden, mála, nejednom
- Ord (3; 1% of non-empty NumType): Kolikátý, kolikátá, kolikátém
- EMPTY (10516): jejich, jeho, této, své, těchto, tyto, tento, tohoto, její, tato
168 cs-pos/ADV tokens (1% of all ADV tokens) have a non-empty value of NumType
The most frequent other feature values with which ADV and NumType co-occurred: Degree=EMPTY (168; 100%), Negative=EMPTY (168; 100%).
ADV tokens may have the following values of NumType:
- Mult (119; 71% of non-empty NumType): dvakrát, nejednou, několikrát, třikrát, mnohokrát, kolikrát, desetkrát, stokrát, čtyřikrát, dvanáctkrát
- Ord (49; 29% of non-empty NumType): poprvé, podruhé, potřetí, potřinácté
- EMPTY (27322): tak, také, jak, již, už, ještě, pak, kde, tedy, velmi
NumType seems to be a lexical feature of ADV: 100% of lemmas (33) occur with only one value of NumType.
82 cs-pos/PRON tokens (0% of all PRON tokens) have a non-empty value of NumType
The most frequent other feature values with which PRON and NumType co-occurred: Variant=EMPTY (82; 100%), Reflex=EMPTY (82; 100%), Person=EMPTY (82; 100%), Number=EMPTY (81; 99%), Gender=EMPTY (81; 99%), PronType=Dem,Ind (59; 72%), Case=Acc (49; 60%).
PRON tokens may have the following values of NumType:
- Card (82; 100% of non-empty NumType): mnoho, kolik, tolik, několik, mnoha, málo, mála, nejeden, několika
- EMPTY (24307): se, to, které, si, který, která, tím, co, všech, všechny
Relations with Agreement in NumType
The most frequent relations where parent and child node agree in NumType:
- NUM –[conj]–> NUM (317; 100%)
- NUM –[compound]–> NUM (42; 100%)
- ADV –[conj]–> ADV (6; 55%)
- NUM –[det:nummod]–> DET (5; 100%)
- PRON –[conj]–> PRON (1; 100%)
- NUM –[appos]–> PRON (1; 100%)
Treebank Statistics (UD_Czech-CLTT)
This feature is universal.
It occurs with 3 different values: Card, Mult, Ord.
499 tokens (1%) have a non-empty value of NumType
112 types (2%) occur at least once with a non-empty value of NumType
93 lemmas (3%) occur at least once with a non-empty value of NumType
The feature is used with 5 part-of-speech tags: cs-pos/NUM (440; 1% instances), cs-pos/ADJ (43; 0% instances), cs-pos/ADV (14; 0% instances), cs-pos/DET (1; 0% instances), cs-pos/PRON (1; 0% instances).
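In each of the three treebank sections, the per-part-of-speech counts add up exactly to the number of tokens with a non-empty NumType. That makes a quick consistency check possible. The figures below are copied from the UD_Czech, UD_Czech-CAC and UD_Czech-CLTT statistics. The dictionary layout is only an illustration.

```python
# Tokens with a non-empty NumType: (total, counts per part of speech),
# copied from the three treebank statistics sections.
STATS = {
    "UD_Czech": (49212, {"NUM": 41510, "ADJ": 4990, "DET": 1553, "ADV": 864, "PRON": 295}),
    "UD_Czech-CAC": (8992, {"NUM": 7307, "ADJ": 863, "DET": 572, "ADV": 168, "PRON": 82}),
    "UD_Czech-CLTT": (499, {"NUM": 440, "ADJ": 43, "ADV": 14, "DET": 1, "PRON": 1}),
}

for name, (total, per_pos) in STATS.items():
    summed = sum(per_pos.values())
    status = "ok" if summed == total else "mismatch"
    print(f"{name}: {summed} of {total} ({status})")
```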
440 cs-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType
The most frequent other feature values with which NUM and NumType co-occurred: Gender=EMPTY (394; 90%), NumValue=EMPTY (382; 87%), NumForm=Roman (371; 84%), Case=EMPTY (371; 84%), Number=EMPTY (371; 84%).
NUM tokens may have the following values of NumType:
- Card (440; 100% of non-empty NumType): 1, 3, 2, 4, jeden, 5, 41, 7, jedné, tří
NumType seems to be a lexical feature of NUM: 100% of lemmas (83) occur with only one value of NumType.
43 cs-pos/ADJ tokens (1% of all ADJ tokens) have a non-empty value of NumType
The most frequent other feature values with which ADJ and NumType co-occurred: Negative=EMPTY (43; 100%), Degree=EMPTY (43; 100%), Number=Sing (43; 100%), Animacy=EMPTY (24; 56%).
ADJ tokens may have the following values of NumType:
- Ord (43; 100% of non-empty NumType): prvním, prvnímu, prvního, první, třetí, PÁTÁ, ČTVRTÁ, ŠESTÁ, SEDMÁ, druhé
- EMPTY (6496): účetní, účetních, účetního, konsolidované, konsolidující, finanční, účetním, povinny, výroční, právní
14 cs-pos/ADV tokens (2% of all ADV tokens) have a non-empty value of NumType
The most frequent other feature values with which ADV and NumType co-occurred: Negative=EMPTY (14; 100%), Degree=EMPTY (14; 100%).
ADV tokens may have the following values of NumType:
- Mult (3; 21% of non-empty NumType): jednou
- Ord (11; 79% of non-empty NumType): poprvé
- EMPTY (773): dále, zejména, popřípadě, jinak, pouze, kdy, též, například, tak, více
1 cs-pos/DET token (0% of all DET tokens) has a non-empty value of NumType
The most frequent other feature values with which DET and NumType co-occurred: Number[psor]=EMPTY (1; 100%), Number=EMPTY (1; 100%), Case=Ins (1; 100%), Gender=EMPTY (1; 100%), Gender[psor]=EMPTY (1; 100%), Poss=EMPTY (1; 100%), PronType=Dem,Ind (1; 100%), Person=EMPTY (1; 100%).
DET tokens may have the following values of NumType:
- Card (1; 100% of non-empty NumType): několika
- EMPTY (594): jejich, jeho, této, tohoto, těchto, tyto, tato, tento, její, tomto
1 cs-pos/PRON token (0% of all PRON tokens) has a non-empty value of NumType
The most frequent other feature values with which PRON and NumType co-occurred: PronType=Dem,Ind (1; 100%), Reflex=EMPTY (1; 100%), Number=EMPTY (1; 100%), Case=Ins (1; 100%), Variant=EMPTY (1; 100%), Gender=EMPTY (1; 100%).
PRON tokens may have the following values of NumType:
- Card (1; 100% of non-empty NumType): několika
- EMPTY (1211): se, které, která, který, to, kterých, kterým, kterém, kterému, nichž
Relations with Agreement in NumType
The most frequent relations where parent and child node agree in NumType:
- NUM –[conj]–> NUM (39; 100%)
- NUM –[compound]–> NUM (1; 100%)
- NUM –[conj]–> PRON (1; 100%)