NUM: numeral

Definition

A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.

Note that cardinal numerals are covered by NUM whether they are used as determiners or not (as in Windows 7) and whether they are expressed as words (čtyři), digits (4) or Roman numerals (IV).

Czech grammar distinguishes several subclasses of pronominal numerals (quantifiers): interrogative and relative (kolik “how many”); demonstrative (tolik “this many”); and indefinite (několik, mnoho, málo “several, many, few”). These words behave like most cardinal numerals: for example, they require the counted noun phrase to be in the genitive. Unlike their English counterparts, they do not behave like adjectives. Nevertheless, in accordance with the UD standard, they should be tagged DET, not NUM.

In addition, several types of (non-pronominal) numerals, such as ordinal numerals and multiplicative numerals, are tagged ADJ or ADV, based on their syntactic and morphological behavior.
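
The tagging rules in this section can be summarized in a small decision function. This is only an illustrative sketch: the function name, the subclass labels and the `adverbial` flag are hypothetical and do not belong to any UD tool, and a real annotation decision depends on the word's behavior in context.

```python
def upos_for_numeral(subclass: str, adverbial: bool = False) -> str:
    """Return the UPOS tag suggested by the guidelines above.

    `subclass` and `adverbial` are hypothetical inputs used for illustration.
    """
    if subclass == "cardinal":        # e.g. čtyři, 4, IV
        return "NUM"
    if subclass == "pronominal":      # e.g. kolik, tolik, několik, mnoho, málo
        return "DET"
    if subclass in ("ordinal", "multiplicative"):
        # ADJ or ADV, depending on syntactic and morphological behavior
        return "ADV" if adverbial else "ADJ"
    raise ValueError(f"unknown numeral subclass: {subclass!r}")


print(upos_for_numeral("cardinal"))     # NUM
print(upos_for_numeral("pronominal"))   # DET
```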

Examples

Counterexamples


Treebank Statistics (UD_Czech)

There are 3435 NUM lemmas (6%), 3542 NUM types (3%) and 41510 NUM tokens (3%). Out of 17 observed tags, the rank of NUM is: 5 in number of lemmas, 5 in number of types and 10 in number of tokens.

The 10 most frequent NUM lemmas: jeden, dva, 1, tři, 2, oba, 3, 4, pět, čtyři

The 10 most frequent NUM types: 1, 2, 3, dva, tři, 4, jeden, 6, dvě, tisíc

The 10 most frequent ambiguous lemmas: jeden (NUM 2526, ADJ 31), tři (NUM 1207, ADJ 1), pět (NUM 625, VERB 1), tisíc (NUM 539, NOUN 330, ADV 1), 12 (NUM 307, ADV 1), osm (NUM 236, ADJ 1), I (NUM 97, PROPN 62, ADJ 17, PRON 16), půl (NOUN 177, NUM 64), třináct (NUM 53, ADJ 1), sto (NOUN 304, NUM 41)

The 10 most frequent ambiguous types: tisíc (NUM 538, NOUN 92), dvou (NUM 519, ADJ 1), 12 (NUM 306, ADV 1), tří (NUM 239, ADJ 3), jedno (NUM 152, ADJ 1), jednou (ADV 165, NUM 129), čtyř (NUM 100, ADJ 1), I (CONJ 465, NUM 97, PROPN 62, ADJ 19, PRON 6, NOUN 1), osmi (NUM 91, ADJ 1), půl (NOUN 164, NUM 64)
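
The page does not define "ambiguous"; the lists above are consistent with reading it as "observed with NUM and at least one other tag". Under that assumption, such lemmas could be collected roughly as follows. The function name and the toy input are hypothetical and are not taken from the treebank.

```python
from collections import Counter, defaultdict


def ambiguous_lemmas(pairs, tag="NUM"):
    """Map each lemma seen with `tag` and some other tag to its tag counts."""
    counts = defaultdict(Counter)
    for lemma, upos in pairs:
        counts[lemma][upos] += 1
    return {
        lemma: dict(c)
        for lemma, c in counts.items()
        if tag in c and len(c) > 1
    }


# Toy (lemma, UPOS) pairs, invented for illustration only.
toy = [
    ("jeden", "NUM"), ("jeden", "NUM"), ("jeden", "ADJ"),
    ("dva", "NUM"),
    ("půl", "NOUN"), ("půl", "NUM"),
]
print(ambiguous_lemmas(toy))
```

The same function applies to word forms instead of lemmas if the first element of each pair is the form.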

Morphology

The form / lemma ratio of NUM is 1.031150 (the average of all parts of speech is 2.195950).
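
The reported ratio appears to be the number of NUM types divided by the number of NUM lemmas; that reading is an assumption, but it reproduces the figures given on this page for all three treebanks. A minimal check (the function name is made up for this sketch):

```python
def form_lemma_ratio(n_types: int, n_lemmas: int) -> float:
    """Number of distinct word forms (types) per lemma."""
    return n_types / n_lemmas


# (types, lemmas) as reported in the treebank statistics on this page.
counts = {
    "UD_Czech": (3542, 3435),
    "UD_Czech-CAC": (123, 59),
    "UD_Czech-CLTT": (97, 83),
}
for name, (n_types, n_lemmas) in counts.items():
    print(f"{name}: {form_lemma_ratio(n_types, n_lemmas):.6f}")
# UD_Czech: 1.031150
# UD_Czech-CAC: 2.084746
# UD_Czech-CLTT: 1.168675
```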

The 1st highest number of forms (10) was observed with the lemma “jeden”: jeden, jedna, jedno, jednoho, jednom, jednomu, jednou, jednu, jedné, jedním.

The 2nd highest number of forms (8) was observed with the lemma “třetina”: třetin, třetina, třetinou, třetinu, třetiny, třetinách, třetinám, třetině.

The 3rd highest number of forms (7) was observed with the lemma “čtvrtina”: čtvrtina, čtvrtinami, čtvrtinou, čtvrtinu, čtvrtiny, čtvrtinách, čtvrtině.

NUM occurs with 10 features: cs-feat/NumType (41510; 100% instances), cs-feat/NumForm (41168; 99% instances), cs-feat/Number (11649; 28% instances), cs-feat/Case (11623; 28% instances), cs-feat/NumValue (8050; 19% instances), cs-feat/Gender (4759; 11% instances), cs-feat/Animacy (303; 1% instances), cs-feat/Foreign (29; 0% instances), cs-feat/NameType (20; 0% instances), cs-feat/Style (2; 0% instances)

NUM occurs with 25 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Foreign=Foreign, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, NameType=Com, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, NumValue=1,2,3, Number=Dual, Number=Plur, Number=Sing, Style=Arch

NUM occurs with 59 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (29484 tokens). Examples: 1, 2, 3, 4, 6, 5, 1992, 10, 1994, 1993
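
A feature combination such as NumForm=Digit|NumType=Card is a list of Feature=Value pairs separated by `|`, where a value may itself list several alternatives separated by commas (as in Gender=Fem,Neut). The sketch below splits such a string into a dictionary; `parse_feats` is a hypothetical helper, not part of any official UD library.

```python
def parse_feats(feats: str) -> dict:
    """Split 'A=x|B=y,z' into {'A': ['x'], 'B': ['y', 'z']}.

    An underscore or an empty string is treated as "no features".
    """
    if feats in ("", "_"):
        return {}
    result = {}
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        result[name] = value.split(",")
    return result


print(parse_feats("NumForm=Digit|NumType=Card"))
# {'NumForm': ['Digit'], 'NumType': ['Card']}
print(parse_feats("Gender=Fem,Neut|NumType=Card"))
# {'Gender': ['Fem', 'Neut'], 'NumType': ['Card']}
```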

Relations

NUM nodes are attached to their parents using 22 different relations: cs-dep/nummod (19667; 47% instances), cs-dep/nummod:gov (7351; 18% instances), cs-dep/conj (4243; 10% instances), cs-dep/compound (2801; 7% instances), cs-dep/dep (1946; 5% instances), cs-dep/advmod (1879; 5% instances), cs-dep/root (1216; 3% instances), cs-dep/dobj (939; 2% instances), cs-dep/nsubj (711; 2% instances), cs-dep/appos (292; 1% instances), cs-dep/nmod (123; 0% instances), cs-dep/nsubjpass (90; 0% instances), cs-dep/xcomp (81; 0% instances), cs-dep/iobj (61; 0% instances), cs-dep/advcl (39; 0% instances), cs-dep/acl (28; 0% instances), cs-dep/ccomp (25; 0% instances), cs-dep/parataxis (8; 0% instances), cs-dep/advmod:emph (5; 0% instances), cs-dep/csubj (2; 0% instances), cs-dep/mark (2; 0% instances), cs-dep/csubjpass (1; 0% instances)

Parents of NUM nodes belong to 15 different parts of speech: NOUN (26036; 63% instances), NUM (6329; 15% instances), VERB (3763; 9% instances), PROPN (2556; 6% instances), ROOT (1216; 3% instances), ADJ (775; 2% instances), ADV (319; 1% instances), SYM (256; 1% instances), PRON (195; 0% instances), PUNCT (29; 0% instances), CONJ (28; 0% instances), DET (5; 0% instances), ADP (1; 0% instances), INTJ (1; 0% instances), PART (1; 0% instances)

24327 (59%) NUM nodes are leaves.

9736 (23%) NUM nodes have one child.

4009 (10%) NUM nodes have two children.

3438 (8%) NUM nodes have three or more children.

The highest child degree of a NUM node is 85.
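
The child degree of a node is the number of nodes attached directly to it; a leaf has degree 0. Assuming each sentence is available as a list of UPOS tags and a parallel list of 1-based head indices (0 for the root), degree counts like those above could be gathered as in this sketch. The function name and the toy sentence are invented for illustration.

```python
from collections import Counter


def child_degrees(heads, upos, tag="NUM"):
    """Return the number of children of every node tagged `tag`.

    `heads[i]` is the 1-based index of the parent of token i+1 (0 = root).
    """
    n_children = Counter(h for h in heads if h != 0)
    return [n_children[i] for i, t in enumerate(upos, start=1) if t == tag]


# Toy sentence: token 1 (NUM) and token 4 (NUM) depend on token 2 (NOUN);
# token 3 (PUNCT) depends on token 4.
upos = ["NUM", "NOUN", "PUNCT", "NUM"]
heads = [2, 0, 4, 2]

degrees = child_degrees(heads, upos)
print(degrees)           # [0, 1]
print(Counter(degrees))  # Counter({0: 1, 1: 1})
```

Summing such counters over all sentences gives the numbers of leaves, one-child nodes and so on, and the maximum key is the highest child degree.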

Children of NUM nodes are attached using 28 different relations: cs-dep/punct (11542; 37% instances), cs-dep/nmod (4333; 14% instances), cs-dep/conj (3930; 12% instances), cs-dep/compound (2801; 9% instances), cs-dep/case (2109; 7% instances), cs-dep/advmod:emph (2021; 6% instances), cs-dep/cc (1290; 4% instances), cs-dep/dep (792; 3% instances), cs-dep/amod (615; 2% instances), cs-dep/cop (466; 1% instances), cs-dep/nsubj (383; 1% instances), cs-dep/advmod (315; 1% instances), cs-dep/mark (305; 1% instances), cs-dep/appos (241; 1% instances), cs-dep/nummod (99; 0% instances), cs-dep/parataxis (50; 0% instances), cs-dep/acl (47; 0% instances), cs-dep/csubj (33; 0% instances), cs-dep/xcomp (33; 0% instances), cs-dep/advcl (25; 0% instances), cs-dep/det:nummod (23; 0% instances), cs-dep/dobj (20; 0% instances), cs-dep/aux (9; 0% instances), cs-dep/neg (7; 0% instances), cs-dep/discourse (4; 0% instances), cs-dep/mwe (2; 0% instances), cs-dep/foreign (1; 0% instances), cs-dep/vocative (1; 0% instances)

Children of NUM nodes belong to 16 different parts of speech: PUNCT (11542; 37% instances), NUM (6329; 20% instances), NOUN (4421; 14% instances), ADP (2093; 7% instances), ADV (1492; 5% instances), CONJ (1238; 4% instances), SYM (920; 3% instances), PART (914; 3% instances), ADJ (785; 2% instances), VERB (716; 2% instances), PROPN (378; 1% instances), PRON (341; 1% instances), SCONJ (302; 1% instances), DET (16; 0% instances), AUX (9; 0% instances), INTJ (1; 0% instances)


Treebank Statistics (UD_Czech-CAC)

There are 59 NUM lemmas (0%), 123 NUM types (0%) and 7307 NUM tokens (1%). Out of 16 observed tags, the rank of NUM is: 8 in number of lemmas, 8 in number of types and 12 in number of tokens.

The 10 most frequent NUM lemmas: &camount;, &cyear;, jeden, dva, &clabel;, &cdate;, &cother;, oba, tři, čtyři

The 10 most frequent NUM types: #, dvou, jeden, dvě, tři, dva, obou, jedné, jednoho, jedním

The 10 most frequent ambiguous lemmas: jeden (NUM 755, ADJ 10), tisíc (NUM 48, NOUN 39), půl (NUM 37, NOUN 1), pár (NUM 23, NOUN 12), sto (NOUN 36, NUM 13)

The 10 most frequent ambiguous types: tisíc (NUM 48, NOUN 8), jednou (ADV 52, NUM 23), půl (NUM 37, NOUN 1), pár (NUM 23, NOUN 5), sto (NOUN 8, NUM 6), set (NOUN 19, NUM 4)

Morphology

The form / lemma ratio of NUM is 2.084746 (the average of all parts of speech is 2.206260).

The 1st highest number of forms (10) was observed with the lemma “jeden”: jeden, jedna, jedno, jednoho, jednom, jednomu, jednou, jednu, jedné, jedním.

The 2nd highest number of forms (7) was observed with the lemma “třetina”: třetin, třetina, třetinami, třetinou, třetinu, třetiny, třetině.

The 3rd highest number of forms (5) was observed with the lemma “tři”: třech, třem, třemi, tři, tří.

NUM occurs with 7 features: cs-feat/NumType (7307; 100% instances), cs-feat/NumForm (7247; 99% instances), cs-feat/Case (2471; 34% instances), cs-feat/Number (2471; 34% instances), cs-feat/NumValue (1962; 27% instances), cs-feat/Gender (1199; 16% instances), cs-feat/Animacy (100; 1% instances)

NUM occurs with 21 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, NumForm=Digit, NumForm=Word, NumType=Card, NumType=Frac, NumValue=1,2,3, Number=Dual, Number=Plur, Number=Sing

NUM occurs with 49 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (4836 tokens). Examples: #

Relations

NUM nodes are attached to their parents using 23 different relations: cs-dep/nummod (4440; 61% instances), cs-dep/nummod:gov (1189; 16% instances), cs-dep/conj (500; 7% instances), cs-dep/advmod (441; 6% instances), cs-dep/dobj (170; 2% instances), cs-dep/dep (139; 2% instances), cs-dep/nsubj (115; 2% instances), cs-dep/root (114; 2% instances), cs-dep/nsubjpass (43; 1% instances), cs-dep/compound (42; 1% instances), cs-dep/advcl (30; 0% instances), cs-dep/xcomp (23; 0% instances), cs-dep/appos (21; 0% instances), cs-dep/iobj (14; 0% instances), cs-dep/acl (8; 0% instances), cs-dep/ccomp (5; 0% instances), cs-dep/parataxis (5; 0% instances), cs-dep/aux (2; 0% instances), cs-dep/nmod (2; 0% instances), cs-dep/auxpass (1; 0% instances), cs-dep/case (1; 0% instances), cs-dep/cop (1; 0% instances), cs-dep/csubjpass (1; 0% instances)

Parents of NUM nodes belong to 12 different parts of speech: NOUN (5425; 74% instances), VERB (816; 11% instances), NUM (388; 5% instances), SYM (222; 3% instances), ADJ (168; 2% instances), ROOT (114; 2% instances), PROPN (67; 1% instances), ADV (54; 1% instances), PRON (47; 1% instances), PART (4; 0% instances), CONJ (1; 0% instances), DET (1; 0% instances)

4902 (67%) NUM nodes are leaves.

1237 (17%) NUM nodes have one child.

787 (11%) NUM nodes have two children.

381 (5%) NUM nodes have three or more children.

The highest child degree of a NUM node is 12.

Children of NUM nodes are attached using 25 different relations: cs-dep/nmod (1390; 32% instances), cs-dep/case (588; 13% instances), cs-dep/advmod:emph (552; 13% instances), cs-dep/conj (454; 10% instances), cs-dep/cc (357; 8% instances), cs-dep/punct (323; 7% instances), cs-dep/cop (158; 4% instances), cs-dep/nsubj (134; 3% instances), cs-dep/amod (103; 2% instances), cs-dep/mark (88; 2% instances), cs-dep/advmod (79; 2% instances), cs-dep/compound (42; 1% instances), cs-dep/appos (33; 1% instances), cs-dep/nummod (27; 1% instances), cs-dep/dep (22; 1% instances), cs-dep/acl (11; 0% instances), cs-dep/advcl (6; 0% instances), cs-dep/csubj (6; 0% instances), cs-dep/xcomp (6; 0% instances), cs-dep/det:nummod (5; 0% instances), cs-dep/parataxis (4; 0% instances), cs-dep/aux (2; 0% instances), cs-dep/neg (2; 0% instances), cs-dep/discourse (1; 0% instances), cs-dep/dobj (1; 0% instances)

Children of NUM nodes belong to 15 different parts of speech: SYM (972; 22% instances), ADP (587; 13% instances), NOUN (572; 13% instances), ADV (399; 9% instances), NUM (388; 9% instances), PART (349; 8% instances), PUNCT (323; 7% instances), CONJ (268; 6% instances), VERB (211; 5% instances), ADJ (124; 3% instances), PRON (95; 2% instances), SCONJ (86; 2% instances), PROPN (13; 0% instances), DET (5; 0% instances), AUX (2; 0% instances)


Treebank Statistics (UD_Czech-CLTT)

There are 83 NUM lemmas (3%), 97 NUM types (2%) and 440 NUM tokens (1%). Out of 15 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 11 in number of tokens.

The 10 most frequent NUM lemmas: 1, 3, 2, jeden, 4, 5, dva, 41, 7, dvanáct

The 10 most frequent NUM types: 1, 3, 2, 4, jeden, 5, 41, 7, jedné, tří

The 10 most frequent ambiguous lemmas: (none)

The 10 most frequent ambiguous types: jednou (NUM 3, ADV 3)

Morphology

The form / lemma ratio of NUM is 1.168675 (the average of all parts of speech is 1.764161).

The 1st highest number of forms (9) was observed with the lemma “jeden”: jeden, jedno, jednoho, jednom, jednomu, jednou, jednu, jedné, jedním.

The 2nd highest number of forms (4) was observed with the lemma “dva”: dva, dvou, dvě, dvěma.

The 3rd highest number of forms (2) was observed with the lemma “dvanáct”: dvanáct, dvanácti.

NUM occurs with 7 features: cs-feat/NumForm (440; 100% instances), cs-feat/NumType (440; 100% instances), cs-feat/Case (69; 16% instances), cs-feat/Number (69; 16% instances), cs-feat/NumValue (58; 13% instances), cs-feat/Gender (46; 10% instances), cs-feat/Animacy (12; 3% instances)

NUM occurs with 18 feature-value pairs: Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, NumForm=Roman, NumForm=Word, NumType=Card, NumValue=1,2,3, Number=Plur, Number=Sing

NUM occurs with 20 feature combinations. The most frequent feature combination is NumForm=Roman|NumType=Card (371 tokens). Examples: 1, 3, 2, 4, 5, 41, 7, 10, 2004, 2008

Relations

NUM nodes are attached to their parents using 10 different relations: cs-dep/nummod (286; 65% instances), cs-dep/nmod (50; 11% instances), cs-dep/conj (45; 10% instances), cs-dep/nummod:gov (24; 5% instances), cs-dep/advcl (17; 4% instances), cs-dep/dobj (9; 2% instances), cs-dep/advmod (5; 1% instances), cs-dep/nsubj (2; 0% instances), cs-dep/compound (1; 0% instances), cs-dep/dep (1; 0% instances)

Parents of NUM nodes belong to 7 different parts of speech: NOUN (337; 77% instances), NUM (40; 9% instances), VERB (37; 8% instances), ADV (13; 3% instances), ADJ (6; 1% instances), X (6; 1% instances), SYM (1; 0% instances)

272 (62%) NUM nodes are leaves.

104 (24%) NUM nodes have one child.

50 (11%) NUM nodes have two children.

14 (3%) NUM nodes have three or more children.

The highest child degree of a NUM node is 11.

Children of NUM nodes are attached using 12 different relations: cs-dep/punct (86; 32% instances), cs-dep/nmod (62; 23% instances), cs-dep/conj (43; 16% instances), cs-dep/cc (34; 13% instances), cs-dep/mark (17; 6% instances), cs-dep/advmod:emph (14; 5% instances), cs-dep/dep (7; 3% instances), cs-dep/advmod (3; 1% instances), cs-dep/case (3; 1% instances), cs-dep/compound (1; 0% instances), cs-dep/cop (1; 0% instances), cs-dep/nsubj (1; 0% instances)

Children of NUM nodes belong to 12 different parts of speech: PUNCT (86; 32% instances), NUM (40; 15% instances), X (33; 12% instances), NOUN (25; 9% instances), CONJ (24; 9% instances), SCONJ (21; 8% instances), ADV (20; 7% instances), SYM (13; 5% instances), PART (6; 2% instances), ADP (2; 1% instances), PRON (1; 0% instances), VERB (1; 0% instances)

