
DET: determiner

Definition

Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context. That is, a determiner may indicate whether the noun is referring to a definite or indefinite element of a class, to a closer or more distant element, to an element belonging to a specified person or thing, to a particular number or quantity, etc.

An important point to note is that the traditional grammar of Czech does not define determiners as a separate word class, and Czech does not have articles. Most determiners are traditionally called pronouns, so a UD-conformant annotation of Czech must distinguish between substantive pronouns (UD tag PRON) and attributive pronouns (UD tag DET).

Also note that the DET tag includes (pronominal) quantifiers (words like mnoho, málo “many, few”), which the traditional grammar classifies as a special subclass of numerals. However, cardinal numerals in the narrow sense (jeden, pět, sto) are not tagged DET, even though some authors would count them among quantifiers. Cardinal numerals have their own tag, NUM.

Conversion from the Prague Dependency Treebank

Since the PDT tagset (like all other Czech tagsets) does not distinguish substantive and attributive pronouns, morphological tags alone are not enough to find the correct universal POS tag. Morphological rules could help, as the inflection patterns of some pronouns bear similarities to adjectival inflection; nevertheless, there will be other cases that cannot be resolved this way. For those we have to examine the dependency tree: if a pronoun modifies a noun, it should be tagged DET; otherwise it is PRON. As a result, all words that can be tagged DET can also be tagged PRON, but some words can only be tagged PRON. (We cannot recognize cases where the pronoun is in fact attributive but the modified noun has been elided and is not represented in the tree.)

For instance, tohle “this” is either a pronoun (Tohle jsem viděl včera. “I saw this yesterday.”) or a determiner (Tohle auto jsem viděl včera. “I saw this car yesterday.”).
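The tree-based rule described above can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions, not the actual conversion code: the Word class, its pdt_class field and the pronoun_upos function are hypothetical names, and the two hand-built trees encode the example sentences with a plausible but assumed attachment of each word.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    form: str
    pdt_class: str        # coarse traditional class (hypothetical labels)
    head: Optional[int]   # 0-based index of the parent, None for the root


def pronoun_upos(sentence: List[Word], index: int) -> str:
    """Return DET if the pronoun modifies a noun, PRON otherwise."""
    word = sentence[index]
    if word.pdt_class != "pronoun":
        raise ValueError("the rule only applies to traditional pronouns")
    if word.head is not None and sentence[word.head].pdt_class == "noun":
        return "DET"
    return "PRON"


# "Tohle jsem viděl včera." (the pronoun depends on the verb)
bare = [
    Word("Tohle", "pronoun", 2),
    Word("jsem", "verb", 2),
    Word("viděl", "verb", None),
    Word("včera", "adverb", 2),
]

# "Tohle auto jsem viděl včera." (the pronoun depends on the noun "auto")
with_noun = [
    Word("Tohle", "pronoun", 1),
    Word("auto", "noun", 3),
    Word("jsem", "verb", 3),
    Word("viděl", "verb", None),
    Word("včera", "adverb", 3),
]

print(pronoun_upos(bare, 0))       # PRON
print(pronoun_upos(with_noun, 0))  # DET
```

As the text notes, a rule of this kind cannot detect an attributive pronoun whose noun has been elided, because the tree then contains no noun parent to test against.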


Treebank Statistics (UD_Czech)

There are 55 DET lemmas (0%), 325 DET types (0%) and 27819 DET tokens (2%). Out of 17 observed tags, the rank of DET is: 10 in number of lemmas, 8 in number of types and 11 in number of tokens.
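Counts of this kind can be reproduced from a treebank file in CoNLL-U format, where each token line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The following is a rough sketch, not the script that generated this page: the function name pos_counts, the helper row and the two toy sentences are made up for illustration, and comparing word forms case-sensitively is an assumption.

```python
def pos_counts(conllu_text: str, upos: str = "DET"):
    """Return (number of lemmas, number of types, number of tokens) for one UPOS tag."""
    lemmas, types = set(), set()
    tokens = 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences, '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        if cols[3] == upos:
            tokens += 1
            types.add(cols[1])   # word form, compared case-sensitively (assumption)
            lemmas.add(cols[2])
    return len(lemmas), len(types), tokens


def row(i, form, lemma, upos, head, deprel):
    """Build one toy CoNLL-U token line with unspecified XPOS, FEATS, DEPS, MISC."""
    return "\t".join([str(i), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"])


# Toy data written by hand for this sketch, not taken from any treebank.
SAMPLE = "\n".join([
    "# text = Tohle auto jsem viděl včera.",
    row(1, "Tohle", "tenhle", "DET", 2, "det"),
    row(2, "auto", "auto", "NOUN", 4, "dobj"),
    row(3, "jsem", "být", "AUX", 4, "aux"),
    row(4, "viděl", "vidět", "VERB", 0, "root"),
    row(5, "včera", "včera", "ADV", 4, "advmod"),
    "",
    "# text = Tohle jsem viděl včera.",
    row(1, "Tohle", "tenhle", "PRON", 3, "dobj"),
    row(2, "jsem", "být", "AUX", 3, "aux"),
    row(3, "viděl", "vidět", "VERB", 0, "root"),
    row(4, "včera", "včera", "ADV", 3, "advmod"),
])

print(pos_counts(SAMPLE, "DET"))   # (1, 1, 1)
print(pos_counts(SAMPLE, "PRON"))  # (1, 1, 1)
```

In the toy data the lemma tenhle is counted once under DET and once under PRON, which is the same kind of ambiguity the lists of ambiguous lemmas report.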

The 10 most frequent DET lemmas: tento, jeho, svůj, můj, ten, některý, několik, takový, žádný, jenž

The 10 most frequent DET types: jeho, jejich, své, této, její, tento, tohoto, svou, tato, těchto

The 10 most frequent ambiguous lemmas: tento (DET 6204, PRON 97), jeho (DET 5792, PRON 44), svůj (DET 4767, PRON 113, ADJ 4), můj (DET 2581, PRON 71), ten (PRON 11968, DET 1312), některý (DET 1096, PRON 234), několik (DET 872, PRON 25), takový (DET 866, PRON 169), žádný (DET 745, PRON 86), jenž (PRON 2211, DET 648)

The 10 most frequent ambiguous types: jeho (DET 2457, PRON 32), jejich (DET 1698, PRON 11), své (DET 1366, PRON 40, ADJ 1), této (DET 993, PRON 3), její (DET 711, PRON 8), tento (DET 586, PRON 9), svou (DET 607, PRON 7), tato (DET 377, PRON 7), těchto (DET 582, PRON 7), tyto (DET 432, PRON 1)

Morphology

The form / lemma ratio of DET is 5.909091 (the average of all parts of speech is 2.195950).
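The reported ratio is consistent with dividing the number of DET types by the number of DET lemmas given in the statistics above (325 and 55 for UD_Czech). That reading is inferred from the numbers rather than stated on this page, and the function name below is hypothetical.

```python
def form_lemma_ratio(n_types: int, n_lemmas: int) -> float:
    """Average number of distinct word forms per lemma."""
    return n_types / n_lemmas


# UD_Czech: 325 DET types and 55 DET lemmas
print(f"{form_lemma_ratio(325, 55):.6f}")  # 5.909091
```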

The 1st highest number of forms (27) was observed with the lemma “můj”: Mí, moje, moji, mojí, mou, má, mé, mého, mém, mému, mých, mýho, mým, mými, můj, n, naše, našeho, našem, našemu, naši, našich, našim, našimi, naší, naším, náš.

The 2nd highest number of forms (19) was observed with the lemma “jakýkoliv”: jakoukoli, jakoukoliv, jakákoli, jakákoliv, jakéhokoli, jakéhokoliv, jakékoli, jakékoliv, jakémkoli, jakémkoliv, jakémukoli, jakémukoliv, jakýchkoli, jakýchkoliv, jakýkoli, jakýkoliv, jakýmikoliv, jakýmkoli, jakýmkoliv.

The 3rd highest number of forms (16) was observed with the lemma “ten”: ta, ten, ti, to, toho, tom, tomu, tou, tu, ty, té, tím, těch, těm, těma, těmi.

DET occurs with 16 features: cs-feat/PronType (27819; 100% instances), cs-feat/Case (22389; 80% instances), cs-feat/Number (21267; 76% instances), cs-feat/Gender (17998; 65% instances), cs-feat/Poss (14046; 50% instances), cs-feat/Number[psor] (9278; 33% instances), cs-feat/Person (9278; 33% instances), cs-feat/Reflex (4768; 17% instances), cs-feat/Gender[psor] (4332; 16% instances), cs-feat/Animacy (2622; 9% instances), cs-feat/NumType (1553; 6% instances), cs-feat/Negative (745; 3% instances), cs-feat/Abbr (15; 0% instances), cs-feat/Style (14; 0% instances), cs-feat/Foreign (1; 0% instances), cs-feat/NameType (1; 0% instances)

DET occurs with 40 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Foreign, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, Gender[psor]=Fem, Gender[psor]=Masc,Neut, NameType=Oth, Negative=Neg, NumType=Card, NumType=Ord, Number=Dual, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person=1, Person=2, Person=3, Poss=Yes, PronType=Dem, PronType=Dem,Ind, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, Reflex=Yes, Style=Coll

DET occurs with 273 feature combinations. The most frequent feature combination is Gender[psor]=Masc,Neut|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs (2719 tokens). Examples: jeho

Relations

DET nodes are attached to their parents using 8 different relations: cs-dep/det (26236; 94% instances), cs-dep/det:numgov (979; 4% instances), cs-dep/det:nummod (564; 2% instances), cs-dep/advcl (24; 0% instances), cs-dep/acl (9; 0% instances), cs-dep/ccomp (3; 0% instances), cs-dep/csubj (2; 0% instances), cs-dep/nmod (2; 0% instances)

Parents of DET nodes belong to 6 different parts of speech: NOUN (27491; 99% instances), ADJ (106; 0% instances), PROPN (106; 0% instances), PRON (97; 0% instances), NUM (16; 0% instances), DET (3; 0% instances)

27209 (98%) DET nodes are leaves.

371 (1%) DET nodes have one child.

182 (1%) DET nodes have two children.

57 (0%) DET nodes have three or more children.

The highest child degree of a DET node is 12.

Children of DET nodes are attached using 18 different relations: cs-dep/advmod:emph (180; 19% instances), cs-dep/punct (129; 13% instances), cs-dep/conj (116; 12% instances), cs-dep/cc (96; 10% instances), cs-dep/case (91; 10% instances), cs-dep/acl (90; 9% instances), cs-dep/advmod (67; 7% instances), cs-dep/advcl (34; 4% instances), cs-dep/mark (30; 3% instances), cs-dep/appos (27; 3% instances), cs-dep/amod (25; 3% instances), cs-dep/nmod (24; 3% instances), cs-dep/cop (17; 2% instances), cs-dep/dep (12; 1% instances), cs-dep/nsubj (12; 1% instances), cs-dep/xcomp (4; 0% instances), cs-dep/det:nummod (2; 0% instances), cs-dep/neg (1; 0% instances)

Children of DET nodes belong to 13 different parts of speech: ADV (174; 18% instances), PUNCT (129; 13% instances), CONJ (128; 13% instances), VERB (114; 12% instances), ADP (91; 10% instances), ADJ (87; 9% instances), NOUN (71; 7% instances), PRON (67; 7% instances), PART (46; 5% instances), SCONJ (30; 3% instances), PROPN (12; 1% instances), NUM (5; 1% instances), DET (3; 0% instances)


Treebank Statistics (UD_Czech-CAC)

There are 44 DET lemmas (0%), 281 DET types (0%) and 11088 DET tokens (2%). Out of 16 observed tags, the rank of DET is: 10 in number of lemmas, 7 in number of types and 9 in number of tokens.

The 10 most frequent DET lemmas: tento, jeho, svůj, můj, ten, některý, takový, jenž, několik, mnoho

The 10 most frequent DET types: jejich, jeho, této, své, těchto, tyto, tento, tohoto, její, tato

The 10 most frequent ambiguous lemmas: tento (DET 2974, PRON 58), jeho (DET 2290, PRON 10), svůj (DET 1393, PRON 24), můj (DET 1168, PRON 25), ten (PRON 3818, DET 537), některý (DET 505, PRON 98), takový (DET 362, PRON 64), jenž (PRON 840, DET 339), několik (DET 271, PRON 9), mnoho (DET 206, PRON 38, ADV 6)

The 10 most frequent ambiguous types: jejich (DET 902, PRON 1), jeho (DET 801, PRON 30), této (DET 463, PRON 4), své (DET 431, PRON 9), těchto (DET 377, PRON 8), tyto (DET 251, PRON 9), tohoto (DET 322, PRON 1), její (DET 284, PRON 3), tato (DET 153, PRON 2), naší (DET 260, PRON 4)

Morphology

The form / lemma ratio of DET is 6.386364 (the average of all parts of speech is 2.206260).

The 1st highest number of forms (25) was observed with the lemma “můj”: moje, moji, mojí, mou, má, mé, mého, mém, mému, mých, mým, mýma, můj, naše, našeho, našem, našemu, naši, našich, našim, našima, našimi, naší, naším, náš.

The 2nd highest number of forms (16) was observed with the lemma “ten”: ta, ten, ti, to, toho, tom, tomu, tou, tu, ty, té, tím, tý, těch, těm, těmi.

The 3rd highest number of forms (15) was observed with the lemma “tento”: tato, tento, tito, tohoto, tomto, tomuto, toto, touto, tuto, tyto, této, tímto, těchto, těmito, těmto.

DET occurs with 13 features: cs-feat/PronType (11088; 100% instances), cs-feat/Case (8846; 80% instances), cs-feat/Number (8465; 76% instances), cs-feat/Gender (6978; 63% instances), cs-feat/Poss (5279; 48% instances), cs-feat/Number[psor] (3886; 35% instances), cs-feat/Person (3886; 35% instances), cs-feat/Gender[psor] (1559; 14% instances), cs-feat/Reflex (1393; 13% instances), cs-feat/Animacy (895; 8% instances), cs-feat/NumType (572; 5% instances), cs-feat/Negative (121; 1% instances), cs-feat/Style (3; 0% instances)

DET occurs with 36 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, Gender[psor]=Fem, Gender[psor]=Masc,Neut, Negative=Neg, NumType=Card, NumType=Ord, Number=Dual, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person=1, Person=2, Person=3, Poss=Yes, PronType=Dem, PronType=Dem,Ind, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, Reflex=Yes, Style=Coll

DET occurs with 236 feature combinations. The most frequent feature combination is Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs (949 tokens). Examples: jejich

Relations

DET nodes are attached to their parents using 7 different relations: cs-dep/det (10517; 95% instances), cs-dep/det:numgov (367; 3% instances), cs-dep/det:nummod (190; 2% instances), cs-dep/advcl (7; 0% instances), cs-dep/acl (4; 0% instances), cs-dep/nmod (2; 0% instances), cs-dep/csubj (1; 0% instances)

Parents of DET nodes belong to 5 different parts of speech: NOUN (10966; 99% instances), ADJ (55; 0% instances), PROPN (32; 0% instances), PRON (30; 0% instances), NUM (5; 0% instances)

10797 (97%) DET nodes are leaves.

198 (2%) DET nodes have one child.

69 (1%) DET nodes have two children.

24 (0%) DET nodes have three or more children.

The highest child degree of a DET node is 5.

Children of DET nodes are attached using 16 different relations: cs-dep/advmod:emph (142; 34% instances), cs-dep/cc (61; 14% instances), cs-dep/conj (56; 13% instances), cs-dep/acl (30; 7% instances), cs-dep/punct (28; 7% instances), cs-dep/advmod (23; 5% instances), cs-dep/case (20; 5% instances), cs-dep/advcl (11; 3% instances), cs-dep/cop (10; 2% instances), cs-dep/amod (8; 2% instances), cs-dep/dep (8; 2% instances), cs-dep/nsubj (8; 2% instances), cs-dep/appos (7; 2% instances), cs-dep/mark (6; 1% instances), cs-dep/nmod (4; 1% instances), cs-dep/nummod (1; 0% instances)

Children of DET nodes belong to 12 different parts of speech: ADV (97; 23% instances), CONJ (91; 22% instances), VERB (43; 10% instances), PRON (39; 9% instances), PART (37; 9% instances), ADJ (36; 9% instances), PUNCT (28; 7% instances), NOUN (22; 5% instances), ADP (20; 5% instances), SCONJ (7; 2% instances), PROPN (2; 0% instances), NUM (1; 0% instances)


Treebank Statistics (UD_Czech-CLTT)

There are 13 DET lemmas (0%), 56 DET types (1%) and 595 DET tokens (2%). Out of 15 observed tags, the rank of DET is: 12 in number of lemmas, 8 in number of types and 10 in number of tokens.

The 10 most frequent DET lemmas: tento, jeho, svůj, jenž, takový, některý, takovýto, jaký, jakýkoliv, žádný

The 10 most frequent DET types: jejich, jeho, této, tohoto, těchto, tyto, tato, tento, její, tomto

The 10 most frequent ambiguous lemmas: tento (DET 315, PRON 5), jenž (PRON 73, DET 21), takový (DET 21, PRON 1), některý (DET 6, PRON 1), jaký (PRON 3, DET 2), žádný (PRON 4, DET 2), který (PRON 449, DET 1), několik (DET 1, PRON 1)

The 10 most frequent ambiguous types: tato (DET 24, PRON 1), tuto (DET 11, ADV 7), tímto (DET 9, PRON 3), jehož (DET 6, PRON 5), takové (DET 5, PRON 1), kterým (PRON 25, DET 1), několika (PRON 1, DET 1), žádná (DET 1, PRON 1)

Morphology

The form / lemma ratio of DET is 4.307692 (the average of all parts of speech is 1.764161).

The 1st highest number of forms (14) was observed with the lemma “tento”: tato, tento, tohoto, tomto, tomuto, toto, touto, tuto, tyto, této, tímto, těchto, těmito, těmto.

The 2nd highest number of forms (8) was observed with the lemma “jeho”: jeho, jejich, její, jejích, jejího, jejím, jejími, jejímu.

The 3rd highest number of forms (8) was observed with the lemma “takový”: takovou, taková, takové, takovému, takový, takových, takovým, takovými.

DET occurs with 12 features: cs-feat/PronType (595; 100% instances), cs-feat/Number (414; 70% instances), cs-feat/Case (406; 68% instances), cs-feat/Gender (354; 59% instances), cs-feat/Poss (240; 40% instances), cs-feat/Number[psor] (216; 36% instances), cs-feat/Person (216; 36% instances), cs-feat/Gender[psor] (111; 19% instances), cs-feat/Animacy (30; 5% instances), cs-feat/Reflex (24; 4% instances), cs-feat/Negative (2; 0% instances), cs-feat/NumType (1; 0% instances)

DET occurs with 29 feature-value pairs: Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Masc, Gender=Masc,Neut, Gender=Neut, Gender[psor]=Fem, Gender[psor]=Masc,Neut, Negative=Neg, NumType=Card, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person=3, Poss=Yes, PronType=Dem, PronType=Dem,Ind, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, Reflex=Yes

DET occurs with 57 feature combinations. The most frequent feature combination is Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs (93 tokens). Examples: jejich

Relations

DET nodes are attached to their parents using 3 different relations: cs-dep/det (593; 100% instances), cs-dep/acl (1; 0% instances), cs-dep/det:nummod (1; 0% instances)

Parents of DET nodes belong to 2 different parts of speech: NOUN (594; 100% instances), ADJ (1; 0% instances)

593 (100%) DET nodes are leaves.

1 (0%) DET node has one child.

0 (0%) DET nodes have two children.

1 (0%) DET node has three or more children.

The highest child degree of a DET node is 3.

Children of DET nodes are attached using 4 different relations: cs-dep/cop (1; 25% instances), cs-dep/nmod (1; 25% instances), cs-dep/nsubj (1; 25% instances), cs-dep/punct (1; 25% instances)

Children of DET nodes belong to 3 different parts of speech: NOUN (2; 50% instances), PUNCT (1; 25% instances), VERB (1; 25% instances)

