
PRON: pronoun

Definition

Pronouns are words that substitute for nouns or noun phrases, whose meaning is recoverable from the linguistic or extralinguistic context.

Pronouns under this definition function like nouns. Note that Czech grammar traditionally extends the term pronoun to words that substitute for adjectives. Such words are not tagged PRON under our universal scheme; they are tagged as determiners so that the same phenomenon is annotated the same way across languages.

For instance, tohle “this” is traditionally called a pronoun in Czech grammar regardless of context (the notion of determiner does not exist in Czech grammar). To keep the annotation parallel across languages, it should now be tagged PRON in Tohle jsem viděl včera. “I saw this yesterday.” and DET in Tohle auto jsem viděl včera. “I saw this car yesterday.”
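The contrast between the two sentences can be made concrete with a small sketch. Only the PRON and DET tags on tohle come from the text above; the tags on the remaining words and the helper name `tags_by_form` are assumptions made for illustration.

```python
from collections import defaultdict

# (form, UPOS) pairs for the two example sentences.
# Only the tags on "Tohle" are taken from the text; the rest are assumed.
sentences = [
    [("Tohle", "PRON"), ("jsem", "AUX"), ("viděl", "VERB"),
     ("včera", "ADV"), (".", "PUNCT")],
    [("Tohle", "DET"), ("auto", "NOUN"), ("jsem", "AUX"),
     ("viděl", "VERB"), ("včera", "ADV"), (".", "PUNCT")],
]


def tags_by_form(sents):
    """Map each lowercased word form to the set of UPOS tags it receives."""
    tags = defaultdict(set)
    for sent in sents:
        for form, upos in sent:
            tags[form.lower()].add(upos)
    return dict(tags)


# Forms that receive more than one tag across the two sentences.
ambiguous = {form: sorted(tags)
             for form, tags in tags_by_form(sentences).items()
             if len(tags) > 1}
print(ambiguous)
```

In this toy data, tohle is the only form that receives two different tags, which is the kind of ambiguity the treebank statistics report for lemmas and types.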

Examples

References


Treebank Statistics (UD_Czech)

There are 108 PRON lemmas (0%), 415 PRON types (0%) and 72419 PRON tokens (5%). Out of 17 observed tags, the rank of PRON is: 8 in number of lemmas, 7 in number of types and 8 in number of tokens.
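As a rough sketch of how lemma, type and token counts of this kind could be obtained, the function below counts them for one tag over a list of (form, lemma, UPOS) triples. The function name `pos_counts`, the toy data and the decision to lowercase forms when counting types are all assumptions; the script that generated the statistics may count differently.

```python
def pos_counts(tokens, upos):
    """Return (lemma count, type count, token count) for one UPOS tag.

    tokens: iterable of (form, lemma, upos) triples.
    Types are counted as distinct lowercased forms (an assumption).
    """
    forms, lemmas, n_tokens = set(), set(), 0
    for form, lemma, tag in tokens:
        if tag == upos:
            forms.add(form.lower())
            lemmas.add(lemma)
            n_tokens += 1
    return len(lemmas), len(forms), n_tokens


# Made-up toy data, not drawn from any treebank.
toy = [
    ("To", "ten", "PRON"),
    ("se", "se", "PRON"),
    ("které", "který", "PRON"),
    ("který", "který", "PRON"),
    ("to", "ten", "PRON"),
    ("auto", "auto", "NOUN"),
]
print(pos_counts(toy, "PRON"))  # (3, 4, 5)
```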

The 10 most frequent PRON lemmas: se, ten, který, on, já, všechen, jenž, co, kdo, což

The 10 most frequent PRON types: se, to, si, které, který, která, co, tím, kteří, tom

The 10 most frequent ambiguous lemmas: ten (PRON 11968, DET 1312), který (PRON 10604, DET 143), on (PRON 7262, ADP 9, PART 1), (PRON 3373, NOUN 1), jenž (PRON 2211, DET 648), co (PRON 1859, ADV 239, SCONJ 210, PART 21), což (PRON 748, INTJ 3, PART 1), něco (PRON 541, DET 1), jaký (DET 391, PRON 337), někdo (PRON 321, DET 3)

The 10 most frequent ambiguous types: se (PRON 21370, ADP 1901), to (PRON 5916, DET 101, PART 30, ADP 5), si (PRON 3737, AUX 1, VERB 1), které (PRON 3205, DET 43), který (PRON 2886, DET 20), která (PRON 1993, DET 11), co (PRON 1187, ADV 233, SCONJ 207, PART 7), tím (PRON 1059, DET 55), kteří (PRON 1156, DET 6), tom (PRON 1023, DET 66, PROPN 5)

Morphology

The form / lemma ratio of PRON is 3.842593 (the average of all parts of speech is 2.195950).
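The reported ratio appears to be the number of PRON types divided by the number of PRON lemmas. A quick check with the counts given in the three statistics sections reproduces the published values; the variable names are illustrative only.

```python
# (PRON types, PRON lemmas) as reported in the three statistics sections.
counts = {
    "UD_Czech": (415, 108),
    "UD_Czech-CAC": (310, 74),
    "UD_Czech-CLTT": (72, 15),
}

# Form / lemma ratio, rounded to the six decimals used in the report.
ratios = {name: round(n_types / n_lemmas, 6)
          for name, (n_types, n_lemmas) in counts.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.6f}")
```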

The 1st highest number of forms (28) was observed with the lemma “on”: ho, je, jeho, jej, jemu, ji, jich, jim, jimi, jí, jím, mu, ni, nich, nim, nimi, ní, ním, ně, něho, něj, něm, němu, on, ona, oni, ono, ony.

The 2nd highest number of forms (25) was observed with the lemma “jenž”: jehož, jejichž, jejímž, jejíž, jejž, jemuž, jenž, jež, jichž, jimiž, jimž, již, jímž, jíž, nichž, nimiž, nimž, niž, nímž, níž, něhož, nějž, němuž, němž, něž.

The 3rd highest number of forms (19) was observed with the lemma “můj”: moje, mou, má, mého, mému, mých, mým, můj, naše, našeho, našem, našemu, naši, našich, našim, našimi, naší, naším, náš.

PRON occurs with 18 features: cs-feat/PronType (72419; 100% instances), cs-feat/Case (72280; 100% instances), cs-feat/Number (40776; 56% instances), cs-feat/Gender (33608; 46% instances), cs-feat/Variant (27181; 38% instances), cs-feat/Reflex (25901; 36% instances), cs-feat/Person (11269; 16% instances), cs-feat/Animacy (7234; 10% instances), cs-feat/PrepCase (4926; 7% instances), cs-feat/Negative (1101; 2% instances), cs-feat/Style (316; 0% instances), cs-feat/NumType (295; 0% instances), cs-feat/Poss (254; 0% instances), cs-feat/Number[psor] (137; 0% instances), cs-feat/Foreign (79; 0% instances), cs-feat/Gender[psor] (40; 0% instances), cs-feat/NameType (15; 0% instances), cs-feat/Abbr (6; 0% instances)

PRON occurs with 46 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Foreign, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, Gender[psor]=Fem, Gender[psor]=Masc,Neut, NameType=Com, NameType=Oth, NameType=Pro, Negative=Neg, NumType=Card, NumType=Ord, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person=1, Person=2, Person=3, Poss=Yes, PrepCase=Npr, PrepCase=Pre, PronType=Dem, PronType=Dem,Ind, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, PronType=Tot, Reflex=Yes, Style=Arch, Style=Coll, Variant=Short

PRON occurs with 414 feature combinations. The most frequent feature combination is Case=Acc|PronType=Prs|Reflex=Yes|Variant=Short (21419 tokens). Examples: se, sa
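A feature combination such as the one above is a list of Feature=Value pairs separated by `|`, where a value may itself list several alternatives separated by commas (for example Gender=Masc,Neut). The following minimal sketch splits such a string into a dictionary; `parse_feats` is a hypothetical helper name, not part of any UD tool.

```python
def parse_feats(feats):
    """Split a 'Name=Value|Name=Value' string into {name: [values]}.

    An underscore stands for "no features" and yields an empty dict.
    """
    if feats == "_":
        return {}
    result = {}
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        # Multi-valued features list their values separated by commas.
        result[name] = value.split(",")
    return result


most_frequent = parse_feats("Case=Acc|PronType=Prs|Reflex=Yes|Variant=Short")
print(most_frequent)
print(parse_feats("Gender=Masc,Neut|PronType=Rel")["Gender"])
```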

Relations

PRON nodes are attached to their parents using 25 different relations: cs-dep/expl (17180; 24% instances), cs-dep/nsubj (15925; 22% instances), cs-dep/dobj (14338; 20% instances), cs-dep/nmod (9360; 13% instances), cs-dep/auxpass:reflex (4906; 7% instances), cs-dep/advmod (2807; 4% instances), cs-dep/iobj (2658; 4% instances), cs-dep/nsubjpass (1019; 1% instances), cs-dep/xcomp (841; 1% instances), cs-dep/conj (839; 1% instances), cs-dep/root (556; 1% instances), cs-dep/cc (412; 1% instances), cs-dep/dep (369; 1% instances), cs-dep/amod (341; 0% instances), cs-dep/discourse (319; 0% instances), cs-dep/appos (151; 0% instances), cs-dep/acl (126; 0% instances), cs-dep/advcl (93; 0% instances), cs-dep/ccomp (89; 0% instances), cs-dep/foreign (52; 0% instances), cs-dep/parataxis (20; 0% instances), cs-dep/mark (9; 0% instances), cs-dep/csubj (6; 0% instances), cs-dep/csubjpass (2; 0% instances), cs-dep/mwe (1; 0% instances)

Parents of PRON nodes belong to 15 different parts of speech: VERB (60614; 84% instances), NOUN (6319; 9% instances), ADJ (2807; 4% instances), PRON (857; 1% instances), ADV (589; 1% instances), ROOT (556; 1% instances), NUM (341; 0% instances), PROPN (198; 0% instances), DET (67; 0% instances), CONJ (23; 0% instances), PART (22; 0% instances), SYM (17; 0% instances), INTJ (4; 0% instances), ADP (3; 0% instances), SCONJ (2; 0% instances)
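The relation and parent statistics can be thought of as simple tallies over the dependency tree. The sketch below counts, for every PRON node in one sentence, the relation to its parent and the parent's part of speech. The sentence annotation (heads and relation labels) is an assumed analysis written for illustration, and `parent_stats` is a hypothetical helper.

```python
from collections import Counter

# (id, form, upos, head, deprel); head 0 is the artificial root.
# This analysis of "Tohle jsem viděl včera." is assumed, not quoted.
sentence = [
    (1, "Tohle", "PRON", 3, "dobj"),
    (2, "jsem", "AUX", 3, "aux"),
    (3, "viděl", "VERB", 0, "root"),
    (4, "včera", "ADV", 3, "advmod"),
    (5, ".", "PUNCT", 3, "punct"),
]


def parent_stats(sent, upos):
    """Tally incoming relations and parent tags for nodes tagged `upos`."""
    pos_of = {tok_id: tag for tok_id, _, tag, _, _ in sent}
    pos_of[0] = "ROOT"  # the statistics list ROOT as a parent category
    relations, parents = Counter(), Counter()
    for _, _, tag, head, deprel in sent:
        if tag == upos:
            relations[deprel] += 1
            parents[pos_of[head]] += 1
    return relations, parents


relations, parents = parent_stats(sentence, "PRON")
print(relations, parents)
```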

56496 (78%) PRON nodes are leaves.

11792 (16%) PRON nodes have one child.

2715 (4%) PRON nodes have two children.

1416 (2%) PRON nodes have three or more children.

The highest child degree of a PRON node is 24.
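The four child-count buckets above partition all PRON tokens in UD_Czech, so their counts sum to the token total and the percentages follow directly. A short check, with illustrative variable names:

```python
# Child-count buckets for PRON nodes in UD_Czech, as reported above.
buckets = {
    "leaves": 56496,
    "one child": 11792,
    "two children": 2715,
    "three or more children": 1416,
}

total = sum(buckets.values())  # should equal the number of PRON tokens
shares = {name: round(100 * count / total) for name, count in buckets.items()}

print(total)
print(shares)
```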

Children of PRON nodes are attached using 32 different relations: cs-dep/case (10593; 46% instances), cs-dep/acl (3251; 14% instances), cs-dep/punct (1657; 7% instances), cs-dep/advmod:emph (983; 4% instances), cs-dep/amod (829; 4% instances), cs-dep/cc (817; 4% instances), cs-dep/nmod (772; 3% instances), cs-dep/cop (735; 3% instances), cs-dep/conj (633; 3% instances), cs-dep/nsubj (618; 3% instances), cs-dep/xcomp (572; 2% instances), cs-dep/advmod (288; 1% instances), cs-dep/dep (268; 1% instances), cs-dep/mark (228; 1% instances), cs-dep/appos (223; 1% instances), cs-dep/nummod:gov (132; 1% instances), cs-dep/advcl (86; 0% instances), cs-dep/det (75; 0% instances), cs-dep/foreign (39; 0% instances), cs-dep/neg (37; 0% instances), cs-dep/nummod (29; 0% instances), cs-dep/ccomp (27; 0% instances), cs-dep/discourse (23; 0% instances), cs-dep/csubj (22; 0% instances), cs-dep/aux (19; 0% instances), cs-dep/det:numgov (18; 0% instances), cs-dep/mwe (10; 0% instances), cs-dep/parataxis (9; 0% instances), cs-dep/dobj (7; 0% instances), cs-dep/det:nummod (2; 0% instances), cs-dep/auxpass:reflex (1; 0% instances), cs-dep/vocative (1; 0% instances)

Children of PRON nodes belong to 16 different parts of speech: ADP (10551; 46% instances), VERB (3984; 17% instances), NOUN (1719; 7% instances), PUNCT (1657; 7% instances), ADJ (1199; 5% instances), CONJ (1137; 5% instances), ADV (930; 4% instances), PRON (857; 4% instances), PART (253; 1% instances), SCONJ (225; 1% instances), NUM (195; 1% instances), PROPN (178; 1% instances), DET (97; 0% instances), AUX (19; 0% instances), SYM (2; 0% instances), INTJ (1; 0% instances)


Treebank Statistics (UD_Czech-CAC)

There are 74 PRON lemmas (0%), 310 PRON types (0%) and 24389 PRON tokens (5%). Out of 16 observed tags, the rank of PRON is: 6 in number of lemmas, 6 in number of types and 7 in number of tokens.

The 10 most frequent PRON lemmas: se, ten, který, on, všechno, já, jenž, co, sám, což

The 10 most frequent PRON types: se, to, které, si, který, která, tím, co, všech, všechny

The 10 most frequent ambiguous lemmas: se (PRON 9039, ADP 1), ten (PRON 3818, DET 537), který (PRON 3557, DET 58), (PRON 995, NOUN 1), jenž (PRON 840, DET 339), co (PRON 525, ADV 166, SCONJ 16, PART 3, ADJ 1), nízko (PRON 138, ADV 8), nic (PRON 116, DET 1), jaký (DET 150, PRON 113), některý (DET 505, PRON 98)

The 10 most frequent ambiguous types: se (PRON 7715, ADP 601), to (PRON 1856, DET 39, PART 11), které (PRON 1407, DET 27), si (PRON 996, AUX 1), který (PRON 728, DET 6), která (PRON 638, DET 5), tím (PRON 404, DET 16), co (PRON 382, ADV 160, SCONJ 15, PART 1, ADJ 1), je (VERB 4673, AUX 482, PRON 356), tomu (PRON 305, DET 13)

Morphology

The form / lemma ratio of PRON is 4.189189 (the average of all parts of speech is 2.206260).

The 1st highest number of forms (28) was observed with the lemma “on”: ho, je, jeho, jej, jemu, ji, jich, jim, jimi, jí, jím, mu, ni, nich, nim, nimi, ní, ním, ně, něho, něj, něm, němu, on, ona, oni, ono, ony.

The 2nd highest number of forms (24) was observed with the lemma “jenž”: jehož, jejichž, jejíž, jejž, jemuž, jenž, jež, jichž, jimiž, jimž, již, jímž, jíž, nichž, nimiž, nimž, niž, nímž, níž, něhož, nějž, němuž, němž, něž.

The 3rd highest number of forms (16) was observed with the lemma “ten”: ta, ten, ti, to, toho, tom, tomu, tou, tu, ty, té, tím, těch, těm, těma, těmi.

PRON occurs with 17 features: cs-feat/PronType (24389; 100% instances), cs-feat/Case (24253; 99% instances), cs-feat/Number (14022; 57% instances), cs-feat/Gender (10869; 45% instances), cs-feat/Variant (9195; 38% instances), cs-feat/Reflex (9066; 37% instances), cs-feat/Person (3500; 14% instances), cs-feat/Animacy (2039; 8% instances), cs-feat/PrepCase (1809; 7% instances), cs-feat/Negative (184; 1% instances), cs-feat/Style (111; 0% instances), cs-feat/NumType (82; 0% instances), cs-feat/Poss (72; 0% instances), cs-feat/Number[psor] (45; 0% instances), cs-feat/Gender[psor] (15; 0% instances), cs-feat/Foreign (13; 0% instances), cs-feat/NameType (1; 0% instances)

PRON occurs with 42 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Foreign, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, Gender[psor]=Fem, Gender[psor]=Masc,Neut, NameType=Com, Negative=Neg, NumType=Card, Number=Dual, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person=1, Person=2, Person=3, Poss=Yes, PrepCase=Npr, PrepCase=Pre, PronType=Dem, PronType=Dem,Ind, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, PronType=Tot, Reflex=Yes, Style=Arch, Variant=Short

PRON occurs with 312 feature combinations. The most frequent feature combination is Case=Acc|PronType=Prs|Reflex=Yes|Variant=Short (7709 tokens). Examples: se

Relations

PRON nodes are attached to their parents using 27 different relations: cs-dep/expl (6066; 25% instances), cs-dep/nsubj (4715; 19% instances), cs-dep/dobj (3908; 16% instances), cs-dep/nmod (3762; 15% instances), cs-dep/auxpass:reflex (2090; 9% instances), cs-dep/advmod (1130; 5% instances), cs-dep/iobj (603; 2% instances), cs-dep/xcomp (349; 1% instances), cs-dep/nsubjpass (330; 1% instances), cs-dep/conj (322; 1% instances), cs-dep/root (305; 1% instances), cs-dep/cc (297; 1% instances), cs-dep/dep (132; 1% instances), cs-dep/amod (105; 0% instances), cs-dep/discourse (83; 0% instances), cs-dep/appos (66; 0% instances), cs-dep/acl (39; 0% instances), cs-dep/advcl (35; 0% instances), cs-dep/ccomp (16; 0% instances), cs-dep/parataxis (14; 0% instances), cs-dep/foreign (8; 0% instances), cs-dep/mark (5; 0% instances), cs-dep/csubj (3; 0% instances), cs-dep/csubjpass (3; 0% instances), cs-dep/advmod:emph (1; 0% instances), cs-dep/aux (1; 0% instances), cs-dep/vocative (1; 0% instances)

Parents of PRON nodes belong to 15 different parts of speech: VERB (19560; 80% instances), NOUN (2598; 11% instances), ADJ (1140; 5% instances), ROOT (305; 1% instances), PRON (293; 1% instances), ADV (214; 1% instances), SYM (96; 0% instances), NUM (95; 0% instances), DET (39; 0% instances), PROPN (25; 0% instances), CONJ (10; 0% instances), PART (7; 0% instances), SCONJ (3; 0% instances), ADP (2; 0% instances), AUX (2; 0% instances)

18825 (77%) PRON nodes are leaves.

4146 (17%) PRON nodes have one child.

836 (3%) PRON nodes have two children.

582 (2%) PRON nodes have three or more children.

The highest child degree of a PRON node is 13.

Children of PRON nodes are attached using 28 different relations: cs-dep/case (3647; 44% instances), cs-dep/acl (1017; 12% instances), cs-dep/punct (625; 8% instances), cs-dep/cc (452; 5% instances), cs-dep/cop (394; 5% instances), cs-dep/nsubj (350; 4% instances), cs-dep/advmod:emph (297; 4% instances), cs-dep/nmod (286; 3% instances), cs-dep/conj (260; 3% instances), cs-dep/xcomp (214; 3% instances), cs-dep/amod (180; 2% instances), cs-dep/advmod (145; 2% instances), cs-dep/mark (78; 1% instances), cs-dep/dep (68; 1% instances), cs-dep/appos (52; 1% instances), cs-dep/advcl (45; 1% instances), cs-dep/nummod:gov (31; 0% instances), cs-dep/det (19; 0% instances), cs-dep/neg (10; 0% instances), cs-dep/parataxis (10; 0% instances), cs-dep/det:numgov (9; 0% instances), cs-dep/aux (7; 0% instances), cs-dep/nummod (7; 0% instances), cs-dep/csubj (5; 0% instances), cs-dep/discourse (5; 0% instances), cs-dep/dobj (4; 0% instances), cs-dep/foreign (2; 0% instances), cs-dep/vocative (1; 0% instances)

Children of PRON nodes belong to 15 different parts of speech: ADP (3622; 44% instances), VERB (1419; 17% instances), NOUN (781; 10% instances), PUNCT (625; 8% instances), CONJ (508; 6% instances), ADV (347; 4% instances), ADJ (300; 4% instances), PRON (293; 4% instances), PART (98; 1% instances), SCONJ (86; 1% instances), NUM (47; 1% instances), PROPN (36; 0% instances), DET (30; 0% instances), SYM (20; 0% instances), AUX (8; 0% instances)


Treebank Statistics (UD_Czech-CLTT)

There are 15 PRON lemmas (1%), 72 PRON types (2%) and 1212 PRON tokens (3%). Out of 15 observed tags, the rank of PRON is: 10 in number of lemmas, 7 in number of types and 8 in number of tokens.

The 10 most frequent PRON lemmas: se, který, ten, jenž, on, všechen, veškerý, tento, sám, žádný

The 10 most frequent PRON types: se, které, která, který, to, kterých, kterým, kterém, kterému, nichž

The 10 most frequent ambiguous lemmas: který (PRON 449, DET 1), jenž (PRON 73, DET 21), tento (DET 315, PRON 5), žádný (PRON 4, DET 2), jaký (PRON 3, DET 2), několik (DET 1, PRON 1), některý (DET 6, PRON 1), takový (DET 21, PRON 1)

The 10 most frequent ambiguous types: se (PRON 467, ADP 34), kterým (PRON 25, DET 1), je (VERB 185, AUX 14, PRON 11), jehož (DET 6, PRON 5), tímto (DET 9, PRON 3), několika (PRON 1, DET 1), t (PRON 1, NOUN 1), takové (DET 5, PRON 1), tato (DET 24, PRON 1), žádná (DET 1, PRON 1)

Morphology

The form / lemma ratio of PRON is 4.800000 (the average of all parts of speech is 1.764161).

The 1st highest number of forms (15) was observed with the lemma “on”: ho, je, jej, jemu, ji, jim, jimi, jí, nich, nim, nimi, ní, ním, ně, něj.

The 2nd highest number of forms (14) was observed with the lemma “jenž”: jehož, jenž, jež, jimiž, jímž, nichž, nimž, niž, nímž, níž, nějž, němuž, němž, něž.

The 3rd highest number of forms (10) was observed with the lemma “který”: kterou, která, které, kterého, kterém, kterému, který, kterých, kterým, kterými.

PRON occurs with 13 features: cs-feat/PronType (1212; 100% instances), cs-feat/Case (1211; 100% instances), cs-feat/Number (735; 61% instances), cs-feat/Gender (610; 50% instances), cs-feat/Reflex (475; 39% instances), cs-feat/Variant (469; 39% instances), cs-feat/PrepCase (93; 8% instances), cs-feat/Person (71; 6% instances), cs-feat/Animacy (57; 5% instances), cs-feat/Style (12; 1% instances), cs-feat/Negative (4; 0% instances), cs-feat/Abbr (1; 0% instances), cs-feat/NumType (1; 0% instances)

PRON occurs with 31 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Masc, Gender=Masc,Neut, Gender=Neut, Negative=Neg, NumType=Card, Number=Plur, Number=Sing, Person=3, PrepCase=Npr, PrepCase=Pre, PronType=Dem, PronType=Dem,Ind, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, PronType=Tot, Reflex=Yes, Style=Arch, Variant=Short

PRON occurs with 92 feature combinations. The most frequent feature combination is Case=Acc|PronType=Prs|Reflex=Yes|Variant=Short (467 tokens). Examples: se

Relations

PRON nodes are attached to their parents using 14 different relations: cs-dep/auxpass:reflex (353; 29% instances), cs-dep/nsubj (267; 22% instances), cs-dep/nmod (132; 11% instances), cs-dep/dobj (127; 10% instances), cs-dep/advmod (122; 10% instances), cs-dep/expl (113; 9% instances), cs-dep/cc (35; 3% instances), cs-dep/nsubjpass (32; 3% instances), cs-dep/conj (10; 1% instances), cs-dep/iobj (8; 1% instances), cs-dep/acl (4; 0% instances), cs-dep/amod (4; 0% instances), cs-dep/xcomp (4; 0% instances), cs-dep/dep (1; 0% instances)

Parents of PRON nodes belong to 7 different parts of speech: VERB (952; 79% instances), NOUN (166; 14% instances), ADJ (71; 6% instances), X (10; 1% instances), ADV (6; 0% instances), PRON (6; 0% instances), NUM (1; 0% instances)

950 (78%) PRON nodes are leaves.

235 (19%) PRON nodes have one child.

18 (1%) PRON nodes have two children.

9 (1%) PRON nodes have three or more children.

The highest child degree of a PRON node is 5.

Children of PRON nodes are attached using 12 different relations: cs-dep/case (200; 66% instances), cs-dep/cc (41; 14% instances), cs-dep/acl (18; 6% instances), cs-dep/nmod (9; 3% instances), cs-dep/punct (8; 3% instances), cs-dep/xcomp (8; 3% instances), cs-dep/conj (7; 2% instances), cs-dep/cop (4; 1% instances), cs-dep/nsubj (4; 1% instances), cs-dep/advmod (1; 0% instances), cs-dep/advmod:emph (1; 0% instances), cs-dep/ccomp (1; 0% instances)

Children of PRON nodes belong to 8 different parts of speech: ADP (200; 66% instances), CONJ (40; 13% instances), NOUN (23; 8% instances), VERB (22; 7% instances), PUNCT (8; 3% instances), PRON (6; 2% instances), ADV (2; 1% instances), ADJ (1; 0% instances)

