DET

`DET`: determiner

Definition

Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context. That is, a determiner may indicate whether the noun is referring to a definite or indefinite element of a class, to a closer or more distant element, to an element belonging to a specified person or thing, to a particular number or quantity, etc.

In Bulgarian, the definite article is part of the word, so it is not treated as a determiner.

However, the following pronoun classes (given with their BulTreeBank tags) are mapped to determiners:

demonstratives: Pda#, Pde#
relatives: Pra#, Pre#, Prp#
collectives: Pca#, Pce#
interrogatives: Pia#, Pie#, Piy#, Pip#
indefinites: Pfa#, Pfe#, Pfp#
negatives: Pna#, Pne#, Pnp#
possessives: Ps@l

Note that attributive uses (#a#) and possessive attributive uses (#p#) map directly to the DET category, while entity uses (#e#) can be either determiners or pronouns. Possessive pronouns (Ps#) are mapped to DET only in their long forms (#l#). The short forms are clitics and are treated differently.

Examples

possessive determiners: мой / moy “my”, твой / tvoy “your”
demonstrative determiners: тази / tazi “this”, as in Вчера видях тази кола / Vchera vidyah tazi kola “I saw this car yesterday.”
interrogative determiners: какъв / kakav “which.MASC.SG”
relative determiners: какъвто / kakavto “which.MASC.SG”
indefinite determiners: някакъв / nyakakav “some.MASC.SG”
collective determiners: всякакъв / vsyakakav “any.MASC.SG”
negative determiners: никакъв / nikakav “no.MASC.SG”

Note that the symbol `#`, used in the tags listed in this Universal POS section, is a placeholder for an arbitrary number of features. These features are suppressed in the respective tag because they are irrelevant when the BulTreeBank tagset is mapped to the Universal one.

The symbol `@` marks the suppression of exactly one feature in the tag.
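The `#` and `@` placeholders can be read as simple wildcards over BulTreeBank tag strings. The following is a minimal sketch, not the conversion actually used for the treebank. It assumes that each feature occupies exactly one character, that a pattern is matched as a prefix of the full tag (so `Ps@l` also covers longer tags), and that the sample tags in the usage lines are illustrative strings rather than verified treebank tags. The function names are hypothetical. Because entity tags (#e#) may end up as either DET or PRON, the check only identifies candidates for DET.

```python
import re

# Tag patterns copied from the Definition section of this page.
DET_PATTERNS = [
    "Pda#", "Pde#",                  # demonstratives
    "Pra#", "Pre#", "Prp#",          # relatives
    "Pca#", "Pce#",                  # collectives
    "Pia#", "Pie#", "Piy#", "Pip#",  # interrogatives
    "Pfa#", "Pfe#", "Pfp#",          # indefinites
    "Pna#", "Pne#", "Pnp#",          # negatives
    "Ps@l",                          # possessives, long forms only
]


def pattern_to_regex(pattern):
    """Translate '#' (any number of features) and '@' (one feature) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "#":
            parts.append(".*")
        elif ch == "@":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


DET_REGEXES = [pattern_to_regex(p) for p in DET_PATTERNS]


def may_map_to_det(tag):
    """Return True if the tag matches one of the DET patterns as a prefix (assumption)."""
    return any(rx.match(tag) for rx in DET_REGEXES)


# Illustrative tag strings (assumed, not checked against the treebank).
print(may_map_to_det("Pde-os-m"))  # demonstrative entity: candidate for DET
print(may_map_to_det("Psol-s1m"))  # possessive, long form: candidate for DET
print(may_map_to_det("Psot--1"))   # possessive, short form: not a candidate
```

A tag that passes this check is not necessarily DET: for the entity tags the final choice between DET and PRON depends on how the word is used in the sentence.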

Treebank Statistics (UD_Bulgarian)

There are 23 DET lemmas (0%), 141 DET types (1%) and 2433 DET tokens (2%). Out of 16 observed tags, the rank of DET is: 11 in number of lemmas, 7 in number of types and 11 in number of tokens.
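Counts of this kind can be recomputed from a treebank file in CoNLL-U format, where the tab-separated columns 2, 3 and 4 hold the word form, the lemma and the universal POS tag. The sketch below is one way to do it and is not the script that produced the numbers on this page. It assumes that types are distinct word forms compared case-sensitively; the helper names and the one-sentence sample (built from the demonstrative example earlier on this page, with hand-assigned lemmas and tags) are illustrative only.

```python
def read_conllu(text):
    """Yield (form, lemma, upos) triples from CoNLL-U text.

    Comment lines, blank lines, multiword-token ranges (IDs like 1-2)
    and empty nodes (IDs like 1.1) are skipped.
    """
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols[1], cols[2], cols[3]


def count_upos(rows, upos="DET"):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for form, lemma, tag in rows:
        if tag == upos:
            lemmas.add(lemma)
            types.add(form)
            tokens += 1
    return len(lemmas), len(types), tokens


# Hand-made sample; the remaining six CoNLL-U columns are left as "_".
SAMPLE = "\n".join(
    "\t".join([tok_id, form, lemma, upos] + ["_"] * 6)
    for tok_id, form, lemma, upos in [
        ("1", "Вчера", "вчера", "ADV"),
        ("2", "видях", "видя", "VERB"),
        ("3", "тази", "този", "DET"),
        ("4", "кола", "кола", "NOUN"),
    ]
)

print(count_upos(read_conllu(SAMPLE)))  # (1, 1, 1)
```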

The 10 most frequent DET lemmas: този, всеки, един, какъв, наш, мой, свой, такъв, някой, никакъв

The 10 most frequent DET types: тази, този, тези, това, всички, един, какво, една, всеки, всяка

The 10 most frequent ambiguous lemmas: този (DET 793, PRON 540), всеки (DET 292, PRON 112), един (DET 250, NUM 225, PRON 6), наш (DET 184, PRON 164), мой (PRON 378, DET 162), свой (PRON 623, DET 123), някой (PRON 92, DET 90), ваш (DET 35, PRON 33), какъвто (DET 22, PRON 10), кой (PRON 96, DET 20, PROPN 1)

The 10 most frequent ambiguous types: това (PRON 288, DET 131), всички (DET 129, PRON 30), един (DET 88, NUM 59, PRON 1), една (DET 80, NUM 53), всеки (DET 47, PRON 6), едно (DET 42, NUM 40, PRON 2), някои (DET 36, PRON 6), някой (PRON 17, DET 12), каквото (PRON 9, DET 8), кой (PRON 31, DET 8)

това
- PRON 288: Чух го да казва това .
- DET 131: Следствието от това , че е заминал , е очевидно .
всички
- DET 129: - Аз съм плебей по рождение и всички дрипльовци са мои братя !
- PRON 30: Това , че всички сме прави , поставя въпроса как можем да живеем заедно .
един
- DET 88: Баща ми е само един книжен плъх .
- NUM 59: Но този проблем е само един етап .
- PRON 1: Не искам да давам никакви прогнози , колко медали ще спечелим , защото в Атланта сгреших с един
една
- DET 80: Почакаха , докато настрана , на една могила , направиха чадъра на Индже .
- NUM 53: След една седмица между рая и ада реши , че няма право да се предава .
всеки
- DET 47: Престанете всеки път да правите скандали .
- PRON 6: Освен това всеки говори в името на едно съгласие .
едно
- DET 42: Да бях по-млад , щях да отида в гората да ти уловя едно славейче .
- NUM 40: Другите актове на Народното събрание се приемат с едно гласуване .
- PRON 2: Но се оказало , че ректорският съвет си има едно наум .
някои
- DET 36: Представени са също така някои икономически оценки и прогнози .
- PRON 6: Освен това някои са втора употреба .
някой
- PRON 17: Тук дали няма някой да ни подслушва ?
- DET 12: Не може да не сме се разминали на някой светофар .
каквото
- PRON 9: Вземи , каквото искаш .
- DET 8: Няма да пропадна в света каквото и да ми се случи .
кой
- PRON 31: Ще ти кажа кой да остане .
- DET 8: За кой българин Балканът не е живо същество ?

Morphology

The form / lemma ratio of DET is 6.130435 (the average of all parts of speech is 1.728233).
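The reported ratio is consistent with dividing the number of DET types by the number of DET lemmas given under Treebank Statistics. A short check, assuming that this is how the figure was derived:

```python
det_types = 141   # distinct DET word forms reported on this page
det_lemmas = 23   # distinct DET lemmas reported on this page

ratio = det_types / det_lemmas
print(f"{ratio:.6f}")  # 6.130435
```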

The highest number of forms (27) was observed with the lemma “мой”: Моят, мое, моето, мои, моите, мой, моя, моята, негов, негова, неговата, негови, неговите, неговия, неговият, негово, неговото, неин, нейна, нейната, нейни, нейните, нейния, нейният, нейното, твое, твоите.

The 2nd highest number of forms (18) was observed with the lemma “наш”: наш, наша, нашата, наше, нашето, наши, нашите, нашия, нашият, техен, техни, техните, техния, техният, тяхна, тяхната, тяхно, тяхното.

The 3rd highest number of forms (15) was observed with the lemma “този”: онази, онези, онзи, ония, онова, оня, тeзи, тази, тая, тези, тия, това, този, тоя, туй.

DET occurs with 9 features: bg-feat/Number (2393; 98% instances), bg-feat/PronType (2392; 98% instances), bg-feat/Gender (1692; 70% instances), bg-feat/Definite (778; 32% instances), bg-feat/Poss (500; 21% instances), bg-feat/Person (378; 16% instances), bg-feat/Reflex (122; 5% instances), bg-feat/Case (4; 0% instances), bg-feat/Animacy (1; 0% instances)

DET occurs with 22 feature-value pairs: Animacy=Anim, Case=Acc, Case=Nom, Definite=Def, Definite=Ind, Gender=Fem, Gender=Masc, Gender=Neut, Number=Plur, Number=Sing, Person=1, Person=2, Person=3, Poss=Yes, PronType=Dem, PronType=Ind, PronType=Int, PronType=Neg, PronType=Prs, PronType=Rel, PronType=Tot, Reflex=Yes

DET occurs with 70 feature combinations. The most frequent feature combination is Gender=Fem|Number=Sing|PronType=Dem (237 tokens). Examples: тази, такава, тая, онази, тeзи
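A feature combination such as the one above is written in the CoNLL-U FEATS notation: `Name=Value` pairs joined by `|`, with `_` standing for an empty feature set. A minimal sketch of reading such a string into a dictionary (the function name is hypothetical):

```python
def parse_feats(feats):
    """Split a FEATS string like 'Gender=Fem|Number=Sing' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


combo = parse_feats("Gender=Fem|Number=Sing|PronType=Dem")
print(combo["PronType"])  # Dem
```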

Relations

DET nodes are attached to their parents using 11 different relations: bg-dep/det (2017; 83% instances), bg-dep/dobj (184; 8% instances), bg-dep/nsubj (110; 5% instances), bg-dep/nmod (45; 2% instances), bg-dep/iobj (37; 2% instances), bg-dep/root (13; 1% instances), bg-dep/nsubjpass (12; 0% instances), bg-dep/conj (11; 0% instances), bg-dep/ccomp (2; 0% instances), bg-dep/discourse (1; 0% instances), bg-dep/vocative (1; 0% instances)

Parents of DET nodes belong to 9 different parts of speech: NOUN (2043; 84% instances), VERB (346; 14% instances), PROPN (16; 1% instances), ROOT (13; 1% instances), ADJ (6; 0% instances), PRON (4; 0% instances), ADV (3; 0% instances), DET (1; 0% instances), NUM (1; 0% instances)

2135 (88%) DET nodes are leaves.

189 (8%) DET nodes have one child.

85 (3%) DET nodes have two children.

24 (1%) DET nodes have three or more children.

The highest child degree of a DET node is 6.
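The child degree of a node is the number of tokens whose HEAD points to it. The sketch below shows how such a distribution could be computed for one sentence and then checks that the percentages reported above follow from the counts. The function name and the toy tree (a hand-made analysis of the example sentence Вчера видях тази кола) are assumptions for illustration, not treebank data.

```python
from collections import Counter


def det_child_degrees(sentence):
    """Map child degree -> number of DET nodes with that many children.

    sentence: list of (token_id, upos, head_id) tuples for one sentence.
    """
    children = Counter(head for _tok_id, _upos, head in sentence)
    return Counter(
        children[tok_id] for tok_id, upos, _head in sentence if upos == "DET"
    )


# Toy tree: the DET token 3 depends on the NOUN token 4 and has no children.
toy = [(1, "ADV", 2), (2, "VERB", 0), (3, "DET", 4), (4, "NOUN", 2)]
print(det_child_degrees(toy))  # Counter({0: 1})

# Counts reported on this page, keyed by child degree.
reported = {"0": 2135, "1": 189, "2": 85, "3+": 24}
total = sum(reported.values())  # 2433, the number of DET tokens
shares = {k: round(100 * v / total) for k, v in reported.items()}
print(shares)  # {'0': 88, '1': 8, '2': 3, '3+': 1}
```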

Children of DET nodes are attached using 16 different relations: bg-dep/acl (118; 27% instances), bg-dep/case (93; 21% instances), bg-dep/nmod (88; 20% instances), bg-dep/advmod (38; 9% instances), bg-dep/punct (24; 5% instances), bg-dep/mwe (17; 4% instances), bg-dep/cc (15; 3% instances), bg-dep/discourse (11; 2% instances), bg-dep/conj (9; 2% instances), bg-dep/neg (8; 2% instances), bg-dep/nsubj (7; 2% instances), bg-dep/cop (6; 1% instances), bg-dep/det (6; 1% instances), bg-dep/amod (1; 0% instances), bg-dep/expl (1; 0% instances), bg-dep/iobj (1; 0% instances)

Children of DET nodes belong to 13 different parts of speech: VERB (122; 28% instances), ADP (93; 21% instances), NOUN (77; 17% instances), ADV (40; 9% instances), PUNCT (24; 5% instances), PART (23; 5% instances), CONJ (21; 5% instances), ADJ (19; 4% instances), PRON (17; 4% instances), PROPN (3; 1% instances), INTJ (2; 0% instances), DET (1; 0% instances), NUM (1; 0% instances)