NumType
: numeral type
In the Slovenian UD Treebank, NumType is a lexical feature of numerals and some adjectives that denote counting by numbers.
Card
: cardinal number
Examples
- en, dva, tri “one, two, three”
- 1, 2, 3
- I, II, III
Ord
: ordinal number
Examples
- prvi, drugi, tretji “first, second, third”
- 1., 2., 3.
- I., II., III.
Sets
: number of sets of things
Numerals used to count sets of things or nouns that are pluralia tantum.
Examples
- enoj, dvoj, troj “one-fold, two-fold, three-fold”
Gen
: generic numeral, i.e. a numeral that is none of the above
Examples
- enojen, dvojen, trojen “single, double, triple”
Conversion from JOS
All numerals with Type=cardinal are converted to NumType=Card and all numerals with Type=ordinal are converted to NumType=Ord. Numerals with Type=pronominal are either converted to NumType=Card (lemmas en and eden) or to NumType=Ord (lemma drug). Numerals with Type=special are either converted to NumType=Sets (lemmas not ending in -en) or to NumType=Gen (lemmas ending in -en).
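
The conversion rules above can be sketched as a small lookup function. This is only an illustration, not the actual conversion script: the function name `jos_to_numtype` is hypothetical, and it assumes the JOS Type value is available as a plain string (`cardinal`, `ordinal`, `pronominal`, `special`), whereas the real tooling may encode JOS attributes differently.

```python
def jos_to_numtype(jos_type: str, lemma: str) -> str:
    """Map a JOS numeral Type value and lemma to a UD NumType value.

    Hypothetical helper that mirrors the rules described in the text.
    """
    if jos_type == "cardinal":
        return "Card"
    if jos_type == "ordinal":
        return "Ord"
    if jos_type == "pronominal":
        # Only three lemmas are mentioned in the text for this type.
        if lemma in ("en", "eden"):
            return "Card"
        if lemma == "drug":
            return "Ord"
        raise ValueError(f"no rule for pronominal numeral lemma {lemma!r}")
    if jos_type == "special":
        # Lemmas ending in -en are generic; all others count sets.
        return "Gen" if lemma.endswith("en") else "Sets"
    raise ValueError(f"unknown JOS numeral type {jos_type!r}")


if __name__ == "__main__":
    for jos_type, lemma in [("cardinal", "tri"), ("pronominal", "drug"),
                            ("special", "dvoj"), ("special", "dvojen")]:
        print(jos_type, lemma, "->", jos_to_numtype(jos_type, lemma))
```

Raising an error for pronominal lemmas outside the three listed ones is a design choice of this sketch; the text does not say how other lemmas, if any, are handled.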
Note that other types of quantifying words have not been explicitly marked in JOS, so assigning these and other NumType values to other words or part-of-speech categories, such as adjectives (enkraten, dvakraten, trikraten), adverbs (enkrat, dvakrat, trikrat; prvič, drugič, tretjič), determiners (veliko, malo, nekaj, koliko) and nouns (tretjina, polovica, četrtina), remains for future work.
Treebank Statistics (UD_Slovenian)
This feature is universal.
It occurs with 4 different values: Card, Gen, Ord, Sets.
2241 tokens (2%) have a non-empty value of NumType.
623 types (2%) occur at least once with a non-empty value of NumType.
509 lemmas (3%) occur at least once with a non-empty value of NumType.
The feature is used with 2 part-of-speech tags: sl-pos/NUM (1927; 1% instances), sl-pos/ADJ (314; 0% instances).
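
Counts of this kind can be reproduced from a CoNLL-U file by tallying the NumType value in the FEATS column per UPOS tag. The following is a minimal sketch with no external dependencies; the function names and the three-token `SAMPLE` are made up for illustration and are not taken from the treebank.

```python
from collections import Counter


def parse_feats(feats: str) -> dict:
    """Turn a CoNLL-U FEATS string such as 'A=x|B=y' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def count_numtype(conllu_text: str) -> Counter:
    """Count (UPOS, NumType) pairs over all regular tokens."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Keep only regular tokens: 10 columns and a plain integer ID
        # (this skips multiword ranges like 1-2 and empty nodes like 1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        value = parse_feats(cols[5]).get("NumType")
        if value is not None:
            counts[(cols[3], value)] += 1
    return counts


# Illustrative rows only; the annotations are invented for the example.
SAMPLE = "\n".join(
    "\t".join(row)
    for row in [
        ["1", "dva", "dva", "NUM", "_", "NumForm=Word|NumType=Card", "0", "root", "_", "_"],
        ["2", "tri", "trije", "NUM", "_", "NumForm=Word|NumType=Card", "1", "conj", "_", "_"],
        ["3", "prvi", "prvi", "ADJ", "_", "NumType=Ord|Number=Sing", "1", "dep", "_", "_"],
        ["4", "hiša", "hiša", "NOUN", "_", "Number=Sing", "1", "dep", "_", "_"],
    ]
)

if __name__ == "__main__":
    for (upos, value), n in sorted(count_numtype(SAMPLE).items()):
        print(upos, value, n)
```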
NUM
1927 sl-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType.
The most frequent other feature values with which NUM and NumType co-occurred: Gender=EMPTY (1441; 75%), Case=EMPTY (1187; 62%), Number=EMPTY (1187; 62%), NumForm=Digit (1166; 61%).
NUM tokens may have the following values of NumType:
- Card (1665; 86% of non-empty NumType): eno, tri, dveh, dva, ena, eden, tisoč, štiri, štirih, dve
- Ord (257; 13% of non-empty NumType): 1., 20., 18., 9., 14., 17., 19., 3., 6., 15.
- Sets (5; 0% of non-empty NumType): dvoje, tisočerih, troje
NumType seems to be a lexical feature of NUM. 100% of lemmas (485) occur with only one value of NumType.
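
The "lexical feature" observation amounts to checking that each lemma is seen with exactly one NumType value. A minimal sketch of that check is given below, assuming the (lemma, NumType) pairs have already been extracted from the treebank; the function name `lexical_share` and the sample pairs are illustrative only.

```python
from collections import defaultdict


def lexical_share(pairs):
    """Return (number of lemmas, share of lemmas with a single NumType value).

    `pairs` is an iterable of (lemma, numtype) tuples, one per token.
    """
    values_by_lemma = defaultdict(set)
    for lemma, numtype in pairs:
        values_by_lemma[lemma].add(numtype)
    if not values_by_lemma:
        return 0, 0.0
    single = sum(1 for values in values_by_lemma.values() if len(values) == 1)
    return len(values_by_lemma), single / len(values_by_lemma)


if __name__ == "__main__":
    # Invented token-level pairs, not treebank data.
    pairs = [("dva", "Card"), ("dva", "Card"), ("trije", "Card"), ("dvoj", "Sets")]
    lemmas, share = lexical_share(pairs)
    print(f"{share:.0%} of lemmas ({lemmas}) occur with only one value")
```

A share of 1.0 over all lemmas corresponds to the "100% of lemmas" statement in the statistics.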
ADJ
314 sl-pos/ADJ tokens (2% of all ADJ tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADJ and NumType co-occurred: VerbForm=EMPTY (314; 100%), Definite=EMPTY (314; 100%), Degree=EMPTY (314; 100%), Number=Sing (251; 80%).
ADJ tokens may have the following values of NumType:
- Gen (4; 1% of non-empty NumType): dvojnega, dvojnim, dvojno, trojnim
- Ord (310; 99% of non-empty NumType): prvi, prva, prvo, prve, prvem, prvih, prvega, tretji, tretje, prvim
- EMPTY (14713): drugi, mogoče, druge, sam, novo, drugih, nove, različnih, slovenski, veliko
NumType seems to be a lexical feature of ADJ. 100% of lemmas (24) occur with only one value of NumType.
Relations with Agreement in NumType
The 10 most frequent relations where parent and child node agree in NumType:
NUM –[conj]–> NUM (94; 100%),
NUM –[compound]–> NUM (31; 91%).
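
One way to compute agreement figures like these is to compare the NumType of each child with that of its parent, grouped by parent tag, relation and child tag. The sketch below assumes that only pairs where both nodes carry a non-empty NumType are counted; the statistics above do not state this definition explicitly, so treat it as an assumption. The token dictionaries and the function name `agreement_by_relation` are hypothetical.

```python
from collections import Counter


def agreement_by_relation(sentence):
    """Return {(parent_upos, deprel, child_upos): (agreeing count, agreeing share)}.

    `sentence` is a list of token dicts with the keys
    'id', 'head', 'upos', 'deprel' and 'numtype' (None when empty).
    """
    by_id = {tok["id"]: tok for tok in sentence}
    agree, total = Counter(), Counter()
    for child in sentence:
        parent = by_id.get(child["head"])
        if parent is None:
            continue  # the root has no parent token
        if child["numtype"] is None or parent["numtype"] is None:
            continue  # assumption: both nodes must have a NumType value
        key = (parent["upos"], child["deprel"], child["upos"])
        total[key] += 1
        if child["numtype"] == parent["numtype"]:
            agree[key] += 1
    return {key: (agree[key], agree[key] / total[key]) for key in total}


if __name__ == "__main__":
    # Invented analysis of "dva ali tri" ("two or three"), for illustration.
    sentence = [
        {"id": 1, "head": 0, "upos": "NUM", "deprel": "root", "numtype": "Card"},
        {"id": 2, "head": 1, "upos": "CONJ", "deprel": "cc", "numtype": None},
        {"id": 3, "head": 1, "upos": "NUM", "deprel": "conj", "numtype": "Card"},
    ]
    for (parent, rel, child), (n, share) in agreement_by_relation(sentence).items():
        print(f"{parent} –[{rel}]–> {child} ({n}; {share:.0%})")
```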
Treebank Statistics (UD_Slovenian-SST)
This feature is universal.
It occurs with 4 different values: Card, Gen, Ord, Sets.
586 tokens (2%) have a non-empty value of NumType.
121 types (2%) occur at least once with a non-empty value of NumType.
76 lemmas (2%) occur at least once with a non-empty value of NumType.
The feature is used with 2 part-of-speech tags: sl-pos/NUM (499; 2% instances), sl-pos/ADJ (87; 0% instances).
NUM
499 sl-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType.
The most frequent other feature values with which NUM and NumType co-occurred: NumForm=Word (499; 100%), Number=Plur (287; 58%).
NUM tokens may have the following values of NumType:
- Card (498; 100% of non-empty NumType): eno, dva, en, ena, tri, tisoč, dvajset, dve, pet, enega
- Sets (1; 0% of non-empty NumType): dvoje
NumType seems to be a lexical feature of NUM. 100% of lemmas (53) occur with only one value of NumType.
ADJ
87 sl-pos/ADJ tokens (5% of all ADJ tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADJ and NumType co-occurred: Degree=EMPTY (87; 100%), VerbForm=EMPTY (87; 100%), Definite=EMPTY (85; 98%), Number=Sing (82; 94%).
ADJ tokens may have the following values of NumType:
- Gen (3; 3% of non-empty NumType): dvojni, dvojno, trojni
- Ord (84; 97% of non-empty NumType): prvi, prvo, prva, tretjo, prvega, devetindvajseti, peta, tretja, tretji, trideseti
- EMPTY (1578): dobro, drugo, drugi, dober, zanimivo, druga, drugega, glavnem, lep, lepa
NumType seems to be a lexical feature of ADJ. 100% of lemmas (23) occur with only one value of NumType.
Relations with Agreement in NumType
The 10 most frequent relations where parent and child node agree in NumType:
NUM –[compound]–> NUM (48; 100%),
NUM –[conj]–> NUM (29; 100%),
ADJ –[conj]–> ADJ (5; 56%),
NUM –[reparandum]–> NUM (4; 100%),
NUM –[mwe]–> NUM (4; 100%),
ADJ –[reparandum]–> ADJ (2; 100%),
NUM –[nummod]–> NUM (1; 100%),
NUM –[advmod]–> NUM (1; 100%).
NumType in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]