
PUNCT: punctuation

Definition

Punctuation marks are non-alphabetical characters and character groups used to delimit linguistic units in printed text, including bullets in itemized lists.

Examples

Conversion from JOS

The list of characters in the ssj500k treebank has been manually divided into the subgroups PUNCT and SYM. Note that some characters display characteristics of both POS categories, such as the asterisk or dash-like characters, which can function either as mathematical operators (SYM) or as bullets in itemized lists (PUNCT). In such ambiguous cases, the more common function was chosen.
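
The "more common function" rule can be illustrated with a minimal sketch. The division in ssj500k was done manually, so this is not the procedure that was actually used; the function name `choose_upos` and the counts below are hypothetical and serve only to show the rule.

```python
def choose_upos(function_counts):
    """Map each character to the tag of its more frequent function.

    function_counts: {character: {"PUNCT": n, "SYM": m}}
    On a tie, the tag listed first in the inner dict wins.
    """
    return {ch: max(counts, key=counts.get)
            for ch, counts in function_counts.items()}


# Made-up counts for illustration only; NOT taken from ssj500k.
EXAMPLE_COUNTS = {
    "*": {"PUNCT": 7, "SYM": 3},  # mostly a bullet in this toy data
    "+": {"PUNCT": 0, "SYM": 5},  # always an operator in this toy data
}

print(choose_upos(EXAMPLE_COUNTS))
```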


Treebank Statistics (UD_Slovenian)

There are 21 PUNCT lemmas (0%), 21 PUNCT types (0%) and 18555 PUNCT tokens (13%). Out of 16 observed tags, the rank of PUNCT is: 14 in number of lemmas, 15 in number of types and 2 in number of tokens.
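
Counts of this kind can be reproduced from a treebank file in CoNLL-U format. The following is a minimal sketch, assuming the standard ten tab-separated columns (ID, FORM, LEMMA, UPOS, ...); the function name `punct_counts` and the two-punctuation sample sentence are illustrative and not part of the treebank.

```python
from collections import Counter


def punct_counts(conllu_text):
    """Return (number of lemmas, number of types, number of tokens) for PUNCT."""
    lemmas, forms = Counter(), Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges (1-2) and empty nodes (1.1)
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == "PUNCT":
            forms[cols[1]] += 1
            lemmas[cols[2]] += 1
    return len(lemmas), len(forms), sum(forms.values())


# Hypothetical sample sentence with illustrative annotation.
SAMPLE = "\n".join([
    "# text = Da, res.",
    "1\tDa\tda\tPART\t_\t_\t0\troot\t_\t_",
    "2\t,\t,\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "3\tres\tres\tADV\t_\t_\t1\tadvmod\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
])

print(punct_counts(SAMPLE))  # (2, 2, 2)
```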

The 10 most frequent PUNCT lemmas: ,, ., “, -, (, ), ?, », :, «

The 10 most frequent PUNCT types: ,, ., “, -, (, ), ?, », :, «

The 10 most frequent ambiguous lemmas:

The 10 most frequent ambiguous types:

Morphology

The form / lemma ratio of PUNCT is 1.000000 (the average of all parts of speech is 1.894262).
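
A ratio of 1.0 is consistent with the counts above (21 types over 21 lemmas): every PUNCT lemma has exactly one form. The sketch below assumes the ratio is the number of distinct lemma-form pairs divided by the number of distinct lemmas; the exact definition used by the statistics generator is not stated on this page, and the function name and sample pairs are hypothetical.

```python
from collections import defaultdict


def form_lemma_ratio(pairs):
    """pairs: iterable of (form, lemma) for all tokens of one part of speech."""
    forms_by_lemma = defaultdict(set)
    for form, lemma in pairs:
        forms_by_lemma[lemma].add(form)
    total_forms = sum(len(forms) for forms in forms_by_lemma.values())
    return total_forms / len(forms_by_lemma)


# Punctuation: each lemma has a single form, so the ratio is 1.0.
print(form_lemma_ratio([(",", ","), (".", "."), (",", ","), ("!", "!")]))

# An inflecting word class yields a higher ratio (toy data: three forms of "biti").
print(form_lemma_ratio([("je", "biti"), ("so", "biti"), ("bil", "biti")]))
```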

The 1st highest number of forms (1) was observed with the lemma “!”: !.

The 2nd highest number of forms (1) was observed with the lemma “””: .

The 3rd highest number of forms (1) was observed with the lemma “’”: .

PUNCT does not occur with any features.

Relations

PUNCT nodes are attached to their parents using 2 different relations: sl-dep/punct (18553; 100% instances), sl-dep/root (2; 0% instances)

Parents of PUNCT nodes belong to 14 different parts of speech: VERB (12871; 69% instances), ADJ (2356; 13% instances), NOUN (2347; 13% instances), PROPN (381; 2% instances), NUM (178; 1% instances), ADV (120; 1% instances), X (114; 1% instances), PRON (84; 0% instances), PART (71; 0% instances), INTJ (20; 0% instances), CONJ (7; 0% instances), ADP (3; 0% instances), ROOT (2; 0% instances), PUNCT (1; 0% instances)

18551 (100%) PUNCT nodes are leaves.

4 (0%) PUNCT nodes have one child.

The highest child degree of a PUNCT node is 1.

Children of PUNCT nodes are attached using 3 different relations: sl-dep/case (2; 50% instances), sl-dep/nmod (1; 25% instances), sl-dep/punct (1; 25% instances)

Children of PUNCT nodes belong to 3 different parts of speech: ADP (2; 50% instances), PUNCT (1; 25% instances), X (1; 25% instances)
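
The relation statistics above (attachment relations, parent parts of speech, leaves, child degree) can be derived from the HEAD and DEPREL columns. This is a minimal sketch for a single sentence, assuming tokens are given as `(id, upos, head, deprel)` tuples; the function name and the sample rows are hypothetical, and the sample uses the plain label `punct` rather than the `sl-dep/punct` form shown on this page.

```python
from collections import Counter


def punct_relation_stats(rows):
    """rows: list of (id, upos, head, deprel) tuples for one sentence."""
    upos_of = {tok_id: upos for tok_id, upos, _, _ in rows}
    # Number of children per node, counted via the HEAD column
    n_children = Counter(head for _, _, head, _ in rows)
    punct = [r for r in rows if r[1] == "PUNCT"]

    relations = Counter(deprel for _, _, _, deprel in punct)
    # HEAD 0 is the artificial root, reported here as ROOT
    parent_pos = Counter(upos_of.get(head, "ROOT") for _, _, head, _ in punct)
    leaves = sum(1 for tok_id, _, _, _ in punct if n_children[tok_id] == 0)
    max_degree = max((n_children[tok_id] for tok_id, _, _, _ in punct), default=0)
    return relations, parent_pos, leaves, max_degree


# Hypothetical four-token sentence with two punctuation marks.
ROWS = [
    (1, "PART", 0, "root"),
    (2, "PUNCT", 3, "punct"),
    (3, "ADV", 1, "advmod"),
    (4, "PUNCT", 1, "punct"),
]

relations, parent_pos, leaves, max_degree = punct_relation_stats(ROWS)
print(dict(relations), dict(parent_pos), leaves, max_degree)
```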


Treebank Statistics (UD_Slovenian-SST)

There are 3 PUNCT lemmas (0%), 3 PUNCT types (0%) and 542 PUNCT tokens (2%). Out of 16 observed tags, the rank of PUNCT is: 15 in number of lemmas, 16 in number of types and 15 in number of tokens.

The 10 most frequent PUNCT lemmas: ?, …, !

The 10 most frequent PUNCT types: ?, …, !

The 10 most frequent ambiguous lemmas:

The 10 most frequent ambiguous types:

Morphology

The form / lemma ratio of PUNCT is 1.000000 (the average of all parts of speech is 1.575031).

The 1st highest number of forms (1) was observed with the lemma “!”: !.

The 2nd highest number of forms (1) was observed with the lemma “?”: ?.

The 3rd highest number of forms (1) was observed with the lemma “…”: ….

PUNCT does not occur with any features.

Relations

PUNCT nodes are attached to their parents using 1 relation: sl-dep/punct (542; 100% instances)

Parents of PUNCT nodes belong to 11 different parts of speech: VERB (289; 53% instances), NOUN (64; 12% instances), ADV (48; 9% instances), PRON (45; 8% instances), INTJ (27; 5% instances), PART (26; 5% instances), ADJ (24; 4% instances), PROPN (10; 2% instances), X (6; 1% instances), NUM (2; 0% instances), SCONJ (1; 0% instances)

542 (100%) PUNCT nodes are leaves.

The highest child degree of a PUNCT node is 0.


PUNCT in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]