
PUNCT: punctuation

Definition

Punctuation marks are non-alphabetical characters and character groups used to delimit linguistic units in printed text.

Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM.

Examples


Treebank Statistics (UD_Russian)

There is 1 PUNCT lemma (7%); there are 17 PUNCT types (0%) and 18807 PUNCT tokens (19%). Out of 15 observed tags, PUNCT ranks 11th in number of lemmas, 13th in number of types and 2nd in number of tokens.
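Figures like these come from counting the UPOS column (the 4th column) of the treebank's CoNLL-U files. The sketch below shows one way to reproduce a token count, its percentage and its rank. The names `read_conllu` and `tag_token_stats` are hypothetical, and so is the ranking convention (1 plus the number of strictly more frequent tags). That convention may not match how ties were ranked when this page was generated: every tag in this treebank has the same single lemma, yet PUNCT is ranked 11th by lemmas.

```python
from collections import Counter


def read_conllu(text):
    """Yield each sentence as a list of 10-column rows.

    Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
    Comment lines, multiword-token ranges ("1-2") and empty nodes ("1.1")
    are skipped, so every yielded row is a syntactic word.
    """
    sentence = []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():         # keep plain integer IDs only
                sentence.append(cols)
    if sentence:                          # the last sentence may lack a blank line
        yield sentence


def tag_token_stats(text, tag="PUNCT"):
    """Return (token count, % of all tokens, rank by token count) for one UPOS tag."""
    counts = Counter(row[3] for sent in read_conllu(text) for row in sent)
    total = sum(counts.values())
    share = 100 * counts[tag] / total if total else 0.0
    # Assumed ranking: 1 + number of tags that are strictly more frequent.
    rank = 1 + sum(1 for n in counts.values() if n > counts[tag])
    return counts[tag], share, rank


# Hypothetical one-sentence sample; LEMMA is "_" as in UD_Russian.
SAMPLE = (
    "# text = Он пришёл, и она ушла.\n"
    "1\tОн\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tпришёл\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t,\t_\tPUNCT\t_\t_\t6\tpunct\t_\t_\n"
    "4\tи\t_\tCONJ\t_\t_\t6\tcc\t_\t_\n"
    "5\tона\t_\tPRON\t_\t_\t6\tnsubj\t_\t_\n"
    "6\tушла\t_\tVERB\t_\t_\t2\tconj\t_\t_\n"
    "7\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

n, share, rank = tag_token_stats(SAMPLE)
print(f"{n} PUNCT tokens ({share:.0f}%), rank {rank} by tokens")
# prints: 2 PUNCT tokens (29%), rank 1 by tokens
```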

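The only PUNCT lemma is _, the CoNLL-U placeholder for an unspecified value: every token in this treebank has the lemma _, so lemmas are effectively not annotated.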

The 10 most frequent PUNCT types: ,, ., –, ), (, ``, '', -, :, ;

The 10 most frequent ambiguous lemmas: _ (NOUN 26660, PUNCT 18807, ADJ 12528, ADP 10735, VERB 9436, PROPN 7604, CONJ 3168, ADV 2142, NUM 1900, PRON 1763, X 1700, DET 1673, SCONJ 624, PART 491, SYM 158)

The 10 most frequent ambiguous types: . (PUNCT 4941, DET 1), '' (PUNCT 1088, X 2), (PUNCT 41, X 1), (PUNCT 4, X 2), мест (NOUN 12, PUNCT 1)
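A type counts as ambiguous when the same form is tagged with more than one UPOS. Judging from the counts in the list above (4941, 1088, 41, 4, 1), entries appear to be ordered by their PUNCT frequency rather than by total frequency. The hypothetical sketch below follows that reading and treats types as case-sensitive FORM strings; both choices are assumptions, not a description of the tool that generated this page.

```python
from collections import Counter, defaultdict


def read_conllu(text):
    """Yield sentences as lists of 10-column rows, skipping comments,
    multiword-token ranges ("1-2") and empty nodes ("1.1")."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence


def ambiguous_types(text, tag="PUNCT", top=10):
    """Forms seen both as `tag` and as another UPOS, ordered by their `tag` count."""
    tags_by_form = defaultdict(Counter)          # FORM -> Counter of UPOS
    for sent in read_conllu(text):
        for row in sent:
            tags_by_form[row[1]][row[3]] += 1
    ambiguous = [(form, tags) for form, tags in tags_by_form.items()
                 if tags[tag] > 0 and len(tags) > 1]
    ambiguous.sort(key=lambda item: -item[1][tag])
    # Within an entry, tags are listed from most to least frequent.
    return [(form, tags.most_common()) for form, tags in ambiguous[:top]]


# Hypothetical sample: "?" is a PUNCT twice and, once, a NOUN (the sign named as a word).
SAMPLE = (
    "1\tКто\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tпришёл\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t?\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "1\tЗнак\t_\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\t?\t_\tNOUN\t_\t_\t1\tappos\t_\t_\n"
    "3\tредок\t_\tADJ\t_\t_\t0\troot\t_\t_\n"
    "4\t.\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
    "1\tДа\t_\tPART\t_\t_\t0\troot\t_\t_\n"
    "2\t?\t_\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

for form, tags in ambiguous_types(SAMPLE):
    detail = ", ".join(f"{t} {n}" for t, n in tags)
    print(f"{form} ({detail})")
# prints: ? (PUNCT 2, NOUN 1)
```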

Morphology

The form / lemma ratio of PUNCT is 17.000000 (the average of all parts of speech is 2046.733333).

The highest number of forms (17) was observed with the lemma “_”: !, '', ‘, (, ), ,, -, –, ., .(, …, :, ;, ?, ``, мест, −.
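The form / lemma ratio is consistent with dividing the number of distinct PUNCT forms (types) by the number of distinct PUNCT lemmas: here 17 / 1 = 17.0, and for UD_Russian-SynTagRus, which reports 17 types and 17 lemmas, 17 / 17 = 1.0. The sketch below computes that ratio and ranks lemmas by how many distinct forms they have. The function names are hypothetical.

```python
from collections import defaultdict


def read_conllu(text):
    """Yield sentences as lists of 10-column rows, skipping comments,
    multiword-token ranges ("1-2") and empty nodes ("1.1")."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence


def form_lemma_stats(text, tag="PUNCT"):
    """Return (distinct forms / distinct lemmas, lemmas sorted by number of forms)."""
    forms_by_lemma = defaultdict(set)            # LEMMA -> set of FORMs
    for sent in read_conllu(text):
        for row in sent:
            if row[3] == tag:
                forms_by_lemma[row[2]].add(row[1])
    n_lemmas = len(forms_by_lemma)
    n_types = len(set().union(*forms_by_lemma.values()))
    ratio = n_types / n_lemmas if n_lemmas else 0.0
    ranked = sorted(forms_by_lemma.items(), key=lambda item: -len(item[1]))
    return ratio, ranked


# Hypothetical sample in which every lemma is "_", as in UD_Russian.
SAMPLE = (
    "1\tОн\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tпришёл\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t,\t_\tPUNCT\t_\t_\t4\tpunct\t_\t_\n"
    "4\tувы\t_\tINTJ\t_\t_\t2\tdiscourse\t_\t_\n"
    "5\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

ratio, ranked = form_lemma_stats(SAMPLE)
print(f"form / lemma ratio: {ratio:.6f}")        # prints: form / lemma ratio: 2.000000
lemma, forms = ranked[0]
forms_text = ", ".join(sorted(forms))
print(f"highest number of forms ({len(forms)}) with lemma {lemma}: {forms_text}")
```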

PUNCT does not occur with any features.

Relations

PUNCT nodes are attached to their parents using 7 different relations: ru-dep/punct (17695; 94% of instances), ru-dep/goeswith (1096; 6% of instances), ru-dep/conj (6; 0% of instances), ru-dep/dep (3; 0% of instances), ru-dep/nmod (3; 0% of instances), ru-dep/parataxis (3; 0% of instances), ru-dep/nummod (1; 0% of instances)

Parents of PUNCT nodes belong to 14 different parts of speech: VERB (7433; 40% of instances), NOUN (6232; 33% of instances), PROPN (2410; 13% of instances), ADJ (1643; 9% of instances), NUM (331; 2% of instances), ADP (231; 1% of instances), ADV (226; 1% of instances), X (141; 1% of instances), DET (81; 0% of instances), SYM (35; 0% of instances), PUNCT (18; 0% of instances), CONJ (15; 0% of instances), PRON (9; 0% of instances), SCONJ (2; 0% of instances)
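Both distributions can be derived from the HEAD (7th) and DEPREL (8th) columns: count the DEPREL of every PUNCT node, and look up the UPOS of the token its HEAD points to. Each percentage is a count divided by the PUNCT total and rounded, e.g. 17695 / 18807 ≈ 0.941, shown as 94%. The sketch below is a hypothetical reconstruction under those assumptions; it prints bare DEPREL values such as punct, and it skips nodes attached to the root (HEAD 0), which have no parent part of speech.

```python
from collections import Counter


def read_conllu(text):
    """Yield sentences as lists of 10-column rows, skipping comments,
    multiword-token ranges ("1-2") and empty nodes ("1.1")."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence


def parent_stats(text, tag="PUNCT"):
    """Count the DEPREL of `tag` nodes and the UPOS of their heads."""
    relations, parent_pos = Counter(), Counter()
    for sent in read_conllu(text):
        upos_by_id = {row[0]: row[3] for row in sent}
        for row in sent:
            if row[3] != tag:
                continue
            relations[row[7]] += 1
            if row[6] != "0":                    # HEAD 0 is the root: no parent POS
                parent_pos[upos_by_id[row[6]]] += 1
    return relations, parent_pos


def describe(counter):
    """Format counts as 'label (count; pct% of instances)', most frequent first."""
    total = sum(counter.values())
    return ", ".join(f"{label} ({n}; {100 * n / total:.0f}% of instances)"
                     for label, n in counter.most_common())


# Hypothetical sample: "Он пришёл (вчера)."
SAMPLE = (
    "1\tОн\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tпришёл\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t(\t_\tPUNCT\t_\t_\t4\tpunct\t_\t_\n"
    "4\tвчера\t_\tADV\t_\t_\t2\tadvmod\t_\t_\n"
    "5\t)\t_\tPUNCT\t_\t_\t4\tpunct\t_\t_\n"
    "6\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

rels, parents = parent_stats(SAMPLE)
print(describe(rels))      # prints: punct (3; 100% of instances)
print(describe(parents))   # prints: ADV (2; 67% of instances), VERB (1; 33% of instances)
```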

18778 (100%) PUNCT nodes are leaves.

12 (0%) PUNCT nodes have one child.

10 (0%) PUNCT nodes have two children.

7 (0%) PUNCT nodes have three or more children.

The highest child degree of a PUNCT node is 4.
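The four degree classes partition all PUNCT tokens: 18778 + 12 + 10 + 7 = 18807, the PUNCT token total given above. A node's degree is the number of tokens in the same sentence whose HEAD is that node's ID. The sketch below computes the distribution; the function name is hypothetical, and the sample mirrors a repeated mark attached to the first one.

```python
from collections import Counter


def read_conllu(text):
    """Yield sentences as lists of 10-column rows, skipping comments,
    multiword-token ranges ("1-2") and empty nodes ("1.1")."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence


def degree_stats(text, tag="PUNCT"):
    """Map child degree -> number of `tag` nodes with that many dependents."""
    degrees = Counter()
    for sent in read_conllu(text):
        n_children = Counter(row[6] for row in sent)   # HEAD id -> number of dependents
        for row in sent:
            if row[3] == tag:
                degrees[n_children[row[0]]] += 1
    return degrees


# Hypothetical sample: "Он пришёл!!" with the second "!" attached to the first.
SAMPLE = (
    "1\tОн\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tпришёл\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t!\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "4\t!\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)

deg = degree_stats(SAMPLE)
print("leaves:", deg[0])                                          # 1
print("one child:", deg[1])                                       # 1
print("two children:", deg[2])                                    # 0
print("three or more:", sum(n for d, n in deg.items() if d >= 3)) # 0
print("highest child degree:", max(deg, default=0))               # 1
```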

Children of PUNCT nodes are attached using 9 different relations: ru-dep/punct (14; 26% of instances), ru-dep/case (13; 24% of instances), ru-dep/nmod (11; 20% of instances), ru-dep/goeswith (7; 13% of instances), ru-dep/conj (4; 7% of instances), ru-dep/amod (2; 4% of instances), ru-dep/acl:relcl (1; 2% of instances), ru-dep/discourse (1; 2% of instances), ru-dep/mark (1; 2% of instances)

Children of PUNCT nodes belong to 9 different parts of speech: PUNCT (18; 33% of instances), ADP (12; 22% of instances), NOUN (9; 17% of instances), X (8; 15% of instances), ADJ (3; 6% of instances), NUM (1; 2% of instances), PRON (1; 2% of instances), PROPN (1; 2% of instances), SCONJ (1; 2% of instances)
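The child counts agree with the degree figures: 12 nodes with one child and 10 with two contribute 12 + 20 = 32 children, so the 54 children listed leave 22 for the 7 nodes with three or more. That is six nodes with three children and one with four, matching the highest child degree of 4. The hypothetical sketch below collects the DEPREL and UPOS of every token whose head is a PUNCT node.

```python
from collections import Counter


def read_conllu(text):
    """Yield sentences as lists of 10-column rows, skipping comments,
    multiword-token ranges ("1-2") and empty nodes ("1.1")."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence


def child_stats(text, tag="PUNCT"):
    """Count DEPREL and UPOS of the dependents whose head is a `tag` node."""
    relations, child_pos = Counter(), Counter()
    for sent in read_conllu(text):
        upos_by_id = {row[0]: row[3] for row in sent}
        for row in sent:
            if upos_by_id.get(row[6]) == tag:    # HEAD "0" maps to None
                relations[row[7]] += 1
                child_pos[row[3]] += 1
    return relations, child_pos


# Hypothetical sample: "Он пришёл!!" with the second "!" attached to the first.
SAMPLE = (
    "1\tОн\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tпришёл\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t!\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "4\t!\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)

rels, pos = child_stats(SAMPLE)
print(dict(rels), dict(pos))   # prints: {'punct': 1} {'PUNCT': 1}
```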


Treebank Statistics (UD_Russian-SynTagRus)

There are 17 PUNCT lemmas (0%), 17 PUNCT types (0%) and 188918 PUNCT tokens (18%). Out of 15 observed tags, PUNCT ranks 11th in number of lemmas, 13th in number of types and 2nd in number of tokens.

The 10 most frequent PUNCT lemmas: ,, ., “, -, :, ?, ), (, !, …

The 10 most frequent PUNCT types: ,, ., “, -, :, ?, ), (, !, …

The 10 most frequent ambiguous lemmas:

The 10 most frequent ambiguous types: ? (PUNCT 3014, NOUN 1)

Morphology

The form / lemma ratio of PUNCT is 1.000000 (the average of all parts of speech is 2.787274).

The highest number of forms (1) was observed with the lemma “!”: !.

The second-highest number of forms (1) was observed with the lemma “””: .

The third-highest number of forms (1) was observed with the lemma “(”: (.

PUNCT does not occur with any features.

Relations

PUNCT nodes are attached to their parents using a single relation: ru-dep/punct (188918; 100% of instances)

Parents of PUNCT nodes belong to 15 different parts of speech: NOUN (86577; 46% of instances), VERB (59427; 31% of instances), ADJ (19185; 10% of instances), ADV (12305; 7% of instances), PRON (3176; 2% of instances), NUM (2065; 1% of instances), PART (1932; 1% of instances), CONJ (1554; 1% of instances), ADP (1431; 1% of instances), SCONJ (519; 0% of instances), DET (292; 0% of instances), AUX (186; 0% of instances), SYM (153; 0% of instances), INTJ (115; 0% of instances), PUNCT (1; 0% of instances)

188917 (100%) PUNCT nodes are leaves.

1 (0%) PUNCT node has one child.

The highest child degree of a PUNCT node is 1.

Children of PUNCT nodes are attached using a single relation: ru-dep/punct (1; 100% of instances)

Children of PUNCT nodes belong to a single part of speech: PUNCT (1; 100% of instances)

