PUNCT: punctuation
Definition
Punctuation marks are non-alphabetical characters and character groups used to delimit linguistic units in printed text.
Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM.
Examples
- Period: .
- Comma: ,
- Parentheses: ()
Treebank Statistics (UD_Russian)
There is 1 PUNCT lemma (7%), 17 PUNCT types (0%) and 18807 PUNCT tokens (19%).
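A minimal sketch of how counts like these could be derived from a CoNLL-U file. It assumes that lemmas and types are counted as distinct strings in the LEMMA and FORM columns (case-sensitive) and that the token percentage is the share of PUNCT among all tokens. The sample sentence and the function name `upos_counts` are hypothetical; they are not taken from the treebank or from the tool that generated this page.

```python
# Tiny hypothetical CoNLL-U fragment (10 tab-separated columns per token).
SAMPLE = "\n".join([
    "# text = Да , нет .",
    "1\tДа\tда\tPART\t_\t_\t0\troot\t_\t_",
    "2\t,\t,\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "3\tнет\tнет\tPART\t_\t_\t1\tconj\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
    "",
])


def upos_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct types, tokens, all tokens) for one tag."""
    lemmas, types = set(), set()
    tokens = total = 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        total += 1
        if cols[3] == upos:  # column 4 is UPOS
            tokens += 1
            types.add(cols[1])   # column 2 is FORM
            lemmas.add(cols[2])  # column 3 is LEMMA
    return len(lemmas), len(types), tokens, total


n_lemmas, n_types, n_tokens, n_all = upos_counts(SAMPLE, "PUNCT")
print(n_lemmas, n_types, n_tokens, round(100 * n_tokens / n_all))  # 2 2 2 50
```

In UD_Russian every PUNCT token carries the placeholder lemma "_", which is why a count of this kind yields a single lemma there.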
Out of 15 observed tags, the rank of PUNCT is: 11 in number of lemmas, 13 in number of types and 2 in number of tokens.
The 10 most frequent PUNCT lemmas: _
The 10 most frequent PUNCT types: ,, ., –, ), (, ``, '', -, :, ;
The 10 most frequent ambiguous lemmas: _ (NOUN 26660, PUNCT 18807, ADJ 12528, ADP 10735, VERB 9436, PROPN 7604, CONJ 3168, ADV 2142, NUM 1900, PRON 1763, X 1700, DET 1673, SCONJ 624, PART 491, SYM 158)
The 10 most frequent ambiguous types: . (PUNCT 4941, DET 1), '' (PUNCT 1088, X 2), … (PUNCT 41, X 1), − (PUNCT 4, X 2), мест (NOUN 12, PUNCT 1)
Morphology
The form / lemma ratio of PUNCT is 17.000000 (the average of all parts of speech is 2046.733333).
The 1st highest number of forms (17) was observed with the lemma “_”: !, '', ‘, (, ), ,, -, –, ., .(, …, :, ;, ?, ``, мест, −.
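As a worked check of the ratio, the sketch below groups the 17 forms listed above under their shared lemma and divides the number of distinct forms by the number of distinct lemmas. Treating the ratio as exactly this quotient is an assumption based on the reported figures, and the function name `form_lemma_ratio` is hypothetical.

```python
from collections import defaultdict

# The 17 PUNCT forms listed above; all of them share the placeholder lemma "_".
FORMS = ["!", "''", "‘", "(", ")", ",", "-", "–", ".", ".(", "…", ":", ";",
         "?", "``", "мест", "−"]
PAIRS = [(form, "_") for form in FORMS]


def form_lemma_ratio(pairs):
    """Distinct forms divided by distinct lemmas, over (form, lemma) pairs."""
    forms_by_lemma = defaultdict(set)
    for form, lemma in pairs:
        forms_by_lemma[lemma].add(form)
    n_forms = sum(len(forms) for forms in forms_by_lemma.values())
    return n_forms / len(forms_by_lemma)


print(f"{form_lemma_ratio(PAIRS):.6f}")  # 17.000000
```

The high value reflects the single placeholder lemma rather than any real inflection of punctuation.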
PUNCT does not occur with any features.
Relations
PUNCT nodes are attached to their parents using 7 different relations: ru-dep/punct (17695; 94% instances), ru-dep/goeswith (1096; 6% instances), ru-dep/conj (6; 0% instances), ru-dep/dep (3; 0% instances), ru-dep/nmod (3; 0% instances), ru-dep/parataxis (3; 0% instances), ru-dep/nummod (1; 0% instances)
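The percentages can be reproduced from the raw counts. The sketch below copies the counts from the list above and assumes each percentage is the relation's share of all PUNCT nodes, rounded to the nearest integer; the helper name `rounded_shares` is hypothetical.

```python
# Relation counts for PUNCT nodes, copied from the list above.
RELATION_COUNTS = {
    "punct": 17695,
    "goeswith": 1096,
    "conj": 6,
    "dep": 3,
    "nmod": 3,
    "parataxis": 3,
    "nummod": 1,
}


def rounded_shares(counts):
    """Map each relation to its share of the total, as a rounded percentage."""
    total = sum(counts.values())
    return {rel: round(100 * n / total) for rel, n in counts.items()}


print(sum(RELATION_COUNTS.values()))  # 18807, the number of PUNCT tokens
print(rounded_shares(RELATION_COUNTS))
```

Rounding explains the rows shown as 0%: each of them accounts for well under half a percent of the PUNCT nodes.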
Parents of PUNCT nodes belong to 14 different parts of speech: VERB (7433; 40% instances), NOUN (6232; 33% instances), PROPN (2410; 13% instances), ADJ (1643; 9% instances), NUM (331; 2% instances), ADP (231; 1% instances), ADV (226; 1% instances), X (141; 1% instances), DET (81; 0% instances), SYM (35; 0% instances), PUNCT (18; 0% instances), CONJ (15; 0% instances), PRON (9; 0% instances), SCONJ (2; 0% instances)
18778 (100%) PUNCT nodes are leaves.
12 (0%) PUNCT nodes have one child.
10 (0%) PUNCT nodes have two children.
7 (0%) PUNCT nodes have three or more children.
The highest child degree of a PUNCT node is 4.
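A minimal sketch of how such a child-degree breakdown could be computed from the ID, UPOS and HEAD columns of one sentence. The sample rows and the function names are hypothetical; the real figures above come from the whole treebank.

```python
from collections import Counter

# Hypothetical sentence as (id, upos, head) triples; head 0 marks the root.
ROWS = [
    (1, "VERB", 0),
    (2, "PUNCT", 1),
    (3, "NOUN", 1),
    (4, "PUNCT", 3),
    (5, "PUNCT", 4),  # a PUNCT node attached to another PUNCT node
]


def child_degrees(rows, upos):
    """Number of children of every node carrying the given tag."""
    n_children = Counter(head for _, _, head in rows)
    return [n_children[node_id] for node_id, tag, _ in rows if tag == upos]


def degree_buckets(degrees):
    """Group degrees into the four classes used in the statistics."""
    labels = {0: "leaves", 1: "one child", 2: "two children"}
    return dict(Counter(labels.get(d, "three or more children") for d in degrees))


degrees = child_degrees(ROWS, "PUNCT")
print(degree_buckets(degrees))  # {'leaves': 2, 'one child': 1}
print(max(degrees))             # 1

# The four classes reported above add up to the PUNCT token count.
assert 18778 + 12 + 10 + 7 == 18807
```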
Children of PUNCT nodes are attached using 9 different relations: ru-dep/punct (14; 26% instances), ru-dep/case (13; 24% instances), ru-dep/nmod (11; 20% instances), ru-dep/goeswith (7; 13% instances), ru-dep/conj (4; 7% instances), ru-dep/amod (2; 4% instances), ru-dep/acl:relcl (1; 2% instances), ru-dep/discourse (1; 2% instances), ru-dep/mark (1; 2% instances)
Children of PUNCT nodes belong to 9 different parts of speech: PUNCT (18; 33% instances), ADP (12; 22% instances), NOUN (9; 17% instances), X (8; 15% instances), ADJ (3; 6% instances), NUM (1; 2% instances), PRON (1; 2% instances), PROPN (1; 2% instances), SCONJ (1; 2% instances)
Treebank Statistics (UD_Russian-SynTagRus)
There are 17 PUNCT lemmas (0%), 17 PUNCT types (0%) and 188918 PUNCT tokens (18%).
Out of 15 observed tags, the rank of PUNCT is: 11 in number of lemmas, 13 in number of types and 2 in number of tokens.
The 10 most frequent PUNCT lemmas: ,, ., “, -, :, ?, ), (, !, …
The 10 most frequent PUNCT types: ,, ., “, -, :, ?, ), (, !, …
The 10 most frequent ambiguous lemmas:
The 10 most frequent ambiguous types: ? (PUNCT 3014, NOUN 1)
Morphology
The form / lemma ratio of PUNCT is 1.000000 (the average of all parts of speech is 2.787274).
The 1st highest number of forms (1) was observed with the lemma “!”: !.
The 2nd highest number of forms (1) was observed with the lemma “””: ”.
The 3rd highest number of forms (1) was observed with the lemma “(”: (.
PUNCT does not occur with any features.
Relations
PUNCT nodes are attached to their parents using 1 relation: ru-dep/punct (188918; 100% instances)
Parents of PUNCT nodes belong to 15 different parts of speech: NOUN (86577; 46% instances), VERB (59427; 31% instances), ADJ (19185; 10% instances), ADV (12305; 7% instances), PRON (3176; 2% instances), NUM (2065; 1% instances), PART (1932; 1% instances), CONJ (1554; 1% instances), ADP (1431; 1% instances), SCONJ (519; 0% instances), DET (292; 0% instances), AUX (186; 0% instances), SYM (153; 0% instances), INTJ (115; 0% instances), PUNCT (1; 0% instances)
188917 (100%) PUNCT nodes are leaves.
1 (0%) PUNCT node has one child.
The highest child degree of a PUNCT node is 1.
Children of PUNCT nodes are attached using 1 relation: ru-dep/punct (1; 100% instances)
Children of PUNCT nodes belong to 1 part of speech: PUNCT (1; 100% instances)
PUNCT in other languages: [bg] [cs] [de] [el] [en] [es] [eu] [fa] [fi] [fr] [ga] [he] [hu] [it] [ja] [ko] [sv] [u]