home edit page issue tracker

Universal Dependencies tools

This page collects links to and minimal instructions for tools related to Universal Dependencies.

If you would like to have your tool added to this page, please write the UD mailing list.

Listing

UD-maintained tools

The Universal Dependencies project maintains a number of core tools for working with UD data. These tools are available from https://github.com/universaldependencies/tools and briefly described below.

Annotation statistics

(Description TODO)

Consistency checking

(Description TODO)

Data validation

(Description TODO)

Format conversion

(Description TODO)

Third-party tools

brat rapid annotation tool

brat is a browser-based tool for text annotation. The brat visualization component is used in the UD documentation system and the tool can be easily configured for UD annotation (TODO: link instructions).

Treex

Treex is a modular natural language processing framework. It reads and writes data in many formats (including CoNLL-U) and provides API for dependency tree manipulation. Many treebanks have been harmonized in HamleDT and then converted to UD using Treex.

Tred

Tred (Tree Editor) is a graph visualization and manipulation program written in Perl. It was the main tool used to annotate the Prague treebanks. It supports macros (in Perl) to automate frequently repeated operations. There are extensions for various annotation layers such as MWEs or coreference. It cannot read directly the CoNLL-U format. However, it is quite powerful in combination with Treex, which can also convert the files from and to CoNLL-U.

UDPipe

UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained given only annotated data in CoNLL-U format. (Nevertheless, to train the tokenizer, either the SpaceAfter feature must be present, or at least some plain text must be available; also morphological analyzer and lemmatizer can be improved if morphological dictionary is provided.) Trained models are provided for nearly all UD treebanks. UDPipe is available as a binary, as a library for C++, Python, Perl, Java, C#, and as a web service.

DgAnnotator

DgAnnotator (Dependency Graph Annotator) is a visual tool for annotating text with syntactic information, in particular creating a dependency tree. The tool reads and produces annotated documents in both XML, CoNLL-X and CoNLL-U tab separated format. Additional features: shows the differences, highlighted in red, between two different dependency trees on the same corpus; generates PNG snapshots of trees; performs PoS tagging and parsing connecting to a network service; panning and zooming.