
Introduction

The French data come from the universal Google dataset (version 2.0): a mix of sentences randomly sampled from Google News, Blogger, Wikipedia and Google local reviews. The conversion to UD POS tags and UD dependencies was performed automatically, using heuristic rules and fixed lists of words (produced by native speakers of the language). The output of the conversion has not been systematically corrected by hand.

More information about the original Google dataset can be found in the following paper:

Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló and Jungmee Lee. Universal Dependency Annotation for Multilingual Parsing. In Proceedings of ACL 2013.