WikiWars-UA: Ukrainian corpus annotated with temporal expressions

Abstract

Reliability of tools and reproducibility of study results are important requirements for modern Natural Language Processing (NLP) tools and methods. Scientific research is increasingly criticized for the lack of reproducibility of its results. A first step towards reproducibility is the availability of freely usable tools and corpora. In our work, we are interested in the automatic processing of unstructured documents for the extraction of temporal information. Our main objective is to create a reference corpus in Ukrainian annotated with temporal information related to dates (absolute and relative), periods, times, etc., together with their normalization. The approach relies on the adaptation of an existing application, automatic pre-annotation of the WikiWars corpus in Ukrainian, and its manual correction. The reference corpus makes it possible to reliably evaluate the current version of the automatic temporal annotator and to prepare future work on these topics.

Keywords

Temporality, Information Extraction, Ukrainian, WikiWars, HeidelTime, Reference Corpus

Citation

Grabar N. WikiWars-UA: Ukrainian corpus annotated with temporal expressions / Natalia Grabar, Thierry Hamon // Computational Linguistics and Intelligent Systems. — Lviv : Lviv Politechnic Publishing House, 2019. — Vol 2 : Proceedings of the 3rd International conference, COLINS 2019. Workshop, Kharkiv, Ukraine, April 18-19, 2019. — P. 22–31. — (Paper presentations).