Creation of a multilingual aligned corpus with Ukrainian as the target language and its exploitation

Grabar, Natalia; Hamon, Thierry

Files

004-010-019.pdf (305.13 KB)

Date

2017

Authors

Grabar, Natalia

Hamon, Thierry

Publisher

National Technical University «KhPI»

Abstract

The creation of linguistic resources (such as corpora, lexica or terminologies) occupies an important place in research areas related to linguistics, Natural Language Processing, computer science, psycholinguistics, etc. In this paper, we describe a multilingual corpus in which Ukrainian is the target language, while the source languages are Polish, French and English. The corpus contains literary texts and a small subset of texts from the medical domain. Overall, the corpus is composed of 62 literary texts and 129 medical texts. It counts over 1 million words in the target language, Ukrainian, and at least as many in the source languages taken together. The corpus is directional and aligned at the sentence level. After describing the corpus, we introduce some possible ways of exploiting it and first results. We then conclude and indicate some directions for future work. The corpus presented in this work is available for research purposes: http://natalia.grabar.free.fr/resources.php.

Citation

Grabar N. Creation of a multilingual aligned corpus with Ukrainian as the target language and its exploitation / Natalia Grabar, Thierry Hamon // Computational linguistics and intelligent systems (COLINS 2017) : proceedings of the 1st International conference, Kharkiv, Ukraine, 21 April 2017 / National Technical University «KhPI», Lviv Polytechnic National University. – Kharkiv, 2017. – P. 10–19. – Bibliography: 40 titles.

URI

https://ena.lpnu.ua/handle/ntb/39454

Collections

Computational linguistics and intelligent systems. – 2017
