Automated building and analysis of Ukrainian Twitter corpus for toxic text detection

Bobrovnyk, Kateryna

Files

2019v2___Proceedings_of_the_3nd_International_conference_COLINS_2019_Workshop_Kharkiv_Ukraine_April_18-19_2019_Bobrovnyk_K-Automated_building_and_55-56.pdf (321.3 KB)

2019v2___Proceedings_of_the_3nd_International_conference_COLINS_2019_Workshop_Kharkiv_Ukraine_April_18-19_2019_Bobrovnyk_K-Automated_building_and_55-56__COVER.png (291.21 KB)

Date

2019-04-18

Authors

Bobrovnyk, Kateryna

Publisher

Lviv Polytechnic Publishing House

Abstract

Toxic text detection is an emerging area of study in Internet linguistics and corpus linguistics. The topic is relevant because there are no publicly available Ukrainian social media text corpora. The research involves building a Ukrainian Twitter corpus by means of scraping; collective annotation of texts as 'toxic' or 'non-toxic'; construction of an obscene-word dictionary for future feature engineering; and training models for text classification (comparing Logistic Regression, Support Vector Machine, and Deep Neural Network).
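The abstract lists the pipeline steps but not their implementation. As a rough illustration only, the Python sketch below shows two of them under stated assumptions. First, it combines several annotators' 'toxic/non-toxic' votes by strict majority, with ties resolved to non-toxic as an assumption. Second, it feeds counts from an obscene-word dictionary into a logistic regression, which is one of the three model types the abstract compares. The lexicon entries, the toy tweets, and every function name (`majority_label`, `features`, `train_logreg`, `predict`) are hypothetical. The regression is a minimal from-scratch gradient-descent fit, not the paper's code.

```python
import math
import re

# HYPOTHETICAL placeholder lexicon: the paper's obscene-word dictionary is not
# reproduced here, so two neutral tokens stand in for real entries.
OBSCENE_LEXICON = {"badword", "insultword"}

# \w matches Cyrillic letters in Python 3 str patterns, so Ukrainian words tokenize.
TOKEN_RE = re.compile(r"\w+")


def majority_label(votes: list[int]) -> int:
    """Combine annotator votes (1 = toxic, 0 = non-toxic) by strict majority.

    ASSUMPTION: ties resolve to non-toxic; the paper's actual rule is not stated.
    """
    return int(sum(votes) * 2 > len(votes))


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def features(text: str) -> list[float]:
    """Bias term, count of lexicon hits, and share of tokens that are lexicon hits."""
    tokens = tokenize(text)
    hits = sum(1 for t in tokens if t in OBSCENE_LEXICON)
    ratio = hits / len(tokens) if tokens else 0.0
    return [1.0, float(hits), ratio]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: list[list[float]], y: list[int],
                 lr: float = 1.0, epochs: int = 1000) -> list[float]:
    """Fit logistic-regression weights with full-batch gradient descent on log-loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w: list[float], text: str, threshold: float = 0.5) -> int:
    """Return 1 (toxic) if the predicted probability reaches the threshold, else 0."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, features(text))))
    return int(p >= threshold)


# HYPOTHETICAL toy data: each tweet carries votes from three annotators.
annotated = [
    ("гарний день", [0, 0, 0]),
    ("дякую за допомогу", [0, 0, 1]),
    ("люблю Київ", [0, 0, 0]),
    ("ти badword", [1, 1, 0]),
    ("badword insultword людина", [1, 1, 1]),
    ("це insultword", [1, 0, 1]),
]
texts = [t for t, _ in annotated]
labels = [majority_label(v) for _, v in annotated]
weights = train_logreg([features(t) for t in texts], labels)
print(predict(weights, "badword insultword"), predict(weights, "гарний день"))
```

In practice a library implementation (for example scikit-learn's `LogisticRegression` or `LinearSVC`) would replace the hand-written trainer. The from-scratch version only keeps the sketch dependency-free.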

Keywords

toxic text detection, text corpus, Twitter

Citation

Bobrovnyk K. Automated building and analysis of Ukrainian Twitter corpus for toxic text detection / Kateryna Bobrovnyk // Computational Linguistics and Intelligent Systems. — Lviv : Lviv Polytechnic Publishing House, 2019. — Vol. 2 : Proceedings of the 3rd International conference, COLINS 2019. Workshop, Kharkiv, Ukraine, April 18-19, 2019. — P. 55–56. — (Student section).

URI

https://ena.lpnu.ua/handle/ntb/45496

Collections

Computational linguistics and intelligent systems. – 2019
