Automated building and analysis of Ukrainian Twitter corpus for toxic text detection

dc.citation.epage56
dc.citation.journalTitleComputational Linguistics and Intelligent Systems
dc.citation.spage55
dc.citation.volume2 : Proceedings of the 3rd International conference, COLINS 2019. Workshop, Kharkiv, Ukraine, April 18-19, 2019
dc.contributor.affiliationTaras Shevchenko National University of Kyiv
dc.contributor.authorBobrovnyk, Kateryna
dc.coverage.placenameLviv
dc.date.accessioned2019-10-31T13:21:07Z
dc.date.available2019-10-31T13:21:07Z
dc.date.created2019-04-18
dc.date.issued2019-04-18
dc.description.abstractToxic text detection is an emerging area of study in Internet linguistics and corpus linguistics. The topic is relevant because publicly available Ukrainian social media text corpora are lacking. The research involves building a Ukrainian Twitter corpus by scraping; collective annotation of texts as 'toxic' or 'non-toxic'; construction of an obscene-word dictionary for future feature engineering; and training models for the task of text classification (comparing Logistic Regression, Support Vector Machine, and Deep Neural Network).
dc.format.extent55-56
dc.format.pages2
dc.identifier.citationBobrovnyk K. Automated building and analysis of Ukrainian Twitter corpus for toxic text detection / Kateryna Bobrovnyk // Computational Linguistics and Intelligent Systems. — Lviv : Lviv Politechnic Publishing House, 2019. — Vol 2 : Proceedings of the 3rd International conference, COLINS 2019. Workshop, Kharkiv, Ukraine, April 18-19, 2019. — P. 55–56. — (Student section).
dc.identifier.citationenBobrovnyk K. Automated building and analysis of Ukrainian Twitter corpus for toxic text detection / Kateryna Bobrovnyk // Computational Linguistics and Intelligent Systems. — Lviv Politechnic Publishing House, 2019. — Vol 2 : Proceedings of the 3rd International conference, COLINS 2019. Workshop, Kharkiv, Ukraine, April 18-19, 2019. — P. 55–56. — (Student section).
dc.identifier.issn2523-4013
dc.identifier.urihttps://ena.lpnu.ua/handle/ntb/45496
dc.language.isoen
dc.publisherLviv Politechnic Publishing House
dc.relation.ispartofComputational Linguistics and Intelligent Systems (2), 2019
dc.relation.referencesen1. Pradheep, T., Sheeba, J.I., Yogeshwaran, T., Pradeep Devaneyan, S.: Automatic Multi Model Cyber Bullying Detection from Social Networks. In: Proceedings of the International Conference on Intelligent Computing, Salem, Tamilnadu, India (2017). Available at SSRN: https://ssrn.com/abstract=3123710 or http://dx.doi.org/10.2139/ssrn.3123710
dc.relation.referencesen2. Kennedy, G.W., McCollough, A.W., Dixon, E., Bastidas, A., Ryan, J., Loo, C., Sahay, S.: Hack Harassment: Technology Solutions to Combat Online Harassment. In: Proceedings of the First Workshop on Abusive Language Online, pp. 73–77, Vancouver, Canada (2017)
dc.relation.referencesen3. Rubtsova, Y.: Constructing a corpus for sentiment classification training. SOFTWARE SYSTEMS 1(109), 72-78 (2015)
dc.relation.referencesen4. Twitter Scraper, https://github.com/kennethreitz/twitter-scraper. Last accessed 13 April 2019
dc.relation.referencesen5. Language identification, https://fasttext.cc/docs/en/language-identification.html. Last accessed 13 April 2019
dc.relation.urihttps://ssrn.com/abstract=3123710
dc.relation.urihttp://dx.doi.org/10.2139/ssrn.3123710
dc.relation.urihttps://github.com/kennethreitz/twitter-scraper
dc.relation.urihttps://fasttext.cc/docs/en/language-identi
dc.rights.holder© 2019 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
dc.subjecttoxic text detection
dc.subjecttext corpus
dc.subjectTwitter
dc.titleAutomated building and analysis of Ukrainian Twitter corpus for toxic text detection
dc.typeArticle
