This volume represents the proceedings of the Workshop Conference, with Posters and Demonstrations track, of the 3rd International Conference on Computational Linguistics and Intelligent Systems, held in Kharkiv, Ukraine, in April 2019. It comprises 13 contributed papers that were carefully peer-reviewed and selected from 27 submissions. The volume opens with the abstracts of the keynote talks. The rest of the collection is organized in two parts. Part II contains the contributions to the Main COLINS Conference tracks, structured in two topical sections: (I) Computational Linguistics; (II) Intelligent Systems.
Computational Linguistics and Intelligent Systems. – Lviv : Lviv Politechnic Publishing House, 2019. – Volume 2 : Proceedings of the 3rd International Conference, COLINS 2019. Workshop, Kharkiv, Ukraine, April 18–19, 2019. – 78 p.
Computational Linguistics and Intelligent Systems
Contents (Volume 2 : Proceedings of the 3rd International Conference, COLINS 2019. Workshop, Kharkiv, Ukraine, April 18–19, 2019)
(Lviv Politechnic Publishing House, 2019-04-18) Chuiko, Viktoriia; Khairova, Nina; National Technical University «Kharkiv Polytechnic Institute»
The paper contains a review of existing methods for semantic similarity identification, such as methods based on the distance between concepts and methods based on lexical intersection. We propose a method for measuring the semantic similarity of short text fragments, i.e. pairs of sentences. We also created a corpus of mass-media texts containing Kharkiv news articles sorted by source and date. We then annotated the texts, defining the semantic similarity of sentences manually. In this way, we created a training corpus for our future system.
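As a minimal illustration of the lexical-intersection family of methods that the review covers (not the authors' own metric), a Jaccard overlap between token sets gives a simple baseline similarity score for two sentences:

```python
def jaccard_similarity(sent_a, sent_b):
    """Lexical-intersection baseline: |A ∩ B| / |A ∪ B| over lowercased tokens."""
    tokens_a = set(sent_a.lower().split())
    tokens_b = set(sent_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

A score near 1.0 indicates high lexical overlap; distance-based methods would instead compare concepts in a taxonomy.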
(Lviv Politechnic Publishing House, 2019-04-18) Bobrovnyk, Kateryna; Taras Shevchenko National University of Kyiv
Toxic text detection is an emerging area of study in Internet linguistics and corpus linguistics. The relevance of the topic can be explained by the lack of publicly available Ukrainian social media text corpora. The research involves building a Ukrainian Twitter corpus by means of scraping; collective annotation of texts as 'toxic/non-toxic'; construction of an obscene-words dictionary for future feature engineering; and training models for the task of text classification (comparing Logistic Regression, Support Vector Machine, and Deep Neural Network).
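A sketch of the kind of dictionary-based feature engineering the abstract mentions (the lexicon contents and feature names below are placeholders, not the paper's actual dictionary or feature set):

```python
def toxicity_features(text, obscene_lexicon):
    """Toy feature extractor: counts lexicon hits in a text, producing
    features that a classifier (e.g. Logistic Regression) could train on."""
    tokens = text.lower().split()
    hits = sum(1 for t in tokens if t in obscene_lexicon)
    return {
        "n_tokens": len(tokens),
        "obscene_hits": hits,
        "obscene_ratio": hits / len(tokens) if tokens else 0.0,
    }
```

In practice these features would be combined with bag-of-words or embedding representations before model training.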
(Lviv Politechnic Publishing House, 2019-04-18) Puzik, Oleksii; Kharkiv National University of Radio Electronics
Intelligent knowledge-based systems are an important part of natural language processing research. Appropriate formal models simplify the development of such systems and open new ways to improve their quality. This work is devoted to developing an intelligent knowledge-based system using a model based on the algebra of finite predicates. The model also relies on a lexicographical computer system consisting of trilingual and explanatory dictionaries. The algebra of finite predicates is used as the formalization tool. Problems of distinguishing semantic entities are investigated in the course of the research. A method for resolving homonymy ambiguities is used to extract separate entities, thus allowing the formalization of semantic relationships. As a result, a formal model of an intelligent knowledge-based system was developed, and a way to extend the model to other languages was shown.
(Lviv Politechnic Publishing House, 2019-04-18) Bilova, Mariia; Trehubenko, Oleksandr; National Technical University «Kharkiv Polytechnic Institute»
Against the background of growth in software (SW) quantity and complexity and frequent changes of SW versions, a friendly interface enhances SW competitiveness, reduces SW development costs, increases the number of SW users and their satisfaction, and lowers the costs of user training and support. A software product with which users achieve their goals and solve various tasks efficiently is deemed user-friendly. The purpose of the article is to study existing methods for assessing application usability and to analyze the use of the main software usability indicators on the example of customer loyalty software for the 'Infotech' consumer society.
(Lviv Politechnic Publishing House, 2019-04-18) Klyushin, Dmitry; Lyashko, Sergey; Zub, Stanislav; Taras Shevchenko National University of Kyiv
Two-sample tests for verifying hypotheses on homogeneity are commonly used statistical tools in machine learning, for example, for estimating corpus homogeneity, testing text authorship, and so on. Often they are effective only for sufficiently large samples (n > 100) and have limited application where sample sizes are small (n < 30). To address the small-sample case, resampling methods such as the jackknife and the bootstrap are often used. We propose and investigate a family of homogeneity measures based on the A(n) assumption that are effective for both small and large samples.
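To illustrate the resampling idea the abstract contrasts with (this is a generic permutation-style two-sample test on the difference of means, not the authors' A(n)-based measures):

```python
import random

def two_sample_pvalue(x, y, n_resamples=2000, seed=0):
    """Permutation-style two-sample homogeneity test: how often does a
    random relabeling of the pooled data produce a mean difference at
    least as large as the observed one?"""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        xs, ys = pooled[:len(x)], pooled[len(x):]
        if abs(sum(xs) / len(xs) - sum(ys) / len(ys)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_resamples + 1)
```

A small p-value suggests the two samples are not homogeneous; such tests remain usable at small n, which is the regime the paper targets.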
(Lviv Politechnic Publishing House, 2019-04-18) Grabar, Natalia; Hamon, Thierry; CNRS, Univ. Lille, UMR 8163 - STL - Savoirs Textes Langage, F-59000 Lille, France; LIMSI, CNRS, Université Paris-Saclay, F-91405 Orsay, France; Université Paris 13, Sorbonne Paris Cité, F-93430 Villetaneuse, France
Reliability of tools and reproducibility of study results are important features of modern Natural Language Processing (NLP) tools and methods. Scientific research is indeed increasingly criticized for the lack of reproducibility of results. A first step towards reproducibility is the availability of freely usable tools and corpora. In our work, we are interested in the automatic processing of unstructured documents for the extraction of temporal information. Our main objective is to create a reference annotated corpus in Ukrainian with temporal information related to dates (absolute and relative), periods, time, etc., and their normalization. The approach relies on the adaptation of an existing application, automatic pre-annotation of the WikiWars corpus in Ukrainian, and its manual correction. The reference corpus permits reliable evaluation of the current version of the automatic temporal annotator and prepares future work on these topics.
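A toy pre-annotation pass of the sort the pipeline performs before manual correction; the real annotator handles Ukrainian forms, relative dates, and periods, while the patterns and tag format below are illustrative only:

```python
import re

# Matches ISO dates ("2019-04-18") and simple "18 April 2019"-style dates.
DATE_PATTERN = re.compile(
    r"\b(\d{4}-\d{2}-\d{2}|\d{1,2}\s+(?:January|February|March|April|May|June|"
    r"July|August|September|October|November|December)\s+\d{4})\b"
)

def preannotate_dates(text):
    """Wrap absolute date expressions in TIMEX-style tags for later
    manual correction by annotators."""
    return DATE_PATTERN.sub(r'<TIMEX type="DATE">\1</TIMEX>', text)
```

Pre-annotating automatically and correcting by hand is cheaper than annotating from scratch, which is why the approach is used to bootstrap the reference corpus.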
(Lviv Politechnic Publishing House, 2019-04-18) Berko, Andrii; Lviv Polytechnic National University
Unlike traditional databases, Big Data is stored as NoSQL data resources. In most cases such resources are therefore not ready for efficient use in their original form, due to various kinds of data anomalies: data duplication, ambiguity, inaccuracy, contradiction, absence, incompleteness of data, etc. To eliminate such defects, special cleanup procedures for the data source are needed. The data cleanup process requires additional information about the composition, content, meaning, and function of the Big Data resource. Using a special knowledge base can solve this problem.
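A sketch of one cleanup step for a single anomaly class, duplication (resolving ambiguity, contradiction, or absence would require the knowledge base the abstract describes; the record shape here is an assumption):

```python
def dedupe_records(records):
    """Drop exact duplicate records (dicts) while preserving order.
    A first, knowledge-free pass of a larger cleanup pipeline."""
    seen = set()
    cleaned = []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable identity for the dict
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned
```
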
(Lviv Politechnic Publishing House, 2019-04-18) Shanidze, Olexandr; Petrasova, Svitlana; National Technical University "Kharkiv Polytechnic Institute"
This paper proposes an algorithm for the automatic extraction of semantic relations using a rule-based approach. The authors suggest identifying certain verbs (predicates) between a subject and an object of expressions to obtain a sequence of semantic relations in the designed text corpus of Wikipedia articles. Synsets from WordNet are applied to extract semantic relations between concepts and their synonyms from the text corpus.
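The rule-based filtering step can be sketched as follows; parsing sentences into (subject, verb, object) triples is assumed done upstream by a parser, and the relation-verb list below is a placeholder, not the authors' rule set:

```python
def extract_relations(triples, relation_verbs):
    """Keep only (subject, verb, object) triples whose predicate is one of
    the verbs designated as expressing a semantic relation."""
    return [(s, v, o) for (s, v, o) in triples if v in relation_verbs]
```

WordNet synsets would then be consulted to extend each kept relation to synonyms of its subject and object.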
(Lviv Politechnic Publishing House, 2019-04-18) Manuilov, Illia; Petrasova, Svitlana; National Technical University "Kharkiv Polytechnic Institute"
The paper discusses the process of automatic extraction of paraphrases used in rewriting. The researchers propose a method for extracting paraphrases from English news text corpora. The method is based both on developed syntactic rules to define phrases and on synsets to identify synonymous words in the designed text corpus of BBC news. To implement the method, the Natural Language Toolkit, a Universal Dependencies parser, and WordNet are used.
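The synset-matching idea can be sketched as below; the real method uses WordNet synsets and syntactic rules over parsed phrases, whereas the hand-made synonym sets and the token-by-token alignment here are simplifying assumptions:

```python
def are_paraphrase_candidates(phrase_a, phrase_b, synsets):
    """Two equal-length token lists are paraphrase candidates if every
    aligned token pair is identical or shares a synonym set."""
    if len(phrase_a) != len(phrase_b):
        return False
    def synonymous(a, b):
        return a == b or any(a in s and b in s for s in synsets)
    return all(synonymous(a, b) for a, b in zip(phrase_a, phrase_b))
```
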
(Lviv Politechnic Publishing House, 2019-04-18) Razno, Maria; National Technical University "Kharkiv Polytechnic Institute"
This article describes the relevance of processing text written in human language with Machine Learning methods and the NLP approach, as can be done in the Python programming language. It also presents the concept of Machine Learning, its main varieties, and the most popular Python packages and libraries for working with text data using Machine Learning methods. The concept of NLP and the most popular Python packages are also presented. A machine learning classification algorithm based on text processing is introduced, showing how to use classification machine learning and NLP methods in practice.
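The first step of any such text-classification pipeline is vectorization; a minimal stdlib-only bag-of-words vectorizer of the kind the popular libraries provide (the function name is an assumption, not an API from the article):

```python
from collections import Counter

def bag_of_words(texts):
    """Build a shared vocabulary and one count vector per document,
    i.e. the input representation a classifier would be trained on."""
    vocab = sorted({tok for text in texts for tok in text.lower().split()})
    vectors = []
    for text in texts:
        counts = Counter(text.lower().split())
        vectors.append([counts.get(word, 0) for word in vocab])
    return vocab, vectors
```

Library implementations add tokenization options, n-grams, and sparse storage, but the representation is the same.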
(Lviv Politechnic Publishing House, 2019-04-18) Liutenko, Iryna; Kurasov, Oleksiy; National Technical University "Kharkiv Polytechnic Institute"
Software testing problems are considered. Approaches to estimating test quality are determined and justified. There are performance, coverage, and implementation factors, which can be used for a comprehensive evaluation. The performance approach estimates testing effectiveness in action by the proportions of fixed and unfixed bugs. The coverage approach measures the volume of fully tested requirements and code structures. Implementation characteristics can be used to evaluate tests as software code. Software test quality indicators were selected for each of these factors and can be used for assessment. Multicriteria evaluation problems are considered. The ПАКС method ('Sequential Aggregation of Classified States') is proposed as a solution to the quality assessment problem.
(Lviv Politechnic Publishing House, 2019-04-18) Lytvynenko, Julia; National Technical University «Kharkiv Polytechnic Institute»
This article describes methods and existing libraries for POS-tagging and collocation extraction, using NLP technologies to process natural-language text in the Python programming language. In addition, it describes one possible method for selecting collocations that match a given pattern.
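A minimal sketch of collocation extraction via pointwise mutual information (PMI), one of the standard association measures also offered by NLTK's collocation finders; the thresholds and function name here are illustrative:

```python
import math
from collections import Counter

def top_bigrams_pmi(tokens, min_count=2):
    """Rank adjacent bigrams by PMI: log2 P(a,b) / (P(a) P(b)).
    Bigrams below min_count are dropped to avoid unstable scores."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scored = []
    for (a, b), c in bigrams.items():
        if c < min_count:
            continue
        pmi = math.log2((c / n) / ((unigrams[a] / n) * (unigrams[b] / n)))
        scored.append(((a, b), pmi))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Pattern-based selection (e.g. adjective + noun) would additionally filter these candidates by their POS tags.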
(Lviv Politechnic Publishing House, 2019-04-18) Drobot, Tetiana; Taras Shevchenko National University of Kyiv
The first commercial implementation of a Natural Language Generation (NLG) system dates back to the turn of the 21st century. Since then, the two main methods of NLG – text-to-text generation and data-to-text generation – have grown more complex in order to solve new business challenges. This research project focuses on the full cycle of template-based generation of hotel descriptions from linguistic and non-linguistic input: starting with data scraping and preparation up to rendering the whole text. Several improvements to the template-based approach are also suggested.
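The core of template-based data-to-text generation can be sketched in a few lines; the template wording and field names below are invented for illustration and are not the project's actual templates:

```python
def render_hotel_description(data, template=None):
    """Fill slots in a sentence template from structured (scraped) input.
    Real systems add template selection, aggregation, and morphological
    agreement on top of this slot-filling core."""
    template = template or (
        "{name} is a {stars}-star hotel in {city} with {rooms} rooms."
    )
    return template.format(**data)
```
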