Computational linguistics and intelligent systems
Permanent URI for this community: https://ena.lpnu.ua/handle/ntb/39447
61 results
Search Results
Item Automated building and analysis of Ukrainian Twitter corpus for toxic text detection (Lviv Politechnic Publishing House, 2019-04-18) Bobrovnyk, Kateryna; Taras Shevchenko National University of Kyiv
Toxic text detection is an emerging area of study in Internet linguistics and corpus linguistics. The relevance of the topic can be explained by the lack of publicly available Ukrainian social media text corpora. The research involves building a Ukrainian Twitter corpus by means of scraping; collective annotation of texts as 'toxic/non-toxic'; construction of a dictionary of obscene words for future feature engineering; and training models for the task of text classification (comparing Logistic Regression, Support Vector Machine, and Deep Neural Network).

Item Semantic similarity identification for short text fragments (Lviv Politechnic Publishing House, 2019-04-18) Chuiko, Viktoriia; Khairova, Nina; National Technical University «Kharkiv Polytechnic Institute»
The paper reviews existing methods for semantic similarity identification, such as methods based on the distance between concepts and methods based on lexical intersection. We propose a method for measuring the semantic similarity of short text fragments, i.e. two sentences. We also created a corpus of mass-media texts containing Kharkiv news articles sorted by source and date. We then annotated the texts, defining the semantic similarity of sentences manually; in this way, we created a learning corpus for our future system.
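For the toxic-text detection item above, a minimal sketch of the classification step, assuming a scikit-learn pipeline and a hypothetical annotated corpus file tweets.csv with text and toxic columns; the paper compares Logistic Regression, Support Vector Machine, and a deep neural network, of which only the first is shown here.

```python
# Minimal sketch: training one of the compared classifiers (Logistic
# Regression) on an annotated 'toxic/non-toxic' corpus. The file name,
# column names, and feature choices are assumptions, not the author's setup.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("tweets.csv")  # hypothetical: columns 'text', 'toxic' (0/1)
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["toxic"], test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
classifier = LogisticRegression(max_iter=1000)
classifier.fit(vectorizer.fit_transform(X_train), y_train)

print(classification_report(y_test,
                            classifier.predict(vectorizer.transform(X_test))))
```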
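For the short-fragment similarity item above, a minimal sketch of the lexical-intersection family of methods the review covers, using the Jaccard coefficient over token sets; the tokenization is an assumption, and this baseline is not the authors' proposed method.

```python
# Minimal sketch of a lexical-intersection similarity measure: the
# Jaccard coefficient over lower-cased token sets of two sentences.
import re

def jaccard_similarity(sentence_a: str, sentence_b: str) -> float:
    tokens_a = set(re.findall(r"\w+", sentence_a.lower()))
    tokens_b = set(re.findall(r"\w+", sentence_b.lower()))
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Two near-paraphrase sentences score high; unrelated ones score near 0.
print(jaccard_similarity("The mayor opened a new bridge.",
                         "A new bridge was opened by the mayor."))
```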
Item Intelligence knowledge-based system based on multilingual dictionaries (Lviv Politechnic Publishing House, 2019-04-18) Puzik, Oleksii; Kharkiv National University of Radio Electronics
Intelligent knowledge-based systems are an important part of natural language processing research. Appropriate formal models simplify the development of such systems and open new ways to improve their quality. This work is devoted to developing an intelligent knowledge-based system using a model based on the algebra of finite predicates. The model also relies on a lexicographical computer system consisting of trilingual and explanatory dictionaries; the algebra of finite predicates is used as the formalization tool. The problem of distinguishing semantic entities is investigated, and a method of resolving homonymy ambiguities is used to extract separate entities, thus allowing the formalization of semantic relationships. As a result, a formal model of the intelligent knowledge-based system was developed, and a way to extend the model to other languages was shown.

Item Study of software systems usability used for customers loyalty identification (Lviv Politechnic Publishing House, 2019-04-18) Bilova, Mariia; Trehubenko, Oleksandr; National Technical University «Kharkiv Polytechnic Institute»
Against the background of growth in the quantity and complexity of software (SW) and frequent changes of SW versions, a friendly interface enhances SW competitiveness, reduces SW development costs, increases the number of SW users and their satisfaction, and reduces the costs of user training and support. A product with which users achieve their goals and solve various issues efficiently is deemed a user-friendly software product. The purpose of the article is to study existing methods for assessing application usability and to analyze the use of the main software usability indicators on the example of the customer-loyalty software of the 'Infotech' consumer society.

Item A(n) Assumption in machine learning (Lviv Politechnic Publishing House, 2019-04-18) Klyushin, Dmitry; Lyashko, Sergey; Zub, Stanislav; Taras Shevchenko National University of Kyiv
Commonly used statistical tools in machine learning include two-sample tests for verifying hypotheses on homogeneity, for example, for estimating corpus homogeneity, testing text authorship, and so on. Often they are effective only for sufficiently large samples (n > 100) and have limited application where sample sizes are small (n < 30). To solve the problem for small samples, resampling methods are often used: the jackknife and the bootstrap. We propose and investigate a family of homogeneity measures based on the A(n) assumption that are effective for both small and large samples.

Item WikiWars-UA: Ukrainian corpus annotated with temporal expressions (Lviv Politechnic Publishing House, 2019-04-18) Grabar, Natalia; Hamon, Thierry; CNRS, Univ. Lille, UMR 8163 - STL - Savoirs Textes Langage, F-59000 Lille, France; LIMSI, CNRS, Université Paris-Saclay, F-91405 Orsay, France; Université Paris 13, Sorbonne Paris Cité, F-93430 Villetaneuse, France
Reliability of tools and reproducibility of study results are important features of modern Natural Language Processing (NLP) tools and methods. Scientific research is indeed increasingly coming under criticism for the lack of reproducibility of results. The first step towards reproducibility is the availability of freely usable tools and corpora. In our work, we are interested in the automatic processing of unstructured documents for the extraction of temporal information. Our main objective is to create a reference corpus annotated with temporal information related to dates (absolute and relative), periods, times, etc. in Ukrainian, together with their normalization. The approach relies on the adaptation of an existing application: automatic pre-annotation of the WikiWars corpus in Ukrainian followed by its manual correction. The reference corpus makes it possible to reliably evaluate the current version of the automatic temporal annotator and to prepare future work on these topics.

Item Contents of "Computational Linguistics and Intelligent Systems" (Lviv Politechnic Publishing House, 2019-04-18)

Item Knowledge-based Big Data Cleanup method (Lviv Politechnic Publishing House, 2019-04-18) Berko, Andrii; Lviv Polytechnic National University
Unlike traditional databases, Big Data is stored as NoSQL data resources, so in most cases such resources are not ready for efficient use in their original form. This is due to various kinds of data anomalies, such as data duplication, ambiguity, inaccuracy, contradiction, absence, and incompleteness. To eliminate such incorrectness, special data-source cleanup procedures are needed. The data cleanup process requires additional information about the composition, content, meaning, and function of the Big Data resource; using a special knowledge base can provide a solution to this problem.
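For the Knowledge-based Big Data Cleanup item above, a minimal sketch of the general idea, assuming a knowledge base expressed as per-field correction rules applied to NoSQL-style records; the record fields and rules are illustrative, not the author's actual design.

```python
# Minimal sketch: a small "knowledge base" of per-field rules handles
# anomalies the abstract lists (duplicates, absent and ambiguous values).
# All field names and rules are illustrative assumptions.
records = [
    {"id": 1, "city": "Lviv", "phone": None},
    {"id": 1, "city": "Lviv", "phone": None},                # exact duplicate
    {"id": 2, "city": "Lemberg", "phone": "+380441234567"},  # ambiguous name
]

KNOWLEDGE_BASE = {
    "city": lambda v: {"Lemberg": "Lviv"}.get(v, v),  # canonicalize names
    "phone": lambda v: v if v else "unknown",         # fill absent values
}

def cleanup(recs):
    seen, cleaned = set(), []
    for rec in recs:
        rec = {k: KNOWLEDGE_BASE.get(k, lambda v: v)(v) for k, v in rec.items()}
        key = tuple(sorted(rec.items()))              # drop exact duplicates
        if key not in seen:
            seen.add(key)
            cleaned.append(rec)
    return cleaned

print(cleanup(records))
```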
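For the "A(n) Assumption in machine learning" item above, a minimal sketch of the resampling approach the abstract contrasts with its own measures: a permutation-style two-sample homogeneity test. The test statistic and simulated samples are illustrative assumptions; this is a generic baseline, not the authors' A(n)-based family.

```python
# Minimal sketch of a resampling (permutation) two-sample test of
# homogeneity, usable for small samples (n < 30). The difference of
# means as test statistic is an illustrative choice.
import numpy as np

def permutation_test(x, y, n_resamples=10_000, seed=0):
    """Approximate p-value for H0: x and y come from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = abs(np.mean(x) - np.mean(y))
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        stat = abs(np.mean(pooled[:len(x)]) - np.mean(pooled[len(x):]))
        hits += stat >= observed
    return hits / n_resamples

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=20)   # two small samples, n = 20 each
y = rng.normal(0.8, 1.0, size=20)
print(permutation_test(x, y))       # small p-value suggests inhomogeneity
```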
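For the WikiWars-UA item above, a minimal sketch of what rule-based pre-annotation of Ukrainian temporal expressions could look like, with TIMEX-style tags; the patterns are illustrative assumptions, not the annotation tool the authors actually adapted.

```python
# Minimal sketch: regex pre-annotation of Ukrainian dates with
# TIMEX-style tags. The two patterns below are illustrative assumptions;
# a real annotator covers many more expression types and normalizes them.
import re

MONTHS = ("січня|лютого|березня|квітня|травня|червня|липня|"
          "серпня|вересня|жовтня|листопада|грудня")
PATTERNS = [
    re.compile(rf"\b\d{{1,2}}\s+(?:{MONTHS})\s+\d{{4}}\b"),  # 24 серпня 1991
    re.compile(r"\b\d{4}\s+році\b"),                         # у 1991 році
]

def pre_annotate(text: str) -> str:
    for pattern in PATTERNS:
        text = pattern.sub(lambda m: f"<TIMEX>{m.group(0)}</TIMEX>", text)
    return text

print(pre_annotate("Незалежність проголошено 24 серпня 1991 року."))
print(pre_annotate("Це сталося у 1991 році."))
```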
Item Author index, Reviewers (Lviv Politechnic Publishing House, 2019-04-18)

Item Extraction of semantic relations from Wikipedia text corpus (Lviv Politechnic Publishing House, 2019-04-18) Shanidze, Olexandr; Petrasova, Svitlana; National Technical University "Kharkiv Polytechnic Institute"
This paper proposes an algorithm for the automatic extraction of semantic relations using a rule-based approach. The authors suggest identifying certain verbs (predicates) between a subject and an object of an expression to obtain a sequence of semantic relations in the designed text corpus of Wikipedia articles. Synsets from WordNet are applied to extract semantic relations between concepts and their synonyms from the text corpus.
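For the relation-extraction item above, a minimal sketch of the rule-based idea: matching a verb (predicate) between a subject and an object, then expanding the object concept with WordNet synsets through NLTK; the naive pattern and example are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: naive subject-verb-object matching plus WordNet synset
# lookup via NLTK (requires a one-time nltk.download('wordnet')).
# The SVO pattern is an illustrative assumption, not the authors' rules.
import re
from nltk.corpus import wordnet

SVO = re.compile(r"^(?P<subj>\w+)\s+(?P<verb>\w+s)\s+(?P<obj>\w+)")

def extract_relation(sentence: str):
    match = SVO.match(sentence)
    if not match:
        return None
    subj, verb, obj = match.group("subj", "verb", "obj")
    # Expand the object concept with its WordNet synonyms.
    synonyms = {lemma.name() for synset in wordnet.synsets(obj)
                for lemma in synset.lemmas()}
    return subj, verb, obj, sorted(synonyms)

print(extract_relation("Wikipedia contains articles"))
```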