Models and tools for automated determining the statistical profile of Ukrainian-language texts

Теслюк, В. М.; Казимира, І. Я.; Кордіяка, Ю. М.; Рибак, І. Р.; Teslyuk, V. M.; Kazymyra, I.; Kordiiaka, Yu. M.; Rybak, I. R.

doi:10.23939/ujit2022.01.037

dc.citation.epage	43
dc.citation.issue	1
dc.citation.journalTitle	Український журнал інформаційних технологій
dc.citation.spage	37
dc.citation.volume	4
dc.contributor.affiliation	Національний університет “Львівська політехніка”
dc.contributor.affiliation	Lviv Polytechnic National University
dc.contributor.author	Теслюк, В. М.
dc.contributor.author	Казимира, І. Я.
dc.contributor.author	Кордіяка, Ю. М.
dc.contributor.author	Рибак, І. Р.
dc.contributor.author	Teslyuk, V. M.
dc.contributor.author	Kazymyra, I.
dc.contributor.author	Kordiiaka, Yu. M.
dc.contributor.author	Rybak, I. R.
dc.coverage.placename	Львів
dc.coverage.placename	Lviv
dc.date.accessioned	2024-03-20T09:41:07Z
dc.date.available	2024-03-20T09:41:07Z
dc.date.created	2022-02-28
dc.date.issued	2022-02-28
dc.description.abstract	У роботі вирішується актуальне завдання із вдосконалення професійного програмного забезпечення для статистичного аналізу тексту відповідно до потреб фахівців. Проаналізовано особливості і перспективи статистичних досліджень у мовознавстві та розроблено інформаційну технологію (ІТ) визначення статистичного профілю україномовних текстів. Проведено комплексну роботу над моделюванням програмної системи, яку представлено у відповідних схемах і діаграмах, що цілісно відображають функціонування та призначення розробленого продукту. Розглядаються математичні та системні основи статистичного аналізу для автоматизації професійного опрацювання текстів українською мовою, в контексті впровадження пропонованої інформаційної технології. Побудовано структурну схему проектного рішення та визначено головні вимоги до апаратного забезпечення. Розроблено компоненти інформаційної технології та запропоновано структуру програмної системи, які ґрунтуються на модульному принципі. Розроблено математичне забезпечення ІТ, яке базується на методах прикладної статистики та дає змогу визначити основні характеристики (статистичний профіль) досліджуваних україномовних текстів. Окрім цього, розроблено алгоритмічне та програмне забезпечення ІТ, для реалізації якого використано Python. Наведено результати дослідження україномовних текстів та їх статистичні профілі, продемонстровано, що розроблена інформаційна технологія забезпечує опрацювання україномовних текстів з високим рівнем автоматизації. Отримані результати можна розглядати як внесок у розвиток наукових досліджень у лінгвістиці, завдяки якому створюються умови для вивчення авторських текстів різного стилю та ефективного використання професійних навичок та знань широким колом користувачів.
dc.description.abstract	The paper addresses the pressing task of improving professional software for statistical text analysis in line with the needs of specialists. The features and prospects of statistical research in linguistics are analyzed, and an information technology (IT) for determining the statistical profile of Ukrainian-language texts is developed. The software system was modelled comprehensively and is presented in schemes and diagrams that together reflect the functioning and purpose of the developed product. The mathematical and system foundations of statistical analysis for automating the professional processing of Ukrainian-language texts are considered in the context of introducing the proposed information technology. A structural scheme of the design solution is constructed, and the main hardware requirements are defined. The components of the information technology are developed, and a structure of the software system is proposed; both are based on the modular principle. Mathematical support for the IT has been developed; it is based on methods of applied statistics and makes it possible to determine the main characteristics (the statistical profile) of the studied Ukrainian-language texts. In addition, algorithms and software for the IT have been developed and implemented in Python. Results of the study of Ukrainian-language texts and their statistical profiles are given, and it is shown that the developed information technology processes Ukrainian-language texts with a high level of automation. The results can be considered a contribution to linguistic research, creating conditions for the study of authors' texts of different styles and for the effective use of professional skills and knowledge by a wide range of users.
The scientific novelty of the work is a model for automated determination of the statistical profile of Ukrainian-language texts, which enables comprehensive study of a corpus of Ukrainian-language texts. The results are also of practical significance: the structural scheme of the IT has been developed, software tools for automating the determination of the statistical profile of Ukrainian-language texts have been implemented, and the results of text analysis have been examined.
dc.format.extent	37-43
dc.format.pages	7
dc.identifier.citation	Моделі та засоби автоматизованого визначення статистичного профілю україномовних текстів / В. М. Теслюк, І. Я. Казимира, Ю. М. Кордіяка, І. Р. Рибак // Український журнал інформаційних технологій. — Львів : Видавництво Львівської політехніки, 2022. — Том 4. — № 1. — С. 37–43.
dc.identifier.citationen	Models and tools for automated determining the statistical profile of Ukrainian-language texts / V. M. Teslyuk, I. Kazymyra, Yu. M. Kordiiaka, I. R. Rybak // Ukrainian Journal of Information Technology. — Lviv : Lviv Politechnic Publishing House, 2022. — Vol 4. — No 1. — P. 37–43.
dc.identifier.doi	doi.org/10.23939/ujit2022.01.037
dc.identifier.issn	2707-1898
dc.identifier.uri	https://ena.lpnu.ua/handle/ntb/61520
dc.language.iso	uk
dc.publisher	Видавництво Львівської політехніки
dc.publisher	Lviv Politechnic Publishing House
dc.relation.ispartof	Український журнал інформаційних технологій, 1 (4), 2022
dc.relation.ispartof	Ukrainian Journal of Information Technology, 1 (4), 2022
dc.relation.references	[1] Bisikalo, O. V., & Kravchuk, I. A. (2010, November). Analysis of the morphological structure of the word based on the associative-statistical approach. Journal of Vinnytsia Polytechnic Institute, 4, 134–136. Retrieved from: www.visnyk.vntu.edu.ua/index.php/visnyk/article/view/1495
dc.relation.references	[2] Buk, S. N., & Rovenchak, A. A. (2004). Rank-Frequency Analysis for Functional Style Corpora of Ukrainian. Journal of Quantitative Linguistics, 11(3), 161–71. https://doi.org/10.1080/0929617042000314912
dc.relation.references	[3] Grabar, N., & Thierry, H. (2017, April). Creation of a multilingual aligned corpus with Ukrainian as the target language and its exploitation. Computational linguistics and intelligent systems (COLINS 2017): proceedings of the 1st International conference, National Technical University "KhPI", 10–19. Retrieved from: http://ena.lp.edu.ua:8080/handle/ntb/39454
dc.relation.references	[4] Grodniewicz, J. P. (2021). The process of linguistic understanding. Synthese, 198, 11463–11481. https://doi.org/10.1007/s11229-020-02807-9
dc.relation.references	[5] Hlushchenko, V. A. (2010). Linguistic method and its structure. Linguistics, 6, 32–44. Retrieved from: http://nbuv.gov.ua/UJRN/MoZn_2010_6_5
dc.relation.references	[6] Hlybovets, A. M., & Tochytsky, V. V. (2017). Algorithm of tokenization and stemming for texts in Ukrainian. NaUKMA Research Papers Computer Science, 198, 4–8. Retrieved from: http://nbuv.gov.ua/UJRN/NaUKMAkn_2017_198_4
dc.relation.references	[7] Hoherchak, H., Darchuk, N., & Kryvyi, S. (2021). Representation, Analysis, and Extraction of Knowledge from Unstructured Natural Language Texts. Cybern Syst Anal, 57, 481–500. https://doi.org/10.1007/s10559-021-00373-7
dc.relation.references	[8] Khomytska, I. Y., Teslyuk, V. M., Bazylevych, I. B., & Beregovskyi, V. V. (2020). The statistical models and software for authorial style differentiation in english prose. Scientific Bulletin of UNFU, 30(5), 135–139. https://doi.org/10.36930/40300522
dc.relation.references	[9] Lawson, A. E., Oehrtman, M., & Jensen, J. (2008). Connecting Science and Mathematics: The Nature of Scientific and Statistical Hypothesis Testing. Int J of Sci and Math Educ, 6, 405–416. https://doi.org/10.1007/s10763-007-9108-5
dc.relation.references	[10] Levchenko, O., & Dilai, M. (2021). A Method of Automated Corpus-Based Identification of Metaphors for Compiling a Dictionary of Metaphors: A Case Study of the Emotion Conceptual Domain. 2021 IEEE 16th International Conference on Computer Sciences and Information Technologies (CSIT), 52–55. https://doi.org/10.1109/CSIT52700.2021.9648667
dc.relation.references	[11] Levchenko, O., Holtvian, V., & Dilai, M. (2021). Statistical profiles of Ukrainian prose fiction: Gender aspect. 2021 IEEE 16th International Conference on Computer Sciences and Information Technologies (CSIT), 97–100. https://doi.org/10.1109/CSIT52700.2021.9648668
dc.relation.references	[12] Levchenko, O., Tyshchenko, O., & Dilai, M. (2021). Automated identification of metaphors in annotated corpus (Based on substance terms). CEUR Workshop Proceedings, 2870(3), 16–31. Retrieved from: http://ceur-ws.org/Vol-2870/paper3.pdf
dc.relation.references	[13] Lupenko, S. A., Khomiv, B. A., & Sverstyuk, A. S. (2011). Comparative analysis of mathematical models, methods and methods for evaluating opinions in text data from Internet resources. Bulletin of Khmelnytsky National University, 6, 7–16. Retrieved from: http://journals.khnu.km.ua/vestnik/zmisthtm/2011-6-t.htm
dc.relation.references	[14] Lytvyn, V., Vysotska, V., Uhryn, D., Hrendus, M., & Naum, O. (2018). Analysis of statistical methods for stable combinations determination of keywords identification. Eastern-European Journal of Enterprise Technologies, 2 (2 (92)), 23–37. https://doi.org/10.15587/1729-4061.2018.126009
dc.relation.references	[15] Nikonenko, A. O. (2012). Review of computer-linguistic methods of processing natural language texts. Artificial Intelligence, 4, 235–244. Retrieved from: http://dspace.nbuv.gov.ua/handle/123456789/57737
dc.relation.references	[16] Ostapova, I. V., Shirokov, V. A., Luchik, A. A., & Yablochkov, N. M. The study of the functioning of word equivalents in the text on the material of the Ukrainian National Linguistic Corpus. Speech Technology, (1-2), 114–120.
dc.relation.references	[17] Parshak, K. D. (2014). Text as an object of linguistic research. Scientific journal of M. P. Dragomanov National Pedagogical University. Series 10: Problems of grammar and lexicology of the Ukrainian language, 11, 196–199. Retrieved from: http://nbuv.gov.ua/UJRN/Nchnpu_10_2014_11_46
dc.relation.references	[18] Perebyinis, V. S. (1967). Statistical style settings. Kyiv: Naukova Dumka.
dc.relation.references	[19] Romaniuk, S. (2015). Application of statistical methods in linguistic research. Scientific Proceedings of Ostroh Academy National University: Philology Series, 54, 134–137. Retrieved from: http://eprints.oa.edu.ua/id/eprint/4185
dc.relation.references	[20] Rovenchak, A., & Buk, S. (2011). Application of a quantum ensemble model to linguistic analysis. Physica A: Statistical Mechanics and its Applications, 390(7), 1326–1331. https://doi.org/10.1016/j.physa.2010.12.009
dc.relation.references	[21] Shyrokov, V., Ostapova, I., & Yakymenko, K. (2014). Indexing the etymological lexicographic systems. Cognitive Studies. Warsaw : SOW Publishing House, 13–23. https://doi.org/10.11649/cs.2014.001
dc.relation.references	[22] Tkachenko, O., & Humeniuk, M. (2020). Aspects of visualization of statistical and scientific data. Digital platform: information technologies in the socio-cultural sphere, 3(2), 134–147. https://doi.org/10.31866/2617-796x.3.2.2020.220584
dc.relation.references	[23] Zaiats, V. M., & Zaiats, M. M. (2010). Methods of comparing statistical characteristics in the formation of samples in linguistics. Journal of Lviv Polytechnic National University "Information Systems and Networks", 673, 296–305. Retrieved from: http://ena.lp.edu.ua:8080/bitstream/ntb/6753/1/33.pdf
dc.relation.uri	https://doi.org/10.1080/0929617042000314912
dc.relation.uri	http://ena.lp.edu.ua:8080/handle/ntb/39454
dc.relation.uri	https://doi.org/10.1007/s11229-020-02807-9
dc.relation.uri	http://nbuv.gov.ua/UJRN/MoZn_2010_6_5
dc.relation.uri	http://nbuv.gov.ua/UJRN/NaUKMAkn_2017_198_4
dc.relation.uri	https://doi.org/10.1007/s10559-021-00373-7
dc.relation.uri	https://doi.org/10.36930/40300522
dc.relation.uri	https://doi.org/10.1007/s10763-007-9108-5
dc.relation.uri	https://doi.org/10.1109/CSIT52700.2021.9648667
dc.relation.uri	https://doi.org/10.1109/CSIT52700.2021.9648668
dc.relation.uri	http://ceur-ws.org/Vol-2870/paper3.pdf
dc.relation.uri	http://journals.khnu.km.ua/vestnik/zmisthtm/2011-6-t.htm
dc.relation.uri	https://doi.org/10.15587/1729-4061.2018.126009
dc.relation.uri	http://dspace.nbuv.gov.ua/handle/123456789/57737
dc.relation.uri	http://nbuv.gov.ua/UJRN/Nchnpu_10_2014_11_46
dc.relation.uri	http://eprints.oa.edu.ua/id/eprint/4185
dc.relation.uri	https://doi.org/10.1016/j.physa.2010.12.009
dc.relation.uri	https://doi.org/10.11649/cs.2014.001
dc.relation.uri	https://doi.org/10.31866/2617-796x.3.2.2020.220584
dc.relation.uri	http://ena.lp.edu.ua:8080/bitstream/ntb/6753/1/33.pdf
dc.rights.holder	© Національний університет “Львівська політехніка”, 2022
dc.subject	опрацювання даних
dc.subject	статистичний аналіз
dc.subject	лінгвістика тексту
dc.subject	інформаційна система
dc.subject	автоматизація
dc.subject	data processing
dc.subject	statistical analysis
dc.subject	linguistics of the text
dc.subject	information system
dc.subject	automation
dc.title	Моделі та засоби автоматизованого визначення статистичного профілю україномовних текстів
dc.title.alternative	Models and tools for automated determining the statistical profile of Ukrainian-language texts
dc.type	Article
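
The abstract states that the statistical profile is computed with methods of applied statistics and that the software is written in Python, but this record does not list the specific characteristics or algorithms. The following is therefore only a minimal illustrative sketch, not the authors' implementation: the function name statistical_profile, the tokenization rule, and the chosen characteristics (token and type counts, type-token ratio, mean word and sentence length, most frequent words) are all assumptions.

```python
import re
from collections import Counter
from statistics import mean

# Assumed tokenization rule: runs of lowercase Ukrainian letters, optionally
# joined by an apostrophe or hyphen. Not taken from the paper.
LETTERS = "а-щьюяґєії"
WORD_RE = re.compile(rf"[{LETTERS}]+(?:['’ʼ-][{LETTERS}]+)*")
# Assumed sentence boundary: one or more terminal punctuation marks.
SENT_RE = re.compile(r"[.!?…]+")


def statistical_profile(text: str, top_n: int = 3) -> dict:
    """Return a few simple descriptive statistics for a Ukrainian text."""
    words = WORD_RE.findall(text.lower())
    # Keep only fragments that contain at least one word.
    sentences = [s for s in SENT_RE.split(text) if WORD_RE.search(s.lower())]
    freq = Counter(words)
    return {
        "tokens": len(words),
        "types": len(freq),
        "type_token_ratio": len(freq) / len(words) if words else 0.0,
        "mean_word_length": mean(len(w) for w in words) if words else 0.0,
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "top_words": freq.most_common(top_n),
    }


if __name__ == "__main__":
    sample = "Мова є мовою. Мова живе!"
    for name, value in statistical_profile(sample, top_n=1).items():
        print(f"{name}: {value}")
```

A real profile would likely work on lemmas rather than surface forms, which would require a Ukrainian morphological analyzer; that step is left out here to keep the sketch dependent only on the standard library.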

Collections

Ukrainian Journal of Information Technology. – 2022. – Vol. 4, No. 1