Development of a unified output format for text parsers in the ontology construction system from text documents

Chornyi, Andrii; Dosyn, Dmytro

dc.contributor.affiliation	Lviv Polytechnic National University
dc.contributor.author	Chornyi, Andrii
dc.contributor.author	Dosyn, Dmytro
dc.coverage.placename	Lviv
dc.date.accessioned	2025-10-28T11:10:10Z
dc.date.issued	2025
dc.date.submitted	2025
dc.description.abstract	The challenge of effectively constructing ontologies from text documents remains unresolved, posing a critical gap in modern knowledge extraction methodologies. One of the primary obstacles is the lack of a standardized output format across different NLP tools, particularly text parsers, which serve as the first step in multi-stage knowledge extraction processes. Several widely used text parsers exist, and each excels at specific functions, so it is beneficial to combine multiple parsers for more comprehensive ontology construction. However, this approach introduces the issue of reconciling their disparate output formats. To address this challenge, we propose using a graph database to store parser outputs in a subject-predicate-object triple format, enabling integration and further processing through rule-based transformations expressed as SPARQL queries. A key advantage of this approach is the ability to execute new transformation rules dynamically, allowing for greater flexibility and efficiency in ontology generation. As part of our research, we developed an intelligent agent in Java capable of constructing semantic graphs from natural language text using a rule-based approach. The agent was used to evaluate how the execution time of a syntax-semantic transformation rule depends on text corpus size and data sample sizes. This evaluation was made possible by implementing first-level reflection for the studied transformation rule. The results demonstrate that our approach, standardizing parser outputs via a graph database, proves effective in terms of both computational complexity and processing speed. By streamlining the ontology construction process, our method paves the way for advanced automated learning of intelligent agents based on textual information, unlocking new possibilities for modern science in the realm of knowledge extraction and representation.
The lack of effective tools for constructing ontologies from text documents remains an unresolved problem. Solving it faces a number of challenges, in particular the absence of a single output data format across NLP tools, especially text parsers, which are the first link in a multi-stage knowledge extraction process. Several popular text parsers exist today, each with its own features and strengths in implementing particular functions. To build an ontology from text more effectively, it is advisable to use several text parsers, which raises the problem of reconciling the output formats of these NLP tools. To unify the output format of text parsers, we propose storing their output in a graph database as subject-predicate-object triples for further processing with rule-based transformations built on SPARQL queries. A significant advantage of this approach is that each new rule can be executed on the fly. Within the study, an intelligent agent was developed in Java that can build semantic graphs from natural language text using a rule-based approach. The agent was used to assess how the execution time of a syntactic-semantic transformation rule depends on the size of the text corpus and the sizes of the data samples. This assessment was made possible by first-level reflection implemented for the studied transformation rule. According to the results, the proposed approach of unifying text parser output using a graph database proved effective in terms of operation complexity and speed. The developed approach to building an ontology from text opens new horizons for automated learning of an intelligent agent based on textual information.
dc.format.pages	170-188
dc.identifier.citation	Chornyi A. Development of a unified output format for text parsers in the ontology construction system from text documents / Andrii Chornyi, Dmytro Dosyn // Journal of Lviv Polytechnic National University. Series: Information Systems and Networks. Lviv: Lviv Polytechnic Publishing House, 2025. No. 17. P. 170–188.
dc.identifier.uri	https://ena.lpnu.ua/handle/ntb/115420
dc.language.iso	en
dc.publisher	Lviv Polytechnic National University
dc.relation.uri	https://doi.org/10.23939/sisn2025.17.170
dc.subject	natural language processing, ontology, automatic ontology construction, automated learning, syntax-semantic patterns
dc.subject.udc	004.89
dc.title	Development of a unified output format for text parsers in the ontology construction system from text documents
dc.title.alternative	Розроблення єдиного формату вихідних даних для текстових парсерів в системі побудови онтології з текстових документів
dc.type	Article

Files

Original bundle

Name: maket25066219052025ves-173-191.pdf
Size: 605.42 KB
Format: Adobe Portable Document Format

Collections

Journal of Lviv Polytechnic National University. Information Systems and Networks. 2025. Issue 17