The method and the model for processing textual information on a learned transformer for information-retrieval system

Литвин, Василь; Тимчук, Володимир; Lytvyn, Vasyl; Tymchuk, Volodymyr

doi:10.23939/sisn2023.14.210

dc.citation.epage	224
dc.citation.issue	14
dc.citation.journalTitle	Вісник Національного університету “Львівська політехніка”. Серія: Інформаційні системи та мережі
dc.citation.spage	210
dc.contributor.affiliation	Національний університет “Львівська політехніка”
dc.contributor.affiliation	Національна академія сухопутних військ імені гетьмана Петра Сагайдачного
dc.contributor.affiliation	Lviv Polytechnic National University
dc.contributor.affiliation	Hetman Petro Sahaidachnyi National Army Academy
dc.contributor.author	Литвин, Василь
dc.contributor.author	Тимчук, Володимир
dc.contributor.author	Lytvyn, Vasyl
dc.contributor.author	Tymchuk, Volodymyr
dc.coverage.placename	Львів
dc.coverage.placename	Lviv
dc.date.accessioned	2025-09-12T07:21:53Z
dc.date.created	2023-02-28
dc.date.issued	2023-02-28
dc.description.abstract	An unordered knowledge base is formed from various sets of non-standardized documents. In a decision support system, timely access to information in the knowledge base is key. The paper describes a model of an information-retrieval system for working with a set of knowledge presented in PDF format, one of the main formats in military-specialized knowledge bases. The model is built on a trained transformer and supports cross-language translation; together these constitute a method for processing textual information.
dc.description.abstract	Forming a knowledge base has traditionally been a complicated problem. Many kinds of objects can be used to form a knowledge base, and they may differ in structure, format, data representation, and language, so simply combining them is neither effective nor suitable. In the general case, the result is an unordered knowledge base: it contains uncategorized documents in different formats, which require special, case-specific approaches to the recognition, systematization, and further processing of their textual information. This is why automating every stage of processing is complicated. Naturally, this limits decision support systems, especially in military and other applications where time is a key factor and quick, exact access to the knowledge base is required. In this paper we analyze these limitations and the conditions for forming a knowledge base. We show that the ontology of a knowledge base, in both the general and specific cases, includes operations such as data collection, data regularization, knowledge extraction, conversion of data into a matrix representation, language processing, tokenization, output generation for a query, and machine learning for optimizing the information-retrieval system. The paper proposes a model of an information-retrieval system for a knowledge base consisting of widely used PDF documents. We built the model using an open pre-trained transformer and the LlamaIndex framework to reduce the time demands of the information-retrieval system. We also included language-processing models that translate domain-specific textual information from Ukrainian into English and back. As a result, we obtained a method and a model for processing textual information from Ukrainian-language PDF documents that could be effective in any decision support system.
The method provides reading, tokenization, translation, analysis, and retrieval-based answer generation for data in Ukrainian. The model produced simple, stable, and accurate results, but it also has disadvantages, among them long installation/compilation time and minor language errors. These results encourage us to continue the research and to collect statistics for a more rigorous evaluation of the model.
dc.format.extent	210-224
dc.format.pages	15
dc.identifier.citation	Литвин В. Метод і модель опрацювання текстової інформації на навченому трансформері для бази знань / Василь Литвин, Володимир Тимчук // Вісник Національного університету “Львівська політехніка”. Серія: Інформаційні системи та мережі. — Львів : Видавництво Львівської політехніки, 2023. — № 14. — С. 210–224.
dc.identifier.citationen	Lytvyn V. The method and the model for processing textual information on a learned transformer for information-retrieval system / Vasyl Lytvyn, Volodymyr Tymchuk // Information Systems and Networks. Lviv : Lviv Polytechnic Publishing House, 2023. No. 14. P. 210–224.
dc.identifier.doi	10.23939/sisn2023.14.210
dc.identifier.uri	https://ena.lpnu.ua/handle/ntb/111705
dc.language.iso	uk
dc.publisher	Видавництво Львівської політехніки
dc.publisher	Lviv Polytechnic Publishing House
dc.relation.ispartof	Вісник Національного університету “Львівська політехніка”. Серія: Інформаційні системи та мережі, 14, 2023
dc.relation.ispartof	Information Systems and Networks, 14, 2023
dc.rights.holder	© Національний університет “Львівська політехніка”, 2023
dc.rights.holder	© Литвин В., Тимчук В., 2023
dc.subject	information processing system
dc.subject	method for language and text processing on a trained transformer
dc.subject	machine learning
dc.subject	ontology of databases
dc.subject	knowledge sets
dc.subject	deep learning machine in data-processing system
dc.subject	information-retrieval system
dc.subject	decision support system
dc.subject	method for processing textual information
dc.subject	ontology of knowledge base
dc.subject	extraction of knowledge
dc.subject.udc	004.89
dc.subject.udc	004.738.5
dc.subject.udc	004.415.3
dc.subject.udc	004.82(045)
dc.title	Метод і модель опрацювання текстової інформації на навченому трансформері для бази знань
dc.title.alternative	The method and the model for processing textual information on a learned transformer for information-retrieval system
dc.type	Article
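
The abstract lists the stages of the proposed pipeline: regularizing extracted text, tokenization, conversion into a matrix representation, and generating output for a query. The Python sketch below illustrates those stages under stated assumptions and is not the authors' implementation, which uses an open pre-trained transformer with LlamaIndex plus Ukrainian-English translation models. TF-IDF weighting and cosine similarity are swapped in for transformer embeddings so the example runs on the standard library alone, answer generation is reduced to returning the best-matching passages, and every function name (`normalize`, `tokenize`, `chunk`, `build_matrix`, `retrieve`) is hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: TF-IDF weighting stands in for the transformer
# embeddings described in the abstract; no translation step is modelled.

TOKEN_RE = re.compile(r"\w+")  # str patterns match Unicode, so Cyrillic letters count


def normalize(text: str) -> str:
    """Regularize PDF-extracted text: rejoin words hyphenated across a
    line break, then collapse runs of whitespace to single spaces."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens (Latin or Cyrillic letters, digits)."""
    return TOKEN_RE.findall(text.lower())


def chunk(text: str, size: int = 50) -> list[str]:
    """Split a document into passages of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_matrix(passages: list[str]):
    """Return the vocabulary, smoothed IDF weights and one TF-IDF row per passage."""
    counts = [Counter(tokenize(p)) for p in passages]
    vocab = sorted({t for c in counts for t in c})
    n = len(counts)
    idf = {t: math.log((1 + n) / (1 + sum(t in c for c in counts))) + 1 for t in vocab}
    rows = [[c[t] * idf[t] for t in vocab] for c in counts]
    return vocab, idf, rows


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; zero vectors score 0.0 instead of dividing by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages most similar to the query (ties keep input order)."""
    vocab, idf, rows = build_matrix(passages)
    q = Counter(tokenize(query))
    q_vec = [q[t] * idf[t] for t in vocab]  # query terms outside the vocabulary are ignored
    ranked = sorted(range(len(passages)), key=lambda i: cosine(q_vec, rows[i]), reverse=True)
    return [passages[i] for i in ranked[:k]]


if __name__ == "__main__":
    docs = [
        "Knowledge base of PDF documents for decision support.",
        "Transformer models tokenize and translate Ukrainian text.",
        normalize("База знань містить доку-\nменти у форматі PDF."),
    ]
    print(retrieve("документи PDF", docs, k=2))
```

Because `\w` in a Python `str` pattern matches Unicode letters, the same tokenizer handles Ukrainian and English passages. In the system described in the abstract, the trained transformer (used through LlamaIndex) would presumably take the place of the TF-IDF rows here, and the translation models would sit before and after retrieval; this record does not specify those details.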

Files

Original bundle

Name: 2023n14_Lytvyn_V-The_method_and_the_model_210-224.pdf
Size: 9.91 MB
Format: Adobe Portable Document Format

Name: 2023n14_Lytvyn_V-The_method_and_the_model_210-224__COVER.png
Size: 371.54 KB
Format: Portable Network Graphics

License bundle

Name: license.txt
Size: 1.82 KB
Format: Plain Text

Collections

Bulletin of Lviv Polytechnic National University. Information Systems and Networks. 2023. Issue 14