The method of primary processing of poorly structured medical data

Бичко, Дмитро; Шендрик, Віра; Парфененко, Юлія; Bychko, Dmytro; Shendryk, Vira; Parfenenko, Yuliia

https://doi.org/10.23939/sisn2020.08.001

dc.citation.epage	10
dc.citation.issue	8
dc.citation.journalTitle	Bulletin of Lviv Polytechnic National University. Information Systems and Networks
dc.citation.spage	1
dc.contributor.affiliation	Sumy State University
dc.contributor.author	Бичко, Дмитро
dc.contributor.author	Шендрик, Віра
dc.contributor.author	Парфененко, Юлія
dc.contributor.author	Bychko, Dmytro
dc.contributor.author	Shendryk, Vira
dc.contributor.author	Parfenenko, Yuliia
dc.coverage.placename	Lviv
dc.date.accessioned	2022-05-24T11:49:11Z
dc.date.available	2022-05-24T11:49:11Z
dc.date.created	2020-03-01
dc.date.issued	2020-03-01
dc.description.abstract	The article presents an approach to the primary processing of poorly structured textual data from medical protocols, which are stored and distributed as PDF files. The work is motivated by the lack of a universal structure for presenting medical protocols and of methods for processing them. The task of primary processing of clinical protocol data is solved using the example of a unified clinical protocol of primary, secondary (specialized) and tertiary (highly specialized) medical care. A method of primary data processing was developed to produce a clear structure of disease symptoms. As the first step in structuring clinical protocol data, the protocol information is divided into four basic parts, which speeds up its conversion to other formats. This step is implemented by an algorithm written in C#. The algorithm parses the content of a PDF file and converts it to a TXT file. The extracted text is then processed: the protocol text is analyzed syntactically, and the structural parts corresponding to the section headings are identified: title page; introduction; list of abbreviations used in the protocol; main part of the protocol; list of references. The disease name in a protocol is identified by comparing the protocol data with the list of disease names in the International Classification of Diseases, ICD-10. The headings "Introduction" and "List of abbreviations used in the protocol" and the main part of the protocol were analyzed, and an algorithm was proposed for removing low-information sections from the beginning of the protocol, for example, the literature sources. An algorithm was also developed for finding information in the main part of the protocol by processing the input data by tables, diagrams, headings, words, phrases and special characters.
As a result of the processing algorithms, a new clinical protocol file is generated that is approximately three times smaller than the original file. It contains only the meaningful information from the clinical protocol, which should speed up further work with the file, namely its use in a medical decision support system. A disease card built from a medical protocol is presented in JSON format.
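The processing steps summarised in the abstract (splitting the extracted text at section headings, discarding low-information sections, matching the disease name against ICD-10 names, and emitting a JSON disease card) can be illustrated with a minimal sketch. It is written in Python rather than the authors' C#, and it is not their implementation: it assumes the PDF has already been converted to plain text (that step is omitted), and the heading strings, section keys, the choice to search only the title page for the disease name, the toy ICD-10 excerpt and the JSON field names are all hypothetical.

```python
from __future__ import annotations

import json
import re

# Hypothetical heading lines mapped to section keys. A Ukrainian protocol would
# use its own Ukrainian headings; these English strings are placeholders.
SECTION_HEADINGS = {
    "introduction": "introduction",
    "list of abbreviations used in the protocol": "abbreviations",
    "main part of the protocol": "main_part",
    "list of references": "references",
}
# Sections treated as low-information and dropped (assumption: references only).
LOW_INFO_SECTIONS = {"references"}


def split_sections(text: str) -> dict[str, str]:
    """Split extracted protocol text at recognised heading lines.

    Text before the first recognised heading is assigned to "title_page".
    """
    sections: dict[str, list[str]] = {"title_page": []}
    current = "title_page"
    for line in text.splitlines():
        key = SECTION_HEADINGS.get(line.strip().lower())
        if key is None:
            sections[current].append(line)
        else:
            current = key
            sections.setdefault(current, [])
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def drop_low_information(sections: dict[str, str]) -> dict[str, str]:
    """Remove the sections listed in LOW_INFO_SECTIONS."""
    return {k: v for k, v in sections.items() if k not in LOW_INFO_SECTIONS}


def find_disease(text: str, icd10: dict[str, str]) -> tuple[str, str] | None:
    """Return (code, name) of the first ICD-10 name found in text, else None.

    Case-insensitive substring match after collapsing whitespace; longer
    names are tried first so a more specific name wins over a shorter one.
    """
    normalised = re.sub(r"\s+", " ", text).lower()
    for code, name in sorted(icd10.items(), key=lambda item: -len(item[1])):
        if name.lower() in normalised:
            return code, name
    return None


def disease_card(text: str, icd10: dict[str, str]) -> str:
    """Build a JSON disease card (hypothetical field names) from protocol text."""
    sections = drop_low_information(split_sections(text))
    match = find_disease(sections.get("title_page", ""), icd10)
    card = {
        "icd10_code": match[0] if match else None,
        "disease": match[1] if match else None,
        "sections": sections,
    }
    # ensure_ascii=False keeps Cyrillic text readable in the output.
    return json.dumps(card, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    sample = "\n".join([
        "UNIFIED CLINICAL PROTOCOL",
        "Type 2 diabetes mellitus",
        "Introduction",
        "Scope of the protocol.",
        "Main part of the protocol",
        "Symptoms: polyuria, thirst.",
        "List of references",
        "1. Example source.",
    ])
    # Toy ICD-10 excerpt for illustration only.
    toy_icd10 = {"E10": "Type 1 diabetes mellitus", "E11": "Type 2 diabetes mellitus"}
    print(disease_card(sample, toy_icd10))
```

Plain substring matching is only the simplest reading of "comparing the protocol data with the list of ICD-10 names"; trying longer names first avoids a generic name shadowing a more specific one. Passing `ensure_ascii=False` to `json.dumps` keeps Ukrainian text readable in the resulting card.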
dc.format.extent	1-10
dc.format.pages	10
dc.identifier.citation	Bychko D. The method of primary processing of poorly structured medical data / Dmytro Bychko, Vira Shendryk, Yuliia Parfenenko // Bulletin of Lviv Polytechnic National University. Information Systems and Networks. Lviv : Lviv Polytechnic Publishing House, 2020. No. 8. P. 1–10.
dc.identifier.citationen	Bychko D. The method of primary processing of poorly structured medical data / Dmytro Bychko, Vira Shendryk, Yuliia Parfenenko // Visnyk Natsionalnoho universytetu "Lvivska politekhnika". Informatsiini systemy ta merezhi. — Lviv : Vydavnytstvo Lvivskoi politekhniky, 2020. — No 8. — P. 1–10.
dc.identifier.doi	10.23939/sisn2020.08.001
dc.identifier.uri	https://ena.lpnu.ua/handle/ntb/56905
dc.language.iso	uk
dc.publisher	Lviv Polytechnic Publishing House
dc.relation.ispartof	Bulletin of Lviv Polytechnic National University. Information Systems and Networks, 8, 2020
dc.relation.references	1. Jensen, K., Soguero-Ruiz, C., Oyvind Mikalsen, K., Lindsetmo, R., Kouskoumvekaki, I., Girolami, M., Augestad, K. M. (2017). Analysis of free text in electronic health records for identification of cancer patient trajectories. Scientific Reports, 7(1). doi:10.1038/srep46226
dc.relation.references	2. Kung, R., Ma, A., Dever, J. B., Vadivelu, J., Cherk, E., Koola, J. D., Ho, S. B. (2015). Mo1043 a natural language processing Alogrithm for identification of patients with cirrhosis from electronic medical records. Gastroenterology, 148(4), S-1071–S-1072. doi:10.1016/s0016-5085(15)33662-3
dc.relation.references	3. Li, D., Azoulay, P., & Sampat, B. N. (2017). The applied value of public investments in biomedical research. Science, 356(6333), 78–81. doi:10.1126/science.aal0010
dc.relation.references	4. Patel, R., Lloyd, T., Jackson, R., Ball, M., Shetty, H., Broadbent, M., Taylor, M. (2015). Mood instability is a common feature of mental health disorders and is associated with poor clinical outcomes. BMJ Open, 5(5), e007504–e007504. doi:10.1136/bmjopen-2014-007504
dc.relation.references	5. Wi, C., Sohn, S., Rolfes, M. C., Seabright, A., Ryu, E., Voge, G., Juhn, Y. J. (2017). Application of a natural language processing algorithm to asthma ascertainment. An automated chart review. American Journal of Respiratory and Critical Care Medicine, 196(4), 430–437. doi:10.1164/rccm.201610-2006oc
dc.relation.references	6. Afzal, N., Sohn, S., Abram, S., Scott, C. G., Chaudhry, R., Liu, H., Arruda-Olson, A. M. (2017). Mining peripheral arterial disease cases from narrative clinical notes using natural language processing. Journal of Vascular Surgery, 65(6), 1753–1761. doi:10.1016/j.jvs.2016.11.031
dc.relation.references	7. O365devx. (n.d.). Working with XML Schemas in InfoPath. Technical documentation, API, and code examples | Microsoft Docs. https://docs.microsoft.com/en-us/office/client-developer/infopath/form-templates/working-with-xml-schemas-in-infopath
dc.relation.references	8. The Latest MML (Medical Markup Language) Version 2.3 – XML-Based Standard for Medical Data Exchange/Storage. (n.d.). ResearchGate. https://www.researchgate.net/publication/10675074_The_Latest_MML_Medical_Markup_Language_Version_23_-_XML-Based_Standard_for_Medical_Data_ExchangeStorage
dc.relation.references	9. Parsing PDF Files using iTextSharp (C#, .NET). (n.d.). Square PDF .NET. https://www.squarepdf.net/parsing-pdf-files-using-itextsharp
dc.rights.holder	© Lviv Polytechnic National University, 2020
dc.rights.holder	© Бичко Д. В., Шендрик В. В., Парфененко Ю. В., 2020
dc.subject	poorly structured medical data
dc.subject	clinical protocol
dc.subject	primary processing
dc.subject	natural-language texts
dc.subject	method
dc.subject	pseudocode
dc.subject.udc	004.421.6
dc.title	The method of primary processing of poorly structured medical data
dc.type	Article

Files

Original bundle

Name: 2020n8_Bychko_D-The_method_of_primary_processing_1-10.pdf
Size: 761.95 KB
Format: Adobe Portable Document Format

Name: 2020n8_Bychko_D-The_method_of_primary_processing_1-10__COVER.png
Size: 407.08 KB
Format: Portable Network Graphics

Collections

Bulletin of Lviv Polytechnic National University. Information Systems and Networks, 2020, Issue 8