A statistical approach to coronavirus classification based on nucleotide distributions
| dc.citation.epage | 994 | |
| dc.citation.issue | 4 | |
| dc.citation.journalTitle | Математичне моделювання та обчислення | |
| dc.citation.spage | 987 | |
| dc.contributor.affiliation | Львівський національний університет імені Івана Франка | |
| dc.contributor.affiliation | Ivan Franko National University of Lviv | |
| dc.contributor.affiliation | SoftServe, Inc. | |
| dc.contributor.author | Гусєв, М. | |
| dc.contributor.author | Ровенчак, А. | |
| dc.contributor.author | Husiev, M. | |
| dc.contributor.author | Rovenchak, A. | |
| dc.coverage.placename | Львів | |
| dc.coverage.placename | Lviv | |
| dc.date.accessioned | 2026-03-02T07:29:08Z | |
| dc.date.created | 2024-02-24 | |
| dc.date.issued | 2024-02-24 | |
| dc.description.abstract | Метою цього дослідження є аналіз конкретних геномів, а саме РНК коронавірусів, на основі параметрів, отриманих із розподілу нуклеотидних послідовностей у їхніх РНК. Вірусна РНК була розділена на нуклеотидні послідовності, отримані шляхом зміни однієї нуклеотидної основи (аденін) на пробіл, причому порожні послідовності позначено як x. Для послідовностей побудовано статистичні спектри. Вони показали три чіткі піки, які були послідовними для досліджуваних видів. Розраховано параметри на основі ранґово-частотного розподілу отриманих нуклеотидних послідовностей, довжини послідовностей та деякі інші статистичні параметри. На підставі цих параметрів було визначено головні компоненти, які лягли в основу групування досліджуваних вірусів. Найбільш релевантні параметри сформували модель наївного класифікатора Баєса, що аналізує ймовірність належності вірусу до певної групи вірусів у моделі. | |
| dc.description.abstract | The objective of this study is to analyze specific genomes, namely the RNA of coronaviruses, based on parameters obtained from the distributions of nucleotide sequences in their RNA. The viral RNA was split into nucleotide sequences by replacing one nucleotide base (adenine) with a “whitespace”, with empty sequences denoted as “x”. Statistical spectra were constructed for these sequences. They exhibited three distinct peaks that were consistent across the studied species. Parameters based on the rank–frequency distributions of the obtained nucleotide sequences, sequence lengths, and some other statistical parameters were calculated. From these parameters, principal components were built, which served as the basis for grouping the studied viruses. The most relevant parameters formed the model of a naïve Bayes classifier, which estimates the probability that a virus belongs to a certain group of viruses in the model. | |
| dc.format.extent | 987-994 | |
| dc.format.pages | 8 | |
| dc.identifier.citation | Husiev M. A statistical approach to coronavirus classification based on nucleotide distributions / M. Husiev, A. Rovenchak // Mathematical Modeling and Computing. — Lviv : Lviv Politechnic Publishing House, 2024. — Vol 11. — No 4. — P. 987–994. | |
| dc.identifier.citationen | Husiev M. A statistical approach to coronavirus classification based on nucleotide distributions / M. Husiev, A. Rovenchak // Mathematical Modeling and Computing. — Lviv : Lviv Politechnic Publishing House, 2024. — Vol 11. — No 4. — P. 987–994. | |
| dc.identifier.doi | 10.23939/mmc2024.04.987 | |
| dc.identifier.uri | https://ena.lpnu.ua/handle/ntb/124711 | |
| dc.language.iso | en | |
| dc.publisher | Видавництво Львівської політехніки | |
| dc.publisher | Lviv Politechnic Publishing House | |
| dc.relation.ispartof | Математичне моделювання та обчислення, 4 (11), 2024 | |
| dc.relation.ispartof | Mathematical Modeling and Computing, 4 (11), 2024 | |
| dc.relation.references | [1] Artime O., De Domenico M. From the origin of life to pandemics: emergent phenomena in complex systems. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 380 (2227), 20200410 (2022). | |
| dc.relation.references | [2] Canfora G., Mercaldo F., Santone A. A novel classification technique based on formal methods. ACM Transactions on Knowledge Discovery from Data. 17 (8), 1–30 (2023). | |
| dc.relation.references | [3] Raman R., Gupta N., Jeppu Y. Framework for formal verification of machine learning based complex system-of-systems. Insight. 26 (1), 91–102 (2023). | |
| dc.relation.references | [4] Holovatch Y., Kenna R., Thurner S. Complex systems: physics beyond physics. European Journal of Physics. 38 (2), 023002 (2017). | |
| dc.relation.references | [5] Newman M. Networks. Oxford University Press; 2nd edition (2018). | |
| dc.relation.references | [6] Tabish M., Azim S., Hussain M. A., Rehman S. U., Sarwar T., Ishqi H. M. Bioinformatics approaches in studying microbial diversity. In: Malik A., Grohmann E., Alves M. (eds.) Management of Microbial Resources in the Environment, pp. 119–140. Springer, Dordrecht (2013). | |
| dc.relation.references | [7] Borkin L. J., Litvinchuk S. N., Rosanov Yu. M., Skorinov D. V. On cryptic species (an example of amphibians). Entomological Review. 84 (Suppl 1), S75–S98 (2004). | |
| dc.relation.references | [8] Husev M., Rovenchak A. On the verge of life: Distribution of nucleotide sequences in viral RNAs. Biosemiotics. 14 (2), 253–269 (2021). | |
| dc.relation.references | [9] Husev M., Rovenchak A. Parametrization of rank-frequency distributions of nucleotide sequences in virus RNAs. Visnyk Lviv Univ. Ser. Phys. 58, 72–84 (2021). | |
| dc.relation.references | [10] Looi M.-K. Covid-19: Scientists sound alarm over new BA.2.86 “Pirola” variant. BMJ. 2023, p1964 (2023). | |
| dc.relation.references | [11] Meo S. A., Meo A. S., Klonoff D. C. Omicron new variant BA.2.86 (Pirola): Epidemiological, biological, and clinical characteristics – a global data-based analysis. European Review for Medical and Pharmacological Sciences. 27 (19), 9470–9476 (2023). | |
| dc.relation.references | [12] Hemo M. K., Islam M. A. JN.1 as a new variant of COVID-19 – editorial. Annals of Medicine & Surgery. 86 (4), 1833–1835 (2024). | |
| dc.relation.references | [13] Abou-Nouh H., El Khomsi M. Viable control of COVID-19 spread with vaccination. Mathematical Modeling and Computing. 11 (1), 203–210 (2024). | |
| dc.relation.references | [14] Chen Yuzhou, Gel Y. R., Marathe M. V., Poor H. V. A simplicial epidemic model for COVID-19 spread analysis. Proceedings of the National Academy of Sciences. 121 (1), e2313171120 (2024). | |
| dc.relation.references | [15] Rovenchak A. Telling apart Felidae and Ursidae from the distribution of nucleotides in mitochondrial DNA. Modern Physics Letters B. 32 (05), 1850057 (2018). | |
| dc.relation.references | [16] Shannon C. E. A mathematical theory of communication. The Bell System Technical Journal. 27 (3), 379–423 (1948). | |
| dc.relation.references | [17] Kelih E., Antić G., Grzybek P., Stadlober E. Classification of author and/or genre? The impact of word length. In: Weihs C., Gaul W. (eds.), Classification – the Ubiquitous Challenge, pp. 498–505. Springer Verlag, Berlin–Heidelberg (2005). | |
| dc.relation.references | [18] Zörnig P., Kelih E., Fuks L. Classification of Serbian texts based on lexical characteristics and multivariate statistical analysis. Glottotheory. 7 (1), 41–66 (2016). | |
| dc.relation.references | [19] Rovenchak A., Rovenchak O. Quantifying comprehensibility of Christmas and Easter addresses from the Ukrainian Greek Catholic Church hierarchs. Glottometrics. 41, 57–66 (2018). | |
| dc.relation.references | [20] Rovenchak A. Approaches to the classification of complex systems: Words, texts, and more. In: Holovatch Yu. (ed.), Order, Disorder and Criticality, vol. 7, pp. 209–246. World Scientific (2023). | |
| dc.relation.references | [21] Chua K. C., Chandran V., Acharya U. R., Lim C. M. Application of higher order statistics/spectra in biomedical signals – A review. Medical Engineering & Physics. 32 (7), 679–689 (2010). | |
| dc.relation.references | [22] Bland M., Altman D. Statistics notes: Measurement error. BMJ. 312 (7047), 1654 (1996). | |
| dc.relation.references | [23] Tipping M. E., Bishop C. M. Probabilistic principal component analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology. 61 (3), 611–622 (1999). | |
| dc.relation.references | [24] Jolliffe I. T., Cadima J. Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 374 (2065), 20150202 (2016). | |
| dc.relation.references | [25] Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M., Duchesnay E. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 12, 2825–2830 (2011). | |
| dc.relation.references | [26] Principal component analysis (PCA). https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html. | |
| dc.relation.uri | https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html | |
| dc.rights.holder | © Національний університет “Львівська політехніка”, 2024 | |
| dc.subject | ранґово-частотний розподіл | |
| dc.subject | параметризація | |
| dc.subject | коронавірус | |
| dc.subject | статистичні спектри | |
| dc.subject | головні компоненти | |
| dc.subject | наївний класифікатор Баєса | |
| dc.subject | rank–frequency distribution | |
| dc.subject | parametrization | |
| dc.subject | coronavirus | |
| dc.subject | statistical spectra | |
| dc.subject | principal components | |
| dc.subject | naïve Bayes classifier | |
| dc.title | A statistical approach to coronavirus classification based on nucleotide distributions | |
| dc.title.alternative | Статистичний підхід до класифікації коронавірусів на основі розподілу нуклеотидів | |
| dc.type | Article |
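
The segmentation step described in the abstract (adenine treated as a separator, empty sequences written as “x”) can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_on_adenine` and `rank_frequency`, the handling of empty pieces at the very start or end of the genome, and the alphabetical tie-breaking between equally frequent sequences are all assumptions made for this example.

```python
from collections import Counter


def split_on_adenine(rna: str) -> list[str]:
    """Split a genome string at every adenine ('A').

    Empty pieces (two adjacent adenines) are written as 'x'.
    Assumption: empty pieces at the start or end are treated the same way.
    """
    pieces = rna.upper().split("A")
    return [piece if piece else "x" for piece in pieces]


def rank_frequency(tokens: list[str]) -> list[tuple[int, str, int]]:
    """Return (rank, sequence, count) triples, most frequent first.

    Assumption: ties in count are broken alphabetically.
    """
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, seq, n) for rank, (seq, n) in enumerate(ordered, start=1)]


# Toy input, not a real genome fragment.
tokens = split_on_adenine("UAAGCAU")
table = rank_frequency(tokens)
```

Parameters fitted to such a rank–frequency table, together with sequence-length statistics, would then be the kind of features passed to principal component analysis and a naïve Bayes classifier.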