A statistical approach to coronavirus classification based on nucleotide distributions

dc.citation.epage994
dc.citation.issue4
dc.citation.journalTitleМатематичне моделювання та обчислення
dc.citation.spage987
dc.contributor.affiliationЛьвівський національний університет імені Івана Франка
dc.contributor.affiliationIvan Franko National University of Lviv
dc.contributor.affiliationSoftServe, Inc.
dc.contributor.authorГусєв, М.
dc.contributor.authorРовенчак, А.
dc.contributor.authorHusiev, M.
dc.contributor.authorRovenchak, A.
dc.coverage.placenameЛьвів
dc.coverage.placenameLviv
dc.date.accessioned2026-03-02T07:29:08Z
dc.date.created2024-02-24
dc.date.issued2024-02-24
dc.description.abstractМетою цього дослідження є аналіз конкретних геномів, а саме РНК коронавірусів, на основі параметрів, отриманих із розподілу нуклеотидних послідовностей у їхніх РНК. Вірусна РНК була розділена на нуклеотидні послідовності, отримані шляхом зміни однієї нуклеотидної основи (аденін) на пробіл, причому порожні послідовності позначено як x. Для послідовностей побудовано статистичні спектри. Вони показали три чіткі піки, які були послідовними для досліджуваних видів. Розраховано параметри на основі ранґово-частотного розподілу отриманих нуклеотидних послідовностей, довжини послідовностей та деякі інші статистичні параметри. На підставі цих параметрів було визначено головні компоненти, які лягли в основу групування досліджуваних вірусів. Найбільш релевантні параметри сформували модель наївного класифікатора Баєса, що аналізує ймовірність належності вірусу до певної групи вірусів у моделі.
dc.description.abstractThis study analyzes specific genomes, namely the RNA of coronaviruses, using parameters obtained from the distributions of nucleotide sequences in their RNA. The viral RNA was split into nucleotide sequences by replacing one nucleotide base (adenine) with a “whitespace”, with empty sequences denoted as “x”. Statistical spectra were constructed for these sequences. They exhibited three distinct peaks that were consistent across the studied species. Parameters based on the rank–frequency distributions of the obtained nucleotide sequences, on sequence lengths, and on some other statistical quantities were calculated. From these parameters, principal components were built, which served as the basis for grouping the studied viruses. The most relevant parameters were used to build a naïve Bayes classifier, which estimates the probability that a virus belongs to a given group of viruses in the model.
dc.format.extent987-994
dc.format.pages8
dc.identifier.citationHusiev M. A statistical approach to coronavirus classification based on nucleotide distributions / M. Husiev, A. Rovenchak // Mathematical Modeling and Computing. — Lviv : Lviv Politechnic Publishing House, 2024. — Vol 11. — No 4. — P. 987–994.
dc.identifier.doi10.23939/mmc2024.04.987
dc.identifier.urihttps://ena.lpnu.ua/handle/ntb/124711
dc.language.isoen
dc.publisherВидавництво Львівської політехніки
dc.publisherLviv Politechnic Publishing House
dc.relation.ispartofМатематичне моделювання та обчислення, 4 (11), 2024
dc.relation.ispartofMathematical Modeling and Computing, 4 (11), 2024
dc.relation.references[1] Artime O., De Domenico M. From the origin of life to pandemics: emergent phenomena in complex systems. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 380 (2227), 20200410 (2022).
dc.relation.references[2] Canfora G., Mercaldo F., Santone A. A novel classification technique based on formal methods. ACM Transactions on Knowledge Discovery from Data. 17 (8), 1–30 (2023).
dc.relation.references[3] Raman R., Gupta N., Jeppu Y. Framework for formal verification of machine learning based complex system-of-systems. Insight. 26 (1), 91–102 (2023).
dc.relation.references[4] Holovatch Y., Kenna R., Thurner S. Complex systems: physics beyond physics. European Journal of Physics. 38 (2), 023002 (2017).
dc.relation.references[5] Newman M. Networks. Oxford University Press; 2nd edition (2018).
dc.relation.references[6] Tabish M., Azim S., Hussain M. A., Rehman S. U., Sarwar T., Ishqi H. M. Bioinformatics approaches in studying microbial diversity. In: Malik A., Grohmann E., Alves M. (eds.) Management of Microbial Resources in the Environment, pp. 119–140. Springer, Dordrecht (2013).
dc.relation.references[7] Borkin L. J., Litvinchuk S. N., Rosanov Yu. M., Skorinov D. V. On cryptic species (an example of amphibians). Entomological Review. 84 (Suppl 1), S75–S98 (2004).
dc.relation.references[8] Husev M., Rovenchak A. On the verge of life: Distribution of nucleotide sequences in viral RNAs. Biosemiotics. 14 (2), 253–269 (2021).
dc.relation.references[9] Husev M., Rovenchak A. Parametrization of rank-frequency distributions of nucleotide sequences in virus RNAs. Visnyk Lviv Univ. Ser. Phys. 58, 72–84 (2021).
dc.relation.references[10] Looi M.-K. Covid-19: Scientists sound alarm over new BA.2.86 “Pirola” variant. BMJ. 2023, p1964 (2023).
dc.relation.references[11] Meo S. A., Meo A. S., Klonoff D. C. Omicron new variant BA.2.86 (Pirola): Epidemiological, biological, and clinical characteristics – a global data-based analysis. European Review for Medical and Pharmacological Sciences. 27 (19), 9470–9476 (2023).
dc.relation.references[12] Hemo M. K., Islam M. A. JN.1 as a new variant of COVID-19 – editorial. Annals of Medicine & Surgery. 86 (4), 1833–1835 (2024).
dc.relation.references[13] Abou-Nouh H., El Khomsi M. Viable control of COVID-19 spread with vaccination. Mathematical Modeling and Computing. 11 (1), 203–210 (2024).
dc.relation.references[14] Chen Yuzhou, Gel Y. R., Marathe M. V., Poor H. V. A simplicial epidemic model for COVID-19 spread analysis. Proceedings of the National Academy of Sciences. 121 (1), e2313171120 (2024).
dc.relation.references[15] Rovenchak A. Telling apart Felidae and Ursidae from the distribution of nucleotides in mitochondrial DNA. Modern Physics Letters B. 32 (05), 1850057 (2018).
dc.relation.references[16] Shannon C. E. A mathematical theory of communication. The Bell System Technical Journal. 27 (3), 379–423 (1948).
dc.relation.references[17] Kelih E., Antić G., Grzybek P., Stadlober E. Classification of author and/or genre? The impact of word length. In: Weihs C., Gaul W. (eds.), Classification – the Ubiquitous Challenge, pp. 498–505. Springer Verlag, Berlin–Heidelberg (2005).
dc.relation.references[18] Zörnig P., Kelih E., Fuks L. Classification of Serbian texts based on lexical characteristics and multivariate statistical analysis. Glottotheory. 7 (1), 41–66 (2016).
dc.relation.references[19] Rovenchak A., Rovenchak O. Quantifying comprehensibility of Christmas and Easter addresses from the Ukrainian Greek Catholic Church hierarchs. Glottometrics. 41, 57–66 (2018).
dc.relation.references[20] Rovenchak A. Approaches to the classification of complex systems: Words, texts, and more. In: Holovatch Yu. (ed.), Order, Disorder and Criticality, vol. 7, pp. 209–246. World Scientific (2023).
dc.relation.references[21] Chua K. C., Chandran V., Acharya U. R., Lim C. M. Application of higher order statistics/spectra in biomedical signals – A review. Medical Engineering & Physics. 32 (7), 679–689 (2010).
dc.relation.references[22] Bland M., Altman D. Statistics notes: Measurement error. BMJ. 312 (7047), 1654 (1996).
dc.relation.references[23] Tipping M. E., Bishop C. M. Probabilistic principal component analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology. 61 (3), 611–622 (1999).
dc.relation.references[24] Jolliffe I. T., Cadima J. Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 374 (2065), 20150202 (2016).
dc.relation.references[25] Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M., Duchesnay E. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 12, 2825–2830 (2011).
dc.relation.references[26] Principal component analysis (PCA). https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html.
dc.relation.urihttps://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
dc.rights.holder© Lviv Polytechnic National University, 2024
dc.subjectранґово-частотний розподіл
dc.subjectпараметризація
dc.subjectкоронавірус
dc.subjectстатистичні спектри
dc.subjectголовні компоненти
dc.subjectнаївний класифікатор Баєса
dc.subjectrank–frequency distribution
dc.subjectparametrization
dc.subjectcoronavirus
dc.subjectstatistical spectra
dc.subjectprincipal components
dc.subjectnaïve Bayes classifier
dc.titleA statistical approach to coronavirus classification based on nucleotide distributions
dc.title.alternativeСтатистичний підхід до класифікації коронавірусів на основі розподілу нуклеотидів
dc.typeArticle
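
Illustrative sketch (not part of the record). The abstract describes splitting viral RNA into sequences by replacing adenine with a "whitespace" and marking empty sequences as "x", then working with the rank–frequency distribution of the resulting sequences. The Python snippet below is one minimal way this segmentation step might look. The function names split_on_adenine and rank_frequency, the toy input string, the alphabetical tie-breaking, and the handling of pieces at the very start or end of the RNA are all assumptions made for illustration; the article itself should be consulted for the authors' actual procedure.

```python
from collections import Counter


def split_on_adenine(rna: str) -> list[str]:
    """Split a nucleotide string at every adenine (A).

    Empty pieces (e.g. between two adjacent adenines) are written as "x".
    Assumption: pieces at the start and end of the string are treated
    the same way as interior ones.
    """
    pieces = rna.upper().split("A")
    return [piece if piece else "x" for piece in pieces]


def rank_frequency(tokens: list[str]) -> list[tuple[str, int]]:
    """Return (sequence, count) pairs ordered by descending count.

    Assumption: ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy_rna = "GCAAUCAG"  # hypothetical toy input, not real viral data
    tokens = split_on_adenine(toy_rna)
    print(tokens)  # ['GC', 'x', 'UC', 'G']
    for rank, (sequence, count) in enumerate(rank_frequency(tokens), start=1):
        print(rank, sequence, count)
```

Parameters derived from such a ranking (together with sequence lengths and other statistics) are what the abstract says were passed on to principal component analysis and the naïve Bayes classifier; those later steps are not sketched here.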

Files

Original bundle

Name: 2024v11n4_Husiev_M-A_statistical_approach_to_987-994.pdf
Size: 1.17 MB
Format: Adobe Portable Document Format

Name: 2024v11n4_Husiev_M-A_statistical_approach_to_987-994__COVER.png
Size: 456.86 KB
Format: Portable Network Graphics

License bundle

Name: license.txt
Size: 1.79 KB
Format: Plain Text