Метод дедублікації та розподілу даних у хмарних сховищах під час резервного копіювання даних

Русин, Б. П.; Погрелюк, Л. В.; Висоцька, В. А.; Осипов, М. М.; Rusyn, Bohdan; Pohreliuk, Liubomyr; Vysotska, Victoria; Osypov, Mykhailo

dc.citation.epage	12
dc.citation.issue	6
dc.citation.journalTitle	Вісник Національного університету "Львівська політехніка". Інформаційні системи та мережі
dc.citation.spage	1
dc.contributor.affiliation	Фізико-механічний інститут імені Г. В. Карпенка НАН України
dc.contributor.affiliation	Національний університет “Львівська політехніка”
dc.contributor.affiliation	Karpenko Physico-Mechanical Institute of the NAS Ukraine
dc.contributor.affiliation	Lviv Polytechnic National University
dc.contributor.author	Русин, Б. П.
dc.contributor.author	Погрелюк, Л. В.
dc.contributor.author	Висоцька, В. А.
dc.contributor.author	Осипов, М. М.
dc.contributor.author	Rusyn, Bohdan
dc.contributor.author	Pohreliuk, Liubomyr
dc.contributor.author	Vysotska, Victoria
dc.contributor.author	Osypov, Mykhailo
dc.coverage.placename	Львів
dc.coverage.placename	Lviv
dc.date.accessioned	2020-03-25T08:37:19Z
dc.date.available	2020-03-25T08:37:19Z
dc.date.created	2019-02-26
dc.date.issued	2019-02-26
dc.description.abstract	Розроблено інтелектуальну систему дедублікації та поширення даних у хмарних сховищах. Сформоване програмне забезпечення має зручний інтерфейс, який дає змогу створювати резервні копії та відновлювати дані. Здійснено аналітичний огляд методологічних засад дослідження, проаналізовано різні підходи до резервного копіювання даних із використанням дедублікації та розподілу даних у хмарному сховищі, висвітлено їхні переваги та недоліки. Детально розглянуто переваги та недоліки сучасних технологій дедублікації даних. Цей аналіз довів ефективність розроблення та впровадження інтелектуальної системи дедублікації та розподілу даних у хмарному сховищі. Виконано систематичний аналіз предметної області. Сформульовано мету функціонування та розвитку системи, мету та місце функціонування системи, визначено очікувані ефекти від впровадження програмного продукту. Розроблено та детально описано концептуальну модель системи. Наведено детальні діаграми прецедентів, стану переходів, послідовностей, компонентів та класів, що разом дають змогу визначити поведінку системи, встановити та сформулювати необхідні бізнес-процеси. Проаналізовано (наведено недоліки та переваги використання різних підходів) та вибрано ефективні методи розв’язання задач: гібридна дедублікація на рівні блока, розбиття даних на основі цифрового відбитка Рабіна, розподіл даних на основі хеш-значень одиниці дублювання та використання розподіленого індексу. Під час аналізу розв’язків задач вибрано мову програмування Rust для написання клієнтської частини, мову програмування Scala для серверної частини, Akka для управління розподіленими обчисленнями та Amazon S3 як хмарне зберігання. Розроблено інтелектуальну систему дедублікації та розподілу даних у хмарному сховищі, здійснено опис програмного забезпечення, розглянуто етапи роботи користувача. Проведено тестування роботи спроєктованої системи та створено кілька контрольних зразків, проаналізовано результати.
dc.description.abstract	An intelligent system for data deduplication and distribution in cloud storage has been developed. The resulting software has a user-friendly interface for backing up and restoring data. An analytical review of the methodological foundations of the research is carried out; existing approaches to data backup that use deduplication and data distribution in cloud storage are analyzed, and their advantages and disadvantages are highlighted. The advantages and disadvantages of modern data deduplication technologies are considered in detail. This analysis demonstrated the effectiveness of developing and implementing an intelligent system for data deduplication and distribution in cloud storage. A systems analysis of the subject domain is performed. The purpose of the system's operation and development and its place of operation are formulated, and the expected effects of introducing the software product are determined. A conceptual model of the system is developed and described in detail. Detailed use case, state transition, sequence, component, and class diagrams are given, which together define the system's behavior and the necessary business processes. The advantages and disadvantages of different approaches are analyzed, and effective methods for solving the tasks are selected: hybrid block-level deduplication, data chunking based on the Rabin fingerprint, data distribution based on the hash values of the deduplication units, and the use of a distributed index. Based on this analysis, the Rust programming language is selected for the client side, Scala for the server side, Akka for managing distributed computation, and Amazon S3 as the cloud storage. The intelligent system for data deduplication and distribution in cloud storage is implemented, the software is described, and the steps of user operation are considered. The designed system is tested, several test cases are created, and the results are analyzed.
dc.format.extent	1-12
dc.format.pages	12
dc.identifier.citation	Метод дедублікації та розподілу даних у хмарних сховищах під час резервного копіювання даних / Б. П. Русин, Л. В. Погрелюк, В. А. Висоцька, М. М. Осипов // Вісник Національного університету "Львівська політехніка". Інформаційні системи та мережі. — Львів : Видавництво Львівської політехніки, 2019. — № 6. — С. 1–12.
dc.identifier.citationen	Method of data dedublication and distribution in cloud warehouses during data backup / Bohdan Rusyn, Liubomyr Pohreliuk, Victoria Vysotska, Mykhailo Osypov // Visnyk Natsionalnoho universytetu "Lvivska politekhnika". Informatsiini systemy ta merezhi. — Lviv : Lviv Politechnic Publishing House, 2019. — No 6. — P. 1–12.
dc.identifier.uri	https://ena.lpnu.ua/handle/ntb/47801
dc.language.iso	uk
dc.publisher	Видавництво Львівської політехніки
dc.publisher	Lviv Politechnic Publishing House
dc.relation.ispartof	Вісник Національного університету "Львівська політехніка". Інформаційні системи та мережі, 6, 2019
dc.relation.referencesen	1. Understanding Data Deduplication. (2018). Retrieved 28, 2019, from https://www.druva.com/understanding-data-deduplication
dc.relation.referencesen	2. Explaining deduplication rates and single-instance storage to clients. (2008). Retrieved 28, 2019, from https://searchitchannel.techtarget.com/tip/Explaining-deduplication-rates-and-single-instance-storage-to-clients
dc.relation.referencesen	3. Inline vs. post-processing deduplication appliances. (2008). Retrieved 28, 2019, from https://searchdatabackup.techtarget.com/tip/Inline-vs-post-processing-deduplication-appliances
dc.relation.referencesen	4. Introduction to Data Deduplication. (2008). Retrieved 28, 2019, from https://www.petri.com/datadeduplication-introduction
dc.relation.referencesen	5. Rabin, M. O. (1981). Fingerprinting by random polynomials : Center for Research in Computing Technology Harvard University Report – Harvard.
dc.relation.referencesen	6. Tanenbaum, A. S., & van Steen, M. (2017). Distributed Systems. Upper Saddle River : Pearson Prentice Hall.
dc.relation.referencesen	7. Amdahl, G. (1967). The validity of the single processor approach to achieving large-scale computing capabilities. Atlantic City : Proceedings of AFIPS.
dc.relation.referencesen	8. Using StorReduce for cloud-based data deduplication. (2008). Retrieved 28, 2019, from https://cloud.google.com/solutions/partners/storreduce-cloud-deduplication
dc.relation.referencesen	9. OpenDedup Overview. (2008). Retrieved 2019, from https://opendedup.org/odd/overview/
dc.relation.referencesen	10. Rumbaugh, J., Jacobson, I., & Booch, G. (1999). The unified modeling language reference manual. Addison Wesley Longman Inc.
dc.relation.referencesen	11. Rolling hash, Rabin Karp, palindromes, rsync and others. (2008). Retrieved 28, 2019, from https://www.infoarena.ro/blog/rolling-hash
dc.relation.referencesen	12. Vysotska, V., Chyrun, L., & Lytvyn, V. (2016). Methods based on ontologies for information resources processing. LAP Lambert Academic Publishing.
dc.relation.referencesen	13. Vysotska, V., & Shakhovska, N. (2018). Information technologies of gamification for training and recruitment. Saarbrücken, Germany: LAP LAMBERT Academic Publishing.
dc.relation.referencesen	14. Vysotska, V. (2008). Osoblyvosti proektuvannya ta vprovadzhennya system elektronnoyi komertsiyi.
dc.relation.referencesen	15. Vysotska, V., & Lytvyn, V. (2018). Web resources processing based on ontologies. Saarbrücken, Germany: LAP LAMBERT Academic Publishing.
dc.relation.referencesen	16. Vysotska, V. (2018). Tekhnolohiyi elektronnoyi komertsiyi ta Internet-marketynhu. Saarbrücken, Germany: LAP LAMBERT Academic Publishing.
dc.relation.referencesen	17. Vysotska, V. (2018). Internet systems design and development based on Web Mining and NLP. Saarbrücken, Germany: LAP LAMBERT Academic Publishing.
dc.relation.referencesen	18. Vysotska, V. (2018). Computer linguistics for online marketing in information technology: Monograph. Saarbrücken, Germany: LAP LAMBERT Academic Publishing.
dc.relation.referencesen	19. Lytvyn, V., Vysotska, V., Wojcik, W., & Dosyn, D. (2017). A method of construction of automated basic ontology. In Computational linguistics and intelligent systems (COLINS 2017). National Technical University “KhPI”.
dc.relation.referencesen	20. Lytvyn, V., Vysotska, V., Chyrun, L., Smolarz, A., & Naum, O. (2017). Intelligent system structure for Web resources processing and analysis. In Computational linguistics and intelligent systems (COLINS 2017). National Technical University “KhPI”.
dc.relation.referencesen	21. Berko, A., Vysotska, V., & Chyrun, L. (2014). Features of information resources processing in electronic content commerce. Applied Computer Science, 10.
dc.relation.referencesen	22. Berko, A., Vysotska, V., & Rishnyak, I. (2008). Metody ta zasoby otsinyuvannya ryzykiv bezpeky informatsiyi v systemakh elektronnoyi komertsiyi.
dc.relation.referencesen	23. Vysotska, V., & Chyrun, L. (2013). Web Content Processing Method for Electronic Business Systems. International Journal of Computers & Technology, 12(2), 3211–3220.
dc.relation.referencesen	24. Vysotska, V., Chyrun, L., & Chyrun, L. (2011). Modelyuvannya etapiv zhyttyevoho tsyklu komertsiynoho web-kontentu.
dc.relation.referencesen	25. Berko, A., Vysotska, V., & Chyrun, L. (2004). Alhorytmy opratsyuvannya informatsiynykh resursiv v systemakh elektronnoyi komertsiyi.
dc.relation.referencesen	26. Vysotska, V., & Chyrun, L. (2011). Commercial Web Content Lifecycle Model.
dc.relation.referencesen	27. Berko, A., & Vysotska, V. (2009). Proektuvannya navihatsiynoho hrafu web-storinok bazy danykh system elektronnoyi kontent-komertsiyi.
dc.relation.referencesen	28. Berko, A., & Vysotska, V. (2009). Semantychna intehratsiya nepovnykh ta netochnykh danykh. Systemy obrobky informatsiyi, (7), 93–98.
dc.relation.referencesen	29. Berko, A., & Vysotska, V. (2007). Modeli ta metody proektuvannya informatsiynykh system elektronnoyi komertsiyi. Avtomatyzyrovannye systemy upravlenyya y prybory avtomatyky, (138).
dc.relation.referencesen	30. Alekseeva, K., Berko, A., & Vysotska, V. (2015). Upravlinnya Web-resursamy za umov nevyznachenosti. Tekhnolohycheskyy audyt y rezervy proyzvodstva, (2 (2)), 4–7.
dc.relation.referencesen	31. Vysotska, V., & Chyrun, L. (2014). Designing features of architecture for e-commerce systems [Electronic resource]. MEST Journal, 2(1), 57–70.
dc.relation.referencesen	32. Vysotska, V., & Chyrun, L. (2014). Set-theoretic models and unified methods of information resources processing in e-business systems. Applied Computer Science, 10.
dc.relation.uri	https://www.druva.com/
dc.relation.uri	https://searchitchannel.techtarget.com/tip/Explaining-deduplication-rates-and-single-instance-storage-to-clients
dc.relation.uri	https://www.petri.com/datadeduplication-introduction
dc.relation.uri	https://opendedup.org/odd/overview/
dc.relation.uri	https://www.infoarena.ro/blog/rolling-hash
dc.relation.uri	https://searchdatabackup.techtarget.com/tip/Inline-vs-post-processing-deduplication-appliances
dc.relation.uri	https://cloud.google.com/solutions/partners/storreduce-cloud-deduplication
dc.rights.holder	© Національний університет “Львівська політехніка”, 2019
dc.rights.holder	© Русин Б. П., Погрелюк Л. В., Висоцька В. А., Осипов М. М., 2019
dc.subject	дедублікація даних
dc.subject	розподіл даних
dc.subject	хмарне середовище
dc.subject	cloud computing
dc.subject	алгоритм Рабіна
dc.subject	хешування даних
dc.subject	гібридна дедублікація
dc.subject	data deduplication
dc.subject	data distribution
dc.subject	cloud environment
dc.subject	cloud computing
dc.subject	Rabin algorithm
dc.subject	data hashing
dc.subject	hybrid deduplication
dc.subject.udc	004.9
dc.title	Метод дедублікації та розподілу даних у хмарних сховищах під час резервного копіювання даних
dc.title.alternative	Method of data dedublication and distribution in cloud warehouses during data backup
dc.type	Article

Files

Original bundle

Name: 2019n6_Rusyn_B-Method_of_data_dedublication_1-12.pdf
Size: 1.3 MB
Format: Adobe Portable Document Format

Name: 2019n6_Rusyn_B-Method_of_data_dedublication_1-12__COVER.png
Size: 423.37 KB
Format: Portable Network Graphics

Collections

Вісник Національного університету "Львівська політехніка". Інформаційні системи та мережі. – 2019. – Випуск 6