Implementation of an Apache Spark computing cluster based on Raspberry PI microcomputers

dc.citation.epage: 97
dc.citation.issue: 2
dc.citation.journalTitle: Measuring Equipment and Metrology
dc.citation.spage: 92
dc.contributor.affiliation: Lviv Polytechnic National University
dc.contributor.affiliation: Lviv Polytechnic National University
dc.contributor.author: Vlakh-Vyhrynovska, Halyna
dc.contributor.author: Boretskyi, Bohdan
dc.coverage.placename: Lviv
dc.date.accessioned: 2025-11-25T13:14:00Z
dc.date.created: 2025-06-20
dc.date.issued: 2025-06-20
dc.description.abstract: The paper presents the implementation of an Apache Spark distributed computing cluster based on Raspberry Pi microcomputers. The solution consists of three Raspberry Pi 4 devices (one master node and two worker nodes), each equipped with 8 GB of RAM and a high-speed network connection. The cluster configuration was optimized by adjusting the SPARK_WORKER_MEMORY and SPARK_WORKER_CORES parameters to maximize the use of available hardware resources. Secure communication between nodes was established through authentication with 4096-bit SSH keys. The cluster was verified with a test application that demonstrated efficient distribution of the computational load across nodes. The developed solution costs $400, one quarter of the cost of using equivalent cloud resources for one year. The results show that the Raspberry Pi cluster provides all the capabilities needed for practical learning of distributed computing technologies, offering physical access to all system components at a low cost.
dc.format.extent: 92-97
dc.format.pages: 6
dc.identifier.citation: Vlakh-Vyhrynovska H. Implementation of an Apache Spark computing cluster based on Raspberry PI microcomputers / Halyna Vlakh-Vyhrynovska, Bohdan Boretskyi // Measuring Equipment and Metrology. — Lviv : Lviv Polytechnic Publishing House, 2025. — Vol 86. — No 2. — P. 92–97.
dc.identifier.citation2015: Vlakh-Vyhrynovska H., Boretskyi B. Implementation of an Apache Spark computing cluster based on Raspberry PI microcomputers // Measuring Equipment and Metrology, Lviv. 2025. Vol 86. No 2. P. 92–97.
dc.identifier.citationenAPA: Vlakh-Vyhrynovska, H., & Boretskyi, B. (2025). Implementation of an Apache Spark computing cluster based on Raspberry PI microcomputers. Measuring Equipment and Metrology, 86(2), 92-97. Lviv Polytechnic Publishing House.
dc.identifier.citationenCHICAGO: Vlakh-Vyhrynovska H., Boretskyi B. (2025) Implementation of an Apache Spark computing cluster based on Raspberry PI microcomputers. Measuring Equipment and Metrology (Lviv), vol. 86, no 2, pp. 92-97.
dc.identifier.doi: https://doi.org/10.23939/istcmtm2025.02.092
dc.identifier.uri: https://ena.lpnu.ua/handle/ntb/121871
dc.language.iso: en
dc.publisher: Lviv Polytechnic Publishing House
dc.relation.ispartof: Measuring Equipment and Metrology, 2 (86), 2025
dc.relation.referencesen: [1] F. Dai, M. A. Hossain, and Y. Wang, “State of the Art in Parallel and Distributed Systems: Emerging Trends and Challenges”, Electronics, vol. 14, no. 4, p. 677, Feb. 2025. DOI: 10.3390/electronics14040677.
dc.relation.referencesen: [2] V. Thesma, G. C. Rains, and J. Mohammadpour Velni, “Development of a Low-Cost Distributed Computing Pipeline for High-Throughput Cotton Phenotyping”, Sensors, vol. 24, no. 3, p. 970, Feb. 2024. DOI: 10.3390/s24030970.
dc.relation.referencesen: [3] A. Alakuu and D. K. Dake, “Cloud Computing in Education: A review of Architecture, Applications, and Integration Challenges”, IJCA, vol. 186, no. 66, pp. 49–65, Feb. 2025. DOI: 10.5120/ijca2025924472.
dc.relation.referencesen: [4] S. Younus, K. Kumar, I. A. Kandhro, A. A. Laghari, and A. Ali, “Systematic Analysis of On Premise and Cloud Services”, IJCC, vol. 13, no. 3, p. 10063641, 2024. DOI: 10.1504/IJCC.2024.10063641.
dc.relation.referencesen: [5] A. A. Abdulle, A. Farah Ali, and R. H. Abdullah, “Cost-Benefit Analysis of Public Cloud Versus In-House Computing”, IJETT, vol. 70, no. 6, pp. 300–307, Jun. 2022. DOI: 10.14445/22315381/IJETT-V70I6P231.
dc.relation.referencesen: [6] A. Katal, S. Dahiya, and T. Choudhury, “Energy efficiency in cloud computing data centers: a survey on software technologies”, Cluster Comput, vol. 26, no. 3, pp. 1845–1875, Jun. 2023. DOI: 10.1007/s10586-022-03713-0.
dc.relation.referencesen: [7] G. Agapito and M. Cannataro, “An Overview on the Challenges and Limitations Using Cloud Computing in Healthcare Corporations”, BDCC, vol. 7, no. 2, p. 68, Apr. 2023. DOI: 10.3390/bdcc7020068.
dc.relation.referencesen: [8] P. K. Donta, I. Murturi, V. Casamayor Pujol, B. Sedlak, and S. Dustdar, “Exploring the Potential of Distributed Computing Continuum Systems”, Computers, vol. 12, no. 10, p. 198, Oct. 2023. DOI: 10.3390/computers12100198.
dc.relation.referencesen: [9] “Spark Overview.” Apache Software Foundation [Online]. Available: https://spark.apache.org/docs/latest/
dc.relation.referencesen: [10] P. Sewal and H. Singh, “Performance Comparison of Apache Spark and Hadoop for Machine Learning based iterative GBTR on HIGGS and Covid-19 Datasets”, SCPE, vol. 25, no. 3, pp. 1373–1386, Apr. 2024. DOI: 10.12694/scpe.v25i3.2687.
dc.relation.referencesen: [11] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster Computing with Working Sets”, 2010 [Online]. Available: https://www.usenix.org/legacy/event/hotcloud10/tech/full_papers/Zaharia.pdf
dc.relation.referencesen: [12] N. Ahmed, A. L. C. Barczak, M. A. Rashid, and T. Susnjak, “A parallelization model for performance characterization of Spark Big Data jobs on Hadoop clusters”, J. Big Data, vol. 8, no. 1, p. 107, Dec. 2021. DOI: 10.1186/s40537-021-00499-7.
dc.relation.referencesen: [13] Z.-D. Zhang et al., “TopADDPi: An Affordable and Sustainable Raspberry Pi Cluster for Parallel-Computing Topology Optimization”, Processes, vol. 13, no. 3, p. 633, Feb. 2025. DOI: 10.3390/pr13030633.
dc.relation.referencesen: [14] M. Cloutier, C. Paradis, and V. Weaver, “A Raspberry Pi Cluster Instrumented for Fine-Grained Power Measurement”, Electronics, vol. 5, no. 4, p. 61, Sep. 2016. DOI: 10.3390/electronics5040061.
dc.relation.referencesen: [15] E. Shoop, S. J. Matthews, R. Brown, and J. C. Adams, “Hands-on parallel & distributed computing with Raspberry Pi devices and clusters”, Journal of Parallel and Distributed Computing, vol. 196, p. 104996, Feb. 2025. DOI: 10.1016/j.jpdc.2024.104996.
dc.relation.referencesen: [16] “Spark Configuration.” Apache Software Foundation [Online]. Available: https://spark.apache.org/docs/latest/configuration.html
dc.relation.referencesen: [17] “Amazon EC2 On-Demand Pricing.” AWS [Online]. Available: https://aws.amazon.com/ec2/pricing/on-demand/
dc.relation.uri: https://spark.apache.org/docs/latest/
dc.relation.uri: https://www.usenix.org/
dc.relation.uri: https://aws.amazon.com/ec2/pricing/on-demand/
dc.rights.holder: © Lviv Polytechnic National University, 2025
dc.subject: Apache Spark
dc.subject: distributed computing
dc.subject: Raspberry Pi
dc.subject: microcomputers
dc.subject: cluster
dc.subject: big data processing
dc.title: Implementation of an Apache Spark computing cluster based on Raspberry PI microcomputers
dc.type: Article
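
Illustrative note on the worker settings named in the abstract. The abstract says the cluster was tuned through SPARK_WORKER_MEMORY and SPARK_WORKER_CORES, but this record does not give the values the authors chose. The sketch below only shows how such values might be derived for one Raspberry Pi 4 worker with 8 GB of RAM. The helper name spark_env_lines, the 1 GB reserved for the operating system, and the use of all 4 CPU cores are assumptions made for illustration, not figures from the paper.

```python
def spark_env_lines(total_ram_gb: int, cpu_cores: int, reserved_ram_gb: int = 1) -> list[str]:
    """Build spark-env.sh lines for one Spark standalone worker.

    reserved_ram_gb is an assumed safety margin left to the operating
    system; the paper's actual setting is not stated in this record.
    """
    if cpu_cores < 1:
        raise ValueError("a worker needs at least one core")
    worker_ram_gb = total_ram_gb - reserved_ram_gb
    if worker_ram_gb < 1:
        raise ValueError("not enough RAM left for the worker")
    return [
        # Total memory the worker may hand out to executors, e.g. "7g".
        f"export SPARK_WORKER_MEMORY={worker_ram_gb}g",
        # Number of cores the worker may hand out to executors.
        f"export SPARK_WORKER_CORES={cpu_cores}",
    ]


if __name__ == "__main__":
    # Raspberry Pi 4 with 8 GB RAM (from the abstract) and 4 cores.
    for line in spark_env_lines(total_ram_gb=8, cpu_cores=4):
        print(line)
```

The printed lines would go into conf/spark-env.sh on each worker node. Leaving some memory unallocated is a common precaution on small boards, but the right margin depends on what else runs on the device.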

Files

Original bundle

Name:
2025v86n2_Vlakh-Vyhrynovska_H-Implementation_92-97.pdf
Size:
194 KB
Format:
Adobe Portable Document Format
