PSOBER: PSO based entity resolution

dc.citation.epage: 583
dc.citation.issue: 4
dc.citation.spage: 573
dc.contributor.affiliation: Університет Султана Мулая Слімана
dc.contributor.affiliation: Sultan Moulay Slimane University
dc.contributor.author: Аассем, Й.
dc.contributor.author: Гафіді, І.
dc.contributor.author: Халфі, Г.
dc.contributor.author: Абутабіт, Н.
dc.contributor.author: Aassem, Y.
dc.contributor.author: Hafidi, I.
dc.contributor.author: Khalfi, H.
dc.contributor.author: Aboutabit, N.
dc.coverage.placename: Львів
dc.coverage.placename: Lviv
dc.date.accessioned: 2023-11-01T07:49:13Z
dc.date.available: 2023-11-01T07:49:13Z
dc.date.created: 2021-03-01
dc.date.issued: 2021-03-01
dc.description.abstract: Entity resolution is the task of mapping the records within a database to their corresponding entities. The problem poses many challenges because records often lack complete information, the distribution of records varies across entities, and records of different entities sometimes overlap. In this paper, we propose an unsupervised method to solve this problem. The problem is formulated as a partitioning problem, and an optimization algorithm-based technique is then proposed to solve it. The presented approach enables the partitioning of records across entities. A comparative analysis with the genetic algorithm over datasets demonstrates the efficiency of the considered approach.
dc.format.extent: 573-583
dc.format.pages: 11
dc.identifier.citation: PSOBER: PSO based entity resolution / Y. Aassem, I. Hafidi, H. Khalfi, N. Aboutabit // Mathematical Modeling and Computing. Lviv: Lviv Polytechnic Publishing House, 2021. Vol. 8, No. 4. P. 573–583.
dc.identifier.citationen: PSOBER: PSO based entity resolution / Y. Aassem, I. Hafidi, H. Khalfi, N. Aboutabit // Mathematical Modeling and Computing. Lviv: Lviv Polytechnic Publishing House, 2021. Vol. 8, No. 4. P. 573–583.
dc.identifier.doi: 10.23939/mmc2021.04.573
dc.identifier.uri: https://ena.lpnu.ua/handle/ntb/60432
dc.language.iso: en
dc.publisher: Видавництво Львівської політехніки
dc.publisher: Lviv Polytechnic Publishing House
dc.relation.ispartof: Mathematical Modeling and Computing, 4 (8), 2021
dc.relation.references: [1] Yin X., Han J., Yu P. S. Object Distinction: Distinguishing Objects with Identical Names. IEEE 23rd International Conference on Data Engineering. 1242–1246 (2007).
dc.relation.references: [2] Christen P., Goiser K. Quality and Complexity Measures for Data Linkage and Deduplication. Quality Measures in Data Mining. 127–151 (2007).
dc.relation.references: [3] Hernández M. A., Stolfo S. J. The merge/purge problem for large databases. ACM SIGMOD Record. 24 (2), 127–138 (2007).
dc.relation.references: [4] Mishra S., Mondal S., Saha S. Entity matching technique for bibliographic database. Database and expert systems applications. DEXA 2013. 34–41 (2013).
dc.relation.references: [5] Draisbach U., Naumann F., Szott S., Wonneberg O. Adaptive Windows for Duplicate Detection. 2012 IEEE 28th International Conference on Data Engineering. 1073–1083 (2012).
dc.relation.references: [6] Christen P. Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer (2012).
dc.relation.references: [7] Aassem Y., Hafidi I., Aboutabit N. Enhanced Duplicate Count Strategy: Towards New Algorithms to Improve Duplicate Detection. NISS2020: Proceedings of the 3rd International Conference on Networking, Information Systems & Security. Article No. 58, 1–7 (2020).
dc.relation.references: [8] Benkhaled H., Berrabah D., Boufares F. A novel approach to improve the Record Linkage process. 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT). 1504–1509 (2019).
dc.relation.references: [9] De Carvalho D. M., Laender A. H. F., Goncalves M. A., Da Silva A. S. A genetic programming approach to record deduplication. IEEE Transactions on Knowledge and Data Engineering. 24 (3), 399–412 (2012).
dc.relation.references: [10] Isele R., Bizer C. Learning expressive linkage rules using genetic programming. Proceedings of the VLDB Endowment. 5 (11), 1638–1649 (2012).
dc.relation.references: [11] Lyaqini S., Nachaoui M., Quafafou M. Non-smooth classification model based on new smoothing technique. Journal of Physics: Conference Series. 1743 (1), 012025 (2021).
dc.relation.references: [12] Goldberg D. E. Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Professional (1989).
dc.relation.references: [13] Ribeiro Filho J. L., Treleaven P. C., Alippi C. Genetic algorithm programming environments. Computer. 27 (6), 28–43 (1994).
dc.relation.references: [14] Mishra S., Saha S., Mondal S. GAEMTBD: Genetic algorithm based entity matching techniques for bibliographic databases. Applied Intelligence. 47, 197–230 (2017).
dc.relation.references: [15] Eberhart R. C., Kennedy J. A new optimizer using particle swarm theory. MHS’95. Proceedings of the Sixth International Symposium on Micro Machine and Human Science. 39–43 (1995).
dc.relation.references: [16] Caliński T., Harabasz J. A dendrite method for cluster analysis. Communications in Statistics. 3 (1), 1–27 (1972).
dc.relation.references: [17] Tang J., Zhang J., Yao L., Li J., Zhang L., Su Z. Arnetminer: extraction and mining of academic social networks. KDD ’08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 990–998 (2008).
dc.relation.references: [18] Tang J., Fong A. C. M., Wang B., Zhang J. A unified probabilistic framework for name disambiguation in digital library. IEEE Transactions on Knowledge and Data Engineering. 24 (6), 975–987 (2012).
dc.relation.references: [19] Wang X., Tang J., Cheng H., Yu P. S. ADANA: Active name disambiguation. 2011 IEEE 11th International Conference on Data Mining. 794–803 (2011).
dc.relation.references: [20] Nachaoui M. Parameter learning for combined first and second order total variation for image reconstruction. Advanced Mathematical Models & Applications. 5 (1), 53–69 (2020).
dc.relation.references: [21] Wang J., Li G., Yu J. X., Feng J. Entity matching: how similar is similar. Proceedings of the VLDB Endowment. 4 (10), 622–633 (2011).
dc.relation.references: [22] Sun Y., Wu T., Yin Z., Cheng H., Han J., Yin X., Zhao P. BibNetMiner: mining bibliographic information networks. SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data. 1341–1344 (2008).
dc.relation.references: [23] DeRose P., Shen W., Chen F., Lee Y., Burdick D., Doan A., Ramakrishnan R. DBLife: A community information management platform for the database research community. CIDR. 169–172 (2007).
dc.relation.references: [24] Jin H., Huang L., Yuan P. Name disambiguation using semantic association clustering. 2009 IEEE International Conference on e-Business Engineering. 42–48 (2009).
dc.relation.references: [25] Mishra S., Saha S., Mondal S. Cluster validation techniques for bibliographic databases. Proceedings of the 2014 IEEE Students’ Technology Symposium. 93–98 (2014).
dc.relation.references: [26] Rousseeuw P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics. 20, 53–65 (1987).
dc.relation.references: [27] Xie X. L., Beni G. A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence. 13 (8), 841–847 (1991).
dc.relation.references: [28] Mishra S., Saha S., Mondal S. On validation of clustering techniques for bibliographic databases. 2014 22nd International Conference on Pattern Recognition. 3150–3155 (2014).
dc.relation.references: [29] Cramer N. L. A representation for the adaptive generation of simple sequential programs. Proceedings of the First International Conference on Genetic Algorithms. 183–187 (1985).
dc.relation.references: [30] Holland J. H. Adaptation in natural and artificial systems. MIT (1975).
dc.relation.references: [31] De Carvalho M. G., Laender A. H., Goncalves M. A., Da Silva A. A genetic programming approach to record deduplication. IEEE Transactions on Knowledge and Data Engineering. 24 (3), 399–412 (2012).
dc.relation.references: [32] Isele R., Bizer C. Learning expressive linkage rules using genetic programming. Proceedings of the VLDB Endowment. 5 (11), 1638–1649 (2012).
dc.relation.references: [33] Wagner R. A., Fischer M. J. The String-to-String Correction Problem. Journal of the ACM. 21 (1), 168–173 (1974).
dc.relation.references: [34] Kondrak G. N-gram similarity and distance. Proceedings of the 12th international conference on String Processing and Information Retrieval. 115–126 (2005).
dc.relation.references: [35] Hsu W. J., Du M. W. Computing a longest common subsequence for a set of strings. BIT Numerical Mathematics. 24, 45–59 (1984).
dc.relation.references: [36] Christen P., Churches T. Febrl–Freely extensible biomedical record linkage. ANU Computer Science Technical Reports (2002).
dc.rights.holder: © Lviv Polytechnic National University, 2021
dc.subject: entity resolution
dc.subject: cluster validity index
dc.subject: particle swarm optimization
dc.subject: distance measure
dc.subject: genetic algorithm
dc.subject: unsupervised algorithm
dc.title: PSOBER: PSO based entity resolution
dc.title.alternative: PSOBER: пов’язування об’єктів на основі PSO
dc.type: Article

Files

Original bundle

Name: 2021v8n4_Aassem_Y-PSOBER_PSO_based_entity_573-583.pdf
Size: 1.04 MB
Format: Adobe Portable Document Format

Name: 2021v8n4_Aassem_Y-PSOBER_PSO_based_entity_573-583__COVER.png
Size: 446.9 KB
Format: Portable Network Graphics

License bundle

Name: license.txt
Size: 1.84 KB
Format: Plain Text