Research on the state-of-the-art deep learning based models for face detection and recognition

Building a face recognition pipeline involves numerous challenges, such as changes in lighting, pose, and facial expression. The main stages of the pipeline are face detection, alignment, feature extraction, and representation, and each is critical for accurate recognition. The article analyzes and compares modern algorithms and models for face detection and recognition by their ability to correctly identify true positives (TP) and true negatives (TN) while minimizing false negatives (FN) and false positives (FP). Classical algorithms and lightweight models, such as MediaPipe, offer the highest speed at the cost of lower accuracy. Conversely, heavier models such as RetinaFace deliver greater accuracy at the expense of speed. For systems that prioritize maximum detection accuracy and the fewest missed faces, models such as DSFD or RetinaFace-Resnet50 are recommended, although they are slow and unsuitable for real-time detection. If the primary goal is maximum detection speed and occasional missed faces in uncontrolled conditions are acceptable, an SSD-based face detector is preferable. For applications that need a balance between speed and accuracy, RetinaFace-MobilenetV1 is the best fit, offering real-time detection speed with satisfactory accuracy.

Among the recognition models, ArcFace performs best, with a TP rate of 0.92 and a TN rate of 0.91, indicating high accuracy both in identifying the correct person and in rejecting mismatched images. ArcFace also maintains a low FP rate of 0.09. FaceNet follows with a TP rate of 0.89 and an impressive TN rate of 0.94, which shows its ability to avoid incorrect matches. In contrast, VGGFace, DeepFace, and OpenFace show moderate TP rates between 0.61 and 0.78, coupled with higher FN and FP rates. DeepID performs worst, with a TP rate of 0.47 and a TN rate of 0.60, reflecting substantial difficulty in accurate identification. The conclusions emphasize the importance of selecting models based on accuracy, speed, and resource requirements, and suggest RetinaFace and ArcFace/FaceNet as good trade-off options.
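
The rates quoted in the abstract can be related to one another with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the article: it assumes the reported TP and TN values are per-class rates, so that FN = 1 - TP and FP = 1 - TN (this is consistent with ArcFace's reported FP rate of 0.09 alongside a TN rate of 0.91). The function name `derived_metrics` and the use of balanced accuracy as a summary figure are assumptions made for illustration.

```python
# Rates quoted in the abstract: (TP rate, TN rate) per recognition model.
REPORTED_RATES = {
    "ArcFace": (0.92, 0.91),
    "FaceNet": (0.89, 0.94),
    "DeepID": (0.47, 0.60),
}


def derived_metrics(tp_rate, tn_rate):
    """Derive complementary rates from a TP rate and a TN rate.

    Assumption (not stated in the article): the rates are normalized
    per class, so FN = 1 - TP and FP = 1 - TN.
    """
    return {
        "fn_rate": 1.0 - tp_rate,
        "fp_rate": 1.0 - tn_rate,
        # Balanced accuracy is the mean of the TP rate and the TN rate.
        "balanced_accuracy": (tp_rate + tn_rate) / 2.0,
    }


if __name__ == "__main__":
    for name, (tp, tn) in REPORTED_RATES.items():
        m = derived_metrics(tp, tn)
        print(
            f"{name}: FN={m['fn_rate']:.2f} "
            f"FP={m['fp_rate']:.2f} "
            f"balanced accuracy={m['balanced_accuracy']:.3f}"
        )
```

Under this assumption, ArcFace and FaceNet come out with the same balanced accuracy: ArcFace misses fewer genuine matches, while FaceNet rejects more mismatches. That fits the abstract's treatment of both as reasonable trade-off choices.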

Keywords

face detection, face recognition, convolutional neural networks, feature extraction

Citation

Research on the state-of-the-art deep learning based models for face detection and recognition / A. Sydor, D. Balazh, Yu. Vitrovyi, O. Kapshii, O. Karpin, T. Maksymyuk // Infocommunication technologies and electronic engineering. — Lviv : Lviv Politechnic Publishing House, 2024. — Vol 4. — No 2. — P. 49–59.

URI

https://ena.lpnu.ua/handle/ntb/116928

Collections

Infocommunication Technologies and Electronic Engineering. – 2024. – Vol. 4, No. 2