Improved deep CNN for facial recognition tasks

Abstract

To address the issues of insufficient accuracy and significant impact of image noise on model performance in current facial recognition systems, this study proposes a new framework that integrates wavelet transform, an improved multi-task cascaded convolutional neural network (MTCNN), and genetic algorithms (GA). This framework utilizes wavelet transform for image denoising; optimizes MTCNN by introducing a hybrid threshold function and a confidence candidate box retention mechanism to effectively address information loss issues; and applies GA for feature compensation and optimization. Experimental results demonstrate that this method significantly improves the accuracy, recall rate, regression value, and model stability of facial recognition, effectively enhancing the system’s robustness under noise interference, and provides an effective solution for optimizing facial recognition performance.

Keywords

accuracy; data noise; feature extraction; genetic algorithm; multi-task convolutional neural network

Introduction

Facial recognition technology has long attracted wide attention and is extensively applied in many fields, including security systems, social media, finance, and healthcare.¹ With the continuous development of computer vision and deep learning, significant progress has been made in practical recognition technology, and deep learning methods are currently the main research direction.² Typical applications include security access control at airports, commercial buildings, and residential areas; identity verification in banks and other financial institutions, for example for mobile banking login; personalized services or recommendations based on customers' facial data in retail and marketing; and helping police identify suspects or find missing persons. Despite its great potential, facial recognition technology faces technical and ethical challenges, such as recognition accuracy under varying lighting and poses, as well as privacy and data protection. The Multi-Task Cascaded Convolutional Neural Network (MTCNN) algorithm is a deep learning algorithm for facial detection that uses cascaded deep Convolutional Neural Networks (CNNs) to process several face-related tasks jointly, including facial detection, keypoint localization, and bounding box regression.³ MTCNN achieves an efficient facial recognition system by effectively handling the information flow between these tasks.⁴ Rathnayake et al. addressed the challenges posed by the regional dependency of fruit types in automatic fruit recognition. To fill the gap in existing research on convolutional neural network algorithms, which did not fully cover the 131 categories of the Fruit-360 dataset and had poor computational efficiency, they proposed a cascaded adaptive network-based fuzzy inference system. The results show that this method achieves a relative accuracy of 98.36%, with high weighted accuracy, recall, and F-score, and offers better applicability and computational efficiency than the latest algorithms.⁵ This method is therefore suitable for fruit species identification, but its applicability to facial recognition requires further exploration. Shakrani KV et al. proposed the use of open-source computer vision (OpenCV) and CNN technologies for real-time detection of mask wearing in public places, airports, and army bases. By constructing a dataset containing specific directories and subdirectories and applying data augmentation as preprocessing, their model, trained using TensorFlow and Keras, achieved a training accuracy of 0.93, a validation accuracy of 0.94, and a classification accuracy of 0.95.⁶ This method can handle recognition in different scenarios, but its effectiveness for face recognition still needs further exploration and analysis.

This study aims to improve the accuracy and overall performance of facial recognition by combining the MTCNN algorithm, wavelet-transform feature extraction, and a Genetic Algorithm (GA) that refines the system model. The study proposes an improved MTCNN algorithm that replaces the initial fixed thresholds of the algorithm with a hybrid threshold function. Confidence candidate frames are retained to enhance the information representation of the image, which addresses the loss of candidate frames in the original algorithm and improves its performance. This study is structured in four parts. The first reviews current domestic and international research. The second analyzes the designed algorithm. The third tests the performance of the system model through experiments. The fourth summarizes the research. The core innovations of this study are the improvement of the MTCNN framework, the application of GA, the combination of wavelet transform with deep learning, and the supporting evidence from performance verification. The study optimizes the face detection process of MTCNN by combining threshold functions with a confidence candidate box retention mechanism, which solves the problem of candidate box loss caused by fixed thresholds in traditional methods. At the same time, an information loss compensation mechanism is introduced to enhance the robustness of the model to low-quality images.

Related works

In the context of artificial intelligence, facial recognition technology has become a research hotspot. This technology is widely used in security authentication, monitoring systems, intelligent interaction, and other fields. Zhi Yang Wang et al. put forward a low-rank representation solution to address the neglect of view-specific local structures and noisy samples, which reduces recognition ability in multi-view facial recognition.⁷ The new method was implemented through a layered Bayesian approach, which constrained and matched it with a linear combination. The experimental results demonstrated that this method was more effective than the most advanced classification and clustering methods. Chen et al. put forward a feature extraction solution based on neighborhood weighted average and central symmetry for facial recognition under complex lighting conditions.⁸ They proposed a feature fusion algorithm that incorporated the advantages of directional gradient histograms. Compared with other recent algorithms, this algorithm was more robust under complex lighting conditions. McGugin et al. used 7T ultra-high-resolution magnetic resonance imaging to investigate the relationship between cortical thickness and the abilities to recognize faces and vehicles. The results indicated that individuals with strong facial recognition abilities had relatively thin cortex in the face-selective brain regions,⁹ whereas those with strong vehicle recognition abilities had relatively thick cortex in the same regions. Xue et al. proposed a method for constructing a robust prototype dictionary and a robust change dictionary to improve the accuracy of single-sample-per-person facial recognition. The new method used dictionary learning to obtain atoms. Effective atoms were then selected through the proposed function indexing method to construct the robust prototype and robust change dictionaries. The experimental results showed that RPRV was strongly robust for facial recognition in unconstrained environments and was superior to state-of-the-art SRC-based methods.¹⁰

He et al. put forward a novel occlusion simulation method to make facial recognition robust to occlusion, which involved dropping carefully selected channels. This method simulated real occlusion through spatial regularization and locality-aware channel removal. A module was designed to increase the contribution of non-occluded areas. The experimental results showed that the method achieved significant improvements on various benchmarks.¹¹ Liu et al. proposed a multi-factor joint normalization network based on generative adversarial networks to address the challenges of unconstrained facial recognition. This method could normalize multiple factors simultaneously, including posture, lighting, and facial expression. The experimental results showed that the proposed solution could synthesize multi-factor normalization results while preserving identity.¹² Xiaoqian et al. conducted an experiment to investigate the spatial resolution at which the human brain can distinguish familiar faces from unfamiliar ones, and whether this process depends on the distance between the observer and the face. The results showed that, for blurred images of increasing spatial resolution, the neural response in the occipito-temporal region appeared and quickly reached saturation at approximately 6.3–8.7 cycles. The neural response disappeared when resolution decreased.¹³ Hao et al. proposed a hyperspectral facial recognition solution to address band misalignment and high data dimensionality in hyperspectral facial recognition. The experimental results demonstrated that the algorithm achieved excellent results on three popular hyperspectral facial databases, outperforming most other methods.¹⁴

In summary, the current field of facial recognition still faces key problems, including insufficient accuracy, high noise in facial image data, invasion of personal privacy, and low recognition accuracy caused by occlusion. Improving recognition accuracy and reducing data noise are the main issues addressed in this research. This study proposes a new facial recognition system based on wavelet transform, MTCNN, and GA to reduce the impact of noise in facial data recognition. Facial data processing may also need to protect the privacy of individual data, and recognition performance can differ across races and demographic groups. The study therefore improves the MTCNN method to address denoising in facial data recognition and to raise the recognition performance of the model. Because GA can provide feature compensation and data representation in facial recognition, GA is used to improve MTCNN for facial recognition.

Facial recognition model for improving deep CNN algorithms

This work aims to improve facial recognition accuracy by building a new facial recognition model. The study first analyzes the main framework and structure of the model. The model is then improved by adding GA and feature extraction techniques to raise the precision and accuracy of facial image recognition.

Deep CNN algorithm for facial recognition

In facial recognition analysis, it is crucial to analyze and detect the data and location information of the face. To analyze facial data stably, wavelet transform and improved MTCNN facial detection algorithm are used. The MTCNN algorithm is an efficient facial detection and alignment technique widely used to accurately detect faces and their key feature points from images. The advantage of MTCNN lies in its hierarchical structure, which consists of three stages of CNNs connected in series to achieve fast and accurate facial detection. The current system framework structure is shown in Figure 1.

Figure 1.

System model framework structure. (Image source: https://www.1001freedownloads.com/free-photo/boy-face-happy-child).

In Figure 1, the data processing and loading module for facial images comprises facial image segmentation, data normalization, and balancing of the facial image distribution. In the denoising module, the image data are first transformed and thresholded; the threshold is selected and analyzed, and the reconstructed facial image data are then passed to the algorithm. Only facial data processed in this way are input into the simulation model, where they are used to construct the facial image. In the denoising process, the original image is first decomposed to reduce its resolution, and a relative threshold is set. Coefficients at or above the threshold are retained, and coefficients below the threshold are set to 0. Finally, the remaining wavelet coefficients are used to reconstruct the facial image through the inverse wavelet transform. At this point, the influence of noise has been removed from the facial data. The wavelet transform is configured for face recognition denoising as follows. First, Daubechies-series wavelet bases are used to capture local features. Second, an adaptive hierarchical strategy is adopted for the decomposition level: the first and second layers mainly handle random noise, while the third and higher layers capture structural noise. For threshold processing, a hybrid threshold function is proposed that uses different thresholds for different frequency sub-bands and optimizes the threshold weights through GA to dynamically balance denoising and feature preservation. The schematic diagram of the MTCNN main stage process framework is shown in Figure 2.
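The decompose, threshold, and reconstruct steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it works on a single image row, uses the Haar wavelet (the simplest member of the Daubechies family, db1) so that it needs no external library, and applies a plain hard threshold per decomposition level. The function names and the per-level `thresholds` list are hypothetical; in the study the sub-band thresholds are weighted by GA.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling

def haar_forward(signal):
    """One Haar level: split an even-length list into approximation and detail."""
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SCALE)
        out.append((a - d) * SCALE)
    return out

def hard_threshold(coeffs, threshold):
    """Keep coefficients at or above the threshold, set the rest to 0."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]

def denoise(signal, levels, thresholds):
    """Decompose `levels` times, threshold each detail band, then reconstruct.

    len(signal) must be divisible by 2 ** levels; thresholds[k] is applied
    to the detail band of level k + 1 (hypothetical per-band values).
    """
    approx = list(signal)
    details = []
    for level in range(levels):
        approx, detail = haar_forward(approx)
        details.append(hard_threshold(detail, thresholds[level]))
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx

# A noisy step edge: small alternating noise around the levels 10 and 50.
noisy_row = [10.5, 9.5, 10.5, 9.5, 50.5, 49.5, 50.5, 49.5]
clean_row = denoise(noisy_row, levels=2, thresholds=[1.0, 0.5])
```

In this example the small level-1 detail coefficients fall below the threshold and are zeroed, so the alternating noise disappears while the step between 10 and 50 is preserved.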

Figure 2.

Schematic diagram of MTCNN main stage process framework. (Image source: https://www.1001freedownloads.com/free-photo/boy-face-happy-child).

From Figure 2, when an image is input, the MTCNN structure repeatedly enlarges and shrinks it to build an image pyramid. The three CNN stages of MTCNN then detect and adjust the facial image, and the required facial image data are output. The most important detection step in the MTCNN model is the selection of facial frames. This step calculates the intersection of the candidate facial frames from the detection results and retains only the frame with the maximum score among overlapping frames. The threshold used here should not be set too low, since this would reduce the accuracy of the filtered image. The intersection calculation is shown in equation (1).¹⁵,¹⁶

K = \frac{A \cap B}{A \cup B} = \frac{A \cap B}{A + B - A \cap B}

(1)

In equation (1), $A$ represents the area of the face box produced by the trained model, $B$ represents the area of the face box it is compared against, and $K$ represents the intersection-over-union ratio of the two boxes. The selection status of the current box is determined by this ratio. The main image processing flow of the MTCNN model is shown in Figure 3.
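Equation (1) can be sketched in a few lines. The box representation as corner coordinates `(x1, y1, x2, y2)` and the function name `iou` are assumptions made for illustration; the source does not specify how boxes are stored.

```python
def iou(box_a, box_b):
    """Intersection-over-union K of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter  # A + B - (A ∩ B), as in equation (1)
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: K = 1 / (4 + 4 - 1) = 1/7.
k = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

The `max(0.0, ...)` terms make disjoint boxes yield an intersection of zero rather than a negative area.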

Figure 3.

The main image processing flow of the MTCNN model. (Image source: https://www.1001freedownloads.com/free-photo/boy-face-happy-child).

From Figure 3, the MTCNN model first obtains the height and width of the image. If the image is too large, facial frames are generated, and the coordinate information of the image is obtained through confidence recognition. The image data are then denoised and input into the network model. The frames are checked for fit and duplicate facial frames are removed, which yields the final size of the facial wireframe. The face classification loss for each input sample is shown in equation (2).¹⁷,¹⁸

L^{\det} = - \left( y_{i}^{\det} \log (p_{i}) + (1 - y_{i}^{\det}) \log (1 - p_{i}) \right)

(2)

In equation (2), $y_{i}^{\det}$ represents the ground-truth label of the facial sample, $p_{i}$ represents the probability output by the network that the sample is a face, and $L^{\det}$ represents the cross-entropy loss for face classification. This loss is evaluated for every input sample during forward propagation and enters the total loss function together with bounding box regression and keypoint localization. The loss function value of the facial target wireframe is shown in equation (3).

L^{box} = \left\| \hat{y}_{i}^{box} - y_{i}^{box} \right\|

(3)

In equation (3), $\hat{y}_{i}^{box}$ represents the coordinates of the facial bounding box output by the model, $y_{i}^{box}$ represents the wireframe boundary coordinates of the real face, and $L^{box}$ represents the bounding box regression loss. The keypoint localization loss is shown in equation (4).¹⁹,²⁰

L^{landmark} = \left\| \hat{y}_{i}^{landmark} - y_{i}^{landmark} \right\|

(4)

In equation (4), $L^{landmark}$ represents the keypoint localization loss, $\hat{y}_{i}^{landmark}$ represents the keypoint coordinates output by the network, and $y_{i}^{landmark}$ represents the true coordinates of the keypoints. Summing the weighted loss values over all samples and tasks yields the total loss, as shown in equation (5).

L = \min \sum_{i = 1}^{N} \sum_{j \in \{\det, box, landmark\}} a_{j} b_{i}^{j} L_{i}^{j}

(5)

In equation (5), $N$ is the total number of training samples, $a_{j}$ is the weight of task $j$, $b_{i}^{j}$ is the sample type indicator, $L_{i}^{j}$ is the loss of task $j$ for sample $i$, and $L$ is the total loss function value of the model. $\det$ denotes face detection, $box$ denotes bounding box regression, and $landmark$ denotes landmark localization. Because information is lost when the current model directly selects facial frames, lowering the confidence of overlapping frames instead of discarding them reduces the loss of wireframe information, as shown in equation (6).²¹,²²
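The per-task losses of equations (2) to (4) and their weighted sum in equation (5) can be sketched as below. This is an illustration under assumptions rather than the study's training code: the detection term is taken to be the standard binary cross-entropy, the norm in equations (3) and (4) is taken to be Euclidean, and the sample dictionary layout, the function names, and the task weights in the example are hypothetical. The code evaluates the sum only; the minimization in equation (5) is what training performs.

```python
import math

def detection_loss(label, prob, eps=1e-12):
    """Cross-entropy between a 0/1 face label and the predicted face probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # guard against log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def norm_loss(predicted, target):
    """Euclidean distance between predicted and true coordinates (assumed norm)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)))

def total_loss(samples, task_weights):
    """Weighted sum over samples i and tasks j of a_j * b_i^j * L_i^j."""
    total = 0.0
    for sample in samples:
        losses = {
            "det": detection_loss(sample["label"], sample["prob"]),
            "box": norm_loss(sample["box_pred"], sample["box_true"]),
            "landmark": norm_loss(sample["lm_pred"], sample["lm_true"]),
        }
        for task, loss in losses.items():
            total += task_weights[task] * sample["active"][task] * loss
    return total

# One face sample whose landmark term is switched off by its type indicator.
sample = {
    "label": 1, "prob": 0.5,
    "box_pred": (3.0, 4.0, 0.0, 0.0), "box_true": (0.0, 0.0, 0.0, 0.0),
    "lm_pred": (1.0, 1.0), "lm_true": (0.0, 0.0),
    "active": {"det": 1, "box": 1, "landmark": 0},
}
weights = {"det": 1.0, "box": 0.5, "landmark": 0.5}  # illustrative values only
loss = total_loss([sample], weights)
```

The indicator $b_{i}^{j}$ lets a sample contribute only to the tasks for which it carries annotations, which is why the landmark error above does not affect the result.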

S_{i} = \begin{cases} S_{i}, & K (M, β_{i}) < N_{i} \\ S_{i} (1 - K (M, β_{i})), & K (M, β_{i}) \geq N_{i} \end{cases}

(6)

In equation (6), $N_{i}$ represents the threshold of the model, $S_{i}$ represents the confidence of the $i$-th regression wireframe, $M$ represents the wireframe with the highest confidence, $β_{i}$ represents the $i$-th regression wireframe for facial selection, and $K (M, β_{i})$ is their intersection ratio from equation (1). Equation (6) thus yields the final confidence by reducing the confidence of a wireframe according to its overlap with $M$ whenever that overlap reaches the threshold.
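The rescoring rule of equation (6) has the same form as the linear variant of Soft-NMS, and a sketch along those lines is given below. It is an assumption-laden illustration, not the study's code: boxes are corner tuples `(x1, y1, x2, y2)`, a single threshold is used for all boxes, and the `score_floor` cut-off for dropping boxes whose confidence has decayed to almost nothing is a hypothetical addition.

```python
def iou(box_a, box_b):
    """Intersection ratio K from equation (1) for (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union > 0 else 0.0

def rescore_candidates(boxes, scores, threshold, score_floor=0.001):
    """Repeatedly pick the most confident box M and decay overlapping scores."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        best = max(range(len(candidates)), key=lambda k: candidates[k][1])
        best_box, best_score = candidates.pop(best)
        kept.append((best_box, best_score))
        remaining = []
        for box, score in candidates:
            overlap = iou(best_box, box)
            if overlap >= threshold:
                score = score * (1.0 - overlap)  # second case of equation (6)
            if score >= score_floor:
                remaining.append((box, score))
        candidates = remaining
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = rescore_candidates(boxes, scores, threshold=0.5)
```

With a hard cut-off, the second box (overlap 81/119 with the first) would be discarded outright; here it is retained with a reduced confidence, which is the candidate box retention behaviour the text describes.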

Improvement of facial recognition algorithm and system model building for deep CNN

The study improves the model described in the previous subsection to raise the recognition accuracy of the current system. This is achieved by adding calculation factors, original image feature extraction, and GA to the system model. The improved model calculates the initial compensation factor of the MTCNN algorithm to obtain the original image. The feature data are then compensated and selected through GA to obtain the weighted image feature map of the feature model. The facial feature image obtained at this point is the most accurate feature image. Figure 4 shows the process of extracting the initial model feature image.

Figure 4.

Process of initial image extraction of model features. (Image source: https://www.1001freedownloads.com/free-photo/boy-face-happy-child).

In Figure 4, the improved feature extraction first converts the input facial data into an image with larger gray values. Second, a new compensation factor is generated from the gray image, and the optimal compensation coefficient for this factor is solved through GA. The optimal compensation coefficient is then input into the image recognition system of MTCNN to generate the initial feature description image. The image is split into multiple sub-images, and a new facial image is obtained by calculating the pixel values of the sub-images. Finally, the dimensionality of the obtained image information is reduced to produce the final high-precision image. In this process, GA, with its strong global search capability and adaptability, dynamically adjusts the compensation coefficients to perform pixel-level compensation on image sub-regions, generating high-precision feature maps that enhance the robustness of the model to low-quality images. GA not only optimizes the compensation coefficients but also improves feature expression through its search for the global optimum, avoiding the feature degradation that traditional methods encounter in complex scenes. The improved system model has many coefficients to compensate, which can be used to raise the feature values. The obtained feature images can be represented by compensation factors, as shown in equation (7).²³

H = A + S_{1} \cdot F_{1} + S_{2} \cdot F_{2} + S_{3} \cdot F_{3} + \dots + S_{n} \cdot F_{n}

(7)

In equation (7), $S_{n}$ represents the $n$-th calculation factor of the image, $F_{n}$ represents the corresponding compensation coefficient, and $A$ represents the initial calculation factor. To show the effect of compensation on the original feature information more intuitively, the new image obtained after compensation is compared with the original, as shown in Figure 5.
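A toy version of the GA search for the compensation coefficients in equation (7) is sketched below. Everything beyond equation (7) itself is an assumption made for illustration: the source does not state the fitness function, so a hypothetical one (negative squared error against a reference feature map) is used, and the population size, mutation rate, truncation selection, and one-point crossover are arbitrary choices.

```python
import random

def compensate(base, factors, coeffs):
    """Equation (7), element-wise: H = A + sum over k of S_k * F_k."""
    rows, cols = len(base), len(base[0])
    return [[base[i][j] + sum(f * s[i][j] for f, s in zip(coeffs, factors))
             for j in range(cols)] for i in range(rows)]

def fitness(coeffs, base, factors, target):
    """Hypothetical fitness: negative squared error against a reference map."""
    h = compensate(base, factors, coeffs)
    return -sum((h[i][j] - target[i][j]) ** 2
                for i in range(len(h)) for j in range(len(h[0])))

def genetic_search(base, factors, target, pop_size=20, generations=40, seed=0):
    """Return the best coefficients found and the best fitness per generation."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(factors)

    def score(coeffs):
        return fitness(coeffs, base, factors, target)

    population = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0][:]]           # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]           # one-point crossover
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < 0.3 else g
                     for g in child]            # Gaussian mutation
            children.append(child)
        population = children
    best = max(population, key=score)
    history.append(score(best))
    return best, history

base = [[1.0, 2.0], [3.0, 4.0]]                    # stands in for A
factors = [[[1.0, 0.0], [0.0, 1.0]],               # stands in for S_1
           [[0.0, 1.0], [1.0, 0.0]]]               # stands in for S_2
target = compensate(base, factors, [0.5, -0.25])   # reference built from known F
best, history = genetic_search(base, factors, target)
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` never decreases, which makes the search easy to monitor.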

Figure 5.

Comparison of new images obtained after compensation. (Image source: https://www.1001freedownloads.com/free-photo/boy-face-happy-child).

From Figure 5, the new image data yield a new histogram after the recognition rate and data layout are improved. The pixel values of the generated image also change, and the image values obtained by feature compensation can generate new image pixels. The pixel values of the two images depend on the compensation coefficients and the calculation factors, so adjusting these two quantities can greatly improve the clarity and accuracy of the images. To obtain a better compensation coefficient, a new method for calculating the compensation factor is added, as shown in equation (8).

S_{t1} (i, j) = \begin{cases} A (i + 1, j), & i \geq 1, i < m \\ S_{t1} (i - 1, j), & i = m \end{cases}

(8)

In equation (8), $S_{t1}$ represents the left bias value of the image, $i$ represents the row index, and $j$ represents the column index. The other parameters are the same as above. The calculation factor obtained at this point is shown in equation (9).²⁴,²⁵

S_{1} = | A - S_{t 1} |

(9)

In equation (9), $S_{1}$ represents the first calculation factor, obtained as the absolute difference between $A$ and the left bias matrix $S_{t1}$. $A$ represents the initial calculation factor, as in equation (7). The right bias matrix of the model is shown in equation (10).

S_{t2} (i, j) = \begin{cases} A (i - 1, j), & i > 1, i \leq m \\ S_{t2} (i + 1, j), & i = 1 \end{cases}

(10)

In equation (10), $S_{t2}$ represents the right bias value of the image. The other parameters are the same as above. The upper bias matrix of the model is shown in equation (11).²⁶,²⁷

S_{t3} (i, j) = \begin{cases} A (i - 1, j), & i > 1, i \leq n \\ S_{t3} (i + 1, j), & i = 1 \end{cases}

(11)

In equation (11), $S_{t 3}$ represents the upper bias value of the image. The other parameter sizes are consistent with the above. The size of the lower bias matrix is shown in equation (12)

S_{t4} (i, j) = \begin{cases} A (i, j + 1), & j \geq 1, j < n \\ S_{t4} (i, j - 1), & j = n \end{cases}

(12)

In equation (12), $S_{t4}$ represents the lower bias value of the image. The other parameters are consistent with the above. The final model difference matrix is shown in equation (13).²⁸⁻³⁰

S_{5} = (S_{1} - S_{2}) + (S_{3} - S_{4})

(13)

In equation (13), $S_{5}$ represents the size of the final calculation factor obtained. In the improved model, GA is used for image dimension processing due to its strong global search ability and dimension processing ability. Figure 6 shows the current improved model.
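The bias matrices and the difference matrix of equations (8) to (13) can be sketched as one-pixel shifts of $A$ with the border row or column repeated. Several points here are assumptions: the source gives only $S_{1}$ explicitly, so $S_{2}$ to $S_{4}$ are formed by analogy with equation (9); equation (11) as printed repeats the row shift of equation (10), so the column shift used for $S_{t3}$ below is a guess at the intended counterpart of equation (12); and the function names are hypothetical.

```python
def shifted(image, d_row, d_col):
    """Entry (i, j) takes image[i + d_row][j + d_col], clamped at the border."""
    rows, cols = len(image), len(image[0])
    return [[image[min(max(i + d_row, 0), rows - 1)][min(max(j + d_col, 0), cols - 1)]
             for j in range(cols)] for i in range(rows)]

def abs_diff(a, b):
    """Element-wise |a - b|, as in equation (9)."""
    return [[abs(x - y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def difference_matrix(image):
    """S_5 = (S_1 - S_2) + (S_3 - S_4), as in equation (13)."""
    s1 = abs_diff(image, shifted(image, +1, 0))  # equations (8) and (9)
    s2 = abs_diff(image, shifted(image, -1, 0))  # equation (10)
    s3 = abs_diff(image, shifted(image, 0, -1))  # assumed column analogue of (11)
    s4 = abs_diff(image, shifted(image, 0, +1))  # equation (12)
    rows, cols = len(image), len(image[0])
    return [[(s1[i][j] - s2[i][j]) + (s3[i][j] - s4[i][j])
             for j in range(cols)] for i in range(rows)]

image = [[1, 2, 4],
         [3, 5, 9],
         [6, 10, 15]]
s5 = difference_matrix(image)
```

Clamping the index at the border reproduces the second case of each piecewise definition, where the last (or first) row or column copies its neighbour.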

Figure 6.

Improved system model process. (Image source: https://www.1001freedownloads.com/free-photo/boy-face-happy-child).

In Figure 6, when facial image data are input, the image is first processed and analyzed by the three CNN stages of the MTCNN model. The initial image information is normalized, segmented, and equalized. Feature extraction is then applied to the new image data to obtain the feature information of the image, and the resulting feature image is input into the GA. The GA is applied to reduce the signal noise in the image and to obtain the population optimum of the facial image data. The boundary is then drawn based on the confidence of the image, and the global optimal solution of the current image data is calculated. This global optimal solution is input into the MTCNN again to calculate the final image data, and the dimensionality is reduced. The image is cropped and scaled to 24×24 pixels by the MTCNN model. The overlap between the image borders is calculated, and duplicate detections are removed by the model. The final image obtained at this point is a clearer and more accurate facial image. Finally, the facial data are analyzed through simulation experiments to explore the practical feasibility and performance of the research algorithm.

The proposed facial recognition system processes biometric data, which may pose a risk of identity exposure, especially in financial security applications. The experiments used the CelebA dataset, which is dominated by celebrities, and the VGGFace2 dataset, which has an imbalanced demographic distribution; both may amplify racial and gender biases. Therefore, in future improvements, wavelet denoising techniques should be used to remove identity-related features, fairness metrics should be used to test the model across different subgroups, and "privacy by design" principles should be integrated to achieve GDPR-compliant deployment. The formula for the test indicator used in the study is shown in equation (14).

\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\%

(14)

In equation (14), $\mathrm{Recall}$ represents the recall rate, $TP$ represents the number of true positive cases, and $FN$ represents the number of false negative cases. The regression value measures the similarity between predicted and real bounding boxes through the intersection-over-union ratio, expressed as a percentage.
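Equation (14) and the companion precision measure can be computed as in the short sketch below. The function names and the counts in the example are hypothetical and are not taken from the experiments; precision is given in its standard form, TP / (TP + FP), which the source reports but does not define.

```python
def recall_percent(true_positives, false_negatives):
    """Equation (14): TP / (TP + FN) as a percentage."""
    total = true_positives + false_negatives
    return 100.0 * true_positives / total if total else 0.0

def precision_percent(true_positives, false_positives):
    """Standard precision: TP / (TP + FP) as a percentage."""
    total = true_positives + false_positives
    return 100.0 * true_positives / total if total else 0.0

# Hypothetical counts: 90 faces found, 10 faces missed, 30 false alarms.
recall = recall_percent(90, 10)
precision = precision_percent(90, 30)
```

Recall penalizes missed faces, while precision penalizes false alarms, so the two are reported together.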

Test of the proposed deep CNN for facial recognition

When processing input images using the MTCNN model, pyramid scaling is first performed to construct multi-scale images. The images are then uniformly scaled to 24×24 pixels through cropping. The images are normalized and histogram equalized, and segmentation is performed to balance the distribution of different categories of faces. The experiments use the CelebA dataset, which consists of 200,000 facial images covering several different facial expressions. It contains more than 10,000 real face instances, including facial images obtained under different environmental conditions. The dataset is divided equally into two independent subsets, dataset 1 and dataset 2, for cross-validation. The experimental system has a 4-core CPU, 32 GB of RAM, and 100 GB of disk. The initial learning rate of the algorithm is set to 0.01, and the weight decay is set to 0.0005. In the pre-experiments, learning rates of 0.001, 0.01, and 0.1 are compared; a learning rate of 0.01 avoids excessive oscillation and converges quickly in the early stage of training. The weight decay of 0.0005 is chosen after testing 0.0001, 0.0005, and 0.001 on the validation set, as it gives a good balance between suppressing overfitting and keeping training stable. To test the feasibility of the proposed model, its recognition accuracy is compared with that of the traditional algorithms Local Binary Patterns (LBP), Local Gradient Patterns (LGP), and Load Average Algorithm (LAA), as shown in Figure 7.

Figure 7.

Accuracy of four algorithms. (a) Dataset 1, (b) Dataset 2.

From Figure 7(a), the accuracy of each algorithm increased as the amount of sample data increased. However, when the number of images was between 250 and 2000, the accuracy curves of all algorithms except the proposed one fluctuated significantly. This may be due to the better denoising effect and performance of the proposed algorithm on image data. In Figure 7(b), by contrast, the accuracy of all four algorithms showed an upward trend without significant fluctuation, which may be because the larger dataset improved algorithm performance; the algorithms subsequently tended to a steady state. The accuracy of the proposed algorithm on datasets 1 and 2 was higher than that of the other three algorithms. Its highest accuracy was 92.3% on dataset 1 and 94.6% on dataset 2. On dataset 1, this was 16.0, 10.0, and 8.7 percentage points higher than the 76.3%, 82.3%, and 83.6% of the LBP, LGP, and LAA algorithms, respectively. On dataset 2, it was 15.3, 10.1, and 13.3 percentage points higher than the 79.3%, 84.5%, and 81.3% of LBP, LGP, and LAA. This is because the proposed method applies better noise reduction and optimal-solution search to the data. Figure 7(a) also shows that the accuracy of LAA and of the proposed algorithm deviates from a linear relationship and decreases in some cases. This is mainly due to the complexity of the dataset, including factors such as noise, low-quality images, and lighting variations. In addition, when the sample size is small, the performance of the algorithm may be strongly affected. Figure 7(b) shows a slight deviation from the linear relationship, which may be due to the diversity of the dataset, the degree of model optimization, and the randomness of the training process. To test the effectiveness of the proposed method in an ablation experiment, its recognition accuracy is compared with that of MTCNN and GA alone. The results are shown in Figure 8.

Figure 8.

Comparison of recognition accuracy of the algorithm models in the ablation experiment. (a) Dataset 1, (b) Dataset 2.

From Figure 8(a), in the ablation experiment the proposed algorithm had the highest facial recognition accuracy on dataset 1. The accuracy of all three algorithms increased as the amount of facial data increased and then tended to stabilize, which suggests that each method first benefits from additional data and then reaches stable performance. Once the accuracy had stabilized, the highest value of the proposed algorithm on dataset 1 was 92.3%, about 1.1 and 4.7 percentage points higher than the 91.2% of the MTCNN model and the 87.6% of the GA model. From Figure 8(b), the accuracy on dataset 2 was 94.6%, about 7.2 and 10.4 percentage points higher than the 87.4% of the MTCNN model and the 84.2% of the GA model. The proposed method therefore gives better model performance, possibly because it combines the advantages of both models. Other performance indicators of the proposed method, namely the number of successful recognitions, precision, recall, and regression value, are also tested. A total of 1000 facial images are used to compare the proposed algorithm with the five algorithms mentioned above. The test outcomes are provided in Table 1.

Table 1.

Performance comparison of different algorithms.

Algorithm model	Successful recognitions (of 1000)	Precision (%)	Recall (%)	Regression value (%)
GA	976	97.6	96.5	98.4
MTCNN	962	96.2	98.4	96.4
LGP	986	98.6	98.1	98.3
LAA	984	98.4	96.5	99.4
LBP	975	97.5	98.5	98.5
Proposed method	997	99.7	99.5	99.7
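The margins between the proposed method and the weakest baseline in each column of Table 1 can be recomputed directly from the rows above. The short sketch below does only that arithmetic; the dictionary layout and function name are illustrative choices.

```python
# (precision %, recall %, regression value %) per baseline, copied from Table 1.
baselines = {
    "GA": (97.6, 96.5, 98.4),
    "MTCNN": (96.2, 98.4, 96.4),
    "LGP": (98.6, 98.1, 98.3),
    "LAA": (98.4, 96.5, 99.4),
    "LBP": (97.5, 98.5, 98.5),
}
proposed = (99.7, 99.5, 99.7)

def margins_over_lowest(baseline_rows, proposed_row):
    """Percentage-point gap between the proposed method and the lowest baseline."""
    margins = []
    for column, value in enumerate(proposed_row):
        lowest = min(row[column] for row in baseline_rows.values())
        margins.append(round(value - lowest, 1))
    return margins

margins = margins_over_lowest(baselines, proposed)
# margins == [3.5, 3.0, 3.3] for precision, recall, and regression value
```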

From Table 1, with the same facial data the proposed method achieved the highest number of successful recognitions, 997. Its precision, recall, and regression value were also the highest, at 99.7%, 99.5%, and 99.7%, respectively. Compared with the lowest-scoring algorithm in each column, the precision of the proposed algorithm was 3.5 percentage points higher, the recall was 3.0 percentage points higher, and the regression value was 3.3 percentage points higher. The proposed algorithm therefore outperforms the other algorithms, and its recognition and detection results are better. This may be because the proposed model recognizes facial data more effectively. To test the impact of the feature coefficients on the proposed algorithm, 8 different compensation coefficients are selected for testing. The compensation coefficients are randomly combined to generate 10 different combinations. The accuracy of the proposed algorithm is then compared with that of MTCNN and GA, as shown in Table 2.

Table 2.

Comparison test of accuracy of random compensation coefficients for three algorithms.

Combination	MTCNN accuracy (%)	GA accuracy (%)	Proposed method accuracy (%)
1	66.30	69.45	71.23
2	67.42	70.32	73.24
3	67.95	70.62	73.95
4	68.23	65.31	71.25
5	60.23	64.85	73.25
6	61.24	67.35	72.35
7	96.32	69.25	73.95
8	63.95	65.23	72.86
9	64.57	68.21	72.68
10	66.84	69.84	73.84

Table 2 shows that the proposed method had the highest accuracy across the compensation coefficient combinations. For combination 3, the accuracy of the research algorithm was 73.95%, which was 6.00 percentage points higher than the 67.95% of the MTCNN model and 3.33 percentage points higher than the 70.62% of GA. Accuracy therefore improves after compensation coefficients are added, possibly because they improve the overall validation effect of the model and thus the accuracy of facial image recognition. The half error rate and error rate of the current model, GA, and the MTCNN model were analyzed and compared, as shown in Figure 9. The smaller the half error rate, the better the model.
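The text does not define the half error rate. Assuming it denotes the half total error rate (HTER) commonly used in biometric verification, that is, the mean of the false acceptance rate (FAR) and the false rejection rate (FRR) at a given decision threshold, it could be computed as in the sketch below. The function name, the threshold, and the toy scores are all hypothetical and serve only as an illustration.

```python
def half_total_error_rate(scores, labels, threshold):
    """Return (FAR, FRR, HTER) for similarity scores at a decision threshold.

    labels: True for genuine (same-identity) pairs, False for impostor pairs.
    A pair is accepted when its score is >= threshold.
    """
    genuine = [s for s, is_genuine in zip(scores, labels) if is_genuine]
    impostor = [s for s, is_genuine in zip(scores, labels) if not is_genuine]
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor sample")
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine rejected
    return far, frr, (far + frr) / 2


# Toy similarity scores, invented purely for illustration.
scores = [0.91, 0.85, 0.40, 0.77, 0.30, 0.62, 0.15, 0.55]
labels = [True, True, True, True, False, False, False, False]
far, frr, hter = half_total_error_rate(scores, labels, threshold=0.6)
print(far, frr, hter)
```

Under this reading, a lower value means fewer combined false acceptances and false rejections, which matches the statement that smaller is better.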

Figure 9.

Comparison of half error rate and error rate of three models. (a) Error rate, (b) Half error rate.

Figure 9(a) shows that the error rates of the three algorithms rose as the sample size increased and then settled into a relatively stable state. In that stable state, the research algorithm had the lowest error rate, only 6.2%, which was 1.5 and 1.9 percentage points lower than the 7.7% and 8.1% of the other two algorithms, respectively. The research algorithm thus makes fewer recognition errors, possibly because it adopts better mechanisms for processing the data. Figure 9(b) shows that the half error rate of the three algorithms decreased as the sample size increased. The lowest half error rate of the research algorithm was 1.9%, which was 0.1 and 0.9 percentage points lower than the 2.0% and 2.8% of the other algorithms. Taken together, the comparison indicates that the model recognizes facial data more efficiently and more accurately. In Figure 9, the error rate of the MTCNN model shows a small linear deviation, mainly because its multi-task cascaded convolutional neural network structure can effectively handle multiple subtasks in face detection. To test the stability of the current research method, the loss function changes of five algorithms were compared, as shown in Figure 10.

Figure 10.

Comparison of loss function changes among five algorithm models.

Figure 10 shows that the loss function value of all five algorithms decreased as the number of iterations increased and, beyond a certain number of iterations, reached a relatively stable state. Once stable, the research algorithm had the lowest loss function value, only 1.5, which was smaller than that of the other algorithms. The research algorithm is therefore more stable than the other algorithms. To test its feasibility on different datasets, facial recognition results were compared across datasets. The datasets used include Web-Face, VGGFace, LFW, MS-Celeb-1M, and FERET. A total of 1000 samples were used for detection and analysis, as shown in Table 3.

Table 3.

Comparison of test results for different datasets.

Dataset	Successful detections (of 1000)	Accuracy (%)	Recall (%)	F1 (%)
CASIA-WebFace	996	99.6	99.7	99.7
VGGFace2	998	99.8	99.5	99.7
LFW	999	99.9	99.6	99.6
MS-Celeb-1M	997	99.7	99.7	99.7
CelebA	998	99.8	99.8	99.8

Table 3 shows that, in all five datasets, the number of faces recognized by the research algorithm exceeded 990 and the recognition accuracy was greater than 99.5%. These results indicate that the research algorithm can effectively analyze and recognize the facial data in each dataset. The recall and F1 values were also consistently high, at 99.5% or above. The research algorithm therefore recognizes facial data effectively and has good recognition performance. The accuracy of different algorithms was then compared between the testing set and the training set, as shown in Figure 11.

Figure 11.

Comparison of model test set and training set test results. (a) Training set, (b) Testing set.

Figure 11(a) shows that, in the training set, the accuracy of the four algorithms improved as the data volume increased. The research algorithm had the highest recognition accuracy, 98.56%, which was 30.02 percentage points higher than the 68.54% of the lowest-scoring LAA algorithm. This indicates that the research algorithm has better facial recognition performance in the training set. Figure 11(b) shows that, in the testing set, the recognition accuracy of the four algorithms followed the same trend as in the training set, and the research algorithm again had the higher recognition accuracy. The research algorithm thus performs better in both the training set and the testing set, which may be due to its better processing ability for facial data. A comparison between the research model and more advanced models yielded the results shown in Table 4.

Table 4.

Comparison of performance metrics for different models.

Model/method	Occlusion robustness, %	Lighting adaptability, %	Expression robustness, %	Pose robustness, %
Proposed method	92.3	94.6	99.7	90.3
MTCNN baseline	86.2	87.4	96.2	85.5
Channel dropout	90.1	82.3	88.6	78.4
Multi-factor GAN	88.7	91.5	95.8	93.2
Robust dictionary learning	89.3	88.7	92.4	83.6
LBP-HOG fusion	79.5	90.8	84.3	75.2
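The per-challenge ranking in Table 4, and the gain of the study's method over the MTCNN baseline, can be checked with a few lines of Python. This is only a sketch over the values transcribed from the table; the key `"proposed"` stands for the study's own method (first row of Table 4), and the variable names are illustrative.

```python
metrics = ("occlusion", "lighting", "expression", "pose")

# Rows transcribed from Table 4 (robustness or adaptability, in percent).
table4 = {
    "proposed": (92.3, 94.6, 99.7, 90.3),  # the study's method
    "MTCNN baseline": (86.2, 87.4, 96.2, 85.5),
    "Channel dropout": (90.1, 82.3, 88.6, 78.4),
    "Multi-factor GAN": (88.7, 91.5, 95.8, 93.2),
    "Robust dictionary learning": (89.3, 88.7, 92.4, 83.6),
    "LBP-HOG fusion": (79.5, 90.8, 84.3, 75.2),
}

best = {}  # best-scoring method per challenge
gain = {}  # percentage-point gain of the study's method over the MTCNN baseline
for i, metric in enumerate(metrics):
    best[metric] = max(table4, key=lambda name: table4[name][i])
    gain[metric] = round(table4["proposed"][i] - table4["MTCNN baseline"][i], 1)

print(best)
print(gain)
```

Keeping the ranking and the baseline gain separate makes it clear where a method leads outright and where it merely improves on its own starting point.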

As shown in Table 4, the research method achieved the best performance on three of the four challenges: occlusion, lighting, and facial expressions. Specifically, it achieved 92.3% robustness to occlusion, 94.6% to lighting, 99.7% to facial expressions, and 90.3% to pose; on pose, only the multi-factor GAN scored higher (93.2%). This demonstrates that the research method has better overall facial recognition performance. The analysis of failure cases during the testing process is shown in Table 5.

Table 5.

Results of failure case analysis.

Failure type	Root cause	Improvement direction
Poor expression adaptation	GA compensation relies on texture continuity	Add expression-invariant feature extraction
Feature loss under occlusion	Wavelet transform’s layered strategy fails for cross-layer structural occlusion	Introduce attention mechanisms
Cross-race bias	Imbalanced training data distribution	Use race-balanced datasets
Noise amplification in extreme lighting	Threshold function misclassifies directional shadows as structural noise	Fuse illumination-invariant features

Table 5 suggests that most recognition failures could be avoided by adjusting the model and the testing process. Future research should therefore make further adjustments to both along the improvement directions listed.

The study statistically validated the differences between algorithm models, in particular the difference between the research algorithm and the traditional LBP model. The test gave z = 18.6, p < 0.001, indicating that the 16.0 percentage point improvement in accuracy over the LBP model is statistically significant. The 95% confidence interval for the research algorithm was [91.5%, 93.1%], while that for the LBP algorithm was [74.5%, 78.1%]. For the comparison with the genetic algorithm, z = 3.96, p < 0.001. All comparisons between the research algorithm and the traditional algorithms were significant at p < 0.01. Figure 12 shows some pseudocode used in the study.
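The z statistics and confidence intervals reported above can be illustrated with a standard two-proportion z-test. The sketch below is an illustration under stated assumptions, not the study's own procedure and not the pseudocode of Figure 12: the text does not give the sample sizes or the exact test variant, so the pooled-variance test, the Wald interval, and the counts in the example (923 and 763 correct out of 1000 each) are all hypothetical and are not expected to reproduce the reported z = 18.6.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def wald_interval(successes, n, z_crit=1.96):
    """Approximate 95% confidence interval for a single proportion."""
    p = successes / n
    half_width = z_crit * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical counts chosen only to illustrate the calculation.
z, p_value = two_proportion_z_test(923, 1000, 763, 1000)
low, high = wald_interval(923, 1000)
print(round(z, 2), p_value < 0.001, round(low, 3), round(high, 3))
```

Because the interval width shrinks with the square root of the sample size, the reported intervals depend on how many test images each model was evaluated on.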

Figure 12.

Pseudocode.

To reduce the complexity of the model, the frequency bands of the wavelet transform can be processed in parallel. In addition, the compensation coefficients can be fixed to eliminate the need for online optimization, and wavelet convolution can be accelerated on a GPU. Overall, the proposed wavelet-GA-MTCNN fusion strategy achieves a significant improvement in accuracy, including a 6.1 percentage point gain in occlusion robustness, at the cost of an approximately 33% increase in runtime, which reduces throughput to 6.25 FPS. This makes it suitable for security or access control systems that prioritize accuracy over real-time performance. Further optimization of feature expression efficiency will be pursued in subsequent research. The accuracy difference between the training and testing datasets is minimal, which indicates that the model performs well on unseen data. The use of multiple diverse and representative datasets, regularization techniques (such as weight decay), and stability validated through multiple experiments further supports the model’s generalization ability. The high accuracy is therefore more likely due to reasonable model design, proper data preprocessing, and effective regularization than to overfitting.
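The suggestion above that each wavelet frequency band can be processed in parallel rests on the fact that, once the signal is decomposed, thresholding one detail band does not depend on any other. The following pure-Python sketch illustrates this on a one-dimensional signal with a Haar wavelet. Everything in it is an assumption made for illustration: the Haar basis, plain soft thresholding (a stand-in, not the study's hybrid threshold function), the threshold value, and all function names. In CPython, threads will not speed up this pure-Python arithmetic; the thread pool only demonstrates that the bands are independent work items.

```python
import math
from concurrent.futures import ThreadPoolExecutor

SQRT_HALF = 1 / math.sqrt(2)


def haar_step(signal):
    """One Haar analysis level: returns (approximation, detail)."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * SQRT_HALF for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * SQRT_HALF for i in pairs]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return out


def soft_threshold(band, t):
    """Shrink every coefficient toward zero by t (illustrative stand-in)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in band]


def denoise(signal, levels=2, t=0.5):
    """Decompose, threshold each detail band independently, reconstruct."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    # Each detail band is an independent task, so the bands can be mapped
    # over a pool of workers.
    with ThreadPoolExecutor() as pool:
        details = list(pool.map(lambda band: soft_threshold(band, t), details))
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx


noisy = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
restored = denoise(noisy, levels=2, t=0.0)  # t = 0 leaves the signal unchanged
```

With a zero threshold the reconstruction returns the input, which is a quick check that the analysis and synthesis steps are consistent before any denoising is applied.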

Conclusion

The study proposes a new facial recognition model that integrates wavelet transform, MTCNN, and GA, addressing the insufficient accuracy and sensitivity to image noise of existing facial recognition systems. Experiments show that on datasets 1 and 2 the model achieves maximum accuracy rates of 92.3% and 94.6%, respectively, representing improvements of 8.7%–16.0% over traditional algorithms such as LBP and LGP. The model achieves a recognition success rate of 99.7%, with recall and precision rates of 99.5% and 99.7%, respectively, and an error rate as low as 6.2%, demonstrating superior stability compared with the comparison algorithms. The new method significantly improves the accuracy and robustness of face recognition through wavelet transform denoising, GA-optimized feature compensation, and MTCNN framework improvements. However, the model’s adaptability to facial expressions still requires further optimization.

Footnotes

ORCID iD

Yuhua Peng

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
