Abstract
Facial occlusions such as sunglasses, masks, and caps make it difficult to reconstruct the partially occluded regions of a facial image. This paper proposes a novel hybrid machine learning approach for occlusion removal based on the Structural Similarity Index Measure (SSIM) and Principal Component Analysis (PCA), called SSIM_PCA. The proposed system comprises two stages. In the first stage, a Face Similarity Matrix (FSM) guided by SSIM is generated to provide the information needed to recover the lost regions of the face image. The FSM yields Related Face (RF) images similar to the probe image. In the second stage, these RF images are used as input data to generate eigenspaces using PCA, and the occluded face region is reconstructed by exploiting the relationship between the occluded region and the related face images, which contain the data relevant to recovering the occluded area. Experimental results on five standard datasets, viz. CAS-PEAL-R1, IMFDB, FEI, RMFRD, and FG-NET, show that the proposed method works well under illumination changes and occlusion of facial images.
Introduction
Within computer vision, face recognition has become a very active research area, especially when dealing with occlusion. Occlusion can take many forms, including eyeglasses, sunglasses, masks, and many others. Presently, as the entire world is combating COVID-19, people routinely wear masks in public places. As a result, their faces are obscured, making even familiar faces difficult to recognize. Face recognition is now used in real time in a variety of settings, including airports, businesses, and mobile phone applications for locking and unlocking devices. Face occlusion removal is essential in these cases to identify the subjects behind the masks for fast and unobtrusive services. Many authors have proposed algorithms for reconstructing and restoring occluded regions, including PCA, the Fisherface algorithm, Linear Discriminant Analysis, and hidden Markov models. However, these approaches do not perform reasonably when tested with arbitrary data that differ completely from the training data.
To overcome this limitation, this research harnesses the structural information of facial images for occlusion detection and reconstruction. To locate an appropriate person, a face similarity calculation is used. The term “similarity” refers to the degree of structural similarity between two images. Generally, similarity measures such as Euclidean distance, Minkowski distances, and cosine-based distances are used to identify a specific person. SSIM is a similarity measure used in face recognition. The primary advantage of SSIM is that it is a perception-based measure that captures structural information at the pixel level, even in spatially close regions, whereas PSNR and MSE measure absolute errors.
Briefly stated, our proposed work is as follows. An FSM is proposed to reconstruct occluded faces effectively. The FSM (Fi) provides additional information to fill occluded regions by identifying faces in the gallery face dataset that are similar to the probe image, using the Similarity Computation Matrix (SCM). These similar faces are then used in the PCA-based reconstruction process for face recognition. Because only similar or related face images are used for probe image reconstruction, rather than a massive image set, the computational time is small.
The rest of this paper is organized as follows. Section 2 provides a brief account of related work, and Section 3 describes the proposed work and explains the restoration of occluded regions using SSIM_PCA. Image classification is presented in Section 4, experimental results on five datasets (CAS-PEAL-R1, IMFDB, FEI, RMFRD, and FG-NET) are discussed in Section 5, and the paper is concluded in Section 6.
Related work
Several algorithms for the de-occlusion of face images have been proposed. In [1], a face in-painting network is used as the foundation for Weighted Face Similarity (WFS-Net) to produce an improved restoration. The author of [2] used effective pixels to detect and restore occluded pixels when reconstructing occluded face images. Sonu Agrawala et al. [3] used local texture descriptors to reduce the PCA dimension and age function model. Markus Storer et al. [4] proposed a two-stage approach for detecting outliers: in the first stage, a large number of smaller subspaces are created to detect outliers, while in the second stage, robust least-squares fitting is used to detect outliers. In [5], the authors employ a hypothesize-and-test paradigm to determine the coefficients of eigenimages from a subset of image points. The competing hypotheses are subjected to the Minimum Description Length principle to eliminate outliers in occlusion detection, and multiple eigenimage classes are employed. In [6], the authors used a single-sample-per-person methodology to handle variations in face images and applied similarity measures for recognition. To solve the single-sample-per-person problem, a combination of Traditional and Deep Learning (TDL) techniques is used. In [7], an inverse Euclidean distance similarity calculation is used for facial recognition with geometrical approximated PCA (gaPCA). Face recognition based on a perceptual hash, which is used for feature extraction and preprocessing, was proposed in [8]. The authors employ the Discrete Wavelet Transform (DWT) and a graph-oriented technique known as the Quintet Triple Binary Pattern (QTBP). This approach uses K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) for face classification. In [9], the face is described using a Local Gradient Number Pattern (LGNP), and the local gradient information and grey position are extracted using the Sobel operator.
Further, Fuzzy Convex-Concave Partition (FCCP) was employed to capture grey transitions in the local neighborhood and to extract additional local and global information for face recognition. In [10], the authors utilize PCA, Local Binary Pattern, and pyramid pooling and propose a three-layer system with one convolutional layer, a nonlinear layer, and a pooling layer, implying that high-end hardware is required for face recognition. In [11], a feature-based similarity measure that combines the best features of SSIM and the Feature Similarity Index Measure (FSIM) is used to resolve the limitations of feature and structural similarity measures in detecting similar and dissimilar facial images. The author of [12] makes three contributions. The author first discusses the application of cosine similarity following discriminant analysis, then the inadequacy of cosine similarity, and finally a new similarity measure, assessed on the Face Recognition Grand Challenge (FRGC), that improves pattern recognition by integrating the absolute value of the angular measure and the lp norm. Image quality assessment based on structural information was first proposed by Zhou Wang et al. [13]. Under this philosophy, image degradations are viewed as perceived changes in structural information. As a result, SSIM measures image similarity effectively by considering object structure, luminance, and contrast. Jim Nilsson [14] explained the mathematical concepts underlying the SSIM operating principle. In that paper, the authors examine the mathematical factors of SSIM and demonstrate that it can produce unexpected, sometimes undefined, and nonintuitive results in both synthetic and realistic use cases. In summary, SSIM evaluates image quality using luminance, contrast, and structure.
In [15], the discrete wavelet transform was evaluated as a preprocessing mechanism for the principal component analysis and singular value decomposition algorithms (DWT-PCA/SVD) on a reconstructed face image database. The half-face images were reconstructed using the bilateral symmetry of frontal faces.
For modular face recognition systems, Mehmet Koc [16] presented a method to detect and use the non-occluded areas of the face image using three coefficients: (i) image entropy, (ii) image correlation, and (iii) root-mean-square error.
By combining a cropping-based strategy with the Convolutional Block Attention Module (CBAM), the author of [17] proposes a new method for masked face identification. Two special application scenarios are considered: using unmasked faces for training to recognize masked faces, and using masked faces for training to recognize unmasked faces.
Meixiang Zhao et al. [18] proposed a novel variant of 2DPCA (Two-Dimensional Principal Component Analysis) based on a new ridge regression model. R2DPCA (Rigid 2DPCA) produces a weighting vector based on label information and maximises a relaxed threshold using an optimal algorithm to obtain the key features.
The feature-based method for 2D face images was presented by the authors [19]. For feature extraction, Speeded Up Robust Features (SURF) and Scale Invariant Feature Transform (SIFT) are used.
The author of [20] concentrates on facial occlusions and proposes an enhancement method using Principal Component Analysis with Singular Value Decomposition and the Fast Fourier Transform (FFT-PCA/SVD) as preprocessing for face recognition on face images with missing regions and on an augmented face image database.
The author of [21] devised a look-up-table-based method as well as a novel, efficient restoration strategy. The method leverages the spectrum independence of the reference image and the region with missing pixels. Using a reference image obtained from several sensors, significant heterogeneity zones were employed to recreate the image with missing pixels.
Though considerable research has been done on face recognition based on occlusion detection, significant works based on structural and feature similarity are very few.
Proposed work
This section presents the proposed occlusion detection system, which is implemented in four phases. Generally, computer vision problems require a broad dataset for training and testing machine learning models. However, large datasets demand sophisticated hardware for effective computation, feature extraction, training, and testing. In contrast, the proposed system is implemented with a small set of images and minimal hardware requirements, and it demonstrates good performance with smaller computational times and lower error rates.
Pre-processing
In this research, for a given probe image, the images in the gallery face dataset are masked according to the occluded portion of the probe image. The masking process is described below, and the algorithm is given in Table 1.
Face Masking Algorithm
Let GFi be the d-dimensional vector of the ith image, and let GF = [GF1, GF2, ⋯, GFi, ⋯, GFN] be the Gallery Face (GF) dataset of all N images. Table 1 shows the Face Masking Matrix (FMM) algorithm that produces the Masked Face (MF) dataset.
To compare the gallery and probe images, the gallery face images are multiplied by the FMM. The FMM is formed by detecting the occluded region of the probe image. In each gallery face image of the MF dataset, the pixels corresponding to the occluded region have a value of zero. The FMM is very useful in similarity computation because it restricts the comparison to the known parts of the occluded probe image and the corresponding regions of the gallery face image. The schematic of the proposed masking and Face Similarity Matrix (FSM) system is depicted in Fig. 1.
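The masking step can be illustrated with a minimal Python sketch. It assumes the FMM is already available as a binary matrix (1 for known pixels, 0 for occluded pixels); how the occluded region is detected is not shown here. The function names `apply_fmm` and `mask_gallery` and the toy 3×3 images are hypothetical and serve only as an illustration.

```python
def apply_fmm(image, fmm):
    """Element-wise product of a grayscale image with a binary FMM.

    Pixels where the FMM is 0 (occluded in the probe) become zero.
    """
    return [[px * keep for px, keep in zip(img_row, fmm_row)]
            for img_row, fmm_row in zip(image, fmm)]


def mask_gallery(gallery, fmm):
    """Build the Masked Face (MF) dataset from the gallery faces."""
    return [apply_fmm(face, fmm) for face in gallery]


# Toy example (assumed data): the bottom row of the probe is occluded.
fmm = [[1, 1, 1],
       [1, 1, 1],
       [0, 0, 0]]
gallery = [
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
    [[5, 5, 5], [6, 6, 6], [7, 7, 7]],
]
masked_faces = mask_gallery(gallery, fmm)
```

After masking, only the first two rows of each gallery face contribute to any later similarity computation, which matches the intent of comparing known regions only.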

Proposed pre-processing and SCM architecture.
SSIM is used to compute the similarity between each image in the MF dataset and the probe image. The SCM is calculated using the following equation:
SCM(Pf, MFi) = ((2 μPf μMFi + C1)(2 σPfMFi + C2)) / ((μPf² + μMFi² + C1)(σ²Pf + σ²MFi + C2))
where μPf and μMFi are the global means of the probe image Pf and the masked face image MFi, σ²Pf and σ²MFi are their variances, and σPfMFi is the covariance of the two face images. C1 and C2 are constants that prevent division by zero. For each candidate image, an FSM is generated, which consists of the Related Face Matrix for the corresponding face; finally, SCMs are constructed for all images in the MF dataset.
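As a rough illustration of the similarity score, the following Python sketch computes a single-window (global) SSIM between two flattened grayscale images. The constants C1 = (0.01 L)² and C2 = (0.03 L)² with L = 255 are the conventional defaults from the SSIM literature and are an assumption here, as is the use of population (divide-by-n) statistics; the function name `global_ssim` is hypothetical.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized, flattened images."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilising constants (assumed conventional defaults).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


# Toy example (assumed data): a probe compared with itself and with
# an intensity-inverted copy.
probe = [0, 50, 100, 150, 200, 250]
inverted = [250 - p for p in probe]
score_same = global_ssim(probe, probe)
score_inverted = global_ssim(probe, inverted)
```

An image compared with itself scores 1 (up to rounding), while the inverted copy scores much lower because its covariance with the probe is negative.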
Related Face (RF) images from the GF dataset are chosen using the FSM. Each FSM is of size four, that is, N = 4. These faces are used in the SSIM_PCA reconstruction process to reconstruct the probe image. Figure 2 depicts the use of SSIM to create Related Face images.
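One plausible reading of this selection step is a simple ranking: keep the N = 4 gallery faces with the highest similarity scores. The sketch below makes that assumption explicit; the function name `select_related_faces` and the example scores are hypothetical.

```python
def select_related_faces(scores, n_related=4):
    """Return the gallery indices of the n_related highest scores.

    `scores[i]` is assumed to be the similarity between the probe and
    the i-th masked gallery face.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_related]


# Toy example (assumed scores for six gallery faces).
scm_scores = [0.31, 0.92, 0.77, 0.15, 0.88, 0.64]
related_indices = select_related_faces(scm_scores)
```

For these scores the four selected gallery indices are 1, 4, 2, and 5, in decreasing order of similarity.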

Using SSIM to identify related faces from the Gallery Face dataset.
PCA is a machine learning technique for reducing the dimensionality of image data. It is a well-known method for eigenface-based face recognition and feature extraction. When dealing with an occluded facial image, PCA has certain disadvantages that degrade performance on some images. In general, the principal components constructed from the images are less interpretable than the original features, and retaining a small number of principal components results in the loss of fine details. To overcome these limitations, several authors have employed kernel PCA to capture non-linear relationships between the features.
The prime goal of the proposed system is to reconstruct occluded face images with fewer images and less processing time. The SCM is determined using SSIM, and the FSM (Fi) is produced from it. The FSM is used to choose Related Face (RF) images, which are then used for reconstruction.
Initially, the eigenfaces are constructed from the selected RF images. The eigenfaces are represented as in Equation (5), where efi is the eigenface for the ith image, d is the dimension, and m is the number of eigenfaces, with m < N usually. Eigenfaces, or eigenvectors, are used to solve the problem of face recognition. In a high-dimensional vector space, the eigenvectors are computed from the covariance matrix. Eigenfaces serve as the basis for all images in the gallery face dataset, which reduces the dimensionality of the original face images. Classification can be accomplished by examining how faces are represented in this basis. Principal component scores are generated using the eigenfaces, and the reconstruction process is carried out.
In [2], the authors claim that eigenspace projection of the probe image yields a low principal component score. Instead of projecting the raw input image into the eigenspace, as is common, this approach projects the normalized input image into the eigenspace. The principal component (PC) score is computed as in Equation (6), where PC represents the principal component score, W represents the weight matrix, and y is the normalized probe image.
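The eigenface construction and projection can be sketched in plain Python as follows. This is an illustrative sketch under several assumptions, not the paper's implementation: normalization is taken to mean subtracting the mean related face, eigenvectors are obtained from the small N×N Gram matrix by power iteration with deflation, and the whole probe is reconstructed rather than only the occluded region. All function names and the toy data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_eigenvectors(gram, m, iters=300):
    """Top-m eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(gram)
    gram = [row[:] for row in gram]
    vectors = []
    for _ in range(m):
        v = [float(i + 1) for i in range(n)]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [dot(row, v) for row in gram]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in gram])
        vectors.append(v)
        # Deflate: remove the component just found.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return vectors


def eigenfaces(related_faces, m):
    """Return (mean face, list of m unit-length eigenfaces)."""
    n, d = len(related_faces), len(related_faces[0])
    mean = [sum(f[k] for f in related_faces) / n for k in range(d)]
    centered = [[f[k] - mean[k] for k in range(d)] for f in related_faces]
    gram = [[dot(a, b) for b in centered] for a in centered]
    faces = []
    for v in top_eigenvectors(gram, m):
        ef = [sum(v[i] * centered[i][k] for i in range(n)) for k in range(d)]
        norm = math.sqrt(dot(ef, ef))
        if norm > 1e-12:
            faces.append([x / norm for x in ef])
    return mean, faces


def reconstruct(probe, mean, faces):
    """Project the mean-subtracted probe onto the eigenfaces and map back."""
    y = [p - mu for p, mu in zip(probe, mean)]
    scores = [dot(ef, y) for ef in faces]  # principal component scores
    return [mu + sum(s * ef[k] for s, ef in zip(scores, faces))
            for k, mu in enumerate(mean)]


# Toy example (assumed data): pixel 2 of the probe is occluded (set to 0),
# while every related face has the value 7 there.
related = [[1, 0, 7, 3], [0, 2, 7, 3], [0, 0, 7, 3]]
occluded_probe = [0, 2, 0, 3]
mean_face, efs = eigenfaces(related, m=2)
restored = reconstruct(occluded_probe, mean_face, efs)
```

In this toy case the related faces do not vary at the occluded pixel, so the reconstruction fills it with their shared value while the known pixels are preserved.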
The schematic of the reconstruction process and the corresponding algorithm are shown in Fig. 3 and Table 2 respectively. The pipeline of the reconstruction process depicting the intermediate images generated in each stage is shown in Fig. 4.

Architecture of reconstruction process.
SSIM_PCA algorithm for reconstruction of occluded face images

Reconstruction process pipeline.
SSIM_PCA can be used to reconstruct the entire probe face image and achieve
Finally,
The algorithm for determining the accuracy of reconstruction is given in Table 3. It uses Euclidean distance to compare the reshaped image with the original image before classifying it against the gallery images.
The algorithm for determining accuracy employs Euclidean distance
The reconstructed images after occlusion removal and reshaping are matched under two categories: Image with Image matching and Image with Class matching. In Image with Image matching, a probe image is matched against a dataset whose images are subjected to different kinds of occlusion in order to identify a similar image. In Image with Class matching, a probe image is compared with the images within an occlusion class to find a matching image.
Image with Image matching
In the Image with Image matching, the reshaped image is compared with the set of all gallery images, and the image for which Euclidean distance is minimum is identified to be the closest matching image. This process is illustrated in Fig. 5.
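The minimum-distance rule described above can be sketched in a few lines of Python. Images are assumed to be flattened into equal-length vectors; the function names and the toy vectors are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two flattened images."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def closest_gallery_image(reshaped, gallery):
    """Return (index, distance) of the gallery image nearest to `reshaped`."""
    distances = [euclidean_distance(reshaped, g) for g in gallery]
    best = min(range(len(gallery)), key=distances.__getitem__)
    return best, distances[best]


# Toy example (assumed data): three gallery vectors and one reshaped image.
gallery_vectors = [[0, 0, 0], [10, 10, 10], [3, 4, 0]]
reshaped_image = [3, 4, 1]
match_index, match_distance = closest_gallery_image(reshaped_image, gallery_vectors)
```

Here the third gallery vector is the closest match, at a distance of 1.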

Caspeal-R1 dataset comparison with three different categories of eyeglass occlusion.
The first step in Image with Class matching is to determine the class to which the probe image belongs. After the class of the probe image is determined, the reconstruction process begins, and the image with the highest SSIM similarity is taken as the corresponding original image. The schematic of the Image with Class matching method is depicted in Fig. 6.

An example of an experimental flow of image classification using the IMFDB dataset’s Class Matching.
This section presents quantitative and visual experimental results on three datasets, viz. CAS-PEAL-R1, IMFDB, and FEI. Additionally, it compares the experimental flow on the RMFRD and FG-NET datasets and interprets the results to clarify the behavior of the proposed system.
The occluded facial recognition process has two components. First, the occluded section of the probe image is used to mask the images in the gallery face dataset. The FMM is particularly useful in similarity calculation because, after masking, only the known areas of the occluded probe image and the corresponding regions of the gallery face image are compared. Second, by calculating the SCM and FSM, a minimal number of similar images are selected for the probe image to recover the occluded region. The obstructed face is then reconstructed using SSIM and PCA to restore the original face.
Experiments with data from the CASPEAL-R1 dataset
The CAS-PEAL-R1 dataset [22] includes various frontal image variations, including Normal, Aging, Accessory, Distance, Expression, Background, and Lighting. Among these, Normal images serve as gallery images, whereas Aging, Accessory, and Distance images serve as probe images. In addition, additive Gaussian white noise is applied to the Aging images, and the noisy images are also used as probe images. The Aging category has 66 images in total (51 male, 15 female), of which 33 were randomly selected for training and the remaining 33 for testing. The same split is used for the additive Gaussian white noise probe set. The Distance (D2) category has a total of 294 images (161 male, 133 female), of which 50 percent are used for training and the remaining 50 percent for testing. In the Accessory variation, the eyeglass types are selected for study. Three types of eyeglass frames are considered, and tests are carried out on each of them. Category frame 1 (Full Rim Frame-Black Colored) has a total of 200 images, with 162 females and 38 males. For category frame 2 (Full Rim-Silver Colored), a total of 300 images were selected, with 160 females and 140 males. For category frame 3 (Rimless frame), 230 images were selected, with 170 females and 60 males. For each eyeglass category, 50 percent is used for training and 50 percent for testing. Figure 7 depicts some CAS-PEAL-R1 images. Figure 8(a) compares the experimental results of various existing methods on eyeglass category frame 1 of the CAS-PEAL-R1 dataset. Figure 8(b) shows the experiment flow for additive Gaussian white noise on the CAS-PEAL-R1 dataset.

Caspeal-R1 dataset example images.

Comparison of different state-of-the-art methods on the CAS-PEAL-R1 dataset: (a) PCA, (b) FW-PCA, and (c) the proposed method SSIM_PCA.

Experiment flow of additive Gaussian White noise of the Caspeal-R1 dataset.
The Indian Movie Face Database (IMFDB) [23] is a large, unrestricted face database consisting of 34,512 images of 100 Indian actors collected from more than 100 videos. Each actor's or actress's face images were collected from a minimum of three different movies. As a result, IMFDB contains many variations, such as occlusion, illumination, pose, and makeup. Even the images of a single actor or actress vary in age, expression, occlusion, pose, and so on.
For example, the variations in age, expression, occlusion, and pose of actor Amirkhan are shown in Fig. 9.

Examples of Amirkhan's images with different variations: occlusion by hand and eyeglass, expression, and pose.
Images with occlusion and pose in the same style are chosen for the experiments on these datasets. Fig. 10 compares the occlusion removal procedure with various state-of-the-art methods on the IMFDB and FEI datasets.
Since it is difficult to find neutral images of an actor or actress due to the movie clippings in the dataset, each actor or actress is treated as a class to be identified in the IMFDB dataset. IMFDB is an expression database in general. There are a total of 34,512 images of all of the actors (64 male and 36 female). Experiments are performed on 30 of the 100 actors (classes), each with 10 images showing various variations from different movies. As a result, a total of 300 images are chosen at random from the various films of each actor or actress. Each actor has a different hairstyle, makeup, voice, age, and pose in each film. The proposed work therefore aims to locate the probe image among all gallery images that belong to a specific actor's or actress's class and compare it with the reshaped image. At random, 150 images were used for training and 150 for testing.
The FEI face database [24] is a face database of Brazilians. It has 14 images each of 200 individuals, for a total of 2800 images. Frontal images from the FEI face dataset are chosen for the experiments, which are conducted in two ways. In the first, 200 neutral images are used as the training set and 200 smiling images as the test set. In the second, 200 neutral images are artificially occluded in various forms of the same scale and then tested against the 200 neutral images. The results of both types of experiments, combined with those on IMFDB, are shown in Fig. 10 for various state-of-the-art methods and the proposed method. The artificially occluded face images from the FEI dataset are shown in Table 4. Figure 11 depicts some FEI dataset examples.

Shows a comparison of multiple datasets from IMFDB and FEI using the proposed method SSIM_PCA.
Shows different ways of artificially occluding parts of the facial region

Example images from the FEI dataset.
RMFRD [25] includes 5000 masked faces from 525 people and 90,000 normal faces. It is a sizable dataset. Out of 525 people, the first ten were used, each with five images. So a total of 50 images were used, with 25 serving as training and the remaining 25 serving as testing at random. To begin, the proposed work seeks to identify a person’s specific class based on the input image. Second, based on the class identification, it generates the matched image for the input image. Figure 12 depicts the experimental results for the RMFRD and FGNET datasets. Figure 13 depicts a comparison of the proposed method with the CBAM and GSO methods.

Shows the experimental results for the (a) RMFRD and (b) FGNET datasets.

Experimental results on the RMFRD and FGNET datasets compared with the state-of-the-art CBAM and GSO methods.
Figure 13 depicts a comparison of various methods on different datasets. The occluded image was reconstructed for the input image, and their corresponding original image was generated, indicating that the proposed methods work well.
FG-NET [26] is made up of 1002 images of 82 people ranging in age from 0 to 69. It is a face ageing dataset with significant variations such as pose, illumination, and expression. Each person differs from the next based on their age. As a result, each individual is regarded as a class. First, the proposed work aims to identify the person’s specific class based on the input image. Second, it generates the matched image for the input image based on the class identification. Only 30 of the 82 people in the dataset are used, each with 5 images. So, in total, 150 images are used to put our proposed method to the test. Out of 150 images, 30 are used for testing and 50 are used for training at random.
Calculation of Root Mean Square Error (RMSE)
Root Mean Square Error (RMSE) is calculated to estimate the difference between the original image and the reconstructed image. It is the square root of the arithmetic mean of the squared differences, as in Equation (15).
Where N is the number of images,
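A minimal Python sketch of the RMSE computation is given below. It assumes flattened images and computes the error per image pair over pixels; averaging the per-pair values over the N image pairs is one possible dataset-level summary and is an assumption, as are the function names.

```python
import math


def rmse(original, reconstructed):
    """Root mean square error between two flattened images."""
    n = len(original)
    squared = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    return math.sqrt(squared / n)


def mean_rmse(originals, reconstructions):
    """Average per-pair RMSE over N image pairs (assumed summary)."""
    values = [rmse(o, r) for o, r in zip(originals, reconstructions)]
    return sum(values) / len(values)


# Toy example (assumed data): two image pairs of two pixels each.
pair_error = rmse([0, 0, 0, 0], [3, 3, 3, 3])
dataset_error = mean_rmse([[0, 0], [0, 0]], [[3, 3], [1, 1]])
```

A lower RMSE indicates that the reconstructed image is closer to the original; identical images give zero.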
The structural similarity index measure is used to find similarities between two images by comparing luminance, contrast, and structure, as in Equation (16).
SSIM calculates the value of the similarity index for image Y using X as the reference image. In the experiments, Y is the test image or the reconstructed image, and X is the gallery image.
SSIM is computed over the local neighborhood of every pixel in the image Y. The SSIM value ranges from –1 to 1. A value from 0.94 to 1 denotes a match between the two images. As the two images become more different, the SSIM value decreases toward zero. Tables 7(a) to 7(c) report the SSIM value (SSIM_val) between the gallery and the reconstructed image and between the gallery and the test image. SSIM also generates a similarity map (SSIM_Map); Tables 7(a) to 7(c) show the SSIM_Map for comparing test and reference images. High SSIM values appear as bright pixels, whereas low SSIM values appear as dark pixels, which indicate regions where the two images differ.
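To make the idea of a similarity map concrete, the sketch below computes SSIM in a sliding window over two small grayscale images. It is a simplified illustration: it uses a uniform 3×3 window without border padding and the conventional constants C1 = (0.01 L)², C2 = (0.03 L)² with L = 255, all of which are assumptions, and the function names are hypothetical.

```python
def window_ssim(x, y, c1, c2):
    """SSIM of two equally sized pixel windows given as flat lists."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


def ssim_map(img_x, img_y, win=3, dynamic_range=255.0):
    """Local SSIM for every win x win window position (no padding)."""
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    rows, cols = len(img_x), len(img_x[0])
    result = []
    for r in range(rows - win + 1):
        out_row = []
        for c in range(cols - win + 1):
            px = [img_x[r + i][c + j] for i in range(win) for j in range(win)]
            py = [img_y[r + i][c + j] for i in range(win) for j in range(win)]
            out_row.append(window_ssim(px, py, c1, c2))
        result.append(out_row)
    return result


# Toy example (assumed data): the test image differs from the reference
# only in its top-left pixel.
reference = [[10, 20, 30, 40],
             [50, 60, 70, 80],
             [90, 100, 110, 120],
             [130, 140, 150, 160]]
test_image = [row[:] for row in reference]
test_image[0][0] = 250
smap = ssim_map(reference, test_image)
mean_value = sum(sum(row) for row in smap) / (len(smap) * len(smap[0]))
```

Only the window that contains the altered pixel gets a low value (a dark spot in the map); the other windows stay at 1, and the mean of the map gives a single overall score.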
The range of Euclidean distance and SSIM value is represented in Table 5. The range denotes the degree of similarity between two images, whether they are similar or dissimilar.
Calculate the similarity between the original image and the probe image with the following observation of different datasets
Tables 6(a) and 6(b) compare the proposed method with various existing methodologies on different datasets. For the IMFDB dataset, a total of 300 images, including male and female images, were tested. For the FEI dataset, the smiling-face and artificially occluded-face experiments were tested on 400 images. For the CAS-PEAL-R1 dataset, separate experiments were carried out on the Eyeglass categories, Aging, and Distance.
Compares the IMFDB and FEI datasets
Shows the comparison on the Caspeal-R1 dataset
The next three tables, Tables 7(a) to 7(c), present the results of the CAS-PEAL-R1, IMFDB, and FEI dataset experiments.
Comparison of the Original image by reconstructed image with its SSIM value and map of Caspeal-R1 dataset
Comparison of the Original image with the Test image with its SSIM value and map of Caspeal-R1 dataset
The comparison of the original images with the reconstructed image using SSIM with its SSIM value and map of IMFDB and FEI dataset
Table 7(a) presents the results of comparing the original image with the reconstructed image generated by the proposed work on the CAS-PEAL-R1 dataset, namely the SSIM value (SSIM_val) and the corresponding similarity map (SSIM_Map). The map illustrates similarity with larger and brighter regions.
Table 7(b) presents the corresponding results for the original image and the test image of the CAS-PEAL-R1 dataset, namely the SSIM value and its similarity map. The map shows dissimilarity as dark regions.
Table 7(c) presents the results of comparing the original image with the test image and the reconstructed image for the IMFDB and FEI datasets, with the SSIM value and its corresponding similarity map.
The experiments were run on the following hardware and software: CPU: AMD E2-9010 RADEON R2, 4 Compute Cores 2C+2G; RAM: 8 GB; Operating System: Windows 10 64-bit; Matlab 2017a. Table 8 shows the execution time on this system. Figure 14 depicts how similar the original image and the reshaped image are. That is, the proposed method successfully reconstructed the occluded face image, and the distance between the reshaped image and the original image is small. The distance is measured using Euclidean distance.
Comparison of execution time with existing techniques of the FEI dataset

Shows the Euclidean distance of the original image and reshaped image of the CAS-PEAL dataset with the accessory eyeglass frame 1 category as occluded input.
Figure 14 highlights the similarity between the original and reshaped images. One of the CAS-PEAL-R1 images from eyeglass category frame 1 was used as the occluded input. The Euclidean distance between the reshaped image and the original image is small, which shows that the suggested technique effectively recovered the occluded facial image.
Because of the COVID-19 outbreak, individuals wear masks when they go out. Many existing face recognition algorithms have been built to recognize occluded faces, but these algorithms are unable to detect masks and other occluded regions. The proposed occluded facial recognition process has two aspects. The occluded section of a probe image is used to mask the images in the gallery face dataset. The FMM is particularly useful in similarity calculation because, after masking, only the known areas of the occluded probe image and the corresponding regions of the gallery face image are compared. By calculating the SCM and FSM, a minimal number of similar images can be generated for the probe image to recover the occluded region. The obstructed face is then reconstructed using SSIM and PCA to restore the original face. From the point of view of system requirements, the proposed approach works well. Recognition accuracy for heavily obstructed faces will be considered in future work. As a further improvement, the algorithm should be extended to handle images with side poses, and it will also be used with deep learning.
Acknowledgments
The research in this paper uses the CAS-PEAL-R1 face database collected under the sponsor of the Chinese National Hi-Tech Program and ISVISION Tech. Co. Ltd. We sincerely thank them for allowing us to use the above mentioned dataset.
Thank you also to the authors [23–25, ] for allowing us to use their dataset.
