Abstract
The application of 3D technology is rapidly expanding, and stereoscopic imagery is typically used to display 3D data. However, compression, transmission, and other necessary processing steps may reduce the quality of these images. Stereo image quality assessment (SIQA) has therefore gained attention as a way to guarantee that viewers have a positive watching experience, which requires a quality evaluation mechanism for stereoscopic content that is both dependable and precise. A full-reference method for SIQA is presented in this paper. Compared to previous measures, this method offers more flexibility by combining a distorted pixel measure with edge similarity. The binocular summation map is calculated by adding the left and right images of a stereo pair. An improved gradient similarity based distorted pixel measure (SGSDM) is used to calculate the quality of the binocular summation. The LIVE 3D IQA database is used to evaluate the correlation of the proposed metric with the subjective DMOS scores provided by the database. Experimental comparisons demonstrate the efficacy of the proposed method.
Introduction
Over the past several decades, stereo and multiview systems have seen rapid development and widespread application. Stereoscopic images, like 2D images, can be affected by a number of common degradations. Due to its numerous applications in printing, compression, communication, analysis, registration, restoration, and enhancement, 2D image quality assessment (IQA) has received a lot of attention [1–3]. The goal of quality assessment (QA) measures is to automatically estimate the quality of images in accordance with human quality judgments. There are two types of IQA: subjective and objective. Objective image quality metrics are primarily used to determine the difference between the test and reference images. Subjective evaluation, which is carried out by human observers, is the most reliable way to judge the quality of an image.
Most of the methods that have been suggested for objective image quality assessment can be grouped into three categories: full-reference, no-reference, and reduced-reference measures. In full-reference (FR) techniques [21], the algorithm has access to a flawless version of the image against which the distorted version is compared. No-reference (NR) methods [23] have access only to the test image and must judge its quality without knowledge of the reference version. The reduced-reference (RR) approach, which uses only partial information about the reference image, is the third category. It is important to note that quality measures for stereoscopic 3D images fall into these same categories.
The aim of this paper is to use the gradient similarity and distorted pixel measure to evaluate stereo image quality.
The remainder of the paper is organized as follows. In Section 2, some image quality measures are presented. In Section 3, the proposed image quality measure is defined. In Section 4, the performance of the proposed method is compared with that of other measures using images with different types of distortion. The conclusion is reported in Section 5.
Related works
Many efforts have been made in the literature to develop SIQA measures. The following subsections group these state-of-the-art SIQA measures by category.
FR methods of stereoscopic image quality assessment
The approach proposed by Benoit et al. [11] combines two measures: an image quality measure, computed as the difference between the original and distorted images (left or right), and depth information, computed as the difference between the corresponding disparity maps. The SSIM [5] and C4 [20] metrics were chosen as the image quality measures.
The authors of [13] investigate the capacity of 2D image quality metrics to integrate disparity information in order to estimate stereo image quality. Eleven 2D image quality metrics were used.
A stereoscopic image quality metric is proposed in [14]. It takes into account the sensitivity of the Human Visual System (HVS) to changes in contrast and luminance at high spatial frequency, and matches regions of high spatial frequency between the left and right views of the stereo pair. It is used to evaluate compressed image quality.
A method for objectively assessing the quality of stereo images was proposed in [16]. This strategy evaluates stereo images from two perspectives: the image quality and the stereo sense between the viewpoint pairs. As a result, the objective stereo image assessment indicator has two components: stereo sense and image quality.
Based on the most recent advancements in human visual system (HVS) physiological and psychological research, the authors of [17] proposed a perceptual metric for assessing stereo video quality. A multi-channel vision model based on 3D wavelet decomposition is proposed after an examination of a number of major HVS properties that are associated with stereo video.
Utilizing the Contrast Sensitivity Function (CSF) and multichannel decomposition with cortex filters to derive perceptual views, Hachicha et al. [19] propose a Stereoscopic Image Quality Assessment measure based on HVS modeling. Additionally, the Binocular Just Noticeable Difference (BJND) model is used in the proposed metric to calculate the smallest distortion in each view of the reference stereo pair. The BJND model was likewise used to demonstrate the binocular concealment hypothesis.
A novel region segmentation algorithm for dividing three-dimensional images into occluded and non-occluded regions is first proposed in [30]. The binocular just noticeable difference model reveals the binocular vision of the human visual system in non-occluded regions, while the just noticeable difference model is utilized on occluded regions to formulate monocular vision. In order to address the unstable segmentation issue encountered by conventional approaches, disparity information and Euclidean distance between stereo pairs are utilized.
A Rich Structural Index (RSI) based on multi-scale perception characteristics is proposed in [35]. The index incorporates characteristics of visual perception and the image pyramid. In addition to multi-scale depth edge structures, the approach also takes into account the edge structures and hierarchical structures of multi-scale cyclopean maps. To begin, sensitive images of varying resolution are obtained by passing the stereo pair through the CSF-based image pyramid. On gradient maps, luminance masking and contrast masking are taken into account, and a locally adaptive Luminance and Structural Index (LSI) is then obtained. Singular Value Decomposition (SVD) is also used to obtain the Sharpness and Intrinsic Structural Index (SISI), which helps to effectively capture the changes in the image.
RR methods of stereoscopic image quality assessment
Using the extracted edge information, Hewage et al. [12] proposed a reduced-reference quality metric for the transmission of 3D depth maps.
Wan et al. [24] created a reduced-reference measure by simulating the brain's visual perception with sparse representation and natural scene statistics. In particular, the visual information that is closely associated with the hierarchical progressive process of human visual perception is measured using the distribution statistics of the classified visual primitives extracted by sparse representation.
In [32], the reduced-reference 3D quality assessment evaluator goes through two important technical steps: the perceptual properties of the human visual system (HVS) and the statistical characteristics of three-dimensional images.
Wang et al. [34] used disparity and cyclopean images, based on depth perception and the binocular visual mechanism, as a complement to monocular perception, and used luminance and chroma images corresponding to the original image to mimic monocular perception. The entropy of the chromaticity map, texture features of the third phase, energy features, energy difference features, and MSCN coefficients from the high-frequency sub-band are among the quality-aware features extracted using the quaternion wavelet transform (QWT). A heterogeneous ensemble model using support vector regression (SVR), extreme learning machine (ELM), and random forest (RF) is built to predict the quality score, and bootstrap sampling and a rotated feature space are used to increase the diversity of the data distribution.
NR methods of stereoscopic image quality assessment
Reference-less SIQA metrics have also attracted researchers. For instance, Akhter et al. [18] propose a no-reference image quality assessment for JPEG-coded stereoscopic images. Local information extraction of disparity and artifacts serves as the foundation for this metric.
A cyclopean image and saliency map-based no-reference stereoscopic IQA has been proposed in [25]. Asymmetrical distortion has been taken into account with the cyclopean image, and saliency aims to select relevant patches from the cyclopean image to focus on the most perceptual relevant regions. In order to estimate the quality, these patches are then fed into a modified version of a CNN model that has already been trained.
A blind stereoscopic IQA metric has been suggested in [26]. The model is based on a sophisticated machine-learning algorithm and human binocular perception. The gradient magnitude (GM), relative gradient orientation (RO), and relative gradient magnitude (RM) maps have been used to extract effective perceptual features. A multiscale gradient map of the cyclopean image was used to account for the various viewing conditions and resolutions of the stereo image. The stereo image features are mapped to the quality score using the AdaBoost neural network.
In [27], another deep feature extraction approach has been investigated for NR SIQA. The proposed metric takes into account the phenomenon of binocular rivalry and employs the cyclopean image hypothesis. The cyclopean image is then used to extract a bank of features using four CNN models. Following that, a support vector regression (SVR) is used to map this bank to a quality score.
A cyclopean image and saliency map-based blind stereoscopic IQA has been proposed in [28]. Asymmetrical distortion has been taken into account with the cyclopean image, and saliency aims to concentrate on the most perceptually relevant areas. After that, patches from the cyclopean image's saliency regions were selected, and a reworked version of the pre-trained VGG-19 [33] model was used to estimate the patches' quality.
A StereoIF-Net is suggested in [29]. The StereoIF-Net has taken into account all of the human cortex regions’ visual responses to stereoscopic visual signals. It offers a fitting and intricate architecture for replicating the intersection of right and left visual signals in visual cortex regions.
The hypothesis that the perceived quality of the natural stereoscopic view is close to that of the cyclopean image is tested in [31], which generates an intermediate image from the left and right views. The naturalness image quality evaluator score and the entropy score for each subband are calculated using multi-steerable decomposition on cyclopean images.
An algorithm that makes use of the characteristics of the spatial domain and the complex contourlet domain is proposed in [36]. For the monocular views, monocular features are extracted from the components of the CIELAB color space, including the across-scale and across-orientation features in the complex contourlet domain and the natural scene statistics features in the spatial domain. For binocular perception, the cyclopean image is used to extract energy, energy difference, and structural correlation in the complex contourlet domain, and statistics distribution in the spatial domain. The disparity image of the stereo pair is also used to extract statistical features related to visual comfort. Finally, the KELM regressor is used to map these features to the objective score.
Proposed method
Because a full-reference image quality metric is used, the complete view of the degraded stereo pair is compared with the reference stereo pair to evaluate the quality of the test image. Our method can be broken down into four steps: (1) obtain the binocular summation map; (2) compute the distorted pixel measure; (3) calculate the gradient similarity; (4) obtain the overall assessment as the standard deviation. Fig. 1 shows a flowchart of how the proposed measure is calculated.

Block diagram of proposed image quality assessment method.
Each step and the overall structure of the proposed metric are explained in detail in the following parts. The proposed method uses gradient similarity and the Ruderman operator [4] to form the map.
The variables and abbreviations used in the proposed method and in the rest of the manuscript are defined as follows:
I: reference image.
J: test image.
I_L: left reference image.
I_R: right reference image.
J_L: left test image.
J_R: right test image.
M × N: image size.
μ(i, j): local mean within the 3 × 3 block surrounding position (i, j).
σ(i, j): local standard deviation within the 3 × 3 block surrounding position (i, j).
DM_map: distorted pixel map.
G_I: gradient magnitude of the reference image.
G_J: gradient magnitude of the test image.
G_map: gradient map.
NR: no-reference image quality assessment.
RR: reduced-reference image quality assessment.
FR: full-reference image quality assessment.
IQA: image quality assessment.
HVS: human visual system.
SGSDM: stereo gradient similarity based distorted pixel measure.
CC: Pearson’s linear correlation coefficient.
RMSE: Root mean square prediction error.
ROCC: Spearman’s rank order correlation coefficient.
DMOS_P: predicted Difference Mean Opinion Score.
MOS: mean opinion score.
VQEG: Video Quality Experts Group.
Both additive impairments and detail losses are summed in the test stereo pair’s binocular summation map. Additive impairments refer to redundant visual information that does not exist in the original but only appears in the test pair, whereas detail losses refer to the loss of useful visual information in the test stereo pair. We can assess the quality of the binocular summation by separating the test image’s binocular summation from the detail losses and additive impairment and comparing the difference to the reference summation.
Before introducing the proposed measure, some useful concepts must be explored. The stereo reference and test images are represented by I_L (I_R) and J_L (J_R), respectively, where L denotes the left image and R the right one, and M × N is the image size. The binocular summation maps of the reference and test stereo pairs (I and J, respectively) can be calculated based on Eq. (1) (shown in Fig. 2).

Summation of stereo pair.
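As a minimal sketch of this summation step, assuming the left and right views are grayscale arrays of equal size (the function name `binocular_summation` is illustrative and not taken from the paper):

```python
import numpy as np

def binocular_summation(left, right):
    """Add the left and right views of a stereo pair pixel by pixel."""
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    if left.shape != right.shape:
        raise ValueError("left and right views must have the same M x N size")
    return left + right

# Reference map I from (I_L, I_R); the test map J is built the same way
# from (J_L, J_R).
I = binocular_summation([[10, 20], [30, 40]], [[1, 2], [3, 4]])
```

Converting to floating point before adding avoids overflow when the views are stored as 8-bit integers.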
A distorted pixel measure derived from the reference and test images is used to reflect the local differences between I and J [4]. This process can be applied to a given intensity image I(i, j) to produce:
Information can be extracted from images using a gradient. To generate gradient images, this section introduces 5 × 5 Sobel operators, which are defined as:
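As a rough illustration of the local statistics involved (not the paper's exact equation), the sketch below computes μ(i, j) and σ(i, j) over a 3 × 3 block and applies a divisive local normalization. It assumes the operator in [4] has the usual form (I − μ) / (σ + c); the stabilizer `eps`, the edge padding, and the function names are assumptions made for this example, and the way the normalized reference and test maps are combined into DM_map is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_stats(image, size=3):
    """Local mean and standard deviation in a size x size block around each pixel."""
    img = np.asarray(image, dtype=np.float64)
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")           # assumption: replicate borders
    windows = sliding_window_view(padded, (size, size))
    mu = windows.mean(axis=(-2, -1))
    sigma = windows.std(axis=(-2, -1))
    return mu, sigma

def local_normalize(image, eps=1.0):
    """Divisive local normalization; eps is a hypothetical stabilizing constant."""
    img = np.asarray(image, dtype=np.float64)
    mu, sigma = local_stats(img)
    return (img - mu) / (sigma + eps)
```

The output has the same M × N size as the input, so the normalized reference and test maps can be compared pixel by pixel.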
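For illustration only, the sketch below builds a 5 × 5 Sobel pair from one common separable construction (smoothing vector [1, 4, 6, 4, 1], derivative vector [−1, −2, 0, 2, 1]) and computes a gradient magnitude. Several 5 × 5 Sobel variants exist, so these coefficients, the border handling, and the function names are assumptions and may differ from the operators used in the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sobel5_kernels():
    """One common 5 x 5 Sobel pair (assumed variant): smoothing x derivative."""
    smooth = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
    deriv = np.array([-1.0, -2.0, 0.0, 2.0, 1.0])
    kx = np.outer(smooth, deriv)   # differentiates along columns (horizontal)
    ky = kx.T                      # differentiates along rows (vertical)
    return kx, ky

def filter2d(image, kernel):
    """Same-size 2D correlation with replicated borders."""
    pad = kernel.shape[0] // 2
    padded = np.pad(image, pad, mode="edge")
    windows = sliding_window_view(padded, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def gradient_magnitude(image):
    """Gradient magnitude sqrt(gx^2 + gy^2) of a grayscale image."""
    img = np.asarray(image, dtype=np.float64)
    kx, ky = sobel5_kernels()
    return np.hypot(filter2d(img, kx), filter2d(img, ky))
```

Applying `gradient_magnitude` to the reference and test summation maps would give G_I and G_J under these assumptions.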
The Gradient map (G _ map) is formed as a result of our method for calculating gradient similarity:
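As a hedged sketch, gradient similarity is often written in the form (2·G_I·G_J + c) / (G_I² + G_J² + c). The code below assumes that form; the constant `c`, its default value, and the function name are placeholders for this example, and the exact expression and constants used for G_map in this paper may differ.

```python
import numpy as np

def gradient_similarity_map(g_ref, g_test, c=1.0):
    """Pixel-wise similarity of two gradient magnitude maps (assumed form).

    c is a hypothetical positive constant that avoids division by zero
    in flat regions where both gradients vanish.
    """
    if c <= 0:
        raise ValueError("c must be positive")
    g_ref = np.asarray(g_ref, dtype=np.float64)
    g_test = np.asarray(g_test, dtype=np.float64)
    return (2.0 * g_ref * g_test + c) / (g_ref ** 2 + g_test ** 2 + c)
```

With this form, the map equals 1 wherever the two gradient magnitudes agree and decreases toward 0 as they diverge.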
Finally, the stereo gradient similarity based distorted pixel measure map (SGSDM_map) can be expressed as follows:
Where

A flowchart showing a high level overview of the proposed image quality assessment method.
The used image databases and measurement procedures
To examine SGSDM's performance, the standard procedure of the Video Quality Experts Group (VQEG) [6] is followed. A non-linear mapping was carried out between the subjective scores [12] and the objective scores. A five-parameter non-linear mapping (θ1, θ2, θ3, θ4, and θ5) is used to transform the objective quality ratings into predicted Difference Mean Opinion Score values (DMOS_P). The mapping function, which is a logistic function, is described in equation (11) [7].
Here, VQR is the objective quality rating, and θ1, θ2, θ3, θ4, and θ5 are chosen for the best fit. Three metrics are used after the regression: the Spearman rank-order correlation coefficient (ROCC), the Pearson linear correlation coefficient (CC), and the root mean square prediction error (RMSE). The first is the Pearson linear correlation coefficient (CC), which measures the relationship between the subjective (DMOS) and predicted (DMOS_P) scores. It provides an assessment of the accuracy of the prediction and is defined as:
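The evaluation procedure can be sketched as follows. The logistic function uses the five-parameter form that is widely used in VQEG-style evaluations; it is assumed here to correspond to equation (11). CC, ROCC, and RMSE follow their textbook definitions, and all function names are illustrative.

```python
import numpy as np

def logistic5(vqr, t1, t2, t3, t4, t5):
    """Assumed five-parameter logistic mapping from objective score to DMOS_P."""
    vqr = np.asarray(vqr, dtype=np.float64)
    return t1 * (0.5 - 1.0 / (1.0 + np.exp(t2 * (vqr - t3)))) + t4 * vqr + t5

def pearson_cc(x, y):
    """Pearson linear correlation coefficient (prediction accuracy)."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rocc(x, y):
    """Spearman rank-order correlation (prediction monotonicity)."""
    return pearson_cc(average_ranks(x), average_ranks(y))

def rmse(x, y):
    """Root mean square prediction error."""
    diff = np.asarray(x, float) - np.asarray(y, float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

In practice, the five parameters would be fitted with a non-linear least-squares routine before CC and RMSE are computed on the mapped scores. ROCC depends only on rank order, so a monotonic mapping leaves it unchanged up to sign.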
To test the proposed approach, the LIVE 3D Database of the University of Texas at Austin is used. It consists of two phases: the LIVE 3D Phase I Database [22] and the LIVE 3D Phase II Database [38]. The LIVE 3D Phase I Database includes 20 reference images (shown in Fig. 4) and 365 distorted images (80 each for JPEG compression, JPEG2000 (JP2K) compression, additive white noise (WN), and fast fading (FF), and 45 for Blur).

Reference images used in the subjective study. Only the left views are shown here.
The simulated distortions were compression using the JPEG and JPEG2000 standards, additive white Gaussian noise, Gaussian blur, and a fast-fading model based on the Rayleigh fading channel. The JPEG compression utility in MATLAB was used to simulate JPEG compression, while the Kakadu encoder was used to simulate JPEG2000 compression. The "quality" parameter and the bitrate were the only variables. The imnoise command in MATLAB was used to simulate additive white Gaussian noise, which was applied equally to the R, G, and B planes. Gaussian blur was simulated by applying a Gaussian low-pass filter to each of the color planes. The variance of the Gaussian was the control parameter for both WN and Blur. Fast-fading distortion was simulated by transmitting a JP2K-compressed image over a Rayleigh fading channel, with the channel signal-to-noise ratio (SNR) serving as the control parameter.
LIVE 3D Phase II Database contains 8 reference images and 360 distorted images (72 each for JP2K, JPEG, WN, Blur, and FF).
To illustrate the performance of the proposed method SGSDM, we compare it with 21 state-of-the-art measures, including Benoit [11], Hewage [12], You [13], Gorley [14], Shen [15], Yang [16], Zhu [17], Akhter [18], Hachicha [19], PSNR, SSIM [5], MS-SSIM [37], Messai01 [25], Messai02 [27], Jianwei20 [30], KELM [36], Jianwei22 [29], R3DQAE [32], MO-NIQE [31], Wang2021 [34] and Zhang2022 [35].
Table 1 shows a comparison study with the classic Sobel [8], Prewitt [8], Scharr [9], and the 5 × 5 Sobel. The 5 × 5 Sobel may perform better than the other three.
ROCC values using four gradient operators
On each 3D IQA database, the ROCC, CC, and RMSE performance of the image quality assessment methods is compared in Tables 2 and 4. For each criterion, the top three measures are highlighted in bold.
Spearman’s Rank Ordered Correlation Coefficient (ROCC)
Linear Correlation Coefficient (CC)
Root-mean-squared-error (RMSE)
From these tables, the proposed method ranks among the top three on the LIVE 3D Phase II database. On the LIVE 3D Phase I database, the performance of SGSDM is slightly lower than that of Jianwei22, Messai02, and Wang2021. The combination of the distorted pixel measure and gradient similarity features in this paper can usefully describe the distortion. Moreover, the introduction of the gradient similarity feature can reinforce the discrimination between several types of distortion. The preceding analysis shows that the proposed method is well correlated with human subjective perception.
The proposed measure and other SIQA algorithms are also tested on the individual distortion types of the LIVE 3D IQA database. The ROCC and CC results on the LIVE 3D IQA database are listed in Table 3, in which the top three results are highlighted in bold.
Our approach provides the highest CC performance on JP2K and FF in LIVE 3D Phase I and on JPEG, JP2K, and FF in LIVE 3D Phase II. On JPEG and JP2K in LIVE 3D Phase I, the proposed method is only inferior to the Messai02 model when compared to the Wang2021, MO-NIQE, and Messai02 methods. In addition, in LIVE 3D Phase I, the KELM model outperforms the proposed model on WN, Blur, and JP2K, JPEG, and FF. However, our model's prediction accuracy is competitive, robust, and stable across all distortion types.
For the ROCC performance, our measure ranks among the top three on JP2K, JPEG, WN, Blur, and FF in both LIVE 3D Phase I and LIVE 3D Phase II, and performs well under all kinds of distortions, demonstrating the monotonicity of the proposed model. In LIVE 3D Phase I, the proposed model outperforms the Jianwei22 model on JP2K distortion, JPEG distortion, and Blur, and in LIVE 3D Phase II it outperforms the Jianwei22 model on all distortions except JP2K. The fact that the proposed model performs best overall suggests that it can predict the quality of a 3D image under a variety of distortions.
Fig. 5 displays scatter plots of the subjective DMOS against the scores predicted by SGSDM on the databases. The curve in Fig. 5 was obtained by a nonlinear fitting using equation (11).

Scatter plots of subjective scores versus scores from the proposed scheme on IQA databases.
The curves in Fig. 5 show that the SGSDM values are very close to DMOS, indicating that this measure is effective. The ranking of all measures by their ROCC values in Table 2 demonstrates the dependability of SGSDM. However, the comparison yields an intriguing result. The CC and ROCC values are closer to 1. Based on our examination of the results obtained with Hachicha, we conclude that SGSDM performs significantly better than Hachicha. As the results show, our SGSDM measure clearly performs well and is competitive with other IQA measures.
The proposed model performs well without requiring any training or learning strategy. It also achieves significantly higher consistency with subjective opinions.
SGSDM requires three parameters to be determined (see Equations (5) and (8)). We tuned these parameters using data from the LIVE 3D IQA database, choosing the values that led to a higher ROCC. Consequently, the parameters of the proposed method were set to α = 0.88, C1 = 170, and C2 = 180.
Moreover, we studied the outcome of SGSDM when equation (8) pools the map with either the standard deviation or the mean. The performance is higher with the standard deviation.
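The two pooling choices compared here can be sketched as below, assuming the quality map is a 2D array. The function name `pool_quality_map` is illustrative, and the use of the population standard deviation is an assumption.

```python
import numpy as np

def pool_quality_map(quality_map, method="std"):
    """Collapse a pixel-wise quality map into a single score."""
    q = np.asarray(quality_map, dtype=np.float64)
    if method == "std":
        return float(q.std())    # spread of local quality across the image
    if method == "mean":
        return float(q.mean())   # average local quality
    raise ValueError("method must be 'std' or 'mean'")

# A uniform map has zero spread; a map with uneven local quality does not.
uniform_score = pool_quality_map([[1.0, 1.0], [1.0, 1.0]])
uneven_score = pool_quality_map([[0.0, 2.0], [0.0, 2.0]])
```

Standard deviation pooling responds to how unevenly the distortion is spread over the image, whereas mean pooling responds only to its average level.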
Efficiency evaluation
To compare the efficiency of the various measures, the execution time is measured on an image from the LIVE 3D IQA database. The IQA measures were run on an Asus machine with an Intel(R) Core(TM) i5-4200U CPU @ 1.60 GHz (2.30 GHz) and 4 GB of RAM, using MATLAB R2015a as the software platform. Table 5 reports all of the results. SGSDM takes more time than SSIM, while the other measures are slower than SGSDM.
Running time of the competing IQA models
Conclusion
Due to two binocular interactions known as binocular fusion and suppression, the human visual system (HVS) is able to combine two retinal images in order to construct a mental image with depth perception. In this paper, we proposed the SGSDM algorithm, which employs the binocular summation channel. Additionally, the proposed metric employs the distorted pixel measure and gradient similarity for IQA. Experimental results show that the predictions of the proposed model are highly consistent with human subjective perception. Our subsequent research will focus on the application of additional HVS characteristics to the quality evaluation of stereoscopic images. We will also test some of the hypotheses regarding the visual characteristics of stereoscopic images by applying our metric to the evaluation of stereoscopic video quality.
