Abstract
Medical and satellite image analysis often require very high image resolution. Super-resolution (SR) reconstructs a high-resolution image from one or more low-resolution images of the same scene. However, deep learning-based super-resolution still suffers from illumination issues. This paper proposes a novel CGIHE-VDSR algorithm that integrates the Very Deep Super Resolution (VDSR) network with Color Global Image Histogram Equalization (CGIHE) to improve image resolution. In the proposed method, the low-resolution image is first histogram equalized using the CGIHE algorithm. The VDSR network is then applied to the histogram-equalized image to obtain the super-resolved image. The proposed algorithm is implemented in MATLAB and evaluated on real-time data and benchmark images. The PSNR and SSIM metrics demonstrate that the super-resolution images obtained with the proposed method are significantly better than those of existing methods.
Introduction
Image processing techniques are being developed and deployed in visual communications at an increasing rate worldwide, and viewers consequently expect high-quality images. The two most essential concerns in image processing are image fidelity and recognition; to serve both, image processing methods improve visualization and extract more information. A high-resolution medical image, for example, aids a radiologist in making an accurate diagnosis. Image resolution depends on the pixel size, and higher resolution can be achieved by improving the image-capturing device, such as a digital Charge Coupled Device (CCD) camera with small sensor pixels. However, the quality of images captured by the small sensors of low-cost cameras is degraded. Image quality depends on the sensor's signal and noise power: signal power is proportional to the chip size, while noise power is constant. Hence, reducing the sensor pixel size reduces the signal power, which degrades image quality. High-resolution images can be obtained directly with costly image-capturing equipment; to avoid this cost, algorithms can instead generate high-resolution images from low-resolution ones.
Super-resolution algorithms convert a low-resolution image into a high-resolution image. Super-resolution has applications in medical imaging [1, 2], remote sensing [3–5] and surveillance [6]. Super-resolution algorithms are classified as learning-based or reconstruction-based [7]. Learning-based methods use machine learning algorithms to estimate high-resolution details locally and may be pixel-based [8] or example-based [9]. Reconstruction-based methods use prior knowledge about the constraints that define the high-resolution image [10]. Both Convolutional Neural Networks (CNN) and non-CNN methods are used for image super-resolution.
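The sensor-size argument in this introduction can be made concrete with a worked equation. As a simplification consistent with that statement, assume the signal power P_s = kA grows linearly with the pixel area A (k is a hypothetical constant) while the noise power P_n is fixed:

```latex
\mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}\frac{P_s}{P_n} = 10\log_{10}\frac{kA}{P_n}
\quad\Longrightarrow\quad
\mathrm{SNR}_{\mathrm{dB}}(A_2) - \mathrm{SNR}_{\mathrm{dB}}(A_1) = 10\log_{10}\frac{A_2}{A_1}
```

Under this assumption, halving the pixel pitch gives A_2 = A_1/4, so the SNR drops by 10 log10 4 ≈ 6.02 dB.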
Deep learning methods such as CNNs are a subset of machine learning that learns the relationship between input and output data, whereas non-CNN methods rely on statistical data. Non-CNN super-resolution methods include bicubic interpolation [11, 12], wavelet-based methods [13], Adjusted Anchored Neighborhood Regression (A+) [14], Self-Exemplar Super-resolution (Self Ex) [15] and Random Forest Learning (RFL) [16]. CNN-based methods include auto-encoder networks [17], the Super-resolution Convolutional Neural Network (SRCNN) [18], image priors [19] and the Very Deep Super Resolution network (VDSR) [20].
Several CNN-based super-resolution methods have emerged that outperform traditional SR techniques, and CNN-based deep learning has contributed greatly to single-image super-resolution [21]. Super-resolution images generated by CNN methods have a higher Peak Signal-to-Noise Ratio (PSNR) [21] than images obtained by conventional non-CNN methods [22]. The PSNR can be increased further by enhancing the input images.
Histogram equalization can be used to enhance images [23]; for example, a single-image learning approach combines a decision-tree learning algorithm with histogram equalization as pre-processing [24]. The histogram of an image gives a general representation of its appearance. A well-exposed image typically has most of its tones near the center of the histogram, with few or no tones at the extremes of the scale. By counting the pixels at each color or luma level and plotting the count against the level, the histogram provides a statistical description of the image. Histogram equalization is commonly applied to each channel of an RGB (Red, Green and Blue) color image separately; because the color channels are correlated, this alters the chromaticity of the colors. Histogram equalization approaches are categorized as local or global according to the image region used for enhancement. Local Histogram Equalization (LHE) applies different transformations to similar gray levels at different locations in the image, which can bring out a wide range of details [25]. Global Histogram Equalization (GHE) improves contrast by spreading the pixel values evenly over the whole intensity range, producing an image whose cumulative histogram is approximately linear [26].
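Global histogram equalization as described above can be sketched in NumPy for an 8-bit grayscale image. This is the textbook cdf-mapping form, shown for illustration; it is not necessarily the exact variant used in [26], and the function name is hypothetical.

```python
import numpy as np

def global_hist_eq(img: np.ndarray, levels: int = 256) -> np.ndarray:
    """Global histogram equalization of an 8-bit grayscale image.

    Each intensity k is mapped to round((levels - 1) * cdf(k)), where cdf is the
    cumulative histogram normalized by the number of pixels.
    """
    hist = np.bincount(img.ravel(), minlength=levels)   # pixel count per level
    pdf = hist / img.size                                # p(k) = n_k / (M * N)
    cdf = np.cumsum(pdf)                                 # P(intensity <= k)
    lut = np.round((levels - 1) * cdf).astype(np.uint8)  # transfer curve
    return lut[img]

# Low-contrast test image: all values lie in [100, 139].
rng = np.random.default_rng(0)
low = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)
eq = global_hist_eq(low)
print(low.min(), low.max(), "->", eq.min(), eq.max())
```

Because the transfer curve is the cdf itself, it is monotonic: the ordering of intensities is preserved while the narrow input range is stretched toward the full 0 to 255 scale.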
Super resolution
In the context of digital and film images, image resolution is the amount of detail an image contains; in general, images with higher resolution have more detail. Resolution measures how closely spaced lines can be while still being resolved. Image resolution can be characterized in terms of spectral, temporal, spatial and radiometric resolution. Spectral resolution is the ability of a sensor to distinguish specific wavelength intervals. Temporal resolution is the precision of measurement with respect to time; for movies it is typically 24 to 48 frames per second. Radiometric resolution is the number of distinguishable intensity levels; in typical computer image files it is 8 bits, i.e., 256 levels. In practice, the spatial resolution of an image, determined by its number of pixels, governs its clarity. Images of lower spatial resolution typically have fewer pixels and less detail, whereas high-resolution images have more pixels per unit area and therefore show more detail with better quality and clarity.
Super Resolution (SR) methods improve the resolution of images by generating a higher-resolution image from one or more low-resolution images of the same scene. The high pixel density of a high-resolution image conveys more information about the original scene. The rapid growth of image processing in multimedia communication has increased the need to provide viewers with high-resolution images. In multi-spectral satellite imagery, for example, a high-resolution image supports better geographic classification.
Histogram and histogram equalization methods
The histogram of an image gives a general representation of its appearance. A well-exposed image typically has most of its tones near the center of the histogram, with few or no tones at the extremes of the scale. By counting the pixels at each color or luma level and plotting the count against the level, the histogram provides a statistical description of the image. Histogram equalization can improve the contrast and appearance of an image. In RGB (Red, Green and Blue) color images, histogram equalization is commonly applied to each channel separately; because the channels are correlated, this alters the chromaticity of the colors. Histogram equalization approaches are categorized as local or global according to the image region used for enhancement.
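A two-pixel example with hypothetical data makes the chromaticity problem concrete. Both pixels below share the same orange R:G:B ratio of 4:2:1, yet equalizing each channel independently maps them to pure gray and pure white. The helper names are illustrative, and `equalize_channel` is the standard cdf mapping.

```python
import numpy as np

def equalize_channel(ch: np.ndarray) -> np.ndarray:
    """Global histogram equalization of one 8-bit channel (cdf mapping)."""
    cdf = np.cumsum(np.bincount(ch.ravel(), minlength=256)) / ch.size
    return np.round(255 * cdf).astype(np.uint8)[ch]

def equalize_rgb_per_channel(rgb: np.ndarray) -> np.ndarray:
    """Equalize R, G and B independently (the per-channel approach)."""
    return np.stack([equalize_channel(rgb[..., c]) for c in range(3)], axis=-1)

# Two pixels of the same orange chromaticity (R:G:B = 4:2:1) at two brightness levels.
img = np.array([[[100, 50, 25], [200, 100, 50]]], dtype=np.uint8)
out = equalize_rgb_per_channel(img)
print(out[0, 0], out[0, 1])  # [128 128 128] [255 255 255]
```

Each channel sees the same two-level histogram, so every channel is mapped identically and the color information is lost.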
Local Histogram Equalization (LHE)
The local histogram of an image at a given pixel is built from the pixels within that pixel's neighborhood; it describes the set of colors or intensities present in that region. A major drawback of traditional (global) histogram equalization is that it allocates most of the output intensity range to tall, narrow histogram peaks, such as those produced by a uniform, noisy background. Local histogram equalization instead operates on individual pixels, constructing a transfer curve from the histogram of each pixel's neighborhood. This spreads the intensity values within each region and improves low-contrast images. After equalization, the cumulative distribution function (cdf) of each pixel neighborhood is approximately linear.
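A naive local histogram equalization can be sketched as follows, assuming a square neighborhood of hypothetical half-width `radius` and reflected borders. It costs O(H × W × r²) and is meant only to show the per-pixel transfer curve; practical systems usually use faster, contrast-limited variants such as CLAHE.

```python
import numpy as np

def local_hist_eq(img: np.ndarray, radius: int = 3) -> np.ndarray:
    """Naive local histogram equalization of an 8-bit grayscale image.

    Each output pixel is 255 times the neighborhood cdf evaluated at the centre
    value, i.e. the fraction of pixels in the (2*radius+1)^2 window whose
    intensity is <= the centre intensity. Borders are handled by reflection.
    """
    padded = np.pad(img, radius, mode="reflect")
    out = np.empty_like(img)
    h, w = img.shape
    size = (2 * radius + 1) ** 2
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            cdf_at_centre = np.count_nonzero(window <= img[i, j]) / size
            out[i, j] = np.round(255 * cdf_at_centre)
    return out

# A smooth ramp: each pixel is re-mapped using only its 3 x 3 neighborhood.
ramp = np.arange(25, dtype=np.uint8).reshape(5, 5) * 10
print(local_hist_eq(ramp, radius=1))
```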
Color Global Image Histogram Equalization
In the RGB model, color and luminance information cannot be separated because R, G and B are all related to luminance (intensity). The HSV (Hue, Saturation and Value) model, by contrast, closely resembles how people perceive color, so colors specified in HSV are intuitive to human perception, which is not necessarily true of RGB or CMYK. The HSV model can therefore separate an image's color information from its brightness information. This simplifies tasks that involve the image's brightness; in many applications, only the intensity values need to be histogram equalized. Color Global Image Histogram Equalization (CGIHE) exploits this. In CGIHE, the RGB image is first converted to HSV. The V channel holds brightness values comparable to grayscale information, while the H and S channels carry the color information. Histogram equalization is then applied to the V component alone by computing its probability density function (pdf) and cumulative distribution function (cdf). For a discrete intensity k, the pdf gives the probability that a pixel has intensity k, and the cdf gives the probability that a pixel has intensity less than or equal to k. Finally, the equalized V component replaces the original V channel of the HSV image, which is converted back to RGB.
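The CGIHE steps can be sketched per pixel with Python's standard `colorsys` module. This is a transparent but slow illustration, not the authors' MATLAB implementation; quantizing V to 256 levels before computing the pdf and cdf is an assumption, and `cgihe` is a hypothetical name.

```python
import colorsys

import numpy as np

def cgihe(rgb: np.ndarray) -> np.ndarray:
    """Sketch of color global image histogram equalization.

    rgb: uint8 array of shape (H, W, 3). Only the V channel of HSV is
    equalized; H and S are left unchanged before converting back to RGB.
    """
    h_img, w_img, _ = rgb.shape
    flat = rgb.reshape(-1, 3) / 255.0
    hsv = np.array([colorsys.rgb_to_hsv(*px) for px in flat])  # (N, 3), in [0, 1]

    # Quantize V to 256 levels, then compute the pdf and cdf of V.
    v = np.round(hsv[:, 2] * 255).astype(np.int64)
    pdf = np.bincount(v, minlength=256) / v.size
    cdf = np.cumsum(pdf)
    hsv[:, 2] = cdf[v]  # equalized V, already in [0, 1]

    # Keep H and S, use the new V, and convert back to 8-bit RGB.
    out = np.array([colorsys.hsv_to_rgb(*px) for px in hsv])
    return np.round(out * 255).astype(np.uint8).reshape(h_img, w_img, 3)

img = np.array([[[100, 50, 25], [200, 100, 50]]], dtype=np.uint8)
print(cgihe(img))  # both pixels keep roughly the 4:2:1 R:G:B ratio
```

Since H and S are untouched, each pixel's RGB values are scaled by the same factor, so the hue is preserved while the brightness is redistributed.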
Very Deep Super Resolution (VDSR) Network
The VDSR network [20] generates super-resolution images with high accuracy, and this accuracy comes from increased network depth. The VDSR network is a deep convolutional network with 20 weight layers, modeled on VGG-net [18], which was used for ImageNet classification. Its large receptive field lets the network exploit contextual information spread over large image regions. For d convolutional layers with 3 × 3 filters, the receptive field is (2d + 1) × (2d + 1).
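The receptive-field formula can be checked with a few lines of Python. The sketch assumes stride-1 convolutions, so each 3 × 3 layer widens the field by 2 pixels; the function name is hypothetical.

```python
def vdsr_receptive_field(depth: int, kernel: int = 3) -> int:
    """Side length of the receptive field of `depth` stacked stride-1
    convolutions with kernel size `kernel`: each layer adds kernel - 1 pixels."""
    return 1 + depth * (kernel - 1)

for d in (3, 10, 20):
    rf = vdsr_receptive_field(d)
    print(f"d = {d:2d}: receptive field {rf} x {rf}")
# d =  3: receptive field 7 x 7
# d = 10: receptive field 21 x 21
# d = 20: receptive field 41 x 41
```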
With d = 20 convolutional layers, the receptive field therefore grows to 41 × 41, and the increased depth improves performance, since image reconstruction benefits from the context contained in the receptive field. The image reconstruction network follows Simonyan and Zisserman [19]. It has d layers; all intermediate layers are of the same type, whereas the first and last layers differ. Each intermediate layer uses 3 × 3 filters operating on 64 channels, denoted 3 × 3 × 64. The first layer operates on the input image, and the last layer, which reconstructs the image, consists of a single filter of size 3 × 3 × 64.
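The layer configuration can be tabulated in plain Python. This sketch assumes a single-channel (luminance) input and one bias per filter, neither of which is stated explicitly here; under those assumptions the 20-layer network has 665,921 trainable parameters, and placing a ReLU after every layer except the last gives 19 ReLUs. The function names are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int, int]  # (kernel_h, kernel_w, in_channels, out_channels)

def vdsr_layer_shapes(depth: int = 20, channels: int = 64, k: int = 3) -> List[Shape]:
    """Filter shapes of a VDSR-style network on a single luminance channel."""
    first = (k, k, 1, channels)                          # input image -> 64 feature maps
    middle = [(k, k, channels, channels)] * (depth - 2)  # identical 3 x 3 x 64 layers
    last = (k, k, channels, 1)                           # single 3 x 3 x 64 filter -> residual
    return [first, *middle, last]

def count_parameters(shapes: List[Shape]) -> int:
    """Weights plus one bias per output channel (bias is an assumption)."""
    return sum(kh * kw * cin * cout + cout for kh, kw, cin, cout in shapes)

shapes = vdsr_layer_shapes()
n_relu = len(shapes) - 1  # a ReLU after every layer except the last
print(len(shapes), count_parameters(shapes), n_relu)  # 20 665921 19
```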
Furthermore, very deep networks benefit from high nonlinearity: because 19 rectified linear units (ReLUs) are employed, the network can model very complex functions with relatively few channels. The network estimates the image details, i.e., the residual with respect to the interpolated low-resolution input image. Explicitly modeling the residual has several benefits, including faster convergence and a higher PSNR during training.
If the input image is discarded and only the learned features are used to generate the output, the network must carry all input detail through its layers, which requires more memory. Residual learning resolves this problem. Let x denote a low-resolution interpolated image and y a high-resolution image. Since the input and output images are largely similar, the residual image r = y - x is defined, in which most values are zero or small. The objective is to develop a model f that estimates the residual r, so that the super-resolved image is f(x) + x.
The main steps of this work are: (i) CGIHE equalizes the histogram of the low-resolution image; (ii) super-resolution is achieved by applying the VDSR network to the histogram-equalized image; (iii) real-time data are compared with benchmark images; and (iv) the performance of the proposed method is illustrated by a quantitative analysis of the color super-resolved images.
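In terms of the residual formulation with x, y, r and f, the training objective of the original VDSR paper [20] can be written as follows (the squared-error loss is taken from that paper rather than stated in this text):

```latex
r = y - x, \qquad
\mathcal{L} = \tfrac{1}{2}\,\lVert r - f(x) \rVert^{2}, \qquad
\hat{y} = x + f(x)
```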
The following sections describe the proposed method and present the experimental results.
The Very Deep Super Resolution (VDSR) network [20] accurately generates high-resolution images from low-resolution images. The network applies residual learning to the brightness (luminance) channel, and in the HSV color space the Value channel holds the brightness details. Among color spaces such as YIQ, YCbCr, Lab and HSV, the HSV color space is regarded here as the most suitable for histogram equalization of color images. The Color Global Image Histogram Equalization (CGIHE) [26] method produces contrast-enhanced images using the HSV color space. Figure 1 illustrates the workflow of the proposed CGIHE-VDSR method. The low-resolution image is converted into the HSV color space, and CGIHE is applied to the Value channel to enhance the brightness. The enhanced Value channel is then recombined with the H and S channels, and the resulting HSV image is converted back into an RGB image.

Workflow of the proposed CGIHE-VDSR method.
This histogram-equalized image is given as input to the VDSR network for super-resolution. The super-resolution of the image using the proposed CGIHE-VDSR algorithm is addressed in Section 5.1. VDSR uses residual learning, i.e., the network learns to estimate a residual image, defined as the difference between the high-resolution image and the bicubic-interpolated low-resolution image [27]. The low-resolution image is upscaled by bicubic interpolation so that its size matches that of the high-resolution image. The VDSR network then learns the mapping between the low- and high-resolution images. This mapping is learnable because the two images have the same content and differ mainly in their high-frequency characteristics, which are captured by the residual image. People perceive changes in brightness more readily than changes in color. The luminance channel (Y) of an image represents the brightness of each pixel as a linear combination of its red, green and blue values.
By contrast, the two chrominance channels, Cb and Cr, are different linear combinations of the red, green and blue values that carry color-difference information. VDSR uses the luminance of a color image to estimate the residual image, and only the luminance channel is used to train the network. Hence, the image is processed with CGIHE to enhance its luminance. In CGIHE, the RGB image is converted to HSV format; the RGB and HSV color spaces are shown in Fig. 2. The V channel holds brightness values comparable to grayscale information, while the H and S channels carry the color information.
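The luminance and chrominance channels can be sketched as a 3 × 3 linear transform. The coefficients below are the full-range ITU-R BT.601 (JPEG/JFIF) values, assumed here for illustration; MATLAB's `rgb2ycbcr` uses the studio-range variant with offsets, so its numbers differ.

```python
import numpy as np

# Full-range ITU-R BT.601 (JPEG/JFIF) coefficients, assumed for illustration.
_RGB_TO_YCBCR = np.array([
    [ 0.299,     0.587,     0.114    ],  # Y : luminance (weighted sum of R, G, B)
    [-0.168736, -0.331264,  0.5      ],  # Cb: blue-difference chroma
    [ 0.5,      -0.418688, -0.081312 ],  # Cr: red-difference chroma
])

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert RGB values in [0, 255] (shape (..., 3)) to float Y, Cb, Cr."""
    ycbcr = np.asarray(rgb, dtype=float) @ _RGB_TO_YCBCR.T
    ycbcr[..., 1:] += 128.0  # centre the chroma channels
    return ycbcr

print(rgb_to_ycbcr(np.array([[100, 100, 100], [255, 0, 0]])))
```

A gray pixel carries no color difference, so its Cb and Cr both equal 128 and all of its information sits in Y.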

Color space models and histograms of the input image (a) RGB (b) HSV.
To preserve the original colors and obtain a high-quality image, contrast enhancement is applied to the V channel while the H and S channels are left unchanged. The low-resolution image and the histogram-equalized RGB image produced by the CGIHE algorithm are shown in Fig. 3a and 3b, respectively. The probability density function (pdf) and cumulative distribution function (cdf) are computed for the Value channel, and the equalized Value then replaces the V channel of the HSV image. In a convolution operation, each output pixel is the weighted sum of a block of the input image and the filter. As a result, the output image preserves the spatial relationships of the input image with respect to the applied filters [21].
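The weighted-sum view of convolution can be sketched directly. As in most CNN frameworks, the operation below is a cross-correlation (the kernel is not flipped) with no padding; the helper name is illustrative.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation, the operation a CNN convolution layer computes.

    Each output pixel is the weighted sum of the kernel-sized block of the input
    at that position, so neighbouring outputs come from neighbouring blocks.
    """
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=float)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3 x 3 mean filter on a 4 x 4 ramp gives a 2 x 2 map of local averages.
img = np.arange(16, dtype=float).reshape(4, 4)
print(conv2d_valid(img, np.ones((3, 3)) / 9))
```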

Histograms (a) Low-resolution image with its histogram (b) Output Image from CGIHE algorithm with its histogram.
As shown in Fig. 4, most pixel values in the histogram of the input image lie near the center, so the image has low contrast. In the histogram of the CGIHE output, the pixel values are spread beyond the center toward the maximum value, so the contrast of the image is enhanced. The histogram-equalized RGB image is then given as input to the VDSR network for super-resolution, as illustrated in Fig. 5, and the output of the VDSR network is the super-resolved image.

Histograms of input image and contrast enhanced image using CGIHE algorithm.

Architecture of VDSR Network.
The VDSR network consists of 20 weight layers, all identical except for the first and last layers. The first layer takes the luminance channel of the input image. Each convolutional layer uses 3 × 3 filters with 64 channels and is followed by a Rectified Linear Unit (ReLU), except for the last layer, which uses a single 3 × 3 × 64 filter to reconstruct the residual image. Figure 6 shows the luminance and chrominance channels of the CGIHE image, in which the luminance components are concentrated at the low end of the histogram scale. The YCbCr color space of the CGIHE-VDSR output image is shown in Fig. 7, and the procedure for obtaining the VDSR residual image is illustrated in Fig. 8.

YCbCr color space of the CGIHE image.

YCbCr color space of the image resultant from CGIHE-VDSR method.

Generation of VDSR Residual.
The luminance channels of the high-resolution and bicubic-interpolated low-resolution images are denoted Yhr and Ylr, respectively. Using Ylr as the training input, the VDSR network learns to generate the residual image Yres given by Equation (1):
$$Y_{res} = Y_{hr} - Y_{lr} \qquad (1)$$
To generate a super-resolved image, the estimated residual image is added to the interpolated low-resolution image, and the result is converted from the YCbCr color space to RGB. For training, 1,477 high-resolution images are extracted from the iaprtc12 dataset. The high-resolution images are downsampled to generate the corresponding low-resolution images, which are then upscaled back to their original resolution. The VDSR network is an extremely deep convolutional network modeled on VGG-net [28], which was designed for ImageNet classification.
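The training-pair preparation can be sketched in NumPy for a single luminance image. Assumptions: downsampling uses simple block averaging instead of MATLAB's antialiased `imresize`, upscaling uses a separable Keys cubic-convolution kernel with a = -0.5, and all function names are hypothetical. The function returns the network input Ylr and the residual target Yres = Yhr - Ylr.

```python
import numpy as np

def cubic_kernel(t: np.ndarray, a: float = -0.5) -> np.ndarray:
    """Keys cubic-convolution kernel; a = -0.5 is the usual 'bicubic' choice."""
    t = np.abs(t)
    w = np.zeros_like(t)
    near, far = t <= 1, (t > 1) & (t < 2)
    w[near] = (a + 2) * t[near] ** 3 - (a + 3) * t[near] ** 2 + 1
    w[far] = a * (t[far] ** 3 - 5 * t[far] ** 2 + 8 * t[far] - 4)
    return w

def resize_axis(img: np.ndarray, new_len: int, axis: int) -> np.ndarray:
    """Resample one axis with the cubic kernel (pixel-centre alignment,
    border pixels replicated)."""
    old_len = img.shape[axis]
    x = (np.arange(new_len) + 0.5) * old_len / new_len - 0.5   # positions in input
    idx = np.floor(x).astype(int)[:, None] + np.arange(-1, 3)  # 4 taps per output
    w = cubic_kernel(x[:, None] - idx)
    w /= w.sum(axis=1, keepdims=True)
    src = np.moveaxis(img, axis, 0)[np.clip(idx, 0, old_len - 1)]  # (new_len, 4, ...)
    w = w.reshape(w.shape + (1,) * (src.ndim - 2))                   # broadcast weights
    return np.moveaxis((w * src).sum(axis=1), 0, axis)

def bicubic_upscale(img: np.ndarray, scale: int) -> np.ndarray:
    """Separable bicubic upscaling of a 2-D luminance image by an integer factor."""
    h, w = img.shape
    return resize_axis(resize_axis(img, h * scale, 0), w * scale, 1)

def make_training_pair(y_hr: np.ndarray, scale: int = 2):
    """Return (Y_lr, Y_res) for one high-resolution luminance image Y_hr."""
    h = y_hr.shape[0] - y_hr.shape[0] % scale   # crop to a multiple of scale
    w = y_hr.shape[1] - y_hr.shape[1] % scale
    y_hr = y_hr[:h, :w].astype(float)
    # Downsample by block averaging (a simplification of antialiased resizing).
    y_small = y_hr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    y_lr = bicubic_upscale(y_small, scale)      # back to the original size
    return y_lr, y_hr - y_lr                    # residual target Y_hr - Y_lr

rng = np.random.default_rng(0)
y_hr = rng.random((32, 32))
y_lr, y_res = make_training_pair(y_hr, scale=2)
print(y_lr.shape, float(np.abs(y_res).mean()))
```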
To speed up training, 64 image patches of size 41 × 41 are extracted, and some patches are rotated by 90°. The network's training parameters are kept the same as those in VDSR [20].
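Patch extraction with 90° rotation can be sketched as below. The text does not say how patches are chosen or which ones are rotated, so uniform random crops and a 50% rotation probability are assumptions; the same crop and rotation are applied to the input and the residual target so that each pair stays aligned. The function name is hypothetical.

```python
import numpy as np

def extract_patches(y_lr, y_res, n_patches=64, size=41, rng=None):
    """Randomly crop co-located size x size patches from the input (Y_lr) and
    target (Y_res) images; roughly half are rotated by 90 degrees (assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = y_lr.shape
    inputs, targets = [], []
    for _ in range(n_patches):
        i = rng.integers(0, h - size + 1)
        j = rng.integers(0, w - size + 1)
        p_in = y_lr[i:i + size, j:j + size]
        p_out = y_res[i:i + size, j:j + size]
        if rng.random() < 0.5:  # rotate input and target together
            p_in, p_out = np.rot90(p_in), np.rot90(p_out)
        inputs.append(p_in)
        targets.append(p_out)
    return np.stack(inputs), np.stack(targets)

lr_img = np.arange(100 * 120, dtype=float).reshape(100, 120)
x_batch, r_batch = extract_patches(lr_img, -lr_img, rng=np.random.default_rng(0))
print(x_batch.shape, r_batch.shape)  # (64, 41, 41) (64, 41, 41)
```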
A low-resolution RGB image of size M × N is given as input to the CGIHE-VDSR algorithm, which consists of two parts. Part I creates the color global histogram-equalized image. Histogram equalization of color images follows the same process as for grayscale images, except that a color model conversion is applied first. The R, G and B components of the input color image define a rectangular (cube-shaped) coordinate color space. Performing histogram equalization in RGB requires equalizing each of the R, G and B components, which changes the original colors in the enhanced image. The HSV color space is a cylindrical coordinate color space formed by the H (Hue), S (Saturation) and V (Value) components. In HSV, histogram equalization only needs to be applied to the V component, which preserves the original colors of objects in the enhanced image. Because R, G and B are all linked to luminance (intensity), color cannot be separated from luminance in RGB; the HSV color space is therefore used to separate the color and luminance information, which is convenient when working on the image's luminance. In this work, Hue, Saturation and Value are computed for every pixel of the RGB input image from the difference between the maximum and minimum of the three color channels, as in lines 1.1 to 1.8 of the algorithm. Histogram equalization is then applied to the Value component alone by computing its probability density function (pdf) and cumulative distribution function (cdf), and the equalized Value replaces the V channel of the HSV image, as in line 1.9 of the algorithm. For a discrete intensity k, the pdf gives the probability that a pixel has intensity k.
The cdf gives the probability that a pixel has intensity less than or equal to k. The HSV image is then converted back to RGB, as in lines 1.10 to 1.21 of the algorithm.
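The max/min computation of lines 1.1 to 1.8 can be sketched as a vectorized NumPy function. It follows the standard hexcone formulas (the same ones used by Python's `colorsys` module, against which it is checked); the algorithm's exact listing is not reproduced in this text, and the function name is hypothetical. The inverse conversion of lines 1.10 to 1.21 is available per pixel as `colorsys.hsv_to_rgb`.

```python
import colorsys

import numpy as np

def rgb_to_hsv(rgb: np.ndarray) -> np.ndarray:
    """Vectorized RGB -> HSV for float RGB in [0, 1], shape (..., 3).

    V is the channel maximum, S is (max - min) / max, and H depends on which
    channel is the maximum; H is returned in [0, 1), as in colorsys.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    delta = cmax - cmin
    s = delta / np.where(cmax > 0, cmax, 1)   # 0 for black pixels
    safe = np.where(delta > 0, delta, 1)      # avoid division by zero for greys
    h = np.select(
        [delta == 0, cmax == r, cmax == g, cmax == b],
        [0.0, ((g - b) / safe) % 6, (b - r) / safe + 2, (r - g) / safe + 4],
    ) / 6.0
    return np.stack([h, s, cmax], axis=-1)

# Check one pixel against the standard-library reference implementation.
pixel = np.array([[200, 100, 50]]) / 255.0
print(rgb_to_hsv(pixel), colorsys.rgb_to_hsv(*pixel[0]))
```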
Part II of the algorithm applies the VDSR network [20] to the color global histogram-equalized image. Lines 1 and 2 of part II set the number of convolutional layers to 20, each with 3 × 3 filters for 64 channels. Lines 3 to 9 of part II obtain the super-resolved image using residual learning. Here, r is the residual image, x is the bicubic-interpolated low-resolution image, f(x) is the network's estimate of r, and f(x) + x is the resulting super-resolution image.
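Lines 3 to 9 of part II are not reproduced in this text, so the following is only a minimal sketch of the residual step. `predict_residual` is a hypothetical stand-in for the trained VDSR network f, the luminance is assumed to be scaled to [0, 1], and clipping the output to that range is a design choice of this sketch.

```python
from typing import Callable

import numpy as np

def super_resolve_luminance(
    x: np.ndarray,
    predict_residual: Callable[[np.ndarray], np.ndarray],
) -> np.ndarray:
    """Residual reconstruction y_hat = x + f(x), clipped to [0, 1].

    x: bicubic-interpolated low-resolution luminance.
    predict_residual: the trained network f (a hypothetical callable here).
    """
    r_hat = predict_residual(x)           # estimated residual f(x)
    return np.clip(x + r_hat, 0.0, 1.0)   # add back the interpolated input

# Toy usage with a dummy "network" that predicts a constant residual of 0.1.
x = np.full((4, 4), 0.5)
y_hat = super_resolve_luminance(x, lambda img: np.full_like(img, 0.1))
print(y_hat[0, 0])
```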
Results and discussions
In this section, the performance of the CGIHE-VDSR algorithm is evaluated using quantitative metrics, namely the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM). In the proposed work, histogram equalization is applied to the Value channel alone after the RGB image is converted to HSV, and the enhanced image is converted back to a contrast-enhanced RGB image. The output of the CGIHE algorithm is then fed to the VDSR network. The proposed framework was evaluated on several datasets, and the results are shown in Figs. 9 to 15. These figures illustrate that the proposed method performs better than existing methods. Figure 10 demonstrates that the proposed method accurately reconstructs the plant's stem in Test Image #69015. Similarly, for Test Image #86000 the proposed approach reproduces the cross lines precisely, as shown in Fig. 12. Existing methods crop pixels near the image boundary; although the proposed method produces full-sized images, its outputs are cropped by the same amount for a fair comparison.

Super-resolution results for the Test Image #220075 (a) A+ (b) RFL (c) Self Ex (d) SRCNN (e) VDSR (f) CGIHE-VDSR.

Super-resolution results for the Test Image #69015 (a) A+ (b) RFL (c) Self Ex (d) SRCNN (e) VDSR (f) CGIHE-VDSR.

Super-resolution results for the Test Image #106024 (a) A+ (b) RFL (c) Self Ex (d) SRCNN (e) VDSR (f) CGIHE-VDSR.

Super-resolution results for the Test Image #86000 (a) A+ (b) RFL (c) Self Ex (d) SRCNN (e) VDSR (f) CGIHE-VDSR.

Super-resolution results for the Test Image #25 (a) A+ (b) RFL (c) Self Ex (d) SRCNN (e) VDSR (f) CGIHE-VDSR.

Super-resolution results for the Test Image #38092 (a) A+ (b) RFL (c) Self Ex (d) SRCNN (e) VDSR (f) CGIHE-VDSR.

Super-resolution results for the Test Image #148026 (a) A+ (b) RFL (c) Self Ex (d) SRCNN (e) VDSR (f) CGIHE-VDSR.
For training, 200 images from BSDS [29] and 91 images from Yang et al. [30] are used, and the dataset is enlarged by augmentation such as flipping and rotation. For testing, the commonly used benchmark datasets Urban100 [11], Set5 [15], B100 [22] and Set14 [26] are used. The training parameters are a network depth of 20, a momentum of 0.9 and a weight decay of 0.0001. The weights are initialized in a manner suited to a network with 19 ReLU units. Training runs for 80 epochs (9,960 iterations with a batch size of 64). The learning rate is 0.1 for epochs 0 to 20, 0.01 for epochs 21 to 40, 0.001 for epochs 41 to 60 and 0.0001 for epochs 61 to 80. Training stops after 80 epochs and takes about two hours with MATLAB 2021 on an NVIDIA GPU. The same learning rate is used for all layers, since residual learning already makes convergence fast. The proposed method is compared with standard super-resolution methods: A+ [14], Self-Ex [15], RFL [16], SRCNN [18] and VDSR [20]. Human eyes are more sensitive to intensity details, i.e., the Value component of the HSV image, than to color. Because color global image histogram equalization is applied to the Value component, the intensity values are improved compared with the earlier methods. The network also uses residual learning.
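The stated hyperparameters and step schedule can be written down as follows. The function and class names are hypothetical, and the schedule simply encodes the epoch ranges given above (the rate is divided by 10 every 20 epochs).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as stated in the text."""
    depth: int = 20
    momentum: float = 0.9
    weight_decay: float = 1e-4
    batch_size: int = 64
    epochs: int = 80

def learning_rate(epoch: int) -> float:
    """0.1 for epochs 0-20, 0.01 for 21-40, 0.001 for 41-60, 0.0001 for 61-80."""
    if epoch <= 20:
        return 0.1
    if epoch <= 40:
        return 0.01
    if epoch <= 60:
        return 0.001
    return 0.0001

cfg = TrainingConfig()
for e in (1, 20, 21, 40, 41, 61, cfg.epochs):
    print(e, learning_rate(e))
```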
In residual learning, the low-frequency content is carried over from the interpolated low-resolution image, and the network is trained only to predict the residual image, which holds the high-frequency components. This makes the network converge much faster than other convolutional neural networks. For a network depth of 20, the receptive field is 41 × 41, which is large enough for image details to be predicted more accurately. Hence, effective image super-resolution is achieved.
The experimental results demonstrate that applying color global histogram equalization before the very deep super-resolution network significantly improves super-resolution. The proposed work can be analyzed in three aspects: global histogram equalization, depth and residual learning. First, histogram equalization over the entire image improves the intensity values. Second, increasing the network depth improves performance. Third, residual learning makes the network converge much more quickly than a conventional CNN.
The proposed technique is compared quantitatively with existing methods on various datasets in terms of PSNR and SSIM. Tables 1 and 2 report the PSNR and SSIM values for the various datasets. On all datasets, the proposed CGIHE-VDSR algorithm outperforms the existing methods in terms of both PSNR and SSIM.
Peak Signal to Noise Ratio (PSNR) in dB
Structural Similarity Index Measure (SSIM)
The Peak Signal-to-Noise Ratio (PSNR) [31] is the most commonly used full-reference objective quality metric for image restoration and is frequently used to assess the fidelity of reconstructed digital images. It is computed from the error between the original and reconstructed signals, with larger values indicating higher quality. The Structural Similarity Index Measure (SSIM) [31], in comparison, is reported to reflect visual quality better than PSNR. SSIM compares three image characteristics (brightness, contrast and structure) against a reference image. Equations (3) and (4) give the PSNR, computed from the Mean Square Error (MSE), and the SSIM for images I1(m, n) and I2(m, n) of size M × N.
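$$\mathrm{MSE} = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\left[I_1(m,n) - I_2(m,n)\right]^2, \qquad \mathrm{PSNR} = 10\log_{10}\frac{R^2}{\mathrm{MSE}}$$
where R = 2^8 - 1 = 255 for 8 bits/pixel.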
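PSNR and a simplified SSIM can be sketched in NumPy. The SSIM here uses a single window over the whole image with the commonly used constants K1 = 0.01 and K2 = 0.03 (assumptions); standard implementations, including MATLAB's `ssim`, instead average the index over local Gaussian-weighted windows, so their values will differ. The function names are hypothetical.

```python
import numpy as np

def psnr(i1: np.ndarray, i2: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB between two same-sized images; peak R = 2**8 - 1 for 8-bit data."""
    mse = np.mean((i1.astype(float) - i2.astype(float)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def ssim_global(i1, i2, peak=255.0, k1=0.01, k2=0.03) -> float:
    """Single-window SSIM over the whole image (a simplification of the usual
    locally windowed SSIM): compares brightness, contrast and structure."""
    x, y = i1.astype(float), i2.astype(float)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

ref = np.arange(100, dtype=float).reshape(10, 10) * 2
shifted = ref + 5  # constant error of 5 gives MSE = 25
print(round(psnr(ref, shifted), 2))  # 34.15
print(ssim_global(ref, ref))
```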
In most cases, PSNR and SSIM are used to evaluate the quality of the recovered image. In terms of PSNR, the proposed method achieves better results than the existing methods because of the global histogram equalization and the increased network depth. The closer the SSIM value is to one, the more closely the test image matches the reference image, and the SSIM of the proposed method is also higher than that of all other methods. Figures 16 and 17 show the average PSNR and average SSIM values obtained with various single-image super-resolution techniques on multiple datasets.

Comparison of Average PSNR (dB) for various datasets.

Comparison of Average SSIM for various datasets.
Super-resolution of color images is achieved through the CGIHE-VDSR method, which gives better PSNR and SSIM results than existing methods on all datasets. The super-resolution images obtained from the CGIHE-VDSR algorithm are therefore also applied to OKN detection using the SLKOF algorithm [32]. The mean and standard deviation values in Table 3 show that the improved Normalized Average Peak Velocity (NAPV) obtained from the super-resolution image at a subsampling factor of 1/4 of the original is consistent with that obtained from the original-size super-resolution image.
Mean and Standard Deviation
Hence, images obtained with the proposed CGIHE-VDSR method can be used for accurate detection, giving better results for low-resolution images.
Super-resolution of color images is achieved through the proposed CGIHE-VDSR algorithm. Color Global Image Histogram Equalization enhances the brightness of the low-resolution color image, and hence the luminance channel that the VDSR network uses to estimate the residual image. The proposed CGIHE-VDSR algorithm achieves better PSNR and SSIM results than existing methods on all datasets. Hence, color images generated with the CGIHE-VDSR super-resolution algorithm may be used in the medical field for diagnostic purposes. In future work, the VDSR network depth may be increased to further improve performance.
