Abstract
Owing to the evolution of the digital domain, vast amounts of multimedia are generated daily, which creates the need for an effective and appealing image retrieval system. In this paper, a shape and texture-based image retrieval system is proposed that estimates the resemblance of each query image to the images stored in the repository in terms of shape and textural facets and retrieves the images within an expected range of resemblance. The proposed approach employs statistical methods for image retrieval. It takes into account the discriminative features of the input image to generate the shape and texture descriptors. For image databases of restricted variety, which include only homogeneous patterns, this approach yielded satisfactory results. For texture images, it uses the spatial gray level dependency matrix (SGLDM) and proposes an algorithm to compute the inverse difference moment (IDM) as the optimal image representative feature. It further employs a K-Nearest Neighbour (KNN) classifier for the classification and retrieval tasks. The proposed system outperforms various other state-of-the-art content-based image retrieval (CBIR) systems in many respects.
Introduction
The fundamental concept behind a CBIR system is that it holds a pre-processed collection of images in the repository in the form of feature vectors. It retrieves images from this repository based on the query image. The features derived from the query image are essentially the content of the image, such as shape, color, or texture. The input can even be specified as a mixture of these image features. The rapid growth of numerous repositories has created a demand for efficient and capable CBIR systems. CBIR systems conventionally look up images in the repository that are in “close” proximity to the query image according to certain resemblance criteria. The texture, shape, color, or text annotations of the objects present in the query image, which portray image properties, are employed in the latest CBIR systems. The general architecture of a CBIR system is depicted in Fig. 1.

General architecture of CBIR-system.
This paper focuses on the constraint of accurate feature extraction and proposes a new approach for extracting image features. Extracting accurate features from an input with an efficient method is itself a major challenge. The paper further focuses on the constraint of optimal feature representation and proposes a new feature vector based on Fourier descriptors (FDs) to represent the primary features (central distance). The proposed method achieves feature selection accuracy and has a low resource overhead in computing the shape feature. It stores the feature vector as FDs, which are few in number (10 values per image); this results in a very small storage requirement and reduced computation in feature matching, and hence in increased performance compared to other methods. For texture images, it uses the SGLDM and proposes an algorithm to compute the IDM as the optimal image representative feature. It employs KNN for the retrieval and classification tasks. The proposed method is invariant to the position and orientation of the object in the image. In this work, cellular and convolutional neural networks (CNNs) are not preferred, as the features extracted here are considered ideal and optimal. Our approach performs feature extraction as a separate step, whereas such networks learn the features themselves. Furthermore, a CNN does not consider the relationships among the learned features throughout the network; it only searches for the features in the matching stage. In contrast, the SGLDM used in our approach maintains the relationships among the features.
The rest of the paper is organized as follows. A brief review of current research related to CBIR is given in Section 2. In Section 3, the pre-processing steps required to extract the boundary are presented. The proposed approach for feature extraction and retrieval of similar images is described in Section 4. In Section 5, experimental results are reported to show the performance of the proposed approach. Conclusions are presented in Section 6.
Several CBIR systems employ diverse low-level features with diversified representations and diverse resemblance measures. Owing to the prominence of texture, shape, and color features in human perception, these are commonly used.
The authors of [1] have presented an image segmentation technique that employs the spatial and texture features and the spatial correlations between image sections as the representative features for CBIR. The approach first generates partial results for each separate feature and then takes their weighted average as the final result. It involves intensive computations in determining the separate features. In [2], a color pair-based approach is proposed that extracts the color entities. This approach is feature deficient. In [3], relevance feedback from the user is considered to enhance the vector space. The image concepts are represented together with semantic ones, in addition to the user's semantic preferences.
The authors of [4] have proposed a spline curve-based approach for leaf retrieval. The approach extracts the curve points from the boundary of the input leaf image and generates a new feature vector known as the FD. The approach is position invariant. A retrieval approach based on co-occurrence matrices, texture histograms, color, and orientation is outlined in [5]. This approach is slow, as it uses a large feature vector to represent the image features. In [6], steerable pyramids are used to determine the texture direction and orientation, which are further used to compute the distribution of texture directions employed in textural querying. This approach fails to retrieve texture patterns with finer variations, such as small variations in texture direction.
The authors of [7] have presented a multifeatured approach that implements a contrast limited adaptive histogram equalization (CLAHE) algorithm to enhance the procured image and clustering process to extract features. The sequential model provided by Keras is further employed to classify whether the leaf is diseased or healthy. In [8] linear binary pattern is used as a feature for face recognition. The authors of [9] have done a comprehensive survey that discusses the various deep learning and image processing methodologies involved in brain tumour detection. In [10] the deep learning techniques employed in cancer detection are outlined. The authors of these papers have argued that optimal discriminative feature vector is crucial for effective and efficient image retrieval.
In [11], the authors have proposed an approach that identifies the interpolating control points (CPs) of the curved boundary of the leaf image. It then draws a new Bezier curve from the extracted CPs. The points of the Bezier curve are used to compute the fast Fourier transform (FFT) as a feature vector, and these FFT values are then used for leaf classification. A segmentation technique termed ‘Blobworld’ is presented in [12]. The authors of [13] have proposed a capsule network-based approach for leaf classification. They have proposed a new dataset consisting of CPs, extreme points, and FDs. The approach yields better results for leaf classification.
In [14], the authors have proposed a CBIR method to retrieve the leaves of diseased soybean plants. It uses the HSV color histogram as a color feature, scale invariant feature transform (SIFT) shape features, and a combination of local binary pattern (LBP), Gabor filter, and a proposed local gray Gabor pattern (LGGP) as texture features for the retrieval. The features used here require heavy computation to reach the final feature vector, and the method does not use a precise classifier or similarity measure in retrieval. The authors of [15] have proposed a Support Vector Machine (SVM) based approach for leaf classification. The input image is first converted to a gray level image and then convolved with elliptical half Gabor wavelets (EHGW) to extract the line responses. From these line responses, the maximum gap local line direction patterns (MGLLDP) are extracted. The histogram of the normalized patterns is calculated and regarded as the counting-based local structure descriptor. This descriptor is then used for classification with the SVM. This feature selection process reduces the relevance of the descriptive features with respect to image representation. In [16], the authors have developed an approach grounded on shape and texture features, referred to as the rotation invariant wavelet descriptor, for plant species classification. A multi-layered perceptron (MLP) is used for classification. It gives acceptable accuracy, but the feature computation time of this approach is significant.
In [17], a method using the Bezier curve is developed for smoothing the curvature in order to derive control points. The control point evolution helps in generating a curve in a piecewise manner so that the curve can be tracked easily. The authors of [18] have proposed an interpolation-based method for any number of points that minimizes noise corruption. This approach defines the interpolation by lines and Bernstein polynomials for Bezier curve interpolation of the original data with noise removal. When control points are recognized in images with strong short-period variations, the number of feature points is large. Such a feature description results in large memory storage and higher search overhead. A larger descriptive feature set also makes misclassification in recognition more likely. In [19], the authors have examined the exponential stability of monotone travelling wave solutions for a class of nonlinear delayed dynamical neural networks with leakage term and distributed delays, whereas the exponential stability of travelling wave solutions for nonlinear delayed cellular neural networks was investigated by the authors of [20]. The authors of [21] have presented an approach for subpixel categorization of clustered images that speeds up the process by utilizing morphological operations. The research in [22] focuses on generating high resolution color images from remote sensing images and attempts to remove parts of concealed districts. The researchers in [23] have presented a new approach for multi-level steganography and color segmentation. The work in [24] demonstrates the impact of image degradation on categorization in CNNs.
Methodologies
Pre-processing
The pre-processing stage outputs the boundary coordinates of the shape present in the input image. The steps in the pre-processing are shown in Fig. 2. The shape image is binarized using simple thresholding, which converts the gray-level image to a binary one. To obtain a boundary that is clearly separated from the rest of the object, the following steps are performed on the given images. The first step is plotting the histogram, which represents the relative frequency of occurrence of the gray levels present within an image [25]. The plot of gray levels versus their frequencies is called the gray level histogram. It is used to obtain an approximate threshold level.

Pre-processing of the input image.
The second step is binarization. The threshold plays an important role in detecting the boundary of an object. Thresholding is one of the simplest methods of segmentation. Here, each pixel is assigned to either the object or the background by comparing its brightness value with the threshold value. The next step is boundary detection, or contouring, which is used to extract shapes from a given image. In this work, the Marching squares algorithm with its latest variations is used for boundary detection [26]. The final step is denoising. The extracted boundary may be degraded or distorted by statistical noise introduced in the image acquisition and pre-processing steps such as thresholding. Therefore, a contour smoothing operation is carried out on the extracted contour [27]. Smoothing the contour reveals the dominant curvature patterns, which illustrate the representative shape of the region. Therefore, to extract the dominant curvature patterns, the obtained contour is recursively smoothed using the Gaussian smoothing parameter (σ). The Gaussian smoothing operation is then defined by,
The denoising process removes the statistical noise [11]. After this step, the shapes which are obtained for further processing are irregular in shape.
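As an illustration of the smoothing step, the following minimal Python sketch assumes the common formulation in which each boundary coordinate sequence is convolved with a sampled Gaussian kernel of width σ, that is, X(u, σ) = x(u) * g(u, σ) with g(u, σ) proportional to exp(-u²/(2σ²)). The function names, the 3σ truncation radius, and the wrap-around treatment of the closed contour are assumptions made for illustration and are not taken from the original implementation.

```python
import numpy as np


def gaussian_kernel(sigma, radius=None):
    """Sampled 1-D Gaussian, normalized so that its weights sum to 1."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # assumed truncation at 3 sigma
    u = np.arange(-radius, radius + 1)
    g = np.exp(-(u ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def smooth_closed_contour(x, y, sigma):
    """Smooth the x and y coordinate sequences of a closed contour.

    The contour is treated as periodic, so both sequences are padded by
    wrapping around before the convolution; the output has the same
    number of points as the input.
    """
    g = gaussian_kernel(sigma)
    r = len(g) // 2
    xp = np.pad(np.asarray(x, dtype=float), r, mode="wrap")
    yp = np.pad(np.asarray(y, dtype=float), r, mode="wrap")
    return np.convolve(xp, g, mode="valid"), np.convolve(yp, g, mode="valid")
```

Calling `smooth_closed_contour` repeatedly on its own output corresponds to the recursive smoothing mentioned above; a larger σ removes finer boundary detail.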
The objects present in the input image are either specified shapes or texture patterns. The logical image description includes color, texture, and shape attributes. The stochastic or spatially deterministic facets of the grey level distribution can be described by the textural attributes of the images. A piece of information about the semantic or morphological content of an image is known as an image feature. There are several properties for measuring the quality of a feature:
Capacity: The number of distinguishable images that can be represented.
Maximal match number: The maximum number of images a query could possibly retrieve.
Complexity: The amount of computation required to determine if two images are similar for a particular feature.
Compactness: The amount of space required to store and compare the feature.
Feature extraction is performed when an image is added to the database. The system extracts features that support shape and texture image query types. Retrieving partially overlapping objects or closely located identical objects is always a difficult task in object retrieval systems [28]. Such systems perform poorly when they rely on global feature matching approaches such as moments, area, and size.
Shape signatures
Shape signatures are used to represent 2-dimensional shapes in 1-dimensional form. The proposed approach uses the centroid (central) distance shape signature to model the shape. Suppose the coordinates of the boundary extracted after pre-processing are (x0, y0), (x1, y1), ..., (xL-1, yL-1). The centroid distance is the distance of each individual boundary point from the centroid (xc, yc).
The approach becomes position invariant as the centroid is subtracted from the boundary point coordinates. The centroid (xc, yc) of all the boundary points can be computed by taking the mean of the x and y coordinate values.
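The centroid distance signature described above can be sketched as follows. The sketch assumes the usual definitions, xc = (1/L) Σ xt and yc = (1/L) Σ yt for the centroid and z(t) = sqrt((xt - xc)² + (yt - yc)²) for the signature; the function name is hypothetical and the code is an illustration rather than the authors' implementation.

```python
import numpy as np


def centroid_distance_signature(boundary):
    """Return the distance of every boundary point from the boundary centroid.

    `boundary` is a sequence of (x, y) pairs ordered along the contour.
    """
    pts = np.asarray(boundary, dtype=float)
    xc, yc = pts.mean(axis=0)  # centroid = mean of x and of y coordinates
    return np.hypot(pts[:, 0] - xc, pts[:, 1] - yc)
```

Because only differences from the centroid enter the signature, translating the whole boundary leaves the result unchanged, which is the position invariance noted above.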
Thus, with the above image categorization, the logical image database stores only a link to each image added to the database [29].
Owing to the practical nature of this shape signature for general shape delineation, recent FD approaches commonly use it for shape indexing, and so does the proposed approach. The central distance function mentioned in equation (3) is used for FD computation. For a given shape signature described in this Section, z(t), t = 0, 1, ..., N-1, assuming it is normalized to N points in the sampling stage, the discrete Fourier transform of z(t) is given by,
The coefficients v_n, n = 0, 1, ..., N-1, are called the FDs of the shape, denoted as FD_n, n = 0, 1, ..., N-1 [29–31]. The shapes are represented by FDs in the frequency spectrum. The finer details of the shape are delineated by the higher-order FDs, while the general shape is represented by the lower-order FDs. Finally, a subset of FDs and extreme points from the shape image are used to form the final feature vector for indexing the images [29, 32]. Underfitting and overfitting are the main issues faced while implementing a classifier. The former arises when there are not enough features for classifying the input; that is, if the learned features are not sufficiently dissimilar, the training results in underfitting. The latter arises when the feature count is large and many of the features are also present in other classes, so that the classifier fails to identify the correct class. In our approach, the different shape signatures are modelled accurately by the proposed feature vector; therefore, the problems of underfitting and overfitting hardly arise.
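A minimal sketch of the FD computation is given below. It assumes the common convention v_n = (1/N) Σ z(t) exp(-j2πnt/N) for the discrete Fourier transform, keeps only coefficient magnitudes (which discards the dependence on the starting point of the contour), and divides by the magnitude of the zero-frequency term to remove scale. The normalization and the choice of which 10 coefficients to keep are assumptions for illustration; the text only states that 10 values are stored per image.

```python
import numpy as np


def fourier_descriptors(signature, n_keep=10):
    """Compute a short, normalized FD vector from a 1-D shape signature.

    signature : centroid distances z(t), t = 0..N-1, sampled along the boundary
    n_keep    : number of low-order descriptors to retain (assumed to be 10)
    """
    z = np.asarray(signature, dtype=float)
    coeffs = np.fft.fft(z) / len(z)      # v_n with the assumed 1/N factor
    mags = np.abs(coeffs)                # drop phase: start-point independent
    return mags[1:n_keep + 1] / mags[0]  # divide by |v_0|: scale independent
```

With this normalization, a circular shift of the signature (a different starting point on the boundary) or a uniform scaling of the shape leaves the descriptor vector unchanged.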
For retrieval and classification, a KNN classifier is used. Feature scaling is applied to the feature vector to further limit underfitting and overfitting to a certain extent. A Euclidean distance-based measure is used for classification. This approach computes the minimum distance between the feature vector of the query sample and the training features in the database, which yields the best match value M. The minimum distance between the train feature and the test feature is given by,
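The matching step can be sketched as below, assuming the standard Euclidean distance d(q, f) = sqrt(Σ (q_k - f_k)²) between a query feature vector q and a stored feature vector f. The min-max scaling, the value of k, and all function names are illustrative assumptions, since the text does not specify them.

```python
from collections import Counter

import numpy as np


def min_max_scale(train, query):
    """Scale features to [0, 1] using the range of the training features."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (train - lo) / span, (query - lo) / span


def knn_retrieve(train, labels, query, k=3):
    """Return indices and distances of the k closest stored vectors,
    plus the majority label among them."""
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    train_s, query_s = min_max_scale(train, query)
    dists = np.linalg.norm(train_s - query_s, axis=1)  # Euclidean distance
    order = np.argsort(dists)[:k]                      # best matches first
    votes = Counter(labels[i] for i in order)
    return order, dists[order], votes.most_common(1)[0][0]
```

The first returned index is the best match; returning the top k indices serves retrieval, while the majority label serves classification.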
Texture is a critical primitive in human vision and a key image feature. It has been used to identify image contents, for example to recognize mountains, sand terrains, and agricultural terrains in aerial images. In addition, the semantics of image regions such as sand, clouds, hair, and bricks can be portrayed by their textures. Better results can be obtained if texture is combined with color, which enhances the identifying and descriptive characteristics of the texture. Hence, the significant features of image objects may be captured at first sight. The most critical difference between texture and color features is that color is a property of a single pixel, whereas texture is a property of a local neighborhood. Hence, it is not meaningful to discuss texture content at the pixel level without considering the neighborhood. Figure 3 shows some of the texture patterns used in the interior design system.

Examples of texture patterns.
In image processing, there are three main approaches to the extraction of texture features: the spectral approach, the statistical approach, and the structural (or syntactic) approach. The methods employed in the statistical approach take the intensity value of each pixel in the image and apply various arithmetic equations to the pixels to calculate the feature descriptors. The texture feature descriptors derived from statistical methods are broadly divided into 2 types depending on the order of the statistical function used: 1st order and 2nd order texture features. The former are obtained directly from the available data via the intensity histograms; hence, they provide no pixel location information. Another term used for them is grey level distribution moments (GLDM). In contrast, the latter consider the location of a pixel relative to another. The most widely employed 2nd order approach is the SGLDM approach. The technique essentially consists of building matrices by counting the number of occurrences of pairs of pixels with given intensities at a given displacement [33].
The dimension of the SGLDM is equal to the number of grey levels present in an image. For instance, if the image has 32 discrete grey levels, then the SGLDM will be of size 32x32. Additionally, it requires a position operator P, which is applied to the image and evaluated for each individual image pixel. The SGLDM is a function of P, and P is a function of the displacement d and the direction μ, where μ can be a single direction or multiple directions, each represented by an angle. The element cij of the SGLDM is increased by 1 if the intensity value of the pixel is i and the intensity of the pixel to which the operator points is j. For instance, P (1, 45) is an operator that points one to the right and one below, where d = 1 and μ = 45. The definition given above typically produces an SGLDM that is not symmetric; if the orientation of an object is vital, then this computation should be used, and it is called W-SGLDM. If the orientation of the object is not of interest, the definition can be changed slightly to generate a symmetric matrix. The only change to the W-SGLDM is that when the position operator P (d, μ) is applied to the image, the operator pointing in the opposite direction, P (d, μ + 180), is also applied to the image at the same time. The computation of the W-SGLDM is depicted in Fig. 4. The algorithm for SGLDM and IDM computation is as follows:
Step 1: Accept the input texture image, resize it, and convert it into a grayscale image.

Computation of W-SGLDM for P (1, 0).
Step 2: Compute the SGLDM of the image, as explained above, along the bottom, bottom-left, left, top-left, top, top-right, right, and bottom-right directions.
Step 3: Calculate the weight matrix as w = Abs (i - j), where i and j are grey values of the input image.
Step 4: Finally calculate the IDM values as:
By using the above algorithm, 9 SGLDM matrices and 9 corresponding IDM values are produced for the image. Here, 9 IDMs are calculated because the image is segmented into 9 tiles of identical size. The IDM is calculated by using the formula given below:
Further, for retrieval and classification, a KNN classifier based on the Euclidean distance measure discussed above is used. It takes the FD values generated for the query image, matches them against the stored FD values of the database images, and retrieves the top images whose FD values are closest to those of the query image as the retrieval result.
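The steps above can be sketched in Python as follows. The sketch assumes the standard second-order definition of the inverse difference moment, IDM = Σi Σj p(i, j) / (1 + (i - j)²), where p is the SGLDM normalized to sum to 1 and |i - j| plays the role of the weight w from Step 3. The (dx, dy) offset table, the function names, the single direction used per tile, and the 3x3 tiling helper are assumptions for illustration and may differ from the authors' algorithm.

```python
import numpy as np

# Assumed (dx, dy) pixel offsets for the eight directions named in Step 2,
# with x growing to the right and y growing downwards.
OFFSETS = {
    "right": (1, 0), "bottom_right": (1, 1), "bottom": (0, 1),
    "bottom_left": (-1, 1), "left": (-1, 0), "top_left": (-1, -1),
    "top": (0, -1), "top_right": (1, -1),
}


def sgldm(img, dx, dy, levels, symmetric=False):
    """Count pixel pairs (i, j) where j lies at offset (dx, dy) from i."""
    img = np.asarray(img, dtype=np.intp)
    h, w = img.shape
    m = np.zeros((levels, levels), dtype=np.int64)
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                m[img[r, c], img[r2, c2]] += 1
    # Adding the transpose also counts each pair in the opposite direction.
    return m + m.T if symmetric else m


def idm(matrix):
    """Inverse difference moment of a co-occurrence matrix (assumed form)."""
    total = matrix.sum()
    if total == 0:
        return 0.0
    i, j = np.indices(matrix.shape)
    w = np.abs(i - j)  # weight matrix from Step 3
    return float(((matrix / total) / (1.0 + w ** 2)).sum())


def tile_idms(img, levels, dx=1, dy=0, grid=3):
    """Split the image into grid x grid equal tiles and return one IDM per tile."""
    img = np.asarray(img, dtype=np.intp)
    th, tw = img.shape[0] // grid, img.shape[1] // grid
    return [idm(sgldm(img[r * th:(r + 1) * th, c * tw:(c + 1) * tw], dx, dy, levels))
            for r in range(grid) for c in range(grid)]
```

A perfectly uniform tile gives an IDM of 1, and the value decreases as neighbouring grey levels differ more, which is why the IDM is often described as a homogeneity measure.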
Experimental analysis
Tables 1 and 2 contain a few examples of texture-based and shape-based queries, respectively, and the corresponding results obtained for them. The texture database contains 240 images and the shape database contains 150 images. The 240 texture images comprise around 50 to 60 diverse texture patterns, and hence there are essentially 4 to 5 images containing a specific texture pattern. In the shape database of 150 images, 6 to 7 images belong to the same shape, possibly with different rotation, translation, or scaling.
Result analysis for texture based queries
Result analysis for shape based queries
The performance (P) of the system can be computed as [33, 34]:
Figure 5 depicts the sets of results for four example queries that were executed on the database. It is observed that the first result obtained is the query image itself. Hence, the proposed approach gives the highest possible resemblance score (i.e., 1.000). The top 4 results obtained for the query image are outlined in Figs. 6 and 7.

Input texture images.
From Tables 1 and 2, it is clear that the proposed system gives almost 80% performance for texture-based retrieval and almost 86% performance for shape-based retrieval.

Sample queries: texture images and similarity scores for each query result.

Sample queries: texture images and similarity scores for each query result.
Result analysis for various test samples
The content-based image retrieval system is first analyzed in terms of retrieval effectiveness. To analyze the efficacy of retrieval systems, several evaluation measures are employed [35]. Precision and sensitivity are the most widely used evaluation parameters for measuring the performance of CBIR systems. The evaluation parameters are mathematically represented using the following terms:
True positive (TP): It denotes the images correctly retrieved as belonging to the class. False positive (FP): It denotes the images incorrectly retrieved as belonging to the class. True negative (TN): It denotes the images correctly identified as not belonging to the class. False negative (FN): It denotes the images incorrectly identified as not belonging to the class. Accuracy: It is the ratio of the correct results, that is, the true positives and true negatives, to the total number of results. It can be mathematically expressed as:
Precision: It is defined as the ratio of relevant output instances retrieved to the total instances retrieved. Mathematically, it can be defined using Equation (10):
Sensitivity: It is the ratio of the relevant images retrieved to the total number of relevant images present in the database. It is also called recall. It can be mathematically represented as:
Specificity: It is the proportion of true negatives among all images that actually do not belong to the class, and is mathematically given as [11]:
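The four measures defined above can be computed from the TP, FP, TN, and FN counts as in the following sketch, which assumes the standard definitions: accuracy = (TP + TN) / (TP + FP + TN + FN), precision = TP / (TP + FP), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The function name is hypothetical, and returning 0 for an empty denominator is a convention chosen here for illustration.

```python
def retrieval_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, sensitivity (recall), and specificity."""

    def ratio(num, den):
        # Convention assumed here: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

For example, with purely illustrative counts of 8 true positives, 2 false positives, 85 true negatives, and 5 false negatives, the function reports a precision of 0.8 and an accuracy of 0.93.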
To produce the shape repository in this experiment, 70 identical synthetic shapes and 25 bottle shapes are employed, for 95 individual shapes. Further, 4 additional instances of the individual shapes are obtained by applying affine transformations. This generates a repository of 640 shapes including the original ones. The database generated with this approach makes the assessment more reliable. The precision and recall measures discussed earlier are employed to evaluate the retrieval efficiency. A detailed comparative analysis of the proposed approach is carried out using the measures discussed earlier, and the approach is compared with various recent techniques as shown in Table 3.
Several iterations are carried out on the dataset with varied samples, and the obtained results are averaged and presented as mean results. The proposed approach gives higher accuracy with satisfactory specificity and sensitivity. The color, shape and texture-based (CST) [14] and the EHGW and MGLLDP-based [15] methods fall short in comparison to it. The color-based technique performs well regarding sensitivity and specificity, whereas regarding accuracy and specificity the proposed approach outperforms both methods. The proposed system yields 43% and 46% higher precision than the color feature-based and wavelet transform-based methods, respectively. There is a net 33% and 37% improvement in sensitivity compared to the color feature-based and wavelet transform-based methods, respectively [36].
The obtained results are also presented in graphical form below. Each result is described in two parts, as the number of evaluations is significant. Figure 8 presents part I of the results obtained for the proposed approach, compared with the latest techniques in terms of accuracy. It shows that, for all 4 simulations, the proposed approach outperformed the other methods. Color-based methods generate poorer results than the wavelet-based and proposed approaches.

Result comparison with latest techniques in terms of accuracy.
Figures 9 through 12 present part II of the results obtained for the proposed approach in terms of sensitivity, specificity, precision, and recall, compared with the latest techniques.

Result analysis for test sample S1.

Result analysis for test sample S2.

Result analysis for test sample S3.

Result analysis for test sample S4.
The proposed approach implements a CBIR system that assesses the resemblance of each image in its repository to a query image in terms of shape and textural aspects and presents the images within an anticipated range of resemblance. For shape-based retrieval, the shape is indexed by FD values. The proposed approach adopts statistical methods to extract shape and texture features from the training as well as the query images. It is observed that the multi-operational process for computing the SGLDM yields better performance. The proposed work picks the IDM as the best of the 2nd order statistical features computed from the SGLDM. In this approach, images are represented using only FDs and IDM values. Thus, the size of the features finally stored in the database is substantially reduced, which in turn reduces the computation overhead of feature matching for a query image. Additionally, this approach is position and orientation invariant: it works well for similar objects irrespective of their orientation and position. For all these reasons, the proposed method outperforms others in terms of computational complexity and memory efficiency. The results indicate that the proposed method achieves feature selection accuracy. For image repositories of limited diversity, which include very few homogeneous patterns, it produced satisfactory results. As large numbers of images are produced nowadays, quality CBIR systems are critical. The proposed approach outperforms other CBIR systems in terms of performance and accuracy. The authors hope that, by using this system, end users may obtain faster results when searching for desired images.
Plant categorization and identification, leaf-based plant disease prediction, plant growth prediction in controlled farming, handwriting identification, signature matching, and improving the functionality of text-based image retrieval systems are some of the applications of the proposed approach.
Conflicts of interest
The authors have no conflicts of interest to declare. All co-authors have seen and agree with the contents of the manuscript and there is no financial interest to report. We certify that the submission is original work and is not under review at any other publication.
Author contributions
Sandeep Pande and Suresh Rathod conceived the presented idea. Sandeep Pande and Shantanu Pathak developed the theory and performed the computations. Sudhanshu S. Gonge and Rahul Joshi verified the collected dataset and the analytical methods. Pramod P. Jadhav, Sachin P. Godse, and Sk Hasane Ahammad investigated the results with standard evaluation parameters and supervised the findings of this work. All authors discussed the results and contributed to the final manuscript.
Acknowledgments
The authors would like to thank MIT, Academy of Engineering, Alandi, Pune who provided insight and expertise that greatly assisted the research, the infrastructure, tools and opportunity to publish this manuscript.
