Cashmere/wool identification based on bag-of-words and spatial pyramid match

Abstract

Due to the similarities between cashmere and wool, the automatic identification of these two animal fibers remains a major challenge in the textile community. In this paper, bag-of-words and spatial pyramid matching are used to identify micrographs of cashmere and wool. In our approach, each fiber image was regarded as a collection of feature vectors. The vectors, extracted from the original dataset, were fed into a support vector machine for supervised classification. The codebook size and the resolution level were thoroughly investigated. The experimental results indicated that image segmentation made a positive contribution to the accuracy of classification. The overall performance of the model was robust under various blend ratios. These results verify that bag-of-words with spatial pyramid match is an effective approach to the identification of cashmere and wool fibers.

Keywords

cashmere, wool, identification, image segmentation, bag-of-words, spatial pyramid match

Wool and cashmere are two kinds of highly similar animal fibers with a particular surface structure consisting of an outermost scale cuticle. Currently, the identification of these two kinds of animal fibers is accomplished in accordance with the morphology of the scales of individual fibers using optical or scanning electron microscopy.¹ Skilled microscopists distinguish cashmere and wool fibers based on differences in morphological characteristics.² These differences mainly include the form of scale margin, the distance between the external scale margins, and the general scale patterns.³ Besides morphology based identification methods, near infrared spectroscopy, protein analysis, and DNA analysis technologies were also employed to identify cashmere and wool fibers.^1,4–7 For example, Zoccola et al.¹ proposed a method based on near infrared spectroscopy for the identification of cashmere, wool, and yak in accordance with the differences among spectra of these animal fibers. Vineis et al.⁴ extracted keratin from different animal fibers and keratin was digested by enzyme to produce peptide mixtures. The peptide analysis of the mixtures was used to assess relative percentages of fibers present in blends. Tang et al.⁵ extracted mitochondrial DNA from cashmere and wool fibers for the identification of the two kinds of animal fibers.

Although some new methods have been developed, the cuticle scale pattern is still the major reference for distinguishing animal fibers. Technically, morphological identification of animal fibers can be regarded as a problem of pattern recognition. She et al.⁸ extracted nine scale parameters of merino wool and mohair via image processing. A total of 280 scales of merino fibers and 280 scales of mohair fibers were collected as samples. An artificial neural network (ANN) model was constructed for image classification. The feature vectors extracted from samples were fed into the ANN model and the recognition rate reached 94.6%. Ma et al.⁹ measured eight characteristic parameters from wool and cashmere fibers. A support vector machine (SVM) was chosen as the classifier for supervised classification. This method achieved an identification accuracy near 89.0%. Shi and Yu¹⁰ extracted four shape parameters from micrographs of cashmere and fine wool fiber and established a multi-parameter Bayes model. A total of 100 cashmere fibers and 100 wool fibers were selected as a dataset. The Bayes model achieved an identification accuracy of 90.5%.

However, these methods were limited by the quality of the micrographs. For some blurry images, it is difficult to measure exact shape parameters such as scale height and scale area. Instead of calculating geometric characteristics of fibers, some researchers deciphered scale patterns by means of other methods. Zhong et al.² transferred the microscopic images of wool and cashmere fibers into projection curves. In order to reveal the numerical features embedded in the projection curves, they compared three different approaches, including discrete wavelet transform (DWT), direct geometrical description (DGD), and recurrence quantification analysis (RQA). The numerical features were fed into three classifiers, ANN, SVM, and Kernel ridge regression (KRR), to perform supervised classification. Experimental results demonstrated the combination of RQA and SVM achieved the best accuracy.

During the last few years, the bag-of-words (BoW) approach has drawn significant attention in the field of image classification and object recognition. It has been successfully applied in many fields, such as medical image analysis and industrial inspection, and has achieved good performance.^11–13 The BoW model in computer vision originated from the BoW method for document classification.¹⁴ In document classification, a document can be represented by a histogram of words. Analogously, in the BoW model for image classification, an image is treated as a document and its features as words, so the image can be represented by a histogram of image features. To achieve this, the features of images are defined as “codewords,”¹⁵ which are represented by vectors. In this sense, image classification is converted into the classification of a collection of vectors.

In the BoW approach, SVM commonly outperforms other classifiers, including Naïve Bayes and hierarchical Bayesian models, for image classification.¹⁴ Since an image is represented as a histogram of codewords, the histogram intersection kernel can be chosen as the kernel function of the SVM.^13,14,16 However, the histogram intersection kernel does not consider the spatial arrangement of features in the image. Grauman and Darrell¹⁷ proposed the pyramid match kernel, which compares feature sets at different levels of resolution, and Lazebnik et al.¹⁸ extended it to the spatial pyramid match (SPM) kernel, which takes the spatial arrangement into account. SPM maps the features, which are represented as histograms of codewords, to a multi-dimensional multi-resolution histogram. The major advantage of SPM is that it can capture co-occurring features, and it has shown promising results in many applications.^17,19 This paper focuses on the evaluation of the BoW with SPM approach for the SVM-based classification of cashmere/wool fiber micrographs.

Methods

In this paper, fiber identification was carried out in three steps. In the first step, we performed image preprocessing to enhance the features and to remove the noise. In the second step, we constructed a bag-of-words model to represent the features of fibers. In the third step, we employed SVM as the classifier to obtain the identification result. For convenience, in the following sections the bag-of-words model is abbreviated to BoW, and the bag-of-words with spatial pyramid match model is abbreviated to SPM.

Sample collection and image preprocessing

The cashmere samples were collected from several different origins, including Inner Mongolia, Shanxi, and Tibet. The wool samples were collected from Inner Mongolia. Samples were prepared in the form of tops by the Erdos Group. Fiber images were captured with 10 × 50 magnification via an optical microscopic system (CU-5), which was manufactured by Beijing UVTec Co., under ISO 137-1975, IWTO-8-97, and GB/T 10685-2007. Microscope images were stored in “bmp” format as 768 × 576 pixels in size. Each image contained only one fiber (cashmere or wool) and the majority of the fiber trunk was clearly captured. All fiber images had been pre-labeled as wool or cashmere by seasoned experts. It is worthwhile mentioning that the total number of samples was equal to the total number of microscopic images. Therefore, the identification of animal fibers was transformed into the classification of fiber images.

Typically, the optical micrographs of fibers may not be ideal; as shown in Figure 1, bubbles and impurities may appear. There were even a small number of blurry images in the sample. Since the information for fiber identification is contained in the surface of the fiber instead of the background, fibers were segmented from the background before feature extraction. Taking the wool fiber in Figure 1(b) as an example, the detailed steps of image preprocessing follow. Figure 2 is the flowchart for image preprocessing.

Figure 1.

Original microscope images of cashmere and wool fibers. (a) Original cashmere fiber. (b) Original wool fiber.

Figure 2.

The flowchart of image preprocessing.

Step 1. Image enhancement. First, the original image was converted to a grayscale image, and a Gaussian high-pass filter was used to enhance the scale edges of the fiber, as shown in Figure 3(a). Second, the gray level histograms of the fiber images were found to occupy a very narrow band. As shown in Figure 4, the gray values of the vast majority of pixels in the image were concentrated in the range (115, 180). The gray values of pixels in this interval were therefore stretched linearly to the wider range (0, 255). As shown in Figure 3(b), after contrast stretching, the scale edges of the fiber image were enhanced significantly.
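The linear stretch in Step 1 can be sketched as follows. This is a minimal pure-Python illustration rather than the authors' implementation: the function name `stretch_contrast` is hypothetical, and saturating gray values that fall outside (115, 180) is an assumption, since the text does not say how such pixels were handled.

```python
def stretch_contrast(pixels, low=115, high=180):
    """Linearly map gray values in [low, high] to [0, 255].

    Values outside the band are clipped first (an assumption, see lead-in).
    """
    scale = 255.0 / (high - low)
    stretched = []
    for row in pixels:
        new_row = []
        for value in row:
            clipped = min(max(value, low), high)  # saturate outside the band
            new_row.append(int(round((clipped - low) * scale)))
        stretched.append(new_row)
    return stretched


# Tiny hypothetical 2 x 3 grayscale image.
image = [[100, 115, 150], [180, 200, 130]]
result = stretch_contrast(image)
```

In practice an array library would replace the nested loops; the arithmetic per pixel stays the same.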

Figure 3.

Image preprocessing. (a) Highpass filtering. (b) Contrast stretching. (c) Binarizing. (d) Removing small connected components. (e) Filling margin. (f) Segmenting from background.

Figure 4.

Gray level histogram of the wool image.

Step 2. Image binarization. In this step, as shown in Figure 3(c), Otsu’s method was used to generate an adaptive threshold to produce a binary image for each fiber.²⁰
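For readers unfamiliar with Otsu's method, the sketch below shows one common formulation: choose the threshold that maximizes the between-class variance of the gray level histogram. It is a hedged illustration only; the function name `otsu_threshold` and the toy histogram are hypothetical, and the paper does not state which implementation was used.

```python
def otsu_threshold(histogram):
    """Return the gray level t that maximizes between-class variance.

    histogram[g] is the number of pixels with gray value g. Pixels with
    value <= t form the lower class; the remaining pixels form the upper class.
    """
    total = sum(histogram)
    weighted_total = sum(g * count for g, count in enumerate(histogram))
    best_t, best_variance = 0, -1.0
    count_low = 0    # number of pixels in the lower class
    sum_low = 0.0    # sum of gray values in the lower class
    for t, count in enumerate(histogram):
        count_low += count
        sum_low += t * count
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, variance is undefined
        mean_low = sum_low / count_low
        mean_high = (weighted_total - sum_low) / count_high
        # Proportional to the between-class variance (constant factor dropped).
        variance = count_low * count_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t


# Toy bimodal histogram: ten dark pixels at 50 and ten bright pixels at 200.
toy_histogram = [0] * 256
toy_histogram[50] = 10
toy_histogram[200] = 10
threshold = otsu_threshold(toy_histogram)
binary = [1 if value > threshold else 0 for value in (50, 200)]
```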

Step 3. Fiber segmentation from background. First, every connected component in the image was labeled and its individual area was calculated. The biggest component (two fiber edges) was retained and others were removed, as shown in Figure 3(d). Second, a closing operation using a “disk” structuring element with radius 50 was performed in order to fill the gap between the two fiber edges, as shown in Figure 3(e). Third, after performing the AND operation between the contour image and the enhanced image, the fiber trunk was segmented from the background, as shown in Figure 3(f).
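The first and third parts of Step 3 (keeping the largest connected component and masking the enhanced image) can be sketched as below. This is an assumption-laden illustration: 4-connectivity, the helper names `largest_component` and `apply_mask`, and the toy arrays are all hypothetical, and the morphological closing with the disk structuring element is deliberately omitted for brevity.

```python
from collections import deque


def largest_component(binary):
    """Return a mask that keeps only the largest 4-connected foreground region."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    mask = [[0] * cols for _ in range(rows)]
    for y, x in best:
        mask[y][x] = 1
    return mask


def apply_mask(gray, mask):
    """Keep gray values where the mask is 1 and zero the background (AND step)."""
    return [[g if m else 0 for g, m in zip(gray_row, mask_row)]
            for gray_row, mask_row in zip(gray, mask)]


# Toy example: a 3-pixel component and an isolated pixel.
toy_binary = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 0]]
toy_gray = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120]]
toy_mask = largest_component(toy_binary)
segmented = apply_mask(toy_gray, toy_mask)
```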

Bag-of-words model

As shown in Figure 5, image classification with BoW mainly includes three stages. In the first stage, the local features of the images were detected and described by a local descriptor. In the second stage, the feature descriptors extracted from the images were clustered. Each cluster center was taken as a codeword and all codewords constituted a codebook. Therefore, each image can be represented by a histogram of codewords. In the third stage, a classifier was trained for image classification. The detailed explanations follow.

Figure 5.

The three stages of BoW.

Feature extraction

In our study, the scale invariant feature transform (SIFT) method was chosen to represent the fiber images.²¹ SIFT descriptors are invariant to translation, rotation, and image scale. Compared with other local descriptors, SIFT descriptors performed best in image classification.²² The SIFT algorithm consists of four steps: detecting local extrema, locating the keypoints, assigning the dominant orientation for each keypoint, and generating keypoint descriptors.

Step 1. Detecting local extrema. First, the scale space was created for each original image. An original image was convolved by a Gaussian filter with different scales to generate a series of Gaussian blurred images. A Gaussian blurred image $L (x, y, k σ)$ is represented as

L (x, y, k σ) = G (x, y, k σ) * I (x, y)

(1)

where $G (x, y, k σ)$ is a Gaussian filter at scale $k σ$, $I (x, y)$ is the original image, and * is the convolution operation.

Second, the Difference of Gaussians (DoG) images were generated by the subtraction of two successive Gaussian blurred images. A DoG image $D (x, y, σ)$ is indicated as

D (x, y, σ) = L (x, y, k σ) - L (x, y, σ)

(2)

Third, each pixel in the DoG image was compared with its neighbors in the current and adjacent scales. Pixels that were local extrema (maxima or minima) were treated as candidate keypoints.
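Equations (1) and (2) and the extremum test can be illustrated on a one-dimensional signal. The sketch below is a simplification under stated assumptions: SIFT operates on two-dimensional images and compares each pixel with neighbors across space and scale, whereas this example uses a 1-D signal, a scale ratio k = √2, border replication, and hypothetical function names.

```python
import math


def gaussian_kernel(sigma):
    """Sampled, normalized 1-D Gaussian with a radius of about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """L = G * I: convolve a 1-D signal with a Gaussian (borders replicated)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at borders
            acc += w * signal[j]
        out.append(acc)
    return out


def difference_of_gaussians(signal, sigma, k=math.sqrt(2)):
    """D(x, sigma) = L(x, k * sigma) - L(x, sigma), as in Equation (2)."""
    coarse = blur(signal, k * sigma)
    fine = blur(signal, sigma)
    return [c - f for c, f in zip(coarse, fine)]


def local_extrema(values):
    """Indices whose value is strictly above or strictly below both neighbors."""
    return [i for i in range(1, len(values) - 1)
            if (values[i] > values[i - 1] and values[i] > values[i + 1])
            or (values[i] < values[i - 1] and values[i] < values[i + 1])]


# A single bright sample produces a DoG extremum at its position.
impulse = [0.0] * 21
impulse[10] = 1.0
dog = difference_of_gaussians(impulse, 1.0)
candidates = local_extrema(dog)
```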

Step 2. Locating the keypoints. The position of each candidate keypoint was accurately determined by interpolation of nearby data. For stability, candidate keypoints with low contrast or poorly localized along an edge were discarded.

Step 3. Assigning dominant orientation. The dominant direction of each keypoint was specified by calculation of the magnitude and the orientation of sample points in a neighboring region around the keypoint.
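A rough sketch of the orientation assignment is given below. It is not the paper's implementation: the 36-bin histogram follows Lowe's general description of SIFT rather than anything stated here, the Gaussian weighting of votes is omitted, and the function name `dominant_orientation` and the toy patch are hypothetical.

```python
import math


def dominant_orientation(patch, bins=36):
    """Return the center angle (degrees) of the strongest orientation bin.

    patch is a 2-D list of gray values. Gradients use central differences on
    interior pixels, and each pixel votes with its gradient magnitude.
    """
    histogram = [0.0] * bins
    bin_width = 360.0 / bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            histogram[int(angle // bin_width) % bins] += magnitude
    strongest = max(range(bins), key=lambda b: histogram[b])
    return (strongest + 0.5) * bin_width


# Gray values increase from left to right, so the gradient points along +x.
ramp = [[10 * x for x in range(5)] for _ in range(5)]
angle = dominant_orientation(ramp)
```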

Step 4. Generating keypoint descriptors. The gradient orientations and the coordinates of the descriptor were rotated relative to the keypoint orientation, so that the descriptor is invariant to orientation. For robustness, a set of orientation histograms was generated by computing the magnitude and orientation values of sample points in a 16×16 region around the keypoint. In the standard SIFT layout, this region is divided into 4×4 subregions with an 8-bin orientation histogram each, and the concatenated histograms form a 128-dimensional descriptor vector that describes the patch around the keypoint.

Codebook construction and image representation

As shown in Figure 6, the descriptors extracted from each image in the training set were collected together. Each descriptor (vector) was considered as a point in a multi-dimensional space. A K-means clustering algorithm was applied to group similar descriptors into clusters. The final cluster centers were defined as codewords (analogous to words in a document), each of which was regarded as representative of several similar descriptors. The collection of codewords was called a “codebook.”¹⁵ Each descriptor in an image can therefore be mapped to its nearest codeword, and each image can be represented by a histogram of codewords. The histogram representation for a fiber image is

H = [t_{1}, t_{2}, \dots, t_{i}, \dots, t_{M}]

(3)

where M is the size of codebook and t_i is the number of occurrences of codeword i in the image. H is the histogram of codewords, which is a representation of an image in the standard BoW.
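The codebook construction and Equation (3) can be sketched with a plain Lloyd's K-means. This is a minimal illustration under assumptions: random initialization from the descriptors, a fixed number of iterations, two-dimensional toy descriptors instead of 128-dimensional SIFT vectors, and hypothetical function names throughout.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def build_codebook(descriptors, size, iterations=20, seed=0):
    """Lloyd's K-means; the final cluster centers are the codewords."""
    rng = random.Random(seed)
    codebook = [list(d) for d in rng.sample(descriptors, size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for d in descriptors:
            clusters[nearest_codeword(d, codebook)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster becomes empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def codeword_histogram(descriptors, codebook):
    """H = [t_1, ..., t_M]: occurrences of each codeword in one image."""
    histogram = [0] * len(codebook)
    for d in descriptors:
        histogram[nearest_codeword(d, codebook)] += 1
    return histogram


# Toy training descriptors forming two well-separated groups.
training = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
codebook = build_codebook(training, size=2)
# Descriptors of one hypothetical image: one near the first group, two near the second.
image_descriptors = [[0.2, 0.1], [10.5, 10.5], [9.8, 10.1]]
H = codeword_histogram(image_descriptors, codebook)
```

The histogram length equals the codebook size M, which is why the choice of M directly controls the dimensionality of the image representation.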

Figure 6.

Fiber image representation using BoW.

Classification

Support vector machine

In our study, SVM was chosen as the classifier to perform binary classification. SVM is one of the most popular supervised learning methods for classification and regression. It constructs a hyperplane as a decision boundary in a high-dimensional space to separate the classes. The optimal hyperplane has the largest distance to the nearest training data of each class and is obtained by training on labeled data. Training minimizes the error function

\frac{1}{2} w^{T} w + C \sum_{i = 1}^{N} ɛ_{i}

(4)

subject to the constraints

y_{i} (w^{T} ϕ (x_{i}) + b) \geq 1 - ɛ_{i} and ɛ_{i} \geq 0, i = 1, \dots, N

(5)

where w represents the vector of coefficients, C is the capacity constant, and b is a constant; together, w and b define the hyperplane $w^{T} ϕ (x) + b = 0$. The slack variables $ɛ_{i}$ handle non-separable data (inputs). The index i labels the N training cases. Note that $y_{i} = \pm 1$ represents the class labels and x_i represents the independent variables. The mapping ϕ transforms data from the input space to the feature space, and the kernel is the inner product of two mapped inputs. The kernel should be determined according to the specific task.
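To make Equations (4) and (5) concrete, the sketch below evaluates the objective for a given hyperplane. It assumes a linear feature map ϕ(x) = x and uses the fact that, for fixed w and b, the smallest slack satisfying the constraints is max(0, 1 − y_i(w·x_i + b)). The function name and the toy data are hypothetical, and this is not a solver.

```python
def soft_margin_objective(w, b, samples, labels, C):
    """Evaluate Equation (4) for a linear feature map phi(x) = x.

    Returns the objective value and the list of slack variables, where each
    slack is the smallest value allowed by the constraints in Equation (5).
    """
    slacks = []
    for x, y in zip(samples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks


# Two samples lie outside the margin; the third lies inside it.
samples = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
labels = [1, -1, 1]
objective, slacks = soft_margin_objective([1.0, 0.0], 0.0, samples, labels, C=10.0)
```

A larger C penalizes the slack terms more heavily, trading a wider margin for fewer margin violations on the training set.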

Spatial pyramid match kernel

The BoW model with SPM is an extension of the standard BoW. The SPM kernel can map the features of images to multi-resolution histograms and measures the similarity between images. In SPM, the feature vectors extracted from images can be regarded as points in a multi-dimensional feature space.

To represent multiple resolutions, as shown in Figure 7, an image was evenly divided into a sequence of grids at resolution levels 0, …, L. At the top in Figure 7, the image, which contains three feature types (codewords), is partitioned at three successive levels of resolution. At the bottom in Figure 7, for each level, the number of features falling in each grid cell is counted. Each spatial histogram is weighted according to Equation (6). The SPM kernel $K^{L}$ calculates the similarity between images by counting the matches of image features at each level of resolution. It is defined as

K^{L} (X, Y) = \sum_{l = 0}^{L} w^{l} N^{l} = \frac{1}{2^{L}} I^{0} + \sum_{l = 1}^{L} \frac{1}{2^{L - l + 1}} I^{l}

(6)

where $N^{l}$ denotes the number of new matches at the lth level of resolution and $I^{l}$ denotes the total number of matches at the lth level. Because the matches found at level l include all matches found at the finer level l+1, $N^{l} = I^{l} - I^{l + 1}$ for l < L and $N^{L} = I^{L}$. $w^{l}$, the weight of $N^{l}$, is set to $\frac{1}{2^{L - l}}$, which can be considered as the measure of difficulty of a match at level l. When the level of resolution is set to zero, the SPM approach is equal to the standard BoW approach.
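The two forms of Equation (6) can be checked numerically. The sketch below assumes, following the pyramid match formulation of Lazebnik et al., that the new matches at a level are the matches at that level minus those already found at the next finer level. Function names and the example counts are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Number of matches between two histograms: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def pyramid_match(intersections):
    """Equation (6) as a weighted sum of new matches.

    intersections[l] is I^l, with index 0 the coarsest level. New matches are
    N^L = I^L at the finest level and N^l = I^l - I^(l+1) at coarser levels.
    """
    L = len(intersections) - 1
    kernel = 0.0
    for level, matches in enumerate(intersections):
        finer = intersections[level + 1] if level < L else 0
        new_matches = matches - finer
        kernel += new_matches / 2 ** (L - level)
    return kernel


def pyramid_match_closed_form(intersections):
    """Right-hand side of Equation (6), written directly in terms of I^l."""
    L = len(intersections) - 1
    kernel = intersections[0] / 2 ** L
    for level in range(1, L + 1):
        kernel += intersections[level] / 2 ** (L - level + 1)
    return kernel


# Hypothetical match counts I^0, I^1, I^2 for a pyramid with L = 2.
counts = [6, 3, 1]
k_new = pyramid_match(counts)
k_closed = pyramid_match_closed_form(counts)
# With a single level (L = 0) the kernel is the plain histogram intersection.
k_bow = pyramid_match([histogram_intersection([3, 0, 2], [1, 4, 2])])
```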

Figure 7.

An example of a three-level spatial pyramid.

Figure 8 is an example of matching for one feature type between two images. As shown in Figure 8, the three rows correspond to three successive levels of resolution l+2, l+1 and l, respectively. In Figure 8(a), two images are represented as two feature (points) sets, X and Y. Each grid corresponds to a bin at this level. The green bold lines indicate the new matches formed at this level. The black bold lines indicate a match that already occurred at a finer level. In Figure 8(b), the histograms of X (blue) and Y (red) show the number of features in a certain grid at this level. In Figure 8(c), the intersections between the histograms in Figure 8(b) are shown. For example, the new matches found at three levels of resolution are 2, 2, and 1, respectively.

Figure 8.

An example of matching at three levels of resolution. (a) Points sets, (b) Histograms pyramids and (c) Intersection.
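As a worked instance of Equation (6), suppose the three rows of Figure 8 correspond to levels 2, 1, and 0 of a pyramid with L = 2. This mapping is an assumption, since the text labels the rows only as l+2, l+1, and l. With 2, 2, and 1 new matches at these levels, and with each coarser level inheriting the matches of the finer one, both forms of the kernel give the same value:

```latex
\begin{aligned}
N^{2} &= 2, \quad N^{1} = 2, \quad N^{0} = 1 \\
I^{2} &= 2, \quad I^{1} = N^{1} + I^{2} = 4, \quad I^{0} = N^{0} + I^{1} = 5 \\
K^{2} &= 1 \cdot N^{2} + \frac{1}{2} N^{1} + \frac{1}{4} N^{0} = 2 + 1 + 0.25 = 3.25 \\
K^{2} &= \frac{1}{4} I^{0} + \frac{1}{4} I^{1} + \frac{1}{2} I^{2} = 1.25 + 1 + 1 = 3.25
\end{aligned}
```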

For simplicity, Figure 8 only considers an example of a single feature type. Usually, an image contains various feature types. For M feature types, $K^{L}$ was implemented as a histogram intersection of “long” vectors formed by concatenating the appropriately weighted histograms of each feature type at all levels. This is possible because $K^{L}$ is a weighted sum of histogram intersections and $c \min (a, b) = \min (ca, cb)$ for positive numbers. The final histogram vector has dimensionality $M \sum_{l = 0}^{L} 4^{l} = M \frac{1}{3} (4^{L + 1} - 1)$, where M is equal to the size of the codebook, so the cost of this implementation grows with the dimensionality of the vectors. It is worth noting that large L and M may lead to the curse of dimensionality. For example, with $L = 0$ and $M = 600$, the histogram vector represented in SPM has dimension 600; with $L = 4$ and $M = 600$, the dimensionality of the histogram vector reaches 204600. The dimensionality of the feature vectors thus increases sharply with codebook size and resolution level. The choice of M and L is investigated in the first experiment in the next section.
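The dimensionality formula can be verified with a few lines of code. This is a small sketch; the function name `spm_vector_length` is hypothetical.

```python
def spm_vector_length(codebook_size, levels):
    """Length of the concatenated SPM histogram: M * sum_{l=0}^{L} 4^l."""
    cells = sum(4 ** level for level in range(levels + 1))
    return codebook_size * cells


# Dimensionality for a codebook of 600 codewords at levels 0 to 4.
lengths = {level: spm_vector_length(600, level) for level in range(5)}
```

At level 2 with 600 codewords, the setting used in the later experiments, the vector has 600 × 21 = 12600 entries.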

Experimental results and discussion

Experimental setup

A sample set with 1458 fiber images (cashmere and wool) was prepared. We performed three experiments to evaluate our approach. In the first experiment, we investigated the choice of the size of codebook and the level of resolution. In the second experiment, we compared the performance of the original images and images segmented from the background. In the third experiment, the fibers (cashmere and wool) were selected to form several datasets with different blend ratios. We evaluated the stability of SPM on these datasets. All the experiments were run on an Intel(R) Xeon(R) E5-2620 v3 CPU@2.0GHz machine with 24GB memory.

For each dataset, we randomly selected 70% of the fiber images of the dataset as a training set and the remaining 30% as a testing set. The identification accuracy of cashmere and wool was defined as

A_{c} = \frac{R_{c}}{T_{c}} \times 100\%

(7)

A_{w} = \frac{R_{w}}{T_{w}} \times 100\%

(8)

A_{t} = \frac{R_{c} + R_{w}}{T_{c} + T_{w}} \times 100\%

(9)

where A_c and A_w are the identification accuracy of cashmere and wool fiber, respectively. A_t refers to the overall accuracy over cashmere and wool together. T_c denotes the total number of cashmere fibers in the dataset. T_w denotes the total number of wool fibers in the dataset. R_c represents the number of cashmere fibers that were recognized correctly. R_w represents the number of wool fibers that were recognized correctly.
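Equations (7) to (9) translate directly into code. The sketch below assumes string labels "cashmere" and "wool" and uses a hypothetical function name and made-up toy predictions; it is not tied to the paper's data.

```python
def identification_accuracy(true_labels, predicted_labels):
    """Return (A_c, A_w, A_t) in percent, following Equations (7) to (9)."""
    totals = {"cashmere": 0, "wool": 0}
    correct = {"cashmere": 0, "wool": 0}
    for truth, predicted in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    a_c = 100.0 * correct["cashmere"] / totals["cashmere"]
    a_w = 100.0 * correct["wool"] / totals["wool"]
    a_t = 100.0 * sum(correct.values()) / sum(totals.values())
    return a_c, a_w, a_t


# Toy labels: 4 cashmere fibers (3 recognized) and 6 wool fibers (all recognized).
truth = ["cashmere"] * 4 + ["wool"] * 6
predicted = ["cashmere"] * 3 + ["wool"] + ["wool"] * 6
a_c, a_w, a_t = identification_accuracy(truth, predicted)
```

Note that A_t weights each class by its number of fibers, so for unbalanced blends it differs from the simple mean of A_c and A_w (here 90.0 versus 87.5).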

Experimental study

The choice of the size of codebook and level of resolution involves the trade-off between generalizability and discriminability.²³ In this experiment, we investigated the performance of SPM at various sizes of codebook and different levels of resolution. A total of 1458 fiber (737 cashmere and 721 wool) images were chosen as the dataset. Twelve sizes of codebook were selected: 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, and 1200. Five successive levels of resolution were set to 0, 1, 2, 3, and 4. In fact, SPM is equal to the standard BoW model when the level of resolution is set to zero.¹⁸

As shown in Figure 9, five curves represent the performance of SPM at different levels of resolution. The size of codebook and the level of resolution had an obvious impact on the accuracy of classification, and the curves showed a similar trend. The identification accuracy at each level was low when the size of codebook was small. This indicated that a small codebook could not provide adequate discrimination between the two kinds of fiber images. With an increase in the size of codebook, the performance started to rise. When the size was set in the range (500, 800), SPM at each level of resolution achieved a higher identification accuracy. When the size of codebook exceeded 800, the accuracy did not increase significantly or even began to fall (levels 0, 1, and 2). Two reasons are responsible for the decline. The first is that fiber images contain only a limited number of features. When excessive codewords were chosen in the training stage, similar features were assigned to different codewords. This led to weak discrimination for fiber images in the testing set. The second reason is that too many codewords generated a model over-sensitive to disturbances such as blurring and noise, which are intrinsic to fiber micrographs. These two reasons together led to poorer generalization to the testing set when the size of codebook was too large.

Figure 9.

Accuracy of SPM at different scale levels.

It is noticed that the curves at levels 3 and 4 did not decline when the size of codebook exceeded 800. A possible reason is that, although an excessively high level of resolution reduces the generalization ability of the model, SPM at a higher level combines multiple levels of resolution, which keeps the model stable.¹⁸ Csurka et al. found that high performance was achieved at an intermediate size of codebook, rather than a large size of codebook.¹⁴ This coincided with our observation that SPM with level 2 achieved the best performance for codebooks with a size over 50. Considering the identification accuracy and the dimensions of features, an SPM model with a resolution level of 2 and a codebook size of 600 was chosen in subsequent experiments.

We further investigated the impact of the size of codebook by comparing the error rates of the training set and testing set. In this experiment, the dataset still contained 1458 fiber (737 cashmere and 721 wool) images. Fourteen sizes of codebook were selected: 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1200, 1400, and 2000. Figure 10 exhibits the error rate of identification at resolution level 2 for the training set and testing set under various sizes of codebook. The identification error for the training set decreases with increasing size of codebook. For the testing set, the lowest error was achieved when the size was 600. The error began to go up when the size of codebook was greater than 800. To avoid over-fitting, as shown in Figure 10, choosing an appropriate size of codebook is very necessary.

In order to evaluate the effect of image segmentation on performance, we compared the identification accuracies of treated and untreated images. Each dataset contained 737 cashmere and 721 wool fibers. The dataset with untreated images was called dataset A, and the dataset containing images after image segmentation was called dataset B. According to the foregoing discussion, the resolution level was set to 2. Figure 11 exhibits the performance of SPM under various sizes of codebook for these two datasets. The identification accuracies for dataset B were higher than those for dataset A. This indicated that the proposed image segmentation made a positive contribution to the identification accuracy. Although the highest accuracies under the two datasets were almost the same, the dimensionality of feature vectors was much lower when the size of codebook was 600. This means the cost of computation was reduced significantly.

Figure 10.

Accuracy of the training set and testing set.

Figure 11.

The performance of SPM in original images and segmented images.

To evaluate the stability of SPM, 13 groups of samples were prepared with different blend ratios, as shown in Table 1. In Table 1, the second column shows the number of fibers, instead of the weight of fibers, in the blend. According to the foregoing discussion, we chose 600 as the size of codebook and 2 as the level of resolution in this experiment. The identification accuracy of each dataset exceeded 90% and the fluctuation was around 3%, which means the performance of SPM was relatively stable under various blend ratios.

Table 1.

The performance of spatial pyramid match under different blend ratios

Group no.	Blend (cashmere/wool)	Blend ratio (%) (cashmere/wool)	A_t (%)
1	737/721	50.5/49.5	92.4 ± 2.1
2	737/621	54.3/45.7	92.6 ± 1.5
3	737/521	58.6/41.4	92.8 ± 2.4
4	737/421	63.6/36.4	93.9 ± 3.3
5	737/321	69.7/30.3	94.0 ± 2.5
6	737/221	76.9/23.1	94.4 ± 1.4
7	737/121	85.9/14.1	95.4 ± 1.3
8	637/721	46.9/53.1	92.6 ± 2.2
9	537/721	42.7/57.3	91.5 ± 3.8
10	437/721	37.7/62.3	92.3 ± 2.5
11	337/721	31.9/68.1	92.1 ± 3.5
12	237/721	24.7/75.3	92.0 ± 2.4
13	137/721	16.0/84.0	93.4 ± 1.6
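The blend ratios in the third column of Table 1 follow from the fiber counts in the second column. The short sketch below reproduces that arithmetic; the function name `blend_ratio` is hypothetical.

```python
def blend_ratio(cashmere_count, wool_count):
    """Percentages of cashmere and wool by fiber count, rounded to one decimal."""
    total = cashmere_count + wool_count
    return (round(100.0 * cashmere_count / total, 1),
            round(100.0 * wool_count / total, 1))


# Fiber counts of groups 1, 7, and 13 in Table 1.
group_1 = blend_ratio(737, 721)
group_7 = blend_ratio(737, 121)
group_13 = blend_ratio(137, 721)
```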

Conclusion

This paper presented an evaluation of the BoW with SPM model to identify the microscopic images of cashmere and wool fibers. This is the first systematic investigation of this representation scheme in the field of fiber identification. Several experiments were performed to evaluate the performance of the model. First, we compared the identification accuracy of the model at various sizes of codebook and different levels of resolution. Considering discrimination and generalization, a codebook size of 600 and a resolution level of 2 were recommended for the model. Second, by comparing the processed images and the original images, it was found that the former could retain the effective information contained in the images and lower the cost of computation. Finally, experiments were performed on datasets with different blend ratios. The results indicated that the identification accuracy of the model remained stable across blend ratios.

Footnotes

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by National Natural Science Foundation of China (Grant No. 61572124), and the Fundamental Research Funds for the Central Universities (Grant No. CUSF-DH-D-2016016).

References

1. Zoccola, Mossotti, et al. Identification of wool, cashmere, yak, and angora rabbit fibers and quantitative determination of wool and cashmere in blend: a near infrared spectroscopy study. Fiber Polym 2013; 14: 1283–1289.

2. Zhong Y, Lu K, Tian J, et al. Wool/cashmere identification based on projection curves. Text Res J 2017; 87: 1730–1741. DOI: 10.1177/0040517516658516.

3. Tridico. Natural animal textile fibers: structure, characteristics and identification. In: Houck (ed). Identification of Textile Fibers. Cambridge, England: Woodhead Publishing Ltd, 2009, pp. 27–87.

4. Vineis, Tonetti, Paolella, et al. A UPLC/ESI–MS method for identifying wool, cashmere and yak fibres. Text Res J 2014; 84: 953–958.

5. Tang, Zhang, Zhou, et al. A real-time PCR method for quantifying mixed cashmere and wool based on hair mitochondrial DNA. Text Res J 2014; 84: 1612–1621.

6. Kirsten, Gabriel, Lothar, et al. Development of a DNA-analytical method for the identification of animal hair fibers in textiles. Text Res J 2009; 79: 69–75.

7. Tonetti, Vineis, Aluigi, et al. Immunological method for the identification of animal hair fibres. Text Res J 2012; 82: 766–772.

8. She, Kong, Nahavandi, et al. Intelligent animal fiber classification with artificial neural networks. Text Res J 2002; 72: 594–600.

9. Liu. A research on cashmere automatic identification method based on statistical analysis. Wool Text J 2014; 42: 62–64.

10. Shi X and Yu W. A new classification method for animal fibers. In: 2008 International Conference on Audio, Language and Image Processing, Shanghai, China, 7–9 July 2008, pp. 206–210. New York: IEEE.

11. Yuan, Meng MQH. Improved bag of feature for automatic polyp detection in wireless capsule endoscopy images. IEEE Trans Autom Sci Eng 2016; 13: 529–535.

12. Zhu, Zhong, Zhao, et al. Bag-of-visual-words scene classifier with local and global features for high spatial resolution remote sensing imagery. IEEE Geosci Remote Sens Lett 2016; 13: 747–751.

13. Yang Y and Newsam S. Bag-of-visual-words and spatial extensions for land-use classification. In: Proceedings of the 18th Sigspatial International Conference on Advances in Geographic Information Systems (ed AE Abbadi), San Jose, CA, USA, 2–5 November 2010, pp. 270–279. New York: ACM.

14. Csurka G, Dance CR, Fan L, et al. Visual categorization with bags of keypoints. In: Workshop on Statistical Learning in Computer Vision Eccv 2004 (ed D Comaniciu), Prague, Czech Republic, 11–14 May 2004, pp. 1–22. Berlin: Springer.

15. Li FF and Perona P. A Bayesian hierarchical model for learning natural scene categories. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2005 (ed C Schmid), San Diego, CA, USA, 20–25 June 2005, pp. 524–531. Los Alamitos: IEEE Computer Society.

16. Caicedo JC, Cruz A, and Gonzalez FA. Histopathology image classification using bag of features and kernel functions. In: 12th Conference on Artificial Intelligence in Medicine (ed C Combic, Y Shahar, A Abu-Hanna), Verona, Italy, 18–22 July 2009, pp. 126–135. Berlin: Springer.

17. Grauman K and Darrell T. The pyramid match kernel: discriminative classification with sets of image features. In: IEEE International Conference on Computer Vision 2005, Beijing, China, 17–20 October 2005, pp. 1458–1465. Los Alamitos: IEEE Computer Society.

18. Lazebnik S, Schmid C, and Ponce J. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2006, New York, USA, 17–22 June 2006, pp. 2169–2178. Los Alamitos: IEEE Computer Society.

19. Yang J, Yu K, Gong Y, et al. Linear spatial pyramid matching using sparse coding for image classification. In: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009, pp. 1794–1801. Los Alamitos: IEEE Computer Society.

20. Otsu. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern Soc 1979; 9: 62–66.

21. Lowe. Distinctive image features from scale-invariant keypoints. Int J Comput Vision 2004; 60: 91–110.

22. Mikolajczyk, Schmid. A performance evaluation of local descriptors. IEEE Trans Pattern Anal Mach Intell 2005; 27: 1615–1630.

23. Yang J, Jiang YG, Hauptmann AG, et al. Evaluating bag-of-visual-words representations in scene classification. In: 9th ACM Sigmm International Workshop on Multimedia Information Retrieval, Mir 2007, Augsburg, Bavaria, Germany, 28–29 September 2007, pp. 197–206. New York: ACM.