Automated paper impurities evaluation using feature representations based on ADMM sparse codes

Abstract

To automatically detect and characterize paper impurities with computer vision, we present a novel two-part evaluation procedure whose feature representations are built from Alternating Direction Method of Multipliers (ADMM) sparse codes. The method consists of an offline training step, which obtains sparse coefficients and codebooks by learning from extracted features with ADMM optimization, followed by an online detection step, which uses a linear SVM classifier to separate defective paper samples from non-defective ones. Our approach bridges the gap between paper impurities evaluation and sparse feature representations, taking advantage of existing ADMM algorithms to handle the sparse coding problem. We compare different feature descriptors and sparse coding methods for implementing the procedure and validate it experimentally on a dataset of 11 paper classes. Experimental results show that the proposed method is competitive and effective in terms of evaluation accuracy and speed.

Keywords

Paper impurities evaluation, feature representation, sparse code, ADMM

1 Introduction

Quality control is an important part of modern industry, and paper manufacturing is no exception. During paper production, the paper may pick up various types of dirt, depending on temperature, humidity and other environmental conditions. In most cases, the resulting impurities and surface defects must be dealt with immediately to control the quality of the paper. The increasing attention to environment-friendly production policies and the rise in the production of recycled paper have made the need for paper quality control more and more compelling [1]. Meanwhile, detecting impure particles can help to locate the source of impurities in the manufacturing process, which can then be eliminated. This might reduce chemical usage in the bleaching process and have beneficial effects on the environment. Hence, paper manufacturers are increasingly concerned with developing a reliable and quick system to detect such impurities and defects automatically.

With the rapid development of image recognition technologies and hardware, a number of defect detection approaches based on computer vision have been proposed in recent years. These defect detection applications now cover a wide range of industrial products, such as fiber [2], textile [3], cold-formed micro-parts [4, 5], metal surfaces [6] and natural stone [7]. However, the application of computer vision is still not common in the papermaking industry. In many cases paper impurities evaluation is still conducted by humans, which offers high detection accuracy but is limited in inspection frequency, stability and cost. As a result, automatic computer vision detection of paper impurities has broad application prospects.

In the field of paper impurities evaluation, Bianconi et al. [1] presented a two-step approach based on machine vision to detect impurities. Their algorithm consists of an early classification step that distinguishes defective paper parts from non-defective parts, followed by a thresholding step that separates the impurity from the background. Torniainen et al. [8] described equipment to automatically count dirt specks on dry and wet pulp sheets with transmitted light. The accuracy of their method ranged from 75% to 90%. Similarly, Duarte et al. [9] introduced an automatic visual inspection system aimed at dirt inspection in pulp and paper manufacturing, which used a new hierarchical region-oriented segmentation algorithm. Campoy et al. [10] proposed a machine vision system developed for on-line visual paper inspection under the critical lighting requirements of the UNE-ISO 5350-2 standard. These earlier studies make a substantial contribution to the hardware design of paper impurities inspection; however, the recognition methods they use are too simple to ensure high detection accuracy. To address this problem, we propose a new approach for assessing paper impurities.

In this paper, we propose a novel feature representation approach based on ADMM sparse codes for paper impurities evaluation. The method is divided into two parts: offline training and online detection. In offline training, features are first extracted from the labelled image dataset, and then sparse coefficients and codebooks are computed by learning from the extracted features with ADMM optimization. In online detection, the learned linear SVM classifier uses the extracted sparse features to predict whether a paper image belongs to the defect or non-defect class. Our approach bridges the gap between paper impurities evaluation and sparse feature representations, taking advantage of existing ADMM algorithms to handle the sparse coding problem. Experimental results show that the proposed method is competitive and effective in terms of evaluation accuracy and computation speed.

The rest of the paper is organized as follows. Section 2 introduces the framework of the proposed method and explains how linear classification is performed with sparse coding feature representations. Section 3 shows how the codebook and coefficients are solved with ADMM and gives the details of the optimization process. Section 4 reports experimental results and illustrates the performance of the presented method. Finally, Section 5 concludes the paper.

2 Problem description

2.1 Framework of proposed method

The scheme of the proposed paper impurities evaluation method is shown in Fig. 1; it is composed of two major parts. In the offline training part, the method trains on datasets of labelled samples that contain both defect and non-defect paper images. First, features are extracted from the labelled dataset. Then a spatial-pyramid image representation is computed for each image from the sparse codes of the extracted features, as discussed in Section 2.3. Furthermore, our approach uses max spatial pooling, which is more robust to local spatial translations and more biologically plausible [11]. Finally, a codebook is obtained through optimization and used for classification in online detection.

Fig.1

Paper impurities evaluation framework structure.

In the online detection part, we first extract sparse features from the images to be evaluated. The sparse features capture the salient properties of visual patterns and turn out to work surprisingly well with linear classifiers [12]. Next, a linear classifier model is learned from all training examples. Finally, the learned linear classifier uses the extracted sparse features to predict whether an image belongs to the defect or non-defect class.

2.2 Feature extraction

The set of features extracted from the original image is important for the final classification results. Here, we introduce the following four image descriptors, which are rotation invariant, since in principle defects can occur at any orientation.

The first feature is the Histogram of Oriented Gradients (HOG) descriptor. The main idea behind HOG, introduced in [13, 14], is that the local appearance and shape of an object in an image can be represented by the distribution of intensity gradients or edge directions. To compute HOG descriptors, the image is first divided into small connected regions. For each region, a histogram of gradient directions or edge orientations is then computed over all of its pixels. The combination of these histograms is the descriptor. HOG maintains invariance to geometric and photometric transformations.
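As an illustration of the cell-histogram construction described above, the following is a minimal NumPy sketch and not the implementation used in the experiments. The function names, the 8-pixel cell size, the 9 unsigned orientation bins and the per-cell L2 normalisation are assumptions chosen for illustration; the overlapping block normalisation of [13] is omitted.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations in one cell."""
    cell = np.asarray(cell, dtype=float)
    gy, gx = np.gradient(cell)                      # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180]
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude)
    return hist

def hog_descriptor(image, cell_size=8, n_bins=9):
    """Concatenate L2-normalised cell histograms over a non-overlapping grid."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    hists = []
    for r in range(0, rows - cell_size + 1, cell_size):
        for c in range(0, cols - cell_size + 1, cell_size):
            h = cell_orientation_histogram(image[r:r + cell_size, c:c + cell_size], n_bins)
            hists.append(h / (np.linalg.norm(h) + 1e-12))  # guard against flat cells
    return np.concatenate(hists)
```

For a 128 × 128 image with these assumed settings the descriptor has 16 × 16 × 9 = 2304 entries; a horizontal intensity ramp puts all of its gradient energy into the first orientation bin of every cell.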

The Scale Invariant Feature Transform (SIFT) descriptor is also adopted in this paper. SIFT transforms an image into a large collection of local feature vectors, each of which is invariant to image translation, scaling and rotation [15]. SIFT features are commonly extracted in four steps. The first step computes the positions of potential interest points by detecting the minima and maxima of a set of Difference of Gaussian filters applied at different scales over the image. Next, these positions are refined by discarding points with low contrast. Then an orientation is assigned to each key point based on local image features. Finally, a local feature descriptor is computed at each key point. Every feature is a 128-dimensional vector that distinctively identifies the neighborhood around the key point.

We also compute SIFT and HOG features after Gabor filtering. Gabor filters measure the response of an input image at different orientations and frequencies, and can be regarded as two-dimensional sinusoids modulated by a Gaussian envelope [16]. Feature extraction with Gabor filters requires the filter bank to be designed for the specific application domain. After Gabor filtering and normalization, SIFT and HOG features are calculated to represent an image.
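To make the "sinusoid modulated by a Gaussian envelope" description concrete, the sketch below builds the real part of a Gabor kernel and a small filter bank with NumPy. All parameter names and default values (kernel size, wavelengths, number of orientations, the choice sigma = 0.5 × wavelength) are illustrative assumptions; the paper does not state the filter bank that was actually used.

```python
import numpy as np

def gabor_kernel(size=15, wavelength=6.0, theta=0.0, sigma=3.0, gamma=1.0):
    """Real part of a 2-D Gabor filter: cosine carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so that the carrier oscillates along direction theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength)
    return envelope * carrier

def gabor_bank(n_orientations=4, wavelengths=(4.0, 8.0), size=15):
    """One kernel per (wavelength, orientation) pair; sigma tied to the wavelength."""
    return [gabor_kernel(size, w, k * np.pi / n_orientations, sigma=0.5 * w)
            for w in wavelengths for k in range(n_orientations)]
```

Each kernel in the bank would be convolved with the input image, and SIFT or HOG descriptors would then be computed on the normalised responses.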

2.3 Linear SVM classifier using sparse coding

In this part, we concentrate on how to sparse code the extracted features and use a linear classifier to distinguish defect from non-defect papers. Let X be a set of d-dimensional feature descriptors extracted from an image, i.e. $X = [x_{1}, x_{2}, \dots, x_{n}] \in ℝ^{d \times n}$ . Let $B = [b_{1}, b_{2}, \dots, b_{k}] \in ℝ^{d \times k}$ be a codebook with k entries. The problem is to convert each feature descriptor into a k-dimensional code and finally generate the image representation.

First, we present the sparse coding problem formulation. Assume $x_{i} \in ℝ^{d}$ , $B \in ℝ^{d \times k}$ , and $c_{i} \in ℝ^{k}$ , where $c_{i}$ is the sparse vector of coefficients and i refers to the i-th training example. The optimization problem is: $\min_{B, c_{i} \forall i} \frac{1}{n} \sum_{i = 1}^{n} (\frac{1}{2} {∥ x_{i} - B c_{i} ∥}_{2}^{2} + λ {∥ c_{i} ∥}_{1})$ (1) $\text{s.t. } {∥ B_{j} ∥}_{2}^{2} \leq 1, j = 1, 2, \dots, k,$ where $B_{j}$ denotes the j-th column of B.

Then we can define the objective function F as given in Equation (2):

$\begin{matrix} \frac{1}{n} \sum_{i = 1}^{n} F (x_{i}, c_{i}, B) & = & \frac{1}{n} \sum_{i = 1}^{n} (\frac{1}{2} {∥ x_{i} - B c_{i} ∥}_{2}^{2} + λ {∥ c_{i} ∥}_{1}) \\ & = & \frac{1}{n} \sum_{i = 1}^{n} (\frac{1}{2} {∥ x_{i} - \sum_{j = 1}^{k} c_{i, j} B_{j} ∥}_{2}^{2} + λ {∥ c_{i} ∥}_{1}) \end{matrix}$ (2) where $c_{i, j}$ refers to the j-th entry of $c_{i}$.

The objective function is non-convex when the problem is solved jointly for B and $c_{i}$. However, if we solve only for the coefficients $c_{i}$ or only for the bases B while keeping the other fixed, we obtain a convex problem with convex constraints. Several general-purpose convex programming methods can be used here, but they are often slow. Instead, we use the Alternating Direction Method of Multipliers (ADMM) to solve the sparse coding problem; the details are given in the next section.
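The objective of Equations (1) and (2) can be evaluated directly, which is useful for monitoring an alternating solver. The following NumPy sketch stores the descriptors as the columns of X and the codes as the columns of C; the function names are hypothetical.

```python
import numpy as np

def sparse_coding_objective(X, B, C, lam):
    """Mean over i of 0.5 * ||x_i - B c_i||_2^2 + lam * ||c_i||_1.

    X: (d, n) descriptors, B: (d, k) codebook, C: (k, n) codes.
    """
    residual = X - B @ C
    data_term = 0.5 * np.sum(residual ** 2, axis=0)   # one value per descriptor
    l1_term = lam * np.sum(np.abs(C), axis=0)
    return float(np.mean(data_term + l1_term))

def satisfies_norm_constraint(B, tol=1e-9):
    """Check the constraint ||B_j||_2^2 <= 1 for every column of B."""
    return bool(np.all(np.sum(B ** 2, axis=0) <= 1.0 + tol))
```

With B fixed the objective is convex in C, and with C fixed it is convex in B, which is the property exploited by the alternating scheme of Section 3.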

In the next step, after obtaining the sparse codes of the image, we need to pool these features across different spatial locations and spatial scales. For any image represented by a set of descriptors, we can compute a single feature vector based on some statistics of the descriptors’ codes [17]. Let z denote the histogram representation for image I . Here, we apply the max pooling function to the sparse codes: $z_{j} = \max \{ c_{1, j}, c_{2, j}, \dots, c_{n, j} \}$ (3) where $c_{i, j}$ is the j-th entry of the code $c_{i}$ computed by solving Equation (2), $z_{j}$ is the j-th element of z , and n is the number of local descriptors in the region. Max pooling is well supported by biophysical evidence from the visual cortex [18] and has been justified empirically by many image classification algorithms.
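Equation (3) and its use in a spatial pyramid can be sketched as follows. The pyramid levels (1 × 1, 2 × 2, 4 × 4 grids), the convention that descriptor positions are (row, column) pixel coordinates, and the zero vector returned for empty cells are assumptions made for this illustration.

```python
import numpy as np

def max_pool(C):
    """Equation (3): C has shape (k, n), one code per column; returns z of shape (k,)."""
    return np.max(C, axis=1)

def spatial_pyramid_max_pool(C, positions, image_size, levels=(1, 2, 4)):
    """Max-pool the codes inside each cell of each pyramid level and concatenate."""
    k = C.shape[0]
    positions = np.asarray(positions, dtype=float)
    pooled = []
    for g in levels:
        # Grid cell index (row, column) of every descriptor at this level.
        cell = np.minimum((positions / image_size * g).astype(int), g - 1)
        for r in range(g):
            for c in range(g):
                mask = (cell[:, 0] == r) & (cell[:, 1] == c)
                pooled.append(C[:, mask].max(axis=1) if mask.any() else np.zeros(k))
    return np.concatenate(pooled)
```

With a codebook of k entries and the three assumed levels, the pooled vector has k × (1 + 4 + 16) = 21k entries.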

Finally, we use a linear SVM for image classification. The SVM aims to learn a decision function: $f (z) = \sum_{i = 1}^{n} α_{i} κ (z, z_{i}) + b$ (4) where κ is the kernel function; for a linear SVM, $κ (z, z_{i}) = z^{T} z_{i}$.

Here ${(z_{i}, y_{i})}_{i = 1}^{n}$ is the training set, and $y_{i} \in \{+1, -1\}$ indicates the impurity label of each image. A test paper image represented by z is classified as defect if $f (z) > 0$, and as non-defect otherwise.
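With a linear kernel, the decision function of Equation (4) reduces to $f (z) = w^{T} z + b$ with $w = \sum_{i} α_{i} z_{i}$. The sketch below trains such a classifier by batch subgradient descent on the regularised hinge loss. This solver is a simple stand-in chosen to keep the example self-contained and is not necessarily the SVM solver used in the experiments; the function names and hyperparameters are hypothetical.

```python
import numpy as np

def train_linear_svm(Z, y, reg=0.01, lr=0.1, epochs=200):
    """Z: (n, k) pooled features, y: labels in {+1, -1}. Returns (w, b)."""
    n, k = Z.shape
    w, b = np.zeros(k), 0.0
    for _ in range(epochs):
        margins = y * (Z @ w + b)
        active = margins < 1.0                      # samples violating the margin
        # Subgradient of reg/2 * ||w||^2 + mean hinge loss.
        grad_w = reg * w - (y[active][:, None] * Z[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def classify(z, w, b):
    """Decision rule of the text: defect if f(z) > 0, non-defect otherwise."""
    return "defect" if z @ w + b > 0 else "non-defect"
```

In the proposed pipeline, each row of Z would be the pooled sparse-code vector z of one training image.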

3 Optimization with ADMM

Looking at Equation (1), we see that the objective function is not convex when it is solved for B and c together. However, if we solve only for the codebook B or only for the coefficients c while holding the other fixed, we obtain a convex problem with convex constraints. Hence, the problem can be solved by alternating minimization of the objective function in two steps: minimization with respect to the coefficients c and minimization with respect to the codebook B .

3.1 Solving for the coefficients

As mentioned above, general-purpose convex optimization is slow, so we turn to the ADMM algorithm to solve the problem more efficiently. We separate the variable of the data term from that of the regularization term by introducing an auxiliary variable ${\hat{c}}_{i}$ with $c_{i} = {\hat{c}}_{i}$ , and write the convex optimization problem in the following form: $\min_{c_{i}, {\hat{c}}_{i}} \frac{1}{n} \sum_{i = 1}^{n} (\frac{1}{2} {∥ x_{i} - B c_{i} ∥}_{2}^{2} + λ {∥ {\hat{c}}_{i} ∥}_{1})$ (5) $\text{s.t. } c_{i} = {\hat{c}}_{i}, i = 1, \dots, n .$

Then, the augmented Lagrangian function is given:

$\begin{matrix} L (c_{i}, {\hat{c}}_{i}, y_{i}) \\ = \frac{1}{n} \sum_{i = 1}^{n} (\frac{1}{2} {∥ x_{i} - B c_{i} ∥}_{2}^{2} + λ {∥ {\hat{c}}_{i} ∥}_{1} + y_{i}^{T} (c_{i} - {\hat{c}}_{i}) \\ + \frac{ρ}{2} {∥ c_{i} - {\hat{c}}_{i} ∥}_{2}^{2}) \end{matrix}$ (6)

Hence, through the variable splitting, the data term ${∥ x_{i} - B c_{i} ∥}_{2}^{2}$ and the regularizer $λ {∥ {\hat{c}}_{i} ∥}_{1}$ can be minimized separately. Following ADMM, we update $c_{i}$, ${\hat{c}}_{i}$ and the multiplier $y_{i}$ alternately as: $\begin{matrix} c_{i}^{k + 1} \leftarrow {(B^{T} B + ρ I)}^{- 1} (B^{T} x_{i} + ρ {\hat{c}}_{i}^{k} - y_{i}^{k}) \\ {\hat{c}}_{i}^{k + 1} \leftarrow \max (c_{i}^{k + 1} + \frac{1}{ρ} y_{i}^{k} - \frac{λ}{ρ}, 0) \\ - \max (- c_{i}^{k + 1} - \frac{1}{ρ} y_{i}^{k} - \frac{λ}{ρ}, 0) \\ y_{i}^{k + 1} \leftarrow y_{i}^{k} + ρ (c_{i}^{k + 1} - {\hat{c}}_{i}^{k + 1}) \end{matrix}$ where the max is taken element-wise, so that the second update is a soft-thresholding step with threshold λ/ρ.

If the convolution is assumed to be periodic, so that $B^{T}$ is a block-circulant matrix with circulant blocks, then $c_{i}^{k + 1}$ can be computed quickly with the Fast Fourier Transform (FFT). Using the FFT to solve the relevant linear system has been shown to give substantially better asymptotic performance than the original spatial-domain method, with evidence that the resulting boundary effects are not significant [19, 20]. The updates of ${\hat{c}}_{i}^{k + 1}$ and $y_{i}^{k + 1}$ are one-dimensional (element-wise) operations; they can be numerically pre-computed as in [21].
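The three updates of this subsection can be sketched in NumPy for a single descriptor. For simplicity the sketch solves the linear system in the $c_{i}$ update with a dense Cholesky factorisation computed once, rather than with the FFT-based solve mentioned above; the function names, default parameters and fixed iteration count are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise max(v - t, 0) - max(-v - t, 0)."""
    return np.maximum(v - t, 0.0) - np.maximum(-v - t, 0.0)

def admm_sparse_code(x, B, lam=0.1, rho=1.0, n_iter=100):
    """ADMM for min_c 0.5 * ||x - B c||_2^2 + lam * ||c||_1 with the split c = c_hat."""
    k = B.shape[1]
    c_hat = np.zeros(k)
    y = np.zeros(k)
    # (B^T B + rho I) does not change across iterations, so factorise it once.
    L = np.linalg.cholesky(B.T @ B + rho * np.eye(k))
    Btx = B.T @ x
    for _ in range(n_iter):
        rhs = Btx + rho * c_hat - y
        c = np.linalg.solve(L.T, np.linalg.solve(L, rhs))   # c update
        c_hat = soft_threshold(c + y / rho, lam / rho)      # c_hat update
        y = y + rho * (c - c_hat)                           # multiplier update
    return c_hat
```

For an orthonormal codebook the minimiser is known in closed form (soft-thresholding of $B^{T} x$ with threshold λ), which gives a simple sanity check: with B the 2 × 2 identity, x = (1, 0.05) and λ = 0.1, the iterates converge to (0.9, 0).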

3.2 Solving for the codebook

The problem of optimizing the codebook B , where each basis is a column $b_{j}$, is convex. It can be solved directly using optimization tools such as SPAMS [22]. However, we found that solving the problem in this way is time-consuming, because when we solve for the codebook $B \in ℝ^{d \times k}$ , the number of variables is d × k. Instead of solving for d × k variables, we can form the dual problem and solve it with only d variables. Introducing an auxiliary variable $\hat{B}$ with $B = \hat{B}$ , we write the convex optimization problem in the following form: $\min_{B} \frac{1}{n} \sum_{i = 1}^{n} (\frac{1}{2} {∥ x_{i} - B c_{i} ∥}_{2}^{2})$ (7) $\text{s.t. } B = \hat{B} .$

Then, the augmented Lagrangian function is given:

$\begin{matrix} L (B, \hat{B}, R) & = & \frac{1}{n} \sum_{i = 1}^{n} (\frac{1}{2} {∥ x_{i} - B c_{i} ∥}_{2}^{2}) \\ & & + \frac{ρ}{2} {∥ B - \hat{B} + R ∥}_{2}^{2} \end{matrix}$ (8) where the new variables R are the residuals to the estimates of the Lagrange multipliers of the corresponding constraints, and $R \in ℝ^{d \times k}$ . Following ADMM, we update B , $\hat{B}$ and R alternately as: $\begin{matrix} B^{k + 1} \leftarrow \arg \min_{B} \frac{1}{n} \sum_{i = 1}^{n} (\frac{1}{2} {∥ x_{i} - B c_{i} ∥}_{2}^{2}) \\ + \frac{ρ}{2} {∥ B - {\hat{B}}^{k} + R^{k} ∥}_{2}^{2} \\ {\hat{B}}^{k + 1} \leftarrow Π_{C} (B^{k + 1} + R^{k}) \\ R^{k + 1} \leftarrow R^{k} + B^{k + 1} - {\hat{B}}^{k + 1} \end{matrix}$ where $Π_{C}$ is a projection function that is applied to each column of the matrix individually as:

$\begin{matrix} Π_{C} (X) & = & [Π_{C} (X_{j})] \; \forall j \\ Π_{C} (X_{j}) & = & \left\{ \begin{matrix} \frac{X_{j}}{{∥ X_{j} ∥}_{2}} & \text{for } {∥ X_{j} ∥}_{2}^{2} > 1 \\ X_{j} & \text{for } {∥ X_{j} ∥}_{2}^{2} \leq 1 \end{matrix} \right. \end{matrix}$ (9) and $X_{j}$ is the j-th column of X. The variables R are introduced by the augmented Lagrangian method (ALM), and they can be updated iteratively by a simple gradient step at negligible computational cost. However, the first step of the iteration is very slow to solve. Hence we present a heuristic solution which is much faster.
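The projection of Equation (9) and the surrounding iteration can be sketched as follows. Two points in this sketch are assumptions rather than statements from the paper: the B step is solved through the linear system obtained by setting the gradient of the augmented Lagrangian to zero, and the multiplier step uses the standard scaled-form ADMM update $R \leftarrow R + B - \hat{B}$. Function names and defaults are hypothetical.

```python
import numpy as np

def project_columns(X):
    """Equation (9): rescale to unit norm every column whose norm exceeds 1."""
    norms = np.linalg.norm(X, axis=0)
    return X / np.maximum(norms, 1.0)

def admm_codebook(X, C, rho=1.0, n_iter=200):
    """Scaled-form ADMM for the codebook with fixed codes C.

    X: (d, n) descriptors, C: (k, n) codes. Returns the projected codebook B_hat.
    """
    d, n = X.shape
    k = C.shape[0]
    B_hat = np.zeros((d, k))
    R = np.zeros((d, k))
    G = C @ C.T / n + rho * np.eye(k)   # symmetric system matrix of the B step
    XCt = X @ C.T / n
    for _ in range(n_iter):
        # B step: zero gradient gives B G = XCt + rho * (B_hat - R).
        B = np.linalg.solve(G, (XCt + rho * (B_hat - R)).T).T
        B_hat = project_columns(B + R)  # projection step
        R = R + B - B_hat               # scaled multiplier step
    return B_hat
```

When the unconstrained least-squares codebook already satisfies the norm constraint, the iteration converges to it; otherwise the affected columns are pulled back onto the unit sphere.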

3.3 Heuristic solution for the codebook optimization

In order to optimize the codebook efficiently within the ADMM scheme, we reformulate the minimization problem of Equation (1) using a heuristic. We replace the non-degeneracy constraint with a Frobenius-norm regularizer on the least-squares problem [23]: $\min_{B} \frac{1}{n} \sum_{i = 1}^{n} (\frac{1}{2} {∥ x_{i} - B c_{i} ∥}_{2}^{2} + β {∥ B ∥}_{F}^{2})$ (10)

The objective function is now convex and can be minimized in closed form. Taking the gradient of Equation (10) with respect to B gives: $\begin{matrix} \nabla_{B} (\frac{1}{n} \sum_{i = 1}^{n} \frac{1}{2} {∥ x_{i} - B c_{i} ∥}_{2}^{2} + β {∥ B ∥}_{F}^{2}) \\ = \nabla_{B} (\frac{1}{n} \sum_{i = 1}^{n} \frac{1}{2} (x_{i}^{T} x_{i} - 2 x_{i}^{T} B c_{i} + c_{i}^{T} B^{T} B c_{i}) \\ + β {∥ B ∥}_{F}^{2}) \\ = \frac{1}{n} \sum_{i = 1}^{n} (B (c_{i} c_{i}^{T}) - x_{i} c_{i}^{T}) + 2 β B \\ = B (\frac{1}{n} \sum_{i = 1}^{n} c_{i} c_{i}^{T} + 2 β I) - \frac{1}{n} \sum_{i = 1}^{n} x_{i} c_{i}^{T} \end{matrix}$

Setting the gradient to zero gives the global minimum: $B^{*} = (\sum_{i = 1}^{n} x_{i} c_{i}^{T}) {(\sum_{i = 1}^{n} c_{i} c_{i}^{T} + 2 n β I)}^{- 1}$

This heuristic solution is approximately equivalent to the original problem and gives a suitable solution.
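The closed-form codebook $B^{*}$ translates into a few lines of NumPy. The sketch below solves the linear system instead of forming the inverse explicitly, which is a numerical choice made here and not something prescribed by the paper; the function name is hypothetical.

```python
import numpy as np

def codebook_closed_form(X, C, beta):
    """B* = (sum_i x_i c_i^T) (sum_i c_i c_i^T + 2 n beta I)^(-1).

    X: (d, n) descriptors, C: (k, n) codes, beta: Frobenius regularisation weight.
    """
    n = X.shape[1]
    k = C.shape[0]
    A = C @ C.T + 2.0 * n * beta * np.eye(k)   # symmetric, positive definite for beta > 0
    # Solve B A = X C^T through the transposed system A B^T = (X C^T)^T.
    return np.linalg.solve(A, (X @ C.T).T).T
```

A quick check of the derivation is that the gradient expression above vanishes at the returned matrix.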

4 Experiments and analysis

In this section, we first compare the sparse feature representations of SIFT, HOG, Gabor + SIFT and Gabor + HOG with those of Local Binary Patterns (LBP) [24], grey-level co-occurrence matrices (GLCM) [25] and statistical Gabor features (S-Gabor) [26] by learning a sparse coding dictionary from the dataset images. We then compare the efficiency of the ADMM optimization method, using the SIFT, HOG, Gabor + SIFT and Gabor + HOG feature representations, with other popular sparse coding algorithms, such as the feature-sign algorithm and the SPAMS toolbox. All experiments were performed in MATLAB on an Intel(R) Core(TM) 2.00 GHz processor. The results show that the proposed method speeds up sparse coding and that our implementation is a useful tool for paper impurities evaluation.

4.1 Dataset

The dataset used in this work is provided by Bianconi et al. [1]. It consists of 11 different classes of paper images, each containing two groups: 48 positive examples (defect) and 48 negative examples (non-defect).

The characteristics of each class are reported in Table 1. There are 1056 examples in total, each with an image size of 128 × 128 pixels. The images within a single class are similar, but the examples show varying kinds of defects. The dataset provides a label for each example to mark the defect. The dataset was acquired with an imaging system using either transmitted or reflected light: when working by reflected light, the dome is on and the backlight illuminator is off; when operating by transmitted light the reverse occurs. For every class, Table 1 reports two images from each of the defect and non-defect groups. The dataset covers a sufficiently wide range of inclusions in terms of density, transparency and type.

Table 1
The characteristics of each class in the dataset

Class Number	Non-defect samples	Defect samples	Illumination Conditions
1			Transmitted Light
2			Transmitted Light
3			Reflected Light
4			Reflected Light
5			Reflected Light
6			Transmitted Light
7			Transmitted Light
8			Transmitted Light
9			Reflected Light
10			Transmitted Light
11			Reflected Light

4.2 Detection accuracy performance

We conducted a series of experiments to assess the robustness and performance of the proposed method. In the first experiment, we compared the feature representations of SIFT, HOG, Gabor + SIFT, Gabor + HOG, LBP, GLCM and S-Gabor under ADMM sparse coding. A common strategy is to divide a dataset into training and validation data. The images of each paper class are randomly split into two non-overlapping subsets, one for training and the other for evaluation.

In this experiment, the above feature descriptors were first extracted from the training images. We then used the optimization process discussed in Section 3.3 to calculate the codebook. Finally, the remaining test images were used to estimate the detection accuracy. To obtain a stable accuracy estimate, we repeated the random division of the dataset into training and evaluation sets K times. In each repetition, a linear SVM classifier is trained on the training set images, and the accuracy is evaluated as the percentage of validation set images classified correctly. The overall accuracy (ACU) is the average over the K repetitions: $ACU = \frac{1}{K} \sum_{l = 1}^{K} \frac{n_{true, l}}{n_{total, l}}$ where $n_{true, l}$ is the number of test images correctly classified in the l-th repetition and $n_{total, l}$ is the total number of images in the test set. Here, we take K = 100. Meanwhile, to assess the sensitivity to the training ratio, we repeated the experiments using two different ratios, namely 1/2 and 1/4, which correspond to 24 and 12 training samples, respectively.
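The ACU computation and the repeated random split can be sketched as below. The helper names and the use of a seeded NumPy random generator are assumptions; the counts in the usage note are made-up numbers for illustration only and are not results from the paper.

```python
import numpy as np

def overall_accuracy(n_true, n_total):
    """ACU: mean over the K repetitions of n_true_l / n_total_l."""
    n_true = np.asarray(n_true, dtype=float)
    n_total = np.asarray(n_total, dtype=float)
    return float(np.mean(n_true / n_total))

def random_split(n_samples, train_ratio, rng):
    """Return disjoint index arrays (train, test) for one random repetition."""
    perm = rng.permutation(n_samples)
    n_train = int(round(train_ratio * n_samples))
    return perm[:n_train], perm[n_train:]
```

For example, three hypothetical repetitions with 45, 48 and 42 correct classifications out of 48 test images each give an ACU of 135/144 = 0.9375. With 48 samples, `random_split` returns 24 training indices for a ratio of 1/2 and 12 for a ratio of 1/4.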

Table 2 reports the classification accuracy of the different sparse coding feature descriptors. The data show that high classification accuracy is obtained with SIFT, HOG, Gabor + SIFT and Gabor + HOG, whereas the results for LBP, GLCM and S-Gabor are not ideal. Our analysis shows that the feature dimension of SIFT, HOG, Gabor + SIFT and Gabor + HOG is relatively high (more than 100 dimensions), while that of LBP, GLCM and S-Gabor is very low. This suggests that the sparse coding classification method is not well suited to low-dimensional feature descriptors.

Table 2
Classification accuracy (%) of different sparse coding feature descriptors

(a) Training ratio: 1/2
Texture descriptor	SIFT	HOG	Gabor + SIFT	Gabor + HOG	LBP	GLCM	S-Gabor
Dataset
1	93.3	94.2	99.1	95.3	76.0	67.7	55.2
2	98.8	93.3	97.9	97.7	75.0	66.4	51.9
3	95.8	93.1	99.3	99.0	75.8	61.5	54.7
4	100.0	96.5	100.0	100.0	77.1	64.5	56.2
5	88.9	88.4	93.2	90.6	69.9	59.5	47.6
6	98.3	92.7	98.3	97.2	79.2	69.7	57.1
7	96.8	94.6	98.7	96.8	81.1	65.1	52.8
8	92.4	90.4	95.2	92.9	73.2	66.2	54.2
9	94.4	90.5	96.9	95.8	73.9	63.6	59.9
10	96.5	97.1	97.8	96.2	75.6	67.8	58.1
11	88.1	86.4	91.2	89.3	70.4	61.1	59.4
Average	94.85	92.47	97.05	95.53	75.20	64.82	55.19

(b) Training ratio: 1/4
Texture descriptor	SIFT	HOG	Gabor + SIFT	Gabor + HOG	LBP	GLCM	S-Gabor
Dataset
1	90.8	93.9	96.2	94.9	73.8	67.3	54.5
2	94.8	90.7	95.6	92.1	73.4	65.6	51.2
3	95.0	91.9	98.6	95.8	57.9	58.6	50.9
4	98.3	96.6	97.5	97.9	68.7	60.7	53.3
5	85.5	83.3	88.8	87.9	64.7	54.1	46.2
6	93.8	91.6	96.1	90.7	77.9	67.8	53.1
7	96.1	91.2	95.8	92.3	76.5	69.6	49.8
8	89.5	87.8	93.9	91.9	65.8	62.3	48.2
9	90.1	88.6	92.1	95.8	75.1	61.1	55.9
10	94.8	96.6	94.0	93.1	77.2	67.0	54.7
11	87.0	85.0	88.6	86.7	62.4	53.2	49.8
Average	92.33	90.65	94.28	92.65	70.31	62.48	51.60

Figure 2 shows the comparison results. Gabor + SIFT is the most reliable descriptor. With a training ratio of 1/2, it attains over 97% accuracy on average; even with a training ratio of 1/4, it still maintains over 94% accuracy on average. Meanwhile, the Gabor + HOG, SIFT and HOG descriptors (ranked by accuracy) all achieve over 90% average accuracy. The low-dimensional feature descriptors LBP, GLCM and S-Gabor do not work particularly well in this application.

Fig.2

Comparison of different feature descriptors.

4.3 Detection efficiency performance

In the second experiment, we compared the efficiency of the ADMM optimization algorithm with that of other popular sparse coding algorithms, namely the feature-sign algorithm and the SPAMS toolbox. The feature adopted is the Gabor + SIFT descriptor, which gave the best classification results in the first experiment.

Here, 48 paper images were drawn from each dataset for training. We ran sequential convex optimization for 50 iterations to learn the bases and coefficients. The dimension of the bases was set to 512. Table 3 shows the computation time (in seconds) of one iteration for learning the dictionary with the same number of bases as the number of input dimensions. In Table 3, the evaluation is performed against popular sparse coding algorithms: the feature-sign algorithm [27] and the SPAMS toolbox [22].

Table 3
Computation time (seconds per iteration) of ADMM, the feature-sign algorithm and the SPAMS toolbox

Algorithm	Feature-sign	SPAMS	ADMM
Dataset
1	378.9	15.6	4.3
2	376.1	17.2	4.9
3	350.1	11.9	5.8
4	398.3	16.6	5.0
5	346.5	13.1	4.8
6	383.5	15.7	4.6
7	307.0	13.7	5.3
8	347.1	19.0	4.9
9	314.4	11.0	5.0
10	394.8	16.2	5.5
11	342.7	11.0	5.3
Average	358.13	14.64	5.04

The feature-sign algorithm performs well when the number of bases is small, but becomes slower as the number of bases increases. Here, we use its results as the baseline. The experiments also show that the SPAMS sparse coding toolbox works well: it is about 24 times faster than the feature-sign algorithm. In our method, using ADMM to quickly solve the sub-problems of sparse coding significantly speeds up the calculation of the coefficients. The ADMM solver needs only about 5 to 10 iterations to learn the dictionary efficiently. As the last row of Table 3 shows, solving for the bases with ADMM is about 71 times faster than solving the primal problem with the feature-sign algorithm, and about 2.9 times faster than SPAMS. To summarize, our method performs reasonably well with a simple implementation.
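The speed-up factors quoted above follow directly from the average row of Table 3, as the short check below shows. The dictionary and function names are hypothetical; the three averages are copied from the table.

```python
# Average seconds per iteration, taken from the last row of Table 3.
avg_seconds = {"feature-sign": 358.13, "SPAMS": 14.64, "ADMM": 5.04}

def speedup(slow, fast, times=avg_seconds):
    """How many times faster `fast` is than `slow`."""
    return times[slow] / times[fast]

print(round(speedup("feature-sign", "SPAMS"), 1))  # 24.5
print(round(speedup("feature-sign", "ADMM"), 1))   # 71.1
print(round(speedup("SPAMS", "ADMM"), 1))          # 2.9
```

The figures of 24 and 71 given in the text are these ratios truncated to whole numbers.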

5 Conclusion

In this paper, we proposed a feature representation approach based on ADMM sparse codes for paper impurities evaluation. The method is divided into two parts: offline training and online detection. In offline training, features are first extracted from the labelled dataset, and a spatial-pyramid image representation is then computed for each image from the sparse codes of the features. Finally, a codebook is obtained through optimization and used for classification in online detection. In online detection, the learned linear classifier uses the extracted sparse features to predict whether an image belongs to the defect or non-defect class. The method uses the ADMM algorithm instead of traditional methods to solve the sparse coding problem. Two experiments were conducted to demonstrate the accuracy and efficiency of the proposed method. Ongoing work involves applying the proposed method to other defect detection applications.

References

1. Bianconi, Ceccarelli, Fernández and Saetta S.A., A sequential machine vision procedure for assessing paper impurities, Computers in Industry 65 (2014), 325–332.

2. Feng, Zou, Yan, Shi, Liu, Fan and Deng, Real-time fabric defect detection using accelerated small-scale over-completed dictionary of sparse coding, International Journal of Advanced Robotic Systems (2016).

3. Bianconi and Fernández, Evaluation of the effects of Gabor filter parameters on texture classification, Pattern Recognition 40 (2007), 3325–3335.

4. Weimer, Thamer and Scholz-Reiter, Learning defect classifiers for textured surfaces using neural networks and statistical feature representations, In CIRP 7 (2013).

5. Scholz-Reiter, Weimer and Thamer, Automated surface inspection of cold-formed micro-parts, CIRP Annals - Manufacturing Technology 61 (2012), 531–534.

6. Peng Q.H., Liu, Sun and Huang, Reliability estimation for aluminum alloy welded joint with automatic image measurement of surface crack growth, Engineering Computations 33(4) (2016), 1205–1223.

7. Bianconi, González, Fernández and Saetta S.A., Automatic classification of granite tiles through colour and texture features, Expert Systems with Applications 39(12) (2012), 11212–11218.

8. Torniainen J.E., Söderhjelm L.S.A. and Youd, Results of automatic dirt counting using transmitted light, TAPPI Journal 82(1) (1999), 194–197.

9. Duarte, Araújo and Dourado, An automatic system for dirt in pulp inspection using hierarchical image segmentation, Computers and Industrial Engineering 37(1-2) (1999), 343–346.

10. Campoy, Canaval and Pena, Inspulp: An on-line visual inspection system for the pulp industry, Computers in Industry 56(8–9) (2005), 935–942.

11. Serre, Wolf and Poggio, Object recognition with features inspired by visual cortex, In CVPR (2005).

12. Yang, Yu, Gong and Huang, Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification, In CVPR (2009).

13. Dalal and Triggs, Histograms of oriented gradients for human detection, In CVPR (2005).

14. Dalal, Finding People in Images and Videos. PhD Thesis, L’institut National Polytechnique de Grenoble, 2006.

15. Lowe D.G., Object Recognition from Local Scale-Invariant Features, In ICCV (1999).

16. Manjunath B.S. and Ma W.Y., Texture features for browsing and retrieval of image data, IEEE Transactions on Pattern Analysis and Machine Intelligence 18(8) (1996), 837–841.

17. Wang, Yang, Yu, Lv, Huang and Gong, Locality-constrained Linear Coding for Image Classification, In CVPR (2010).

18. Serre, Wolf and Poggio, Object recognition with features inspired by visual cortex, In CVPR (2005).

19. Afonso M.V., Bioucas-Dias J.M. and Figueiredo M.A.T., Fast image recovery using variable splitting and constrained optimization, IEEE Transactions on Image Processing 19(9) (2010), 2345–2356.

20. Wohlberg, Efficient convolutional sparse coding, In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 7173–7177.

21. Krishnan and Fergus, Fast image deconvolution using hyper-laplacian priors, Neural Information Processing Systems, Vancouver, 2009.

22. Mairal, Bach, Ponce and Sapiro, Online learning for matrix factorization and sparse coding, Journal of Machine Learning Research 11(3) (2010), 19–60.

23. Bhaskar and Zou, An ADMM Solution to the Sparse Coding Problem, 2011.

24. Ojala, Pietikäinen and Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7) (2002), 971–987.

25. Haralick R.M., Shanmugam and Dinstein, Textural features for image classification, IEEE Transactions on Systems Man and Cybernetics 3(6) (1973), 610–621.

26. Manjunath B.S. and Ma W.Y., Texture features for browsing and retrieval of image data, IEEE Transactions on Pattern Analysis and Machine Intelligence 18(8) (1996), 837–841.

27. Lee, Battle, Raina and Ng A.Y., Efficient sparse coding algorithms, In NIPS (2007).
