Abstract
Deep learning-based defect inspection has gained popularity in recent years. The dataset requirements for the supervised learning-based method are currently high, but the types of defects are numerous and difficult to gather. This work proposes a local image reconstruction-based unsupervised fabric defect segmentation method to address this problem. Cyclic structures make up the normal portion of the fabric image, whereas the defects are anomalous and minor in comparison. As a result, the defect will be recreated as a normal texture utilizing the information from its surrounding areas, and the defect information will be preserved in the residual image. By masking the same area with various shapes, different reconstruction outcomes and residual images can be achieved. The signal of the defect will be amplified and the noise will be decreased due to the random distribution when the generated residual pictures are fused, which can effectively identify the defect from the noise and lower the false detection rate. On the denim fabric dataset, the proposed unsupervised method can achieve high precision fabric defect segmentation, with the defect detection rate and detection precision reaching at least 85% and 89%, respectively, with high efficiency (approximately 60 m/min inspection speed), outperforming other fabric defect segmentation methods.
Fabric defect inspection is crucial to the textile industry's manufacturing process. Problems with the fabric's surface quality might have an impact on its appearance and cost. Researchers have been working on automating fabric quality monitoring for decades as manufacturing has advanced. It is believed that, as compared to the manual process, the automated system will achieve uniformity and objectivity in detection, lower costs, and increase detection effectiveness.
Fabric defect detection can be considered as an anomaly detection problem. From this point of view, the problem can be solved as a classification problem, judging whether anomalies exist; or a segmentation problem, aiming at the segmentation of the abnormal area.
Generally speaking, classification problems in machine learning can be divided into two types, supervised and unsupervised. However, for fabric defect detection, unsupervised classification without sufficient prior information is hard to achieve. Thus, if the classification stage is regarded as the core of detection, the problem is more likely to be treated as a supervised detection problem. In 2003, Kumar first proposed a linear neural network for fabric defect detection and proved the effectiveness of the scheme. 1 In 2019, Jing et al. decomposed the fabric into local areas and labelled them separately. On that basis, a trained deep convolutional neural network was used for transfer learning, and a model able to detect the type and location of defects was obtained. 2 In 2022, through experiments on complex jacquard fabrics, Khodier et al. showed that higher defect detection accuracy can be obtained by using EfficientNet with only half the parameters of ResNet50 and one tenth of the parameters of VGG16. 3 Some researchers regard the defect detection problem as a target detection problem.4–7 Among them, YOLO-based methods have recently been popular in papers and competitions. YOLO-based networks are end-to-end lightweight networks that are easy to deploy in industrial scenarios, and published work shows that they can achieve high accuracy at high detection speed. However, training these networks places high demands on dataset annotation, for example the bounding box of the defect area, its specific location, and the type of the detection target. The detection performance of supervised classification is highly dependent on the quality of the datasets.
Considering the diversity and unbalanced distribution of defects and the difficulty of sample acquisition of fabric defects, which conflicts with the construction of the dataset, it is difficult to ensure the effectiveness of training models.
If anomaly segmentation is the focus of methods to solve the problem of fabric defect detection, the high dependency on datasets and labels can be avoided. This kind of method treats abnormal texture and normal texture as two layers, the foreground and the background, respectively. Anomaly segmentation aims to remove the background while retaining the foreground as much as possible. The background is derived from the original image, and the anomaly is the residual between the original image and the background. The reconstruction of the background is therefore the core of this kind of method. In 2015, Hu et al. applied the Fourier transform to the inspection image, zero-masked the dominant frequency components that show high gradient values in the spectrum, and used wavelet shrinkage to denoise the restored residual image to detect defects. 8 In 2016, Zhou and Wang proposed an unsupervised defect segmentation method based on Dictionary Learning. 9 The authors applied the dictionary learned from the positive samples to the reconstruction of defective samples and used the residual to detect defects. In 2017, Li et al. proposed a Fisher criterion-based stacked denoising autoencoder (FCSDA), and trained it with only defective images (FCSDA1) and with both defective and defect-free images (FCSDA2), respectively. 10 The authors then used the residual between the images reconstructed by FCSDA1 and FCSDA2 to determine whether a defect exists. In 2018, Zhao et al. combined a generative adversarial network (GAN) and an autoencoder for image reconstruction and applied the local binary pattern (LBP) to the residual image to detect defects. 11 Mei et al. reconstructed image patches with a convolutional denoising autoencoder network at multiple Gaussian pyramid levels and fused the results from the corresponding resolution channels; the residuals of the synthesized results showed the defects. 12 In 2019, Ouyang et al. proposed a weakly supervised on-loom defect segmentation method by combining the techniques of image pre-processing, fabric motif determination, candidate defect map generation, and convolutional neural networks. 13 In 2020, Hu et al. introduced a new encoder component into the deep convolutional GAN to train a reconstruction model for patches from the image. 14 They filled the image with reconstructed patches to avoid the reconstruction of defects. In 2021, Wu et al. used principal component analysis (PCA) to reduce the dimension of positive samples and applied Dictionary Learning to the processed samples to generate a dictionary that can only reconstruct the defect-free area effectively. 15 When the dictionary is applied to the inspection image, the residual supports defect detection. All the above solutions need only positive samples or a small number of negative samples to detect fabric defects, which meets the needs and avoids the difficulties of defect detection.
Inspired by the solutions above, this paper proposed a method based on local image reconstruction. Considering that the existing methods are all realized by the direct matrix transformation from the input image to the output image, there is a mapping relationship between the input image and output image information. This leads to the problem that the anomaly is likely to be partially reconstructed, which reduces the expressiveness of the defects in the residual image. Referring to the principle of image inpainting, the missing parts of the image can be reproduced according to the existing main information and principal component of the image. The prior information provided by the periodic feature from the normal part of the denim fabric image contributes to the stable reconstruction of the missing part of the image. In this process, the defect information is in the minority, which can hardly influence the information used for reconstruction. Meanwhile, considering that prior information must exist in image inpainting, the reconstruction process of the entire image can be divided into multiple local reconstruction processes. In this process, Fast Fourier convolution 16 is adopted to ensure local and global consistency between the reconstructed region and the entire image.
The main contributions of our work are as follows:
The proposed method uses only positive samples to train the reconstruction network and segments defects effectively, which avoids the problem of unbalanced defect samples and the need for a large amount of labelling work.
The proposed method reconstructs the raw image indirectly, so that the mapping between image elements before and after reconstruction is avoided; this enhances the signal of defects in the residual image and benefits defect segmentation.
Because the reconstruction result changes with the shape of the mask, different residual images can be obtained; a new defect segmentation method based on residual image fusion is proposed to improve segmentation performance.
Methodology
In our opinion, a good methodology for identifying anomalies such as defects should first presume that the image contains anomalies and then attempt to corroborate the assumption with known information; if the assumption holds, the anomaly exists.
The known conditions are as follows:
If the defect area is reconstructed according to the normal area, there will be differences before and after reconstruction.
The residual between the defect area and its corresponding regenerated area should contain two kinds of information: the defect and the random error.
Multiple reasonable generation results can exist for the same missing areas of an image. However, all of these results must follow the same paradigm so that the texture and structure of the reconstructed images are continuous.
The assumptions are as follows:
If defects exist in the fabric image, residual images must have common elements, and these elements will have relationships among them or have a certain quantity.
In summary, different residual images are obtained in this paper by reconstructing one fabric image multiple times in different ways, and the presence of defects is determined by whether the residual images share common elements. The quality of the image reconstruction and the segmentation of the fused residual images are the core of defect detection in this paper.
Reconstruction
To improve the effect of the defect segmentation, two factors need to be satisfied: on the one hand, the difference between the fabric texture of the generated area and that of the original area should be decreased; on the other hand, the distance between the features of the generated texture and those of the anomaly texture needs to be increased. Because the weave texture of the fabric is regular and periodic, when prior information about the anomaly is absent, the local generation model can make effective predictions for the missing parts of the image by analyzing the information around the missing areas and learning the global information of the image, which means the first factor is more likely to be achieved. In our method, the raw image is reconstructed separately with a pair of masks that are complementary to each other: each mask is responsible for the reconstruction of one part of the raw image, and the final image is assembled from the regenerated parts of the two reconstructed images. This reconstruction scheme is illustrated in Figure 1. As shown in Figure 1, the masked area is generated by the trained model, and the quality of the reconstruction is determined by the quality of the generation. Thus, the generation model is the key to the reconstruction, and the choice of the basic network architecture and the design of the loss functions should focus on improving the reconstruction of the image.

Reconstruction scheme in the proposed method.
As shown in Figure 2, compared with the traditional fabric image reconstruction method, the proposed method reconstructs the image by predicting the missing part, which avoids the mapping of the elements between the raw image and the reconstructed image, raises the effect of the reconstruction and enhances the performance of the defect in the residual image.

Comparison of the proposed method and traditional fabric reconstruction method. (a) Result of the proposed method and (b) result of the traditional method (based on Dictionary Learning).
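The complementary-mask scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `inpaint` is a hypothetical callable standing in for the trained generator, and `mean_fill_inpaint` is a toy substitute used only so that the example runs.

```python
import numpy as np

def reconstruct_with_mask_pair(image, mask_a, mask_b, inpaint):
    """Rebuild an image from two local inpainting passes.

    image          : 2-D float array (grayscale fabric image).
    mask_a, mask_b : boolean arrays, True where pixels are regenerated.
    inpaint        : hypothetical callable (image, mask) -> image that
                     stands in for the trained generator.
    """
    if np.any(mask_a & mask_b):
        raise ValueError("the two masks of a pair must not overlap")
    out = image.copy()
    for mask in (mask_a, mask_b):
        generated = inpaint(image, mask)   # prediction uses unmasked pixels only
        out[mask] = generated[mask]        # keep only the regenerated region
    return out

def mean_fill_inpaint(image, mask):
    """Toy stand-in for the generator: fill with the mean of visible pixels."""
    filled = image.copy()
    filled[mask] = image[~mask].mean()
    return filled

# Uniform "fabric" with a bright horizontal defect.
image = np.full((32, 32), 0.5)
image[10, 4:20] = 1.0

# Complementary horizontal bands with a 2 px reserved border (illustrative sizes).
inner = np.zeros_like(image, dtype=bool)
inner[2:-2, 2:-2] = True
bands = ((np.arange(32) // 4) % 2 == 0)[:, None] & np.ones((1, 32), dtype=bool)
mask_a, mask_b = inner & bands, inner & ~bands

reconstruction = reconstruct_with_mask_pair(image, mask_a, mask_b, mean_fill_inpaint)
residual = np.abs(image - reconstruction)
```

Because each masked pixel is predicted without access to its own value, the defect is not copied into the reconstruction and remains in `residual`.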
Architecture of the generator network
Fabric images are composed of periodic weaves; the textures of the weaves, together with their distribution and positional relationships, determine the image features. When reconstructing a fabric image, both the image details described by low-level information, such as the texture of the weave itself, and the structure determined by high-level information should be considered. Traditional image inpainting methods repair the missing parts using the redundancy of the image itself, which is mostly low-level information, and cannot meet the needs of fabric image reconstruction.17,18 In contrast, some GAN-based neural networks can extract both low-level and high-level information from images effectively. 19 – 21 Therefore, this paper uses a GAN to reconstruct fabric images.
The research of Geirhos et al. 22 raised the possibility that the classification model tends to utilize the low-level information in the image, especially the texture, as the classification basis of the image. If a classification model is used for image reconstruction, it means that only a low-level feature such as texture is extracted and used to distinguish real images from fake images in loss calculation, which might affect the quality of images generated by the model. However, the segmentation model based on perceptual loss can help extract the high-level information of the image. 22 Therefore, the image inpainting network utilized for fabric image reconstruction in our work needs segmentation models to raise the reconstruction effect.
For the reasons above, we use the existing image inpainting model LaMa 21 for the local reconstruction of fabric images. Compared with other image inpainting models, LaMa can inpaint large missing areas, complex geometric structures, and high-resolution images. LaMa is a GAN-based image inpainting model, but unlike other GAN models, its generator and discriminator are both composed of segmentation models, which helps the extraction of high-level information. As shown in Figure 3, the input is formed from a defect-free image and a randomly generated mask; the generator is a ResNet-like structure whose residual blocks adopt fast Fourier convolution (FFC); the architecture of the discriminator comes from the Pix2Pix network. 23 The original model is used for the reconstruction of natural images, and its generator adopts a structure of three down-sampling blocks, six to 18 FFC residual blocks, and three up-sampling blocks. Considering that the fabric inputs are grayscale images and that the texture and structure of fabric are more uniform and regular than those of natural images, the LaMa model we use adopts only six residual blocks, and the input feature channel of the image is simplified to 256 for grayscale images to reduce the number of parameters, which reduces the size of the generator and the cost of both training and detection.

Scheme for large-mask inpainting utilized in the proposed method.
In particular, compared with other image inpainting models, the characteristics of the LaMa network are as follows:
Fast Fourier convolution (FFC)
As shown in Figure 3, the generator of the LaMa model is a ResNet-like residual network built from FFC blocks. 16 In each FFC block, the input channels are divided into a local part and a global part, and information is exchanged and fused between the two parts, so that the trained parameters participate effectively in the propagation of information and the concatenated output of each block has a global receptive field.
The contributions of the FFC block are as follows. First, when the reconstruction area is large, the center of the repaired area cannot obtain enough prior information from the surrounding area, so a traditional convolutional network may fail to reconstruct the image, whereas the LaMa network with FFC blocks can still guarantee the reconstruction effect under the same circumstances. Second, compared with the traditional convolution module, the FFC blocks are more sensitive to periodic structure, which helps improve the effect of fabric image reconstruction.
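To make the global receptive field concrete, the sketch below reduces the global branch of an FFC-style block to its core idea: a pointwise (1 × 1) channel mixing applied in the frequency domain. It is a simplified illustration under stated assumptions and omits the local branch, the local-global exchange, and normalization; the function name and the random weights are hypothetical.

```python
import numpy as np

def spectral_transform(x, weight):
    """Pointwise channel mixing in the Fourier domain.

    x      : array of shape (C, H, W).
    weight : array of shape (2*C, 2*C), acting as a 1x1 convolution on the
             stacked real and imaginary parts of the spectrum.
    """
    c, h, w = x.shape
    spec = np.fft.rfft2(x, axes=(-2, -1))                   # (C, H, W//2 + 1), complex
    feats = np.concatenate([spec.real, spec.imag], axis=0)  # (2C, H, W//2 + 1)
    mixed = np.einsum("oc,chw->ohw", weight, feats)         # 1x1 conv over channels
    mixed = np.maximum(mixed, 0.0)                          # ReLU
    out_spec = mixed[:c] + 1j * mixed[c:]
    return np.fft.irfft2(out_spec, s=(h, w), axes=(-2, -1))

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16, 16))
weight = rng.standard_normal((4, 4))

y = spectral_transform(x, weight)

# Perturb a single input pixel and measure how much of the output changes.
x_perturbed = x.copy()
x_perturbed[0, 3, 5] += 1.0
changed = np.abs(spectral_transform(x_perturbed, weight) - y) > 1e-9
changed_fraction = changed.mean()
```

Every frequency coefficient depends on every input pixel, so a single-pixel change propagates to essentially the whole output in one layer, which is the property that benefits large masked regions and periodic textures.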
Loss functions
In the loss functions of the generator in LaMa, apart from the L1 loss, the learned perceptual image patch similarity (LPIPS) proposed by Zhang et al. 24 is also utilized. Indicators such as the Manhattan distance (L1), the Euclidean distance (L2), the peak signal to noise ratio, and the structural similarity (SSIM) are usually utilized to evaluate the similarity between the generated image and the original image. However, in many cases, images that score highly on these measures look blurred to human observers, and sometimes the images generated under these losses have unacceptable color stains. The perceptual loss is obtained by extracting features from the images and calculating the distance between those features. Zhang et al. 24 trained a model to measure the perceptual similarity of images, which can be used to evaluate the loss. With this perceptual loss, the generator can quickly learn how to replicate the ‘real’ part to generate the ‘fake’ part. This loss is mainly used to ensure the global consistency of the image and to avoid blur.
The discriminator of LaMa directly inherits the PatchGAN discriminator of Pix2Pix, and its loss function likewise follows the combination of the L1 loss and the feature matching loss. 25 This combination is mainly used to optimize local details in local generation.
Design of reconstruction mask pairs
Apart from the training process, as the method proposed in this paper reconstructs the image through local generation, at least two generation passes are needed to reconstruct an entire image. The input of the local reconstruction model is the original image and a mask, and the mask defines the area to be reconstructed. As shown in Figure 1, this paper uses a combination of one image and two masks to reconstruct the entire image: the size of each mask matches the image size, and the reconstruction areas defined by the two masks are complementary (except for reserved areas). As shown in Figure 4, the design of the mask pairs follows these rules:

Examples of mask pairs. (a) Mask pair 1 and (b) mask pair 2.
Reserved area: Considering that the local reconstruction of the image requires prior information as a reference, to improve the reconstruction effect, inner padding is reserved in masks and the related area in the image does not participate in the image reconstruction (the image size in this paper is 512 px × 512 px, and the reserved width is 16 px).
Multi-area combination: In order to provide enough information for local reconstruction, each mask is composed of multiple component areas, and each area has an interval between them.
Shape of component areas: Fabric defects are mostly caused by weaving problems of the warp or weft thread, which leads to anomalies distributed in the warp or weft direction, either in the form of a stripe or continuous point. Therefore, the shape of the mask should be designed as a rectangle with a large ratio between length and width, which helps cover the defects in the direction of either warp or weft as much as possible. Then, the possibility and impact of the defects being reconstructed can be reduced.
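The three rules above can be sketched as a small mask generator. The 512 px image size and 16 px reserved width come from the text; the 32 px stripe width, the alternating-band layout, and the function name are assumptions made for illustration and do not reproduce the exact mask pairs of Figure 4.

```python
import numpy as np

def make_stripe_mask_pair(size=512, reserved=16, stripe=32, horizontal=True):
    """Return two complementary boolean masks (True = area to reconstruct).

    size     : image side length in pixels.
    reserved : width of the border that is never reconstructed.
    stripe   : width of each elongated component area (assumed value).
    """
    # Reserved area: only the interior may be reconstructed.
    inner = np.zeros((size, size), dtype=bool)
    inner[reserved:size - reserved, reserved:size - reserved] = True

    # Alternating bands: each mask is made of several separated rectangles
    # with a large length-to-width ratio.
    idx = np.arange(size)
    band = ((idx - reserved) // stripe) % 2 == 0
    band_2d = band[:, None] if horizontal else band[None, :]
    stripes = np.broadcast_to(band_2d, (size, size))

    mask_a = inner & stripes
    mask_b = inner & ~stripes
    return mask_a, mask_b

weft_pair = make_stripe_mask_pair(horizontal=True)    # stripes along the rows
warp_pair = make_stripe_mask_pair(horizontal=False)   # stripes along the columns
```

Changing `stripe` or the orientation yields different mask pairs for the same image, and therefore different residual images for the fusion step.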
Segmentation
In this paper, a method based on multi-residual image information fusion is designed for the segmentation of fabric defects. The basic principle of the method is as follows: the residual image computed from the reconstructed image and the original image is composed of two parts, the anomaly, which should not exist in normal fabric, and the noise generated by the reconstruction error.
As the proposed method should not reconstruct the anomalies of the original image in the reconstructed image, the defect information should be retained in the residual image with a relatively higher signal strength than the noise. Considering the small proportion of defective regions in the image, the pixels of the residual image are binarized according to their gray values so that only the pixels with the top 5% gray values are kept. In this way, most of the noise is filtered out and the pixels with a value of one make up no more than one in 20 of the image, which means that, if these pixels were randomly distributed, the probability of any given pixel having a value of one would be no more than one in 20. The noise generated by the reconstruction error should be randomly distributed in the residual image and should therefore fit this probability of one in 20, whereas the defects should not. Therefore, when multiple residual images are fused, the information of the abnormal area is enhanced, while the noise is weakened due to its randomness. This is why we threshold and binarize the residual images before fusing them.
As shown in Figure 5, the scheme of the segmentation is as follows:
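The noise-suppression argument can be quantified with a binomial tail. The sketch below assumes, as an idealization not stated in the source, that noise pixels are independent across residual images and that a pixel survives fusion when it is set in at least t of the n binarized residuals.

```python
from math import comb

def false_alarm_probability(n, t, p=0.05):
    """Probability that a pure-noise pixel is set in at least t of n residuals.

    n : number of fused binary residual images.
    t : fusion threshold (assumed to mean "at least t votes").
    p : per-image probability that a noise pixel is set (top 5% rule).
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))

single_image = false_alarm_probability(n=1, t=1)   # equals p
five_images = false_alarm_probability(n=5, t=3)
```

Under these assumptions, fusing five residuals with a threshold of three lowers the per-pixel false alarm probability from 0.05 to roughly 0.0012, while a defect pixel that appears in every residual is unaffected.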

Segmentation scheme in the proposed method.
First of all, for one image to be reconstructed, a few different mask pairs are selected. Then, different reconstructed images and different residual images are generated by the same local reconstruction model with these mask pairs.
Second, as the local generation method in this paper can effectively retain abnormal information in residual images, the signal intensity of the defect is higher than that of the noise. Thus, threshold segmentation is performed on each generated residual image to keep the pixels with the top 5% gray values in the image, which preliminarily removes noise while preserving abnormal information as much as possible.
Finally, the binarized residual images are fused to obtain a fused residual image. The fused image is then thresholded with a threshold t that depends on the number of residual images; the value of t is determined experimentally. After that, a median filtering operation eliminates the remaining noise, and the defects are segmented.
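The three steps above can be sketched with NumPy. This is a minimal illustration under assumptions: the 3 × 3 median window, the "at least t votes" rule, and all function names are hypothetical choices, since the source does not specify the filter size or the comparison used.

```python
import numpy as np

def binarize_top_fraction(residual, fraction=0.05):
    """Keep the pixels whose gray value lies in the top `fraction` of the image."""
    threshold = np.quantile(residual, 1.0 - fraction)
    return residual > threshold

def median_filter3(binary):
    """3x3 median filter on a binary image (window size is an assumption)."""
    padded = np.pad(binary.astype(np.uint8), 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(-2, -1)) > 0.5

def segment_defects(residuals, t, fraction=0.05):
    """Fuse binarized residuals by voting, threshold at t, then median filter."""
    votes = sum(binarize_top_fraction(r, fraction).astype(np.int32) for r in residuals)
    return median_filter3(votes >= t)

# Synthetic example: five noisy residuals that share one defect region.
rng = np.random.default_rng(0)
defect = np.zeros((64, 64), dtype=bool)
defect[20:26, 30:36] = True
residuals = [rng.uniform(0.0, 0.2, size=(64, 64)) + defect for _ in range(5)]

segmentation = segment_defects(residuals, t=3)
```

In this synthetic case the shared defect region collects a vote from every residual, while isolated noise pixels rarely reach the threshold and are removed by the median filter.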
Results and discussion
Dataset
The dataset used in this paper comes from the Smart Diagnosis of Cloth Flaw Dataset, 26 which consists mainly of denim fabrics. The original images are RGB images of 2446 px × 1000 px, with a resolution of approximately 130 pixels per inch. The dataset images have to be cut into 512 px
Reconstruction performance discussion
Evaluation metrics
In this experiment, the reconstruction effect of the algorithm is evaluated by calculating the similarity between the reconstructed image and the original image. The main evaluation metrics are SSIM and the Fréchet inception distance (FID). SSIM evaluates the structural similarity between the reconstructed image and the original image by jointly estimating brightness (mean value of image pixels), contrast (standard deviation of image pixels), and structure (covariance of image pixels). The higher the score, the better the reconstruction effect. FID judges the similarity between images through the image features extracted by the InceptionV3 network. 27 If each feature is regarded as a variable that obeys a Gaussian distribution, an image with a variety of features can be represented by a multivariate Gaussian distribution. Features extracted from different images will have different multivariate Gaussian distributions, and if two images are the same, their multivariate Gaussian distributions should be exactly the same. Therefore, by calculating the distance between the two multivariate Gaussian distributions, the similarity between the images can be judged. 28 FID describes the similarity between the two distributions through the Fréchet distance, and the lower the score, the better the reconstruction effect. Compared with SSIM, FID depends more on the kinds of features existing in the image to determine the similarity, but cannot describe the spatial relationship between features. By combining these two evaluation metrics, a more comprehensive evaluation of the reconstructed image can be carried out.
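For reference, the two metrics are commonly written as follows. These are the standard textbook forms rather than formulas taken from the source, and the symbols are assumptions: \mu and \sigma^2 denote pixel means and variances of images x and y, \sigma_{xy} their covariance, C_1 and C_2 small stabilizing constants, and (\mu_r, \Sigma_r), (\mu_g, \Sigma_g) the mean and covariance of the InceptionV3 features of the original and reconstructed images.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}

\mathrm{FID} =
  \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The SSIM expression contains the covariance term that captures spatial structure, whereas FID depends only on feature statistics, which is consistent with the observation that FID cannot describe the spatial relationship between features.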
Ablation experiment
In this paper, an ablation experiment is designed to verify that the FFC module and the perceptual loss used in the LaMa model are necessary to improve the local reconstruction of fabric images. For generative adversarial networks, it is difficult to judge the progress of training directly from the trend of the loss. Therefore, in this paper, the training of the network is evaluated by the reconstruction effect of the generator on the images, using the evaluation metrics mentioned above. The performance of the trained models on the validation set throughout the training process is shown in Figure 6.

Evaluation of the reconstructed image of different models under different training steps. (a) Result of structural similarity (SSIM) and (b) result of the Fréchet inception distance (FID).
It can be seen from Figure 6 that the model with both the FFC module and the perceptual loss has a relatively better image reconstruction level during the entire training process, and also enables the training of the model to have faster and more stable convergence. As shown in Figure 6, when trained on the same dataset, the training of the LaMa model had basically stabilized and converged in the 10th epoch, while the training of the other two control models did not stabilize until the 20th epoch.
At the same time, it can be seen from Figure 6(a) that according to the judgment of SSIM, the FFC module plays a very important role in the regeneration of the overall structure of the reconstructed area, especially when the area to be reconstructed is large. As shown in Figure 6, it is difficult for the model without the FFC module to predict the structure of the missing area precisely. However, in Figure 6(b) the image generated by the reconstructed model without the FFC module can still obtain a higher score under the FID standard, which also proves that the FID standard is not sensitive to the structure of the reconstructed image.
Under the standard of SSIM, the images reconstructed by the model without perceptual loss can obtain scores close to the reconstructed images of the LaMa model (Figure 6(a)), while they are different when judged by the FID standard (Figure 6(b)).
As shown in Figure 7, comparing the images reconstructed by the two models, it is obvious that the reconstruction effect of the model in the proposed method is better. On the one hand, this shows that the SSIM standard is not as good as FID and LPIPS in judging low-level features of images; on the other hand, it also shows that perceptual loss combined with the FFC module can evaluate both high-level and low-level image features effectively and help generate a high-resolution image.

Example for images reconstructed by different trained models. (a) Raw image; (b) mask; (c) model of proposed method; (d) model without learned perceptual image patch similarity (LPIPS) and (e) model without fast Fourier convolution (FFC).
Segmentation performance discussion
Evaluation metrics
By calculating the overlap between the detected defect area and the ground truth, the effect of the defect segmentation can be judged. 29 For a defect image, whether each pixel in the image is a defect or background can be regarded as a binary classification problem, and the image is the dataset to be predicted. The binary image obtained can be regarded as a prediction of the category of each pixel in the original image, and the values from the ground truth represent the actual classification of each pixel in the image. Thus, the confusion matrix for the segmentation result of an image can be defined as follows (Table 1).
Definition of the confusion matrix in the proposed method
According to the confusion matrix, the precision and recall for the segmentation result can be calculated:
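Assuming the usual confusion-matrix notation, which is not preserved in this text (TP for defect pixels correctly detected, FP for background pixels detected as defect, and FN for defect pixels that are missed), the two quantities take their standard form:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
```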
Objectively, the
As shown in Figure 8, the defect has been segmented accurately in this image, but the recall returns a poor result (0.0863). Compared to recall, the lower false detection rate and higher accuracy are more important for defect detection. Therefore, we utilize the

Difference of the mask and segmentation result. (a) Raw image; (b) mask and (c) segmentation result.
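The F0.5 score reported in the tables of this section weights precision more heavily than recall. The sketch below assumes the standard F-beta definition with beta = 0.5 and computes it directly from a predicted mask and a ground-truth mask; the function name and the toy masks are illustrative only.

```python
import numpy as np

def f_beta_from_masks(pred, truth, beta=0.5):
    """Return (precision, recall, F_beta) for two binary masks.

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta < 1 favors precision.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # defect pixels correctly detected
    fp = np.count_nonzero(pred & ~truth)   # background detected as defect
    fn = np.count_nonzero(~pred & truth)   # defect pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta**2 * precision + recall
    score = (1 + beta**2) * precision * recall / denom if denom else 0.0
    return precision, recall, score

# A thin, accurate detection inside a generously drawn ground-truth mask.
truth = np.zeros((8, 8), dtype=bool)
truth[2:4, 1:6] = True          # 10 labelled pixels
pred = np.zeros((8, 8), dtype=bool)
pred[2, 2:4] = True             # 2 detected pixels, both inside the label

p, r, f_half = f_beta_from_masks(pred, truth, beta=0.5)
_, _, f_one = f_beta_from_masks(pred, truth, beta=1.0)
```

In this toy case the precision is perfect while the recall is low, and the F0.5 score penalizes the low recall less than the balanced F1 score does.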
In addition, we calculate the detection rate to evaluate the sensitivity of the detection method to defects:
in which
Parameter optimization
In the defect segmentation method of multi-residual image fusion proposed in this paper, two parameters should be discussed: one is the number of residual images used for fusion, and the other is the threshold used for defect segmentation after fusion. As there is no literature available to reference on this, we optimize these two parameters through experiments. Here, the scheme of nine residual images is taken as an example.
As shown in Table 2, as the fused image is obtained by superimposing with nine binary residual images, the possible values of pixels in the fused image are integer values from zero to nine. Therefore, the practical selection of the segmentation threshold should be an integer from one to eight, and the fused image can only obtain eight different segmentation results from these thresholds. As mentioned above, the abnormal pixels in the residual image can be divided into two categories, the defect and the reconstruction error (noise). The position of the defect in the residual image should be roughly fixed, and the error should be random, which means the pixels of the defect part must be accumulated several times, while the accumulation of the error is relatively random. By increasing the threshold, the defective pixels with relatively larger values can be extracted.
Reconstruction result of the scheme of nine fusing residual images
In Table 2, with the increase of the threshold, the accuracy of segmentation increases and the recall decreases. The weighted balance of the precision and the recall obtains the maximum value when the threshold is six, and the corresponding accuracy and defect detection rate can be close to 90%. The fabric detection rate increased first and then decreased; this is caused by the low precision when the threshold is small and the excessive constraints when the threshold is large. Similarly, after experiments, when the number of residual images is
The mask pairs utilized for reconstruction in each scheme in our experiment are shown in Table 3, and the F0.5 score and the ideal inspection speed for schemes with different numbers of residual images, mask pairs, and thresholds are shown in Table 4.
Mask pairs selection under different reconstruction schemes in the experiment
F0.5 score and time cost for different schemes of the proposed method
We made a scale table for the F0.5 score at different thresholds of schemes with different fusing numbers to observe the distribution of optimal thresholds. The darker the color in the table, the higher the F0.5 value, and the higher the comprehensive performance of the corresponding scheme.
As shown in Table 4, under different circumstances, the interval of the threshold that can obtain the optimal solution is roughly constant in the selectable threshold range. It can be seen that the optimal threshold t should be between one half and two-thirds of the number of residual images. Considering only the integer threshold is practical in the proposed method, we choose the integer part of the two-thirds of the count of residual images as the threshold.
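The threshold rule stated above, the integer part of two-thirds of the number of residual images, can be written as a one-line helper. The function name and the lower bound of one are assumptions added so that the helper stays usable for very small counts.

```python
def fusion_threshold(n_residuals):
    """Integer part of two-thirds of the residual image count, at least 1."""
    if n_residuals < 1:
        raise ValueError("at least one residual image is required")
    return max(1, (2 * n_residuals) // 3)

thresholds = {n: fusion_threshold(n) for n in (3, 5, 7, 9)}
```

For nine residual images this gives a threshold of six, which matches the optimum reported for the nine-image scheme, and for five residual images it gives three.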
Based on the above segmentation threshold selection, the detection effects of the proposed method under different fusing image counts are shown in Figure 9.

F0.5 score for proposed method under different fusing image numbers.
Increasing the number of residual images taking part in fusion helps to widen the difference between the defect and the noise and improves the segmentation. The F0.5 score increases rapidly until the number of images reaches five, and then the curve flattens out. Considering that the computing cost of image reconstruction is related to the number of fused images, we utilize five residual images for fusing and defect segmentation in this paper.
Meanwhile, to demonstrate the effectiveness of the proposed method on defect-free samples, the scheme based on five residual images was tested on different kinds of defect-free fabrics, and some of the segmentation results are shown in Figure 10. The results indicate that the proposed method is sensitive only to abnormalities in the fabric and that it also produces correct detection results on defect-free fabric.

Segmentation results on different defect-free denim samples.
Comparison with other methods
In this part, fabric image reconstruction methods based on Fourier [8], PCA and Dictionary Learning [9,15] are compared with the proposed method. Experiments showed that the method based on Auto Encoder cannot reconstruct the fabric images in our dataset effectively, so it was excluded from the comparison. The reconstruction results of the fabric image and the related residuals are shown in Figure 11.

Reconstruction results and related residual images of different methods. (a) The proposed method; (b) Fourier-based method; (c) principal component analysis (PCA)-based method and (d) Dictionary Learning-based method.
As shown in Figure 11, when fabric images are reconstructed based on methods of Fourier, PCA or Dictionary Learning, the anomalies will be partly reconstructed, and the signal of defects will be weakened on the residual images. In comparison, the local reconstruction method proposed in this paper can keep more defect information.
The Fourier-based reconstruction method comes from the image noise reduction method. As the fabric texture is a kind of low-frequency information, by converting the image from the spatial domain to the frequency domain and performing low-pass filtering, noise reduction can be achieved. However, the fabric defects cannot be removed as effectively as noise through this method, and the defects and noise are difficult to distinguish after segmentation.
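The weakening of the defect signal under low-pass reconstruction can be illustrated on a one-dimensional intensity profile. The sketch below is a simplified stand-in for the two-dimensional Fourier-based method and is not the compared implementation: the sinusoidal "texture", the single-pixel defect, the cutoff of eight cycles and the function names `dft` and `lowpass_reconstruct` are all assumptions chosen for the example.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse applies the 1/n factor."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def lowpass_reconstruct(profile, cutoff):
    """Zero every frequency bin above `cutoff` cycles, then transform back."""
    n = len(profile)
    spectrum = dft(profile)
    kept = [v if min(k, n - k) <= cutoff else 0 for k, v in enumerate(spectrum)]
    return [v.real for v in dft(kept, inverse=True)]


n = 32
# Hypothetical periodic texture: four cycles across the profile.
texture = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
defective = list(texture)
defective[16] += 1.0  # hypothetical point defect of amplitude 1

clean_recon = lowpass_reconstruct(texture, cutoff=8)
recon = lowpass_reconstruct(defective, cutoff=8)
residual = [abs(a - b) for a, b in zip(defective, recon)]
```

The texture alone is reconstructed essentially exactly, but part of the defect also passes the filter: its residual amplitude drops from 1 to 15/32, and the remainder is spread over neighbouring positions as ringing. This is one way to see why defects and noise become hard to separate in a single residual image.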
Methods such as Auto Encoder, PCA, and Dictionary Learning are essentially dimensionality reduction for image data. They realize the reconstruction of the image through matrix transformation or linear dimension reduction, which means a mapping relationship exists between the generated image and the original image, and the abnormal area will participate in the reconstruction process and affect the reconstruction result. This relationship makes it difficult to obtain a balance between noise reduction and defect signal enhancement in residual images, and affects the segmentation of defects.
In contrast, the local image reconstruction method proposed in this paper reconstructs the entire image in an indirect way and avoids the mapping relationship between the images before and after reconstruction. Unlike the reconstruction methods above, this method predicts the defect-free texture of the target area.
On the one hand, this method avoids the influence of anomalies on reconstruction images, which helps to enhance the signal of abnormal areas in residual images. On the other hand, the method proposed in this paper can use the same training model to predict the same area multiple times and obtain multiple reconstruction results, which cannot be achieved by some traditional reconstruction methods. Combined with the fusion method proposed in this paper, the noise that is difficult to eliminate in a single residual image can be effectively distinguished, and the defect area can be segmented more accurately. Visual examples of the segmentation results are shown in Figure 12.

Visual examples of the segmentation results. (a) Raw Image; (b) Fourier-based method; (c) principal component analysis (PCA)-based method; (d) Dictionary Learning-based method; (e) the proposed method and (f) ground truth.
As shown in Figure 12, the method proposed in this paper effectively reduces the amount of noise in the segmented image and reduces the false detection rate while accurately segmenting the defects. The specific performance of different methods is shown in Table 5.
Comparison of the segmentation results
Table 5 shows the average detection results on the images of the test set. As shown in the table, although the methods based on Fourier, PCA and Dictionary Learning are all robust enough to detect defects on our denim fabric dataset, they all have precision problems. Compared with other fabrics, the style of denim fabrics is rougher and more complicated, and as mentioned above, the traditional image reconstruction methods are essentially dimensionality reduction of the image or are based on noise reduction. Either kind of reconstruction loses much of the high-frequency information of the raw image, so the residual image contains not only defect information but also noise generated from the details of the image. It is difficult for the traditional methods to eliminate this noise effectively by segmenting only one residual image, which affects the precision of defect segmentation.
The method proposed in this paper essentially predicts possible representations of the raw image from the information in the image itself, and it distinguishes defects from noise in the residual image by comparing a variety of such predictions. At the same time, compared with the traditional methods, the indirect reconstruction proposed in this paper preserves the intensity of the defect signal in the residual image, so more defect information is retained for segmentation. Thus, the precision and defect detection rate of the proposed segmentation method can be ensured. As shown in Table 5, the precision of the proposed method is generally above 80%, which is much higher than that of the other methods, and each scheme of the proposed method provides a high defect detection rate at a lower time cost. The proposed method can also provide relatively better segmentation results when efficiency is required. With more masks and more reconstruction results, it can achieve a very high detection rate and accuracy if necessary, but the inspection speed is then limited by the number of images that need to be reconstructed (the larger the number, the more batches need to be processed).
For the inspection speed, we assume that the width of the fabric on the inspection machine is approximately 200 cm. As the size of the input image of the proposed method is 10 cm
In summary, the method proposed in this paper can effectively improve the precision while keeping a trustworthy recall rate and defect detection rate. It segments the defect from the background effectively and reduces false detections.
Conclusions
This study proposes an unsupervised defect segmentation approach based on image local reconstruction, which combines defect segmentation and image reconstruction. The anomaly signal is strengthened and defect segmentation is made possible by combining image processing. Multiple residual images are created by repeatedly reconstructing the same image using different masks.
Experiments demonstrate that the method suggested in this research can effectively reduce the risk of mis-segmentation and achieve high precision defect segmentation when compared with other unsupervised methods (the inspection speed is approximately 60 m per min). The segmentation accuracy of the detection method in this paper can reach the precision of 85% on our dataset when the segmentation effect is guaranteed (recall rate 0.200) and the defect detection rate is achieved (89%), which is significantly higher than other image-reconstruction-based defect segmentation methods. This indicates that the approach we suggested helps to locate defects precisely. In the meantime, effective defect segmentation aids in automating the assessment of fabric quality, given that the standard criteria for fabric defect evaluation are based on the magnitude of the defects (e.g., the four-point system and the 10-point system).
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
