Dual-path segmentation network for automatic fabric defect detection

Abstract

Fabric defect detection plays a crucial role in the production process of the textile industry. Vision-based inspection methods have become increasingly common owing to their lower labor costs and high detection efficiency. As the accuracy requirements for fabric defect detection increase, methods must not only identify and locate defects accurately but also describe their morphological features. This poses a challenge for algorithm design, which must account for both the semantic and the texture information of the fabric. In this paper, we propose DPNet, an end-to-end dual-path segmentation network for fabric defect detection that extracts and fuses semantic and texture information to achieve high accuracy. The framework consists of two paths: a semantic path with narrow but deep layers that obtains high-dimensional features, and a texture path with wide and dense layers that extracts low-level details. To enhance the interaction between semantic and texture features, we develop a crossed attention fusion module. Evaluations show that the proposed method outperforms the compared methods in terms of mIoU on different datasets, reaching 75.84% on Ngan and 70.69% on AITEX. In addition, we developed an inspection platform and tested the proposed method online; it achieves online detection at a fabric speed of 40 m/min, which makes it well suited to practical production environments.

Keywords

Fabric defect detection; image segmentation; dual-path; deep learning

Fabric defect detection is a crucial task in the textile production process, as product quality directly affects the economic efficiency and reputation of textile enterprises.¹ Manual visual inspection is error-prone, time-consuming, and costly,² which motivates the exploration of alternative approaches.^3,4 Among these, machine-vision-based inspection is one of the most widely adopted.⁵

In the early stages of defect detection, hand-crafted feature extraction techniques were used, including statistical analysis,^6–8 frequency domain analysis,^9,10 model-based methods,^11–13 and others. However, with advances in the weaving process, fabric textures vary significantly and defect morphology has become more complex. This increases the amount of semantic information in fabric images, which limits the detection accuracy of these methods, reduces their generalization performance, and makes their widespread application challenging.

In recent years, propelled by rapid advances in computing, deep learning models have been widely used in image classification,^14–16 object detection,^17–19 and image segmentation.^20–24 Such models, for example convolutional neural networks (CNNs), are particularly effective at automatically extracting features suited to the task. By stacking many convolutional kernels, downsampling layers, and fully connected layers, deep networks can extract abstract semantic information to describe and distinguish different fabric textures and defects. However, although they generalize better, deep learning models often overlook fabric texture information, which leads to a coarse morphological description of defects. In addition, the accuracy of many deep learning models is limited to the image or region level, making it difficult to obtain precise defect shapes and edge details.

The fabric defect detection task necessitates precise identification and localization of defects, as well as a detailed description of their morphological features for subsequent operations such as defect grading. Unlike typical detection tasks such as autonomous driving, which require a high level of semantic abstraction regarding the identified object, fabric surfaces often have similar or periodically changing patterns that are rich in texture information. Meanwhile, the fabric also carries a certain amount of semantic information due to the varying complexity of its texture and morphology of defects. Therefore, there is a need for a model that can consider both the texture and semantic information of the fabric.

To this end, we propose an end-to-end defect detection model with two distinct paths that effectively combines texture and semantic features. The proposed method not only identifies defect locations but also describes quantitative features, such as defect size and area, at the pixel level. The architecture comprises two paths: the texture path, which uses dense connections to reuse shallow features more frequently and thereby increases the weight of texture information in the decision; and the semantic path, which uses residual modules to quickly deepen the network and extract more semantic information. Moreover, we introduce a crossed attention fusion module to merge the two types of features effectively. To validate the effectiveness of the proposed method, we conducted experiments on three different datasets.

The main contributions are as follows:

We propose a novel dual-path segmentation model for fabric defect detection that extracts both texture and semantic information from fabric images to detect defects efficiently.

To enable the interaction of various feature types and enhance the segmentation results, we introduce a crossed attention fusion module that can merge features effectively.

We develop an online inspection platform and test our model in a real-world production setting. Our experimental results demonstrate that our method can achieve impressive results on different datasets and detect defects at a speed of 40 m/min, making it suitable for practical production environments.

Related works

According to the output form of results, fabric defect detection using deep learning can be classified into three approaches: classification, detection, and segmentation. Classification approaches have image-wise accuracy, while detection approaches have patch-wise accuracy. Segmentation approaches, on the other hand, offer pixel-wise accuracy.

Classification approaches regard the image as a whole and identify defects based on high-dimensional features extracted at the top-most part of the model. Jun et al.²⁵ proposed a two-stage method, consisting of local defect prediction in the first stage and global defect recognition in the second stage. Li et al.²⁶ proposed a structure comprising several micro-architectures that can help reduce the number of model parameters, whereas Liu et al.²⁷ used a modified VGG16 model to identify defects in fabric with complex texture. However, while classification approaches are effective at determining the presence of defects in an image, their accuracy is limited to the image level, which cannot meet the requirements of defect location in detection tasks. To address this limitation, some researchers have used the method of dividing images into small patches and inputting them into the model separately to achieve rough localizations of defects.^28–30 However, this introduces a large number of repetitive operations and increases the time cost.

Detection approaches use bounding boxes to indicate defect locations in fabric images.^31–33 To improve accuracy and time efficiency, Zhao et al.³⁴ proposed a method based on a multi-scale convolutional neural network, while Peng et al.³⁵ introduced prior anchor boxes and a feature pyramid structure into their model. Detection approaches are constrained by manually set candidate boxes, which limits their ability to describe defect scale accurately: large defects may be identified as collections of small defects, while small defects may be missed or grouped with other nearby defects. Consequently, these approaches cannot describe defect boundaries in detail, making it difficult to extract quantitative features such as defect size and area.

As end-to-end models, segmentation approaches produce output maps of the same size as the input image, which allows pixel-wise accuracy. This is more in line with human intuition and more convenient for the subsequent extraction of quantitative features.^36,37 Jing et al.³⁸ reduced the computational cost and model size of the network by introducing depth-wise separable convolution into the UNet model. Tao et al.³⁹ achieved defect segmentation using a cascaded autoencoder structure. Furthermore, segmentation approaches can handle input images of any size because they discard the final fully connected layers in favor of a fully convolutional structure. When fabric defect detection is treated as an image segmentation task, both texture and semantic information must be considered to obtain accurate segmentation results.

Proposed method

In this section, we present the dual-path segmentation network, as illustrated in Figure 1. The network extracts distinct features through the texture path and semantic path, and then merges them at the end (see Crossed attention fusion module) to obtain a more comprehensive feature representation. We will provide a detailed explanation of each module’s primary functions and design principles below.

Figure 1.

Pipeline of the proposed dual-path segmentation network. The architecture consists of two main components: the backbone network (green dashed box) and the fusion module (red dashed box). The backbone network consists of the texture path (orange) and the semantic path (blue). The stages of the semantic and texture paths are denoted $S_{1}$ ∼ $S_{5}$. The numbers indicate each feature map's resolution relative to the input image. The crossed attention fusion module merges the two types of features; ⊗ denotes the element-wise product.

Texture path

Fabric inherently contains a plethora of textural information, primarily extracted by the network’s shallow layers. To encode richer textural features, the texture path has been devised with dense interconnections and ample width. The corresponding parameters have been tabulated in Table 1.

Table 1.

Details of texture path

Stage	Operation	Size/stride	Output size
S₁	Conv + BN + ReLU	3 × 3/2	128 × 128 × 64
S₁	Conv + BN + ReLU	3 × 3/1	128 × 128 × 64
S₂	Conv + BN + ReLU	3 × 3/2	64 × 64 × 64
S₂	DB	[1 × 1/1, 3 × 3/1] × 6	64 × 64 × 256
S₃	Conv + BN + ReLU	1 × 1/1	64 × 64 × 128
S₃	AP	3 × 3/2	32 × 32 × 128
S₄	DB	[1 × 1/1, 3 × 3/1] × 12	32 × 32 × 512

Conv + BN + ReLU denotes convolutional operation followed by a batch normalization and a ReLU activation function.

All convolutions use "same" padding, so stride-1 convolutions keep the feature map size unchanged.

DB denotes the dense module.

AP is the average pooling layer.

$S_{1}$ ∼ $S_{4}$ correspond to the stages in Figure 1.

The texture path is composed of four stages, and dense modules⁴⁰ are introduced in S₂ and S₄. The output feature map $y_{l}$ of layer $l$ can be derived by the following equation:

$$y_{l} = H_{l}\left([y_{0}, y_{1}, \dots, y_{l-1}]\right) \qquad (1)$$

where $H_{l}$ denotes the convolutional operation and $[y_{0}, y_{1}, \dots, y_{l-1}]$ represents the concatenation of the feature maps generated from layer 0 to layer $l-1$.

As the number of layers in a dense module increases, shallow features are reused more frequently. For instance, in a dense module of depth $l$, the feature maps of layer 0 (the bottom layer) are reused $l - 1$ times, those of layer 1 are reused $l - 2$ times, and so on. By reusing shallow feature maps, the model captures richer textural information, which amplifies the weight of textural information in the final decision. To reduce the channel dimension and resolution of the feature maps, a $1×1$ convolution and an average pooling layer are inserted between the two dense modules. The texture path is downsampled only three times to limit the loss of spatial textural information caused by resolution reduction, so its final output has 1/8 of the input resolution.
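The channel and reuse bookkeeping of the texture path can be checked with a few lines of Python. This is a minimal sketch rather than the authors' code: the function names are hypothetical, and the growth rate of 32 is our inference from Table 1 (64 + 6 × 32 = 256 and 128 + 12 × 32 = 512), not a value stated in the paper.

```python
def dense_block_channels(c_in, num_layers, growth_rate):
    """Output channels of a dense block: every layer appends growth_rate maps
    to the running concatenation [y_0, y_1, ..., y_{l-1}] of Eq. (1)."""
    return c_in + num_layers * growth_rate


def reuse_counts(depth):
    """How often each layer's output is consumed by later layers in a dense
    block of the given depth (layers indexed 0 .. depth - 1)."""
    return [depth - 1 - j for j in range(depth)]


def output_resolution(input_size, num_downsamplings):
    """Spatial size after repeated stride-2 downsampling."""
    return input_size // (2 ** num_downsamplings)


# Growth rate 32 is an assumption inferred from Table 1.
print(dense_block_channels(64, 6, 32))    # S2 dense module -> 256
print(dense_block_channels(128, 12, 32))  # S4 dense module -> 512
print(reuse_counts(4))                    # -> [3, 2, 1, 0]
print(output_resolution(256, 3))          # three downsamplings -> 32, i.e. 1/8
```

The reuse counts make the argument in the text concrete: the earliest feature maps are consumed most often, which is what raises the share of shallow, texture-rich features in the final decision.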

Semantic path

To achieve more precise distinction between defects and the background, the semantic path adopts a network structure with a higher number of layers. This design enables the model to expand its receptive field rapidly and capture more contextual information. The specific parameters are shown in Table 2. Some key modules in the semantic path are as follows:

Table 2.

Details of semantic path

Stage	Operation	Size/stride	Output size
S₁	RI	3 × 3/2	128 × 128 × 16
S₁		3 × 3/2	64 × 64 × 32
S₂		3 × 3/1	64 × 64 × 32
S₂	RR	(3 × 3/1) × 2	64 × 64 × 32
S₃	RR	3 × 3/2	32 × 32 × 64
S₃	RR	3 × 3/1	32 × 32 × 64
S₄	RR	3 × 3/2	16 × 16 × 128
S₄	RR	3 × 3/1	16 × 16 × 128
S₅	RR	(3 × 3/1) × 10	16 × 16 × 256

RI represents the refined inception module.

RR represents the refined residual module.

$S_{1}$ ∼ $S_{5}$ correspond to the stages in Figure 1.

Refined inception module

Inspired by GoogLeNet,⁴¹ we downsample the feature maps with parallel pooling and convolution branches and concatenate their outputs at the end. This adds little computational cost while mitigating the information loss caused by the reduction in resolution. The structure is illustrated in Figure 2.
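The parallel branches can only be concatenated if both shrink the feature map by the same factor. The sketch below checks this with the standard output-size formula. The branch kernels, the padding, and the 13 + 3 channel split (chosen only because it happens to give the 16 output channels listed for RI in Table 2) are assumptions for illustration, not the configuration shown in Figure 2.

```python
def out_size(size, kernel, stride, padding):
    """Output size of a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def inception_downsample_shape(h, w, c_in, conv_channels):
    """Shape after concatenating a strided 3x3 conv branch with a max-pooling
    branch. Pooling keeps c_in channels, so the channel counts add up."""
    conv_hw = (out_size(h, 3, 2, 1), out_size(w, 3, 2, 1))  # 3x3/2, padding 1 (assumed)
    pool_hw = (out_size(h, 2, 2, 0), out_size(w, 2, 2, 0))  # 2x2/2 max pooling (assumed)
    if conv_hw != pool_hw:
        raise ValueError("branch outputs must have the same spatial size")
    return conv_hw[0], conv_hw[1], conv_channels + c_in


print(inception_downsample_shape(256, 256, 3, 13))  # -> (128, 128, 16)
```

Because the pooling branch costs no learned parameters and preserves every input channel, the strided convolution only has to supply the additional channels, which is why this kind of block is cheap.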

Figure 2.

Details of the refined inception module. Conv refers to the convolutional operation followed by batch normalization and a ReLU activation function. MPooling is the max pooling layer. $1×1$ and $3×3$ denote kernel sizes. $H \times W \times C$ denotes the size of the feature map (height, width, depth); © denotes concatenation.

Refined residual module

Because the feature maps of the semantic path have a limited number of channels, we draw inspiration from MobileNetV2⁴² and design a refined residual module, as shown in Figure 3. First, we expand the number of input channels by a factor of 6 and then extract features with depth-wise separable convolution. We then append a squeeze-and-excitation (SE)⁴³ attention module to enhance the feature representation capability of the residual module, as shown in Figure 3(c). The output y of SE is:

$$y = x \otimes \sigma\left(H_{1}\left(\mathrm{ReLU}\left(H_{2}\left(H_{ga}(x)\right)\right)\right)\right) \qquad (2)$$

where $H_{1}$ and $H_{2}$ are $1×1$ convolutions, $H_{ga}$ is global average pooling, $\sigma$ is the sigmoid function, and $\otimes$ is the element-wise product. The residual structure preserves the gradient better through identity mapping.
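To make the data flow of equation (2) concrete, the pure-Python sketch below applies it to a toy C × H × W feature map stored as nested lists. After global average pooling the map is 1 × 1, so each 1 × 1 convolution reduces to a matrix-vector product. Omitting biases and the weight shapes (a reduction ratio r) are assumptions; the paper does not give these details.

```python
import math


def se_block(x, w_reduce, w_expand):
    """Squeeze-and-excitation rescaling of Eq. (2), without biases (assumed).

    x        : C x H x W nested lists (one H x W map per channel)
    w_reduce : weights of H_2, shape (C/r) x C
    w_expand : weights of H_1, shape C x (C/r)
    """
    # H_ga: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # H_2 followed by ReLU (a 1x1 conv on a 1x1 map is a matrix-vector product).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w_reduce]
    # H_1 followed by the sigmoid gives one weight in (0, 1) per channel.
    scale = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
             for row in w_expand]
    # Element-wise product, broadcasting each channel weight over H x W.
    return [[[v * s for v in row] for row in ch] for ch, s in zip(x, scale)]


x = [[[2.0, 4.0]], [[6.0, 8.0]]]  # C = 2, H = 1, W = 2
print(se_block(x, [[0.0, 0.0]], [[0.0], [0.0]]))  # all-zero weights scale every channel by 0.5
```

With all-zero weights the sigmoid outputs 0.5 for every channel, which is a convenient sanity check; trained weights instead emphasize some channels and suppress others.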

Figure 3.

Details of refined residual module; (a) is used when the resolution of the output feature map is reduced; (b) is used when the input and output feature map resolutions are the same; (c) details of squeeze and excitation (SE) in (a) and (b). DWConv refers to the depth-wise convolution, and Conv refers to convolutional operation, which is followed by a batch normalization in (a) and (b). MPooling is the max pooling layer. GAPooling is the global average pooling layer. Sigmoid means the sigmoid activation function, and ReLU means the ReLU activation function. $1×1$ , $3×3$ means the kernel size. $H \times W \times C$ denotes the size of the feature map (height, width, depth).

Crossed attention fusion module

The texture path and semantic path represent complementary features. Combining the two through simple methods, such as pixel-wise summation or channel-wise concatenation, does not effectively enhance segmentation performance because it ignores the differences between the two information types.

Based on the analysis presented above, we propose a crossed attention fusion module to merge the complementary information from both paths, as shown in Figure 4. The channel attention block (CAB) guides the texture path from the channel dimension to distinguish the defects from the background better, while the spatial attention block (SAB) guides the semantic path from the spatial dimension to make a more detailed segmentation of the defect boundary. SAB and CAB are calculated as follows:

$$\mathrm{SAB}(x) = \sigma\left(H\left([H_{ca}(x), H_{cm}(x)]\right)\right) \qquad (3)$$

$$\mathrm{CAB}(x) = \sigma\left(H_{1}(H_{0}(H_{ga}(x))) + H_{1}(H_{0}(H_{gm}(x)))\right) \qquad (4)$$

where $\sigma$ denotes the sigmoid function. For SAB, $H_{ca}$ and $H_{cm}$ represent channel average pooling and channel max pooling, $[\cdot, \cdot]$ denotes concatenation, and $H$ is a $3×3$ convolution. For CAB, $H_{1}$ and $H_{0}$ are $1×1$ convolutions whose weights are shared by both inputs, and $H_{ga}$ and $H_{gm}$ are global average pooling and global max pooling, respectively.
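Equations (3) and (4) can be sketched the same way. The helpers below are hypothetical and written in plain Python for readability rather than speed; biases are omitted, H is assumed to use zero padding, and, as in equation (4), no activation is placed between $H_{0}$ and $H_{1}$.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def cab(x, w0, w1):
    """Channel attention block, Eq. (4): one weight per channel, no biases.

    x: C x H x W nested lists; w0 (H_0): (C/r) x C; w1 (H_1): C x (C/r).
    The same w0 / w1 are shared by the average- and max-pooled descriptors.
    """
    n = len(x[0]) * len(x[0][0])
    avg = [sum(map(sum, ch)) / n for ch in x]  # H_ga
    mx = [max(map(max, ch)) for ch in x]       # H_gm

    def mlp(z):                                # H_1(H_0(z)), as written in Eq. (4)
        hidden = [sum(w * v for w, v in zip(row, z)) for row in w0]
        return [sum(w * v for w, v in zip(row, hidden)) for row in w1]

    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def sab(x, kernel):
    """Spatial attention block, Eq. (3): one weight per pixel, no bias.

    kernel (H): 2 x 3 x 3 weights applied with zero padding to the
    concatenation [H_ca(x), H_cm(x)], producing a single-channel map.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    ca = [[sum(x[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]  # H_ca
    cm = [[max(x[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]      # H_cm
    maps = (ca, cm)
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for m in range(2):
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                            acc += kernel[m][di + 1][dj + 1] * maps[m][ii][jj]
            row.append(sigmoid(acc))
        out.append(row)
    return out


def apply_channel_weights(x, weights):
    """Multiply every pixel of channel k by weights[k]."""
    return [[[v * s for v in row] for row in ch] for ch, s in zip(x, weights)]


def apply_spatial_weights(x, weights):
    """Multiply pixel (i, j) of every channel by weights[i][j]."""
    return [[[v * s for v, s in zip(row, srow)] for row, srow in zip(ch, weights)] for ch in x]
```

The text states that CAB guides the texture path along the channel dimension and SAB guides the semantic path along the spatial dimension, which maps onto `apply_channel_weights` and `apply_spatial_weights` respectively. Which features each attention map is computed from, and how the reweighted maps are combined afterwards, is specified in Figure 4 rather than in the text, so this sketch leaves those choices open.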

Figure 4.

Details of crossed attention fusion module; (a) shows the overview of the crossed attention fusion module; (b) and (c) are the details of the spatial attention block (SAB) and channel attention block (CAB) structure. DWConv refers to the depth-wise convolution, and Conv refers to the convolutional operation, which is followed by a batch normalization in (a) and (b). APooling is the average pooling layer; GAPooling and GMPooling are the global average pooling layer and the global max pooling layer, respectively. CAPooling and CMPooling are the channel average pooling layer and the channel max pooling layer, respectively. Sigmoid means the Sigmoid activation function, and ReLU means the ReLU activation function; $1×1$ , $3×3$ means the kernel size; $H \times W \times C$ denotes the size of the feature map (height, width, depth); ⊗ denotes the element-wise product; ⊕ denotes pixel-wise summation, and © denotes concatenation.

Compared with simple combinations, this type of guided fusion allows for effective interaction between the two types of features and reduces the gap between them.

Segmentation head

The segmentation head integrates the fused features with $3×3$ and $1×1$ convolutions and upsamples the result to the input image size, as illustrated in Figure 5.

Figure 5.

Details of the segmentation head. Conv refers to the convolutional operation; BN is the batch normalization layer; ReLU means the ReLU activation function, and Upsample means bilinear interpolation. $1×1$ and $3×3$ denote kernel sizes; $H \times W \times C$ denotes the size of the feature map (height, width, depth).
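The final upsampling step can be illustrated with a minimal bilinear resize in plain Python. The paper does not state which sampling convention its bilinear interpolation uses; this sketch assumes the align-corners convention, in which the corner pixels of the input and output grids coincide.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Bilinearly resize a 2D list to out_h x out_w (align-corners convention)."""
    if out_h < 2 or out_w < 2:
        raise ValueError("output must be at least 2 x 2")
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1)  # source row coordinate
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1)  # source column coordinate
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
            bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


print(bilinear_upsample([[0.0, 1.0], [2.0, 3.0]], 3, 3))
```

Interpolating the logits rather than thresholded labels keeps defect boundaries smooth at full resolution, which matters for the pixel-level area and size measurements described later.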

The binary cross-entropy loss in equation (5) is used in this study:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}\log(p_{i}) + (1 - y_{i})\log(1 - p_{i})\right) \qquad (5)$$

where $N$ denotes the batch size, $p_{i}$ is the detection output of the model for sample $i$, and $y_{i}$ represents the corresponding ground truth.
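A direct transcription of equation (5) is shown below. Each list entry is one predicted probability (for example, one pixel of a flattened segmentation map); clipping the probabilities away from 0 and 1 is our addition to avoid log(0) and is not described in the paper.

```python
import math


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy of Eq. (5).

    targets: ground-truth labels y_i in {0, 1}
    probs:   predicted defect probabilities p_i
    """
    if len(targets) != len(probs) or not targets:
        raise ValueError("targets and probs must be non-empty and equally long")
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clipping is an added safeguard
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(targets)


print(binary_cross_entropy([1, 0], [0.5, 0.5]))  # uninformative prediction: log(2)
```

An uninformative prediction of 0.5 everywhere costs log 2 per entry, while confident correct predictions drive the loss toward zero.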

Experiments

In this section, we present a series of experiments that evaluate the performance of the proposed method. First, we introduce the datasets and evaluation metrics and describe the implementation details. Next, we verify the effectiveness of each module of DPNet on the Ngan dataset. Finally, we compare DPNet with other deep learning algorithms, namely SegNet,²⁰ FCN,²¹ PSPNet,⁴⁴ and DeepLabv3+,⁴⁵ on different datasets.

Datasets and metrics

Datasets

We evaluated the performance of the proposed method on two different datasets, namely Ngan and AITEX.

Ngan⁴⁶ is provided by the Industrial Automation Research Laboratory, Department of Electrical and Electronic Engineering, the University of Hong Kong. The dataset contains three fabric structures (dot, box, and star patterns), each of which contains 26 images covering five defect categories and 30 defect-free images. All images are RGB with a resolution of $256×256$. Sixty-five images are used as the training set and 16 images as the validation set.

AITEX⁴⁷ contains 245 grayscale images of $256×4096$ pixels captured from seven different fabric structures, covering 12 types of common fabric defects and including 140 defect-free images. To address the imbalance between positive and negative samples, only defective samples are used to train the model.

Metrics

We adopt the mean intersection over union (mIoU) and the intersection over union on the defect region (${IoU}_{def}$) as evaluation metrics, as illustrated in Figure 6. They are calculated with equations (6), (7), and (8):

$${IoU}_{def} = \frac{TP}{TP + FN + FP} \qquad (6)$$

$${IoU}_{def\_free} = \frac{TN}{TN + FN + FP} \qquad (7)$$

$$mIoU = \frac{1}{2}\left({IoU}_{def} + {IoU}_{def\_free}\right) \qquad (8)$$

where ${IoU}_{def\_free}$ is the intersection over union on the defect-free region.
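The counts in equations (6) to (8) can be computed directly from a predicted mask and a ground-truth mask. In this sketch, a region that is empty in both masks is assigned an IoU of 1.0; that convention, and computing the metrics per image rather than accumulating counts over the whole dataset, are assumptions because the paper does not specify them.

```python
def iou_metrics(pred, gt):
    """IoU_def, IoU_def_free and mIoU (Eqs. 6-8) from two equally sized
    binary masks given as 2D lists, where 1 marks a defect pixel."""
    tp = fp = fn = tn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    # Empty-denominator convention (assumed): a class absent from both masks scores 1.0.
    iou_def = tp / (tp + fn + fp) if tp + fn + fp else 1.0
    iou_free = tn / (tn + fn + fp) if tn + fn + fp else 1.0
    return iou_def, iou_free, (iou_def + iou_free) / 2


print(iou_metrics([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
```

In the example, one false-positive pixel yields ${IoU}_{def}$ = 1/2 and ${IoU}_{def\_free}$ = 2/3. Because defect pixels are usually a small minority, ${IoU}_{def}$ is the more sensitive of the two numbers.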

Figure 6.

Schematic diagram of the mIoU calculation on fabric; (a) marks the ground-truth region and the detection result on the fabric, and (b) shows the relationship between the ground truth and the detection result, where TP is true positive, TN is true negative, FP is false positive, and FN is false negative.

Implementation details

Training

The parameters were initialized randomly from a Gaussian distribution. We optimized them with mini-batch stochastic gradient descent (SGD) using a batch size of 12, a momentum of 0.9, and a weight decay of $1× 10^{-4}$. We applied the "poly" learning rate strategy, in which the learning rate at each epoch equals the initial learning rate multiplied by ${(1 - \frac{epoch}{{epoch}_{\max}})}^{power}$. The initial learning rate was set to 0.01 and power to 0.9. We trained the networks for 700 epochs or until convergence, depending on the dataset.
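The "poly" schedule is a one-line function of the epoch. The sketch uses the settings above (initial rate 0.01, power 0.9) with 700 epochs; in PyTorch the same factor could, for example, be supplied as the `lr_lambda` of `torch.optim.lr_scheduler.LambdaLR`, although the paper does not say how the schedule was implemented.

```python
def poly_lr(base_lr, epoch, max_epochs, power=0.9):
    """'Poly' schedule: the initial rate scaled by (1 - epoch / max_epochs) ** power.
    The factor is always applied to base_lr, not compounded from epoch to epoch."""
    return base_lr * (1 - epoch / max_epochs) ** power


for epoch in (0, 350, 699):
    print(epoch, poly_lr(0.01, epoch, 700))
```

With power below 1 the rate decays slowly at first and drops steeply only near the end of training.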

For data augmentation, we randomly cropped the input images to $256 × 256$ pixels and applied random horizontal flipping, random vertical flipping, random $180°$ rotation, and random brightness changes. Because the relative position of the camera and the fabric does not change, the scale of defects in the images stays constant; we therefore did not apply random scaling.
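The augmentation steps above can be sketched on plain 2D lists. One point worth making explicit for segmentation is that every geometric transform must be applied identically to the image and its label mask, while the brightness change applies only to the image. The flip probabilities and the brightness range are illustrative assumptions; the paper does not report them.

```python
import random


def hflip(grid):
    """Mirror a 2D list left-right."""
    return [row[::-1] for row in grid]


def vflip(grid):
    """Mirror a 2D list top-bottom."""
    return grid[::-1]


def rotate180(grid):
    """Rotate a 2D list by 180 degrees."""
    return vflip(hflip(grid))


def augment(image, mask, crop, rng, max_delta=0.1):
    """Random crop, flips, 180-degree rotation and brightness shift.

    image: 2D list of intensities in [0, 1]; mask: 2D list of 0/1 labels.
    Geometric transforms are applied identically to both, brightness only to
    the image. The 0.5 probabilities and max_delta are illustrative choices.
    """
    h, w = len(image), len(image[0])
    top, left = rng.randint(0, h - crop), rng.randint(0, w - crop)
    image = [row[left:left + crop] for row in image[top:top + crop]]
    mask = [row[left:left + crop] for row in mask[top:top + crop]]
    for transform in (hflip, vflip, rotate180):
        if rng.random() < 0.5:
            image, mask = transform(image), transform(mask)
    delta = rng.uniform(-max_delta, max_delta)  # additive brightness shift (assumed)
    image = [[min(1.0, max(0.0, v + delta)) for v in row] for row in image]
    return image, mask  # no random scaling, as in the text


rng = random.Random(0)
img = [[(r * 8 + c) / 64 for c in range(8)] for r in range(8)]
lbl = [[1 if 2 <= r < 4 and 2 <= c < 5 else 0 for c in range(8)] for r in range(8)]
aug_img, aug_lbl = augment(img, lbl, 4, rng)
```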

Inference

To balance accuracy and real-time performance in practical production environments, we did not apply inference tricks such as multi-scale testing or multi-scale cropping, which may improve accuracy but are time-consuming. The batch size was set to 2. For offline testing, the input images were cropped to $256 × 256$ pixels in advance; for online testing, the image resolution was left unchanged.

Set-up

All experiments were implemented in the PyTorch 1.8.1 deep learning framework and run on NVIDIA GeForce GTX 1660 GPUs.

Ablation studies

In this subsection, several ablation experiments are conducted on the Ngan dataset to assess the different modules.

Ablation studies on individual path

First, ablation studies were carried out on the individual paths to explore the segmentation performance of the texture path and the semantic path. As the first, fourth, and fifth rows of Table 3 show, even simple fusion by pixel-wise summation improves mIoU by 2.12 and 2.90 percentage points over the semantic path and the texture path alone, respectively, suggesting that texture and semantic information are complementary and both essential in the defect detection task.

Table 3.

Ablation studies on Ngan

Path		Basic module			Fusion			Result
Texture	Semantic	Plain	Residual	Dense	Sum	Concat	Ours	mIoU (%)	${IoU}_{def}$ (%)
	✓							72.35	48.63
✓		✓						68.50	42.85
✓			✓					70.98	46.43
✓				✓				71.57	48.16
✓	✓			✓	✓			74.47	51.20
✓	✓			✓		✓		74.66	50.41
✓	✓			✓			✓	75.84	52.57

Ablation studies on basic module

The second, third, and fourth rows of Table 3 show the impact of different basic modules in the texture path. Compared with two other common convolution structures, the plain module and the residual module, the dense module achieved the highest mIoU and ${IoU}_{def}$, 71.57% and 48.16%, respectively, suggesting that reusing shallow features helps achieve better segmentation results.

Ablation studies on feature fusion

We also analyzed the segmentation results of the different fusion methods in Table 3. The proposed crossed attention fusion module achieved an mIoU of 75.84% and an ${IoU}_{def}$ of 52.57%, improving mIoU by 1.37 and 1.18 percentage points and ${IoU}_{def}$ by 1.37 and 2.16 percentage points over pixel-wise summation and channel-wise concatenation, respectively. This indicates that the proposed module enables the interaction of the two feature types and effectively improves segmentation results.

Accuracy analysis

To avoid sample imbalance, images were cropped only around defect regions, because defects occupy only a portion of each image. The detection results on Ngan and AITEX are visualized in Figure 7 and Figure 8, and detailed metrics are listed in Table 4 and Table 5. Because fabric images contain a large amount of texture information, models that prioritize semantic information, such as PSPNet, yield less accurate segmentation than models such as FCN, which use skip connections to supplement texture information. Our method still outperforms the other methods in segmenting defect boundary details, achieving mIoU scores of 75.84% on Ngan and 70.69% on AITEX; nevertheless, skip connections are a viable way to supplement texture information and will be explored in future work.

Figure 7.

Detection results of different methods on Ngan. White denotes the defect region and black the defect-free region. (a) Original image. (b) DeepLabv3+. (c) FCN. (d) PSPNet. (e) SegNet. (f) DPNet. (g) Ground truth.

Figure 8.

Detection results of different methods on AITEX. White denotes the defect region and black the defect-free region. (a) Original image. (b) DeepLabv3+. (c) FCN. (d) PSPNet. (e) SegNet. (f) DPNet. (g) Ground truth.

Table 4.

Evaluation metrics of different methods on Ngan

Dataset	Method	mIoU (%)	${IoU}_{def}$ (%)
Ngan	PSPNet	71.69	44.33
	SegNet	73.38	47.54
	FCN	74.31	51.54
	DeepLabv3+	72.87	46.85
	DPNet	75.84	52.57

Table 5.

Evaluation metrics of different methods on AITEX

Dataset	Method	mIoU (%)	${IoU}_{def}$ (%)
AITEX	PSPNet	66.62	34.75
	SegNet	67.06	35.61
	FCN	69.57	40.61
	DeepLabv3+	67.66	37.03
	DPNet	70.69	42.79

Discussion

Visualization of feature map

In this subsection, we aim to gain a deeper understanding of how different paths respond to various fabric features by visualizing the feature maps of the texture and semantic path outputs,⁴⁸ as shown in Figure 9. In particular, in Figure 9(b), we observe that the texture path is more sensitive to the texture pattern of the fabric and the edge details of the defects, whereas in Figure 9(c) the semantic path excels at identifying and locating the defect regions. By combining the features extracted from both paths using the proposed fusion module, the resulting class activation map in Figure 9(d) demonstrates that our method effectively integrates texture and semantic information to improve the overall segmentation results.

Figure 9.

Gradient-weighted class activation maps (Grad-CAM) of the DPNet model on the AITEX dataset. (a) Original image; (b) CAM of the texture path; (c) CAM of the semantic path; (d) CAM after fusion.

Online detection

In this subsection, we present the online testing results of our model. For image acquisition, we used two CMOS line-scan cameras. Their combined field of view of about 1 m covers the full width of the fabric (about 0.7 m), ensuring complete coverage of the fabric surface. The captured images are 2048 × 256 pixels. Tunnel lighting was used for illumination. To obtain uniform, minimally distorted images, an encoder triggers camera capture according to the actual motion of the fabric. Because equipment vibration can degrade image quality, the camera and light source were mounted independently of the detection platform. The device configuration is listed in Table 6.
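Encoder-triggered line-scan capture ties the required camera line rate to the fabric speed. The arithmetic below is a sketch under explicit assumptions not stated in the paper: square pixels, each camera covering half of the roughly 1 m field of view with 2048 pixels and no overlap, and 256 lines per image.

```python
def line_rate_hz(speed_m_per_min, fov_m, pixels_across):
    """Line-trigger rate for square pixels: one line per pixel-length of
    fabric travel, so rate = fabric speed / pixel size."""
    pixel_m = fov_m / pixels_across
    return speed_m_per_min / 60.0 / pixel_m


def frames_per_second(speed_m_per_min, fov_m, pixels_across, lines_per_frame):
    """Images per second produced by one camera at that line rate."""
    return line_rate_hz(speed_m_per_min, fov_m, pixels_across) / lines_per_frame


# Assumed: 0.5 m per camera, 2048 pixels across, 256 lines per image.
rate = line_rate_hz(40, 0.5, 2048)
fps = frames_per_second(40, 0.5, 2048, 256)
print(f"{rate:.0f} lines/s per camera, {fps:.1f} images/s per camera")
```

Under these assumptions each camera would need about 2731 lines/s and would deliver roughly 10.7 images per second at 40 m/min, which gives a rough sense of the inference throughput the online system must sustain.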

Table 6.

Device configuration details

Device	Configuration
RAM	32 GB
GPU	NVIDIA GeForce RTX 3070
CPU	Intel Core i7-8700K

We built our own dataset, called warp knitted fabric (WKF), from several rolls of defective warp-knitted cloth collected from two factories in Fujian and Shanghai over a period of 2 months, yielding 731 defective images. These images were captured with the prototype shown in Figure 10(b) and contain eight types of defects, including broken yarns, hooked yarns, holes, knots, and soiling. After cropping and shifting, a total of 1512 images were selected as the final dataset samples. We first trained the model on this self-built dataset and then used it for online detection. The results are illustrated in Figure 11, showing that the proposed method achieves satisfactory results at a detection speed of 40 m/min.

Figure 10.

Automatic fabric defect detection platform; (a) schematic diagram of the inspection platform; (b) photograph of the detection platform. The light source (3) provides illumination for the fabric (1). The movement of the fabric (1) rotates the encoder (4), which triggers the camera (2) to capture images. The images (7) are transferred to the memory of the computer (5) through the frame grabber (6). During online detection, the images (7) are sent to the proposed model (8) to obtain the segmentation result (9).

Figure 11.

Visualization of online detection. The first row is original images. The second row is the corresponding segmentation results. The third row is examples of quantitative feature extraction.

Several examples of quantitative feature extraction are also shown in Figure 11. We first labeled the connected components of the segmentation result to determine the number of defects. We then calculated the area of each defect as the number of pixels in its connected component. Finally, we computed the bounding rectangle from the contour coordinates and derived the length and width of each defect.
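The three steps above (connected components, pixel counts, bounding rectangles) can be sketched in pure Python. This sketch assumes 8-connectivity and axis-aligned bounding boxes measured in pixels; the paper may use a rotated minimum-area rectangle, and converting to physical units requires the pixel size of the imaging setup. In practice, OpenCV's `cv2.connectedComponentsWithStats` returns per-component areas and bounding boxes directly.

```python
from collections import deque


def defect_features(mask):
    """Label 8-connected defect regions in a binary 2D mask and return, per
    region, its pixel area and the height/width of its axis-aligned box."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited defect pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            regions.append({
                "area": len(pixels),                # number of defect pixels
                "height": max(ys) - min(ys) + 1,    # bounding-box extent in rows
                "width": max(xs) - min(xs) + 1,     # bounding-box extent in columns
            })
    return regions


mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
print(len(defect_features(mask)), defect_features(mask))
```

The number of returned regions is the defect count, and the area and box extents feed directly into downstream steps such as defect grading.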

Conclusions

In this paper, we propose a novel dual-path model for fabric defect segmentation that achieves high pixel-level accuracy, which facilitates subsequent quantitative feature extraction. The proposed model takes both texture and semantic information into account: dense connections reuse shallow features and amplify the significance of texture details in the decision-making process, while residual modules quickly deepen the model to acquire semantic information. To integrate semantic and texture information more efficiently, we also devised a crossed attention fusion module.

Comprehensive experiments on several datasets show that the proposed method achieves mIoU scores of 75.84% on Ngan and 70.69% on AITEX. Furthermore, we developed an automatic fabric defect detection platform on which the model is deployed for online testing. With a detection speed of 40 m/min, the system meets the demands of actual production.

It is worth noting that the fabric defect dataset used in this study is small, which limits the model's performance. We will therefore focus on collecting more fabric defect samples, with the ultimate aim of creating a publicly accessible dataset for research. Moreover, defective samples are difficult to collect because they are rare in production, whereas defect-free samples are abundant. In the future, we plan to investigate anomaly detection methods that train models using only defect-free samples.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of the article: This work was supported by the National Key Research and Development Project (grant number 2018YFB1308800).

ORCID iDs

Zhiqi Yu

Yang Xu

Yuekun Wang

References

1. Zheng, Kong, et al. Recent advances in surface defect inspection of industrial products using deep learning techniques. Int J Adv Manufact Technol 2021; 113: 35–58.

2. Hanbay, Talu, Ozguven OF. Fabric defect detection systems and methods – a systematic literature review. Optik 2016; 127: 11960–11973.

3. Czimmermann, Ciuti, Milazzo, et al. Visual-based defect detection and classification approaches for industrial applications – a survey. Sensors 2020; 20: 1459.

4. …, et al. Fabric defect detection in textile manufacturing: a survey of the state of the art. Security Commun Networks 2021; 2021: 9948808.

5. Rasheed, Zafar, Rasheed, et al. Fabric defect detection using computer vision techniques: a comprehensive review. Math Problems Engng 2020; 2020: 8189403.

6. Jing, Zhang, Wang, et al. Fabric defect detection using Gabor filters and defect classification based on LBP and Tamura method. J Text Inst 2013; 104: 18–27.

7. Raheja, Ajay, Chaudhary. Real time fabric defect detection system on an embedded DSP platform. Optik 2013; 124: 5280–5284.

8. Zhu, Pan, Gao, et al. Yarn-dyed fabric defect detection based on autocorrelation function and GLCM. Autex Res J 2015; 15: 226–232.

9. Bodnarova, Bennamoun, Latham. Optimal Gabor filters for textile flaw detection. Pattern Recognit 2002; 35: 2973–2991.

10. Raheja, Kumar, Chaudhary. Fabric defect detection based on GLCM and Gabor filter: a comparison. Optik 2013; 124: 6469–6474.

11. Wang, Huang XB. Fabric defect detection based on multiple fractal features and support vector data description. Engng Appl Artific Intell 2009; 22: 224–235.

12. Celik, Dulger, Topalbekiroglu. Development of a machine vision system: real-time fabric defect detection and classification with neural networks. J Text Inst 2014; 105: 575–585.

13. Zhang. Fabric defect classification using radial basis function network. Pattern Recognit Lett 2010; 31: 2033–2042.

14. He, Zhang, Ren, et al. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27–30 June 2016, pp. 770–778.

15. Simonyan, Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

16. Sun, Zhou, Gao, et al. A fast fabric defect detection framework for multi-layer convolutional neural network based on histogram back-projection. IEICE Trans Inform Systems 2019; E102D: 2504–2514.

17. Huang, Xie, et al. Research on a surface defect detection algorithm based on MobileNet-SSD. Appl Sci-Basel 2018; 8: 1678.

18. Liu, Anguelov, Erhan, et al. SSD: single shot multibox detector. In: Leibe B, Matas J, Sebe N, et al. (eds) Computer Vision – ECCV 2016. Cham: Springer International Publishing, 2016, pp. 21–37.

19. Ren, He, Girshick, et al. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Machine Intell 2017; 39: 1137–1149.

20. Badrinarayanan, Kendall, Cipolla. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Machine Intell 2017; 39: 2481–2495.

21. Long, Shelhamer, Darrell. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015, pp. 3431–3440.

22. Liu Y-t, Yang Y-n, Chao, et al. Research on surface defect detection based on semantic segmentation. In: 2019 International Conference on Artificial Intelligence, Control and Automation Engineering (AICAE 2019), 2019, pp. 403–407.

23. Yu, Gao, Wang, et al. BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation. Int J Computer Vision 2021; 129: 3051–3068.

24. … XD. Fully convolutional networks for surface defect inspection in industrial environment. In: Computer Vision Systems: 11th International Conference (ICVS 2017), Shenzhen, China, 10–13 July 2017, pp. 417–426.

25. Jun, Wang, Zhou, et al. Fabric defect detection based on a deep convolutional neural network using a two-stage strategy. Text Res J 2021; 91: 130–142.

26. Zhang, Lee DJ. Automatic fabric defect detection with a wide-and-compact network. Neurocomputing 2019; 329: 329–338.

27. Liu, Zhang, et al. Fabric defect recognition using optimized neural networks. J Eng Fibers Fabrics 2019; 14: 1558925019897396.

28. Wang, Chen, Qiao, et al. A fast and robust convolutional neural network-based defect detection model in product quality control. Int J Adv Manufact Technol 2018; 94: 3465–3471.

29. Zhu, Han, Jia, et al. Modified DenseNet for automatic fabric defect detection with edge computing for minimizing latency. IEEE Internet of Things J 2020; 7: 9623–9636.

30. Jing, Zhang HH. Automatic fabric defect detection using a deep convolutional neural network. Colorat Technol 2019; 135: 213–223.

31. Bag of tricks for fabric defect detection based on Cascade R-CNN. Text Res J 2021; 91: 599–612.

32. Liu, et al. Fabric defects detection based on SSD. In: 2nd International Conference on Graphics and Signal Processing (ICGSP 2018), Sydney, Australia, 6–8 October 2018, pp. 74–78.

33. Xie, … ZS. A robust fabric defect detection method based on improved RefineDet. Sensors 2020; 20: 4260.

34. Zhao, Yin, Zhang, et al. Real-time fabric defect detection based on multi-scale convolutional neural network. IET Collab Intell Manufact 2020; 2: 189–196.

35. Peng, Wang, Hao, et al. Automatic fabric defect detection method using PRAN-Net. Appl Sci-Basel 2020; 10: 8434.

36. Dong, Song, et al. PGA-Net: pyramid feature fusion and global context attention network for automated surface defect detection. IEEE Trans Indust Informat 2020; 16: 7448–7458.

37. Tabernik, Sela, Skvarc, et al. Segmentation-based deep-learning approach for surface-defect detection. J Intell Manufact 2020; 31: 759–776.

38. Jing, Wang, Ratsch, et al. Mobile-Unet: an efficient convolutional neural network for fabric defect detection. Text Res J 2022; 92: 30–42.

39. Tao, Zhang, et al. Automatic metallic surface defect detection and recognition with convolutional neural networks. Appl Sci-Basel 2018; 8: 1575.

40. Huang, Liu, Van Der Maaten, et al. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, pp. 4700–4708.

41. Szegedy, Vanhoucke, Ioffe, et al. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 2818–2826.

42. Sandler, Howard, Zhu, et al. MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, pp. 4510–4520.

43. Hu, Shen, Sun. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, pp. 7132–7141.

44. Zhao, Shi, et al. Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, pp. 2881–2890.

45. Chen L-C, Zhu, Papandreou, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018, pp. 801–818.

46. Tsang CSC, Ngan HYT, Pang GKH. Fabric inspection based on the Elo rating method. Pattern Recognit 2016; 51: 378–394.

47. Silvestre-Blanes, Albero-Albero, Miralles, et al. A public fabric database for defect detection methods and results. Autex Res J 2019; 19: 363–374.

48. Selvaraju, Cogswell, Das, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int J Computer Vision 2020; 128: 336–359.