Abstract
Surface defect detection is critical for obtaining high-quality products. However, surface defect detection on circular tubes is more difficult than on flat plates because the surfaces of circular tubes reflect light, which results in missed defects. In this study, five types of surface defects on circular aluminium tubes (dents, bulges, foreign matter insertions, scratches, and cracks) were detected using a novel faster region-based convolutional neural network (Faster RCNN) algorithm. The proposed Faster RCNN exhibited higher recognition speed and accuracy than RCNN. Furthermore, incorporating image enhancement into the method further improved recognition accuracy.
Introduction
Surface defect detection is considered a major component of production. However, defect detection on circular tubes is more difficult than on flat plates because of the complex structure of circular tubes.
Some scholars have studied image segmentation methods. Liu et al. [1] proposed a modified multi-scale block local binary pattern (LBP) algorithm, in which the image is divided into small blocks and a grey histogram is generated as the feature vector of the image. This algorithm is simple and efficient, and it ensures high recognition accuracy because the size of the blocks is varied to find an appropriate scale for describing defect characteristics. Experiments revealed the application value of the multi-scale block LBP algorithm in online real-time detection systems. Peng et al. [2] proposed an approach based on abnormalities in image moment characteristics for detecting surface defects of cabinets and established a Gaussian distribution model for normal, defect-free image blocks. The anomalous features in defect image blocks were acquired, and the defect image blocks were recognised using the Gaussian distribution model combined with a segmentation threshold. Tao et al. [3] designed a novel cascaded autoencoder structure for defect segmentation and localisation. In the cascade network, semantic segmentation was used to convert the input defect images into pixel-level prediction templates. Zi et al. [4] proposed an unsupervised natural surface defect detection method based on low-order representations for texture, in which the detection process was formulated as a new weighted low-order reconstruction model.
Some scholars have studied improvements to neural networks. Wang et al. [5] proposed a method that combined an improved ResNet-50 and a reinforced faster region-based convolutional neural network (Faster RCNN) to decrease average run time and improve accuracy. Wu et al. [6] proposed a system calibration method based on bias parameterisation and practical scanning trajectory modelling, and defined a constraint function to express image straightness and scale error. The function was subsequently minimised to obtain the optimal estimate of system bias. This estimate was applied in system adjustment as well as in reliable defect image reconstruction. Lv et al. [8] proposed a defect-enhanced generative adversarial network to detect microcrack defects exhibiting apparent defect characteristics and high diversity. Wei et al. [11] introduced a weighted region of interest (ROI) pool to replace the ROI pool, which eliminated the regional misalignment caused by the two quantification processes. Han and Yu [12] proposed a defect detection method using a stacked convolutional autoencoder. The proposed autoencoder was trained with non-defective data and with synthetic defect data produced from defect features on the basis of expert knowledge. Zhao et al. [13] reconstructed the convolutional layers of ResNet-50 with deformable convolution and improved the identification of pointer surface defects using the feature extraction network.
The methods used also differ with the object being inspected. Yang et al. [14] proposed a defect detection algorithm for small parts based on a single shot detection network integrated with deep learning. Feng et al. [15] proposed a novel algorithm for detecting orbital defect targets. The network architecture of the algorithm consisted of a backbone network using MobileNet and numerous novel detection layers with multi-scale feature maps. Lei et al. [16] used a character recognition method based on spatial pyramid character scale matching to recognise carved characters on bearing dust covers. Tian et al. [17] segmented parts using a plane light source image and then detected defects from grey-level anomalies. Next, according to the reflection characteristics of the part surface, the impact of reflection on the image was decreased by regulating the camera exposure time. The position and direction of edges in the abnormal grey-level region of the multi-angle light source image were used to judge whether an abnormal area belonged to a defect.
Unlike the above research, this study focuses on defect detection for circular aluminium tubes, which requires dealing with problems such as reflection and surface curvature [20, 21].
Surface defect detection of aluminium tubes
In this study, five types of surface defects on aluminium tubes were considered, namely dents, bulges, foreign matter insertions, scratches, and cracks.
Dents are caused by external forces on the tube surface; bulges occur because of the bonding of welding spots on the tube surface; foreign matter insertions are attributed to the embedding of other metal substances during processing; scratches are induced by sharp objects passing across the tube; and cracks are generated when the tube cracks during tool cutting. The dent is larger than the raised part and has reflections around it. The reflections in the bulge are more intense. An embedded foreign body reflects little light and has a clear boundary with the rest of the tube. The scratch is thin and of a single colour. The crack is coarse, black in the middle, and grey at both ends.
An industrial camera was used to photograph the defects. The camera has a resolution of 2 megapixels, a frame rate of up to 30 frames per second, and a processing speed of up to 3 ms. The lens has a focal length of 3–8 mm. The camera is equipped with light sources of adjustable intensity, as shown in Fig. 1. Each defect was manually marked twice: with a large box the first time and with a small box the second time. Twenty photographs of each defect category were captured, as displayed in Fig. 2.
Hardware structure of industrial camera.
Overview of surface defects.
Aluminium tubes have a curved shape and generally reflect light, which can lead to defect detection errors. With industrial cameras of the same resolution, the defect recognition rate for aluminium tubes is generally lower than that for aluminium plates.
In this study, image enhancement was combined with Faster RCNN to improve its recognition rate, and the recognition results of Faster RCNN and RCNN were compared.
RCNN involved the following four steps during detection:
First, a visual method such as selective search was used to generate numerous candidate regions. Second, feature extraction was performed using a CNN for each candidate region to form a high-dimensional feature vector. Third, these feature vectors were input into a linear classifier to calculate the probability of each category and thereby determine the contained objects. Fourth, a fine regression was performed on the location and size of the target bounding box.
Defect recognition method based on Faster RCNN
Feature maps generated by CNNs at each stage were used in Faster RCNN, and the presented multi-level feature fusion network (MFN) was used to combine multiple levels of features into a single feature that contained location information of defects. The region proposal network (RPN) was used to generate ROIs, and a detector consisting of a classifier and a boundary box regressor was used to produce the final detection results. ResNet-50 was applied as the feature extraction network.
Improvements
The main differences between Faster RCNN and RCNN are as follows:
First, an ROI pooling layer was introduced following the final convolutional layer. Second, a multi-task loss function was adopted, which directly incorporated bounding box regression into the CNN for training. Third, RPN replaced the former selective search method for generating the proposal window. Fourth, the CNN for generating the proposal window was shared with the CNN for target detection.
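The ROI pooling layer mentioned above can be illustrated with a minimal sketch. It assumes a single-channel feature map stored as a list of lists and an ROI given in feature-map cells; the function name `roi_max_pool` and the 2 × 2 output grid are illustrative choices, not details taken from this study.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one ROI of a 2-D feature map into a fixed out_h x out_w grid.

    roi = (x0, y0, x1, y1) in feature-map cells, start inclusive, end exclusive.
    The ROI must cover at least one cell in each direction.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Row range of bin i; every bin covers at least one cell.
        r0 = y0 + (i * roi_h) // out_h
        r1 = max(r0 + 1, y0 + ((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * roi_w) // out_w
            c1 = max(c0 + 1, x0 + ((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
whole = roi_max_pool(feature_map, (0, 0, 4, 4))
part = roi_max_pool(feature_map, (1, 1, 4, 4))
```

Whatever the ROI size, the output grid has the same shape, which is what allows proposals of different sizes to feed the same fully connected layers.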
These differences gave Faster RCNN the following advantages over RCNN.
RCNN was slow in testing because it decomposed an image into numerous proposal boxes, stretched each proposal box to form an image, and extracted features from each one individually through the CNN. These proposal boxes overlapped heavily, so their features could have been shared; extracting them separately wasted considerable computing power. Faster RCNN normalised the image and fed it directly into the CNN. Proposal box information was added to the feature map output by the final convolutional layer, so that the preceding CNN operations were shared.
RCNN was slow in training because it stored the features extracted by the CNN on the hard disk before support vector machine (SVM) classification. This operation involved considerable data reading and writing and slowed training. By contrast, Faster RCNN only required one image to be fed into the network, and the CNN features and proposal regions for each image were extracted at once. The training data were sent directly to the loss layer in graphics processing unit (GPU) memory, so that neither repeated calculation of the first layers of features for the candidate regions nor massive data storage on the hard disk was necessary.
RCNN training required a large amount of storage. The independent SVM classifiers and regressors required numerous features as training samples; thus, a large hard disk space was required. In Faster RCNN, category judgement and location regression were performed jointly by a single deep network and required no additional storage.
Algorithm steps
The network structure of Faster RCNN is displayed in Fig. 3.
The structure diagram of Faster RCNN.
When extracting image features, the CNN reduced the complexity of the network model and the number of weights. Images were fed directly into the network as inputs.
The VGG16 network was applied in feature extraction. Compared with LeNet, AlexNet, and ZFNet, VGG16 is deeper, extracts more prominent features, and gives satisfactory detection performance.
RPN
RPN was adopted to generate high-quality region proposal boxes. It shared the convolutional features of the image with the detection network, so the target detection speed was considerably higher than with the selective search method.
A sliding window (3 × 3) was moved over the feature map output by the final shared convolutional layer.
A 512-dimensional vector was obtained by the sliding (convolution) operation and subsequently fed into two parallel fully connected layers, the box-classification layer (cls) and the box-regression layer (reg), thus obtaining information on classification and location.
The centre of each sliding window corresponded to k anchors. Each anchor corresponded to one scale and one aspect ratio. In RPN, three scales and three aspect ratios were used, such that each sliding window had nine anchors.
At each point, boxes of three sizes (large, medium, and small) were used, each with aspect ratios of 2:1, 1:2, and 1:1.
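The nine anchors per sliding-window position can be sketched as follows. The aspect ratios come from the text; the three scales (128, 256, and 512 pixels) and the reading of each ratio as width to height are assumptions made for illustration, as is the function name `make_anchors`.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(2.0, 0.5, 1.0)):
    """Return len(scales) * len(ratios) anchors centred on (cx, cy).

    Each anchor is (x0, y0, x1, y1). A ratio is width / height (an assumption),
    and every anchor of a given scale keeps an area of scale ** 2.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(0.0, 0.0)
```

With three scales and three ratios, `anchors` holds k = 9 boxes for one window centre; repeating this at every feature-map position gives the full anchor set.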
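Accordingly, at each sliding window, nine region proposals were predicted simultaneously, so the box-regression layer had 4 × 9 = 36 outputs (four box coordinates per anchor) and the box-classification layer had 2 × 9 = 18 outputs (object and non-object scores per anchor).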
Target recognition
Faster RCNN was used for detection and classification after generating the proposal region by using RPN. Faster RCNN implemented end-to-end joint training with excellent detection results.
RPN and Faster RCNN shared the convolution features extracted from the first five parts of VGG16. Faster RCNN used the high-quality proposal region of RPN in target recognition, greatly improving the target detection speed.
The purpose of network training is to minimise the loss function, which is shown in Eq. (1),
where a is the weight that controls the relative importance of classification and regression.
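As an illustration of how the weight a trades off the two terms, the sketch below assumes that Eq. (1) follows the standard Faster R-CNN multi-task form: a log loss for classification plus a times a smooth L1 loss for box regression, with the regression term counted only for positive samples. The normalisation by the number of positive samples and all function names are simplifying assumptions.

```python
import math


def smooth_l1(x):
    """Smooth L1 penalty: quadratic near zero, linear elsewhere."""
    x = abs(x)
    return 0.5 * x * x if x < 1.0 else x - 0.5


def multitask_loss(probs, labels, box_preds, box_targets, a=1.0, eps=1e-12):
    """Classification log loss plus a-weighted box regression loss.

    probs[i]       predicted probability that sample i contains an object
    labels[i]      1 for a positive sample, 0 for a negative sample
    box_preds[i]   predicted box parameters (4 numbers)
    box_targets[i] target box parameters (4 numbers)
    """
    cls_loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        cls_loss -= math.log(p if y == 1 else 1.0 - p)
    cls_loss /= len(probs)

    reg_loss = 0.0
    for y, pred, tgt in zip(labels, box_preds, box_targets):
        if y == 1:  # negatives do not contribute to regression
            reg_loss += sum(smooth_l1(t - t_star) for t, t_star in zip(pred, tgt))
    reg_loss /= max(1, sum(labels))

    return cls_loss + a * reg_loss


loss = multitask_loss(
    probs=[0.5, 0.5],
    labels=[1, 0],
    box_preds=[(2.0, 0.0, 0.0, 0.0), (9.0, 9.0, 9.0, 9.0)],
    box_targets=[(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)],
    a=2.0,
)
```

Increasing a makes localisation errors on positive samples count more heavily relative to classification errors.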
Alternate RPN and Faster RCNN training was adopted in network training.
First, RPN was trained. The network was not initialised with ImageNet but was directly trained with the training data. Model M1 was obtained when RPN training was completed, which was used to generate proposal region P1.
Second, Faster RCNN was trained to obtain model M2 with the proposal region P1 produced from the RPN during the first step.
Third, M2 was used to initialise RPN training and obtain model M3. Unlike in the first step, this phase of training fixed the shared convolutional layer parameters, with fine tuning for RPN parameters. Similarly, the trained RPN was used to generate proposal region P2.
Fourth, Faster RCNN was trained with M3 and P2 to attain the final model M4. This phase of training kept the parameters of the shared convolutional layers fixed and trained only the fully connected layers of Faster RCNN. Thus, the two networks shared the same convolutional layers and constituted a single network.
Of the original 20,000 boxes, 6000 remained after removing the boxes that crossed the image boundary. After screening out boxes that overlapped and resembled each other, 2000 boxes remained. After sorting, the first 128 boxes were retained, which reduced the number of boxes and improved the detection speed.
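The three screening stages described above can be sketched as follows, under the assumptions that the first stage discards boxes extending beyond the image, that the second stage is greedy non-maximum suppression (NMS), and that sorting is by proposal score. The 0.7 NMS threshold and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def filter_proposals(boxes, scores, img_w, img_h, nms_thresh=0.7, top_n=128):
    """Return indices of the proposals kept after the three screening stages."""
    # Stage 1: drop boxes that extend beyond the image.
    inside = [i for i, b in enumerate(boxes)
              if b[0] >= 0 and b[1] >= 0 and b[2] <= img_w and b[3] <= img_h]
    # Stage 2: greedy NMS, visiting boxes from highest to lowest score.
    order = sorted(inside, key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    # Stage 3: keep is already sorted by score, so truncate to the top_n best.
    return keep[:top_n]


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60), (-5, 0, 5, 10)]
scores = [0.9, 0.8, 0.7, 0.95]
kept = filter_proposals(boxes, scores, img_w=100, img_h=100)
```

In this toy example the fourth box is removed for crossing the left edge and the second is suppressed because it overlaps the higher-scoring first box.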
Image preprocessing methods could be used to enhance image characteristics and improve contrast. The preprocessing methods used in this experiment included histogram equalisation and filter smoothing.
Contrast enhancement improved the contrast by linearly transforming the grey level into a specified range. Histogram equalisation refers to the equalised adjustment of contrast by using histograms.
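Both enhancement operations can be sketched on a greyscale image stored as a list of rows of integer grey levels. The sketch assumes the usual definitions: a linear stretch of the observed grey range onto a target range, and histogram equalisation that maps each grey level through the cumulative histogram divided by the total pixel count MN. The function names are illustrative.

```python
def stretch_contrast(img, lo=0, hi=255):
    """Linearly map the image's grey range [min, max] onto [lo, hi]."""
    flat = [p for row in img for p in row]
    mn, mx = min(flat), max(flat)
    if mx == mn:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[round(lo + (p - mn) * (hi - lo) / (mx - mn)) for p in row]
            for row in img]


def equalise_histogram(img, levels=256):
    """Map each grey level k to (levels - 1) * (pixels with level <= k) / MN."""
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)  # MN, the total number of pixels
    mapping, running = [], 0
    for count in hist:
        running += count
        mapping.append(round((levels - 1) * running / total))
    return [[mapping[p] for p in row] for row in img]


stretched = stretch_contrast([[50, 100], [100, 250]])
equalised = equalise_histogram([[0, 1], [2, 3]], levels=4)
```

Stretching only rescales the existing grey range, whereas equalisation redistributes grey levels according to how often they occur.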
Surface defects image after image enhancement.
In Eq. (2), MN indicates the total number of pixels in the image,
In median filtering, the grey level of the centre pixel is replaced by the median of the grey levels of the points in the neighbourhood covered by the smoothing template, which can be represented as follows:
In Eq. (3),
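The median filtering described above can be sketched as follows. The 3 × 3 template and the handling of image borders by repeating edge pixels are assumptions, and `median_filter` is an illustrative name.

```python
def median_filter(img, size=3):
    """Replace each pixel by the median grey level of its size x size neighbourhood.

    img is a list of rows of grey levels; size must be odd.
    Pixels outside the image are taken from the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    half = size // 2
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in range(-half, half + 1)
                      for dx in range(-half, half + 1)]
            window.sort()
            out[y][x] = window[len(window) // 2]  # middle value of an odd-length list
    return out


noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
smoothed = median_filter(noisy)
```

The isolated bright pixel in `noisy` is removed because it never reaches the middle of any sorted window, which is why median filtering suits impulse-like noise such as specular glints.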
The initial learning rate for training was set to 0.001, a mini-batch of 32 observations was used for each iteration, and the number of iterations was set to 100. The upper and lower thresholds for the RPN were set to 0.7 and 0.3, respectively. When determining the category of a regression box, an upper threshold of 0.5 and a lower threshold of 0.1 were used.
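If the RPN thresholds of 0.7 and 0.3 are read as overlap thresholds between an anchor and the marked defect boxes, measured by intersection over union (IoU), anchor labelling can be sketched as below. This reading is an assumption, as are the function names; the standard Faster R-CNN rule that also marks the best-overlapping anchor as positive is omitted for brevity.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), 0 (negative), or -1 (ignored during training)."""
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    if best >= pos_thresh:
        return 1
    if best < neg_thresh:
        return 0
    return -1  # overlap between the two thresholds: not used for training


gt = [(0, 0, 10, 10)]
labels = [label_anchor(a, gt) for a in
          [(0, 0, 10, 12), (5, 0, 15, 10), (20, 20, 30, 30)]]
```

Under this reading, the size of the marked box directly changes which anchors count as positive, which is one way the marking style could affect the recognition rate.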
The first way of marking for surface defects.
The enhanced image is shown in Fig. 4. The defects were first marked with a large box, as shown in Fig. 5, and then with a small box, as shown in Fig. 6. The experiments showed that the size of the box has a great influence on the recognition rate.
The second way of marking for surface defects.
Category average pixel accuracy and test set recognition accuracy were used to evaluate the recognition performance of the three algorithms. For each class, the accuracy is the number of correct detections divided by the actual number of targets of that class. The category average pixel accuracy is the mean of these accuracies over the five categories. The test set recognition accuracy is the number of correct identifications divided by the total number of test samples.
Faster RCNN exhibited higher recognition accuracy than RCNN, and Faster RCNN combined with image enhancement exhibited higher recognition accuracy still.
Defect identification results after the first marking
Defect identification results after the second marking
Identifying surface defects on circular aluminium tubes is more difficult than on flat plates because of frequent reflections. In this study, five types of surface defects on aluminium tubes, namely dents, bulges, foreign matter insertions, scratches, and cracks, were investigated.
A method combining Faster RCNN and image enhancement was used for defect recognition. It achieved a recognition accuracy of 96%, indicating excellent defect detection performance.
The initially labelled images were rough and inaccurate. Therefore, with this dataset, the model performance could not be improved further beyond a certain point in training. When the data were relabelled in detail, satisfactory results were obtained with the improved dataset.
Acknowledgments
The authors acknowledge the Collaborative Innovation Project of Anhui Universities (No. Gxxt-2019-003 and No. Gxxt-2021-010).
