Abstract
The safe operation of the blades is critical to the stability of a wind turbine. Sensors are most commonly applied for defect detection on wind turbine blades (WTBs), but they are costly and sensitive to stochastic noise. Computer vision-guided automatic detection of surface defects on WTBs remains a challenge; in particular, its accuracy in locating defects is yet to be optimized. In this paper, we develop a visual inspection model that automatically and precisely classifies and locates surface defects using a deep learning framework based on the Cascade R-CNN. To obtain a high mean average precision (mAP) given the characteristics of the dataset, we propose a model named Contextual Aligned-Deformable Cascade R-CNN (CAD Cascade R-CNN), which combines transfer learning, Deformable Convolution, Deformable RoI Align and context information fusion, and we build a dataset in which surface defects are categorized and labeled as crack, breakage and oil pollution. Moreover, to alleviate false detections under complex backgrounds, an improved bisecting k-means is applied during the test process. The adaptability and generalization of the proposed CAD Cascade R-CNN model are validated on each type of defect in the dataset and at different IoU thresholds, and each of the improved strategies is verified by gradual ablation experiments. Finally, comparisons with the baseline Cascade R-CNN, Faster R-CNN and YOLO-v3 demonstrate its superiority over these existing approaches, with a maximum mAP of 92.1%.
Introduction
The blade is the core component of a wind turbine, and its safe operation is vital to the stability of the turbine. Most wind turbines work in remote or offshore areas under severe environmental conditions, which can inflict different types of defect on the blade surface, including oil pollution, cracks, sand holes, coating breakage, corrosion, surface icing and matrix aging. The appropriate maintenance solution differs with the type of defect found and is often also determined by the severity grading of each defect. The size of the area affected by a defect is therefore of strategic concern, and early detection with precise defect classification and positional information is necessary.
Sensors are commonly used for defect detection on wind turbine blades (WTBs), for example in the ultrasonic detection method [1] and the fiber grating sensing method [2]. However, their applicability is limited by high cost, sensitivity to stochastic noise and the high risk of maintenance. An increasing number of field studies have incorporated image processing techniques into defect detection on WTBs by deploying unmanned aerial vehicles (UAVs). For example, a three-stage line-edge-quantification method [3] has been used to detect surface cracks on WTBs in a wind farm in the USA. Furthermore, a crack analysis method [4] with pre-processing for motion blur, image noise reduction and enhancement based on grey-scale values has been proposed in China. These methods still rely on pre- and post-processing of the data, which is time-consuming, and they can often detect only one pre-designated type of defect. Lately, advances in machine learning have led to its adoption in WTB inspection. Manual feature extraction combined with a classifier is widely employed, with features such as Haar-like, LBP, SIFT and HOG. For example, Haar-like features with an extended cascading classifier were presented for detecting cracks on WTBs [5]. Unfortunately, hand-designed features cannot capture the essential characteristics of the defects, which is detrimental to the detection results. Moreover, this approach is commonly applied to binary rather than multi-type classification.
Surface defect detection has made considerable strides in recent years through the use of convolutional neural networks (CNNs) in many areas, such as scratch detection on metal surfaces [6], damage detection on railway surfaces [7], defect detection on camshaft surfaces [8] and surface detection in solar cells [9]. In the field of object detection, CNNs are applied extensively in one-stage and two-stage approaches. One-stage approaches, represented by YOLO [10], YOLO-v3 and SSD [11], detect objects directly without a region proposal network (RPN) and therefore achieve a considerable detection rate. A detection system [12] based on an improved YOLO was established for object detection in images from traffic scenes, and its detection rate was 1.18 times higher than that of the traditional YOLO. Faster-SSD [13], which was built on the SSD prototype, is used for real-time detection with limited computation. With regard to the practical detection of real surface defects on WTBs, a YOLO-based Small Object Detection Approach (YSODA) [14], which targets small defects and efficiency, was previously devised in our group. Nevertheless, system operators are more concerned with classification and framing accuracy than with efficiency, especially for high-altitude images taken by UAV. Consequently, two-stage methods, represented by Faster R-CNN [15], Mask R-CNN [16] and R-FCN, are more suitable for automatic multi-type defect detection. Two-stage methods divide the detection process into specific tasks in order to achieve better accuracy. For instance, an improved approach based on Faster R-CNN [17] has been proposed for recognizing four types of defect in ground penetrating radar (GPR) profiles of subgrade detection data. A blade inspection system for five types of defect [18] based on deep learning and UAV has also been constructed, but it offers no multi-type classification or defect localization capabilities.
Therefore, while previous studies have indicated that the above two-stage models offer higher accuracy in object detection in various industrial fields, their practicality in surface defect detection on WTBs remains inadequate. Further studies are needed to exploit the inherent characteristics of the datasets and to optimize the structure of the algorithm.
In this paper, a Contextual Aligned-Deformable Cascade R-CNN (named CAD Cascade R-CNN), built upon the Cascade R-CNN [19], is constructed to detect surface defects on WTBs. To enhance sensitivity to defects of diverse and small dimensions, Deformable Convolution [20], Deformable RoI Align and context information fusion are employed. In addition, an improved bisecting k-means is incorporated in the test process to reduce the disadvantages associated with complex backgrounds. PReLU [21] is also adopted to improve the robustness of the model, the convergence rate and the detection accuracy. Finally, detailed experiments are conducted to verify the advancement and reliability of the proposed model. By collecting data from actual wind fields and augmenting the training set, a dataset containing a total of 16649 labeled images is generated. The main contributions of our work are summarized as follows:
This paper explores the applicability of a two-stage cascade network to defect detection on WTBs. The network automatically and simultaneously classifies and localizes surface defects with high accuracy and within an acceptable time, thus allowing economical maintenance of the WTBs.
A dataset is generated in which surface defects are precisely classified and labeled as crack, breakage and oil pollution. Data augmentation, Gaussian Blur and Multiply in particular, is applied to extend the training set in order to maintain the accuracy and robustness of the model.
An improved model named CAD Cascade R-CNN is proposed. Aiming at high accuracy, especially for images taken by UAV, the model structure and test strategy are optimized by analyzing the inherent characteristics of the established dataset.
The rest of this paper is structured as follows. Section 2 presents related work on surface defect detection on WTBs together with the development of object detection. Section 3 describes the acquisition and establishment of the image dataset from WTBs. Section 4 presents the model constructed on Cascade R-CNN and details the improved strategies. In Section 5, extensive experiments are performed and the results are discussed. In Section 6, a conclusion is given and future work is outlined.
Related work
A review of current research on surface defect detection shows that sensors are commonly used on WTBs. A novel hybrid signal processing technique [1] was proposed for non-destructive testing of WTBs using ultrasonic guided waves. In addition, fiber grating sensors [2] have recently been applied to health detection of WTBs because of their immunity to lightning and electric shortage.
With the advancement of machine learning, an increasing number of field studies have incorporated it into defect detection on WTBs based on images taken by UAV. Manual feature extraction combined with a classifier has been employed in such inspection. For example, based on the original and extended Haar-like features [5] extracted from defects on WTBs, a cascading classifier was developed and trained to inspect cracks automatically. Although this approach may perform well on simple tasks, its accuracy is unsatisfactory for advanced visual tasks, especially multi-type detection in practice.
In recent years, owing to the success of CNNs in object detection, deep learning methods have been adopted for WTB inspection in one-stage or two-stage form. One-stage approaches have become popular, mostly because of their computational efficiency. YOLO [10] outputs sparse detection results by forwarding the input image once, which enables real-time object detection. With regard to the practical inspection of real surface defects on WTBs, YSODA [14], which aims at efficiency, was previously proposed based on YOLO. SSD [11] detects objects by employing multiple feature maps at various resolutions in order to cover objects at different scales. However, the accuracy of these approaches is generally below that of two-stage methods. Two-stage methods detect objects by combining a proposal network and a region-wise classifier. After the success of R-CNN [22], Fast R-CNN [23] and Faster R-CNN [15] were proposed to achieve further speed-up and higher accuracy. Faster R-CNN has since become a representative object detection framework in industrial fields. For instance, the detection of electrical equipment from a distance using Faster R-CNN and images collected by UAV has been reported with great performance [24]. More recent works, such as Mask R-CNN [16] and R-FCN, have extended the architecture to address various problems of detail. Although previous studies have indicated that two-stage models offer satisfactory accuracy in object detection in various industrial fields, their practicality in surface defect detection on WTBs remains inadequate. This paper therefore studies the problem further.
Data collection and dataset construction
Collecting image data by UAV
The premise of applying computer vision for automatic defect detection on WTBs is the validity of image data. The data used in this paper are collected by M200 UAV with an installed Zenmuse30 (Z30) PTZ camera as shown in Fig. 1.
More than 3000 image shots and dozens of videos of wind turbines together with their backgrounds are collected in manual flight mode from actual wind fields in the Shandong, Fujian and Gansu provinces of China. The data are retrieved from a memory card for offline training. Images are extracted from the videos at 30 frames per second, so that thousands of original images are obtained for filtering in the next step. These images show that complex backgrounds, as well as varying shooting distances and angles, are not conducive to precise localization and classification, so further study is needed to optimize the detection.
M200 UAV with an installed Z30 PTZ camera.
Since the collected data mentioned in Section 3.1 are unordered and contain useless information, it is necessary to screen for effective image data. In this paper, three common types of defect including crack, breakage and oil pollution are selected as the subjects for this research. Any image that contains any of these three defects is classified as a positive sample and the remaining data excluded as negative samples. Completely irrelevant images are discarded. Representative positive samples are shown in Fig. 2.
For better management of the data, each sample is assigned a unique six-digit sequence, e.g. 100001. In the meantime, to adapt to the format requirements of different detection models, all samples are resized to different resolutions, such as 300
Pre-labelling the types and dimensions of the defects on the positive samples is a prerequisite for supervised training. As a rule, only images with obvious and identifiable defects are labeled, whereas images whose defects are unrecognizable even in expert opinion are ignored. The defects are labelled with LabelImg [25], a Python tool that automatically generates XML index files in VOC dataset format containing the positional information of the defects in the images. The defects are marked on the images with tight rectangular frames as the ground truths and assigned to their corresponding types.
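As an illustration of the annotation format described above, the following minimal sketch reads a VOC-style XML file of the kind LabelImg writes. The sample file content, the label strings and the function name `parse_voc` are hypothetical and are not taken from the paper's dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout (file name, size and boxes are invented).
SAMPLE_XML = """
<annotation>
  <filename>100001.jpg</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>crack</name>
    <bndbox><xmin>412</xmin><ymin>230</ymin><xmax>498</xmax><ymax>305</ymax></bndbox>
  </object>
  <object>
    <name>oil_pollution</name>
    <bndbox><xmin>900</xmin><ymin>610</ymin><xmax>1210</xmax><ymax>840</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(int(bndbox.findtext(tag))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label,) + coords)
    return filename, boxes

filename, boxes = parse_voc(SAMPLE_XML)
```

Each tuple in `boxes` carries one ground-truth frame together with its defect type, which is the information a supervised detector needs during training.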
Building dataset
The distribution of the data in the dataset is shown in Table 1. The dataset includes 864 pre-selected positive samples, of which 210 are crack, 216 are breakage, 426 are oil pollution and 12 are multi-defect images. The ratio of the training set to the test set is set to 7:3, and images from the three defect types and the negative samples are randomly assigned to each set according to this ratio to ensure the same distribution. The images in the dataset at this stage are original, with no augmentation.
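The per-class 7:3 assignment can be sketched as a stratified random split. This is only an illustration under stated assumptions: the sample identifiers, the seed and the function name `stratified_split` are hypothetical, negative samples are omitted because their count is not given here, and the paper's exact procedure may differ.

```python
import random

def stratified_split(samples_by_class, train_parts=7, total_parts=10, seed=0):
    """Split every class separately so both sets keep the same class mix."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in samples_by_class.items():
        items = list(items)
        rng.shuffle(items)
        cut = len(items) * train_parts // total_parts  # integer 7:3 cut point
        train += [(label, s) for s in items[:cut]]
        test += [(label, s) for s in items[cut:]]
    return train, test

# Positive-sample counts as stated in the text; identifiers are placeholders.
counts = {"crack": 210, "breakage": 216, "oil pollution": 426, "multi-defect": 12}
samples = {k: ["%s_%d" % (k, i) for i in range(n)] for k, n in counts.items()}
train, test = stratified_split(samples)
```

Splitting each class on its own, rather than the pooled images, is what keeps the proportion of crack, breakage and oil pollution samples the same in both sets.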
Data distribution in dataset
Representative positive samples: (a) crack (b) breakage (c) oil pollution.
Principle of Cascade R-CNN
Based on the R-CNN [22] and Fast R-CNN [23] algorithms proposed in 2014 and 2015, Ross B. Girshick and co-authors presented a new detection algorithm, Faster R-CNN [15], which was the most accurate model at that time. Faster R-CNN is composed of three parts: a feature extraction network, a region proposal network (RPN) and a Fast R-CNN detection network. Specifically, the feature map output by the feature extraction network serves as the input to the other two networks. The RPN then generates a set of RoI proposals and passes them to the Fast R-CNN. Finally, Fast R-CNN performs bounding box regression on the RoI proposals and predicts the defect types.
Structure of the Cascade R-CNN.
Implementation principles and procedures.
Nevertheless, the fixed threshold used when filtering positive proposals in the RPN leads to false positives and a lack of robustness to diverse defect dimensions and shapes, so Cascade R-CNN is a better choice for this study. Compared with Faster R-CNN, Cascade R-CNN adds two more Fast R-CNN detection networks and sets increasing IoU thresholds of 0.5, 0.6 and 0.7 to extract the positive proposals. The output of each network serves as the input to the next, which allows sequential improvement of the detection accuracy and effective suppression of over-fitting. Moreover, using the mean classification probability from the three networks instead of that from a single network can further improve classification precision. The structure of Cascade R-CNN is presented in Fig. 3.
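To make the role of the increasing thresholds concrete, the sketch below computes the IoU of a proposal with a ground-truth box and lists the cascade stages at which the proposal would count as positive. The box coordinates and function names are hypothetical; the box refinement that a real cascade performs between stages is not modeled.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# IoU thresholds of the three detection stages, as given in the text.
STAGE_THRESHOLDS = (0.5, 0.6, 0.7)

def positive_stages(proposal, ground_truth):
    """Thresholds of the stages that would label this proposal as positive."""
    overlap = iou(proposal, ground_truth)
    return [t for t in STAGE_THRESHOLDS if overlap >= t]

gt = (0, 0, 100, 100)      # hypothetical ground-truth frame
loose = (0, 0, 100, 65)    # overlaps the ground truth with IoU 0.65
tight = (0, 0, 100, 90)    # overlaps the ground truth with IoU 0.90
```

A loosely fitting proposal such as `loose` is accepted by the first two stages only, so the last stage is trained on better-localized boxes.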
In this paper, Cascade R-CNN is applied to the automatic detection of multi-type surface defects on WTBs. To overcome existing issues, we integrate a series of improvements that modify the traditional Cascade R-CNN and the test process. The implementation principles and procedures are shown in Fig. 4.
The development environment is first established to conduct defect detection using deep learning. The environment is based on Tensorflow with Python language and accelerated calculation on GPU which is advantageous to training efficiency. Before the training, the initial key hyper parameters such as the learning rate, thresholds and iterations are defined according to the experience of experts and the characteristics of dataset.
Following the idea of transfer learning, ResNet101 [26] pre-trained on ImageNet [27] is used to initialize the network parameters, which reduces the dependence of the model on positive data and improves its generalization. The augmented training set with prior information is then used to fine-tune the improved model, so that the trained model fits the features of the dataset while avoiding over-fitting. Finally, the performance of the trained model is verified on the test set, and the improved bisecting k-means is applied in the test process to exclude the influence of complex backgrounds. Extensive experiments are analyzed to validate the adaptability, robustness and accuracy of the improved approach.
The angle and the distance at which the UAV takes images may not be controllable, and the boundaries of the defects can be ambiguous, which makes it difficult to detect irregular or small defects effectively. Moreover, a complex image background can interfere with accurate detection and induce a higher false detection rate. To address these issues, Deformable Convolution, Deformable RoI Align, context information fusion and the improved bisecting k-means, as well as detailed improvements such as PReLU and expanded anchors, are integrated into the Cascade R-CNN and the test process. In addition, to prevent over-fitting of the trained model, we augment the existing training set in ways that stay close to reality. The network structure of the improved CAD Cascade R-CNN model is illustrated in Fig. 5, and the details of each strategy are described in the following sections.
Data augmentation
Because positive samples are limited, data augmentation methods are applied to expand the training set. Through geometric and pixel transformations, the trained model can learn more invariant features from the images, which effectively suppresses over-fitting. In particular, Gaussian Blur and Multiply are employed so that the trained model generalizes to conditions close to reality. The defect annotations are transformed together with the images during augmentation, which reduces the time and cost of labeling. The methods used are described in Table 2. After random augmentation, the training set is expanded to 16300 images, and unqualified images in which defects fall outside the image boundary are deleted. Representative images of the training set after data augmentation are shown in Fig. 6.
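The following sketch illustrates two of the ideas above on toy data: a pixel transformation (Multiply, taken here to mean scaling every grey value by a factor) and a geometric transformation whose offset is applied to the annotations as well, with out-of-boundary results discarded. The translation example, the function names and all numbers are assumptions for illustration; they are not the paper's Table 2 settings.

```python
def multiply(pixels, factor):
    """Scale every grey value by `factor`, clipped to the range [0, 255]."""
    return [[min(255, max(0, round(v * factor))) for v in row] for row in pixels]

def translate_box(box, dx, dy):
    """Shift a (xmin, ymin, xmax, ymax) annotation by the image offset."""
    xmin, ymin, xmax, ymax = box
    return (xmin + dx, ymin + dy, xmax + dx, ymax + dy)

def inside(box, width, height):
    xmin, ymin, xmax, ymax = box
    return xmin >= 0 and ymin >= 0 and xmax <= width and ymax <= height

def augment_boxes(boxes, dx, dy, width, height):
    """Move annotations with the image; return None if a defect leaves the frame."""
    moved = [translate_box(b, dx, dy) for b in boxes]
    return moved if all(inside(b, width, height) for b in moved) else None

image = [[100, 200], [0, 255]]          # toy 2 x 2 grey image
brighter = multiply(image, 1.5)
boxes = [(10, 10, 50, 40)]              # hypothetical defect frame in a 100 x 100 image
kept = augment_boxes(boxes, 20, 5, 100, 100)
dropped = augment_boxes(boxes, 60, 0, 100, 100)
```

Because the boxes are transformed with the same parameters as the image, no augmented sample has to be labeled again by hand.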
Data augmentation methods
Structure of the CAD Cascade R-CNN.
Representative images of the training set after data augmentation.
To adapt to defects with irregular and small dimensions, which are difficult to detect and frame precisely, Deformable Convolution and Deformable RoI Align are integrated into Cascade R-CNN. Since Deformable Convolutions require information from the previous convolutions to learn the offsets well, only the convolutions of the last three units of conv4_x in the feature extraction network are replaced by Deformable Convolutions. The RoI Pooling in Fast R-CNN is replaced by Deformable RoI Align, which is developed from Deformable RoI Pooling. The sampling processes of Deformable Convolution and Deformable RoI Pooling are shown in Fig. 7.
Sampling processes: (a) Deformable Convolution (b) Deformable RoI Pooling.
Regular lattice sampling in standard convolution makes it difficult for the network to adapt to geometric deformation. To remove this limitation, Deformable Convolution and Deformable RoI Pooling add learnable offset variables to the position of each sampling point of the convolution kernel. The kernel then samples near the current position according to the learned offsets instead of being limited to the regular lattice points. This sampling method helps the convolution kernel learn the defect features and their diverse shapes.
Figure 7a shows the sampling process of Deformable Convolution, in which an additional convolution layer is added to the original convolution to learn the offsets. The offsets are applied to the sampling points of the input feature map, and the convolution is then computed over the shifted sampling points to obtain the output feature map, so that features matching the defect shapes are passed on for classification and regression.
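For reference, the sampling described above is usually written as follows in the Deformable Convolution literature [20]. The symbols are those of that formulation and are assumptions here, since they are not defined in this text: p_0 is an output location, R the regular sampling grid of the kernel, w the kernel weights, x the input feature map and Δp_n the learned offsets.

```latex
% Standard convolution: sampling on the regular grid R
y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n)

% Deformable convolution: each sampling point is shifted by a learned offset
y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)

% Offsets are fractional, so x is evaluated by bilinear interpolation over
% integer locations q with kernel G
x(p) = \sum_{q} G(q, p)\, x(q)
```

Setting every Δp_n to zero recovers the standard convolution, which is why the offsets can be learned starting from the regular grid.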
Figure 7b shows the sampling process of Deformable RoI Pooling, in which a fully connected layer is added after RoI Pooling to learn the offsets. The offsets also act on the sampling points, and output proposals of a uniform size are then obtained by interpolation. In this paper, the proposals are unified to the size of 7
The RoI Pooling applied in Deformable RoI Pooling is a standard operation for extracting uniform size from each proposal and dimension reduction. However, it has an intrinsic issue because of floating-point rounding that can lead to the misalignments as shown in Fig. 8 (with 2
Schematic diagram of the RoI Pooling calculation.
where
To address the precision issue, RoI Align [16] is used to solve the mismatch problem. As shown in Fig. 9, bilinear interpolation is applied in RoI Align to compute the values of four regular sampling points in each bin, and the results are aggregated by max pooling to acquire the output feature values with no quantization. Expectedly, it has a positive effect on predicting small defects and accurate framing.
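The difference from quantized pooling can be illustrated with a small pure-Python sketch of RoI Align: the RoI keeps its real-valued coordinates, each bin is sampled at regularly spaced points by bilinear interpolation, and the samples are aggregated by max pooling. The 2 x 2 output size, the placement of the sampling points, the coordinate order (y1, x1, y2, x2) and the toy feature map are assumptions for illustration only.

```python
import math

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D list `feature` at real-valued (y, x)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feature[y0][x0] * (1 - wx) + feature[y0][x1] * wx
    bottom = feature[y1][x0] * (1 - wx) + feature[y1][x1] * wx
    return top * (1 - wy) + bottom * wy

def roi_align(feature, roi, out_size=2, samples=2):
    """RoI Align: samples x samples points per bin, aggregated by max pooling."""
    y1, x1, y2, x2 = roi                  # real-valued corners, never rounded
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            values = []
            for sy in range(samples):
                for sx in range(samples):
                    yy = y1 + (i + (sy + 0.5) / samples) * bin_h
                    xx = x1 + (j + (sx + 0.5) / samples) * bin_w
                    values.append(bilinear(feature, yy, xx))
            row.append(max(values))
        out.append(row)
    return out

# Toy feature map whose value equals the column index.
feature = [[0.0, 1.0, 2.0, 3.0] for _ in range(4)]
pooled = roi_align(feature, (0.5, 0.5, 2.5, 2.5))
```

With this toy map the pooled values reflect the fractional position of the RoI, whereas rounding the RoI to integer cells first would discard the half-pixel shift.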
The forward calculation equation of RoI Align is shown as Eq. (2).
where
Schematic diagram of the RoI Align calculation.
Consequently, Deformable RoI Align is proposed to update Deformable RoI Pooling by replacing RoI Pooling with RoI Align in the process in order to avoid the negative effect of small translations. The forward calculation equation of Deformable RoI Align is shown as Eq. (3).
where
As the layers deepen, local feature information is often lost and the internal information of a proposal may be insufficient, which is unfavorable for regression, as shown in Fig. 10. To address these issues, three additional RoI Align layers are applied in the model to learn the global information of the feature maps exported from conv2_x, conv3_x and conv4_x respectively, as shown in Fig. 5. The three global features are then added to the proposal features from the Deformable RoI Align of the three Fast R-CNNs, so that the proposals obtain adequate regression information for better localization. Moreover, retrieving feature information lost during subsampling makes small defects more likely to be identified, which improves the classification and localization accuracy of the model.
Improved bisecting k-means
A wide range of focal lengths and complex background information in the images may lead to many false detections. Meanwhile, proposals that contain real defects but have low confidence scores may be filtered out when a fixed confidence score threshold is set. To avoid these issues, an improved bisecting k-means is employed in the test process.
Examples of proposals: (a) containing sufficient internal information (b) containing insufficient internal information.
The flow chart of the improved bisecting k-means is shown in Fig. 11. Euclidean distance is selected as the distance measure in the k-means process after experimental comparison with cosine distance. Since the number of defects per image in the dataset rarely exceeds 5, bisecting k-means is applied when the number of proposals extracted from the RPN is greater than 5, and the cluster with the higher average score is retained. This is iterated until the number of proposals in the cluster is less than or equal to 5. After that, the proposal with the highest confidence score is kept to prevent missed detections. Finally, proposals with confidence scores higher than the set threshold are retained as well, which yields the final predictions.
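The procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: clustering on box centres, the deterministic initialisation, the fallback for degenerate splits, the threshold value of 0.5 and all function names are assumptions, since the text does not specify them.

```python
import math

MAX_DEFECTS = 5  # the text notes that images rarely contain more than 5 defects

def center(box):
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)

def two_means(points, iters=20):
    """Plain k-means with k = 2 and Euclidean distance; returns a label per point."""
    c0 = points[0]                                   # assumed deterministic start
    c1 = max(points, key=lambda p: math.dist(p, c0))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if math.dist(p, c0) <= math.dist(p, c1) else 1 for p in points]
        new = []
        for k, old in ((0, c0), (1, c1)):
            members = [p for p, l in zip(points, labels) if l == k]
            new.append(tuple(sum(v) / len(members) for v in zip(*members))
                       if members else old)
        if new == [c0, c1]:
            break
        c0, c1 = new
    return labels

def filter_proposals(proposals, score_threshold=0.5):
    """proposals: list of (box, score) pairs. Returns the retained proposals."""
    if not proposals:
        return []
    kept = list(proposals)
    while len(kept) > MAX_DEFECTS:
        labels = two_means([center(b) for b, _ in kept])
        groups = [[p for p, l in zip(kept, labels) if l == k] for k in (0, 1)]
        if not groups[0] or not groups[1]:
            # Degenerate split (e.g. identical centres): keep the top scores instead.
            kept = sorted(kept, key=lambda p: p[1], reverse=True)[:MAX_DEFECTS]
            break
        # Retain the cluster with the higher average confidence score.
        kept = max(groups, key=lambda g: sum(s for _, s in g) / len(g))
    best = max(kept, key=lambda p: p[1])
    # The top-scoring proposal always survives; the rest must pass the threshold.
    return [p for p in kept if p is best or p[1] > score_threshold]

# Hypothetical proposals: four on the blade, three low-score ones in the background.
proposals = [
    ((40, 40, 60, 60), 0.90), ((42, 38, 62, 58), 0.80),
    ((45, 45, 70, 70), 0.30), ((35, 40, 55, 65), 0.85),
    ((480, 480, 520, 520), 0.20), ((490, 470, 530, 510), 0.10),
    ((470, 500, 510, 540), 0.30),
]
final = filter_proposals(proposals)
```

In this toy case the background cluster is discarded as a whole because its average score is lower, and a lone low-score proposal would still be returned as the best guess, mirroring the rule that the top proposal is kept to prevent missed detections.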
On the basis of the above, the 9 anchors created at each pixel in the RPN are expanded to 20 to adapt to the multiple aspect ratios of defects in the dataset. That is, the three scales {8, 16, 32} and three ratios {0.5, 1, 2} used to generate anchors are extended to four scales {4, 8, 16, 32} and five ratios {0.25, 0.5, 1, 2, 4}, giving 4 × 5 = 20 anchors.
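The anchor count follows directly from combining every scale with every ratio, as the sketch below shows. The base size of 16 pixels and the convention that a ratio means height divided by width are assumptions borrowed from common Faster R-CNN implementations, not values stated in this text.

```python
import math

def make_anchors(base=16, scales=(4, 8, 16, 32), ratios=(0.25, 0.5, 1, 2, 4)):
    """Return the (width, height) of every anchor at one feature-map position."""
    anchors = []
    for s in scales:
        area = (base * s) ** 2            # anchor area is fixed by the scale
        for r in ratios:                  # r = height / width (assumed convention)
            w = math.sqrt(area / r)
            h = w * r
            anchors.append((w, h))
    return anchors

expanded = make_anchors()
original = make_anchors(scales=(8, 16, 32), ratios=(0.5, 1, 2))
```

The extra scale of 4 adds smaller anchors for small defects, while the ratios 0.25 and 4 add elongated anchors that fit long, thin defects such as cracks.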
The ReLU activation functions of the network are replaced by PReLU and the definitions are shown in Eqs (4) and (5). On the basis of ReLU, PReLU is added with a learnable parameter
The fully connected layers of the first two Fast R-CNN networks are replaced by global average pooling to reduce the number of parameters and prevent over-fitting of the trained model.
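For reference, the two activation functions compared in this section are conventionally defined as below, following the PReLU formulation in [21]. The symbol a_i for the learnable slope of channel i is taken from that formulation and is an assumption about the notation used in Eqs (4) and (5), which are not reproduced in this text.

```latex
% ReLU: negative inputs are set to zero
\mathrm{ReLU}(x_i) = \max(0, x_i)

% PReLU: negative inputs are scaled by a learnable coefficient a_i
\mathrm{PReLU}(x_i) = \max(0, x_i) + a_i \min(0, x_i)
```

With a_i = 0 PReLU reduces to ReLU, and a non-zero a_i keeps a small gradient flowing for negative inputs.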
Configurations of the equipment
The flow chart of improved bisecting k-means.
Model training and key parameters
The development environment is based on Tensorflow with Python language and accelerated calculation on GPU to promote training efficiency. ResNet101 is selected as the backbone for feature extraction and the network which has been trained on ImageNet is applied to initialize the parameters of the model. In addition, main configurations of the equipment are shown in Table 3.
End-to-end training is employed, which is more efficient than alternating optimization without affecting accuracy. In addition, an exponentially decayed learning rate is applied, with the learning rate initialized to 0.0001. As the iterations increase, the learning rate decreases exponentially, so that the model does not fluctuate too much in the later stage of training and can move closer to the optimal solution. The maximum number of iterations is set to 70000 for complete training.
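A minimal sketch of such a schedule is given below. Only the initial rate of 0.0001 and the 70000 iterations come from the text; the decay rate of 0.96 and the decay interval of 1000 steps are hypothetical placeholders, and the formula is the common continuous form of exponential decay rather than a confirmed detail of the training setup.

```python
def exponential_decay(step, initial_lr=1e-4, decay_rate=0.96, decay_steps=1000):
    """Learning rate after `step` iterations: lr0 * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)

MAX_ITERATIONS = 70000
schedule = [exponential_decay(s) for s in range(0, MAX_ITERATIONS + 1, 10000)]
```

Under these placeholder values the rate shrinks smoothly over the run, which gives large updates early on and small, stable updates near the end of training.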
Experiments and evaluation
Experimental design
The benefits gained in the detection accuracy and characteristics of the model will be progressively presented by experiments as follows.
Results in different IoU thresholds
Ablation experiments of improved strategies
Comparisons of AP for three types of surface defects on the WTBs.
Examples of defect detection results: (a) manual annotations (b) prediction results.
Comparisons of the loss of CAD Cascade R-CNN with ReLU and PReLU.
Examples of detection results: (a) without improved bisecting k-means (b) with improved bisecting k-means.
Comparative analysis of accuracy mAP.
First, the AP for the three types of surface defects is compared between Cascade R-CNN and CAD Cascade R-CNN to verify the adaptability and accuracy of the proposed model. The IoU threshold is the criterion for counting a prediction as a positive sample; if the detection accuracy stays stable as this threshold rises, the model is robust and regresses boxes well. Different IoU thresholds are therefore employed in the test.
To evaluate each improved strategy, gradual ablation experiments are designed, as shown in Table 5. All models adopt the pre-trained network and extract 20 anchors. Model 1 is the original Cascade R-CNN framework, and Deformable Convolution and Deformable RoI Align are employed in Model 2. Model 3 adds context information fusion to Model 2, and Model 4 is the proposed CAD Cascade R-CNN, which uses PReLU instead of ReLU as the activation function on the basis of Model 3.
Further comparisons with one-stage YOLO-v3, two-stage Faster R-CNN and Cascade R-CNN are conducted to demonstrate the superiority of CAD Cascade R-CNN.
To evaluate the detection performance of the models without bias, average precision (AP) and mean average precision (mAP), the most commonly used metrics in object detection, are used in this paper. These metrics are built on two quantities: Precision and Recall.
Precision is the proportion of predicted defect samples (positive samples) that actually have defects, as given by Eq. (6).
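$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$
where TP is the number of true positives (predicted defects that are actual defects) and FP is the number of false positives (predicted defects that are not actual defects).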
Recall is the proportion of actual defect samples that are correctly predicted as defects, as given by Eq. (7).
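$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$
where TP is the number of true positives and FN is the number of false negatives (actual defects that are not detected).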
For each type of defect, a set of Precision and Recall values can be calculated under a series of confidence score thresholds, so that a PR curve with Recall as the abscissa and Precision as the ordinate is obtained. The area under the PR curve is the AP for that type of defect, and mAP is obtained by averaging the AP over all types of defect, as given by Eq. (8). The value of mAP lies in the interval [0, 1].
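$$\text{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{8}$$
where N is the number of defect types and AP_i is the AP of the i-th type.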
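The chain from detections to mAP can be sketched as follows. This is one common way to compute the area under the PR curve (using the monotone precision envelope, as in VOC-style evaluation) and may differ from the exact procedure used in the experiments; the detection list and the per-class AP values are hypothetical.

```python
def precision_recall_points(detections, num_ground_truth):
    """detections: list of (score, is_true_positive). Returns [(recall, precision)]."""
    points, tp, fp = [], 0, 0
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_ground_truth, tp / (tp + fp)))
    return points

def average_precision(points):
    """Area under the PR curve, using the monotone precision envelope."""
    recalls = [0.0] + [r for r, _ in points]
    precisions = [p for _, p in points]
    # Envelope: precision at recall r is the best precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    return sum((recalls[i + 1] - recalls[i]) * precisions[i]
               for i in range(len(precisions)))

def mean_average_precision(ap_per_class):
    return sum(ap_per_class.values()) / len(ap_per_class)

# Hypothetical detections for one defect type with two ground-truth defects.
detections = [(0.9, True), (0.8, False), (0.7, True)]
ap = average_precision(precision_recall_points(detections, num_ground_truth=2))
# Hypothetical per-class AP values, not results from the paper.
m_ap = mean_average_precision({"crack": 0.9, "breakage": 0.8, "oil pollution": 1.0})
```

Lowering the confidence threshold walks down the sorted detection list, so each detection contributes one point to the PR curve.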
Compared with the original Cascade R-CNN, CAD Cascade R-CNN achieves a higher AP for all types of defect, as shown in Fig. 12. Its adaptability is also validated by the balanced results across the three kinds of defect.
As shown in Table 4, the improved model maintains stable detection accuracy as the evaluation becomes more stringent, while the accuracy of the comparison model drops greatly, which reflects the robustness and regression accuracy of CAD Cascade R-CNN. To further demonstrate the flexibility and accuracy of the model, examples of detection results compared with manual annotations are shown in Fig. 13.
The results of ablation experiments are shown in Table 5. It can be concluded that each improved strategy can effectively enhance the accuracy mAP to a certain extent within an acceptable consumption. The application of Deformable Convolution and Deformable RoI Align to the model improves mAP by 6.3%. In addition, mAP further increased by 4.3% after employing context information fusion. Using PReLU instead of ReLU as the activation function is beneficial to the detection and the convergence rate with effective suppression of over-fitting. The comparisons of loss are shown in Fig. 14.
The test comparisons in Fig. 15 show how well the improved bisecting k-means handles images with complex backgrounds, and they demonstrate the robustness and generalization of the algorithm.
Finally, results comparing the mAP of Faster R-CNN, YOLO-v3, Cascade R-CNN and CAD Cascade R-CNN are shown in Fig. 16. CAD Cascade R-CNN achieved the highest mAP of 92.1%, which surpasses all these existing approaches and attests to its superiority.
Conclusions
Since the angle and the distance at which images are taken by the UAV may not be fully controllable and the boundaries of the defects can be equivocal, these problems present challenges for the effective detection of many defects with diverse or small dimensions. This paper proposes an improved CAD Cascade R-CNN aimed for high-accuracy automatic and stable detection of surface defects on WTBs.
Robustness and generalization of the model are also necessary for field detection. By collecting data from actual wind field, a dataset containing pre-selected 1164 original images of 3 common types of defect categorized as crack, breakage and oil pollution is established, with the type label and positional information of each defect. The images in the training set are then augmented to 16300 with the defect annotations transformed, especially by Gaussian Blur and Multiply, to further enhance the robustness of the trained model.
The accuracy and other characteristics of the trained CAD Cascade R-CNN are supported by extensive experiments. First, the robustness and adaptability of the model are demonstrated by the detection results on all types of defect and with varying IoU thresholds. After that, the effectiveness of each improved strategy is validated by ablation experiments. Applying Deformable Convolution and Deformable RoI Align in CAD Cascade R-CNN enhances mAP by 6.3%, which is useful for detecting defects with irregular and small dimensions. Context information fusion improves mAP by 4.3% for the detection of small defects and confers better regression. Using PReLU instead of ReLU prevents the gradient from vanishing and improves the convergence rate and accuracy. Meanwhile, the detrimental influence of complex backgrounds on detection results is circumvented by employing the improved bisecting k-means in the test process. Finally, in comparison with existing approaches, CAD Cascade R-CNN achieves the highest mAP of 92.1%, outperforming YOLO-v3 (69.7%), Faster R-CNN (77.6%) and Cascade R-CNN (80.9%). By using this approach as the protocol for actual surface defect detection on WTBs, inspectors can intuitively assess the condition of WTBs and minimize potential downtime to boost productivity and economic benefits.
This paper shows the possibility to enhance the adaptability of the model to detect defects of diverse and small dimensions, as well as under complex backgrounds by optimizing the framework and the test strategy. Meanwhile, a solution is proposed to prevent over-fitting caused by too few samples and the performance of detection is guaranteed by the improved model. The algorithm can be adapted to various blade backgrounds and complex environmental conditions, which can be integrated to aerial photography of UAV and ground mobile robot shooting in actual wind farms.
In the future, the following work is needed for further assessment:
Extension of CAD Cascade R-CNN to other types of surface defects on WTBs based on more defect samples.
Consideration of the detection rate, to obtain a more efficient model at a desired or constant accuracy.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (50776005, 51577008).
