Component identification and defect detection in transmission lines based on deep learning

Abstract

Ensuring the stable and safe operation of the power system is an important task for national power grid companies. Grid companies have established dedicated inspection departments to troubleshoot transmission line components and replace faulty components in a timely manner. Drone-assisted manual inspection has become a trend in power line inspection, and automatically identifying component failures from UAV aerial images of transmission lines is a cutting-edge interdisciplinary problem. This article therefore studies component identification and defect detection for transmission lines based on deep learning. It adjusts the size of the convolution kernel of the CNN model and expands the dataset by rotation transformation of the images. Recognition and classification experiments were performed with Faster R-CNN on images collected by drone. The experimental results show that both methods effectively improve the effectiveness and reliability of component identification and defect detection in transmission line inspection. Detection reaches a recognition speed of nearly 0.17 s per image, the recognition rate of the pressure-equalizing ring reaches 96.8%, and the mAP reaches 93.72%.

Keywords

Power line detection; deep learning; component recognition; Faster R-CNN; network model

1 Introduction

With the development of the second industrial revolution in the last century, power technology came into wide use and brought great progress to modern society [1]. In today’s fast-developing Internet era, the demand for electricity is growing constantly and rapidly. Computers, mobile phones, smart appliances, and other electrical equipment make us increasingly dependent on electricity, which has become a necessity of daily life. Because the country covers a wide area and its economically developed areas do not coincide with its power-intensive areas, grid companies need to extend their lines to address the imbalance in power consumption. During the “Thirteenth Five-Year Plan” period, China will make every effort to promote industrial innovation, closely integrate smart grid construction with “Made in China 2025” and “Internet+”, strengthen the industrial competitiveness of smart grid equipment, and, guided by “One Belt One Road”, expand smart grid equipment into overseas markets, promoting the country’s leading position in smart grid technology [2]. At present, China has invested 100 billion yuan to build three major transmission channels connecting the west and the east and has significantly expanded its ultra-high-voltage, large-capacity power lines. Large-scale power systems continuously supply power to users through high-voltage transmission lines.

Transmission lines can basically be divided into two categories: cable transmission lines and overhead transmission lines [3]. The former are mainly buried underground, are relatively expensive, and their failures are not easy to detect [4]. The components of an overhead transmission line include wires, overhead ground wires, insulator strings, pole towers, and grounding devices. The grid company transmits electrical energy through the transmission wires, which are fixed to the top of the tower by insulators [5, 6]. Overhead lines, because of their lower production cost and relatively simple construction, are often used in China’s current power transmission [7, 8]. Because transmission lines span different regions and climatic environments, the natural conditions they face are very harsh [9]. In addition, owing to their long-term exposure to harsh outdoor environments, the components may suffer from flashover, aging, and mechanical damage. If these problems are not dealt with in time, they can easily cause major transmission accidents [10, 11]. Such accidents lead to very serious consequences, such as large-scale power outages and line damage, which adversely affect the stable operation of the power system and cause serious economic losses to society [12, 13]. Transmission lines therefore need to be checked regularly. However, the terrain across China varies greatly, and many transmission lines pass through mountains, rivers, and communication blind spots; these factors make inspection and maintenance of the lines very inconvenient [14]. At present, traditional manual maintenance methods can hardly meet the routine maintenance requirements of high-voltage transmission lines [15]. This backward method also brings great hidden dangers to the safe operation of the power grid [16, 17]. Grid companies in developed regions at home and abroad use helicopters or drones to check transmission lines. This method relies on advanced equipment and has the advantages of high efficiency, speed, stability, and safety. It has gradually become an important supplementary method for the inspection of transmission lines in China [18].

In recent years, deep learning has achieved very good results in the field of image recognition and detection [19]. Priyadarshini proposed an image recognition algorithm for semantic segmentation of cracks and leaks in subway shield tunnels based on the feature layers of a fully convolutional network (FCN). A self-developed mobile tunnel inspection image acquisition device (MTI-200a) was used to collect the defect images for the training and test data sets. After the image data set is established, FCN models for cracks and for leaks are each trained through multiple iterations of forward inference and backward learning [20]. With the corresponding FCN models, a two-stream algorithm is used to achieve semantic segmentation of the defect images [21]. In one stream, cracks are identified through a sliding window assembly operation; in the other stream, leaks are identified through a resizing interpolation operation. Compared with the commonly used region growth algorithm (RGA) and adaptive threshold algorithm (ATA), this method has obvious advantages in recognition results, inference time, and error rate on both two-defect non-overlapping (TDN) images and two-defect overlapping (TDO) images. It can quickly and accurately identify defects in the structural health monitoring and maintenance of subway shield tunnels [22, 23]. Over the past few years, interest in motion and gesture recognition has increased dramatically. Tao reviewed current deep learning methods for motion and gesture recognition in image sequences, introducing a taxonomy that summarizes the important aspects of deep learning for these two tasks and reviewing the details of the proposed architectures, fusion strategies, main data sets, and competitions. The review summarizes and discusses the main work proposed so far, paying special attention to how the time dimension of the data is handled, discussing the main characteristics of each approach, and identifying opportunities and challenges for future research [24, 25]. Because the marine environment is unconstrained, underwater target recognition is a challenging task [26]. As the amount of data increases, deep learning methods have been successfully applied to aerial target image recognition. However, Tim’s research shows that deep neural networks (DNNs) are susceptible to overfitting on small samples. Underwater image acquisition often requires a lot of manpower and material resources, and it is difficult to obtain enough sample images for DNN training. In addition, images taken by underwater cameras are often degraded by noise. Taking live fish recognition as an example, Tim proposed an underwater image recognition framework for small samples. First, an improved median filtering method is proposed to suppress the noise in fish images. Then, a convolutional neural network is pre-trained on images from ImageNet, the world’s largest image recognition database. Finally, the preprocessed fish images are used to fine-tune the pre-trained network and test its classification performance. The experimental results show that this method can effectively identify fish species and provides an effective way to solve the recognition problem in the small-sample case [27, 28].

This article studies methods for identifying and detecting power components in transmission lines. Starting from sample preprocessing and improvements to the deep learning network framework, target detection is performed on key equipment in aerial power inspection images to improve the detection rate and detection accuracy. The main work is as follows. First, the current identification technology, the identification of key transmission line equipment, and the state of deep learning research are introduced, and the advantages and disadvantages of some existing methods are analyzed. Then, image sample preprocessing methods are discussed and, in light of the actual project requirements, a self-cropping algorithm is proposed to expand the sample set more effectively. Next, existing deep learning algorithms are improved to raise the recognition accuracy for transmission line components. Finally, the improved algorithm proposed in this paper is compared with other deep learning detection methods, and the experimental results are presented and analyzed.

2 Proposed method

2.1 Traditional target detection method

The traditional target detection method usually uses a sliding window to obtain a large number of potential candidate regions [29]. The candidate regions generated by the sliding window have a fixed size, and their large number places a heavy burden on subsequent calculations [30]. In addition, for feature extraction, traditional object detection algorithms rely on many complicated hand-designed features, which are often not robust on complex and changeable data sets. Deep learning is more general, so this chapter mainly discusses related research on object detection based on deep learning [31].

Object recognition and localization is an important research direction in the field of machine vision and pattern recognition. The subject involves multiple research topics such as data processing, feature extraction, and classifiers. The data must be pre-processed before it enters the network; this early-stage processing of the image includes operations such as image denoising, image cropping, brightness transformation, and normalization [32]. After data pre-processing, feature extraction is performed. There are many traditional feature extraction methods, such as SIFT and SURF for local feature extraction, Haar and LBP for face feature extraction [33], and HOG for pedestrian detection. The role of the classifier is to classify the extracted features. Common classifiers include SVM and Bayes classifiers [34].

However, in practical applications, the image background is diverse, the shape of the object is variable, and the camera viewing angle changes a lot, which makes object detection more difficult. In recent years, deep learning has made rapid progress in the field of security, especially in object recognition, detection, and tracking. This chapter mainly studies how to use the results of deep learning in computer vision to achieve target detection on transmission lines. Generally speaking, object recognition and localization is divided into two parts. The first part finds the positions of all foreground targets in the scene to obtain the candidate frame of each target; the second part determines the category of the object inside each candidate frame found. As shown in Fig. 1, this process is target detection based on deep learning algorithms [35].

Fig. 1

Deep learning-based object detection steps.

Deep learning-based object detection algorithms usually have four steps.

The first step is to perform image preprocessing on the input image or video frame, including: normalizing the data to reduce the difference between different pictures; and performing noise reduction, rotation, and scaling operations on the image.

The second step is to generate candidate frames that potentially contain targets by sliding frames or RPN networks.

The third step is feature extraction to obtain the feature vector, and the fourth step is to use the feature vector to classify the candidate frames.

The order of the second and third steps directly affects the computational cost of the whole model. If the candidate frames are generated first, then, since the image contains many objects of different sizes and shapes, the general frame selection method needs to traverse the entire image, and many redundant frames are generated. Performing feature extraction on each of these redundant frames separately greatly increases the computational load. The idea of the RPN is therefore to first extract features with a convolutional network, then share those features and perform sliding frame selection on the extracted feature layer. Common candidate region methods include selective search, edge boxes, and the RPN based on convolutional neural networks.

Convolutional neural networks were first used for feature extraction in target detection in the R-CNN network. Ross Girshick proposed R-CNN, which combines selective search (SS) with a CNN and opened up a new line of thinking for solving target detection with deep learning. R-CNN uses the selective search algorithm to extract 2000 candidate frames from the original image. Each candidate frame is then scaled to a fixed size and input to a CNN to extract features, and finally the extracted features are input to an SVM classifier for classification. The subsequent SPP-Net, Fast R-CNN, Faster R-CNN, and other networks follow a similar basic process, differing in the region proposal extraction method and in the order of region selection and feature extraction.

At present, deep learning-based object detection methods are mainly divided into two categories. The first category consists of one-stage methods, such as SSD and YOLO. The second category consists of two-stage methods, mainly represented by Faster R-CNN.

2.2 Faster R-CNN based target detection method

(1) Framework introduction of Faster R-CNN

The framework flow of Faster R-CNN is shown in Fig. 2. It is mainly composed of two parts. The first part is the RPN, which generates the coordinates of each candidate box together with a score indicating whether the box is foreground. The second part is the Fast R-CNN detection module, which is responsible for detection. In the overall structure, a convolutional backbone network extracts features, and the RPN and the Fast R-CNN detection network that follow share these features. The RPN generates candidate boxes of different sizes at the anchor points of each feature map. The detection network examines these candidate regions and identifies the target category in each of them.

Fig. 2

Introduction to Faster R-CNN framework.

(2) Region proposal network of Faster R-CNN

The innovation of the Faster R-CNN network lies in the RPN. Earlier target detection networks first use the SS algorithm to perform frame selection over the entire feature map. For example, the SPP-Net and Fast R-CNN networks both improve on the traditional sliding-window detection network and raise detection efficiency, but both take a long time to generate candidate region boxes. The region proposal network (RPN), by contrast, shares the features generated by the convolutional network, so generating the candidate regions consumes very little time. The basic idea of the RPN is shown in Fig. 3.

Fig. 3

Faster R-CNN generated anchor schematic.

The network generates anchors for each anchor point on the feature map and performs bounding-box regression and foreground/background classification on them. Anchors are candidate regions defined in advance: each anchor point on the feature map generates 9 candidate regions, combining three areas {128 × 128, 256 × 256, 512 × 512} with three aspect ratios {1:1, 1:2, 2:1}.
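The anchor construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the convention that a ratio means height divided by width, and the feature-map stride of 16 pixels are all assumptions introduced here.

```python
import numpy as np

def generate_anchors(scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return a (9, 4) array of anchors (x1, y1, x2, y2) centred on the origin.

    Each scale s gives an anchor area of s * s; each ratio is taken here
    as height / width (an assumed convention).
    """
    anchors = []
    for s in scales:
        area = float(s * s)
        for r in ratios:
            w = np.sqrt(area / r)   # width chosen so that w * h == area
            h = w * r
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def shift_anchors(base, feat_h, feat_w, stride=16):
    """Place the base anchors at every anchor point of a feature map.

    stride is the assumed number of image pixels per feature-map cell.
    """
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    shifts = np.stack([cx.ravel(), cy.ravel(), cx.ravel(), cy.ravel()], axis=1)
    # (N, 1, 4) + (1, 9, 4) -> (N, 9, 4) -> (N * 9, 4)
    return (shifts[:, None, :] + base[None, :, :]).reshape(-1, 4)
```

For a feature map of height 2 and width 3, `shift_anchors(generate_anchors(), 2, 3)` yields 2 × 3 × 9 = 54 anchors.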

The region proposal network has a fully convolutional structure, which allows the entire network to be trained end to end. Its operation process is as follows:

Generate anchors and apply bounding-box regression to all of them. The anchors generated here are exactly the same as during training.

Sort the anchors by their foreground scores and keep the first 6000, that is, the highest-scoring foreground candidate frames after position correction.

Map the candidate anchors back to the original image, check whether each target frame extends far beyond the image boundary, and remove candidate frames that seriously exceed the boundary.

Perform a non-maximum suppression operation.

Sort the remaining target boxes by score and extract the first 300 results as the output of the RPN.

The RPN outputs the coordinates of the final foreground boxes and their probability scores.
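The selection steps above can be sketched in NumPy as follows. This is an illustrative outline under stated assumptions: the function names are hypothetical, the IoU threshold of 0.7 and the `margin` tolerance for boxes that cross the image boundary are not given in the text, and boxes are assumed to be already regressed and expressed as (x1, y1, x2, y2) in image coordinates.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]     # drop boxes overlapping box i too much
    return np.array(keep, dtype=int)

def select_proposals(boxes, scores, img_w, img_h,
                     pre_nms=6000, post_nms=300, iou_thresh=0.7, margin=0.0):
    """Top-6000 by score -> boundary filter -> NMS -> top-300."""
    order = scores.argsort()[::-1][:pre_nms]
    boxes, scores = boxes[order], scores[order]
    # margin (assumed parameter): how far a box may extend past the image edge
    inside = ((boxes[:, 0] >= -margin) & (boxes[:, 1] >= -margin) &
              (boxes[:, 2] <= img_w + margin) & (boxes[:, 3] <= img_h + margin))
    boxes, scores = boxes[inside], scores[inside]
    keep = nms(boxes, scores, iou_thresh)[:post_nms]
    return boxes[keep], scores[keep]
```

Because `nms` visits boxes in descending score order, the kept indices are already sorted by score, so truncating to the first 300 implements the final sorting step.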

(3) Loss function design

The loss during network training consists of two parts: the position regression loss and the classification loss. The total loss function can be expressed as: $L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha L_{loc}(x, l, g) \right)$ (1)

Where N is the number of matches between the default boxes and the real boxes, α is the weight factor and is generally set to 1, c is the confidence of each class, and l and g are the parameters of the default box and the real box, including the center position coordinates and the width and height. L_conf(x, c) is the classification confidence loss, computed with the multi-class Softmax loss. L_loc regresses the center position, width, and height of the bounding boxes using the Smooth_L1 loss, which is calculated as follows: $\mathrm{Smooth}_{L1}(X) = \begin{cases} 0.5 X^{2}, & |X| < 1 \\ |X| - 0.5, & |X| \geq 1 \end{cases}$ (2)
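As a small numerical illustration of formulas (1) and (2), the sketch below evaluates the Smooth_L1 function element-wise and combines two already-computed loss terms. The function names are hypothetical, and returning 0 when no default box is matched is an assumed convention that the text does not specify.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 where |x| < 1, and |x| - 0.5 elsewhere."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0, 0.5 * x ** 2, abs_x - 0.5)

def total_loss(conf_loss, loc_loss, num_matched, alpha=1.0):
    """Formula (1): (L_conf + alpha * L_loc) / N."""
    if num_matched == 0:
        return 0.0  # assumed convention for images with no matched boxes
    return (conf_loss + alpha * loc_loss) / num_matched
```

The two branches meet at |X| = 1, where both give 0.5, so the loss is continuous; it is quadratic for small regression errors and linear for large ones, which limits the influence of outliers.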

2.3 Bilinear interpolation image compression algorithm

Aerial images are very large, so they are generally compressed before being input into the deep learning object detection framework [36].

A typical SSD training sample size is 300×300. However, the insulator aerial images are ultra-high-resolution pictures with very large pixel counts, so the pictures must be compressed to improve training speed. This article compresses the original image to 1024×1024. Image compression here refers to reducing the original image to a new image at a specified ratio.

If the image is scaled directly, the coordinates of many points come out as non-integer values. Therefore, this paper uses an interpolation algorithm to approximate the pixel values at those coordinates. Interpolation algorithms usually include nearest neighbor interpolation, bilinear interpolation, and higher-order interpolation. Nearest neighbor interpolation is the simplest: the output pixel value is the value of the nearest sampling point in the input image. Higher-order interpolation preserves more detail of the original image than nearest neighbor interpolation, but it requires a longer calculation time. Bilinear interpolation sets the output pixel value to the weighted average of the gray values of the sample points in the nearest neighborhood in the input image [37]. Considering that processing large batches of samples takes a lot of time, this paper adopts the bilinear interpolation compression method.

The steps for bilinear interpolation are as follows:

(1) Calculating the scaling factor

First, calculate the horizontal compression factor H_f and the vertical compression factor V_f. Assume that the pixel point before scaling is P₀(x₀, y₀) and that the corresponding pixel point of the scaled image is P₁(i, j). The scaling factors are given by formula (3): $\left\{ \begin{matrix} H_{f} = srcWidth / dstWidth \\ V_{f} = srcHeight / dstHeight \end{matrix} \right.$ (3)

Where srcWidth and srcHeight represent the width and height of the original image; dstWidth and dstHeight represent the width and height of the new scaled image.

(2) Calculate the target point in the original image

Since the coordinates of the scaled points are not necessarily integers after they are mapped back to the original image, the original coordinates P₀(x₀, y₀) corresponding to each scaled pixel are obtained first. $\left\{ \begin{matrix} x_{0} = i \times H_{f} \\ y_{0} = j \times V_{f} \end{matrix} \right.$ (4)

(3) Calculate the coordinates of the 4 points closest to P₀

The third step calculates the coordinates (x₁, y₁), (x₁, y₂), (x₂, y₁), (x₂, y₂) of the four pixels closest to P₀(x₀, y₀) in the original image.

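(4) Calculate the weights. Each neighboring pixel is weighted by the area of the rectangle between P₀ and the diagonally opposite neighbor, so that the closest pixel receives the largest weight; because adjacent pixels are one unit apart (x₂ − x₁ = y₂ − y₁ = 1), the four weights sum to 1: $\left\{ \begin{matrix} w_{1} = (x_{2} - x_{0}) (y_{2} - y_{0}) \\ w_{2} = (x_{0} - x_{1}) (y_{2} - y_{0}) \\ w_{3} = (x_{2} - x_{0}) (y_{0} - y_{1}) \\ w_{4} = (x_{0} - x_{1}) (y_{0} - y_{1}) \end{matrix} \right.$ (5)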

(5) Calculate the pixel value at P₀

$P_{0}(x_{0}, y_{0}) = w_{1} P_{1}(x_{1}, y_{1}) + w_{2} P_{2}(x_{2}, y_{1}) + w_{3} P_{3}(x_{1}, y_{2}) + w_{4} P_{4}(x_{2}, y_{2})$ (6)

According to the above formula, all the sample pictures are traversed in order to obtain the compressed image.
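The five steps can be put together in a short NumPy sketch. It is an illustration under assumptions rather than the code used in the experiments: the function name is hypothetical, the neighbors are taken as x₁ = ⌊x₀⌋ and x₂ = x₁ + 1 (clamped at the image border), and the weights are written in the standard bilinear form, in which the pixel nearest to P₀ receives the largest weight.

```python
import numpy as np

def bilinear_resize(src, dst_h, dst_w):
    """Resize a (H, W) or (H, W, C) image to (dst_h, dst_w) by bilinear interpolation."""
    src = np.asarray(src, dtype=float)
    src_h, src_w = src.shape[:2]

    # Step 1: scaling factors, formula (3)
    h_f = src_w / dst_w
    v_f = src_h / dst_h

    # Step 2: map every target pixel (i, j) back to the source, formula (4)
    x0 = np.arange(dst_w) * h_f
    y0 = np.arange(dst_h) * v_f

    # Step 3: the four nearest source pixels (clamped at the border)
    x1 = np.floor(x0).astype(int)
    y1 = np.floor(y0).astype(int)
    x2 = np.minimum(x1 + 1, src_w - 1)
    y2 = np.minimum(y1 + 1, src_h - 1)

    # Step 4: fractional offsets that define the weights
    dx = (x0 - x1)[None, :]
    dy = (y0 - y1)[:, None]
    if src.ndim == 3:                       # broadcast over colour channels
        dx, dy = dx[..., None], dy[..., None]

    # Step 5: weighted sum of the four neighbours
    top = (1 - dx) * src[y1][:, x1] + dx * src[y1][:, x2]
    bottom = (1 - dx) * src[y2][:, x1] + dx * src[y2][:, x2]
    return (1 - dy) * top + dy * bottom
```

For example, resizing the 2 × 2 image [[0, 10], [20, 30]] to 4 × 4 gives 5 at output position (row 0, column 1), halfway between 0 and 10. Note that with a large compression factor each output pixel still reads only four source pixels.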

2.4 Sample pretreatment process

This article does not crop every sample picture, so the proportion of the picture occupied by the target must first be determined. Targets are divided into small, medium, and large targets according to whether they occupy less than 5%, between 5% and 12%, or more than 12% of the picture. Only pictures classified as containing small or medium targets are cropped; pictures containing large targets are not cropped.

The details of the sample pretreatment process used in the actual experiments in this paper are shown in Fig. 4 below:

Fig. 4

Sample expansion and preprocessing process.

First, the size of the target frame is calculated, and pictures with small and medium targets are cropped. The cropped images and the original images then undergo conventional operations such as image compression, brightness conversion, and horizontal flipping. The expanded images and the original picture set together form the training sample library.
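The size judgment that decides whether a picture is cropped can be sketched as below. The helper names are hypothetical, and two details are assumptions: the proportion is measured as target-frame area divided by image area, and a value of exactly 5% or 12% is counted as medium.

```python
def target_ratio(box, img_w, img_h):
    """Fraction of the image area covered by a target frame (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1) / float(img_w * img_h)

def size_class(ratio, small_max=0.05, medium_max=0.12):
    """Classify a target as small (< 5%), medium (5% to 12%) or large (> 12%)."""
    if ratio < small_max:
        return "small"
    if ratio <= medium_max:
        return "medium"
    return "large"

def should_crop(box, img_w, img_h):
    """Only pictures with small or medium targets are cropped."""
    return size_class(target_ratio(box, img_w, img_h)) != "large"
```

For a 6000×4000 picture, a 600×400 target frame covers 1% of the image and is treated as a small target, so the picture would be cropped.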

3 Experiments

3.1 Experimental data preparation

The data set in this article is mainly based on photos taken during sampling inspections. The photos cover the four seasons of spring, summer, autumn, and winter; the shooting locations are diverse, and the resolution is high. This article focuses on the identification and defect diagnosis of transmission line components and chooses five different categories: equalizing ring 1, equalizing ring 2, complete vibration hammer, bad vibration hammer, and bird’s nest. Model training and sampling use the default settings. There are 1000 training samples in each category, 5000 training samples in total, and the image size of each sample is 6000×4000.

3.2 Experimental data set processing

First, the training samples are uniformly scaled to 1200×800. Then each complete power component target shown in an image of the training set is annotated with a frame, the frame coordinates are recorded, and a classification label is provided.

For testing, 500 images of each category were selected as the test set, and the test set did not contain any training samples. When the test set is identified and classified, every electrical component of a category included in the training samples (including incomplete components) must be marked, classified, and scored for the result to count as a successful identification.

3.3 Experimental model training

This paper is based on the MXNet framework and uses Faster R-CNN to implement network model training. The VGG16 and ResNet-101 networks, pre-trained on ImageNet, were used for initialization, giving two models with different numbers of network layers. The data set of Section 3.2 is used as the training sample. The number of training passes for each model is 20, the batch size is 128, the learning rate is 0.001, the weight decay rate is 0.0005, and the numbers of candidate regions before and after non-maximum suppression are 6000 and 300, respectively. In this experiment, the correct rate, recall rate, and missed recognition rate were used as the criteria for evaluating model quality. The correct rate is the ratio of the number of targets that are correctly identified to the number of all targets that are identified. The recall rate is the ratio of the number of correctly identified targets to the number of all sample targets. The missed recognition rate is the ratio of the number of unrecognized targets to the number of all sample targets.
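The three evaluation criteria can be written as a short helper. The function name is hypothetical; recall is taken here as correctly identified targets divided by all sample targets, an interpretation that reproduces the figures reported for the first row of Table 1 (422 correct, 8 wrong, 70 missed, out of 500).

```python
def detection_metrics(correct, wrong, missed, total):
    """Return (recall, correct rate, missed recognition rate).

    correct: targets identified with the right category
    wrong:   targets identified with the wrong category
    missed:  targets not identified at all
    total:   number of all sample targets
    """
    recall = correct / total
    correct_rate = correct / (correct + wrong)
    miss_rate = missed / total
    return recall, correct_rate, miss_rate

# Equalizing ring 1 under the VGG16 model (counts from Table 1)
recall, correct_rate, miss_rate = detection_metrics(422, 8, 70, 500)
print(round(recall, 3), round(correct_rate, 3), round(miss_rate, 3))
# prints: 0.844 0.981 0.14
```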

4 Discussion

4.1 Comparison of model recognition effect

In the experiment, the test set from Section 3.2 was used to test the two models obtained in Section 3.3. The correct rate, recall rate, and missed recognition rate were used as the criteria for evaluating model quality. The results of the two models in identifying transmission line components are shown in Tables 1 and 2.

Table 1
VGG16 model verification results

Category	Validation set	Correctly identified	Error recognition	Missing identification	Recall	Correct rate	Miss recognition rate
Equalizing ring 1	500	422	8	70	0.844	0.981	0.140
Equalizing ring 2	500	411	6	83	0.822	0.986	0.166
Birdhouse	500	408	2	90	0.816	0.995	0.180
Good shock hammer	500	380	10	110	0.760	0.974	0.220
Bad shock hammer	500	366	12	122	0.732	0.968	0.244
mAP					0.795	0.981	0.190

Table 2

ResNet-101 model verification results

Category	Validation set	Correctly identified	Error recognition	Missing identification	Recall	Correct rate	Miss recognition rate
Equalizing ring 1	500	458	1	41	0.916	0.998	0.082
Equalizing ring 2	500	457	1	42	0.914	0.998	0.084
Birdhouse	500	453	0	47	0.906	1	0.094
Good shock hammer	500	426	1	73	0.852	0.998	0.146
Bad shock hammer	500	412	2	86	0.824	0.995	0.170
mAP					0.882	0.998	0.115

The accuracy of the five types of targets detected under two different models is shown in Fig. 5.

Fig. 5

Accuracy of 5 types of targets detected under two different models.

Tables 1 and 2 and Fig. 5 give the recall rate, correct rate, and missed recognition rate of the 5 types of targets detected under the 2 different models. The ResNet-101 model has a higher recall rate and a higher correct rate than the VGG16 model, and a lower missed recognition rate. Tables 1 and 2 also show that the recognition effect for pressure-equalizing ring 1, pressure-equalizing ring 2, and the bird’s nest is far superior to that for the vibration hammer. The reasons may be as follows: 1) the structural characteristics of the vibration hammer are not obvious enough; 2) the training sample data for the vibration hammer are insufficient, so the model’s generalization ability is not good enough.

4.2 Impact of convolution kernel size on recognition effect

According to the experimental results in Tables 1 and 2, the better-performing ResNet-101 model is selected for further experiments aimed at the unsatisfactory vibration hammer recognition and at optimizing the recognition time for a single picture. The experiments modify the convolution kernel of its convolutional structure, mainly the size of the first-layer convolution kernel. According to the content in Section 1, it can be known that different convolution kernel sizes have an impact on recognition accuracy and recognition time.

The size of the convolution kernel in the first convolution layer of the ResNet-101 network is 7. Following the experimental idea, the size of this convolution kernel is varied for model training, with the parameter settings consistent with those in Section 3.3. The trained models are then used to test the samples, with the recall rate and recognition time as experimental indicators. The number of samples in each category was 500. The experimental results are shown in Table 3.

Table 3
Recall rates for different convolution kernel sizes

Convolution	Equalizing ring 1	Equalizing ring 2	Birdhouse	Good shock hammer	Bad shock hammer
9*9	0.928	0.921	0.914	0.868	0.834
7*7	0.916	0.914	0.906	0.852	0.824
5*5	0.902	0.901	0.891	0.838	0.811
3*3	0.889	0.885	0.879	0.824	0.801
1*1	0.872	0.870	0.868	0.818	0.788

The recall rate of the detected 5 types of targets at different convolution kernel sizes is shown in Fig. 6.

Fig. 6

Recall rates for different convolution kernel sizes.

Fig. 6 shows that the convolution kernel size affects detection accuracy: as the size of the convolution kernel decreases, the recall rate decreases steadily. Large kernels have large receptive fields, and convolution kernels with large receptive fields achieve high recognition accuracy. As the size of the convolution kernel decreases, the recognition time for a single picture also decreases. This is because the number of parameters differs between kernel sizes: a small convolution kernel has few parameters, and a large convolution kernel has many. Therefore, the model can be tuned by adjusting the size of the convolution kernel to meet actual needs, whether a higher recall rate or a lower recognition time is required.
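The relationship between kernel size and parameter count can be made concrete with a short calculation. The weight count of a convolution layer is k × k × C_in × C_out; the values C_in = 3 (RGB input) and C_out = 64 used below are those of the standard ResNet first layer and are an assumption here, since the text does not state the channel counts of the modified models.

```python
def conv_weight_count(k, c_in=3, c_out=64):
    """Number of weights in a k x k convolution layer (bias terms excluded)."""
    return k * k * c_in * c_out

# Kernel sizes compared in Table 3
for k in (9, 7, 5, 3, 1):
    print(f"{k}*{k}: {conv_weight_count(k)} weights")
```

Under these assumptions the 9*9 kernel has 15552 weights and the 1*1 kernel has 192, and the number of multiply-add operations per output position grows with k × k in the same way, which is consistent with the shorter recognition time observed for small kernels.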

4.3 Rotate transform to augment the dataset

In this experiment, the images of the five different components in the training set in Section 3.2 were rotated by 90°, 180°, and 270°, giving a total of 20,000 training samples together with the originals. This set is used to train the ResNet-101 model, with the training parameter settings consistent with those in Section 3.3. The trained model is then used to test the test set in Section 3.2. The recall rate, correct rate, and missed recognition rate for the images of the 5 different components are shown in Fig. 7.
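A rotation-based expansion has to rotate the annotation frames along with the pixels. The sketch below shows one way to do this with NumPy; the function name is hypothetical, frames are assumed to be (x1, y1, x2, y2) with x along columns and y along rows, and rotation is counter-clockwise as in `np.rot90`.

```python
import numpy as np

def rotate_sample(image, boxes, k):
    """Rotate an image counter-clockwise by k * 90 degrees and move its frames.

    image: array of shape (H, W) or (H, W, C)
    boxes: list or array of frames (x1, y1, x2, y2)
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    h, w = image.shape[:2]
    for _ in range(k % 4):
        x1, y1, x2, y2 = boxes.T
        # one 90-degree counter-clockwise turn maps (x, y) to (y, w - x)
        boxes = np.stack([y1, w - x2, y2, w - x1], axis=1)
        h, w = w, h                       # width and height swap each turn
    return np.rot90(image, k), boxes

def expand_with_rotations(image, boxes):
    """Return the original sample plus its 90, 180 and 270 degree rotations."""
    return [rotate_sample(image, boxes, k) for k in range(4)]
```

Each original picture yields four samples, so 5000 pictures give the 20,000 training samples mentioned above.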

Fig. 7

ResNet-101 model verification results for 5 types of targets after data expansion.

According to the experimental results in Table 4 and Fig. 7, the model trained on the expanded data set achieves a higher recall rate for every category than the model trained before expansion, and its correct rate and missed recognition rate are also better than before the expansion. Expanding the data set by rotation transformation therefore improves recognition accuracy.

Table 4

ResNet-101 model verification results after data expansion

Category	Validation set	Correctly identified	Error recognition	Missing identification	Recall	Correct rate	Miss recognition rate
Equalizing ring 1	500	484	1	15	0.968	0.998	0.030
Equalizing ring 2	500	482	1	17	0.964	0.998	0.034
Birdhouse	500	479	0	21	0.958	1	0.042
Good shock hammer	500	453	1	46	0.906	0.998	0.092
Bad shock hammer	500	449	2	49	0.890	0.996	0.098
mAP					0.882	0.998	0.115
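The per-category rates in Table 4 can be reproduced from the raw counts with a short sketch. The definitions below are inferred from the table rows rather than stated explicitly in the text, so they should be read as assumptions: recall is the number correctly identified divided by the validation set size, correct rate is the number correctly identified divided by all identifications made (correct plus erroneous), and miss recognition rate is the number of missed identifications divided by the validation set size.

```python
def recognition_metrics(total: int, correct: int, wrong: int, missed: int) -> dict:
    """Compute per-category rates from counts (definitions inferred from Table 4)."""
    return {
        "recall": round(correct / total, 3),
        "correct_rate": round(correct / (correct + wrong), 3),
        "miss_rate": round(missed / total, 3),
    }


# Counts for the "Equalizing ring 1" row of Table 4.
ring1 = recognition_metrics(total=500, correct=484, wrong=1, missed=15)
print(ring1)
```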

4.4 Impact of sample expansion on detection accuracy

This section experimentally verifies the adaptive cropping algorithm proposed earlier in this paper. The adaptive cropping algorithm crops each sample picture into one or more small pictures according to the position of the target boxes annotated in the sample. In this algorithm, the crop coefficient K is calculated from the annotation information in the image's XML file. K is then compared with the threshold Q to determine whether the original image is cut into one or two independent small images, so the value of Q directly affects the experimental results. In this set of comparative experiments, the base network is ResNet-101 + FPN-SSD, and the number of training iterations is set to 100,000. The experimental results are shown in Fig. 8.
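The threshold comparison can be sketched as below. This section does not give the formula for K, the XML schema, or the direction of the comparison, so every specific here is a hypothetical stand-in: the annotation is assumed to follow a Pascal VOC-style layout, K is assumed to be the area of the largest target box divided by the image area, and cropping is assumed to be triggered when K falls below Q. The sketch only shows the decision step and does not model how the image is split into one or two crops.

```python
import xml.etree.ElementTree as ET

# Hypothetical Pascal VOC-style annotation; the actual schema is not shown in the text.
ANNOTATION = """
<annotation>
  <size><width>1000</width><height>800</height></size>
  <object>
    <name>birdhouse</name>
    <bndbox><xmin>100</xmin><ymin>100</ymin><xmax>300</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>
"""


def crop_coefficient(xml_text: str) -> float:
    """ASSUMED definition of K: largest target box area divided by image area."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_area = int(size.find("width").text) * int(size.find("height").text)
    largest = 0
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        w = int(box.find("xmax").text) - int(box.find("xmin").text)
        h = int(box.find("ymax").text) - int(box.find("ymin").text)
        largest = max(largest, w * h)
    return largest / img_area


def should_crop(k: float, q: float) -> bool:
    """ASSUMED rule: crop around the targets when K is below the threshold Q."""
    return k < q


K = crop_coefficient(ANNOTATION)
print(K, should_crop(K, q=0.24))
```

With this assumed rule, raising Q causes more images to be cropped, which matches the behaviour described for Fig. 8.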

Fig. 8

Experimental results at different thresholds Q.

Fig. 8 shows that the experimental results first increase and then decrease as the threshold changes. The reason is that when the threshold is increased, more small images of small-scale targets are cropped. However, because the detection samples are spread across different scales, improving recognition accuracy for small-scale targets does not necessarily improve it for targets at other scales. The final experimental results show that a threshold of Q = 0.24 works best [38].

5 Conclusions

The problem of external force damage to transmission lines seriously threatens the safe and stable operation of the power system. Using the existing power grid video surveillance equipment to develop a transmission line system that can automatically identify targets and raise alarms is an urgent task at present, and it is also an important part of the country’s strategy for implementing smart grids. Aiming at the specific scenario of transmission line component identification and defect detection, this paper uses the Faster R-CNN network structure and mainly studies the recall rate and detection time of different network models.

In this paper, the two mainstream target detection algorithms Faster R-CNN and SSD are introduced, their advantages and disadvantages are analyzed, and directions for improvement and optimization are proposed. The semantic segmentation method Mask R-CNN is also introduced and applied to the detection of string drop of insulators. Based on the previous research, an object detection framework based on FPN-SSD is proposed. The framework is based on the SSD algorithm and adds an FPN feature pyramid structure. This structure effectively improves the fusion of context and semantic information, and its multi-scale design improves the accuracy of small target recognition. In addition, the feature extraction network was replaced with the ResNet-101 network. Finally, the average accuracy of the algorithm on this sample set reached 89.3%.

This article targets only five common types of transmission line components, but the deep learning-based algorithms are general, and more categories can be added in later research. The sample pictures used in this paper come from aerial inspection photos taken by power grid drones, but the environment of transmission lines varies greatly, so sample pictures with different backgrounds and different lines need to be added to verify the effectiveness of the algorithm. Moreover, this article mainly focuses on target detection of transmission line components in aerial photographs and does not conduct deeper research on component failure detection. In later work, fault detection can be carried out on the basis of target detection.

6 Declarations

Ethical Approval and Consent to participate: Approved.

Consent for publication: Approved.

Availability of supporting data: We can provide the data.

Competing interests

There are no potential competing interests in our paper. All authors have seen the manuscript and approved its submission to the journal. We confirm that the content of the manuscript has not been published or submitted for publication elsewhere.

Funding

This work was supported by the Key Project of Natural Science Basic Research Plan in Shaanxi Province of China (Grant No. 2018ZDXM-GY-169) and the Key Project of Natural Science Basic Research Plan in Shaanxi Province of China (Grant No. 2019ZDLGY18-03).

Authors’ contributions

All authors took part in the discussion of the work described in this paper.

Acknowledgments

The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.

References

1. Cao, Wang, Fan, Nojavan, Jermsittiparsert, Risk-constrained stochastic power procurement of storage-based large electricity consumer, Journal of Energy Storage 28 (2020). DOI: 10.1016/j.est.2019.101183

2. Guo, Research on Location Selection Model of Distribution Network with Constrained Line Constraints Based on Genetic Algorithm, Neural Computing and Applications 2019(1) (2019), 1–11.

3. Gómez-Aguilar J.F., Dumitru, Fractional Transmission Line with Losses, Zeitschrift Naturforschung Teil A 69(10-11) (2015), 539–546.

4. Sardar, Husnine S.M., Rizvi S.T.R., Multiple travelling wave solutions for electrical transmission line model, Nonlinear Dynamics 82(3) (2015), 1317–1324.

5. Moravej, Pazoki, Khederzadeh, New Pattern-Recognition Method for Fault Analysis in Transmission Line with UPFC, IEEE Transactions on Power Delivery 30(3) (2015), 1231–1242.

6. […], Rong, Salehian, Gagnon, Cloud transmission: A new spectrum-reuse friendly digital terrestrial broadcasting transmission system, IEEE Transactions on Broadcasting 58(3) (2012), 329–337.

7. Song, Zhang, Fan, Compact Broadband Bandstop Filter Based on Composite Right/Left Handed Transmission Line, Electromagnetics 37(4) (2017), 196–202.

8. Aslan İ., Exact Solutions for a Local Fractional DDE Associated with a Nonlinear Transmission Line, Communications in Theoretical Physics 66(9) (2016), 315–320.

9. […], Ge, Liu, Transmission Line Rating Attack in Two-Settlement Electricity Markets, IEEE Transactions on Smart Grid 7(3) (2017), 1346–1355.

10. Liu, Li, Liu, Masking Transmission Line Outages via False Data Injection Attacks, IEEE Transactions on Information Forensics & Security 11(7) (2016), 1–1.

11. Ritzmann, Wright P.S., Holderbaum, A Method for Accurate Transmission Line Impedance Parameter Estimation, IEEE Transactions on Instrumentation & Measurement 65(10) (2016), 1–10.

12. […], Schaekers, Schram, Multi-Ring Circular Transmission Line Model for Ultralow Contact Resistivity Extraction, IEEE Electron Device Letters 36(6) (2015), 1–1.

13. Ding, Bo, Li, Optimal Power Flow with the Consideration of Flexible Transmission Line Impedance, IEEE Transactions on Power Systems 31(2) (2015), 1–2.

14. Qiu, Pan, Ni, Equal transfer process-based distance protection for wind farm outgoing transmission line, Power System Protection & Control 43(12) (2015), 61–66.

15. Elsayed, Elhoseny, Sabbeh, Riad, Self-maintenance model for wireless sensor networks, Computers & Electrical Engineering 70 (2018), 799–812.

16. El-Borai M.M., El-Owaidy H.M., Ahmed H.M., Exact and soliton solutions to nonlinear transmission line model, Nonlinear Dynamics 87(2) (2016), 1–7.

17. […] Z.-H., Wang H.-G., Wang Y.-C., Line-grasping control for a power transmission line inspection robot, Journal of Jilin University 45(5) (2015), 1519–1526.

18. Liu, Li, Jiang, Transient Characteristics of Potential Transfer of Live Working on ±1100 kV UHVDC Transmission Line, High Voltage Engineering 43(10) (2017), 3149–3153.

19. Pan, Liu, Cao, Li, Chen C.-H., Visual Recognition Based on Deep Learning for Navigation Mark Classification, IEEE Access 8 (2020), 32767–32775.

20. Tian, Song, Wang, Zhou, A Theoretical Calculation Method of Influence Radius of Settlement Based on the Slices Method in Tunnel Construction, Mathematical Problems in Engineering 2020 (2020), Article Number 5804823.

21. Krishnaraj, Elhoseny, Lydia E.L., Shankar, ALDabbas, An efficient radix trie-based semantic visual indexing model for large-scale image retrieval in cloud environment, Software Practice and Experience (2020), In Press. DOI: https://doi.org/10.1002/spe.2834

22. Panda, Sengupta, Roy, Energy-Efficient and Improved Image Recognition with Conditional Deep Learning, ACM Journal on Emerging Technologies in Computing Systems 13(3) (2017), 1–21.

23. Barton T.W., Jurkov A.S., Pednekar P.H., Multi-Way Lossless Outphasing System Based on an All-Transmission-Line Combiner, IEEE Transactions on Microwave Theory & Techniques 64(4) (2016), 1–14.

24. Zhou, An Image Recognition Model Based on Improved Convolutional Neural Network, Journal of Computational & Theoretical Nanoscience 13(7) (2016), 4223–4229.

25. Renhua, Yichao, Yuanjing, Condition Assessment Method for Transmission Line with Multiple Outputs Based on Variable Weight Principle and Fuzzy Comprehensive Evaluation, High Voltage Engineering 43(4) (2017), 1289–1295.

26. Abdel-Basset, Mohamed, Elhoseny, Chakrabortty R.K., Ryan, A hybrid COVID-19 detection model using an improved marine predators algorithm and a ranking-based diversity reduction strategy, IEEE Access 8(1) (2020), 79521–79540. DOI: 10.1109/ACCESS.2020.2990893

27. Albrecht, Slabaugh, Alonso, Deep learning for single-molecule science, Nanotechnology 28(42) (2017), 423001.

28. Mingjun Xia, A New Optical Gain Model for Quantum Wells Based on Quantum Well Transmission Line Modeling Method, IEEE Journal of Quantum Electronics 51(3) (2015), 1–8.

29. Elhoseny, Multi-object Detection and Tracking (MODT) Machine Learning Model for Real-Time Video Surveillance Systems, Circuits, Systems, and Signal Processing 39 (2019), 611–630. DOI: https://doi.org/10.1007/s00034-019-01234-7

30. O’Shea T.J., Hoydis, An Introduction to Deep Learning for the Physical Layer, IEEE Transactions on Cognitive Communications & Networking 3(4) (2017), 563–575.

31. Noda, Yamaguchi, Nakadai, Audio-visual speech recognition using deep learning, Applied Intelligence 42(4) (2015), 722–737.

32. Habibi, Weber, Neves, Deep learning with word embeddings improves biomedical named entity recognition, Bioinformatics 33(14) (2017), i37–i48.

33. Zhang, Xiao, Yang L.X., Xiang, Zhong, Secure and Efficient Outsourcing of PCA-Based Face Recognition, IEEE Transactions on Information Forensics and Security (2019).

34. Yuan, Lu, Xue, Droiddetector: Android malware characterization and detection using deep learning, Tsinghua Science and Technology 21(1) (2016), 114–123.

35. Tang, Qin, Liu, Deep learning for sentiment analysis: successful approaches and future challenges, Wiley Interdisciplinary Reviews Data Mining & Knowledge Discovery 5(6) (2015), 292–303.

36. Geetha, Anitha, Elhoseny, Kathiresan, Shamsolmoali, Selim M.M., An evolutionary lion optimization algorithm-based image compression technique for biomedical applications, Expert Systems (2020), In Press.

37. Yang, Sun, Wang, No-reference image quality assessment based on sparse representation, Neural Comput & Applic 31 (2019), 6643–6658.

38. Zhou, Ke, Luo, Multi-camera transfer GAN for person re-identification, J Vis Commun Image Represent 59 (2019), 393–400.