Abstract
Blue calico is a highly valued folk handicraft that forms part of China’s national intangible cultural heritage. Thus, blue calico is a worthy target for reconstruction using modern image processing technology. Extracting the visual components or elements of a blue calico pattern is one way to capture the underlying design and enable innovation in traditional patterns using modern techniques. This paper presents a method of element extraction and classification based on a smart convolutional neural network (CNN), with an improved CifarNet structure, which we call CalicoNet. Initially, the algorithm for element extraction is implemented to generate element samples of blue calico. This process includes gray scaling, binarization, and contour extraction. We construct a data set of elements with 12 types. Then, four critical hyper-parameters, the batch-size, dropout ratio, learning rate, and pooling strategy, are optimized by a comparative analysis. A combination classifier strategy is subsequently added to the fully connected layers to strengthen the feature expression in the corresponding classes. Finally, the superiority of the proposed CalicoNet is verified through a comparison with other sophisticated CNNs. Experimental results demonstrate that CalicoNet achieves a validation accuracy of 99.2% for the training set, a total time of 1.13 hours for the whole data set, and a test mean accuracy precision of 98.66%. The robust performance of the proposed method across the element data set indicates that CalicoNet is a promising approach for element extraction and classification.
China’s blue calico is very famous because it used to be a very popular folk handicraft. At present, it is also a part of China’s national intangible cultural heritage. With the rapid development and wide application of image processing technology, it is of great value and significance to use digital methods to reconstruct blue calico. Extracting and classifying the pattern components or elements of blue calico is the basis for its innovation. For example, similar to building blocks, we can select the required elements from the element database to generate a new type of blue calico. Alternatively, we can pre-draw the outline of the blue calico and automatically fill the polygon with elements from the database. To this end, it is necessary to develop and innovate blue calico with digital restoration and reproduction technology, and this is gradually becoming possible. Elements are the smallest unit that constitute the larger vein patterns, and different vein patterns form different styles of blue calico. The themes expressed by vein patterns in blue calico often involve plants, animals, geometry, symbols, utensils, humans, etc. 1 These vein patterns are composed of a number of elements, such as geometric circles, lines, triangles, and other elements with different sizes and shapes. These patterns are separated but interlaced with each other, forming a unique style of “separate strokes, connected meaning.”2,3
By scaling, rotating, and locally deforming these elements, designers can generate various vein patterns with different blue calico shapes. The elements can also be extracted to help analyze the different designs of the vein patterns. The automatic extraction and classification of elements into a classification database can provide important raw materials for creating new vein patterns, which will play an extremely important role in the digital inheritance and innovation of blue calico. Therefore, the first and most important task is to extract the elements from the vein patterns in blue calico. Element extraction can be realized by the methods of contour extraction. Then the second task will be to effectively classify the large number of elements obtained. In this paper, a novel method is proposed based on a contour tracking algorithm to extract the elements of a blue calico vein pattern, and an improved convolutional neural network (CNN) (CalicoNet) based on CifarNet 4 is developed. Initially, a method to extract elements of blue calico based on contour fitting is implemented. These elements constitute the sample data set in the form of independent sub-images. Then, a nine-layer structure is successively developed based on the functions of different convolutional strides. Optimization of the CifarNet structure from the top layer to the lower layers is performed to investigate the performance of the newly formed networks. This step provides insight into the network’s internal structure. Finally, hyper-parameter optimization is conducted to obtain better results. Modifications are then performed to solve the overfitting problem, which is the main obstacle to improved validation accuracy.
The remainder of this paper is organized as follows. The second section presents an overview of previously published methods of contour extraction and image classification. The third section introduces the proposed method and materials, including the element categories, element extraction, CNN structure, and the proposed CalicoNet architecture. The fourth section reports the experimental results of element extraction and classification based on an element data set by using the improved CNN structure. Finally, conclusions and recommendations for future work are provided in the fifth section.
Related work
In this section, we summarize recent progress in contour extraction and image classification.
Contours are one of the most important features of objects because they can be used to distinguish different regions of a shape and play an important role in object detection by effectively representing the image structure with a large spatial range.5,6 Traditional contour detectors are mainly based on the low-level measurement of discontinuities of image features. Canny 7 and Pablo et al. 8 used the grayscale gradient of image pixels to extract contours. However, this method was prone to boundary leakage for complex images. Zhang et al. 9 proposed a method where the global Grünwald–Letnikov (G-L) fractional gradient was fused into a region scalable fitting (RSF) model to enhance the gradient information of grayscale inhomogeneity and weak texture regions, and the adaptive fractional order based on the gradient modulus and information entropy of the image was utilized to achieve contour extraction. However, the method was very computationally expensive, thus taking a large amount of time to process the images. By combining fuzzy clustering with active contour methods, Sun et al. 10 proposed a robust local segmentation based on the fuzzy active contour method. The average fuzzy energy function was used to construct a narrow band on the evolution curve through morphological dilation and erosion operations, so that the minimum value of the fuzzy energy function was solved in the narrow band to achieve local segmentation. However, the segmentation method was sensitive to the initial parameters of the model, and lacked smoothing constraints. Therefore, it was not easy to obtain continuous smooth contours for objects. Tian et al. 11 used the triangle method to approximately track the boundary of a contour, but the generated contour was also not smooth. In the printed fabric industry, Kuo et al. 12 created an image data set of printed fabrics with repeating dot patterns to alleviate issues associated with the management of, and search for, numerous dot printed fabrics.
The fundamental purpose of image classification is to assign images to different categories according to their features. Some technologies applied to the classification and recognition of textile images mainly use the low-level features obtained by image processing methods, such as color, shape, edge, structure, and so on. 13 Mustafic and Li 14 used the F-values from multivariate analysis of variance to select the three most contributing color features from blue and ultraviolet (UV) light-emitting diode (LED) data sets. The proposed linear discriminant analysis model achieved classification rates of 80% or higher for bract, green leaf, hull, paper, plastic bag, plastic packaging, seed, and stem, and classification rates between 60% and 80% for bark, seed coat, and twine. Zheng et al. 15 proposed a new local orientation feature for fabric structure detection by using high-resolution images, which could recognize three kinds of yarn-dyed cotton fabrics. Yildiz et al. 16 and Yildiz 17 captured fabric images using a thermal camera during the fabric flow in a quality control machine and identified defective areas on a fabric surface using the heat difference occurring between the defective and defect-free zones. In order to convert images into the gray level, they proposed a principal component analysis-based dimension reduction stage. Then, a gray-level cooccurrence matrix was used for feature extraction.18,19 The defects were classified by a k-nearest neighbor algorithm to achieve a classification average accuracy rate of 96%. Principal component analysis was applied to select the optimal features from 113 wavelengths covering the spectral range from 425 to 700 nm. 20 Then, linear discriminant analysis using the selected wavelengths achieved an average classification accuracy of 90% across all samples.
In order to improve the classification and recognition effectiveness, some traditional neural network techniques have been applied. Kuo and Juang 21 proposed methods based on a back-propagation neural network (BPNN) for defect classification and developed an automated defect classification method for embroidered textiles. Although the BPNN required the least amount of time to train, the recognition rate was slightly lower when compared to the characteristic procedure classification. Ding et al. 22 proposed a unified classification model applying the nearest neighbor method by utilizing both texture and spatial features. By training and testing 13 images of four kinds of traditional She nationality costumes, the average accuracy of cross-validation was 92.3%. In recent years, methods based on deep learning have made remarkable achievements in image classification.23,24 Deep learning uses a learning network of perceptrons with multiple hidden layers. It can automatically learn feedback and optimize the appropriate image features, and then combine and abstract from the low-level features to build a high-level representation. A CNN is the most widely used method in deep learning, and the most effective model for acquiring image classification features. 25 The local connection, weight sharing, and pooling operations make it possible to effectively reduce the complexity of the network, simplify the training parameters, and maintain invariance under translation, distortion, and scaling of the images. The CNN has strong robustness and fault tolerance, while being easy to train and optimize. The CNN has been widely used in natural scene text recognition, 26 face and organ classification and recognition,27,28 behavior recognition, 29 and crop classification and recognition.30–32 In the past two years, the CNN has also been applied in textile fault detection. Jing et al. 33 proposed a color fabric defect detection algorithm based on a deep CNN, which could perform classification and detection of 20 kinds of defect. Wang and Jin 34 used an improved AlexNet model and Sigmoid classifier with 50 iterations of training on the Ordos standard cashmere and wool data sets. The test accuracy could reach 92.1%. He et al. 35 trained a faster region-based convolutional neural network (Faster-RCNN) model consisting of 13 convolutional layers, 13 sampling layers, and four pooling layers. The detection rate of foreign fibers in seed cotton images reached 90.3% in the test set. Wang et al. 36 proposed a CNN-based method for the recognition of broken spindle positions. The recognition accuracy of the neural network was over 97%. Wang et al. 37 proposed a CNN-based emotional classification method for fabric images, which combined manual emotional features. The accuracy of emotional classification was 89.7%.
Based on the above-mentioned studies, many researchers have sought the best methods for contour extraction and image classification. Although some algorithms are useful in the textile field, the use of these methods to solve multi-class element extraction and classification problems, as proposed herein, faces two issues.
Previous studies on contour extraction have mainly addressed images with a complex background, and the extracted contour is often not smooth, or the extraction may even fail. Therefore, designing and optimizing these algorithms for the element extraction of blue calico will be simple and resource-saving.
Although some effective algorithms involving image processing features, such as color, shape, and edge detection, have been developed successfully based on textile characteristics to implement classification, their focus has been on the whole image rather than specific elements as found in blue calico. Also, there is room for improvement in their accuracy.
In order to solve the above-mentioned problems, we present the methods of element extraction using contour tracking and element classification using CalicoNet. The contributions of this study are as follows.
A contour tracking algorithm is developed and applied to extract elements, and a sample data set is established for blue calico.
The proposed CalicoNet is implemented to classify elements of the vein patterns for blue calico using a CNN.
Based on the external appearance of the elements, the network is modified into a nine-layer structure for efficient matching and identification.
Combination classifier concepts are assembled into the combination layer to optimize the features learned by CalicoNet.
Methods and materials
In this section, we review the four aspects of our method: element categories and system design, element extraction, CNN structure, and CalicoNet architecture.
Element categories and system design
Blue calico consists of many different vein patterns, and elements are extracted from the vein patterns. Various types of vein pattern can be reconstructed using a single element or a combination of elements, such as points, lines, or plane configurations. In total, 358 blue calico images were directly acquired from exhibitions and factories in Nantong, Jiangsu Province, and Jiaxing, Zhejiang Province, China. These images were captured by a digital camera (Olympus SP-600UZ) with a resolution of 3968 × 2976. The image quality of this camera is sufficient for the thresholding and segmentation of the blue calico elements. The captured blue calico images have two important features.
High fidelity of color. Thanks to the development of sophisticated image acquisition technology, the collected images are free from distortions and remain clear. There are two kinds of blue calico, white flowers with a blue background, and blue flowers with a white background. Sometimes, the white background is mixed with black, but contours of the whole image are generally distinct. The contours of the calico are uniform and the edges are clear in the image. The contours are mostly smooth and rounded, with a small amount of abrupt change, and a small number of stains. Four images among the captured blue calico are shown in Figure 1. They represent image examples of plant, animal, geometric, and human blue calico, respectively.
Blue calico: (a) plant; (b) animal; (c) geometry; (d) human.

Traditionally, a blue calico image is composed of many elements of different categories that are recognized by calico craftsmen. However, we have not found any evidence in the previous literature that fixes the number of element categories. Moreover, we have not found any data set on the elements of blue calico. Therefore, we adopt the view of the trained calico craftsmen from the factories, who propose that the elements of the vein pattern fall into approximately 12 categories based on key factors such as geometric shape, symbolic meaning, and frequency of use. Within the same category, an element can be scaled or rotated. Likewise, elements with different sizes and orientations but similar shapes are regarded as the same element. The 12 common elements are shown in Figure 2. They are the circle element (CE), rice element (RE), column element (CLE), shell element (SE), rhombus element (RBE), tortoiseshell element (TTE), triangle element (TGE), crescent element (CSE), four-leaf element (FLE), fish-scale element (FSE), mountain element (MTE), and three-section element (TSE). Some elements come from Figure 3.
The elements of blue calico: (a) circle element; (b) rice element; (c) column element; (d) shell element; (e) rhombus element; (f) tortoiseshell element; (g) triangle element; (h) crescent element; (i) four-leaf element; (j) fish-scale element; (k) mountain element; (l) three-section element.
Element extraction from blue calico: (a) the result of the segmentation of Figure 1(a); (b) contours with a serial number for Figure 1(a); (c) sub-images; (d) contours with a serial number for Figure 1(d); (e) contours with a serial number for a plant blue calico image.

Element extraction
The element extraction process includes background segmentation, element contour optimization, and element sample construction. A concise flow chart of element extraction is shown in Figure 4. In Figure 4, Num(•) and Tnum will be explained in Equation (2). The process results are shown in Figure 3.
Flow chart of element extraction.
Background segmentation
In order to extract each element from the blue calico, it is necessary to binarize the captured image. The initial photograph is first transformed into a grayscale image, and the grayscale image is then binarized with the segmentation threshold proposed by Liu et al. 38 The value is 1 if the pixel is part of an element, and 0 if the pixel is part of the background. Figure 3(a) shows the result of the segmentation, where the value 1 is displayed as 255 for visibility.
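The grayscale and binarization steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the threshold of Liu et al. is not reproduced here, so a fixed `threshold` argument stands in for it, and the function names, the luma weights, and the assumption that elements are brighter than the background are all illustrative choices.

```python
import numpy as np

def to_gray(rgb):
    """Convert an H x W x 3 RGB array to grayscale (ITU-R BT.601 luma weights, an assumption)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float64) @ weights

def binarize(gray, threshold, element_is_bright=True):
    """Return a 0/1 mask: 1 for element pixels, 0 for background pixels.

    `threshold` is a placeholder for the segmentation threshold used in the paper.
    """
    mask = gray > threshold if element_is_bright else gray <= threshold
    return mask.astype(np.uint8)

# Tiny synthetic image: a white 2 x 2 "element" on a dark blue background.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[...] = (20, 40, 120)
image[1:3, 1:3] = (255, 255, 255)

binary = binarize(to_gray(image), threshold=128)
display = binary * 255  # map 1 -> 255 so the mask is visible when saved as an image
```

For calico with blue flowers on a white background, passing `element_is_bright=False` would invert the mask under the same assumptions.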
Element contour optimization
After binarizing the blue calico image, the pixels that belong to an element contour must meet the following contour tracking condition. If the value of a pixel is non-zero and there is at least one zero pixel in its neighborhood, then the pixel is located on the contour. An eight-connected neighborhood is used to determine the state of each pixel with regard to the contour. Each contour is composed of a number of pixels, and contour tracking sorts these pixels automatically in a fixed direction, such as counterclockwise.
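The contour condition stated above can be sketched as follows. This is only an illustration of the pixel test (non-zero pixel with at least one zero pixel among its eight neighbors); it does not order the pixels along the contour, which the tracking step would do. The function name and the choice to treat pixels outside the image as background are assumptions.

```python
import numpy as np

def contour_mask(binary):
    """Mark non-zero pixels that have at least one zero pixel among their 8 neighbors."""
    fg = binary.astype(bool)
    # Assumption: pixels outside the image count as background (zero).
    padded = np.pad(fg, 1, mode="constant", constant_values=False)
    h, w = fg.shape
    all_neighbors_fg = np.ones_like(fg)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the pixel itself
            all_neighbors_fg &= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return fg & ~all_neighbors_fg

# A filled 3 x 3 square: its 8 border pixels are contour pixels, its center is not.
square = np.zeros((5, 5), dtype=np.uint8)
square[1:4, 1:4] = 1
contour = contour_mask(square)
```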
Suppose a blue calico image has t(t ≥ 1) elements, and each element belongs to a contour, then the whole binary image will have t contours. The contours of all the elements in the blue calico image can be represented as follows
The resulting contour of each element consists of a series of oriented pixels, with the following features. The contour is only one pixel in width. Due to the abruptness of the element, such as prominent edges and corners, isolated points, overlaps, etc., the contour may not be smooth, or may even be a pseudocontour. Isolated points occur when the number of pixels in the contour is insufficient to represent the desired contour of the vein pattern element. In the process of image digitization, these defects need to be removed to make the contour smooth and attractive. Contour optimization based on the mean value method 39 is adopted. Since blue calico is handmade, there will be some defects or artefacts in the obtained image. These defects usually exist in the form of outliers, which represent the few pixels in the contour that belong to the artefact. They need to be deleted. When extracting the contour, a threshold value Tnum can be set to remove any contour whose number of pixels is less than Tnum. The definition is as follows
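The thresholding rule described in the text, which discards any contour with fewer than Tnum pixels, can be sketched in a few lines. The representation of a contour as a list of (x, y) pixel coordinates and the names `filter_contours` and `t_num` are assumptions made for illustration; `len(c)` plays the role of Num(•).

```python
def filter_contours(contours, t_num):
    """Keep only contours whose pixel count is at least t_num."""
    return [c for c in contours if len(c) >= t_num]

# Hypothetical data: a 3-pixel speck (likely an artefact) and a 12-pixel element contour.
speck = [(0, 0), (0, 1), (1, 1)]
element = [(x, 5) for x in range(12)]

kept = filter_contours([speck, element], t_num=5)
```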
The result of the contour extraction process is shown in Figures 3(b), (d), and (e). To better understand the results of the contour extraction, we have filled each element with a different random color. The number on each element represents the order of the element.
Element sample construction
Each contour is clipped in order of its index number to generate an independent sub-image of a blue calico element, which is then saved. The size of the sub-image is determined by the coordinates of its contour pixels. The difference between the maximum and minimum x-coordinates of the contour pixels is calculated, as is the difference between the maximum and minimum y-coordinates. The larger of the two differences is taken as the size of the sub-image. Therefore, each element sub-image is a square. These sub-images form the CNN’s sample data set. The whole process of extracting the elements of blue calico is shown in Figure 3. After image preprocessing, the binary image is obtained, as shown in Figure 3(a). The contour tracking algorithm is then used to obtain the contour of each pattern element in the binary image. In order to see the extraction results more clearly, the region of each element in the image is filled with a random color, and numbered according to the order of extraction, as shown in Figure 3(b). By using a contour optimization algorithm, each element that meets the requirements is extracted, and the sub-image of each element with smooth, clear, and independent edges is generated sequentially. As shown in Figure 3(c), in total 131 sub-images are included in the data set. The sub-images differ in size, but the width and height of each sub-image are the same. As shown in Figures 3(d) and (e), each element can be extracted from the blue calico. The experimental results show that the method of contour extraction is efficient.
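The sizing rule for the square sub-image can be sketched as follows. The function name and the (x, y) contour representation are assumptions; the side length follows the text literally (the larger of the two coordinate differences), although an implementation that must include both end pixels might add 1 to it.

```python
import numpy as np

def square_bounding_box(contour):
    """Return (x_min, y_min, side) for a square window covering the contour."""
    pts = np.asarray(contour)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    # The larger of the x-extent and y-extent becomes the side of the square.
    side = max(x_max - x_min, y_max - y_min)
    return int(x_min), int(y_min), int(side)

# Hypothetical contour spanning 8 pixels in x and 4 pixels in y.
box = square_bounding_box([(2, 3), (10, 3), (10, 7), (2, 7)])
```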
CNN structure
The CNN uses network spatial relations to reduce the number of parameters needed for learning and to improve the training performance of the algorithm. The CNN consists of an input layer, a compound hidden layer, and an output layer. The hidden layer contains multiple convolution layers, pooling layers, and fully connected layers. The training process of the whole network is divided into forward propagation and backward propagation phases. Forward propagation classifies the input data with the current network parameters. Backward propagation updates the network training parameters.
Forward propagation. For the whole network, it is assumed that the image data of the original training set
Backward propagation. This is also known as error propagation. For m training samples, the result of the linear prediction of each category will be output by the forward propagation phase of the network. According to this result and the expected output of the network, the overall objective function of the network can be defined as follows
Proposed CalicoNet architecture
CalicoNet architecture
The CalicoNet architecture is based on CifarNet. 4 CifarNet has achieved excellent performance in the classification of various natural scene target images tagged using the CIFAR-10 and CIFAR-100 databases. The novel CalicoNet can be considered as a local feature self-learning method from low-level to high-level layers. There are three main convolutional layers and three pooling layers. Basic features are learned by the lower level layers, while more complex and abstract features are learned by the higher level layers. Interleaved with MAX and AVE pooling manipulation, they can capture deformable and invariant features through affine transformation. The latter fully connected layers can capture complex cooccurrence statistics, which drop the semantics of a spatial location. The final classification layer accepts the previous feature representation for the recognition of a given element. The final layer outputs a decision that is produced based on the end-to-end network feature map. This architecture is appropriate for learning local features from the element image data set. The overall structure of CalicoNet is illustrated in Figure 5.
Input image layer. Although the width and height of each sub-image are the same, the sub-images differ in size, so they are all scaled to 32 × 32 pixels.
Convolution layer. The convolution layer is used to extract image features. C1, C2, and C3 are all convolution layers in Figure 5. The size of the convolution kernel is 5 × 5, with a stride of 1, and the width and height are padded with 2 pixels. After the three convolution layers, 48 feature maps of 32 × 32 pixels, 64 feature maps of 16 × 16 pixels, and 128 feature maps of 8 × 8 pixels are obtained, respectively. The ReLU activation function is used to extract the image features.
Pooling layer. S1, S2, and S3 are the pooling layers in Figure 5, each unit of which is connected to a 3 × 3 neighborhood of the previous convolution feature map with a stride of 2, and the width and height are padded with 1 pixel. S1 adopts the MAX pooling strategy, while S2 and S3 adopt AVE. After the three pooling layers, the output feature maps are 16 × 16 pixels, 8 × 8 pixels, and 4 × 4 pixels, respectively.
Fully connected layer. FC1 and FC2 are the fully connected layers, each of which outputs 256 features. The classifier is used to calculate the probability of belonging to each output category and produce the final classification result. At the output end of FC1 and FC2, there is a dropout layer to reduce the risk of overfitting. A combination classification strategy is also added to the structure, comprising a SoftMax classifier, a support vector machine (SVM), 40 and a random forest classifier (RFC). 41 Finally, the classification output is one of 12 possible categories.
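The feature map sizes quoted for the convolution and pooling layers can be checked with the usual output-size formula, floor((n + 2p − k) / s) + 1. The sketch below is a consistency check rather than the authors' code; the floor rounding is an assumption (some frameworks round pooling sizes up), and the FC1 input size is derived here rather than stated in the text.

```python
def out_size(n, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer (floor rounding assumed)."""
    return (n + 2 * pad - kernel) // stride + 1

size = 32  # input sub-images are scaled to 32 x 32
shapes = []
for channels in (48, 64, 128):  # feature maps produced by C1, C2, C3
    size = out_size(size, kernel=5, stride=1, pad=2)  # convolution keeps the size
    conv_shape = (channels, size, size)
    size = out_size(size, kernel=3, stride=2, pad=1)  # pooling halves the size
    pool_shape = (channels, size, size)
    shapes.append((conv_shape, pool_shape))

# Derived value: number of inputs to FC1 if S3 is flattened directly.
fc1_inputs = 128 * size * size
```

Under these assumptions the sizes follow the sequence 32, 16, 8, and 4 given in the text.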
The overall framework of the CalicoNet architecture.

The major innovations and contributions of CalicoNet are as follows.
Use of different pooling strategies. The pooling layers are located behind the convolution layers and aim to reduce the dimension of the feature map. Both MAX and AVE strategies are used. The MAX option provides better retention of texture features. The AVE strategy retains the overall data features and highlights background information. In the CifarNet model, there are three pooling layers, S1, S2, and S3, with 3 × 3 neighborhoods and a stride of 2. Different combinations of pooling strategies will affect the average classification accuracy of training and testing.
Addition of a combination classification strategy to the structure. In a traditional CNN structure, an individual classifier is commonly used in the final fully connected layer as the label decision indicator. The learned features usually work best with the internal structure of a particular classifier. However, the decision made by an individual classifier sometimes does not represent the full information of the sample features. Therefore, a combination classifier is adopted in CalicoNet to determine the best matches between features and classifiers. The individual classifiers include SoftMax, SVM, and RFC. We use all three types of classifier between FC2 and the final classification layer to complete the combination process. This produces an output decision using the new convolutional features combined with the SoftMax, SVM, or RFC. The final output decision is made on the principle of the “minority is subordinate to the majority” (majority voting), with the aim of improving the final accuracy.
Adjustment of CalicoNet to find the optimal network structure. The objective of this step is to improve the classification of the elements. The various strategies for the network structure are also investigated in this step. Choosing the best strategy can enable fast running times with high accuracy. The performance and corresponding running time of this model are analyzed by adjusting the structural parameters and pooling strategies until an optimal network structure is found.
Addition of dropout techniques to CalicoNet. Dropout involves the random deletion of a proportion of neurons to reduce overfitting during training. By adding dropout, the generalization performance of CalicoNet can be improved. In CalicoNet, the two fully connected layers, FC1 and FC2, employ dropout.
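The "minority is subordinate to the majority" rule for combining the three classifiers can be sketched as a simple vote over their predicted labels. The text does not say what happens when all three classifiers disagree, so the fallback to the first prediction (taken here to be SoftMax) is purely an assumption, as are the function and parameter names.

```python
from collections import Counter

def majority_vote(predictions, fallback_index=0):
    """Return the label chosen by most classifiers.

    `predictions` lists one label per classifier, e.g. [softmax, svm, rfc].
    On a tie, the prediction at `fallback_index` is returned (an assumption).
    """
    counts = Counter(predictions).most_common()
    top_label, top_count = counts[0]
    if len(counts) > 1 and counts[1][1] == top_count:
        return predictions[fallback_index]
    return top_label

# Two of three classifiers agree on the circle element (CE).
decision = majority_vote(["CE", "RE", "CE"])
```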
Parameter design and computation from CalicoNet
CNN: convolutional neural network; LRF: local receptive field.
Experimental results and discussion
Experimental results are presented in this section to demonstrate the feasibility of our proposed method. A server running Windows 7 with an Intel® Core™ i7-7700K CPU @ 4.20 GHz × 8 processor, 8 GB RAM (DDR4 2400 MHz × 2), and an Nvidia GTX 1080 GPU was used for image processing and data training. Image preprocessing was implemented using C++ and Python version 2.7. Subsequent CNN construction and training algorithms were implemented using the Caffe framework. The overall procedure of the experiment is shown as a block diagram in Figure 6, which also shows the pattern elements extracted from blue calico in the training and testing phases. In the training phase, the element samples of the training set were used to train the proposed CalicoNet. In the testing phase, the element samples of the test set were classified using the trained CalicoNet, which assigned each sample to one of the 12 classes. In addition, several aspects of performance were discussed and analyzed in the experiments.
Block diagram representation of the overall procedure for element classification. CE: circle element; RE: rice element; CLE: column element; SE: shell element; RBE: rhombus element; TTE: tortoiseshell element; TGE: triangle element; CSE: crescent element; FLE: four-leaf element; FSE: fish-scale element; MTE: mountain element; TSE: three-section element.
Performance comparison with other contour extraction algorithms
In order to accurately evaluate the quality of various contour extracting algorithms, two criteria were adopted: the area-based DC (Dice Coefficient) 43 and contour-based CC (Contour Coefficient). They are defined as follows
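As an illustration of the area-based criterion, the sketch below computes the Dice coefficient in its standard form, 2|A ∩ B| / (|A| + |B|), for two binary masks. It assumes that the paper uses this standard definition; the contour-based CC is not sketched because its definition is not available in the text. The function name and the convention of returning 1.0 for two empty masks are assumptions.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Standard Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks are treated as a perfect match
    return 2.0 * np.logical_and(pred, truth).sum() / total

# Extracted region (4 pixels) against a reference region (6 pixels); they share 4 pixels.
extracted = np.zeros((4, 4), dtype=np.uint8)
extracted[0:2, 0:2] = 1
reference = np.zeros((4, 4), dtype=np.uint8)
reference[0:2, 0:3] = 1

dc = dice_coefficient(extracted, reference)  # 2 * 4 / (4 + 6)
```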
The performance of element contour extraction based on the DC and CC
Element sample data set
Distribution of samples in the data set
CE: circle element; RE: rice element; CLE: column element; SE: shell element; RBE: rhombus element; TTE: tortoiseshell element; TGE: triangle element; CSE: crescent element; FLE: four-leaf element; FSE: fish-scale element; MTE: mountain element; TSE: three-section element.
Hyper-parameter optimization for CalicoNet
Hyper-parameters include the batch-size, dropout ratio, learning rate, and pooling strategy. The effects of the four parameters were analyzed in this section. The optimization of the network architecture was carried out on the validation set, just like other CNNs.
Effects of batch-size
The batch-size is the number of image samples processed in each training iteration, and the gradient computed over the batch determines the direction of steepest descent. There is a close relationship between the rate of gradient descent and the batch-size. In this paper, the experimental results were compared using nine different values of the batch-size. As shown in Figure 7, a better result could be obtained with an increasing batch-size. Moreover, with a larger batch-size, the validation accuracy converged more quickly. However, a batch-size that was too large could consume excessive computer resources, risking a system crash. Therefore, a batch-size of 256 was chosen as the optimum for training the CalicoNet.
Effects of batch-size on validation accuracy.
Effects of dropout ratio
Dropout is a powerful technique to address the problem of limited data. During the training phase, it removes each network unit with a fixed probability, whereas the whole architecture is used at test time. The dropout technique is adopted to avoid the problem of overfitting the training data. In this experiment, the effect of varying this hyper-parameter was explored using values from 0.01 to 0.99, which is the generally recommended range.44
As shown in Figure 8, the validation accuracy increased as the dropout ratio rose from 0.01 to 0.4, and it began to decrease again once the dropout ratio was greater than 0.4. Since dropout could be seen as having a similar effect to activation function fitting, a high dropout ratio implied that more sub-models were used. A smaller dropout ratio suggested that useful information was easily filtered out for effective feature expression of the element. However, a dropout ratio that was too high could result in an insufficient number of neurons to accurately model the relationship between the input and output. Thus, the network performed best with an intermediate dropout ratio, and the dropout ratio for CalicoNet was set to 0.4.
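The behavior of dropout during training and testing can be sketched as follows. This minimal example assumes the common "inverted dropout" variant, in which surviving activations are rescaled during training so that nothing needs to change at test time; the source does not state which variant was used, and the function name dropout and its arguments are hypothetical.

```python
import random


def dropout(activations, ratio, training, rng=None):
    """Inverted dropout on a list of activations (assumed variant)."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    if not training or ratio == 0.0:
        return list(activations)  # the whole network is used at test time
    rng = rng or random.Random()
    keep = 1.0 - ratio
    # Drop each unit with probability `ratio`; rescale survivors by 1/keep
    # so that the expected activation is unchanged.
    return [a / keep if rng.random() >= ratio else 0.0 for a in activations]


if __name__ == "__main__":
    units = [1.0] * 10
    print(dropout(units, ratio=0.4, training=True, rng=random.Random(0)))
    print(dropout(units, ratio=0.4, training=False))
```

With a ratio of 0.4, roughly 40% of the units are zeroed in each training pass, so each pass trains a different thinned sub-network.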
Effects of dropout on validation accuracy.
Effects of learning rate
In a traditional CNN, the learning rate affected how quickly the validation accuracy and loss converged to a stable state by the end of the training stage.42,45 The validation accuracy and loss were therefore compared for different learning rate values. Figure 9 showed the comparison for six values of the learning rate in the range from 5 × 10⁻⁶ to 1 × 10⁻³, chosen based on experimental tests. The y-coordinate denoted the evaluation index, while the x-coordinate denoted the number of training iterations. A high learning rate could result in rapid convergence, whereas a low learning rate might result in slow convergence. When the learning rate was 5 × 10⁻⁴, the highest validation accuracy and lowest training loss were achieved. Therefore, a learning rate of 5 × 10⁻⁴ was chosen.
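The qualitative link between learning rate and convergence speed can be illustrated on a toy problem. The following minimal sketch applies plain gradient descent to f(w) = w², which is an assumption made purely for illustration and is not the CalicoNet training procedure; the function name steps_to_converge, the starting point, and the tolerance are hypothetical, while the three learning rates are taken from the range reported above.

```python
def steps_to_converge(learning_rate, w0=1.0, tol=1e-3, max_steps=1_000_000):
    """Gradient descent on f(w) = w**2; return the steps until |w| < tol."""
    w = w0
    for step in range(1, max_steps + 1):
        w -= learning_rate * 2.0 * w  # the gradient of w**2 is 2w
        if abs(w) < tol:
            return step
    return None  # did not converge within max_steps


if __name__ == "__main__":
    for lr in (5e-6, 5e-4, 1e-3):
        print(lr, steps_to_converge(lr))
```

On this convex toy problem a higher learning rate always needs fewer steps. It does not reproduce the experimental finding that 5 × 10⁻⁴ gave the best validation accuracy, because a real network can become unstable or settle on a worse solution when the rate is too high.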
Effects of learning rate on validation accuracy and loss: (a) validation accuracy; (b) validation loss.
Effects of pooling strategy
The pooling layers are located after the convolution layers and aim to reduce the dimension of the feature map. The pooling strategies include MAX and AVE. The MAX strategy improves the retention of texture features. The AVE strategy can retain overall data features and highlight background information. In the CalicoNet structure, there were three pooling layers, S1, S2, and S3, with 3 × 3 neighborhoods and a stride of 2. Different combinations of pooling strategies would affect the classification accuracy. Suppose that S1 + S2 + S3 represents the combination of the three pooling layers. The possible combinations of strategy could then be represented by eight types: MAX + MAX + MAX (MMM); MAX + MAX + AVE (MMA); MAX + AVE + MAX (MAM); MAX + AVE + AVE (MAA); AVE + AVE + AVE (AAA); AVE + AVE + MAX (AAM); AVE + MAX + AVE (AMA); and AVE + MAX + MAX (AMM). Figure 10 showed a comparison of the eight options for the pooling strategy based on our experiments. The y-coordinate denoted the evaluation index, while the x-coordinate denoted the pooling strategy. When the pooling strategy was MAA, the highest validation accuracy was achieved. Therefore, a pooling strategy of MAA was chosen.
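The two pooling strategies and the eight possible combinations can be sketched as follows. This is a minimal pure-Python illustration, not the CalicoNet implementation: the function names pool2d and pooling_combinations are hypothetical, and the sketch assumes no padding and discards incomplete windows at the border, which may differ from the framework used in the experiments.

```python
from itertools import product


def pool2d(feature_map, mode, size=3, stride=2):
    """Pool a 2-D list of numbers with 'MAX' or 'AVE' over size x size windows."""
    if mode not in ("MAX", "AVE"):
        raise ValueError("mode must be 'MAX' or 'AVE'")
    rows, cols = len(feature_map), len(feature_map[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            out_row.append(max(window) if mode == "MAX"
                           else sum(window) / len(window))
        out.append(out_row)
    return out


def pooling_combinations(num_layers=3):
    """All MAX/AVE assignments for the pooling layers, abbreviated as 'MAA' etc."""
    return ["".join(name[0] for name in combo)
            for combo in product(("MAX", "AVE"), repeat=num_layers)]


if __name__ == "__main__":
    grid = [[r * 5 + c for c in range(5)] for r in range(5)]  # toy feature map
    print(pool2d(grid, "MAX"))
    print(pool2d(grid, "AVE"))
    print(pooling_combinations())
```

Each of the eight abbreviations corresponds to one configuration of S1, S2, and S3 that would be trained and compared on the validation set.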
Effect of pooling strategy on validation accuracy.
Performance comparison between the combined and individual classifiers
Comparison between combination and single classifiers
SVM: support vector machine; RFC: random forest classifier.
Performance comparison with other sophisticated CNNs
The performance of CalicoNet compared with other sophisticated convolutional neural networks (CNNs)
TAET: test accuracy for each type; CE: circle element; RE: rice element; CLE: column element; SE: shell element; RBE: rhombus element; TTE: tortoiseshell element; TGE: triangle element; CSE: crescent element; FLE: four-leaf element; FSE: fish-scale element; MTE: mountain element; TSE: three-section element; TT: total time; MMR: modeling memory requirement; TMAP: test mean accuracy precision.
Table 5 presents a performance comparison for the 12 elements with different measurement parameters, such as test accuracy for each type (TAET), total time (TT), modeling memory requirement (MMR), and test mean accuracy precision (TMAP), for the data set. Adopting different construction scales for the CNN meant that the TAET of the 12 elements and the TT (including training and test times) varied greatly. There was a discrepancy among CE, TTE, and FLE, between SE and CSE, and between MTE and TSE. The reason lay in the similarity of their characteristics. For CE, RBE, TGE, FSE, and TSE, the features were unique. Therefore, the recognition rate was high, and the recognition results met the subjective visual characteristics. At the same time, the system operation time varied with the depth of the CNN. For example, it took nearly 10 hours to train a deep model with 101 layers in DenseNet, although a 99.36% TMAP was attained. CalicoNet achieved a TMAP of 98.66% with a TT of 1.13 hours, which was nearly 10 times faster than ResNet(101). Although LeNet had the lowest TT and MMR, its TMAP was also the lowest. These results demonstrate the superiority of the proposed algorithm.
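The relationship between the per-type accuracy and a mean accuracy over the types can be sketched as follows. This minimal example assumes that TMAP is the unweighted mean of the per-type test accuracies, which this section does not state explicitly; the function names and the toy labels are hypothetical and are not results from Table 5.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly classified test samples for each class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return {label: correct[label] / total[label] for label in total}


def mean_accuracy(per_class):
    """Unweighted mean over classes (an assumed reading of TMAP)."""
    return sum(per_class.values()) / len(per_class)


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["CE", "CE", "RE", "RE", "RE", "SE"]
    pred = ["CE", "RE", "RE", "RE", "RE", "SE"]
    taet = per_class_accuracy(truth, pred)
    print(taet, mean_accuracy(taet))
```

An unweighted mean gives every element type the same influence regardless of how many test samples it has; a sample-weighted mean would be the alternative reading.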
Based on the results of the experiments, the major advantages of the proposed approach compared to state-of-the-art element extraction and classification methods are as follows:
(1) The proposed element extraction method exhibits good performance compared with other extraction algorithms. (2) This is the first method established for the element data set of blue calico. (3) The optimization of four hyper-parameters performs well in CalicoNet and improves the validation accuracy for the training data set. (4) A comprehensive comparison with other sophisticated CNNs shows that CalicoNet achieves satisfactory test accuracy on the test set.
Conclusions
Blue calico is part of the national intangible cultural heritage of China. It is necessary to use image processing technology to inherit and innovate upon this handicraft. However, little research has been conducted in this field, mainly because of the small number of original blue calico samples, the complicated pattern structure, and the variety of elements. These factors make it difficult to achieve satisfactory results using traditional image processing techniques. Thus, there is a need for more researchers to participate in research using modern digital technologies. This paper presented an algorithm for extracting independent element sub-images from a blue calico image. The process of element extraction involved gray scaling, denoising, binarization, and contour tracking. The feasibility of the proposed method was verified by comparison with two other methods. Although the proposed element extraction algorithm was designed to extract contours and elements from blue calico, it could be used for similar image processing tasks in which large-scale patterns are composed of small-scale features, such as electronic components on printed circuit boards (PCBs), fruits on plants, and various marks in traffic images. We constructed a data set using the element sub-images extracted from blue calico. A CNN classification training method (CalicoNet) based on an improved CifarNet structure was proposed. The elements were divided into 12 categories according to their shape features, and corresponding training, validation, and test samples were constructed. The average classification validation accuracy of the network was 99.2%, and the TMAP was 98.66%. A comparison with other classification methods verified the effectiveness of the proposed network structure for calico classification, which lays a foundation for the automatic classification of elements of blue calico.
The newly acquired element sub-images can be automatically classified according to the training results of the network, which will play a positive role in supporting the inheritance and innovation of blue calico. The proposed classification pipeline has not previously been used, and thus this paper is the first to report this method for the blue calico element classification task. This approach can be improved further as follows. (1) The data set could be expanded to include additional significant element types. (2) An unsupervised machine learning algorithm could be used to reduce the need for human intervention during the training process. (3) The segmentation of overlapping calico element clusters remains a significant challenge. The motivation for our research is to innovate upon traditional pattern designs of blue calico using modern image processing techniques. Developing a database of elements is the first task. How to use the database to produce new patterns is another great challenge.
Footnotes
Acknowledgements
We thank LetPub for its linguistic assistance during the preparation of this manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Zhejiang Public Welfare Technology Research Project Fund of China (LGG20F010010) and the City Public Welfare Technology Application Research Project of Jiaxing Science and Technology Bureau of China (2018AY11008, 2020AY10009).
