Abstract
Exploiting the ability of deep learning to learn image features on its own, this paper proposes an improved deep convolutional neural network algorithm for the case images in the smart city management system (hereinafter referred to as “Smart City Management”) and uses it to improve that system. The algorithm classifies case images quickly and accurately, so that cases in the city management system are classified automatically. ZCA (Zero-phase Component Analysis) whitening is used to reduce the correlation between image data features, and an eight-layer convolutional neural network model is built to classify the whitened images. The rectified linear unit (ReLU) is used in the convolutional layer to accelerate training, and dropout is used in the pooling layer to prevent overfitting. In the network fine-tuning stage, the Back Propagation (BP) algorithm is used for optimization, which improves the robustness of the algorithm. Based on this method, a two-class experiment was carried out on case images of two types, road traffic and city appearance environment. The accuracy reached 97.5% and the F1-Score reached 0.98, exceeding LSVM (Lagrangian Support Vector Machine), SAE (Sparse Autoencoder) and traditional CNN (Convolutional Neural Network). A four-class experiment was also carried out on four types of cases: electric vehicles, littering, illegal parking of motor vehicles, and mess around garbage bins. The accuracy is 90.5% and the F1-Score is 0.91, which again exceeds LSVM, SAE and traditional CNN.
Introduction
With the rapid development of Internet technology, urban management is gradually moving towards digitization, networking, and intelligence [29], and urban management technology based on the Internet is becoming more and more mature [30]. In the smart city management system [32], for example, ordinary citizens can use a smartphone application (APP) to report urban management problems (such as randomly parked vehicles or city appearance and environmental problems) with one click. When reporting such a case, the user must upload at least one on-site picture, the on-site location, and other information about the case. After seeing a case reported by a citizen, urban management staff can use the picture and location information to go to the scene immediately and handle it, which greatly improves the efficiency of urban management [18]. However, as the number of users continues to grow, the number of cases reported every day is also increasing. Sorting cases of different types purely by hand is time-consuming, laborious and inefficient. There is therefore an urgent need for a method that can quickly classify urban management cases, to assist urban management and make it more intelligent. Ambient intelligence (AmI) is intrinsically and thoroughly connected with artificial intelligence (AI). Multi-agent systems and the Semantic Web, ambient assisted living and e-healthcare, AmI for assisting medical diagnosis, ambient intelligence for e-learning, and ambient intelligence for smart cities have been studied from the perspective of information-society laws and superintelligence [9]. ‘Smart’ technologies are increasingly deployed and used, and they shape a wide range of everyday activities.
By taking different perspectives on common issues, commonalities and relationships between them are identified, and anchor points for important challenges in the field of ambient intelligence are provided [24].
Based on the information in the case image, a case can be classified automatically by image classification. Many image classification methods exist, such as the Support Vector Machine (SVM) proposed by Vapnik et al. and the Lagrangian support vector machine (LSVM) used by Lu et al. [17]. However, these methods generally require texture features such as HOG (Histogram of Oriented Gradients) to be extracted from the image first, and the quality of the extracted features directly affects the final classification result. Commonly used image feature extraction methods include Gabor wavelet texture feature extraction [28], Gauss Markov Random Field (GMRF) feature extraction for remote sensing images [20], and SIFT (Scale-Invariant Feature Transform) feature extraction [16]. Deep learning was proposed by Hinton in 2006. Its remarkable feature is that it learns image features by itself and forms more abstract high-level feature representations by combining low-level features of the data [10], which greatly reduces the overhead of manual feature extraction. Deep learning methods then developed rapidly, including the Denoising Autoencoder (DAE) [12], the Sparse Autoencoder (SAE) [5], and the Convolutional Neural Network (CNN) [14]. When a CNN was used to recognize the MNIST handwritten digit data set, its error rate was lower than that of manual recognition by humans [7]. Yang et al. used SAE to classify images [31]. In 2012, Krizhevsky et al. performed image classification on the ImageNet database with a top-5 test error rate of 15.3% [14]. Internet of things (IoT) applications for smart cities have become a primary target for advanced persistent threats (APT) of botnets.
A botnet detection system is proposed based on a two-level deep learning framework for semantically discriminating botnets and legitimate behaviors at the application layer of the domain name system (DNS) services [27]. In the first level of the framework, the similarity measures of DNS queries are estimated using siamese networks based on a predefined threshold for selecting the most frequent DNS information across Ethernet connections. In the second level of the framework, a domain generation algorithm (DGA) based on deep learning architectures is suggested for categorizing normal and abnormal domain names. The framework is highly scalable on a commodity hardware server due to its potential design of analyzing DNS data. The explosive growth of the Internet and the recent trend towards automation using intelligent applications have provided a veritable playground for malicious software (malware) attackers. With a variety of devices connected seamlessly via the Internet and large amounts of data collected, escalating malware attacks and security risks are a big concern. While a number of malware detection methods are available, new methods are required to match the scale and complexity of such a data-intensive environment. A novel and unified hybrid deep learning and visualization approach is proposed for effective detection of malware [26]. Hybrid image-based approaches with deep learning architectures are proposed and investigated for effective malware classification. Diseases in plants are a great threat to crop yield and can cause famine and economic slowdown. A machine learning model is used to classify a tomato disease image dataset [8], so that the necessary steps can be taken proactively to combat such agricultural crises. In that work, the dataset is collected from the publicly available Plant-Village dataset.
The significant features are extracted from the dataset using the hybrid-principal component analysis–Whale optimization algorithm. Further the extracted data are fed into a deep neural network for classification of tomato diseases. The grid denotes the electric grid which consists of communication lines, control stations, transformers, and distributors that aids in supplying power from the electrical plant to the consumers. Presently, the electric grid constitutes humongous power production units which generates millions of megawatts of power distributed across several demographic regions. There is a dire need to efficiently manage this power supplied to the various consumer domains such as industries, smart cities, household and organizations. In this regard, a smart grid with intelligent systems is being deployed to cater the dynamic power requirements. A smart grid system follows the Cyber-Physical Systems (CPS) model, in which Information Technology (IT) infrastructure is integrated with physical systems. In the scenario of the smart grid embedded with CPS, the Machine Learning (ML) module is the IT aspect and the power dissipation units are the physical entities. In this research, a novel Multidirectional Long Short-Term Memory (MLSTM) technique is being proposed to predict the stability of the smart grid network [3]. The obtained results are evaluated against other popular Deep Learning approaches such as Gated Recurrent Units (GRU), traditional LSTM and Recurrent Neural Networks (RNN).
Internet of Things (IoT) empowered Heating, Ventilation, and Air Conditioning (HVAC) buildings are considered to monitor and control the regulation of thermostats, sensors, actuators, and control devices smartly. A novel model named PersonalisedComfort is proposed to predict the thermal sensation votes of individuals living in a building [21]. Conventional machine learning algorithms and deep learning algorithms are used to predict the thermal sensation vote. The Internet of Things (IoT) provides smart solutions for future urban communities to address key benefits with the least human intercession. A smart home offers the necessary capabilities to promote efficiency and sustainability to a resident with their healthcare-related, social, and emotional needs. In particular, it provides an opportunity to assess the functional health ability of the elderly or individuals with cognitive impairment in performing daily life activities. This work proposes an approach named Cognitive Assessment of Smart Home Resident (CA-SHR) to measure the ability of smart home residents in executing simple to complex activities of daily living using pre-defined scores assigned by a neuropsychologist. CA-SHR also measures the quality of tasks performed by the participants using supervised classification. Furthermore, CA-SHR provides a temporal feature analysis to estimate if the temporal features help to detect impaired individuals effectively. The goal of this study is to detect cognitively impaired individuals in their early stages. CA-SHR assess the health condition of individuals through significant features and improving the representation of dementia patients. For the classification of individuals into healthy, Mild Cognitive Impaired (MCI), and dementia categories, ensemble AdaBoost is used [13]. This results in improving the reliability of the CA-SHR through the correct assignment of labels to the smart home resident compared with existing techniques. 
A collaborative health care plan using a multi-agent system assists adult individuals to live an independent healthy life by analyzing their routine life activities. Robust recognition of activities provides services such as health monitoring and fitness assessment. A novel Collaborative Health Care Plan system is proposed [22], in which the independent living of an individual is improved by using a smartphone sensor, a machine learning algorithm, and multiple agents, i.e. a doctor, a gym trainer, a guardian, and an intelligent ranker agent. The novelty of the approach is that it shares the assessment of daily life activities among care providers, who in return provide a care plan or recommendation to ensure the good health of the individual. A machine learning algorithm is used to recognize the adult individual’s daily physical activities.
The images in urban management cases are all captured with ordinary mobile phones, so the background information is relatively complicated and the image quality is low. The methods mentioned above therefore cannot achieve ideal results on such images. To address this problem, the images in city management cases are analyzed, a convolutional neural network is used to extract image features automatically, and an improved convolutional neural network algorithm (ZCNN) is designed. First, uniform ZCA (Zero-phase Component Analysis) whitening is applied to the acquired case images [1], which effectively reduces the correlation between image features. An 8-layer convolutional neural network is then built according to the image features, with an appropriate convolution kernel size. Each convolutional layer is followed by a down-sampling (pooling) layer, and the Rectified Linear Unit (ReLU) is used to accelerate the training process [11,19]. Dropout is used in the pooling layer to randomly disconnect network nodes and prevent overfitting. Finally, when the model is fine-tuned, stochastic gradient descent is used to compute the parameters of the model layer by layer, and the BP (Back Propagation) algorithm is used to optimize the network parameters and improve the accuracy of the algorithm [6]. In this article, experiments on two classes of case pictures (1300 per class, 2600 in total) and on four classes of case pictures (1300 per class, 5200 in total) are used to verify the effectiveness of the algorithm.
The smart city management platform is based on the digital city management system, with the goal of solving the pain points and difficult problems that need to be solved urgently in city management, better serving the public, and forming a typical application of “smart city”. The smart urban management platform not only focuses on the comprehensive business management of the Urban Management Comprehensive Administrative Law Enforcement Bureau itself, but also provides city-wide public services to functional departments at all levels and the public from the perspective of large urban management. The research content of this article is part of smart city management, and the application of artificial intelligence helps the automation and intelligence of smart city management.
Materials and methods
Data preprocessing-ZCA whitening
Image data preprocessing is a crucial step in image classification, especially for natural images. Preprocessing affects the final classification result because adjacent pixels of an image are strongly correlated. The correlation between neighboring image pixels is shown in Fig. 1. It is therefore especially necessary to remove the correlation in the image data effectively and to reduce its redundancy [2]. Whitening is the process of transforming the covariance matrix of the data into the identity matrix. Its purpose is to reduce the redundancy of the input data while keeping the whitened data close to the original data. Numerically, the goal is to give the data a uniform covariance, so that every feature has the same variance.

Image neighboring pixel correlation.
Specific steps are as follows:
First, the brightness and contrast of the data are normalized. For the pixel value
Where The covariance matrix of the training sample is calculated in equation (2):
Since the data are correlated, the covariance matrix calculated at this time is a non-diagonal matrix.
To reduce the correlation between the data, the covariance matrix is transformed into a diagonal matrix. Equation (3) is as follows:
Where ZCA whitening is used, the equation is as follows:
Wherein, δ is a very small constant, set to 0.01, and
After the image data is whitened by ZCA (Zero-phase Component Analysis), the correlation between pixels and the redundancy of the data are reduced, the variance of each dimension of the feature vector becomes equal, and the data is effectively unified.
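The equations referred to in the steps above are not reproduced in this text. As a point of reference only, the commonly used formulation of ZCA whitening can be written as follows, in the same order as the steps (normalization, covariance, diagonalization, whitening). The symbols $x^{(i)}$, $\mu^{(i)}$, $\sigma^{(i)}$, $m$, $U$ and $\Lambda$ are assumptions introduced here, the normalized samples are assumed to have zero mean in every feature, and only the constant $\delta = 0.01$ is taken from the text; the paper's own equations may differ in detail.

```latex
\begin{align}
\tilde{x}^{(i)} &= \frac{x^{(i)} - \mu^{(i)}}{\sigma^{(i)}}
  && \text{(brightness and contrast normalization of sample } i\text{)} \\
\Sigma &= \frac{1}{m} \sum_{i=1}^{m} \tilde{x}^{(i)} \, \tilde{x}^{(i)\top}
  && \text{(covariance matrix of the } m \text{ training samples)} \\
\Sigma &= U \Lambda U^{\top}, \quad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)
  && \text{(diagonalization)} \\
x^{(i)}_{\mathrm{ZCA}} &= U \, (\Lambda + \delta I)^{-1/2} \, U^{\top} \tilde{x}^{(i)}
  && \text{(ZCA whitening, } \delta = 0.01\text{)}
\end{align}
```

In this form the factor $U^{\top}$ rotates the data onto the principal axes, $(\Lambda + \delta I)^{-1/2}$ rescales each axis to roughly unit variance, and the final $U$ rotates the result back to the original coordinate system.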
To keep the whitened data as close to the original data as possible, the processed data is transformed back to the original space; this is ZCA whitening, whose full name is Zero-phase Component Analysis whitening. “Zero-phase” means that, relative to the original space (coordinate system), the whitened data is not rotated, that is, no coordinate transformation remains. Features in training data are sometimes correlated with one another, and for image data the problem is more serious. Although a convolutional layer can deal with these local correlations through learning, learning them is not straightforward. Removing part of the correlation directly from the input data makes training easier.
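The whitening procedure described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names `normalize_images` and `zca_whiten`, the row-per-image layout, and the exact form of the brightness and contrast normalization are all hypothetical; only the constant `delta = 0.01` comes from the text.

```python
import numpy as np


def normalize_images(X):
    """Per-image brightness/contrast normalization (assumed form).

    X has shape (m, d): one flattened image per row.
    """
    X = np.asarray(X, dtype=np.float64)
    X = X - X.mean(axis=1, keepdims=True)             # remove brightness
    return X / (X.std(axis=1, keepdims=True) + 1e-8)  # equalize contrast


def zca_whiten(X, delta=0.01):
    """ZCA-whiten the rows of X and return (whitened data, whitening matrix)."""
    X = np.asarray(X, dtype=np.float64)
    Xc = X - X.mean(axis=0)                   # center every feature
    cov = Xc.T @ Xc / Xc.shape[0]             # non-diagonal covariance matrix
    eigvals, U = np.linalg.eigh(cov)          # cov = U diag(eigvals) U^T
    eigvals = np.clip(eigvals, 0.0, None)     # guard against round-off
    # Rotate onto the principal axes, rescale, then rotate back (zero phase).
    W = U @ np.diag(1.0 / np.sqrt(eigvals + delta)) @ U.T
    return Xc @ W, W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(500, 16)) @ (np.eye(16) + 0.5)  # correlated toy data
    white, W = zca_whiten(normalize_images(patches))
    print(white.shape, W.shape)
```

Because the whitening matrix `W` is symmetric, the whitened data stays in the original coordinate system, which is the "zero-phase" property; a larger `delta` damps the amplification of low-variance directions at the cost of a covariance that is only approximately the identity.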
The convolutional neural network (CNN) originates from the work of Hubel and Wiesel [15]. Its unique network structure can effectively reduce the complexity of feedback neural networks. The basic structure mainly includes two kinds of layers. The first is the convolutional layer, in which an image is convolved with the chosen convolution kernel, the convolution value is weighted and biased, and the output is obtained through an activation function. The second is the down-sampling layer, also called the pooling layer, which uses the principle of local image correlation to sub-sample the image, reducing the amount of data to process while retaining useful information.
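The two layer types can be illustrated with a small NumPy sketch. It assumes a single-channel image, a "valid" convolution with stride 1 (so an n × n input and a k × k kernel give an (n − k + 1) × (n − k + 1) output), and non-overlapping average pooling; the function names are hypothetical and the kernel values are arbitrary.

```python
import numpy as np


def conv2d_valid(image, kernel, bias=0.0):
    """Slide the kernel over the image (stride 1, no padding) and add a bias.

    As in most CNN code, the kernel is not flipped (cross-correlation).
    """
    n_h, n_w = image.shape
    k_h, k_w = kernel.shape
    out = np.empty((n_h - k_h + 1, n_w - k_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k_h, j:j + k_w] * kernel) + bias
    return out


def relu(x):
    """Rectified linear unit: max(x, 0) element-wise."""
    return np.maximum(x, 0.0)


def avg_pool(feature_map, size=2):
    """Average over non-overlapping size x size blocks (down-sampling)."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size          # drop rows/cols that do not fit
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.mean(axis=(1, 3))


if __name__ == "__main__":
    image = np.arange(36, dtype=float).reshape(6, 6)
    kernel = np.full((3, 3), 1.0 / 9.0)        # simple averaging kernel
    feature = relu(conv2d_valid(image, kernel))
    print(feature.shape, avg_pool(feature).shape)
```

With a 6 × 6 input and a 3 × 3 kernel the feature map is 4 × 4, and 2 × 2 pooling halves it to 2 × 2, which shows how pooling reduces the amount of data passed to the next layer.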
In order to understand the process of convolution and pooling more clearly, as shown in Fig. 2, the size of the input image is

Convolution and pooling structure.
Wherein,
The data studied in this article are images of urban management cases. These images are all captured with ordinary mobile phones by ordinary citizens, in every corner of the city. As a result, they have complex backgrounds, low resolution and varying sizes, and their pixels are strongly correlated. Because traditional image recognition and classification methods require very complex feature extraction, this article builds on the convolutional neural network (CNN) algorithm from deep learning and designs the network model around the ability of a CNN to learn image features by itself. Because the model is to be applied to smart city management case data, the network structure should not be too large while still ensuring classification accuracy, so as to minimize training time and reduce operating costs. To remove the correlation between image pixels and make the image size uniform, image normalization and ZCA whitening are used in the model.
In this algorithm, firstly, normalization processing and ZCA whitening processing are applied to the input image. Secondly, the size of the convolution kernel is designed by this method, it is
The image data after ZCA whitening preprocessing is uniformly processed into
Set the size of the convolution kernel to
Each feature map in the C1 layer is obtained through average pooling, that is, the average value of every 4 pixel values in the feature map is calculated, then the weighted value is added, the bias is added, and ReLU is used as the activation function. At the same time, in the pooling layer, dropout of this method is used to randomly disconnect 10% of the nodes, and finally 6 feature maps of the S2 layer with a size of
The obtained S2 layer is used as the input layer of the next layer, the size of the convolution kernel is
After the feature map of layer C3 is processed in the same step (3), the S4 layer is obtained, and the size of the feature map is
In the same step (4), the S4 layer is used as the input layer, the size of the convolution kernel is still
After the C5 layer is processed in the same step (3), the feature map of the S6 layer is obtained, and the size is
Finally, the obtained S6 layer is fully connected, and the classification result is obtained through the softmax classifier.
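The pooling step with dropout, the softmax output, and the way feature-map sizes shrink through the C1 to S6 layers can be sketched as below. Several details are assumptions made only for illustration, because the sizes in the steps above are not legible in this text: the input size 60 and kernel size 5 in the usage example are hypothetical, the 2 × 2 pooling window is inferred from "every 4 pixel values", and the rescaling of kept nodes (inverted dropout) is a common convention rather than something the paper states. Only the 10% dropout rate, average pooling, ReLU and softmax come from the description.

```python
import numpy as np


def pool_stage(feature_map, weight, bias, drop_rate=0.1, training=True, rng=None):
    """Average-pool 2x2 blocks, apply weight and bias, ReLU, then dropout."""
    h, w = feature_map.shape
    blocks = feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    pooled = blocks.mean(axis=(1, 3))               # mean of every 4 pixels
    out = np.maximum(weight * pooled + bias, 0.0)   # weighted, biased, ReLU
    if training:
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(out.shape) >= drop_rate   # disconnect ~10% of nodes
        out = out * keep / (1.0 - drop_rate)        # assumption: inverted dropout
    return out


def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def trace_shapes(input_size, kernel_size, stages=3):
    """Side length of the feature maps after each C (conv) and S (pool) layer."""
    sizes, n = [], input_size
    for s in range(stages):
        n = n - kernel_size + 1                     # valid convolution
        sizes.append((f"C{2 * s + 1}", n))
        n = n // 2                                  # 2x2 pooling
        sizes.append((f"S{2 * s + 2}", n))
    return sizes


if __name__ == "__main__":
    print(trace_shapes(60, 5))                      # hypothetical sizes
    print(softmax(np.array([2.0, 1.0, 0.1, -1.0])))
```

At test time `training=False` leaves all nodes connected, so the same function serves both phases; `trace_shapes` is a quick way to check that a chosen kernel size leaves a non-empty S6 feature map for a given input size.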
The structure of the implemented convolutional neural network (ZCNN) is shown in Fig. 3, and the implementation process of the convolutional neural network model is given in Table 1.

Implementation convolutional neural network structure (ZCNN).
Implementation process of convolutional neural network model
Experiment based on urban management case data
The data for this experiment come mainly from the smart city management system (the “Urban Guantong” system) developed in cooperation with Qingxiu District, Nanning City, China. The system includes a web terminal and a smartphone APP terminal (supporting both iOS and Android). When citizens or workers discover a problem in the city, they can take pictures of the scene with their own mobile phones, select a description of the case, and report it to the system with one click. Two sets of experiments were conducted on different types of case data. In the first, 1300 cases were randomly selected from each of the road traffic category and the city appearance environment category in the system, giving 2600 cases for the two-class experiment. In the second, 1300 cases were randomly selected from each of four categories in the system (random placement of electric vehicles, littering, illegal parking of motor vehicles, and mess around garbage bins), giving 5200 cases for the four-class experiment. Some sample images of smart city management cases are shown in Fig. 4.
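The random selection of 1300 cases per category, together with the per-class division into 1000 training and 300 test pictures used in the experiments, can be sketched as follows. The function name, the file names and the fixed seed are hypothetical; the paper does not describe how the random selection was implemented.

```python
import random


def split_per_class(items_by_class, n_train=1000, n_test=300, seed=0):
    """Randomly pick n_train + n_test items per class and split them.

    items_by_class maps a class label to a list of image identifiers.
    Returns two lists of (item, label) pairs: training set and test set.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, items in items_by_class.items():
        if len(items) < n_train + n_test:
            raise ValueError(f"class {label!r} has only {len(items)} items")
        chosen = rng.sample(list(items), n_train + n_test)  # without replacement
        train += [(item, label) for item in chosen[:n_train]]
        test += [(item, label) for item in chosen[n_train:]]
    return train, test


if __name__ == "__main__":
    # Hypothetical file names standing in for reported case pictures.
    cases = {
        "Road_Traffic": [f"rt_{i}.jpg" for i in range(1500)],
        "City_Environment": [f"ce_{i}.jpg" for i in range(1400)],
    }
    train_set, test_set = split_per_class(cases)
    print(len(train_set), len(test_set))
```

Sampling without replacement within each class keeps the classes balanced and guarantees that no picture appears in both the training and the test set.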

Sample graph of experimental data.
In this paper, several commonly used classification evaluation indicators are used as the evaluation criteria of the model, specifically Accuracy, Precision, Recall, F1_Score [4].
For each class, the accuracy, precision, recall and F1 value of the current class are calculated. Averaging these over all classes gives the average accuracy, average precision, average recall and average F1 value.
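The formulas for these indicators are not legible in this text. A minimal sketch using the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2 · precision · recall / (precision + recall), averaged over classes) is given below; it assumes a confusion matrix whose rows are true classes and whose columns are predicted classes, and the matrix in the usage example is made up for illustration, not taken from the paper's results.

```python
import numpy as np


def _safe_div(a, b):
    """Element-wise a / b that returns 0 where b is 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.divide(a, b, out=np.zeros_like(a), where=b != 0)


def classification_metrics(confusion):
    """Per-class and averaged metrics from a confusion matrix.

    confusion[i, j] = number of samples of true class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp      # predicted as the class, but wrong
    fn = confusion.sum(axis=1) - tp      # belong to the class, but missed
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": tp.sum() / confusion.sum(),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "avg_precision": precision.mean(),
        "avg_recall": recall.mean(),
        "avg_f1": f1.mean(),
    }


if __name__ == "__main__":
    toy = [[8, 2],
           [1, 9]]                       # illustrative counts only
    for name, value in classification_metrics(toy).items():
        print(name, value)
```

The same function applies unchanged to the four-class case, where the confusion matrix is 4 × 4.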
Two-class experiment
In this experiment, 1300 case pictures each of road traffic (Road_Traffic) and city appearance environment (City_Environment) are used, for a total of 2600 pictures. From each class, 1000 pictures are taken as the training set and the remaining 300 pictures are taken as the test set. They are used to calculate the Accuracy, Precision and Recall
Two classification experiment results

ROC curve.

Mean square error curve.
The experimental results above show that the ZCNN model designed in this paper performs better than LSVM, SAE and the traditional CNN algorithm on the two-class problem of smart urban management. Fig. 5 also shows that the algorithm converges in a short time, with the error gradually decreasing and then stabilizing.
In this experiment, four types of pictures are used: electric vehicles (electrocar), littering (rubbish), illegal parking of motor vehicles (car), and mess around garbage bins (dustbin), with 1300 pictures each and 5200 in total. From each class, 1000 pictures are selected for the training set (4000 in total) and the remaining 300 are used as the test set (1200 in total). They are used to calculate the Accuracy, Precision and Recall
Results of four classification experiments

ROC curve.

Mean square error curve.
The ImageNet dataset is one of the largest image recognition databases, containing more than 14 million pictures. In the ILSVRC-2012 competition, Krizhevsky et al. designed the AlexNet model and achieved a top-5 test error rate of 15.3% on a subset of ImageNet [14]. VGG-Net, designed by Simonyan et al., reached an error rate of 7.3% in the ILSVRC-2014 competition on this data set [23], and Google-Net, designed by Szegedy et al., reached a top-5 error rate of only 6.67% on the same data set [25]. These networks are large-scale deep neural network models that rely on multi-GPU parallel acceleration. Although they achieve high accuracy on large databases, their training time and cost are much higher than those of the model in this article, so they are not suitable for the image classification of urban management cases considered here. The specific model structures are compared in Table 4.
Model structure comparison (the total number of layers only calculates the convolutional layer and the fully connected layer)
Table 4 shows that the method proposed in this article does not use GPU acceleration and runs on an ordinary computer configuration (Core i5, 4 GB memory, MATLAB2018a). Only part of the ImageNet data set is used for the experiments, and the model still completes training quickly. Compared with larger network structures such as AlexNet, VGG-Net, and Google-Net, it greatly shortens the training time, reduces costs, and improves work efficiency. It is suitable for the classification of urban management cases considered in this article and for other small and medium-sized application scenarios.
In the ImageNet database, 4 types of image data are randomly selected for classification experiments with 1300 images each, a total of 5200 images, and the experimental data is shown in Fig. 9.

ImageNet data sample graph.
In the experiment, 4 types of pictures are selected as the training set, each with 1000 pictures, a total of 4000 pictures, and the remaining 300 pictures, a total of 1200 pictures, as the test set. The experimental results are shown in Table 5. According to Table 5, it can be seen that the method proposed in this paper can still obtain a higher accuracy rate for the ImageNet data set in a shorter training time. The ROC curve is shown in Fig. 10, and the mean square error curve is shown in Fig. 11. It can be seen that convergence can be achieved in a relatively short time.
Multi-classification experiment results of ImageNet dataset

ROC curve.

Mean square error curve.
As smart city management systems become more widely used, the number of cases reported daily keeps increasing. Sorting the large number of urban management cases reported by citizens and staff into different types by hand is time-consuming, labor-intensive and extremely inefficient. In addition, because individuals understand the case types differently, many cases are misclassified, which leads to disorder in the handling of cases.
In this paper, an improved algorithm, ZCNN, is designed on the basis of the deep learning convolutional neural network algorithm to automatically classify the images of urban management cases, and thereby the cases themselves. First, the city management case images are obtained and preprocessed with ZCA whitening; then an 8-layer convolutional neural network model is designed, and a
Acknowledgements
This work was supported by the Scientific Research Project (NO. 17C0893, NO. 20B335, NO. 18C1122) of Hunan Provincial Education Department, China.
Conflict of interest
The authors have no conflict of interest to report.
