Intelligent image recognition using lightweight convolutional neural networks model in edge computing environment

Abstract

In order to enhance the performance of intelligent image recognition, this study optimizes the image recognition model through lightweight convolutional neural networks (CNNs) and cloud computing technology. The study begins by introducing the relevant theories and models of edge computing (EC) and lightweight CNNs models. Next, this study focuses on optimizing traditional image recognition models. Finally, the effectiveness and reliability of the proposed model are experimentally validated. The experimental results indicate that, when recognizing 1000 images, the average recognition times per image on cloud servers and edge servers are 13.33 ms and 50.11 ms, respectively. Despite the faster speed of cloud servers, the performance of edge servers can be improved by stacking servers. When the number of edge servers reaches 4, their recognition speed surpasses that of the cloud server model. Additionally, comparing the latency and processing time between EC and cloud computing architectures, it is observed that, with an increase in the number of processed images, the average processing time per image in the EC architecture remains relatively stable and consistent. In contrast, the average processing time gradually increases in the cloud computing architecture. This indicates a significant impact of the number of images on the processing rate of the cloud computing architecture. Therefore, as the time gap in processing between cloud computing and EC increases, the advantages of the EC architecture become more apparent. This study’s significance lies in advancing the development of deep learning technology and providing possibilities for its widespread practical application. The contribution of this study lies in promoting the development of EC and lightweight neural network models, offering valuable references and guidance for practical applications in related fields.

Keywords

Edge computing; lightweight CNNs; image recognition; cloud computing technology; edge server; Elastic Compute Service

1. Introduction

The essence of image recognition and retrieval resides in the algorithms utilized for image classification. The efficacy of these algorithms directly influences the precision of recognition and retrieval outcomes [1]. Deep learning (DL) has emerged as a seminal domain within machine learning, notably in computer vision, and has experienced significant advancements in recent years. However, conventional cloud computing architectures face myriad challenges in deploying these models. Primarily, the centralized processing of all tasks can result in bandwidth bottlenecks, thereby intensifying the strain on central processing [2]. Moreover, in its capacity as a computing node, the cloud manifests a discernible latency in processing real-time data. This latency becomes particularly pronounced when managing multiple tasks, as network congestion compounds complexities associated with delay. Consequently, the optimization of intelligent image recognition has become a pivotal focus in contemporary research [3]. This study’s primary focus is optimizing intelligent image recognition through lightweight convolutional neural networks (CNNs) and edge computing (EC). Due to various challenges faced by traditional cloud computing architectures when implementing DL models, this study aims to mitigate these issues through EC, thereby improving the performance and efficiency of intelligent image recognition. Additionally, this study explores an innovative approach of replacing the traditional Elastic Compute Service (ECS) with edge servers to validate the effectiveness and feasibility of the proposed model.

Zheng et al. proposed a random-walk-based method for segmenting images of moving objects, aimed at mitigating irregular jitter induced by external factors. The algorithm was optimized to explore novel low-level video segmentation feature points. The adverse effects of video jitter caused by impulse noise were alleviated by selecting a fitness function based on optimal edge feature points and by establishing a globally optimal search rule. Marker points derived from the optimal bottom-edge feature points were then delineated, yielding the optimal segmentation of video images. The study also examined target occlusion during detection and introduced a particle filter-based vehicle tracking algorithm that combined multiple features. Empirical findings showed that combining the random field vehicle occlusion separation model with the particle filter tracking algorithm and offset compensation improved the accuracy and robustness of vehicle tracking in complex environments [4].

Bazi et al. curated a landscape dataset focused on a specific scenic area, covering eight landscape types: sunrise and sunset, rime, snow, cloud sea, autumn mountain and red leaves, mirage, Buddha’s light, and waterfall clouds. To meet the requirements of landscape recognition in practical scenarios, 2167 landscape images from diverse situations were collected online and through on-site photography. Each target’s position and category were annotated manually. Data augmentation was then used to expand the dataset and correct the imbalance in sample sizes across classes. Subsequently, a system was established for real-time landscape recognition in practical environments. Recognition tests on corresponding images confirmed the viability of the proposed lightweight neural network [5].

He et al. proposed an attention feature extraction network that strengthens the network and extracts features from the most discriminative regions of the image. The goal was to address a central difficulty of weakly supervised fine-grained image recognition, namely capturing the most discriminative feature regions. The network incorporated a feature enhancement method to emphasize the relevant feature map channels, and a flexible pooling method to reduce information loss. The study also designed a discriminative-region localization network and a discriminative-region masking network that interact with the backbone network to refine its parameters. To strengthen the network’s corrective effect, class center loss, complementary loss, and classification loss functions were designed and implemented [6].

Prior studies have demonstrated that the enhancement of neural network models requires structural modifications to input and hidden layers and additions or alterations to the overall model architecture. This investigation focuses on exploring an innovative approach involving the replacement of servers in the image recognition process. The introductory Section 1 explicitly outlines the study questions, objectives, and significance of the study. Section 2 delves into the concepts of EC, lightweight CNNs models, and the optimization of such models using technologies related to EC. Section 3 details the proposed optimized model, which entails the substitution of the conventional ECS with edge servers. Section 4 substantiates the effectiveness and rationale of the proposed model through rigorous experimental analysis. Finally, Section 5 furnishes the conclusion, encapsulating the principal findings and engaging in a discourse on limitations and potential avenues for future research. Therefore, the innovation of this study lies in integrating EC and lightweight CNNs to optimize intelligent image recognition. Additionally, it involves the design of a performance-optimized lightweight CNNs model suitable for EC environments, significantly enhancing the speed and efficiency of image processing. The experimental contribution is evident in the proposed model surpassing traditional cloud computing methods in terms of performance, particularly demonstrating higher efficiency and accuracy when handling large volumes of image data. This presents an effective solution for real-time image processing scenarios, showcasing extensive application potential in areas such as traffic monitoring and medical image analysis.

2. Intelligent image recognition optimization under EC and lightweight CNNs

2.1 Lightweight CNNs

Conventional CNNs adhere to a well-defined architectural configuration encompassing distinct layers such as input, convolutional, Rectified Linear Unit (ReLU), pooling, fully connected, and output layers. The convolutional and ReLU layers maintain a consistent framework, effectively mapping input data to the corresponding output [7]. Derived from conventional neural networks, CNNs integrate convolutional and pooling layers, incorporating features such as multi-channel convolution, weight sharing, local receptive fields, and pooling operations [8]. In the domain of image recognition, CNNs adeptly address challenges associated with preserving image features and efficiently handling substantial volumes of data, thereby demonstrating superior performance compared to traditional neural networks [9]. The fundamental structure of CNNs is elucidated in Fig. 1.

Figure 1.

Basic structure of CNNs.

The convolutional layer assumes a pivotal role in the extraction of features from images, incorporating multiple feature channels wherein neurons within a given feature map share weights associated with the convolutional kernel. This weight-sharing mechanism efficiently diminishes the total count of model parameters [10]. Each convolutional kernel possesses the capability to extract distinct features, and the ensuing feature maps are derived through a synergistic abstraction of multiple feature maps originating from the preceding layer.

$\displaystyle x_{i}^{l}=f\left(\sum\limits_{j\in N_{i}}x_{j}^{l-1}\ast K_{ij}^{l}+b_{i}^{l}\right)$ (1)

$x_{j}^{l-1}$ is the $j$-th feature map of layer $l-1$, which serves as the input of layer $l$; $x_{i}^{l}$ is the $i$-th feature map of layer $l$; $N_{i}$ is the set of input feature maps that contribute to output map $i$; and $f$ represents the nonlinear activation function. $K_{ij}^{l}$ is the convolution kernel connecting input map $j$ to output map $i$, $\ast$ represents the convolution operation, and $b_{i}^{l}$ represents the bias term of the convolutional layer. The calculation equation for the size of the data feature map after convolution is shown in Eq. (2):

$\displaystyle C=\frac{N-F+2P}{S}+1$ (2)

$C$ is the size of the data feature map after convolution, $N$ is the size of the input feature map, $F$ is the size of the convolution kernel, $S$ is the stride (step size), and $P$ is the padding (fill) size.
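As a quick check of the output-size relation in Eq. (2), with its trailing term read as the constant 1, the following minimal sketch evaluates it for the 224 $\times$ 224 inputs used later in the experiments. The function name and the use of floor division when the stride does not divide evenly are assumptions made for illustration; the 3 $\times$ 3 kernel and padding of 1 are example values, not settings reported in this study.

```python
def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Feature-map size after convolution: floor((N - F + 2P) / S) + 1.

    n: input size, f: kernel size, p: padding, s: stride.
    Floor division is an assumption for strides that do not divide evenly.
    """
    if s <= 0:
        raise ValueError("stride must be positive")
    if n + 2 * p < f:
        raise ValueError("kernel is larger than the padded input")
    return (n - f + 2 * p) // s + 1


# Example values (hypothetical layer settings): 3x3 kernel, padding 1.
print(conv_output_size(224, 3, p=1, s=1))  # stride 1 keeps the size: 224
print(conv_output_size(224, 3, p=1, s=2))  # stride 2 halves the size: 112
```

With stride 1 and padding 1, a 3 $\times$ 3 kernel preserves the spatial size, while stride 2 halves it, which is the usual way lightweight networks downsample without a separate pooling layer.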

An integral facet of lightweight CNNs is the substitution of conventional convolutions with depthwise separable convolutions, accompanied by two shrinking hyperparameters (a width multiplier and a resolution multiplier) that further reduce the total parameter count and computation [11]. Notably, three predominant types of lightweight CNNs are SqueezeNet, MobileNet V2, and MobileNet V3. SqueezeNet prioritizes a small model and fast computation [12]: it minimizes the parameter count and overall computational workload while keeping accuracy at a comparable level. MobileNet V2 refines the network structure established in MobileNet V1 by incorporating Linear Bottleneck and Inverted Residuals, resulting in heightened accuracy [13]. The two-step structure of the model is delineated in Fig. 2.

Figure 2.

Bottleneck structure under two steps: (a) Linear Bottleneck (b) Inverted Residuals.

The Linear Bottleneck structure depicted in Fig. 2 is an efficient network design primarily composed of separable convolutional and fully connected layers. The inspiration for this design stems from the Inception v4 network, which employs a unique approach to handling channel numbers within the network. In the Linear Bottleneck, the channel count is effectively reduced through the use of 1 $\times$ 1 convolutional operations. This approach not only diminishes the model’s parameter count but also reduces computational complexity, thereby enhancing the operational efficiency of the network. Additionally, the Linear Bottleneck structure incorporates group convolution layers at the bottom of the network. Group convolution is a technique that divides input channels into multiple groups and performs convolution operations on each group separately. This design significantly accelerates the inference speed of the network. Group convolution, by reducing the number of parameters and computational load, makes the model more suitable for operation on resource-constrained devices, such as smartphones and embedded devices. The Inverted Residuals structure is another innovative design that constructs the network by stacking multiple lightweight convolutional operations. The core of this structure is a technique called “inverted residual connection.” The inverted residual connection first employs a lightweight 1 $\times$ 1 convolutional layer to expand the channel count, followed by depthwise separable convolutional layers to extract features. Depthwise separable convolution is a technique that performs channel-wise convolution followed by a 1 $\times$ 1 convolution, effectively extracting features while maintaining the parameter count and reducing computational load. The benefit of using inverted residual connections is that it moderately reduces the computational load and parameter count in CNNs. 
This method not only improves the operational efficiency of the model but also preserves good performance. Inverted residual connections are particularly suitable for building lightweight DL models, especially those intended to run on edge devices.
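The saving from depthwise separable convolution described above can be illustrated by counting weights. The sketch below is a simplified illustration only: it ignores bias terms, and the layer sizes (a 3 $\times$ 3 kernel with 32 input and 64 output channels) are hypothetical example values rather than settings from this study.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


k, c_in, c_out = 3, 32, 64     # hypothetical example layer
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(sep / std, 4))  # 18432 2336 0.1267
```

The ratio equals 1/c_out + 1/k^2, so for 3 $\times$ 3 kernels the separable form needs roughly one-eighth to one-ninth of the weights of a standard convolution when the output channel count is large.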

MobileNet V3 introduces a Squeeze-and-Excitation neural network module, which assimilates global information to enhance valuable features and suppress irrelevant ones, thus deriving effective analytical features. Within this module, the channel count is reduced to one-fourth of the original, contributing to a more resilient model architecture [14]. Moreover, MobileNet V3 decreases the number of convolution kernels in the first layer (the network head) from 32 to 16, augmenting computational speed. While preceding iterations of MobileNet employed the ReLU6 activation function, MobileNet V3 adopted a novel activation function named hard-swish. The hard-swish function replaces the sigmoid utilized in the regular swish function with a piecewise-linear approximation built from ReLU6, avoiding the computational cost of evaluating the sigmoid. As hard-swish proves more suitable for deep-level networks, the initial half of MobileNet V3 utilizes the regular ReLU activation function, reserving the hard-swish function solely for the latter half, resulting in time savings and heightened accuracy [15].
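To make the activation functions concrete, the following minimal sketch implements ReLU6, swish, and hard-swish for scalar inputs, using the commonly cited definition hard-swish(x) = x · ReLU6(x + 3) / 6. The function names are chosen for illustration; a real model would apply these element-wise to tensors through a DL framework.

```python
import math


def relu6(x: float) -> float:
    """ReLU clipped at 6: min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)


def swish(x: float) -> float:
    """Regular swish: x * sigmoid(x), which requires an exponential."""
    return x / (1.0 + math.exp(-x))


def hard_swish(x: float) -> float:
    """Hard-swish: x * ReLU6(x + 3) / 6, a piecewise-linear stand-in for the sigmoid."""
    return x * relu6(x + 3.0) / 6.0


# Compare the two activations at a few sample points.
for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:+.1f}  swish={swish(x):+.4f}  hard_swish={hard_swish(x):+.4f}")
```

Hard-swish is exactly zero for x ≤ -3 and exactly x for x ≥ 3, and it needs only comparisons, an addition, and a multiplication, which is why it is cheaper than the sigmoid-based form on edge hardware.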

2.2 EC

Conventional cloud computing relies on virtualization technology for uniform resource management across space and time, ensuring the reliability of platform services [16]. However, the escalating number of endpoints and data volumes has exposed challenges within traditional cloud computing [17]. Firstly, the burgeoning demand for cloud performance strains the ECS, diminishing the cost-effectiveness of cloud computing and inducing network transmission pressure. Secondly, data transmission between terminals and the cloud introduces communication delays that fall short of the real-time requirements in specific application scenarios [18]. In order to contend with these challenges and address diverse application needs, the integration of EC into the computing paradigm has become imperative [19]. EC embodies a distributed computing architecture employing a hierarchical scheme involving the collaboration of diverse devices and the allocation of local resources. By offloading some or all tasks from the cloud to EC devices proximate to the user terminal, network communication latency is reduced, alleviating the load on the cloud [20].

The EC architecture comprises three primary components: terminals, edge servers, and ECS, with each component representing a layer of task modules [21]. Each layer assumes specific responsibilities based on task complexity, utilizing corresponding hardware equipment with associated costs to achieve its objectives. The collaborative interconnection of these modules constitutes the comprehensive EC architecture, enhancing the system’s real-time security, efficiency, and scalability [22]. Figure 3 provides a detailed representation of a specific architecture.

Figure 3.

EC architecture.

In Fig. 3, terminal devices play a crucial role within the EC framework, primarily undertaking tasks such as data collection, sensing, preprocessing, and local computation. In this system, terminal devices function as data sources, collecting data from various sensors such as temperature, humidity, images, and sounds. Subsequently, they preprocess this data, incorporating filtering, noise reduction, and data format conversion steps to reduce data transmission volume and enhance data quality. Furthermore, terminal devices may engage in basic local computations, such as simple data analysis and real-time feedback, aiming to diminish reliance on remote servers. Edge servers assume a bridging role within the EC system, responsible for further processing, storage, and management of data and preliminary data analysis and decision-making. A key function of edge servers is the initial preprocessing of data received from terminal devices, encompassing operations such as data compression and feature extraction. Through such processing, edge servers can reduce the data volume requiring transmission to cloud servers, lowering network latency and bandwidth consumption. Additionally, edge servers can perform more intricate analyses and processing, such as running lightweight machine learning models providing swift decision support. Cloud servers serve as the central nodes in the EC system, undertaking tasks such as deep computation, model training, data storage, and complex data analysis. Possessing robust computational and storage capabilities, cloud servers can handle extensive datasets and execute complex machine learning and DL models. Users can effortlessly deploy and manage their applications through cloud service platforms, accessing more sophisticated features and business logic. 
Furthermore, cloud servers offer Application Programming Interfaces (APIs) that allow users to transfer data between the cloud, edge servers, and terminal devices, supporting bidirectional data flow and flexible processing strategies.

In recent times, significant progress has been achieved in the computational speed and accuracy of DL models, leading to heightened intricacy in their structural designs. However, in the context of image recognition and retrieval applications, relying solely on traditional cloud computing architectures for image identification and retrieval would exert strain on the ECS and result in prolonged network latency, particularly when dealing with a substantial volume of images [23]. Conversely, depending solely on terminal devices for image identification and retrieval would necessitate high-end hardware, rendering it impractical for applications involving a multitude of terminal devices. Consequently, the integration of EC and DL network models emerges as a viable solution to address the requisites of practical applications, such as traffic target detection and medical image recognition. The viability of applying the EC architecture to DL-based image classification has been substantiated by advancements in EC technology [24].

In the majority of image recognition and retrieval applications, sophisticated algorithms require extensive training, and managing a substantial number of images demands high transmission bandwidth. EC technology facilitates the transfer of computational burdens from terminal devices to the edge: local terminals collect image information and transmit it to the edge server. Because the computational demand of DL-based detection (inference) is generally lower than that of training, model training tasks are typically offloaded to the cloud, whereas the edge assumes responsibility for model deployment and prediction tasks [25]. However, owing to the limited computing capacity of edge devices compared to the ECS, the deployment of relatively lightweight networks becomes imperative. By deploying models on edge servers, real-time task processing at the edge can be achieved [26].

3. Research model

3.1 Design of a lightweight CNNs image recognition model based on EC

In applications involving intelligent image recognition and retrieval, providing users with timely feedback on uploaded images is of paramount importance. This study introduces a system architecture rooted in the integration of EC and DL, capitalizing on the advantages inherent in both cloud computing and EC paradigms. Furthermore, taking into account the characteristics of image classification networks, a decentralized computing architecture is adopted to distribute the intricate tasks originally managed by the ECS, allocating certain tasks to multiple edge devices for concurrent processing. This architectural framework serves as a complementary and optimization strategy for cloud computing, effectively diminishing image processing time. The operational process of the model is elucidated in Fig. 4.

3.2 Architecture design of lightweight CNNs image recognition model

The system architecture proposed in this study comprises three layers: terminal devices, edge servers, and cloud servers. Each layer is assigned specific tasks to form a comprehensive architecture that collaborates effectively. The specific content is shown in Table 1.

Table 1
The optimized system architecture

Layer	Main function	Characteristics
Terminal equipment	Acquires images and transfers and uploads the raw data; serves as the input source.	Numerous devices with insufficient computing power. Local inference is usually not performed, and DL models are not deployed.
Edge server	Provides fast real-time processing with low transmission delay, handles simple tasks, and hosts lightweight DL models.	Small size with limited computing power and storage capacity; processing is accelerated through mass deployment.
Cloud server	Responsible for complex data processing and storage tasks, training of DL models, and complex calculations.	High computing power and large storage capacity for handling advanced tasks.

Figure 4.

Operation process of identification and retrieval system architecture.

Edge devices possess computational capabilities intermediate between those of cloud servers and terminal devices, necessitating appropriate hardware configurations. Cloud servers exhibit higher configurations, enhanced performance, and more abundant resources, making them suitable for large-scale computing tasks and resource provision through leasing arrangements. Considerations of transmission delay and cost are imperative. The cloud server assumes responsibilities such as dataset summarization and training of DL network models. Consequently, achieving a complete segregation between cloud computing and EC proves unfeasible. The integration of terminal, cloud, and edge layers is pivotal for effectively reallocating and deploying resources, better aligning with the requirements of diverse applications. The model’s application revolves around three primary tasks. Firstly, cloud servers are employed for model training and deployment. Secondly, edge devices deploy lightweight models to facilitate the uploading and processing of image data, thereby mitigating transmission delays. Finally, terminal devices manage image acquisition and upload, while edge devices perform image recognition and retrieval, enhancing overall computing efficiency.

3.3 Operational logic and advantages of a lightweight CNNs image recognition model

Throughout the model training process, each terminal device captures images, which are subsequently transmitted to nearby edge devices as requisite image data for model training. This data is then transmitted to the cloud server, where it is stored and subjected to data enhancement, cropping, and additional preprocessing operations, culminating in the formation of a comprehensive image dataset. This dataset is subsequently utilized for training the model, and the outcomes of the training process are conveyed back to the edge device for deployment. In subsequent operations, the cloud server retrieves image data from the edge server and expands the dataset based on user specifications. Furthermore, the cloud server has the capability to introduce categories not initially present in the dataset, thereby updating model training and testing. If the cloud server trains a more effective model, the edge device can receive and implement model updates at any given time. During the image recognition and retrieval phase, images are not transmitted to cloud servers; instead, they are directly conveyed to neighboring edge servers for processing via the local area network. The edge device undertakes image recognition and retrieval tasks, subsequently relaying the results to the terminal. This localized processing approach significantly amplifies the efficiency of image recognition and retrieval. This architecture has two advantages, as shown in Table 2.

Table 2
The advantages of optimizing the system

Advantage	Description
High image recognition and retrieval performance	Terminal devices collect images in real time and send them to nearby edge servers, which shortens the image transmission distance (the physical distance between users and servers), reduces the time users wait for results, lowers dependence on cloud servers, and reduces network backhaul bandwidth requirements and network load.
High scalability and reusability	Adding edge devices near demand points makes the system scalable and reusable, avoids a large number of services requesting one server at the same time, disperses task processing pressure across more edge servers, and improves system efficiency.

Therefore, deploying more edge servers can better distribute the task processing load, enhance system scalability, and provide more efficient image recognition and retrieval services.

4. Experimental design and performance evaluation

4.1 Datasets collection

In this experimental setup, terminal devices exclusively handle the collection of images without engaging in any inference computations. To ensure precise timing, all images within the terminal devices undergo pertinent preprocessing operations. The dataset employed in this investigation is the publicly accessible ImageNet dataset, a large-scale image recognition dataset encompassing over 14 million images across more than 20,000 categories. For the purposes of this experiment, a random subset of 5,000 images was selected from this extensive dataset.

4.2 Experimental environment

The terminal configuration of the experiment is shown in Table 3.

Table 3
Terminal configuration

Hardware module	Configuration
CPU (Central Processing Unit)	A12, 6-core, 2.49 GHz
ROM (Read-Only Memory)	64 GB
RAM (Random Access Memory)	4 GB

4.3 Parameters setting

For the simulation of real-world application scenarios, a high-performance desktop computer was selected as the cloud server. The choice of edge devices took into account their storage and computing capabilities, leading to the selection of a mobile laptop computer as the edge server. The configurations of both the cloud server and edge server are detailed in Table 4.

Table 4
Server configuration

Hardware module	ECS configuration	Edge server configuration
CPU	AMD (Advanced Micro Devices) Ryzen 5 5600X 6-core processor, 3.7 GHz	Intel i5-9300H, 2.3 GHz
GPU (Graphics Processing Unit)	NVIDIA GeForce RTX 3060 Ti	NVIDIA GeForce GTX 1650 Ti
ROM	1 TB	512 GB
RAM	16 GB	16 GB

Throughout the experiment, the image dimensions were configured to 224 $\times$ 224, resulting in each image consuming approximately 0.2 MB of storage. The model learning rate was set to 0.01, the number of iterations was 100, and the batch size was 32.

4.4 Performance evaluation

The experimental procedure involved distinct simulations of cloud computing and EC to assess image recognition, retrieval, and image transmission times within ECS and EC architectures. A comprehensive comparative analysis was undertaken to evaluate the respective advantages and disadvantages inherent in these two architectures. Initially, a subset of 1000 images was chosen, and the cumulative time required for the recognition and retrieval of these images was computed. In order to ensure accuracy and reliability, the tests were iterated ten times under consistent conditions, and the resulting average value was computed. The experimental results were meticulously recorded with precision up to two decimal places. The outcomes of the experiments are presented graphically in Fig. 5.
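The measurement procedure described above (time a batch of images, repeat the run ten times, average, and report two decimal places) can be sketched as a small timing harness. This is a minimal illustration under stated assumptions, not the code used in the study: `recognize` stands in for whatever inference call is being measured, and the injectable `clock` parameter is a hypothetical convenience for testing.

```python
import time
from typing import Any, Callable, Sequence


def mean_time_per_image_ms(
    recognize: Callable[[Any], Any],
    images: Sequence[Any],
    repeats: int = 10,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Average per-image time in milliseconds over `repeats` full passes."""
    if not images:
        raise ValueError("images must not be empty")
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    totals = []
    for _ in range(repeats):
        start = clock()
        for image in images:
            recognize(image)          # the call being timed (placeholder)
        totals.append(clock() - start)
    mean_total_s = sum(totals) / len(totals)
    # Convert seconds to milliseconds per image, rounded to two decimals.
    return round(mean_total_s * 1000.0 / len(images), 2)


# Usage with a stand-in workload; a real run would pass the model's inference call.
dummy_images = list(range(1000))
print(mean_time_per_image_ms(lambda img: img * img, dummy_images))
```

Using a monotonic high-resolution clock such as `time.perf_counter` avoids distortions from system clock adjustments during a run.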

In Fig. 5, the image recognition and retrieval speed of the cloud server demonstrates notable swiftness, achieving an average time of 13.33 milliseconds per image. In contrast, the average processing time on the edge server amounts to 50.11 milliseconds. This empirical evidence supports the feasibility of implementing image recognition and retrieval on edge devices. However, it is crucial to acknowledge that the processing time is comparatively lengthier than that of the ECS system. To further validate the accuracy of the experiment, this study conducts multiple experiments on the cloud server, with the corresponding results presented in Table 5.

Table 5
Recognition and retrieval time of the ECS for different numbers of images

Number of images	Time (ms)
1,000	13.33
2,000	26.55
3,000	40.08
4,000	53.54
5,000	66.88

Figure 5.

Time comparison of image recognition and retrieval under different servers.

Table 5 illustrates the correlation between the number of images and the corresponding image recognition and retrieval time on cloud servers. The time exhibits a linear increase with the growth in the number of images. Notably, the testing of 4000 images on a cloud server requires more time than testing 1000 images on a single edge server. Maintaining a constant number of images for recognition and retrieval, the utilization of four edge servers yields faster speeds in recognition and retrieval compared to employing a single cloud server.
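The four-server observation follows from simple arithmetic on the reported per-image times (13.33 ms on the cloud server and 50.11 ms on one edge server). The sketch below works this out under two loudly stated assumptions that are not measurements from this study: images are split evenly across edge servers, and throughput scales ideally with the number of servers. It considers recognition time only and ignores transmission delay.

```python
import math

CLOUD_MS_PER_IMAGE = 13.33   # reported average on the cloud server
EDGE_MS_PER_IMAGE = 50.11    # reported average on a single edge server


def effective_edge_ms(per_image_ms: float, servers: int) -> float:
    """Effective per-image time with ideal parallel scaling (assumption)."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return per_image_ms / servers


def min_servers_to_beat(edge_ms: float, cloud_ms: float) -> int:
    """Smallest server count k such that edge_ms / k is strictly below cloud_ms."""
    return math.floor(edge_ms / cloud_ms) + 1


for k in range(1, 6):
    print(k, round(effective_edge_ms(EDGE_MS_PER_IMAGE, k), 2))

print(min_servers_to_beat(EDGE_MS_PER_IMAGE, CLOUD_MS_PER_IMAGE))  # 4
```

With three servers the effective time is about 16.70 ms per image, still slower than the cloud server, while four servers give about 12.53 ms, which is below 13.33 ms. Real deployments would see less than ideal scaling because of scheduling and load imbalance.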

The overall processing time for each image is influenced by both the recognition and retrieval time of the device and the duration of image transmission. Minimizing network transmission delays is therefore critical to enhancing the overall image processing speed. A comparative analysis is undertaken to assess network transmission delays across different servers, using a single edge device and a single cloud device. Specifically, the time required for image transmission from the terminal to the edge server over the local area network is measured as the interval between the moment the terminal initiates image transmission and the moment the edge server receives it. Likewise, the time necessary for image transmission from the terminal to the ECS over the wide area network is measured as the duration between image transmission initiation and its receipt by the ECS. Transmission delays are examined in both the cloud computing architecture and the EC architecture while transmitting varying numbers of images. The experimental results detailing these transmission delays are visually depicted in Fig. 6.

Figure 6.

Comparison of transmission delay of different servers.

Figure 6 presents a visual depiction comparing image transmission delays between the EC architecture and the cloud computing architecture. The results illustrate that the EC architecture exhibits lower image transmission delays than the cloud computing architecture. Additionally, as the number of processed images increases, the discernible difference between the two architectures becomes more pronounced. This phenomenon is attributed to the substantial distance between the ECS and terminal equipment, leading to elevated network transmission delays in the cloud computing architecture. In contrast, the proximity of edge devices to terminal devices aids in mitigating delays caused by data transmission in the channel. This observation supports the notion that a substantial data volume can induce network congestion in the cloud computing architecture, resulting in heightened latency. It underscores the advantage of leveraging edge devices to reduce transmission delays and enhance overall system performance. The cumulative processing time for each image encompasses image recognition, retrieval, and transmission time. Figure 7 is presented to offer a visual representation of this cumulative processing time for each image under both architectures.

Figure 7.

Comparison of average image processing time of different calculation methods.

The average processing time per image is a key metric for assessing the overall processing efficiency of an architecture. Figure 7 shows that, as the number of processed images increases, the average processing time per image remains relatively stable in the EC architecture, indicating that its efficiency is largely independent of the number of images processed. Conversely, in the cloud computing architecture, the average processing time per image gradually increases, so the processing speed is significantly influenced by the number of images. These findings were obtained with a single edge device. The average image processing time decreases as the number of edge devices increases, suggesting that employing multiple edge devices can further improve processing efficiency. Furthermore, the larger the gap in processing time between the cloud computing and EC architectures, the more evident the advantages of the EC architecture become.

In comparison with the Prairie Dog Optimization Algorithm proposed by Abualigah et al. [27], the optimized model in this study demonstrates significant advantages in processing speed and accuracy. Through its lightweight CNNs design, the model achieves more efficient image recognition in an EC environment while reducing reliance on cloud computing resources, which is particularly important in real-time image processing applications. Furthermore, when compared with the Lévy flight-based Reptile Search Algorithm by Ekinci et al. [28], the model exhibits better resource utilization efficiency and lower energy consumption. This is attributed to the optimized allocation of computational resources, which enables efficient operation in resource-constrained environments, a critical requirement for applications on edge devices.

The significance of this study lies in optimizing intelligent image recognition through lightweight CNNs and EC. The study addresses the bandwidth bottlenecks, network congestion, and processing delays associated with traditional cloud computing in image recognition, improving efficiency and reducing dependence on cloud servers. Potential application areas include real-time traffic target detection and medical image recognition. Future work may focus on further optimizing the model structure for higher accuracy, exploring additional applications in intelligent systems, and enhancing EC capabilities to support more complex tasks.

4.5 Discussion

The comparative analysis of image retrieval times shows that edge devices process images more slowly than cloud servers, resulting in longer image recognition and retrieval times on edge devices. However, in practical applications, a single cloud server suffices for model training, while multiple edge devices can execute image recognition and retrieval tasks concurrently. On average, a cloud device recognizes and retrieves each image less than four times as fast as an edge device (13.33 ms versus 50.11 ms per image, a ratio of about 3.76). Once four edge devices operate simultaneously, they can therefore outpace a single cloud server. Repeated experiments confirm that multiple edge devices within the EC architecture can match and then surpass the image recognition and retrieval speed of cloud computing, and the effect becomes more pronounced as the number of edge devices increases. This comparison with the cloud computing architecture verifies the effectiveness and feasibility of the EC architecture for image recognition and retrieval from the perspective of recognition and retrieval speed. The comparative experiment on transmission delay shows that, as the number of processed images increases, the average processing time per image remains stable in the EC architecture. In contrast, the average processing time per image in the cloud computing architecture lengthens, and the number of images significantly influences the processing rate. Consequently, as the number of images grows, the advantages of the EC framework become more apparent.
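The four-server threshold follows from simple arithmetic on the reported per-image times. The sketch below is an illustration under an explicit assumption that is not stated in the study: images are split evenly across identical edge servers and scaling is ideal, with no scheduling or coordination overhead. The function name is hypothetical; the values 50.11 ms and 13.33 ms are the reported averages for 1000 images.

```python
import math


def min_edge_servers_to_beat_cloud(edge_ms_per_image, cloud_ms_per_image):
    """Smallest n such that edge_ms_per_image / n < cloud_ms_per_image.

    Assumption: ideal linear scaling, so n edge servers working in
    parallel give an effective per-image time of edge_ms_per_image / n.
    """
    if edge_ms_per_image <= 0 or cloud_ms_per_image <= 0:
        raise ValueError("per-image times must be positive")
    # Strict inequality: n must exceed the ratio edge / cloud.
    return math.floor(edge_ms_per_image / cloud_ms_per_image) + 1


# Reported averages: 50.11 ms (edge server), 13.33 ms (cloud server).
servers_needed = min_edge_servers_to_beat_cloud(50.11, 13.33)  # 4
```

Under this idealized model, three edge servers give an effective time of about 16.70 ms per image, which is still slower than the cloud server, whereas four give about 12.53 ms per image, which is faster. Real deployments would add some load-balancing overhead, so the threshold should be read as a lower bound.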

5. Conclusion

With the continuous progress of EC and lightweight CNNs, the field of image recognition has undergone significant and sustained development. However, traditional intelligent image recognition methods are no longer sufficient to meet the growing demands of this field. Therefore, this study aims to optimize intelligent image recognition models using EC technology. To achieve this goal, it first introduces the theories and models related to lightweight CNNs and describes common lightweight CNNs architectures. Subsequently, it provides a detailed exposition of the concept of EC, highlighting its current applications and distinctive characteristics. Finally, it optimizes intelligent image recognition models using EC-related technologies and conducts experiments to validate the effectiveness and feasibility of the proposed model. The experimental results indicate that, when recognizing 1000 images, the average processing time per image on the ECS is 13.33 milliseconds, whereas on a single edge server it is 50.11 milliseconds. In practice, the EC architecture can improve performance by increasing the number of edge servers. A second round of comparative experiments was conducted to account for latency and processing time during image recognition. These experiments suggest that the average processing time per image remains relatively stable in the EC architecture as the number of processed images increases. Conversely, the average processing time per image in the cloud computing architecture gradually increases, indicating that the number of images significantly influences the processing speed of the cloud computing architecture. The significant difference in processing time between the cloud computing and EC architectures underscores the advantages of the EC architecture.

Nevertheless, certain limitations persist. Enhancing the network model improves image recognition and retrieval performance; however, a residual number of recognition errors may still be present. Consequently, ongoing optimization of the model structure is needed to raise accuracy. The application of intelligent image recognition based on the lightweight CNNs model shows broad potential in the EC environment, and future work will be dedicated to its continued development and application in sophisticated intelligent systems. Future studies will concentrate on further optimizing the structure of the lightweight CNNs model to enhance image recognition accuracy. Additionally, the application of this technology will be extended to broader scenarios, such as intelligent traffic monitoring and medical image analysis. Furthermore, experimental plans include strengthening EC capabilities to enable more effective operation of DL models in resource-constrained environments. Additional experiments and performance assessments will also be conducted to evaluate the effectiveness and reliability of the model comprehensively. These future directions aim to increase the study's practical value and technological sophistication, fostering innovation in intelligent image recognition.

Footnotes

Fundings

This work was supported by the 2023 Jiangxi Province University Humanities and Social Sciences Research Project Planning Project (Project Name: Interactive Experience Design Research Based on Augmented Reality (AR) Technology in Traditional Art Exhibition; Project Number: JC23108).

References

1. Zhang, et al. State-of-the-art in 360 video/image processing: Perception, assessment and compression. IEEE Journal of Selected Topics in Signal Processing. 2020; 14(1): 5-26.
2. Hong, Han, Yao, et al. SpectralFormer: Rethinking hyperspectral image classification with transformers. IEEE Transactions on Geoscience and Remote Sensing. 2021; 60(2): 1-15.
3. Touvron, Bojanowski, Caron, et al. Resmlp: Feedforward networks for image classification with data-efficient training. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022; 23(6): 19-23.
4. Zheng, Liu, Yin. Research on image classification method based on improved multi-scale relational network. PeerJ Computer Science. 2021; 7(1): 613.
5. Bazi, Bashmal, Rahhal MMA, et al. Vision transformers for remote sensing image classification. Remote Sensing. 2021; 13(3): 516.
6. Shen, Cui. Towards non-iid image classification: A dataset and baselines. Pattern Recognition. 2021; 110(17): 107383.
7. Wieczorek, Siłka, Woźniak, et al. Lightweight convolutional neural network model for human face detection in risk situations. IEEE Transactions on Industrial Informatics. 2021; 18(7): 4820-4829.
8. Yun, Jiang, Liu, et al. Real-time target detection method based on lightweight convolutional neural network. Frontiers in Bioengineering and Biotechnology. 2022; 10(2): 56-66.
9. Zhao, Gui, Xue, et al. A novel intrusion detection method based on lightweight neural network for internet of things. IEEE Internet of Things Journal. 2021; 9(12): 9960-9972.
10. Liu, Kong, Chen, et al. Multi-scale ship detection algorithm based on a lightweight neural network for spaceborne SAR images. Remote Sensing. 2022; 14(5): 1149.
11. Liu. TanhExp: A smooth activation function with high convergence speed for lightweight neural networks. IET Computer Vision. 2021; 15(2): 136-150.
12. Zulkifley, Abdani, Zulkifley. COVID-19 screening using a lightweight convolutional neural network with generative adversarial network data augmentation. Symmetry. 2020; 12(9): 1530.
13. Cao, Shih, Guo, et al. Lightweight convolutional neural networks for CSI feedback in massive MIMO. IEEE Communications Letters. 2021; 25(8): 2624-2628.
14. Khaki, Safaei, Pham, et al. Wheatnet: A lightweight convolutional neural network for high-throughput image-based wheat head detection and counting. Neurocomputing. 2022; 489(16): 78-89.
15. Liang, Jiang, et al. CEModule: A computation efficient module for lightweight convolutional neural networks. IEEE Transactions on Neural Networks and Learning Systems. 2021; 4(1): 9-12.
16. Wang, et al. Estimating crowd density with edge intelligence based on lightweight convolutional neural networks. Expert Systems with Applications. 2022; 20(6): 117823.
17. Wang, Han, Leung VCM, et al. Convergence of edge computing and deep learning: A comprehensive survey. IEEE Communications Surveys & Tutorials. 2020; 22(2): 869-904.
18. Siriwardhana, Porambage, Liyanage, et al. A survey on mobile augmented reality with 5G mobile edge computing: architectures, applications, and technical aspects. IEEE Communications Surveys & Tutorials. 2021; 23(2): 1160-1192.
19. Zhang, Shu, et al. Study of convolutional neural network-based semantic segmentation methods on edge intelligence devices for field agricultural robot navigation line extraction. Computers and Electronics in Agriculture. 2023; 209(11): 107811.
20. Ranaweera, Jurcut, Liyanage. Survey on multi-access edge computing security and privacy. IEEE Communications Surveys & Tutorials. 2021; 23(2): 1078-1124.
21. Lin, Zeadally, Chen, et al. A survey on computation offloading modeling for edge computing. Journal of Network and Computer Applications. 2020; 169(1): 102781.
22. Bai, Pan, Deng, et al. Latency minimization for intelligent reflecting surface aided mobile edge computing. IEEE Journal on Selected Areas in Communications. 2020; 38(11): 2666-2682.
23. Poongodi, Bourouis, Ahmed, et al. A novel secured multi-access edge computing based vanet with neuro fuzzy systems based blockchain framework. Computer Communications. 2022; 192: 48-56.
24. Pustokhina, Pustokhin, Gupta, et al. An effective training scheme for deep neural network in edge computing enabled Internet of medical things (IoMT) systems. IEEE Access. 2020; 8(3): 107112-107123.
25. Chen, Lou, et al. Intelligent edge computing based on machine learning for smart city. Future Generation Computer Systems. 2021; 115(1): 90-99.
26. Zhang, Zou, Wang, et al. Resource allocation and trust computing for blockchain-enabled edge computing system. Computers & Security. 2021; 105(1): 102249.
27. Abualigah, Oliva, Jia, et al. Improved prairie dog optimization algorithm by dwarf mongoose optimization algorithm for optimization problems. Multimedia Tools and Applications. 2023; 5(2): 1-41.
28. Ekinci, Izci, Abu Zitar, et al. Development of Lévy flight-based reptile search algorithm with local search ability for power systems engineering design problems. Neural Computing and Applications. 2022; 34(22): 20263-20283.
