Abstract
Edge computing applications are characterized by huge scale and sensitivity to quality of service. However, user access requests must cross a heterogeneous environment of edge networks, wide area networks and data centers, and the resulting “long tail delay” problem has seriously degraded the quality of experience of edge users. Therefore, a computation time reduction rule algorithm based on an application-driven measurement mechanism for Internet of Things (IoT) delay is proposed, which can be applied to multi-source heterogeneous information fusion, big data fusion and information fusion security. The system architecture features of edge computing applications are reviewed, and the causes and classifications of long tail delays are analyzed. The main theories and methods of network delay measurement are introduced, and the optimization techniques for long tail delay are summarized. Finally, ideas and challenges for an online optimization operating environment are presented. The research results show that the proposed GXDSGC algorithm is effective for application-driven measurement of IoT delay. Users’ access to online real-time big data needs to span complex heterogeneous network environments such as edge networks, wide area networks and data center networks. Because delays accumulate, a delay increase at any stage of online real-time big data processing will inevitably lead to end-to-end long tail delays. Therefore, it is necessary to design an integrated optimization mechanism to control end-to-end network delays for online real-time big data.
Introduction
The progress of Internet, mobile computing and IoT technology has formed a deeply integrated human-machine-thing ecological environment, which has led to a large number of business applications for edge users and a billion-level edge user market, including network search, online social networks, e-commerce, video surveillance and intelligent assistants [1]. Cisco expects global wireless and mobile smart device traffic to exceed 60% of total traffic by 2021 [2]. However, in data-intensive applications that perform online real-time calculations on large-scale data sets, cloud computing encounters a severe “long tail” delay problem: a certain proportion of response times are much larger than the average of all response times. Under these extreme conditions, the average response delay no longer reflects the network latency that users actually experience [3]. Google found that its 99.9th-percentile delay was several orders of magnitude larger than the median delay, indicating that at least one out of 1000 users experienced excessive delay, significantly affecting the user experience [4]. According to statistics, for every 0.5 s increase in Google’s network delay, its network traffic drops by 20%. If Amazon’s service delay increases by 0.1 s, its profit decreases by 1%. At the same time, increased response time affects the ranking of online web services in Google search [5]. To address these shortcomings of cloud computing, edge computing has attracted wide attention from academia and industry.
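The gap between the average and the tail can be illustrated with a minimal sketch. The latency values below are hypothetical and chosen only to show the effect; the nearest-rank percentile helper is an illustrative assumption, not a method taken from the cited measurements.

```python
import math
import statistics


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds:
# 997 fast requests and 3 requests stuck in the long tail.
latencies = [10.0] * 997 + [2000.0] * 3

mean_delay = statistics.mean(latencies)
median_delay = statistics.median(latencies)
p999_delay = percentile(latencies, 99.9)

print(f"mean={mean_delay:.2f} ms, median={median_delay:.1f} ms, "
      f"p99.9={p999_delay:.1f} ms")
```

In this toy data the mean stays close to the median, while the 99.9th percentile is two orders of magnitude larger, which is why tail percentiles rather than averages are used to describe long tail delay.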
Unlike the centralized resource deployment model of cloud computing, edge computing deploys equipment resources near the user’s edge and cooperates with the user to perform some or all of the tasks of data filtering and compression, calculation, storage, communication and management, thus improving real-time processing response speed, reducing network transmission bandwidth overhead and enhancing data privacy protection. If the number of similarity calculations between vertices in the graph can be reduced, the execution efficiency of the algorithm increases greatly; the computation time reduction rule algorithm and the GXDSGC algorithm address this problem and improve the efficiency of edge computing. In summary, the application-driven measurement and optimization technology of IoT delay in the edge computing environment is studied by using the computation time reduction rule algorithm and the GXDSGC algorithm.
The research is divided into three parts. The first part introduces edge computing, which provides a theoretical basis for discussing the application-driven measurement mechanism of IoT delay in edge computing. The second part proposes the computation time reduction rule algorithm and analyzes the application-driven measurement mechanism of IoT delay. In the third part, the GXDSGC algorithm is used to optimize and analyze the delay technology. Finally, the effectiveness of the algorithm is verified by experiments.
Related work
Edge computing refers to an open platform that integrates network, computing, storage and application core capabilities on the side close to the object or data source in order to provide the nearest service. Long C et al. pointed out that mobile communication has developed from the first generation to the fourth generation, basically meeting the needs of daily life and work. However, with the rapid development of the Internet of Things, a large number of terminals will be connected to the mobile network, and the total amount of mobile data will greatly increase in the future. Providing users with highly reliable, low-latency applications is a basic requirement of the mobile network [6]. Fan Q et al. conducted research based on Amdahl’s law from the computer field and found that a delay increase at any position of edge computing inevitably increases the end-to-end delay. They concluded that hundreds of computing and edge computing techniques need to be combined to control and optimize the entire edge computing process [7]. Yousefpour A et al. used edge computing to deploy device resources at the user’s edge and cooperate with the user to perform some or all of the tasks such as data filtering and compression, computing, storage, communication and management. Their research shows that this approach improves real-time processing response speed, reduces network transmission bandwidth overhead and enhances data privacy protection [8]. Barcelo M et al. studied changes in the operating environment requirements of edge computing applications and found that the required scale of computing can be met by flexibly adding or adjusting node resources and their configurations [9]. Darei Otis K proposed an edge computing framework and evaluated the proposed scheme under different parameter settings, studying and implementing the collaborative processing of delay-sensitive multimedia IoT tasks on resource-rich mobile devices [10]. 
Aazam M designed an application-aware workload distribution scheme for edge-based IoT that minimizes the response time of IoT application requests by determining the target cloudlet for each type of IoT user request and the amount of computing resources allocated to each IoT application. The study found that the solution dynamically adjusts the computing resources of different applications in each cloudlet according to the workload, reducing the computational delay of all requests in the cloudlet [11]. Qin L proposed a delay-minimizing collaboration and offloading strategy and found that it can be used to support fog devices and reduce the service delay of IoT applications [12]. Zhu L proposed improving the TLP congestion control mechanism and the switch (router) priority scheduling mechanism, and found that sending as much application data as possible within the deadline requires significant changes to the data center hardware and software environment [13]. To avoid interference in resources such as cache, memory and hardware prefetching caused by resource reuse, Lyu X recommended scheduling low-interference applications to the same server and using priorities to distinguish delay-sensitive tasks from other tasks. It was found that the proportion of services competing for resources has risen sharply because multi-copy technology is widely used to improve service availability, which reduces the effectiveness of scheduling algorithms [14]. Hwang G summarized the results of previous studies and concluded that existing research work usually uses heuristics to optimize the delay of specific applications [15]. The work of scholars in China and other countries shows that long tail delay optimization for big data has aroused great interest in academia and industry and that delay optimization technology has made great progress, but the economic and computational costs of delay optimization are still very high [16, 17].
Edge computing and IoT delay application driven measurement and optimization technology
Edge computing
Edge computing is not a new term. As a provider of content distribution network (CDN) and cloud services, Akamai worked with IBM on edge computing in 2003. As one of the largest distributed computing service providers in the world, Akamai was responsible for 15–30% of global network traffic. One of its internal research projects set out the purpose and problems of “edge computing”, and Akamai and IBM provided edge-based services on WebSphere. Edge computing refers to an open platform that integrates network, computing, storage and application core capabilities on the side close to the object or data source in order to provide the nearest service. Its applications are launched on the edge side to produce faster network service responses and meet the basic needs of the industry in real-time business, application intelligence, security and privacy protection. Edge computing sits between the physical entity and the industrial connection, or at the top of the physical entity. Cloud computing can still access the historical data of edge computing.
Three typical features of edge computing applications are large scale, relevance and suddenness. First, it is difficult for service providers to obtain the network delay distribution of each location in the online real-time big data computing structure, to quickly find the long tail delay area, and to optimize online the key locations that cause the long tail delay. Second, in the message routing and response push phases, wide area network data routing is affected by high-delay routing paths and packet loss during data transmission, resulting in serious long tail network transmission delay for online real-time big data requests. The tree or directed acyclic graph structure of online real-time big data computing may cause communication hot spots and link congestion near the convergence node, exacerbating the long tail problem of network transmission delay. In the edge computing and online processing stages, online real-time big data processing tasks need multi-layer distributed parallel computing, and server performance fluctuations and data center network traffic scheduling enlarge the range affected by the long tail delay. For the IoT, the breakthrough in edge computing means that many control tasks will be carried out by local devices without being handed over to the cloud, and the processing will be completed at the local edge computing layer. This greatly improves processing efficiency and reduces the load on the cloud. Being closer to the user, it can also respond to the user faster and satisfy demand at the edge [18–20].
Computation time reduction rule algorithm based on the IoT delay application-driven measurement mechanism
Network delay measurement has received extensive attention from researchers. Existing research can be divided by scale into measurement systems for data centers, content distribution networks, edge networks and others. IoT platform measurement collects the usage of the underlying facilities, providing a decision-making basis for administrators and a performance monitoring channel for users. Amazon CloudWatch monitors AWS instances and the processes running on them. It can read user-defined monitoring metrics and allows users to set alerts. EC2 instances in Amazon embed basic CloudWatch monitoring functions, including CPU utilization, data transfer and disk usage. In the existing PSCAN algorithm, the similarity between the vertices associated with every edge in the graph is calculated. This similarity calculation is a relatively time-consuming operation in the whole algorithm, so reducing the number of similarity calculations greatly increases the efficiency of the algorithm. By analyzing the definition of similarity between vertices, the following reduction rule is obtained: for a given similarity threshold ε (0 < ε ≤ 1) and two vertices v and w, if |H[v]| < |H[w]| · ε² or |H[w]| < |H[v]| · ε², then v and w are not similar. To see this, assume that δ(v, w) is the structural similarity used by SCAN, that is, the number of shared structural neighbors divided by the geometric mean of the two neighborhood sizes. Because |H[v] ∩ H[w]| ≤ |H[v]| and |H[v]| < |H[w]| · ε², therefore:
$$\delta(v, w) = \frac{|H[v] \cap H[w]|}{\sqrt{|H[v]| \cdot |H[w]|}} \le \frac{|H[v]|}{\sqrt{|H[v]| \cdot |H[w]|}} = \sqrt{\frac{|H[v]|}{|H[w]|}} < \varepsilon$$
That is, δ(v, w) < ε, so the vertices v and w are not similar. For the same reason, when |H[w]| < |H[v]| · ε²:
$$\delta(v, w) \le \frac{|H[w]|}{\sqrt{|H[v]| \cdot |H[w]|}} = \sqrt{\frac{|H[w]|}{|H[v]|}} < \varepsilon$$
So when |H[v]| < |H[w]| · ε² or |H[w]| < |H[v]| · ε², the vertices v and w are not similar (Lemma 1). Using this reduction rule, the following algorithm for judging the dissimilarity of two vertices is obtained: the disSimWithPrune algorithm. Its inputs are the structured neighbors H[v] and H[w] of the vertices v and w and a similarity threshold ε (0 < ε ≤ 1). Its output is true if the vertices are found to be dissimilar and false otherwise. The algorithm uses Lemma 1 to judge that two vertices are not similar. If the two vertices v and w satisfy the condition in Lemma 1, they must be dissimilar, and the output is true. Otherwise, the rule cannot determine whether the two vertices are similar, and the output is false.
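The dissimilarity test described above can be sketched in a few lines. This is a minimal single-machine illustration, not the authors' implementation: the function names `dis_sim_with_prune` and `structural_similarity` are hypothetical, neighborhoods are represented as Python sets, and the exact similarity is assumed to be the SCAN-style measure |H[v] ∩ H[w]| / sqrt(|H[v]| · |H[w]|), which is consistent with the ε² factor in the rule.

```python
import math


def dis_sim_with_prune(h_v, h_w, eps):
    """Return True when the size-based reduction rule proves that v and w
    are dissimilar. False means "undecided": the exact similarity still
    has to be computed."""
    if not 0 < eps <= 1:
        raise ValueError("eps must satisfy 0 < eps <= 1")
    size_v, size_w = len(h_v), len(h_w)
    return size_v < size_w * eps ** 2 or size_w < size_v * eps ** 2


def structural_similarity(h_v, h_w):
    """Assumed SCAN-style structural similarity of two neighborhoods."""
    return len(h_v & h_w) / math.sqrt(len(h_v) * len(h_w))


# Hypothetical neighborhoods (each vertex set already contains the vertex itself).
small = {1, 2}
large = set(range(1, 11))       # ten vertices
pruned = dis_sim_with_prune(small, large, 0.7)            # sizes 2 vs 10
undecided = dis_sim_with_prune({1, 2, 3}, {4, 5, 6}, 0.7)  # equal sizes

print(pruned, structural_similarity(small, large))
print(undecided, structural_similarity({1, 2, 3}, {4, 5, 6}))
```

The second pair shows why a false result only means "undecided": the two neighborhoods have equal size, so the rule cannot prune them, yet they share no vertex and their exact similarity is zero.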
Delay optimization in edge computing has huge potential economic benefits, so researchers have proposed a large number of delay optimization techniques, which can be divided by optimization location into data center network optimization, in-memory computing, scheduling optimization, routing optimization, edge caching and others. Based on the above algorithm, the distributed structural graph clustering algorithm based on GraphX (GXDSGC) is given below. The input is an undirected, unweighted graph G(V, E), a similarity threshold ε (0 < ε ≤ 1) and the minimum number of vertices in a cluster, minNum. The output is the collection of all clusters in G. The algorithm first uses the two GraphX functions aggregateMessages and outerJoinVertices to find the neighbor vertices of each vertex in the graph and stores them as the attribute value of that vertex. Then, for each pair of adjacent vertices v and w, it determines whether they are similar: it first applies the reduction rule to judge whether v and w are dissimilar; if so, the similarity flag of the pair is set to false; otherwise, Algorithm 3 is used to compute the similarity between the two vertices. Next, all edges between dissimilar vertex pairs are removed from the graph, and the connected component algorithm is used to find all connected components. Finally, connected components with fewer vertices than the specified threshold minNum are deleted, and the remaining connected components are the required clusters.
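The steps of the clustering procedure can be sketched on a single machine as follows. This is an illustrative sketch only: it uses plain Python sets and a depth-first search in place of GraphX and its connected component algorithm, the function name `structural_clusters` is hypothetical, neighborhoods are assumed to include the vertex itself, and the exact similarity is assumed to be the SCAN-style measure because Algorithm 3 is not reproduced in the text.

```python
import math
from collections import defaultdict


def structural_clusters(edges, eps, min_num):
    """Cluster an undirected, unweighted graph given as a list of edges."""
    # Step 1: collect the neighborhood of every vertex (vertex included).
    neighbors = defaultdict(set)
    for v, w in edges:
        neighbors[v].update((v, w))
        neighbors[w].update((v, w))

    # Step 2: keep only the edges whose endpoints are similar.
    kept = defaultdict(set)
    for v, w in edges:
        h_v, h_w = neighbors[v], neighbors[w]
        # Reduction rule: skip the exact computation when sizes differ too much.
        if len(h_v) < len(h_w) * eps ** 2 or len(h_w) < len(h_v) * eps ** 2:
            continue
        similarity = len(h_v & h_w) / math.sqrt(len(h_v) * len(h_w))
        if similarity >= eps:
            kept[v].add(w)
            kept[w].add(v)

    # Step 3: connected components of the remaining graph (iterative DFS).
    seen, clusters = set(), []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in kept.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        # Step 4: drop components smaller than min_num.
        if len(component) >= min_num:
            clusters.append(component)
    return clusters


# Hypothetical toy graph: two triangles joined by the bridge edge (3, 4).
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
toy_clusters = structural_clusters(toy_edges, eps=0.7, min_num=3)
print(sorted(sorted(c) for c in toy_clusters))
```

On the toy graph, the bridge endpoints share only two of their four neighbors, so their similarity falls below 0.7, the bridge is removed, and each triangle forms its own cluster.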
Experimental design and analysis
Experimental environment and conditions
The following experiments verify the constructed algorithm. The software and hardware environment used in this experiment is as follows. The cluster consists of five servers, each configured with a RedHat 64-bit operating system, a 16-core CPU clocked at 1.9 GHz, 16 GB RAM and a 2 TB hard drive, running Hadoop 2.6.0, Spark 1.6.0, Java 1.8.0 and Scala 2.10.4. The development environment is configured as follows: the operating system is Windows 7 32-bit Ultimate, with a CPU clocked at 3.10 GHz, 4 GB memory and a 500 GB hard disk; the development tool is IntelliJ IDEA Community Edition 15.0.2, with Java 1.8.0 and Scala 2.10.4. The data sets used in this experiment include three real data sets (DBLP, YouTube and LiveJournal) and artificial data sets. Among them, DBLP is an author collaboration network, YouTube is a user-to-user link network, and LiveJournal is an online social network. The artificial data sets are generated using algorithms from previous literature. The data set statistics are shown in Tables 1 and 2.
Table 1. Real datasets.
Table 2. Synthetic datasets.
The proposed GXDSGC algorithm was compared with the PSCAN algorithm proposed by Zhao et al. in terms of running time, reduction strategy and scalability, and the GXDSGC algorithm is analyzed from these three aspects. First, the running time is compared. In this experiment, the GXDSGC algorithm and the PSCAN algorithm were each run on the different data sets with similarity thresholds of 0.6, 0.7, 0.8 and 0.9. The running time of the algorithms on the data sets in Table 1 is shown in Fig. 1. The experimental results in Fig. 1 show that the running time of both algorithms gradually decreases as the similarity threshold increases, and that the GXDSGC algorithm runs 30 times faster than the PSCAN algorithm on the 4 data sets, indicating that it is far more efficient than the PSCAN algorithm.

Running time.
There are two main reasons why the GXDSGC algorithm is faster. The first is that the GXDSGC algorithm is built on GraphX: when the cluster has enough memory, all intermediate data is kept in memory during processing, which avoids a large amount of disk I/O and saves a large amount of time. PSCAN, by contrast, runs in a Hadoop cluster and needs to go through multiple Mapper and Reducer phases. Even if the cluster has enough memory, intermediate results must be written to disk at each stage and read back from disk in the next stage. This greatly increases the disk I/O overhead and therefore the running time of the program. The second reason is that the GXDSGC algorithm applies two reduction rules when calculating the similarity between adjacent vertices, so it can often judge whether two vertices are dissimilar without computing their exact similarity, further reducing the running time of the program.
Next, the reduction strategy is compared. To compare the effectiveness of the two reduction rules in the algorithm, the experiment recorded only the time consumed by similarity calculation between vertices in the graph, ran the algorithm five times for each data set and averaged the time consumed, giving the results shown in Fig. 2. As can be seen from Fig. 2, for each data set, the time taken to calculate the similarity with the reduction rule is significantly shorter than without it, indicating that the reduction rule does reduce the time taken to calculate the similarity. Fig. 2 also shows that, for each data set, the running time of the algorithm using reduction decreases as the similarity threshold increases, because a larger threshold allows more similarity calculations to be resolved by the reduction rules; the larger the similarity threshold, the more effective the reduction rules are. To further compare the effectiveness of the reduction rules, the following experiment was conducted using the synthetic data sets in Table 2. The experimental results are shown in Fig. 3.
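The relationship between the threshold and the number of similarity calculations avoided can be checked with a small sketch. The graph below is a hypothetical toy example (a short path of hubs with different numbers of leaves), not one of the data sets in Tables 1 and 2, and the helper names `caterpillar` and `count_pruned` are illustrative assumptions; neighborhoods are assumed to include the vertex itself.

```python
from collections import defaultdict


def caterpillar(leaf_counts):
    """Hypothetical test graph: hubs 0..k-1 form a path, and hub i
    receives leaf_counts[i] extra leaf vertices."""
    edges = [(i, i + 1) for i in range(len(leaf_counts) - 1)]
    next_id = len(leaf_counts)
    for hub, count in enumerate(leaf_counts):
        for _ in range(count):
            edges.append((hub, next_id))
            next_id += 1
    return edges


def count_pruned(edges, eps):
    """Count the edges whose similarity calculation is skipped by the
    size-based reduction rule for the given threshold."""
    neighbors = defaultdict(set)
    for v, w in edges:
        neighbors[v].update((v, w))
        neighbors[w].update((v, w))
    pruned = 0
    for v, w in edges:
        size_v, size_w = len(neighbors[v]), len(neighbors[w])
        if size_v < size_w * eps ** 2 or size_w < size_v * eps ** 2:
            pruned += 1
    return pruned


graph_edges = caterpillar([1, 3, 8])
pruned_counts = [count_pruned(graph_edges, eps) for eps in (0.6, 0.7, 0.8, 0.9)]
print(len(graph_edges), pruned_counts)
```

Because the condition |H[v]| < |H[w]| · ε² becomes easier to satisfy as ε grows, the number of pruned edges can never decrease when the threshold is raised, which matches the trend described for Fig. 2.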

Pruning strategy.

Pruning strategy on synthetic datasets.

Functional block composition of the experimental system.
As can be seen from the experimental results, the GXDSGC algorithm has good scalability, and the acceleration performance of the algorithm improves as the size of the data set increases, because the communication overhead between the machines in the cluster accounts for a smaller percentage of the entire runtime as the graph grows. At present, many IoT motors are equipped with special signal output interfaces whose signals can be sent directly to the data feedback interface of the motion controller. The feedback signal path is equipped with filters, AD/DA conversion circuits and photoelectric circuits, which greatly improve the stability of signal transmission. These algorithms only need to be applied when reading data to ensure the accuracy of data acquisition. The processing flow is shown in the following figure.
The simulated interference signal enters the system after the first PID regulator. To ensure that the system has good dynamic response performance, achieves good noise reduction and filtering, and keeps the error before and after signal adjustment under control without affecting the normal operation of the downstream system, all the control parameters in this part of the control loop must be adjusted. The simulation test result and its waveform are shown in the figure below.
As can be clearly seen from Fig. 5, the selected control parameters are not appropriate, resulting in a great difference between the filtered signal and the original input signal. Burrs appear on the waveform curve, which is not smooth, so a large number of interference signals enter the system in the form of carrier waves. At the same time, the error signal waveform shows that the error after PID and filtering processing ranges between 0.02 and 0.078, which is large. Therefore, it is necessary to change the above control parameters and their associated variable values. After repeated adjustment to the second set of control parameters, the filtered waveform is very smooth. It is not difficult to conclude from Fig. 6 that the control parameters selected for each functional module are interrelated. If the stability of the system cannot be guaranteed at the initial stage of design, then under the disturbance of external interference signals the poor anti-interference ability of the system will be clearly exposed, with predictable consequences.

The first set of model control parameter simulation waveforms.

Schematic diagram of de-dispersion.
Given the diversity of edge computing applications, supporting the normal operation of various edge computing applications will be the future development trend for improving the long tail delay optimization ability of online real-time big data. By constructing an online optimization operating environment for online real-time big data, the end-to-end information distribution closed loop composed of the edge processing, message routing, online processing and response push phases is integrated with measurement and routing optimization to ensure that the user’s access time is bounded and controllable. Therefore, the application-driven measurement and optimization technology of IoT delay in the edge computing environment is studied by using the computation time reduction rule algorithm and the GXDSGC algorithm. The experimental results show that the proposed algorithm is effective. It is concluded that measuring the long tail delay requires collecting the delay distribution of the WAN and the data center network with as small an allowable error as possible. At the same time, the measurement process is not robust because of machine abnormality, network congestion, network shielding and other causes. Therefore, the measurement process should tolerate measurement noise and repair missing measurement results. However, the study still has many deficiencies. Because the WAN is not under the control of the online optimization environment, the next step is to study whether user requests can be forwarded around routing areas with excessively high delay.
Acknowledgments
This work is supported by the Innovation Team Funded Project of Dalian University of Foreign Languages (Grant No. 2017CXTD01).
