Abstract
The growing significance of big data complicates the solution of optimization problems. In many scenarios, several contradictory goals must be balanced and various criteria satisfied, which has made multi-objective optimization a vital and prominent research topic. This paper presents a theoretical analysis and comparative study of the top ten optimization algorithms with respect to DMS, and explains the performance of optimization algorithms in big data streaming. The top ten algorithms are selected on the basis of recency and popularity. The performance analysis covers efficiency, reliability, and quality of solution, and examines the superiority of the DMS algorithm over the other top ten algorithms. The analysis shows that DMS is more efficient: because it combines the benefits of both the DA and MS algorithms, it needs less computational effort to generate a better solution and takes less time to process a task. DMS also needs fewer iterations during optimization and helps keep the optimization process from stalling in a local optimum. DMS is more reliable because it can sustain a specified level of performance and uses heuristic information to attain high reliability. Moreover, DMS produced high computational accuracy, which reflects its solution quality. Overall, DMS attained improved outcomes in terms of efficiency, reliability, and solution quality compared with the other top ten optimization algorithms.
Introduction
Big data is currently characterized by three features: velocity, volume, and variety. When the velocity, volume, and variety of data grow to the point where existing models and techniques can no longer manage data processing and storage, the data is described as big data [1]. In big data research, analytics is defined as a procedure for examining and understanding the features of large databases by mining useful geometric and numerical structures. These three features increase the complexity of the data and prevent existing models and techniques from operating as expected within the available time [2]. The classification of big data is emerging as an important process in several domains, such as marketing, biomedicine, and social media. Recent advances in data gathering across domains have led to an irreversible growth in data that is hard to control. The complexity, volume, and diversity of big data can hamper assessment and knowledge mining, so benchmark data mining techniques must be redeveloped or adapted to handle the data [3]. Classical preprocessing models help to improve the data, but for big data they are resource-demanding and time-consuming, which sometimes makes them impracticable. Without an effective and reasonable preprocessing model, problems in the data can affect the models that are mined. Sophisticated modelling and analytics of big data are therefore crucial for determining the underlying model from the retrieved data and attaining Smart Data [4, 5].
Several techniques are popular for handling complicated optimization problems that contain a large number of contradictory objective functions. Because big data is an emerging domain, metaheuristics should be capable of addressing dynamic problems, which change over time as various data streams are processed and assessed. This requires a software infrastructure for solving dynamic multi-objective optimization (MOP) problems. Most research on MOP considers static problems, which do not change during the optimization process. However, in some real-time cases, the search space or fitness function can change over time, which leads to a dynamic MOP that may need dynamic techniques to address it [6]. Optimization is a process that makes the best use of the available resources. For example, an application can be optimized to take advantage of the large memory space available to the processor of a particular computer. Optimization should be performed to obtain the optimum method for a group of chosen problems with respect to aspects such as productivity, utilization, efficiency, longevity, reliability, and strength [7]. Several big data problems can be modelled as convex optimization problems, and some researchers are trying to devise effective optimization techniques for tackling the challenges of big data optimization, such as large numbers of decision variables and the need for quick convergence with real-time processing [8].
Optimization techniques are split into two classes: stochastic and deterministic. Although deterministic techniques have a high rate of convergence, they fall easily into local optima because they rely on gradient-based optimization. Big data optimization problems are therefore best solved with stochastic techniques that rely on evolutionary methods, which can explore huge search spaces more effectively and lead to potential solutions [9]. Well-known examples of evolutionary optimization techniques include the Imperialist Competitive Algorithm (ICA) [10], Genetic Algorithm (GA) [11], Differential Evolution (DE) [12], Particle Swarm Optimization (PSO) [13], and Firefly Algorithm (FA) [14]. However, most of these techniques suffer from the curse of dimensionality: they take more time to process and sometimes fail to discover improved solutions for large-scale problems. Big data optimization problems include a large number of variables, and any technique that addresses them should have a high convergence rate [9]. Today, a large number of machine learning applications are used to evaluate data and mine precise, pertinent, and useful information to support effective decision-making and knowledge discovery. Deep learning is among the fastest growing machine learning domains and shows a remarkable ability to learn several layers of representation from raw data without experts having to design feature extractors. Recently, the combination of mathematical programming and machine learning techniques has drawn considerable attention to data-driven optimization. In a data-driven optimization model, the uncertainty model is built on the basis of data, which allows uncertain data to be used in optimization techniques [15].
The purpose of this paper is to provide a theoretical analysis and comparative study of the top ten optimization algorithms with DMS. The top ten optimization algorithms are selected considering recency and popularity. Performance is examined in terms of efficiency, reliability, quality of solution, and big data, together with the superiority of the DMS algorithm over the other top ten algorithms. The efficiency analysis uses the number of fundamental evaluations, running time, and memory usage as evaluation aspects. The reliability analysis uses success rate, number of constraint violations, and percentage of global solutions found. The quality of solution analysis uses fixed-cost solution result, fixed-target solve time, and computational accuracy.
Top 10 algorithms of optimization based on recency and popularity
The top 10 optimization algorithms based on recency and popularity are briefly examined below.
Parameter optimization algorithm
The parameter optimization algorithm [16] is a procedure for finding the optimum configuration parameters in a solution space. To enhance search efficiency, it uses a GA integrated with a parameter prediction model. The GA is an optimum-solution search technique that simulates biological evolution. Its internal mechanism allows an approximately best solution to be discovered while searching only a small part of the solution space, and it helps to prevent the search from falling into a local optimum. Each configuration attribute to be optimized is treated as a chromosome gene, and each individual in the GA represents a vector of the parameters to be optimized. The set of default configuration attributes and the task running time are the inputs of the GA, and the performance model serves as the fitness function for judging the pros and cons of each parameter configuration. Mutation and crossover are executed on the configuration attributes to generate new parameter configurations.
Whale optimization algorithm (WOA)
WOA [17] is inspired by the hunting behaviour of humpback whales. Humpback whales have a special hunting technique termed bubble-net feeding, in which they create bubbles along a circular path to hunt small fish close to the surface. This hunting method contains two strategies, namely double loops and upward spirals. In upward spirals, the whales dive down and then start creating bubbles in a spiral shape around the prey while swimming. The bubble-net feeding behaviour is modelled mathematically to perform optimization. In WOA, the positions of the whales are randomly initialized in the search space. The best position found so far is then treated as the target prey, and the other solutions update their positions towards it. Any position in the search space around the best solution can thus be reached when updating the present whale position, which simulates encircling the prey. The same notion extends to an n-dimensional search space. The bubble-net feeding behaviour is modelled in two stages, namely the exploitation and exploration phases.
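The encircling and spiral moves described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from [17]: the coefficient names `A`, `C`, `a`, and `b` follow the commonly cited formulation of WOA, the equal-probability choice between the two moves and the `sphere` test function are assumptions made for the example, and the exploration phase is omitted for brevity.

```python
import math
import random


def woa_step(whale, best, a, rng, b=1.0):
    """Return an updated position for one whale (exploitation phase only)."""
    if rng.random() < 0.5:
        # Shrinking encirclement around the best position found so far.
        new_pos = []
        for x, x_best in zip(whale, best):
            A = 2 * a * rng.random() - a   # step coefficient in [-a, a]
            C = 2 * rng.random()           # random weight on the best position
            D = abs(C * x_best - x)
            new_pos.append(x_best - A * D)
        return new_pos
    # Spiral move around the best position (upward-spiral strategy).
    l = rng.uniform(-1.0, 1.0)
    return [abs(x_best - x) * math.exp(b * l) * math.cos(2 * math.pi * l) + x_best
            for x, x_best in zip(whale, best)]


def sphere(x):
    """Hypothetical test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def woa_minimize(f, dim=2, n_whales=10, iters=50, seed=0):
    rng = random.Random(seed)
    whales = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=f)[:]
    for t in range(iters):
        a = 2.0 * (1 - t / iters)          # decreases linearly from 2 towards 0
        for i in range(n_whales):
            whales[i] = woa_step(whales[i], best, a, rng)
            if f(whales[i]) < f(best):     # keep the best position seen so far
                best = whales[i][:]
    return best
```

Calling `woa_minimize(sphere)` returns the best position found; because the best position is replaced only when a whale improves on it, the result is never worse than the best initial whale.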
Ant colony optimization (ACO) algorithm
ACO [18] is inspired by the foraging behaviour of ants. In ACO, a number of artificial ants construct solutions to an optimization problem and exchange information about solution quality through a communication scheme. ACO must perform many iterations to obtain the optimum solution, and at the end of each iteration it updates the pheromone matrix based on the best path found. The updated pheromone matrix is the basis of the ant colony's traversal in the subsequent iteration. After each iteration, the pheromone matrix is updated and transmitted to every node in the cluster as a broadcast variable for use in the next iteration, so that each node in the cluster works from consistent information. In ACO, each ant builds its own feasible solution completely independently of the others.
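The end-of-iteration pheromone update can be illustrated as follows. This is a hedged sketch rather than the update rule of [18]: the evaporation rate `rho`, the deposit constant `q`, the symmetric matrix, and the closed-tour assumption are all choices made for the example.

```python
def update_pheromone(pheromone, best_path, best_cost, rho=0.1, q=1.0):
    """Evaporate all trails, then reinforce the edges of the best path.

    Returns a new matrix and leaves the input matrix unchanged.
    """
    n = len(pheromone)
    updated = [[(1.0 - rho) * pheromone[i][j] for j in range(n)] for i in range(n)]
    deposit = q / best_cost  # shorter (cheaper) paths deposit more pheromone
    # Pair each node with its successor; the last node links back to the first.
    for u, v in zip(best_path, best_path[1:] + best_path[:1]):
        updated[u][v] += deposit
        updated[v][u] += deposit  # symmetric problem assumed
    return updated
```

In a distributed setting such as the one described above, the returned matrix is what would be broadcast to the cluster nodes before the next iteration.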
Artificial bee colony (ABC) algorithm
ABC [19] is motivated by the intelligent behaviour of honey bees, whose goal is to find food sources with a large quantity of nectar. The technique uses food sources, employed bees, onlooker bees, and scout bees. The swarm explores the search space, and the food sources represent solutions. Each bee tries to reach a better food source. The number of employed bees equals the number of food sources, and each food source is linked with a single employed bee. A food source is characterized by its profitability, which depends on its richness, its proximity, and how easily it can be harvested. The employed bees go to their food sources and come back to the hive. The onlookers observe the dances of the employed bees and select food sources based on those dances. A scout bee is an employed bee whose food source has been abandoned.
Chaotic biogeography based optimization (CBBO)
Chaos theory [20] is the study of chaotic dynamical systems and is epitomized by the butterfly effect. As nonlinear dynamical systems, chaotic systems are extremely sensitive to their initial conditions, and a small change in the initial conditions can result in significant changes in the final outcome. CBBO is generated by applying chaotic systems to BBO in place of randomly initialized values. BBO is a population-based optimization technique motivated by the development and equilibrium of predators and prey in various ecosystems; in CBBO, chaotic maps replace the random values to give the heuristic technique chaotic characteristics. In BBO, the most important random values are those used to select a habitat from which new habitants emigrate during migration. Chaotic maps are used here, and they supply a value in [0, 1] whenever a random value is required.
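As an illustration of how a chaotic map can stand in for a random number generator, the sketch below uses the logistic map. The choice of this particular map, the parameter `r = 4`, and the seed value `x0 = 0.7` are assumptions for the example; the source does not state which chaotic maps [20] employs.

```python
from itertools import islice


def logistic_map(x0=0.7, r=4.0):
    """Yield an endless chaotic sequence in [0, 1] from the logistic map."""
    x = x0
    while True:
        x = r * x * (1.0 - x)  # x_{k+1} = r * x_k * (1 - x_k)
        yield x


def chaotic_values(n, x0=0.7):
    """Return the first n values of the chaotic sequence started at x0."""
    return list(islice(logistic_map(x0), n))
```

Wherever the migration step would draw a uniform random number, the next value of the sequence can be used instead, for example to decide which habitat emigrates.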
Owl search algorithm (OSA)
OSA [21] is motivated by the auditory characteristics of owls during prey hunting. Owls can locate prey with a distinctive auditory mechanism in which a sound reaches one ear before the other. The owl's brain produces an auditory map of the prey's sound, which guides the owl to fly towards the prey in the dark. This nature-based method is adapted to solve global optimization problems, in which a set of owls tries to find the prey, that is, the best solution. In a d-dimensional space, each owl is represented by a randomly generated position, and in each iteration this position is updated on the basis of the prey's movement.
Particle swarm optimization (PSO)
PSO [22] is inspired by a swarm of particles that move in steps through a region. At each step, the fitness of every particle is computed, and based on this the velocity of every particle is updated. This process repeats until the best solution is found. PSO is easy to implement and is used in a number of applications. The particles fly through the search space with velocities that are dynamically adjusted according to their historical behaviour. Thus, the particles tend to fly towards better search areas over the course of the search.
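The velocity and position update at the heart of PSO can be sketched as below. This follows the widely used inertia-weight formulation and is not taken from [22]; the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the search bounds, and the `sphere` objective are assumptions for the example.

```python
import random


def sphere(x):
    """Hypothetical test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso_minimize(f, dim=2, n_particles=15, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]        # best position seen by each particle
    gbest = min(pbest, key=f)[:]       # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # New velocity: inertia + pull to own best + pull to swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pbest[i]) < f(gbest):
                    gbest = pbest[i][:]
    return gbest
```

The two "pull" terms are what the text calls the historical behaviour of the particles: each particle remembers its own best position and the best position of the swarm.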
Genetic algorithm (GA)
GA [23] is a popular optimization technique that begins with a group of random solutions and searches for improved solutions over generations. New solutions are generated with mutation and crossover operators. When the GA is run for an adequate time, it is capable of generating improved solutions. For each pair of individuals, a crossover point is selected at random, and offspring are produced by combining the genes of the parent chromosomes at the cut point. Finally, the mutation operator randomly chooses one or more genes with a small probability and alters their values.
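The crossover and mutation operators described above can be sketched on bit-string chromosomes. This is an illustrative example, not the configuration used in [23]: the binary encoding, truncation selection, two-individual elitism, and the mutation rate are assumptions.

```python
import random


def crossover(parent_a, parent_b, rng):
    """One-point crossover: swap the tails of two parents at a random cut."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate(chromosome, rng, rate=0.05):
    """Flip each bit independently with a small probability."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def ga_maximize(fitness, n_genes=20, pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]     # truncation selection
        children = parents[:2]             # carry over the two best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            for child in crossover(a, b, rng):
                children.append(mutate(child, rng))
        pop = children[:pop_size]
    return max(pop, key=fitness)
```

For instance, `ga_maximize(sum)` searches for the bit string with the most ones, a standard toy problem for checking that the operators work together.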
Atom Taylor bird swarm algorithm (Atom Taylor BSA)
The Atom Taylor BSA [24] is obtained by combining ASO and Taylor BSA, where Taylor BSA is in turn derived from the Taylor series and BSA. ASO is a meta-heuristic optimization technique motivated by molecular dynamics. It is simple to implement and able to handle real-world engineering problems. The Taylor series is used to handle higher-order terms, which improves the classification accuracy. BSA is precise, able to solve optimization problems, and effectively balances exploration and exploitation. Hence, the integration of ASO and Taylor BSA raises accuracy while requiring less time.
Moth search optimization (MSA)
MSA [25] is inspired by moths, insects closely related to butterflies. About 160,000 moth species exist, and most of them are active at night. Among the characteristics of moths, phototaxis and Lévy flights are the most essential features. Phototaxis is the tendency of moths to fly around a light source. As a result, flying moths descend and follow a spiral path that brings them closer to the light source. A Lévy flight is a type of random walk found in natural environments, and its step lengths follow the Lévy distribution, which has a power-law form. In the fly-straight case, moths that are far from the light source fly in a straight line towards the light.
Performance analysis based on efficiency
The performance assessment on the basis of efficiency is examined below. Three metrics are used to compute the efficiency of a technique: memory usage, running time, and number of fundamental evaluations. The analysis covers the definitions, the papers that used each measure, the theoretical conclusions, and the additional parameters used.
Definitions
This subsection defines the metrics used for comparing the performance of the optimization techniques in terms of efficiency: number of fundamental evaluations, running time, and memory usage.
Number of fundamental evaluations
A fundamental evaluation is any subroutine that the technique calls to obtain fundamental information about the optimization problem. The most important instance is the evaluation of the fitness function, but an evaluation can also involve a complicated simulation. The number of fundamental evaluations can be used as a benchmark time unit and is platform independent. In several cases it is an important measure for solving real-world problems, because these evaluations dominate the internal workings of the technique.
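One simple way to count fundamental evaluations is to wrap the objective function in a counter. The class name `CountingObjective` is hypothetical and introduced only for this sketch.

```python
class CountingObjective:
    """Wrap an objective function and count how often it is evaluated."""

    def __init__(self, func):
        self.func = func
        self.evaluations = 0

    def __call__(self, x):
        self.evaluations += 1  # one fundamental evaluation per call
        return self.func(x)
```

Passing the wrapped function to any optimizer and reading `evaluations` afterwards gives a platform-independent cost figure that can be compared across machines.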
Running time
Running time is a metric for benchmarking optimization techniques and is generally measured as CPU time or wall clock time. Wall clock time includes the CPU time and is considered more useful in real-world settings. However, wall clock time is not reproducible, as it is tied to a particular combination of hardware platform and software. CPU time is more stable, as it is independent of the background operations of the computer. In addition, CPU time is more reliable across similar versions of an operating system running on similar computer models.
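The difference between the two timers can be seen by measuring both around the same call. The helper name `timed` is hypothetical; `time.perf_counter` and `time.process_time` are the standard-library clocks for wall clock time and process CPU time, respectively.

```python
import time


def timed(func, *args):
    """Return (result, wall_seconds, cpu_seconds) for a single call."""
    wall_start = time.perf_counter()  # wall clock: includes waiting and sleeping
    cpu_start = time.process_time()   # CPU time consumed by this process only
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu
```

For a compute-bound call the two values are usually close, whereas a call that mostly waits (for example on I/O) shows a wall clock time well above its CPU time, which is why a benchmark should state which of the two it reports.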
Memory usage
Memory usage arises mainly in the heap, which holds dynamically created objects. It is characterized in terms of consumption and recovery. It must be evaluated over the whole execution of each program. This evaluation should be performed quickly, to help identify the high-water mark of the memory space required.
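The high-water mark of heap usage can be approximated in Python with the standard `tracemalloc` module, as in the sketch below. The helper name `peak_memory` is hypothetical, and the figure covers only memory allocated through the Python allocator, not the total footprint of the process.

```python
import tracemalloc


def peak_memory(func, *args):
    """Return (result, peak_bytes) of traced allocations during one call."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak
```

Because the peak is recorded over the whole call, it reflects the largest amount held at any moment rather than what remains allocated at the end.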
Metrics exploited for efficiency computation
Table 1 displays the metrics used for evaluating efficiency: number of fundamental evaluations, running time, and memory usage. Paper [22] used the number of fundamental evaluations as the efficiency metric, while paper [19] used memory usage. Papers [16, 17, 18, 19, 20, 21, 23] used running time as the efficiency metric.
Metrics used for efficiency computation
The efficiency of an optimization technique indicates the computational effort needed to generate a solution. The three primary efficiency metrics are the number of fundamental evaluations, running time, and memory usage. Paper [22] reported 15 fundamental evaluations. The computational effort of an algorithm is quantified by the number of fundamental evaluations, and efficiency is computed on that basis. The running times reported in papers [20, 16, 17, 18, 19, 23, 22] are 0.018 sec, 80 sec, 357.6 sec, 0.020 sec, 3047 sec, 4936.7 sec, and 19.24 sec, respectively. Paper [19] reported a memory usage of about 16 GB. It must be noted that, because of the size and complexity of contemporary computing systems, running time is increasingly dominated by memory access costs, which reduces the accuracy of CPU timers. To improve the precision of CPU timers, tools such as cache and memory access tracers can help to produce precise and accurate CPU time measurements. To increase the reliability of the data, it is important to keep any background operations to a minimum. Moreover, any study that measures performance must clearly state which type of running time is used.
Additional metrics utilized for efficiency computation
Table 2 lists additional metrics used for computing efficiency. Paper [19] used the adjusted Rand score as a metric to quantify efficiency.
Additional metrics used for efficiency computation
Performance analysis based on reliability
The performance assessment on the basis of reliability is examined below. Three metrics are used to compute the reliability of a technique: success rate, number of constraint violations, and percentage of global solutions found. The analysis covers the definitions, the papers that used each measure, the theoretical conclusions, and the additional parameters used.
Definition
This subsection defines the metrics used for comparing the performance of the optimization techniques in terms of reliability: success rate, number of constraint violations, and percentage of global solutions found.
Success rate
The success rate is obtained by counting the test problems that are successfully solved within a chosen tolerance. In optimization techniques, success can be judged from the value of the fitness function. It also refers to the ratio or percentage of successes among the total attempts at performing a task.
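As a small worked example, the success rate over a set of runs can be computed as follows. The function name, the default tolerance of `1e-6`, and the assumption that the optimum value is known are all illustrative choices.

```python
def success_rate(final_values, known_optimum, tolerance=1e-6):
    """Fraction of runs whose final fitness lies within tolerance of the optimum."""
    successes = sum(1 for v in final_values if abs(v - known_optimum) <= tolerance)
    return successes / len(final_values)
```

With final fitness values `[0.0, 1e-7, 0.5, 2e-7]` and a known optimum of `0.0`, three of the four runs fall within the tolerance, so the success rate is 0.75.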
Number of constraint violation
This metric is the count of constraint violations. A constraint violation is a condition that is syntactically correct but semantically illegal. It is not used for validating end-user input. A constraint violation that happens in production is regarded as a bug.
Percentage of global solution found
In optimization techniques, a globally optimal solution is one for which no other feasible solution has a better objective function value. A locally optimal solution is one for which no other feasible solution in its neighbourhood has a better fitness value.
Metrics utilized for reliability computation
Table 3 lists success rate, number of constraint violations, and percentage of global solutions found as the metrics for evaluating reliability. Papers [22, 23] used success rate as the reliability measure, while papers [21, 18] used percentage of global solutions found.
Metrics used for reliability computation
The reliability of an optimization technique is its ability to perform well over a wide range of optimization problems. The most common performance measures for reliability are success rate, number of constraint violations, and percentage of global solutions found. Papers [22, 23] attained success rates of about 97.5% and 90.72%, respectively. In addition, papers [21] and [18] attained percentages of global solutions found of 97.6% and 94.837%, respectively. When studying reliability, researchers must consider whether the technique is deterministic or non-deterministic, and the tests must be repeated several times if it is non-deterministic. Reliability is generally assessed from a fixed starting point, but it is better to use several starting points. For deterministic optimization techniques, reliability is measured as the number of problems in the given test set that the technique solves. For non-deterministic techniques, each test must be repeated several times to make sure that reliability is evaluated in aggregate and not skewed by a single run. Researchers often collect a large set of test functions, each with several starting points, for analyzing the robustness and reliability of unconstrained optimization techniques.
Additional metrics used for reliability computation
Table 4 displays additional metrics used for evaluating reliability: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Square Error (MSE). Paper [16] used MAE and RMSE, and papers [20, 17] used MSE as reliability metrics.
Additional metrics used for reliability computation
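The three error measures named above (MAE, MSE, and RMSE) follow directly from their standard definitions. The function name `error_metrics` and the dictionary return format are choices made for this sketch.

```python
import math


def error_metrics(actual, predicted):
    """Return MAE, MSE, and RMSE for two equally long sequences of numbers."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}
```

For actual values `[1, 2, 3, 4]` and predictions `[1, 2, 3, 8]`, the only error is 4 on the last item, giving MAE = 1, MSE = 4, and RMSE = 2. The example shows how the squared measures weight a single large error more heavily than MAE does.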
Performance analysis based on quality of solution
The performance assessment based on quality of solution is examined below. Three metrics are used to compute the quality of solution of a technique: fixed cost solution result, fixed-target solve time, and computational accuracy. The analysis covers the definitions, the papers that used each measure, the theoretical conclusions, and the additional parameters used.
Definition
This subsection defines the metrics used for comparing the performance of the optimization techniques in terms of quality of solution: fixed cost solution result, fixed-target solve time, and computational accuracy.
Fixed cost solution result
If the fixed-cost approach is used, there are several options for quantifying the accuracy of the algorithm output. The final optimization error is checked after running the technique for a particular period of time, number of function calls, or number of iterations. The smaller the optimization error, the better the solution quality.
Fixed-target solve time
In the fixed-target approach, the time needed to find a solution at a target accuracy is computed. The major issue with fixed-target is that some techniques may be unable to solve the test problem. Hence, the termination criterion cannot depend on accuracy alone but must include other safety measures, such as a maximum computational budget.
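A fixed-target run with a safety budget can be sketched as below. Plain random search stands in for the optimizer, and cost is counted in function evaluations rather than seconds so that the result is platform independent; the function names, the search bounds, and the `sphere` objective are assumptions for the example.

```python
import random


def sphere(x):
    """Hypothetical test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def fixed_target_run(f, target, max_evals=10_000, dim=2, seed=0):
    """Random search that stops at the target or when the budget runs out.

    Returns (evaluations_used, best_value, success).
    """
    rng = random.Random(seed)
    best = float("inf")
    for evals in range(1, max_evals + 1):
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        best = min(best, f(x))
        if best <= target:
            return evals, best, True   # target reached: report the solve cost
    return max_evals, best, False      # budget exhausted: count as unsuccessful
```

The `max_evals` argument plays the role of the maximum computational budget: without it, a run on a problem the method cannot solve would never terminate.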
Computational accuracy
Accuracy refers to the state of being precise or correct. If an optimization technique successfully attains the required accuracy, the time taken to attain it can be used to measure the algorithm's quality on that test problem. If the algorithm stops before attaining the required accuracy, the run must be counted as unsuccessful.
Metrics utilized for quality of solution computation
Table 5 displays the metrics used for evaluating quality of solution: fixed cost solution result, fixed-target solve time, and computational accuracy. Paper [23] used fixed cost solution result as the metric for computing quality of solution, whereas papers [21, 18, 23, 24] used computational accuracy.
Metrics used for quality of solution computation
Quality of solution is an important aspect when comparing optimization techniques. Its computation falls into two cases: a known solution is available, or no known solution is available. When the solution to the problem is known, two approaches are used for computing the output quality, namely fixed target and fixed cost. In fixed target, the time needed to find the solution is computed, whereas in fixed cost, the error is checked after running the technique for a particular time. From the analysis, paper [23] used the fixed cost solution result and attained a value of 33.2%. Papers [18, 21, 23, 24] attained computational accuracies of 88.06%, 83%, 89.10%, and 88.04%, respectively. The choice among these approaches depends on the goal of the experimental research, the structure of the problem, and the kind of optimization used. The researcher must choose a success criterion that allows solutions to be compared fairly and that respects the constraints. When no known solution is available, the test set does not contain known solutions to all problems. This is especially true if the test set involves instances of real-world applications. Other considerations are then needed for computing the quality of the algorithmic output.
Additional metrics used for quality of solution computation
Table 6 displays additional metrics used for evaluating quality of solution. Paper [21] used standard deviation, and paper [24] used mean and variance for computing the quality of solution.
Additional metrics used for quality of solution computation
Performance analysis in big data
The performance assessment of the algorithms in big data is examined below. Three metrics are used to compute the performance of a technique in big data: accuracy, sensitivity, and specificity. The analysis covers the definitions, the papers that used each measure, the theoretical conclusions, and the additional parameters used.
Definition
This subsection defines the metrics used for comparing the performance of the optimization techniques in big data: accuracy, sensitivity, and specificity.
Accuracy
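Accuracy expresses how close the computed output is to the true value when classifying big data with the optimization techniques, and is modelled by

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Sensitivity
Sensitivity is the ratio of positives that are correctly identified by the big data classification technique, and is formulated as

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \]

Specificity
Specificity is the ratio of negatives that are correctly identified by the model, and is modelled by

\[ \text{Specificity} = \frac{TN}{TN + FP} \]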
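The three classification measures can be computed directly from the counts of a confusion matrix. The sketch below uses the standard definitions, with `tp`, `tn`, `fp`, and `fn` standing for the numbers of true positives, true negatives, false positives, and false negatives; the function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all instances correct
    sensitivity = tp / (tp + fn)                # share of actual positives found
    specificity = tn / (tn + fp)                # share of actual negatives found
    return accuracy, sensitivity, specificity
```

For example, with 45 true positives, 40 true negatives, 10 false positives, and 5 false negatives, the accuracy is 0.85, the sensitivity is 0.9, and the specificity is 0.8.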
Table 7 displays the metrics used for computing the performance of the algorithms in big data. Papers [22, 23, 24, 25, 19, 17, 18, 21] used accuracy as the metric, and papers [22, 24, 25] used sensitivity and specificity.
Metrics used for big data classification
Additional metrics used for computing the algorithm performance in big data
The analysis of big data aims to uncover trends, correlations, and patterns in large quantities of raw data in order to support effective decisions. It is a complicated process of examining big data to uncover information such as correlations, hidden patterns, and market trends. Generally, accuracy depends on the way in which the data is collected, and it can be checked by comparing measurements from disparate sources. Here, accuracy is the fraction of instances that are correctly classified. Sensitivity is the proportion of actual positives that are correctly classified, and specificity is the proportion of actual negatives that are correctly classified. Papers [22, 23, 24, 25, 19, 17, 18, 21] reported accuracy values of about 97.59%, 90.72%, 88.04%, 99.12%, 83%, 81%, 88.06%, and 83%, respectively. Papers [22, 24, 25] measured sensitivities of about 95.81%, 90.74%, and 97.91%, respectively, and specificities of about 96.76%, 83.07%, and 99.47%, respectively.
Additional metrics used for big data analysis
Table 8 presents additional metrics used for computing the performance of the algorithms in big data. Paper [16] used the R-squared metric for this purpose.
Analytical study based on different techniques
The optimization techniques adopted for big data classification are compared below in terms of year of publication, employed strategies, software tools, and performance evaluation values.
Assessment with implementation tool
Assessment with year of publication.
Assessment with strategies.
This subsection examines the years in which the 10 research papers on optimization techniques for big data classification were published. The assessment by publication year is shown in Fig. 1. Of the 10 papers, the largest number were published in 2021.
Assessment on the Basis of Strategies
This subsection examines the types of optimization methodologies adopted for the classification of big data, as shown in Fig. 2. From Fig. 2, 30% of the papers used swarm intelligence based optimization techniques, 40% used nature-inspired optimization techniques, and the remaining 30% used evolutionary-based optimization techniques. Nature-inspired optimization techniques are thus the most commonly used methods for classifying big data.
Assessment with software tool
The software tools adopted in the previous strategies are described in Table 9. The major implementation tool adopted for performing big data classification is Matlab. Here, papers [20, 22, 23] used Matlab, while paper [24] used Java as the software tool for classifying big data. From Table 9, Matlab is the most frequently used software tool for performing classification of big data.
Analysis with evaluation measure values
The values of the evaluation measures are assessed here in terms of running time and accuracy.
Analysis with running time
Table 10 presents the evaluation of running time across three ranges. Based on the table, papers [18, 20] attained the least running time, in the range 20–25 sec, while papers [17, 21] attained a running time in the range 25–30 sec. A running time above 30 sec is attained in paper [19]. Papers [18, 20] therefore required the least running time among the papers compared.
Analysis with accuracy
Table 11 presents the evaluation of accuracy across four ranges. Based on the table, papers [17, 21] attained accuracy in the range 80–85%, while paper [24] and paper [23] attained accuracy in the ranges 85–90% and 90–95%, respectively. Accuracy in the range 95–100% is attained in papers [22, 25], which is the highest accuracy among the papers compared.
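The grouping into ranges can be reproduced with a short sketch. It uses the accuracy values quoted earlier in this section for the six papers named above; the function name `bin_by_range` and the convention that each range includes its lower bound but excludes its upper bound (except for the top range) are assumptions made for illustration.

```python
# Accuracy values (%) quoted in this section for the papers listed in Table 11.
ACCURACY_PERCENT = {
    "[17]": 81.0,
    "[21]": 83.0,
    "[24]": 88.04,
    "[23]": 90.72,
    "[22]": 97.59,
    "[25]": 99.12,
}

# The four accuracy ranges (%) considered in Table 11.
RANGES = [(80, 85), (85, 90), (90, 95), (95, 100)]


def bin_by_range(values, ranges):
    """Group papers by the range their value falls into.

    Assumption: ranges are half-open [low, high), except that the
    upper bound of the last range is included.
    """
    top = ranges[-1][1]
    bins = {r: [] for r in ranges}
    for paper, value in values.items():
        for low, high in ranges:
            if low <= value < high or (high == top and value == top):
                bins[(low, high)].append(paper)
                break
    return bins


for (low, high), papers in bin_by_range(ACCURACY_PERCENT, RANGES).items():
    print(f"{low}-{high}%: {', '.join(papers)}")
```

The grouping produced this way matches the assignment of papers to ranges stated above.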
Table 10. Assessment with running time
Table 11. Assessment with accuracy
Comparison of top 10 algorithms with DMS algorithm to reveal superiority
Table 12 presents the comparison of the top 10 algorithms with the DMS algorithm [26]. Here, the top ten optimization algorithms are analyzed with respect to DMS to explore the superiority of the DMS algorithm.
In terms of efficiency, CBBO, the parameter optimization algorithm, WOA, ACO, ABC, and GA were analyzed with respect to running time and produced values of 0.018 sec, 80 sec, 357.6 sec, 0.020 sec, 3047 sec, and 4936.7 sec, respectively. Likewise, PSO required 15 fundamental evaluations, and ABC reported a memory usage of 16 GB. Thus, these techniques revealed high efficiency in classifying big data in terms of number of fundamental evaluations, running time, and memory usage.
In terms of reliability, GA and PSO attained the highest success rates of about 90.72% and 97.5%, respectively. The percentages of global solutions found by ACO and OSA are 94.837% and 97.6%, respectively. GA offered an enhanced outcome of 33.2% in terms of the fixed cost solution result.
In terms of computational accuracy, ACO, OSA, GA, Atom Taylor BSA, and DMS produced values of 88.06%, 83%, 89.10%, 88.04%, and 95%, respectively. Thus, the proposed DMS algorithm produced the best accuracy of 95% and the smallest running time of 0.016 sec. Hence, these algorithms have better quality of solution in classifying big data. Each of the top ten optimization algorithms has either better efficiency, better reliability, or better quality of solution, whereas the DMS algorithm has better efficiency, reliability, and quality of solution collectively, which shows the superiority of DMS in contrast to the top ten optimization algorithms.
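The comparison of running time and computational accuracy can be checked with a minimal sketch that ranks the algorithms using only the values quoted in this section. The helper name `rank` is hypothetical, and the sketch covers only these two criteria, not the full content of Table 12.

```python
# Running time (sec) quoted in this section.
RUNNING_TIME_SEC = {
    "CBBO": 0.018,
    "Parameter optimization": 80.0,
    "WOA": 357.6,
    "ACO": 0.020,
    "ABC": 3047.0,
    "GA": 4936.7,
    "DMS": 0.016,
}

# Computational accuracy (%) quoted in this section.
ACCURACY_PERCENT = {
    "ACO": 88.06,
    "OSA": 83.0,
    "GA": 89.10,
    "Atom Taylor BSA": 88.04,
    "DMS": 95.0,
}


def rank(values, lower_is_better):
    """Return algorithm names ordered from best to worst."""
    return sorted(values, key=values.get, reverse=not lower_is_better)


by_time = rank(RUNNING_TIME_SEC, lower_is_better=True)
by_accuracy = rank(ACCURACY_PERCENT, lower_is_better=False)

print("Fastest to slowest:", ", ".join(by_time))
print("Most to least accurate:", ", ".join(by_accuracy))
```

On these values, DMS is placed first in both rankings, which is consistent with the observation above.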
Conclusion
This paper presents a theoretical analysis of the top 10 optimization algorithms with respect to the DMS algorithm. In addition, a comparison is carried out to highlight the pitfalls by examining the performance of each optimization technique. Several techniques for reporting benchmarking outcomes are also discussed. The paper includes various figures and tables that summarize the process and offers advice for conducting fair benchmarking. The analysis of DMS algorithm-based big data streaming in the Spark architecture is done in comparison with various optimization algorithms. The ten optimization algorithms are selected based on recency and popularity. The performance in big data is examined in terms of efficiency, reliability, solution quality, and the superiority of DMS with respect to the top 10 algorithms. Papers [21, 18, 23, 24] attained computational accuracies of 88.06%, 83%, 89.10%, and 88.04%, respectively, while the DMS algorithm attained the highest computational accuracy of about 95%. In addition, the smallest running time, 0.016 sec, is attained by DMS, while the running times reported by papers [20, 16, 17, 18, 19, 23, 22] are 0.018 sec, 80 sec, 357.6 sec, 0.020 sec, 3047 sec, 4936.7 sec, and 19.24 sec, respectively. The above analysis reveals that DMS is highly effective compared to the other top ten algorithms for performing big data classification.
