Abstract
The economic interaction between the countries of the world is steadily strengthening. The US stock market, often called the “barometer” of the global economy, has a huge impact on the world economy. It is therefore important to study US stock market data, and in particular data mining algorithms for abnormal data. Although data mining technology has produced many research results in the financial field, no systematic research framework has yet been established for time series data in stock market anomalies. Based on the actual behavior and data characteristics of stock market anomalies, this paper uses data mining techniques to find abnormal data in stock market data, and uses an isolated point detection method based on density and distance to analyze the abnormal data and extract the useful information implicit in it. Because traditional data mining algorithms handle poorly the uncertain factors contained in stock market anomalies, such as errors caused by human factors, this paper introduces rough entropy as a measure of the uncertainty of data, applies its theory to data mining, and designs a data mining algorithm based on rough entropy for US stock market anomalies. Finally, an empirical analysis of the algorithm is carried out. The experimental results show that the proposed rough entropy-based data mining algorithm can effectively detect abnormal fluctuations of stock market time series.
Introduction
Since the beginning of the twenty-first century, computer science and technology, statistical theory and database technology have developed continuously, and ever more data is generated and stored in databases and other storage devices. As the world's financial system deepens and financial markets keep innovating, a defining feature of the financial field is the large volume of data to be analyzed and the many uncertain factors involved. The data of the US stock market, the global economic “barometer”, share these characteristics, and the market has a huge impact on the economies of the world. These data have therefore attracted growing attention, and people are eager to extract valuable information and implicit knowledge from them. Even with advanced computers, however, traditional data analysis methods are severely limited when analyzing and processing such large amounts of data. Data mining emerged in the late 1980s as a powerful tool for processing large amounts of data. Data mining means discovering, from a large amount of data, hidden or previously unknown knowledge or rules that may be of interest to users [1]. It is an interdisciplinary subject that combines knowledge from statistics, artificial intelligence, neurology, machine learning, database technology and other disciplines, and it has received extensive attention from many academic communities [2], becoming a new research hotspot. Today data mining technology is increasingly mature and increasingly widely applied, and it plays an important role in the military, economic, industrial, medical and commercial fields.
In the process of data mining, isolated points (also called abnormal points, outliers, noise, etc.) are often found. As an important research direction of data mining, isolated point detection has been highly valued by scholars and researchers. The study of isolated points began in the 1980s. After Hawkins [3] first proposed the concept of isolated points, many researchers defined isolated points but did not study detection methods in depth. In his work on statistical theory, V. Barnett [4] proposed a statistical definition of isolated points: the essence of an isolated point is that it differs significantly from the other data in the data set, where “significant difference” refers to statistical deviation. Barnett not only defined isolated points from a statistical point of view, but also gave methods for finding them with statistical techniques. His work pioneered the statistical measurement of isolated points and laid the groundwork for later research on anomalies. Meanwhile, foreign scholars have conducted long-term and fruitful research in data mining. Early on, Olivia Parr Rud's “Data Mining Practice” [5] introduced the life cycle model of data mining but did not specifically discuss isolated point detection methods or processes. Likewise, Mehmed Kantardzic's “Data Mining: Concepts, Models, Methods, and Algorithms” [6] covers concepts, models, methods and algorithms across data mining topics such as cluster analysis, decision trees, association rules and regression models, but says almost nothing about outlier detection: outliers are mentioned only briefly as a by-product of, or a preliminary step to, cluster analysis, and the detection process itself is not addressed.
Data mining is of particular practical importance in special areas such as weather anomaly detection and fraud identification, so the automatic detection of outliers in anomalous data has gradually become an important branch of data mining research. From the 1990s to the present, data mining theory and methods have developed rapidly, providing a rich theoretical and algorithmic basis for isolated point detection. For example, Agrawal R et al. [7] proposed association rule mining, Ng R and Han J [9] advanced the theory of spatial data mining, Ester M et al. [8] proposed a density-based clustering algorithm, and Ramakrishnan R et al. [10] proposed cluster analysis methods for very large data sets. These data mining and clustering theories and algorithms laid a solid foundation for the rapid development of isolated point detection. Researchers have also achieved many results in anomaly mining algorithms for time series. Jagadish [11] defines anomalies in a time series as deviation points that differ significantly from their neighbors, and identifies them as the points whose removal most reduces the error of fitting the series. Yamanishi [12] models historical time series data and finds isolated points by measuring how far new points deviate from the model. Shahabi [13] proposed an improved TSA-Tree in which anomalous patterns are found from local maxima of the wavelet coefficients, and abnormal points are detected as sudden changes of these patterns in the time series. Keogh [14] proposed an anomaly model that runs in linear time and space, in which a pattern is anomalous if its frequency of occurrence differs significantly from its expected probability of occurrence, together with the TARZAN algorithm for detecting such patterns. The expected occurrence probabilities of the patterns are predicted with a Markov model, which further verifies the effectiveness of the anomalous pattern detection.
Domestic research on isolated point detection in data mining is still in its infancy and focuses mainly on applications of abnormal data mining. In “The Research Progress of Abnormal Point Mining” by Wang Hongding, Tong Yunhai et al. [15], the development of abnormal data mining abroad is reviewed and summarized, existing anomaly mining methods are compared in light of current research hotspots, and future research directions and challenges are explored. For data mining in the financial industry, Zhang Jinliang et al. [16] proposed a merger algorithm for financial market data mining, discussed in turn the applications of sequential pattern mining, association analysis, cluster analysis, deviation detection and evolutionary genetic simulation, and described systematically how mining algorithms are applied in financial markets. Qi Hongwei et al. [17] proposed an anomaly detection model based on variance and applied it to detecting abnormal returns in the stock market. Weng Xiaoqing et al. [18] proposed a method based on a sliding event window for detecting anomalous subsequences in multivariate time series (MTS) and mined anomalous subsequences (including anomalous data) from an MTS data set of the Shanghai Stock Exchange. Zhou Xiaohua et al. [19] used the multifractal spectrum to study changes in the stock market before and after large fluctuations and verified that the multifractal spectrum method can predict abnormal changes around large stock price movements. Sun Jinhua et al. [20] used fractal theory, based on the characteristics of stock time series, to detect time series outliers and to mine solution sets of optimization problems. Du Hongbo et al. [21] proposed an abnormal subsequence detection algorithm based on Local Linear Mapping (LLM) and used two anomaly detection indicators to detect abnormal subsequences in stock trading time series [22].
Although data mining technology is widely used in the financial industry and there are many cases and research results applying it to the stock market, research on isolated points at home and abroad focuses mostly on the concepts and algorithms of isolated points. Few studies combine isolated point mining with a particular industry, and fewer still combine it with the stock market. In the existing literature, stock market data are generally used only as an experimental data set: a new anomaly detection method is introduced, abnormal data are found in the time series, and the feasibility and effectiveness of the algorithm are verified [23]; other studies only examine how the characteristics of the stock market change when anomalies occur. These algorithms cannot quantify the changes they find in the characteristics before and after an anomaly, so they are of little use for investment decisions. Research on data mining for stock market anomalies is therefore not yet deep enough, and a systematic and complete research framework has not been formed. Moreover, most mature data mining algorithms are designed for numerical data, whereas much real-world data, such as the time series in stock market data, is non-numeric, and its volume keeps growing. To better handle such uncertain data and to detect stock market anomalies more accurately, this paper introduces rough entropy theory and designs a data mining algorithm based on rough entropy.
Data mining is an information processing process that extracts implicit, useful information and knowledge from a database. It is widely applicable: any database with analytical value can be mined for valuable information with data mining tools. Its main application areas include biology, industrial production, artificial intelligence, financial data analysis, medical diagnosis, scientific research and engineering diagnosis. Researchers at home and abroad have studied abnormal data mining algorithms for time series data and applied them to the stock market. Zbigniew R. Struzik and Arno P. J. M. Siebes applied wavelet-based multifractal theory to financial time series and detected abnormal points in them. Thierry Ane et al. introduced an anomaly detection method based on the AR(1)-GARCH(1, 1) model and applied it to detecting abnormal returns of stock indices in Asian stock markets. Aiping Jiang and Shuang Ma used fractal theory to predict the fluctuation trend and turning points of Hong Kong's Hang Seng Index, which also proved useful for monitoring abnormal points. Building on this work, this paper designs a data mining algorithm based on rough entropy and applies it to isolated point detection in US stock market anomalies.
Although scholars at home and abroad have done a great deal of research on isolated point detection, they focus mainly on the mining algorithms themselves (implementation, applicability, optimization, etc.), detached from practical problems and industry background, and therefore cannot give objective, practical interpretations of the detected outliers. Research applying data mining to the stock market is still shallow and at an exploratory stage. Mature data mining algorithms are mostly applied to numerical data, and there is little research on abnormal stock market data that takes time series as its object. To detect and analyze stock market anomalies more accurately, this paper introduces rough entropy as a tool for handling uncertainty, applies its theory to data mining, and proposes a data mining algorithm based on rough entropy for US stock market anomalies. Combining rough entropy theory with data mining makes it possible to analyze the mining results for stock market anomalies, discover the useful information they contain, and better understand how the stock market operates. The experimental results show that the proposed algorithm is reasonable and feasible.
The theory and design of data mining algorithm based on rough entropy
Rough entropy theory
Rough set theory is a data analysis theory proposed by the Polish mathematician Professor Zdzislaw Pawlak. After probability theory, fuzzy set theory and evidence theory, it is another mathematical tool for handling incomplete and uncertain data: it can effectively analyze imprecise, incomplete and inconsistent information and discover the knowledge hidden in it. In general, the entropy of a system is related to the system's factors and states, and its magnitude indicates how far the system deviates from equilibrium. Entropy is a standard measure of uncertainty. Rough entropy is defined as follows.
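Let $P \subseteq R$ be a set of equivalence relations on $U$, and let the partition of $U$ induced by $P$ be $U/IND(P) = \{X_1, X_2, \dots, X_n\}$. The rough entropy of $P$ is then defined as:

$$E_r(P) = -\sum_{i=1}^{n} \frac{|X_i|}{|U|} \log_2 \frac{1}{|X_i|}$$

where $|X_i|$ denotes the cardinality of the set $X_i$ and $|U|$ the cardinality of the universe $U$.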
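For the above information system, rough sets cannot handle continuous attributes directly, so continuous attributes must first be discretized. For the range $V_\alpha = [l_\alpha, r_\alpha]$ of a continuous attribute $\alpha$, there is a set of cut points $l_\alpha = c_0^\alpha < c_1^\alpha < \dots < c_{k_\alpha}^\alpha = r_\alpha$ that divides $V_\alpha$ into the intervals $[c_{j-1}^\alpha, c_j^\alpha)$, $j = 1, \dots, k_\alpha$; each value of $\alpha$ is then replaced by the index of the interval that contains it.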
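The cut-point discretization can be sketched as follows. This is a minimal illustration that assumes equal-width cut points; the source does not say how its cut points are chosen, and the function names `equal_width_cuts` and `discretize` are hypothetical.

```python
import bisect


def equal_width_cuts(lo, hi, k):
    """Interior cut points splitting [lo, hi] into k equal-width intervals.

    ASSUMPTION: equal-width cuts are an illustrative choice, not the paper's rule.
    """
    step = (hi - lo) / k
    return [lo + j * step for j in range(1, k)]


def discretize(value, cuts):
    """0-based index of the interval containing value.

    bisect_right counts the cut points <= value, so a value lying exactly on a
    cut point is assigned to the interval that starts there.
    """
    return bisect.bisect_right(cuts, value)


cuts = equal_width_cuts(0.0, 1.0, 4)                          # [0.25, 0.5, 0.75]
codes = [discretize(v, cuts) for v in (0.1, 0.25, 0.6, 1.0)]
print(codes)                                                  # [0, 1, 2, 3]
```

Other cut-point rules (equal-frequency or entropy-based cuts) would only change `equal_width_cuts`; the mapping from values to interval indices stays the same.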
An attribute $\alpha \in R$ is necessary in $R$ if and only if $sig(\alpha, R) > 0$. The core of the attribute set $R$ is $CORE(R) = \{\alpha \in R \mid sig(\alpha, R) > 0\}$.
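The rough entropy, attribute significance and core can be exercised on a toy table. This sketch rests on two assumptions not confirmed by the source: rough entropy is taken in the commonly used form $E_r(P) = -\sum_i \frac{|X_i|}{|U|}\log_2\frac{1}{|X_i|}$, and the significance measure, which the text uses without defining, is taken to be the entropy increase $sig(\alpha, R) = E_r(R - \{\alpha\}) - E_r(R)$. The table and the function names are hypothetical.

```python
import math
from collections import Counter


def rough_entropy(rows, attrs):
    """E_r(P) = -sum_i |X_i|/|U| * log2(1/|X_i|) over the classes of U/IND(attrs)."""
    n = len(rows)
    # Sizes of the equivalence classes: objects agreeing on every attribute in attrs.
    sizes = Counter(tuple(row[a] for a in attrs) for row in rows).values()
    # -(s/n) * log2(1/s) is rewritten as (s/n) * log2(s).
    return sum(s / n * math.log2(s) for s in sizes)


def significance(a, attrs, rows):
    """ASSUMPTION: sig(a, R) = E_r(R - {a}) - E_r(R), the entropy increase when a is dropped."""
    reduced = [b for b in attrs if b != a]
    return rough_entropy(rows, reduced) - rough_entropy(rows, attrs)


def core(attrs, rows, eps=1e-12):
    """CORE(R) = {a in R | sig(a, R) > 0}; eps absorbs floating-point noise."""
    return [a for a in attrs if significance(a, attrs, rows) > eps]


# Hypothetical table: x and y carry the same information, z does not.
rows = [
    {"x": 0, "y": 0, "z": 0},
    {"x": 0, "y": 0, "z": 1},
    {"x": 1, "y": 1, "z": 0},
    {"x": 1, "y": 1, "z": 1},
]
print(core(["x", "y", "z"], rows))  # ['z']
```

Under this choice, dropping an attribute either leaves the partition unchanged (significance 0) or merges classes, which strictly raises the rough entropy, so a positive significance coincides with the attribute being indispensable.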
Data mining is the process of extracting, from large amounts of incomplete, fuzzy, noisy and random data, hidden knowledge that is not known in advance but is potentially useful. It is an interdisciplinary subject combining knowledge from statistics, artificial intelligence, neurology, machine learning, database technology and other disciplines [24]. Common data mining techniques include statistical methods, rough sets, decision trees, artificial neural networks, genetic algorithms and fuzzy logic.
(1) Statistical methods
Relationships between the fields of a database are of two kinds: functional relationships and correlation relationships. Both can be analyzed with statistical methods, that is, by applying statistical principles to the information in the database. Common analyses include descriptive statistics, regression analysis, correlation analysis and difference analysis.
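As a concrete instance of the correlation and regression analyses listed above, the sketch below computes a Pearson correlation coefficient and an ordinary least-squares line for two numeric fields. It is a generic illustration with made-up values, not part of the paper's method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric fields."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
print(pearson(xs, ys), least_squares(xs, ys))  # 1.0 (2.0, 1.0)
```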
(2) Rough set
Rough set theory is a mathematical tool for studying imprecise and uncertain knowledge. The rough set method has several advantages: it requires no additional prior information about the data; it simplifies the representation space of the input information; and its algorithms are simple and easy to apply. Rough sets operate on an information table similar to a two-dimensional relational table.
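The basic rough set operation on such an information table is approximating a target set of objects by equivalence classes. A minimal sketch with a hypothetical one-attribute table: the lower approximation collects the classes lying entirely inside the target, and the upper approximation collects the classes that overlap it.

```python
from collections import defaultdict


def equivalence_classes(rows, attrs):
    """Partition object indices by their values on attrs (the relation IND(attrs))."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def approximations(rows, attrs, target):
    """Lower and upper approximation of the object set target with respect to IND(attrs)."""
    lower, upper = set(), set()
    for block in equivalence_classes(rows, attrs):
        if block <= target:   # block lies entirely inside the target: certainly in it
            lower |= block
        if block & target:    # block overlaps the target: possibly in it
            upper |= block
    return lower, upper


rows = [{"trend": "up"}, {"trend": "up"}, {"trend": "flat"}, {"trend": "down"}]
print(approximations(rows, ["trend"], {0, 2, 3}))  # ({2, 3}, {0, 1, 2, 3})
```

Objects in the upper but not the lower approximation (here 0 and 1) form the boundary region, the part of the target that the available attributes cannot classify with certainty.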
(3) Decision tree
Decision trees are commonly used in predictive modeling. They find valuable, latent information by classifying large amounts of data in a targeted way. The earliest and most influential decision tree method is the well-known ID3 algorithm proposed by Quinlan, which is based on information entropy. Its main advantages are a simple description and fast classification, which make it especially suitable for large-scale data processing.
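ID3 grows the tree by splitting on the attribute with the highest information gain. The sketch below computes that criterion for a hypothetical table; it shows the split criterion only, not the full tree construction.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, label):
    """Entropy of label minus the weighted entropy after splitting rows on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [row[label] for row in rows if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([row[label] for row in rows]) - remainder


rows = [
    {"volume": "high", "move": "abnormal"},
    {"volume": "high", "move": "abnormal"},
    {"volume": "low", "move": "normal"},
    {"volume": "low", "move": "normal"},
]
print(information_gain(rows, "volume", "move"))  # 1.0
```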
(4) Artificial neural network
A neural network is a mathematical model of the way the human brain thinks [25] and a typical representative of machine learning in data mining. The human brain contains tens of billions of neurons, the micro-units that process information, and their interconnections give rise to sophisticated logical thinking. Likewise, a neural network in data mining is composed of a large number of artificial neurons (micro-processing units) operating in parallel. It learns from experience by adjusting the strengths of the connections between neurons and can then apply what it has learned. For this reason it has attracted increasing attention in data mining in recent years.
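Learning by adjusting connection strengths can be illustrated with the simplest case, a single threshold neuron trained with the perceptron rule. This is a generic example on the logical AND function, not a network used in the paper.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted input sum exceeds 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, n_inputs, epochs=20, lr=1.0):
    """Perceptron rule: change each connection weight in proportion to the output error."""
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND, n_inputs=2)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```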
(5) Genetic algorithm
A genetic algorithm is a randomized search method modeled on the evolutionary laws of the biological world (survival of the fittest and genetic inheritance). It was first proposed by Professor J. Holland of the United States in 1975. Its main features are that it operates directly on structural objects, without requiring the objective function to be differentiable or continuous, and that it has inherent implicit parallelism and good global optimization ability. As a probabilistic optimization method, it can automatically acquire and guide the search space and adaptively adjust the search direction without predetermined rules. Its implicit parallelism and ease of integration with other models make it applicable to data mining.
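A minimal genetic algorithm, assuming a toy OneMax objective (maximize the number of 1-bits), tournament selection, one-point crossover, bit-flip mutation and elitism; all of these choices and the parameter values are illustrative. Because the best individual is always copied into the next generation, the recorded best fitness never decreases.

```python
import random


def fitness(bits):
    """Toy objective (hypothetical): number of 1-bits (OneMax)."""
    return sum(bits)


def run_ga(n_bits=16, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness observed in each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: keep the best individual unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of two random individuals becomes a parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), history


best, history = run_ga()
print(history[0], history[-1], fitness(best))
```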
(6) Fuzzy logic
Fuzzy set theory is applied to practical problems through fuzzy decision-making, fuzzy pattern recognition and fuzzy cluster analysis [26]. The more complex a system is, the stronger its ambiguity. Fuzzy set theory describes fuzzy concepts through degrees of membership. In data mining it is commonly used for confidence calculation, evidence synthesis and similar tasks.
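Degrees of membership are typically given by simple membership functions. The sketch below assumes a triangular membership function and Zadeh's min/max operators for fuzzy intersection and union; the parameters in the usage line are arbitrary.

```python
def triangular(x, a, b, c):
    """Membership grade of x in a triangular fuzzy set rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_and(mu1, mu2):
    """Zadeh intersection: the minimum of the two membership grades."""
    return min(mu1, mu2)


def fuzzy_or(mu1, mu2):
    """Zadeh union: the maximum of the two membership grades."""
    return max(mu1, mu2)


print(triangular(0.5, -1.0, 0.0, 1.0))  # 0.5
```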
Isolated point detection method
In data mining, a database may contain data objects that do not conform to the general behavior or model of the data. A small number of data objects that deviate strongly from the common characteristics of most other data in the data set are called abnormal points, also known as isolated points or outliers. Hawkins gives the essential definition of an anomaly: a datum that differs so much from the rest of the data set that it is suspected of being generated not by random deviation but by a completely different mechanism. An isolated point may arise from a measurement or execution error, or from a genuine anomaly in the data itself. Outlier detection finds small patterns (as opposed to clusters) in a data set, that is, objects that are significantly different from the other data, and analyzing isolated points may be more valuable than the information contained in ordinary data. Common isolated point detection methods are as follows:
(1) Statistical based outlier detection method
The basic idea of statistics-based algorithms is to assume a probability distribution model for the data based on the characteristics of the data set, and then to identify as anomalies the objects that are inconsistent with the model.
A statistical discordancy test involves two hypotheses: a working hypothesis and an alternative hypothesis. The working hypothesis $H$ states that the entire data set of $n$ objects comes from an initial distribution model $F$, i.e. $H: o_i \in F$, $i = 1, 2, \dots, n$. The hypothesis is retained unless there is statistically significant evidence for rejecting it. A discordancy test checks whether an object $o_i$ is significantly large or small relative to the distribution $F$ by estimating its significance probability. If the significance probability is small enough, the working hypothesis is rejected and the alternative hypothesis $\bar{H}$ is adopted, which states that $o_i$ comes from another distribution model $G$.
An object $o_i$ may be an outlier under one model and a normal value under another, so the result depends heavily on the choice of the distribution model $F$. The power of a test can be measured by the probability that the working hypothesis is rejected when $o_i$ is in fact an outlier.
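A minimal discordancy test, assuming the working distribution $F$ is a normal distribution fitted with the sample mean and standard deviation (the source leaves $F$ open). The significance probability of each object is the two-sided normal tail probability of its z-score, and objects whose probability falls below a hypothetical level `alpha` are flagged. The outlier itself inflates the fitted standard deviation, so this simple version can mask outliers in small samples.

```python
import math
import statistics


def discordancy_test(data, alpha=0.01):
    """Indices whose two-sided significance probability under a fitted normal F is < alpha."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    flagged = []
    for i, x in enumerate(data):
        z = (x - mu) / sigma
        p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal Z
        if p < alpha:
            flagged.append(i)  # reject H for this object: adopt the alternative
    return flagged


data = [1, 2] * 10 + [30]
print(discordancy_test(data))  # [20]
```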
(2) Distance-based isolated point detection method
The basic idea of distance-based algorithms is to detect small patterns by distance: an object is considered abnormal if it does not have enough neighbors. Formally, if in the data object set N at least P objects lie at a distance greater than d from the object O, then O is a distance-based abnormal point with parameters P and d. Distance-based detection does not require prior knowledge of the characteristics of the data set and is domain-independent; its difficulty lies in estimating the parameters P and d, whose choice can strongly affect the results. Because P and d are fixed for the whole data set, the isolated points found are global outliers.
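The distance-based definition can be checked directly with a brute-force scan over all pairs. This sketch assumes Euclidean distance and treats P as an absolute count of objects; the function name is hypothetical, and the O(n²) loop is meant for illustration only.

```python
import math


def db_outliers(points, p, d):
    """Indices of objects that have at least p other objects farther away than d."""
    outliers = []
    for i, a in enumerate(points):
        far = sum(1 for j, b in enumerate(points) if j != i and math.dist(a, b) > d)
        if far >= p:
            outliers.append(i)
    return outliers


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(db_outliers(points, p=4, d=3.0))  # [4]
```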
The detection process is as follows. First, the large database is preprocessed, including handling null values and normalizing the data. Next, cluster analysis is performed on the preprocessed database, producing several clusters plus one special class that contains all the outlier data. The key to distance-based isolated point detection is therefore effective cluster analysis of the database: the data in the clusters DB_1, DB_2, ..., DB_K are normal pattern data, and data not in any of these clusters are outlier data.
(3) Isolated point detection method based on deviation
The basic idea of deviation-based methods is to identify anomalies by examining the main characteristics of a set of objects: an object whose characteristics deviate from the given “description” is considered anomalous. Existing deviation-based methods mainly comprise the sequential exception technique and the OLAP data cube method. The former describes the basic characteristics of the sample set by the variance of the whole set and treats every sample that deviates from these characteristics as abnormal; it rests on idealized assumptions about anomalies and is not very effective on complex data. The latter uses data cubes to identify anomalous regions in large multidimensional data: a cell value that differs significantly from the value expected from a statistical model is treated as an isolated point. When many dimensions with multiple concept-hierarchy levels are involved, manual detection becomes very difficult.
The sequential exception technique mimics the way humans pick out an anomalous object from a series of supposedly similar objects: after observing a sequence, one quickly notices an item that is clearly different from the rest. Given a set $S$ of $n$ objects, the technique builds a sequence of subsets $S_1, S_2, \dots, S_m$ with $2 \le m \le n$ such that $S_{j-1} \subset S_j$ and $S_j \subseteq S$. A dissimilarity function is then used to evaluate the dissimilarity between the subsets in the sequence. The exception set is defined as the smallest subset of objects whose removal yields the greatest reduction in dissimilarity of the remaining set. The method needs prior knowledge of the data to define the dissimilarity function, and an inappropriate choice of function gives unsatisfactory results, which makes it hard to use on practical problems.
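A simplified sketch of the sequential exception idea, assuming population variance as the dissimilarity function and considering only single-object exception sets rather than the full search over the subset sequence. Each candidate is scored by a smoothing factor, the size of the remaining set multiplied by the reduction in dissimilarity achieved by removing the candidate; the function names are hypothetical.

```python
import statistics


def dissimilarity(values):
    """ASSUMPTION: the dissimilarity function D is the population variance of the set."""
    return statistics.pvariance(values) if len(values) > 1 else 0.0


def smoothing_factor(values, i):
    """|S - {x_i}| * (D(S) - D(S - {x_i})): how much removing x_i smooths the set."""
    rest = values[:i] + values[i + 1:]
    return len(rest) * (dissimilarity(values) - dissimilarity(rest))


def sequential_exception(values):
    """Index of the single object whose removal reduces dissimilarity the most."""
    return max(range(len(values)), key=lambda i: smoothing_factor(values, i))


values = [1.0, 1.2, 0.9, 1.1, 8.0, 1.0]
print(sequential_exception(values))  # 4
```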
The OLAP data cube technique is a form of discovery-driven exploration: it precomputes values that indicate data anomalies and uses them to guide the user through the analysis at every level of aggregation. If a cell value of the cube differs significantly from the value expected from a statistical model, the cell is treated as an exception and flagged with a visual cue, for example a background color reflecting the degree of anomaly of each cell.
(4) Density-based outlier detection
Rastogi and Ramaswamy pointed out that distance-based methods apply uniform parameters P and d to all the clustered data, so when clusters differ in density, some isolated points are missed. A local anomaly mining algorithm based on a density model was therefore proposed, which identifies abnormal points by computing the local outlier factor (LOF). An object whose LOF is much larger than 1 is likely to be an abnormal point. Objects near the core of a cluster have an LOF close to 1, whereas objects at the edge of or outside a cluster have relatively large LOF values, so local anomalies can be detected; this matches the characteristics of real data sets more closely. The main difficulty of this traditional local anomaly mining algorithm is choosing the neighborhood size parameter MinPts. Replacing MinPts with an evaluation based on the multi-granularity deviation factor (MDEF) can give a better solution.
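The LOF computation can be sketched with the usual definitions of k-distance, reachability distance and local reachability density, with k playing the role of MinPts. This is an illustration, not the paper's implementation; it assumes Euclidean distance and no duplicate points (duplicates would make a reachability sum zero).

```python
import math


def lof_scores(points, k):
    """Local outlier factor of every point, using k nearest neighbours (ties kept)."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    kdist, neigh = [], []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kd = others[k - 1]  # distance to the k-th nearest neighbour
        kdist.append(kd)
        neigh.append([j for j in range(n) if j != i and dist[i][j] <= kd])
    # Local reachability density: inverse mean reachability distance to the neighbours.
    lrd = []
    for i in range(n):
        reach = [max(kdist[j], dist[i][j]) for j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean neighbour density relative to the point's own density.
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
print([round(s, 2) for s in lof_scores(points, k=2)])  # [1.0, 1.0, 1.0, 1.0, 6.15]
```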
(5) High-dimensional data isolated point discovery
Most algorithms perform poorly at finding outliers in high-dimensional data. Aggarwal and Yu discuss a method for discovering outliers in high-dimensional data: it projects the data set onto low-dimensional subspaces and decides whether abnormal data exist according to how sparse the projected data are in each subspace.
Each dimension of the data space is divided into m equi-depth intervals, meaning that after the data are projected onto that dimension, each interval contains a fraction 1/m of the data points. Taking one equi-depth interval from each dimension of a k-dimensional subspace defines a k-dimensional cube D, which contains n(D) points. A sparsity coefficient S(D), a function of k and n(D), measures how sparse the data are: the smaller S(D) is, the sparser the data in the cube, and a negative S(D) means that cube D contains fewer points than expected. Finding the sparsest k-dimensional cubes amounts to finding anomalous patterns matched by only a few data points, and these points are the anomalous data in the high-dimensional data set.
The experiments of data mining algorithm based on rough entropy for US stock market abnormality
Data source
The experimental data are taken from the Dow Jones Index of the US stock market and cover the period from 2008 to 2018.
Data processing
To preserve the integrity of the information and the accuracy of the data, this paper takes the logarithm of the original Dow Jones index series $\{I_k\}$ over the past ten years and applies a first-order difference:

$$R_k = \ln I_k - \ln I_{k-1}$$

where $R_k$ is the volatility of the Dow Jones index on day $k$, $I_k$ is the index on day $k$, and $I_{k-1}$ is the index on day $k-1$.
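The preprocessing step maps a series of n index levels to n - 1 log differences. A minimal sketch; the index levels in the usage line are made-up numbers, not Dow Jones data.

```python
import math


def log_returns(index_levels):
    """R_k = ln I_k - ln I_{k-1} for k = 1, ..., n-1."""
    return [math.log(cur) - math.log(prev)
            for prev, cur in zip(index_levels, index_levels[1:])]


print([round(r, 4) for r in log_returns([100.0, 110.0, 99.0])])  # [0.0953, -0.1054]
```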
After the data processing, the Dow Jones index fluctuation sequence of the US stock market has a total of 1186 data points.
Evaluation criteria
To evaluate the effectiveness of the proposed algorithm more objectively, this paper also applies a traditional abnormal data mining method to anomaly detection in the US stock market, and then compares the experimental results of the two methods to assess the rationality and feasibility of the proposed algorithm.
Experimental results
The rough entropy-based data mining algorithm designed in this paper is applied to time series anomaly detection in the US stock market. The resulting anomaly eigenvalues of the US stock market are shown in Fig. 1.
Because each point of the reconstructed phase space corresponds to a subsequence of the original series, the abnormal subsequences found above can be located in the original sequence: the point $p_k$ of the reconstructed phase space corresponds to $I_k$ in the original sequence. This gives the starting position in the original sequence of the abnormal subsequence corresponding to each abnormal pattern; the result is shown in Fig. 2.
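The paper does not state its embedding dimension or delay. Assuming standard time-delay embedding with a hypothetical dimension `m` and delay `tau`, reconstructed point k is built from observations k, k + tau, ..., k + (m - 1)tau, so an abnormal reconstructed point can be traced back to the span of the original series it covers.

```python
def delay_embed(series, m, tau):
    """Time-delay embedding: point k is (x_k, x_{k+tau}, ..., x_{k+(m-1)tau})."""
    n_points = len(series) - (m - 1) * tau
    return [tuple(series[k + j * tau] for j in range(m)) for k in range(n_points)]


def original_span(k, m, tau):
    """(start, end) indices in the original series covered by reconstructed point k."""
    return k, k + (m - 1) * tau


print(delay_embed([1, 2, 3, 4, 5], m=3, tau=1))  # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
print(original_span(1, m=3, tau=1))              # (1, 3)
```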

Fig. 1. Abnormal eigenvalues of the US stock market.

Fig. 2. Isolated point distribution in the US stock market anomaly.

Fig. 3. Isolated point distribution map detected by traditional methods.
As the definition of isolated points in the previous section shows, isolated points are points that deviate from the overall distribution. From a statistical point of view, quantiles can be used to study the distribution of data, which is also a common traditional way of detecting anomalies. To assess the effectiveness of the proposed algorithm on the stock market, it is compared with this traditional method; the experimental results of the traditional anomaly mining algorithm are shown in Fig. 3.
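The source describes the traditional baseline only as a sliding quantile gap method. One plausible reading, sketched here as an assumption, flags an observation when it falls outside the interquartile fence [Q1 - c*IQR, Q3 + c*IQR] of the preceding window; the window length and the multiplier `c` are hypothetical parameters.

```python
import statistics


def sliding_quantile_outliers(series, window=20, c=1.5):
    """Indices k whose value lies outside the IQR fence of series[k - window:k]."""
    flagged = []
    for k in range(window, len(series)):
        q1, _, q3 = statistics.quantiles(series[k - window:k], n=4)
        iqr = q3 - q1
        if series[k] < q1 - c * iqr or series[k] > q3 + c * iqr:
            flagged.append(k)
    return flagged


returns = [0.01, -0.01] * 15 + [0.2] + [0.01, -0.01] * 5
print(sliding_quantile_outliers(returns))  # [30]
```

Because every window applies the same fixed rule to single observations, a baseline of this kind tends to flag isolated points spread evenly over time, in contrast with a method that targets abnormal subsequences.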
Figure 1 shows the anomaly eigenvalues of the US stock market over the past decade. The abnormal subsequence set obtained in the test contains 30 objects, i.e. the stock market series contains 30 abnormal patterns, accounting for 3.1% of the total. Figure 2 shows the distribution of isolated points in the US stock market anomaly. The Dow Jones index fluctuation series fluctuated drastically from 2008 to 2010, and a large number of outliers appeared. This paper attributes this to the impact of the global financial crisis on the US financial market: volatility in the capital and money markets made the US stock market behave abnormally. After the financial crisis, global liquidity contracted sharply, and the policies adopted by the US government in response began to take effect. Because the effects of these policies and measures were somewhat uncertain, however, a series of crisis symptoms emerged, such as slower economic growth, slowing exports and financing difficulties for SMEs. These macroeconomic fluctuations inevitably passed through to the stock market. From 2012 onwards, as the US economy improved, stock market volatility was relatively flat. This is only a brief analysis of the abnormal fluctuations from the economic background; the specific factors that caused the anomalies remain unclear. Figure 3 shows the isolated points detected by a conventional method, introduced as a comparison experiment. This traditional method detects isolated points in the stock time series with a sliding quantile gap, and because of its criterion for deciding isolated points, the detected points are evenly distributed.
The fluctuations also show that the traditional method detects isolated points that deviate from the overall distribution of the data well and can roughly detect the predetermined isolated points. As Fig. 2 shows, the rough entropy-based data mining algorithm proposed in this paper can likewise detect isolated points that deviate strongly from the overall distribution, so it retains the function of the traditional method. Unlike the traditional method, however, it shifts attention from single isolated points to anomalies in subsequence patterns. This helps uncover the information implied in stock market anomalies and supports better prediction and decision-making.
Conclusions
Today's era is one of information explosion. As people accumulate ever more information, more and more important knowledge is hidden in it. In the financial sector, too, the challenge is to avoid being overwhelmed by massive amounts of information while still finding useful information in time. With deepening economic globalization and continuing advances in science and technology, data mining is increasingly favored for solving this problem in finance, but its application to US stock market anomalies is still uncommon. Because the US stock market affects the global economy, research on mining its anomalous data has important theoretical and practical significance. This paper therefore studies anomalies in the US stock market and proposes a data mining algorithm based on rough entropy. Because traditional data mining algorithms are limited in handling uncertain data such as time series, the paper first introduces rough entropy theory and applies it to data mining. Then, after abnormal data are found in the stock market data, an isolated point detection method based on density and distance is used to analyze them and extract the hidden useful information. Finally, an experimental analysis of the Dow Jones index of the US stock market over the past ten years shows that the proposed algorithm not only retains the functions of the traditional method but can also mine hidden internal information.
Acknowledgments
This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. 2017CDJSK02YJ06, 2020CDSKXYJG007).
