Abstract

Dear Colleague: Welcome to volume 22(1) of the Intelligent Data Analysis (IDA) Journal.
This issue of the IDA journal, the first for year 22 of our publication, contains 10 articles representing a wide range of topics related to the theoretical and applied research in the field of Intelligent Data Analysis.
The first three articles of this issue are about various aspects of data preprocessing. Ma et al. in the first article of this issue discuss privacy protection in social networking data and argue that most privacy protection systems are based on the simple graph only. In weighted social networks, the weight values on the edges represent the tightness between the nodes, and research based on weights in the privacy protection field is still relatively rare. The authors consider protecting weighted social networks from weight-based attacks and propose a method for such networks, named k-weighted generalization anonymity. To evaluate their proposed method, the authors use a number of publicly available data sets and present their results. Galarus and Angyk in the second article of this group advocate the need for robust solutions to the challenges of spatio-temporal data quality assessment that include and go beyond assessment of accuracy. The authors focus on the development and evaluation of a representative, interpolation-based solution for the assessment of spatio-temporal data quality. When applying it to a real-world meteorological data set, the authors identify numerous problematic sites that had not otherwise been flagged as bad. The authors believe that real data sets contain many problems that, in the absence of an approach like the one proposed here, would largely go unidentified. Ahlayyal et al. in the next article argue that predicting stock market reactions, especially on the basis of released financial news articles and published stock prices, still poses a great challenge to researchers because the prediction accuracy is relatively low.
The objectives of this study are to investigate the performance of five statistical metrics, to introduce feedback variables that capture the interaction between the released news and the published stock prices, and to introduce a prediction model that integrates features from financial news and a stock price value series. The authors perform various experiments in which they demonstrate how to choose the best feature sets as well as the best feedback measure.
The next two articles are about various aspects of unsupervised and supervised machine learning methods in IDA. Hosny et al. in the first article of this group argue that the high dimensionality of data is a challenge in data clustering and that we often need to assign a relative weight to each feature to indicate its importance during the clustering process. The authors propose a co-evolutionary algorithm for the dynamic adjustment of feature weights during data clustering. Their extensive experimental results on several data sets from the UCI Machine Learning Repository indicate the efficacy of their proposed approach. Lee in the next article of this issue argues that a multinomial naive Bayes approach could improve multi-label classification in a number of ways. The author uses the value weighting method, a new fine-grained weighting approach, to calculate the weights of the feature values. Lee then employs a co-training method to incorporate the dependencies among the class values. The results of the experiments show that the proposed approach outperforms other state-of-the-art methods.
And finally, the last five articles of this issue are about enabling technologies in IDA. Aghababaei and Makarehchi in the first article of this group discuss mining Twitter data for crime trend prediction. The authors explore whether a social media context can provide socio-behavior “signals” for a crime prediction problem. The hypothesis in this research is that publicly available crowd data on Twitter may include predictive variables that can indicate changes in crime rates without being limited to the availability of historical crime records for specific locations. The authors have developed a model for crime trend prediction, where the objective is to employ Twitter content to predict the direction of crime rates in a prospective time frame. Their results reveal a correlation between content-based features extracted from the Twitter content and the crime trends. Song et al. in the next article of this group argue that most high-utility itemset discovery algorithms seek patterns in a single table, and few are dedicated to processing data stored using a multi-dimensional model. The authors investigate the problem of mining high-utility itemsets in multi-relational databases and propose two algorithms, RHUI-Mine and RHUI-Growth, for star schema-based data warehouses. Neither of the proposed algorithms materializes the join operation between tables; instead, both make use of the star schema properties. Their experiments show that both RHUI-Mine and RHUI-Growth are effective approaches for mining high-utility itemsets in multi-relational data. Boldt et al. in the eighth article of this issue also discuss crime analytics and investigate the possibility of automatically estimating offenders’ risk exposure and level of pre-crime preparation for residential burglaries.
Such estimations can assist law enforcement agencies when linking crimes into series and thus provide a more comprehensive understanding of offenders and targets, based on the combined knowledge and evidence collected from different crime scenes. In their experiments the authors analyze to what extent 16 machine-learning algorithms could generalize both offenders’ risk exposure and preparation scores from the criminal profilers’ ratings in a large list of residential burglaries. Zheng et al. in the next article propose a novel supervised particle swarm optimization (S-PSO) classification algorithm for fault diagnosis. In order to improve the accuracy of fault diagnosis and obtain globally optimal solutions with a higher probability, the authors propose two strategies: a hybrid particle position updating strategy and a fixed iteration interval intervention updating strategy. Their experimental results demonstrate that the proposed classification algorithm can overcome the problems of classical clustering algorithms, which consider only the similarity of the data rather than their physical meaning. A comparison on a complex engine borescope image texture feature classification task is also conducted. Their results show that the performance of their classification algorithm is robust. And finally, Wang et al. in the last article of this issue observe that, with the emergence of Internet data, there is a clear need for intuitive and adaptive methods to analyze a series of evolutions. They argue that most approaches depend completely on topic models and focus only on whether topics have changed while ignoring the degree of change. The authors propose a framework for topic evolution based on semantic connection, which not only indicates the content similarity between documents but also reflects time decay, supporting an adaptive number of topics and rapid responses to changes in content.
Their results show that their proposed method performs better in reducing redundant topics, avoiding topic suppression, and discerning the disappearance of old topics and the appearance of new ones.
In conclusion, we would like to thank all the authors who have submitted the results of their excellent research to be evaluated by our referees and published in the IDA journal. As usual, in addition to six regular issues, we also have one special issue planned for 2018. We look forward to receiving your feedback along with more and more quality articles in both applied and theoretical research related to the field of IDA.
With our best wishes
Dr. A. Famili
Editor-in-Chief
