Editorial

Abstract

Dear Colleague: Welcome to volume 22(2) of the Intelligent Data Analysis (IDA) journal.

This issue of the IDA journal, the second issue of our twenty-second year of publication, contains 10 articles representing a wide range of topics related to theoretical and applied research in the field of Intelligent Data Analysis.

The first two articles of this issue are about various aspects of data preprocessing. In the first article, Nivetha and Venkatalakshmi discuss sensor data collection on human activity and explain the challenges involved in properly identifying outliers. Outlier detection is used to remove noisy data, discover faulty nodes and distinguish interesting events. The authors introduce an ensemble, or hybrid, outlier removal method to detect abnormal human activities based on mobile sensor data. Their investigation demonstrates that the proposed method, in combination with standard classifiers, outperforms other anomaly detection methods on several quality measures. In the next article of this issue, Yousef and Kim discuss list sampling in large graphs. The authors argue that existing methods fail to match important properties of the original graph and perform poorly in maintaining its topology. They introduce a novel approach that keeps a list of candidate nodes, populated with all the neighbors of the nodes sampled up to that point. They evaluate the effectiveness of their approach on several real-world datasets and show that it surpasses existing state-of-the-art approaches in maintaining the properties of the original graph and retaining its structure.
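The candidate-list idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `list_sample`, the adjacency-dictionary input, and in particular the rule of drawing the next node uniformly at random from the candidate list are assumptions made only for illustration.

```python
import random


def list_sample(adj, k, seed_node, rng=None):
    """Sample up to k nodes by growing a candidate list of neighbours.

    adj: dict mapping each node to an iterable of its neighbours.
    Assumption: the next node is drawn uniformly at random from the
    candidate list; the editorial does not state the selection rule.
    """
    rng = rng or random.Random()
    sampled = [seed_node]
    seen = {seed_node}   # nodes already sampled or already in the list
    candidates = []      # neighbours of sampled nodes, not yet sampled

    def extend(node):
        # Add every not-yet-seen neighbour of `node` to the candidate list.
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                candidates.append(nb)

    extend(seed_node)
    while len(sampled) < k and candidates:
        # Swap a random candidate to the end so it can be popped in O(1).
        i = rng.randrange(len(candidates))
        candidates[i], candidates[-1] = candidates[-1], candidates[i]
        node = candidates.pop()
        sampled.append(node)
        extend(node)
    return sampled
```

Because every new node comes from the neighbourhood of an already sampled node, the sample stays connected to the seed, which is one plausible reason such a scheme would preserve local structure.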

The next four articles are about various forms of unsupervised and supervised machine learning methods in IDA. In the first article of this group, Vu argues that semi-supervised clustering, which integrates side information (seeds or constraints) into the clustering process, is a good strategy for improving clustering results. In this article, a new semi-supervised graph-based clustering method is presented. The idea is based on a graph of the k-nearest neighbours and a measure of local density for the similarity between vertices, which together integrate the seeds into the process of building clusters and hence can improve the quality of clustering. Experiments conducted on real datasets from UCI show that the method can produce good clustering results compared with related techniques, such as semi-supervised density-based clustering. In the next article of this group, Zhang et al. argue that feature clustering is a powerful technique for dimensionality reduction, but one that usually requires the number of clusters to be specified in advance or controlled by parameters. By combining clustering with affinity propagation, the authors propose a new feature clustering algorithm, called APFC, that is targeted at dimensionality reduction. Here, the original features automatically form a set of clusters, and new features can then be extracted from each cluster in three different ways to reduce the dimensionality of the original data. Their experiments and comparisons with three well-established dimensionality reduction methods on a number of datasets show the effectiveness of their method in terms of classification accuracy and computational time. The next article, by Berka, presents a novel approach to the post-processing of association rules based on the idea of meta-learning. A subsequent association rule mining step is applied to the results of “standard” association rule mining. The author thus obtains “rules about rules”, which can help to better understand the association rules generated in the first step. The author defines various types of such meta-rules and reports experiments on benchmark data from the UCI Machine Learning Repository as well as on data from the atherosclerosis risk domain. The last article of this group, by Yu et al., is about an eye detection method based on convolutional neural networks and support vector machines. To improve the detection speed of the system, an eye variance filter (EVF) is constructed to eliminate most non-eye images and so retain fewer candidate eye images. The authors have conducted experiments by applying their model to the BioID, IMM, FERET and ORL face databases. Comparisons with other methods on the same databases indicate that the proposed hybrid model achieves a higher detection accuracy.

Finally, the last four articles of this issue are about novel applications and enabling technologies in IDA. In the first article of this group, Chen et al. note that continuous function optimization is ubiquitous in many branches of science and technology, and that memetic algorithms are a particularly interesting approach to the optimization of continuous, non-linear, multimodal, ill-conditioned or noisy functions. In this work, an improvement of the Wang algorithm is proposed that allows for an adaptive evaluation of the genomic difference between individuals in a way that is independent of the optimization problem and takes into account the stage of the evolutionary process. The proposed algorithm is empirically evaluated on 25 benchmark functions against state-of-the-art memetic algorithms and shows superior performance, which is strong evidence of the relevance of the proposed approach. In the next article of this group, Huang et al. argue that, with the rise of personalized travel recommendation in recent years, automatic analysis and summarization of tourist attractions is of great importance in decision making for both tourists and tour operators. To this end, many probabilistic topic models have been proposed for feature extraction of tourist attractions. However, existing state-of-the-art probabilistic topic models overlook the fact that tourist attractions tend to have distinct characteristics depending on the seasonal context. The authors propose using seasonal contextual information to refine the characteristics of tourist attractions. They also propose a season topic model, based on latent Dirichlet allocation, that can capture meaningful topics corresponding to various seasonal contexts for each attraction. Their detailed experimental study, using collected real-world textual data on tourist attractions, shows the superiority of their approach in providing a representative and comprehensive summarization of each tourist attraction. The next article, by Mo et al., is about an influence-based fast preceding questionnaire model that is based on re-ordering the attributes. The authors argue that the values of low-ranking attributes can be predicted from the values of high-ranking attributes. Therefore, the number of attributes can be reduced without redesigning the questionnaires. In this article, a new function for calculating the influence of the attributes is proposed based on probability theory. The model is verified through a practical application to data from an elderly-care company, in which the authors show that the proposed approach can reduce the number of attributes substantially. In the last article of this issue, Wu et al. note that modularity is commonly used in community detection to evaluate disjoint and overlapping communities. The authors identify and prove two clear defects of modularity evaluation: the non-decreasing contribution of isolated nodes to modularity and the lack of appropriate measures for overlapping communities. They propose a new evaluation criterion that is based on the area under the curve and originates from link prediction with a uniform-structure-information model. The authors evaluate the new criterion on various datasets and find that it can avoid the issues exposed in modularity evaluation.

In conclusion, we would like to thank all the authors who have submitted the results of their excellent research to be evaluated by our referees and published in the IDA journal. As usual, in addition to six regular issues, we also have one special issue planned for 2018, which will be published in the middle of this year. We look forward to receiving your feedback along with more high-quality articles in both applied and theoretical research related to the field of IDA.

With our best wishes,

Dr. A. Famili

Editor-in-Chief