Editorial

Abstract

Dear Colleague: Welcome to volume 22(3) of the Intelligent Data Analysis (IDA) Journal.

This issue of the IDA journal, the third issue of our twenty-second year of publication, contains 12 articles representing a wide range of topics related to theoretical and applied research in the field of Intelligent Data Analysis.

The first four articles of this issue are about various aspects of data understanding. Li and Zhang in the first article of this group emphasize that discovering dependencies between variables has a significant impact on the performance of any exploration task, especially in large datasets. They also emphasize that most of the existing methods are for pairs of variables, rather than triplets. The authors propose a measure of dependence for three-variable relationships whose score equals the determination coefficient, and they apply their proposed approach to a number of real-world data sets with some interesting results. Janeja et al. in the next article of this issue argue that analytics of data across multiple sites and heterogeneous databases remains a challenge for gaining more understanding of the implementation of data mining studies as well as the proper interpretation of results. The authors first provide a mechanism to extract task-relevant data using Master Data Management (MDM) from a clinical trial database and further provide a method for validating findings by collaborative utilization of multiple data mining techniques, namely classification, clustering, and association rule mining. The authors apply their method to a number of real-world life sciences data sets and report implications for validating findings using multiple data mining methods in a collaborative manner. The third article of this issue by Wang et al. is about microblog-oriented interest extraction with both content and network structure. The authors propose a systematic framework for interest extraction that takes into account both the textual and social network information of microblogs to obtain high-quality tags. Their experimental results on one of the most popular microblogs in China demonstrate that their proposed approach makes dramatic improvements over state-of-the-art baselines. Yuan et al. in the last article of this group discuss the topic of task-specific word identification from short texts and argue that existing approaches require well-defined seed words or lexical dictionaries (e.g., WordNet), which are often unavailable for many applications such as social discrimination detection and fake review detection. The authors’ approach is based on exploiting the class labels rather than using seed words or lexical dictionaries. They consider task-specific word and phrase identification as feature learning, where they train a convolutional neural network over a set of labeled texts and use score vectors to localize the task-specific words and phrases. Their experimental results on sentiment word identification show that their approach significantly outperforms existing methods.

The next five articles are about various forms of unsupervised and supervised machine learning methods and their applications in IDA. Liu et al. in the first article of this group argue that clustering could be used in various data exploration tasks to automatically group a given data set into a list of meaningful categories and identify the natural structure of the data. Their proposed approach is a non-parametric algorithm that first creates a cluster based on the entropy of the posterior and then dynamically adjusts the parameters of the base distribution according to the mean of the observed data. Their experimental results indicate that the proposed approach works well on real-world data sets. The next article of this group by Seidpisheh and Mohammadpour is also about data clustering and the influence of similarity measures in hierarchical clustering to uncover patterns in heavy-tailed data. The authors demonstrate how to perform a hierarchical cluster analysis on heavy-tailed data by extending a well-known similarity measure that is based on the correlation to a new similarity measure that is based on the covariation coefficient. V. Nguyen and L. Nguyen in the seventh article of this issue argue that graph classification has many real-life applications and propose a heuristic method to learn a set of classification rules from a set of graph objects. Their approach is based on boosting and the poset order of the Formal Concept Lattice of sub-graphs. Their experiments show that the learned rule sets are compact and comprehensible and achieve high classification accuracy. Rosalas-Salas et al. in the next article of this group propose a data mining framework to improve the understanding of how people allocate their time. This is a multivariate approach in which the authors perform a clustering procedure and subsequently a regression analysis to detect which variables influence individual time use for each discovered cluster. Their results suggest that the impact of various sociodemographic variables on sleep and work depends significantly on the characteristics of the individuals analyzed. The authors also report that proper identification of the most significant variables involved in time allocation decisions would allow researchers to better analyze and interpret their data and results. Rezapour et al. in the last article of this group propose an approach that uses K-Nearest-Neighbor, along with some knowledge-based resources, to design a Word Sense Disambiguation scheme. The authors report that the success of K-Nearest-Neighbor is tightly dependent on two factors: the features used to represent the context in which an ambiguous word occurs, and the distance/similarity measure used for comparison of text vectors. Their evaluation results, with regard to the feature selection and feature weighting strategies, show that the semantic and syntactic features have a significant effect on the classification ability of the system.

And finally, the last three articles of this issue are about fuzzy systems and intelligent decision making in IDA. Saito et al. in the first article of this group argue that the act of reviewing scientific papers is very subjective, but in reality many factors would influence a user’s decision. This can be called social influence bias. The authors pick two factors, “Who” and “When”, and discuss which factor is more influential when a user posts his/her own rating in an online review system. They further propose a weighted multinomial generative model that can learn the factor metric quite efficiently from the vast amount of data already available in many online review systems. The authors evaluate the proposed method and confirm its effectiveness on five review datasets, and empirically clarify that there is no universal solution, but that the social bias does exist. Feng and Wang in the eleventh article of this issue discuss how soft set theory can be used as a general mathematical tool for dealing with uncertainty, where fuzzy soft sets are a combination of fuzzy sets and soft sets. The authors introduce a soft t-discernibility matrix in fuzzy soft sets, which is a combination of the soft discernibility matrix and level soft sets. Their proposed algorithm is evaluated through a comparative analysis in which the effectiveness of their proposed approach is presented. And finally, Du et al. in the last article of this issue argue that the performance of face recognition is always affected by variations of expression, illumination and so on. To address this problem, the authors propose an interval type-2 fuzzy linear discriminant analysis (IT2FLDA) algorithm and incorporate it into a linear discriminant analysis method. The authors claim that their proposed approach is able to minimize the effects of uncertainties, find the optimal projective directions and make the feature subspace discriminating and robust. Their experimental results show that the proposed approach improves the recognition rate and reduces sensitivity to variations when compared to results from existing techniques.

In conclusion, we would like to thank all the authors who have submitted the results of their excellent research to be evaluated by our referees and published in the IDA journal. As usual, in addition to six regular issues, we also have one special issue planned for 2018, which will be published in the middle of this year. We look forward to receiving your feedback along with more and more quality articles in both applied and theoretical research related to the field of IDA.

With our best wishes,

Dr. A. Famili

Editor-in-Chief