Abstract
Introduction
As one of the most common neurobehavioral disorders in school-age children, ADHD has been increasingly studied in recent years. About 5% of school-age children and 2% to 4% of adults suffer from this disorder (Polanczyk & Jensen, 2008), and identifying individuals with ADHD is a complicated procedure that requires clinical assessment with various Hamilton Scales (Safren et al., 2005). More objective observation approaches have therefore been developed to provide a convenient way of discovering potential information about ADHD. As a well-known non-invasive neuroimaging approach, blood-oxygen-level-dependent functional magnetic resonance imaging (BOLD-fMRI) effectively shows brain abnormalities in individuals with ADHD by detecting the different magnetic properties of the oxygenated and deoxygenated forms of hemoglobin (Liu et al., 2010). To further uncover the brain circuit model of ADHD, a functional connectivity (FC) pattern is built from the temporal and spatial coherence of BOLD signals in fMRI data (Bastos & Schoffelen, 2016), and it has successfully identified some fundamental differences between ADHD and healthy control individuals (Hoekzema et al., 2014; Konrad & Eickhoff, 2010). Therefore, given resting-state fMRI data, we focus here on FC analysis for ADHD diagnosis and classification.
In past decades, numerous fMRI-based FC classification methods have been proposed using different machine learning approaches, in which FC alterations are very often treated as the features of ADHD. Early works are dedicated to constructing ADHD classifiers with linear statistical models. For example, a classifier with principal component analysis (PCA) and linear discriminant analysis (LDA) is presented to find ADHD functional connections at the voxel level (Dey, Rao, & Shah, 2012). Later, a functional-anatomical discriminative region model is proposed for ADHD participants, where independent component analysis (ICA) is adopted to deal with the grouped distributed FC networks (Nuez-Garcia et al., 2015). Meanwhile, a spatial discriminant ICA method is performed on the brain functional network to find ADHD features (Tabas, Balaguer-Ballester, & Igual, 2014). From another perspective, as FC describes a topographic map of the brain, more graph-based methods have been attempted in ADHD classification. These methods either consider the definitions of graph measures or design subnetworks to enhance the discrimination between ADHD and control groups. To this end, different graph measures are introduced to explore the attributes of the FC network and form so-called graph distance vectors as the input of a support vector machine (SVM) classifier (Dey, Rao, & Shah, 2014; Siqueira, Junior, Comfort, Rohde, & Sato, 2015). Moreover, a discriminative subnetwork is discussed in Du, Wang, Jie, and Zhang (2016) such that a robust FC pattern can be achieved in ADHD feature extraction. Very recently, deep learning has also shown its ability to find the disordered connections of ADHD. In the FC net method (Riaz, Asad, Al-Arif, et al., 2017), a convolutional neural network (CNN) is employed to describe the FCs of ADHD individuals, which uses a fully connected network to compute the similarity between the extracted features within a Siamese architecture. A three-dimensional CNN model is further given to investigate the local spatial patterns of ADHD from various fMRI data sets (Zou, Zheng, Miao, Mckeown, & Wang, 2017).
Generally speaking, all these methods fall into a traditional two-step classification framework: features or feature networks are first extracted from the given training data, and the test sample is then identified with these features. This framework relies on the assumption that the major features of test samples are contained in the feature space of the training data. Unfortunately, because ADHD databases are limited in size, the features of the training data often cannot represent those of the test samples well in practice. As a result, the framework suffers from low robustness, which seriously hinders a higher ADHD identification accuracy.
In contrast, subspace clustering has attracted considerable attention as a feature extraction and separation tool (Vidal, 2011). It is widely used in various scenarios, for example, face recognition (Wang, Guo, Lei, Zhang, & Li, 2017), motion segmentation (Xia, Sun, Lei, Zhang, & Liu, 2018), and hyperspectral imagery classification (Sun, Zhang, Du, Li, & Lai, 2015). The goal of subspace clustering is to identify low-dimensional subspaces in which the projected features of the given data are discriminative and to assign the data to the corresponding subspaces. Among numerous subspace clustering methods, spectral-type algorithms (Ng, Jordan, & Weiss, 2002) are extremely popular for their flexible affinity matrix selection. Most of them can be divided into two stages. The first stage focuses on learning better affinity matrices to measure the similarities among the given data. Some well-developed clustering methods, such as sparse subspace clustering (SSC; Elhamifar & Vidal, 2013) and low-rank subspace clustering (LRSC; Vidal & Favaro, 2014), form their affinity matrices by exploring the self-expressiveness property of data. In the second stage, spectral clustering is performed with these affinities. Note that a subspace clustering model via a variance regularized ridge regression approach was recently proposed to merge these two stages (Peng, Kang, & Cheng, 2017). This model optimizes the affinity matrix and subspace support vectors in an iterative procedure, which greatly inspires the design of our clustering model. For disease diagnosis, including mental disorders, some reports also indicate that subspace clustering can effectively improve the identification accuracy. For example, in miRNA disease prediction, a graph-regularized subspace clustering approach is incorporated by converging the features of miRNAs with and without associated diseases with their different centralized features (Chen & Huang, 2017). As for ADHD, a transductive maximum margin classification is proposed that employs spectral clustering with a carefully designed affinity matrix (Wang, Li, He, Wong, & Xue, 2016).
Motivated by this recent progress, we propose an ADHD identification method using subspace clustering and binary hypothesis testing. Unlike existing ADHD classification works, our work makes two major contributions.
We perform a subspace clustering model on FC sets of individuals to find discriminative projected features between ADHD group and healthy control group. Several subspace measures are exploited as affinity matrices to enhance the clustering performance in the learned subspace. More importantly, a subspace scatter measure is provided, which can be further treated as a subspace energy evaluation to identify ADHD individuals in our following classification framework.
A novel ADHD classification framework is presented. In a traditional training and testing pipeline, the test data are entirely excluded from the training phase. In contrast, we use partial information from the test data (excluding the test data label) for training. By hypothesizing the binary label (ADHD or control) for the test data, two feature sets of training FC data are generated via a feature selection procedure that employs both training and test data. The corresponding subspace projected feature sets are obtained by our clustering model under the binary hypotheses. The energy of the projected feature set for the training data is observed to increase under the false hypothesis, because the clustering then performs poorly. Therefore, we compare the energies of the two projected feature sets to decide whether the test data belong to an ADHD participant.
Materials and Methods
Image Data Set and Preprocessing
In our work, all resting-state fMRI data are from the ADHD-200 consortium (http://fcon_1000.projects.nitrc.org/indi/adhd200/). There are four categories of participants in this data set: healthy control, ADHD combined, ADHD hyperactive-impulsive, and ADHD inattentive. We combine all ADHD types into one category to investigate the binary classification between ADHD and healthy control participants. In detail, ADHD-200 includes several databases contributed by eight sites. We use data from only four sites, namely NeuroImage (NI), New York University Medical Center (NYU), Kennedy Krieger Institute (KKI), and Peking University (PU), with their information given in Table 1. Note that these sites have different numbers of participants and acquire fMRI data with various scan parameters, which increases the complexity and diversity of the data set.
Summary of Several ADHD-200 Data.
Note. NYU = New York University Medical Center; KKI = Kennedy Krieger Institute; NI = NeuroImage; PU = Peking University.
PU includes three subsets, and PU_1 is the first subset of PU.
In data preprocessing, we obtain the time course values of BOLD signals from the connectome website (www.preprocessed-connectomes-project.org/adhd200/). The preprocessing steps include removal of the first four time points, slice time correction, motion correction (first image taken as the reference), registration on
Pruning operation is also performed on these transformed correlation coefficients with an absolute threshold
Absolute Threshold Setting for Various Databases.
Note. NYU = New York University Medical Center; KKI = Kennedy Krieger Institute; NI = NeuroImage; PU = Peking University.
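As a rough illustration of the FC construction and pruning described in this subsection, the following sketch computes Pearson correlations between regional BOLD time courses, transforms them, and zeroes the entries whose magnitude falls below an absolute threshold. The Fisher z-transform, the function names `build_pruned_fc` and `fc_to_feature_vector`, the random toy signals, and the threshold value 0.1 are assumptions made for illustration only; the thresholds actually used for each database are not reproduced here.

```python
import numpy as np

def build_pruned_fc(time_courses, threshold):
    """Build a pruned FC matrix from BOLD time courses.

    time_courses: array of shape (n_timepoints, n_regions).
    threshold: absolute threshold applied to the transformed coefficients.
    """
    corr = np.corrcoef(time_courses, rowvar=False)
    corr = 0.5 * (corr + corr.T)               # enforce exact symmetry
    np.fill_diagonal(corr, 0.0)                # ignore self-connections
    corr = np.clip(corr, -0.999999, 0.999999)  # keep arctanh finite
    z = np.arctanh(corr)                       # Fisher z-transform (assumed)
    z[np.abs(z) < threshold] = 0.0             # pruning operation
    return z

def fc_to_feature_vector(fc):
    """Flatten the upper triangle of a symmetric FC matrix into a vector."""
    rows, cols = np.triu_indices_from(fc, k=1)
    return fc[rows, cols]

# Toy example: 120 time points, 5 regions (hypothetical sizes).
rng = np.random.default_rng(0)
bold = rng.standard_normal((120, 5))
fc = build_pruned_fc(bold, threshold=0.1)
features = fc_to_feature_vector(fc)
```

Each participant is then represented by one such feature vector, from which the selected features are later drawn.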
Definitions of Subspace Measure
Some notations are first introduced for our subspace clustering model. We, respectively, define
Subspace scatter measure
It describes the intra-class clustering performance of each group. Inspired by Hou, Nie, Zhang, and Wu (2009), a scatter kernel
where
To discover the intrinsic meaning of this scatter measure, the selected feature means of ADHD and control groups are, respectively, denoted as
In LDA, Equation 3 is known as the within-class scatter matrix. Therefore, our measure of Equation 2 can be viewed as the projection of within-class scatter matrix in the subspace. Interestingly, by casting Equation 3 in Equation 2, we further obtain another form of subspace scatter measure as follows:
where given a vector set
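The relation between the within-class scatter matrix and its projection in the subspace can be checked numerically. The sketch below assumes that the subspace scatter measure takes the trace form tr(W^T S_w W), with W a basis matrix and S_w the within-class scatter matrix of the selected features; this form, the function names, and the random toy data are illustrative assumptions rather than the exact definitions of Equations 2 to 4.

```python
import numpy as np

def within_class_scatter(X_adhd, X_ctrl):
    """Within-class scatter matrix; columns are participants."""
    n_features = X_adhd.shape[0]
    S = np.zeros((n_features, n_features))
    for X in (X_adhd, X_ctrl):
        centered = X - X.mean(axis=1, keepdims=True)
        S += centered @ centered.T
    return S

def subspace_scatter(W, X_adhd, X_ctrl):
    """Assumed trace form: projection of the scatter matrix onto W."""
    return np.trace(W.T @ within_class_scatter(X_adhd, X_ctrl) @ W)

def subspace_scatter_pointwise(W, X_adhd, X_ctrl):
    """Equivalent form: squared distances of projected features to group means."""
    total = 0.0
    for X in (X_adhd, X_ctrl):
        Y = W.T @ X
        total += np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2)
    return total

rng = np.random.default_rng(1)
X_adhd = rng.standard_normal((6, 8))    # 6 features, 8 ADHD participants
X_ctrl = rng.standard_normal((6, 10))   # 6 features, 10 control participants
W, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # orthonormal 2-D basis

scatter_trace = subspace_scatter(W, X_adhd, X_ctrl)
scatter_points = subspace_scatter_pointwise(W, X_adhd, X_ctrl)
```

Under these assumptions the two values agree up to floating-point error, which is why the scatter measure can be read as an energy of the projected features around their group means.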
Subspace correlation measure
It evaluates the similarity between ADHD and healthy control participants as an interclass measure. An indicator correlation kernel is defined as follows:
where
In Equation 6, the correlation measure can be minimized to 0, when projected features of ADHD and control groups are orthogonal to each other in the subspace.
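The orthogonality statement can be illustrated with a small numerical example. The sketch assumes that the correlation measure is the squared Frobenius norm of the cross-products between projected ADHD and control features; this form and all names below are hypothetical stand-ins, not the exact kernel of Equation 6.

```python
import numpy as np

def subspace_correlation(W, X_adhd, X_ctrl):
    """Assumed interclass measure: sum of squared inner products
    between projected ADHD and projected control features."""
    Y_adhd = W.T @ X_adhd
    Y_ctrl = W.T @ X_ctrl
    return np.sum((Y_adhd.T @ Y_ctrl) ** 2)

# Basis spanning the first two coordinates of a 3-D feature space.
W = np.eye(3)[:, :2]

# ADHD features project onto the first basis vector only,
# control features onto the second one only.
X_adhd = np.array([[1.0, 2.0],
                   [0.0, 0.0],
                   [0.5, 0.1]])
X_ctrl = np.array([[0.0, 0.0, 0.0],
                   [1.0, -1.0, 3.0],
                   [0.2, 0.4, 0.6]])
orthogonal_value = subspace_correlation(W, X_adhd, X_ctrl)

# Control features that share the first direction are no longer orthogonal.
X_mixed = X_ctrl.copy()
X_mixed[0, :] = 1.0
overlapping_value = subspace_correlation(W, X_adhd, X_mixed)
```

In this toy case the measure is exactly zero for the orthogonal configuration and strictly positive once the two groups share a direction in the subspace.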
Graph embedding measure
It focuses on keeping the same graph structure of the data in the subspace as in the original space, so that the smoothness attributes of subspace signals are preserved. In detail, the core idea of graph embedding is to design a suitable adjacency matrix and then derive its Laplacian matrix. Although various adjacency matrices have been tried in ADHD classification with geometric, functional, and mixed graph forms (Menoret, Farrugia, Pasdeloup, & Gripon, 2017), we adopt a fully connected adjacency matrix to strengthen the relationship of participants in each group and to cope with some uncertainty factors. These uncertainties come not only from the noise-disturbed selected features but also from the instability of the brain network under the pruning operation of FC. Thus, we present the graph embedding measure as follows:
where
where
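A minimal sketch of a graph embedding measure with a fully connected adjacency matrix inside each group is given below. It assumes unit edge weights, the combinatorial Laplacian L = D - A, and the standard trace form tr(Y L Y^T) on the projected features Y; these choices and the function names are assumptions for illustration, and the exact weights of Equations 7 to 9 are not reproduced.

```python
import numpy as np

def group_adjacency(labels):
    """Fully connect participants that share a label (unit weights assumed)."""
    labels = np.asarray(labels)
    A = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)
    return A

def graph_laplacian(A):
    """Combinatorial Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def graph_embedding_measure(W, X, labels):
    """Smoothness of the projected features on the group graph."""
    Y = W.T @ X
    L = graph_laplacian(group_adjacency(labels))
    return np.trace(Y @ L @ Y.T)

labels = [1, 1, 1, 0, 0]                 # 1 = ADHD, 0 = control (toy labels)
rng = np.random.default_rng(2)
X = rng.standard_normal((6, 5))          # 6 features, 5 participants
W, _ = np.linalg.qr(rng.standard_normal((6, 2)))

measure = graph_embedding_measure(W, X, labels)

# Equivalent pairwise form: half the weighted sum of squared distances.
Y = W.T @ X
A = group_adjacency(labels)
pairwise = 0.5 * sum(
    A[i, j] * np.sum((Y[:, i] - Y[:, j]) ** 2)
    for i in range(len(labels))
    for j in range(len(labels))
)
```

The pairwise form makes the intent visible: the measure is small when participants of the same group stay close to each other after projection.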
Proposed Subspace Clustering
We perform a subspace clustering model on the labeled participants as follows:
where
In Equation 10, the affinity matrix of spectral-type subspace clustering is identified as
As Equation 10 provides a standard constrained quadratic form, the solution of
where
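For readers unfamiliar with constrained quadratic forms, the following sketch shows the textbook solution under an assumed orthonormality constraint W^T W = I: the minimizer of tr(W^T M W) is formed by the eigenvectors of M with the smallest eigenvalues. The matrix M here is a random symmetric stand-in for the combined affinity matrix, and the constraint and function name are assumptions; the exact constraint of Equation 10 is not reproduced.

```python
import numpy as np

def solve_constrained_quadratic(M, n_basis):
    """Minimize tr(W^T M W) subject to W^T W = I (assumed constraint).

    Returns the n_basis eigenvectors of M with the smallest eigenvalues
    together with those eigenvalues.
    """
    M = 0.5 * (M + M.T)                    # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return eigvecs[:, :n_basis], eigvals[:n_basis]

rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6))
M = B @ B.T                                # toy symmetric affinity matrix

W, smallest = solve_constrained_quadratic(M, n_basis=2)
objective = np.trace(W.T @ M @ W)
```

The number of retained eigenvectors plays the role of the basis number, that is, the dimension of the learned subspace.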
Framework of Subspace-Based Classification
We design a subspace-based framework for ADHD classification in Figure 1. It includes four steps, that is, feature selection, subspace clustering, subspace energy estimation, and ADHD decision. Moreover, the test data sample is initially assumed to belong to a healthy control (

Framework of ADHD subspace-based classification.
As for the subspace energy estimation, it is the core step in our framework. Here, the total subspace energy of each hypothesis is evaluated and labeled as
where
In Equation 13, the first two terms can be treated as the direct-current component of the total energy, whereas the remaining terms form the alternating component. Note that the value of the direct-current component is in practice far less than that of the alternating component. By neglecting the direct-current component of Equation 13, the total subspace energy is approximated by the subspace scatter measure as follows:
Thus, the total subspace energies under the binary hypotheses
We finally compare these energies in the ADHD decision stage and identify the test sample under the true hypothesis by
where the energy threshold
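The decision stage can be sketched as a comparison of two scalar energies. In the sketch below, `energy_h0` and `energy_h1` denote the total subspace energies under the healthy control hypothesis and the ADHD hypothesis, respectively, and the threshold is applied to their difference; the function name, the argument names, and the default threshold of zero are assumptions for illustration.

```python
def decide_label(energy_h0, energy_h1, threshold=0.0):
    """Pick the hypothesis with the lower subspace energy.

    energy_h0: energy when the test sample is assumed to be a healthy control.
    energy_h1: energy when the test sample is assumed to be an ADHD participant.
    threshold: margin by which the control-hypothesis energy must exceed
               the ADHD-hypothesis energy before ADHD is declared (assumed).
    """
    if energy_h0 - energy_h1 > threshold:
        return "ADHD"
    return "control"

# The false hypothesis scatters the training participants and raises the energy.
label_a = decide_label(energy_h0=2.0, energy_h1=1.0)
label_b = decide_label(energy_h0=1.0, energy_h1=2.0)
label_c = decide_label(energy_h0=2.0, energy_h1=1.5, threshold=1.0)
```

A nonzero threshold biases the decision toward one class, which may be useful when a database is unbalanced.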
Results
Experimental Setting
We give a set of performance evaluations on the data sets in Table 1. The identification accuracy is tested with leave-one-out cross-validation. In our experiments, the first 110 ranked features are selected from individual FCs with SVM-RFE. The weighting coefficients
The proposed classification method is compared with several state-of-the-art ones, that is, graph fMRI (Dey et al., 2014), fusion fMRI (Riaz, Asad, Alonso, & Slabaugh, 2017), FC net (Riaz, Asad, Al-Arif, et al., 2017), and Deep fMRI (Riaz et al., 2018). The graph fMRI method treats the FC matrix as a graph and identifies ADHD participants by using a set of graph attribute parameters. As for the fusion fMRI, it introduces an affinity propagation clustering approach to obtain more reliable weighted brain connections. Meanwhile, some nonimaging data information, including IQ and gender, is additionally employed to achieve its enhanced identification accuracy for ADHD diagnosis. In the FC net, the classical CNN structure is used to train the given FCs. Both feature extractor and similarity networks are constructed to form robust FCs from BOLD signals. But it still adopts the SVM as the final classifier. The deep fMRI can be viewed as an advanced version of FC net, where a self-learned classification network replaces the SVM. Therefore, the deep fMRI is wholly designed by the CNN from feature extraction to classification procedures.
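The leave-one-out protocol with binary hypotheses can be sketched as follows. The energy function used here is a deliberately simple stand-in (a within-class scatter computed after adding the test sample under the hypothesized label); it does not implement the SVM-RFE feature selection or the subspace clustering model, and the names `toy_energy` and `leave_one_out_accuracy` as well as the synthetic data are hypothetical.

```python
import numpy as np

def toy_energy(X_train, y_train, x_test, hypothesis):
    """Stand-in energy: within-class scatter when the test sample is
    added to the training set under the hypothesized label."""
    X = np.vstack([X_train, x_test])
    y = np.append(y_train, hypothesis)
    energy = 0.0
    for label in (0, 1):
        group = X[y == label]
        energy += np.sum((group - group.mean(axis=0)) ** 2)
    return energy

def leave_one_out_accuracy(X, y, energy_fn):
    """Hold out each participant once and test both hypotheses."""
    correct = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        e_control = energy_fn(X[keep], y[keep], X[i], 0)
        e_adhd = energy_fn(X[keep], y[keep], X[i], 1)
        predicted = 0 if e_control <= e_adhd else 1
        correct += int(predicted == y[i])
    return correct / len(y)

# Synthetic, well-separated groups (rows are participants).
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 0.3, (10, 4)),
               rng.normal(3.0, 0.3, (10, 4))])
y = np.array([0] * 10 + [1] * 10)        # 0 = control, 1 = ADHD

accuracy = leave_one_out_accuracy(X, y, toy_energy)
```

The point of the sketch is the protocol: the held-out label is never used, but the held-out sample takes part in both hypothesis-specific computations.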
Subspace Clustering Demonstration
The subspace clustering performance is shown in Figure 2. We learn the subspaces and cluster the participants for NYU and PU databases, wherein geometry structures of participants are visualized by the isomap algorithm in GSP toolbox (Perraudin et al., 2016).

Comparison of subspace clustering on NYU and PU databases.
Figure 2 shows that a chaotic relationship exists among the original selected feature data of participants. However, with our subspace clustering model, ADHD and control groups can be effectively identified in the learned subspace. In other words, these subspaces provide highly discriminative projected features that separate ADHD and control groups from each other. This suggests that our subspace clustering method has a potential advantage in identifying ADHD participants.
We also evaluate our subspace-based ADHD decision from the view of participant energies. The energy comparison under different hypotheses is shown in Figure 3. In this test, we first select two test samples, that is, one for control group and the other for ADHD group. For each test sample, the geometry structures of its training participants and the corresponding participant energies are given in the learned subspace. In Figure 3, various geometry structures are formed in their learned subspaces due to the different selected feature sets under the binary hypotheses.

Energy difference comparison of training participants with
Note that under the false hypothesis, the training participants are seriously scattered, which results in poor clustering performance. In this case, the subspace energies of participants increase. In contrast, the minimal subspace energies of participants statistically accompany the true hypothesis. For example, in Figure 3a, when the healthy control (
Dimension of Subspace
To achieve a better subspace, we test the accuracy performance with a variety of basis numbers in Figure 4. It is found that with the small basis number,

Accuracy comparison with different basis numbers.
Classification Comparison
Various measures of ADHD classification, including specificity, sensitivity, and accuracy, are given for several databases in Figure 5. Our method achieves strong classification performance, with an average accuracy of 92.4% over these databases. Note that NYU and PU achieve a better performance. They benefit from the large size of their databases, as the subspaces become more reliable when the number of participants increases. Conversely, lower sensitivities are obtained for PU_1 and KKI. Because these two databases suffer from participant imbalance, the subspace clustering model cannot effectively extract the projected features of ADHD participants, which leads to worse outcomes. As for NI, where the participants are balanced, the specificity and sensitivity scores are approximately equal. Although NI has a small database size, an acceptable accuracy is still achieved.
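For reference, the three measures can be computed from predicted and true labels as in the sketch below, where ADHD is taken as the positive class. The function name and the toy labels are illustrative and do not correspond to any reported result.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(pairs) if pairs else float("nan")
    return sensitivity, specificity, accuracy

# Toy unbalanced example: 3 ADHD participants (1) and 5 controls (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
sensitivity, specificity, accuracy = classification_metrics(y_true, y_pred)
```

In this toy case the accuracy (0.625) hides a low sensitivity (1/3), which mirrors how an unbalanced database can depress sensitivity while specificity (0.8) stays high.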

Comparison of group classification on various databases.
We further compare our method with the state-of-the-art methods mentioned in the experimental setting, and the corresponding results are shown in Table 3. Although these existing methods adopt different approaches to either enhance the brain connectivity or adaptively learn FC features, the traditional classification framework hinders their performance. In brief, they focus only on the attributes of training data, whereas some useful information in the test sample is neglected. To address this issue, our method identifies test samples by binary hypothesis testing instead of the traditional procedure. As a result, partial information of the test sample is used during feature selection for the training data, and the subspace clustering approach gives an effective scatter measure to recognize ADHD participants through energy detection. Therefore, our method achieves the best performance among these methods. It is worth mentioning that, although the fusion fMRI method obtains a remarkable accuracy of 86.7% on KKI, it uses some nonimaging information, including gender and IQ, whereas our method requires no additional data.
Accuracy Comparison With Various Methods.
Note. NYU = New York University Medical Center; PU = Peking University; KKI = Kennedy Krieger Institute; NI = NeuroImage; fMRI = functional magnetic resonance imaging; FC = functional connectivity.
Bold values indicate the best results among the tested methods on each database.
Discussion
Graph Structure of ADHD and Control Groups
We find an interesting phenomenon: the graph structures in the subspace are nearly the same for the databases in Figure 2 and for the training data sets under the true hypothesis in Figure 3. In these figures, ADHD and control groups show extremely different intra-class structures, that is, healthy control participants are arranged almost along a line, whereas ADHD participants are scattered in a fan-shaped area. The disordered functional connections of ADHD participants usually have some special modulations, which vary with ADHD type and severity, whereas these connections remain stable for control participants. In our classification framework, the SVM-RFE approach generates the selected features by exploiting this attribute, so the projected features in the learned subspaces also contain this information. As a result, the control group has a line structure, in which only limited intensity changes exist in the projected features. The structure of the ADHD group, in turn, derives from the diversity of ADHD. Although each ADHD type has its typical modulation, no clear boundary lies among these types, which means that one ADHD type can evolve into another through a smooth feature transition. Consequently, a fan-shaped structure is formed to represent this evolution among the projected features. In a nutshell, the graph structure illustrates the attribute difference between ADHD and control groups and supports the validity of our subspace-based classification to some extent. This attribute may further be used in multiclass classification for children with ADHD.
Analysis on Discriminative FC Contribution
Some postprocessing is applied to the classification results of various databases to discover the discriminative FC contribution between ADHD and control groups. We count the weighted appearance frequency of each functional connection and define it as a measure of the FC contribution. In detail, for each test sample, we record the selected functional connections corresponding to its identified true hypothesis. Then the total appearance frequency of these functional connections is calculated. The weighted appearance frequency is given as follows:
where

Discriminative FC contribution between healthy control and ADHD participants.
In Figure 6, the parietal (pre)motor lobe is affected the most; it is associated with movement intention and motor awareness, as mentioned in Desmurget and Sirigu (2009). A recent report also shows that connectivity reduction exists in the intertemporal lobe for ADHD participants (Riaz, Asad, Al-Arif, et al., 2017). In our result, there are indeed more FC contributions in this lobe, which supports such a reduction to some extent. In addition, we obtain several of the most distinct regions from the FC contribution statistics. These regions are mainly divided into three parts. The first part, including the right pars opercularis of the inferior frontal gyrus and the left/right superior parietal gyrus, is known to directly control body and hand movements, and different fMRI task experiments have been designed to confirm its relationship to ADHD (Christakou et al., 2013; Morein-Zamir et al., 2014). The second part is in charge of human sense and decision and includes the right supramarginal gyrus, the right orbital part of the middle frontal gyrus, and the right superior temporal gyrus. These regions guide children's actions and responses to the surrounding environment. Various reports have shown that ADHD participants have some damage in these regions (Durston et al., 2003; Tomoda et al., 2011; Wolfgang et al., 2010). The third part contains three regions, that is, the right olfactory gyrus, the right gyrus rectus, and the right rolandic operculum. The olfactory gyrus affects memory, and the working memory of children with ADHD is often impaired (Cockcroft, 2011). The rolandic operculum is involved not only in painful sensation, speech perception, and language processing but also in a part of motor execution. Pastura, Mattos, Gasparetto, and Araujo (2011) show that ADHD participants have abnormal cortical thickness in the rolandic operculum region. As for the gyrus rectus, altered gray matter in this region has been reported for ADHD (Griffiths et al., 2016; Stevens & Haneycaron, 2012). Gender is also reported to have a significant effect on the gyrus rectus, that is, it is more easily activated in males than in females (Benwell, Balfour, & Anderson, 1988). This may help explain the phenomenon that the ADHD risk is 2.3 times higher for boys than for girls (Bauermeister et al., 2007). The above analysis of the FC contribution result demonstrates that our classification method reveals some brain circuits that are in accordance with the anatomical changes of ADHD.
Conclusion
We propose a subspace clustering method and apply it in an ADHD classification framework with binary hypothesis testing. The subspace clustering model combines various subspace measures as affinity matrices, so that discriminative projected features can be extracted in the learned subspace to identify ADHD participants. In the classification framework, partial information of the test sample is employed with binary hypotheses in the feature selection of training data, and the ADHD decision is made via subspace energy detection. The experiments show that our method outperforms the existing classification methods, with an average accuracy above 90% under leave-one-out cross-validation. Moreover, the discriminative FC contribution analysis supports the reliability of our method and reveals some useful brain circuits.
Footnotes
Declaration of Conflicting interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by National Natural Science Foundation of China under Grant 81571344 and 61501169; the Key Research and Development Program of Jiangsu, China, under Grant BE2017071 and BE2017647; and “Six Peaks of Talents” support program of Jiangsu, China, under Grant 2016-WSN-109.
