Abstract
Deep Neural Networks (DNNs) have powerful recognition abilities for classifying different objects. Although DNN models can reach very high accuracy, even beyond human level, they are regarded as black boxes that lack interpretability. During training, DNNs automatically extract abstract features from high-dimensional data such as images. However, the extracted features are usually mapped into a representation space that is not aligned with human knowledge. In some applications, such as medical diagnosis, interpretability is necessary. To align the representation space with human knowledge, this paper proposes a kind of DNN, termed Conceptual Alignment Deep Neural Networks (CADNNs), which can produce interpretable representations in the hidden layers. In CADNNs, some hidden neurons are selected as conceptual neurons to extract human-formed concepts, while the other hidden neurons, called free neurons, can be trained freely. All hidden neurons contribute to the final classification results. Experiments demonstrate that CADNNs can keep up with the accuracy of DNNs, even though CADNNs have the extra constraints of conceptual neurons. Experiments also reveal that, in some cases, the free neurons can learn concepts aligned with human knowledge.
Introduction
Deep Neural Networks (DNNs) are large Artificial Neural Networks (ANNs) stacked with many layers. DNNs perform very well at reducing high-dimensional data to a compact representation space [13]. In various recognition tasks, such as computer vision [12, 25] and speech recognition [9, 14], DNNs show a powerful ability to fit high-dimensional data. Given enough training data, DNNs trained by an optimization algorithm can automatically adjust their parameters and reach very high classification accuracy. However, it is generally very difficult for human beings to explain the meanings of the numerous parameters and of the outputs of the hidden neurons. Therefore, DNN models are usually regarded as black boxes that cannot explain which features they rely on to classify objects. Similarly, human beings often cannot explain how they identify some objects. For example, Traditional Chinese Medicine (TCM) practitioners find it difficult to explain how they diagnose patients based on subtle observations. Nevertheless, model interpretability is necessary for many applications, especially in clinical diagnosis. People regard an interpretable predictive model as more dependable than an opaque model that delivers results without supporting reasoning. Therefore, model interpretability is crucial for wide adoption in medical research and clinical decision-making.
To make the recognition process explicit, human beings form many effective abstract concepts that distinguish different classes of objects. For example, a patient who has a pale face, a skinny body, a pale tongue and a thin tongue coating might be classified into the Qi-deficiency body constitution type in TCM [8, 26]. Before the advances in deep learning, most machine learning methods depended heavily on feature engineering, which is a way to take advantage of human knowledge. Human beings can usually discover the underlying explanatory factors hidden in observed sensory data and form abstract concepts from them. In other words, human beings can map observed sensory data into a representation space described by abstract concepts, and reach a consensus on recognition tasks. Compared with a black-box model that provides results without any reasoning, an accurate and interpretable model is more dependable and attractive. This motivates many deep learning researchers to explore the representations in deep models [3]. But why can deep models not learn some common sense alongside human beings? A direct and effective way may be to make deep models learn human-formed concepts so that they form similar cognitions.
This paper proposes a training method that makes DNNs map sensory data into a representation space aligned with human concepts. Making DNNs learn human-formed concepts has two advantages. First, the DNNs become interpretable in terms of human knowledge. Second, with prior knowledge available, the DNNs become easier to train on a small amount of training data. Moreover, the DNNs become easier to debug, because the classification results can be traced back to the conceptual neurons. In the proposed Conceptual Alignment Deep Neural Networks (CADNNs), some neurons in the hidden layers are chosen as conceptual neurons. These neurons are used to extract designated features related to the corresponding human-formed concepts. By introducing the conceptual neurons, the classification results of CADNNs become interpretable, and the explanatory factors that contribute to the classification results can be checked. The main contributions of this paper are as follows: (1) A new kind of DNN, called CADNNs, is proposed. CADNNs have the attractive property of interpretability while keeping accuracy as high as that of DNNs. (2) The framework and objective of CADNNs are formalized and analyzed based on DNNs, and the training procedure is provided. (3) Experiments are designed to test different CADNN architectures; the results demonstrate that the conceptual neurons can learn effective representations of abstract concepts. Furthermore, some experiments show that, in some cases, the free neurons can also learn representations corresponding to human-formed concepts.
The remainder of this paper is structured as follows. Section 2 provides an overview of the related work. Section 3 describes the methods including the general framework, objective and training method. Section 4 depicts experimental results, whilst discussions and comments are presented in Section 5. Finally, conclusions and directions for future work are presented in Section 6.
Related work
With the increasing popularity of Deep Neural Networks (DNNs), a number of studies have attempted to explore the representation space of the hidden layers of DNNs. Zeiler et al. [29, 30] use deconvolutional networks to explore low-level and mid-level image representations. Yu et al. [28] successfully learn hierarchical image representations from the pixel level via hierarchical sparse coding. Zhu et al. [31] propose a deep model, termed multi-view perceptron, for learning face identity and view representations. To learn robust representations of human physiology, Che et al. [5] propose to use prior knowledge to regularize the parameters in the topmost layers. More recently, Che et al. [6] propose an interpretable mimic learning method that distills knowledge from DNNs via Gradient Boosting Trees (GBT) to learn interpretable models and strong prediction rules. Bengio et al. [4] provide a comprehensive review of representation learning. Although more researchers recognize the relationship between the representations of DNNs and human-formed concepts, there is no clear method for incorporating prior knowledge of human-formed concepts into DNNs.
Transfer learning is motivated by the fact that human beings can apply previously learned knowledge to solve new problems faster or with better solutions [18]. It also confirms that DNNs can learn effective representations for distinguishing different objects, even across different classification tasks [3]. Ahmed et al. [2] propose using transfer learning to improve the training of hierarchical DNNs, and their experiments show that transfer learning substantially improves the quality of Convolutional Neural Networks (CNNs) by incorporating useful prior knowledge. Long et al. [16] propose a Transfer Sparse Coding (TSC) approach to construct robust sparse representations for images. Oquab et al. [17] design a method that reuses the layers of DNNs trained on the ImageNet dataset to compute mid-level image representations for images in the PASCAL VOC dataset. They demonstrate that the transferable representations are useful in various kinds of tasks. However, this work differs in that our aim is to make DNNs generate representations that are not only effective but also interpretable.
Generative Adversarial Networks (GANs) [10] learn to generate observations from a compact low-dimensional representation space. Recently, GANs have shown promising results in learning hierarchical representations [19]. InfoGAN [7], a variant of GAN, has been shown to learn interpretable representations. In experiments, it discovers visual concepts such as hair styles, the presence of eyeglasses and emotions [7]. Reed et al. [20, 21] demonstrate that GANs can generate images from human visual concepts. Reed et al. [22] also propose models combining CNNs and Long Short-Term Memories (LSTMs) to relate images to fine-grained and category-specific language concepts. However, the interpretable representations produced by GANs are still very limited.
Methods
The information processing of Deep Neural Networks (DNNs) consists of transmitting key information layer by layer from the input layer to the output layer. It can also be regarded as a transformation of the representation space, in which the activation codes of the neurons in one layer activate the neurons in the next layer. For recognition tasks, the layers of DNNs tend to have fewer neurons from the input layer to the output layer. These DNNs therefore reduce the high-dimensional input data to a compact low-dimensional representation space. Generally, only the representation space of the output layer is meaningful to human beings. The output neurons are trained to be aligned with the category labels, which are usually high-level abstract concepts. It is worth noting that the categories are also human-formed concepts. Human-formed concepts can generally be organized hierarchically: fine-grained concepts form high-level abstract concepts. Accordingly, human-formed concepts should be hierarchically distributed in a DNN. Based on this idea, a framework is devised as follows.
General framework
General DNNs are illustrated in Fig. 1. They are trained to fit the input and output variables, whereas the variables represented by the neurons in the hidden layers are of no concern. This paper proposes to train DNNs that have hierarchical, interpretable hidden neurons aligned with human-formed concepts, termed Conceptual Alignment Deep Neural Networks (CADNNs), as illustrated in Fig. 2. The method assigns meanings to some selected hidden neurons, called conceptual neurons. The remaining hidden neurons, called free neurons, are trained in their own way. The number of conceptual neurons in each hidden layer depends on the specific application: a hidden layer may contain no conceptual neurons or consist entirely of conceptual neurons. The conceptual neurons need to be assigned manually according to human knowledge. Conceptual neurons in shallow hidden layers should represent low-level abstract concepts, and those in deep hidden layers should represent high-level abstract concepts.

The illustration of a general DNN. The circles represent the neurons of DNN. Only the neurons of input layer and output layer are assigned meanings, and they are drawn as shaded circles.

The illustration of a CADNN. The circles represent the neurons. The shaded circles in hidden layers are conceptual neurons, and the hollow circles are free neurons, which can be trained freely.
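The split between conceptual and free neurons can be sketched as a forward pass through a network with one hidden layer. This is only an illustration under stated assumptions: the sigmoid hidden activation, the convention that the first `n_concept` hidden units are the conceptual neurons, and the names `init_params` and `forward` are hypothetical choices made here and are not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(n_in, n_hidden, n_out, seed=0):
    """Small random weights and zero biases for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.01, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.01, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, x, n_concept):
    """Return conceptual activations, free activations and class probabilities.

    ASSUMPTION: the first n_concept hidden units are the conceptual neurons
    and the remaining hidden units are free neurons.
    """
    h = sigmoid(x @ params["W1"] + params["b1"])
    concept = h[:, :n_concept]  # to be aligned with human-formed concepts
    free = h[:, n_concept:]     # trained freely
    # All hidden neurons, conceptual and free, feed the output layer.
    y = softmax(h @ params["W2"] + params["b2"])
    return concept, free, y
```

Because the output layer reads the whole hidden vector, the free neurons can carry information that the conceptual neurons do not represent.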
An m-layer CADNN can be defined as Net (
The parameters of f (
Given a training sample
For a training data set (
Once an objective function such as Equation (9) is chosen, the parameters of a CADNN model can be trained by optimization methods such as Stochastic Gradient Descent (SGD). To use these optimization methods, the partial derivatives of the objective function with respect to the parameters need to be calculated. The Back-Propagation (BP) algorithm [23], which is a common method for training multi-layer artificial neural networks, can also be used to train CADNNs. The BP algorithm uses the chain rule to calculate the partial derivatives. First, a feedforward pass runs from the input layer to the output layer, and the loss values of the output neurons are calculated. Then, the loss values are propagated backwards and used to calculate the partial derivatives with respect to the parameters of each layer. The partial derivatives of the weights
And they can directly compute the results according to Equation (7). The partial derivatives of the weights
Each
In addition, the conceptual neurons of CADNNs can be pre-trained layer by layer from the input layer to the output layer. Once the training of a layer is finished, the learning rate for the parameters related to its aligned conceptual neurons needs to be decreased in the subsequent training process, and the complete CADNN should be fine-tuned at the end.
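One way to realize the reduced learning rate is a per-neuron step size in the SGD update, sketched below. The function name `scaled_sgd_update`, the factor `concept_scale`, and the reading of "parameters related to the conceptual neurons" as their incoming weights and biases are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np


def scaled_sgd_update(W, b, grad_W, grad_b, n_concept, alpha=0.1, concept_scale=0.1):
    """One SGD step with a smaller learning rate for pre-trained conceptual neurons.

    W has shape (n_in, n_hidden). ASSUMPTION: columns 0..n_concept-1 hold the
    incoming weights of the conceptual neurons, and concept_scale is a
    hypothetical reduction factor applied to their learning rate.
    """
    lr = np.full(W.shape[1], alpha)
    lr[:n_concept] *= concept_scale   # slow down the aligned conceptual neurons
    W_new = W - grad_W * lr           # lr broadcasts over the rows of W
    b_new = b - grad_b * lr
    return W_new, b_new
```

Setting `concept_scale` to 1 recovers plain SGD, and setting it to 0 freezes the conceptual neurons entirely.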
Training dataset (
Parameters (
1: Randomly set the initial parameters (
2:
3:
4: Sample a mini-batch (
5: Use BP algorithm to compute the partial derivatives according to Equations (12) and (13):
g_w ← ∇_w J(…)
g_b ← ∇_b J(…)
6: Update the corresponding parameters:
w ← w − α · g_w, b ← b − α · g_b.
7:
8:
9: return (
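The training procedure above can be sketched in NumPy for a network with one hidden layer. The objective used here is an assumption, since Equation (9) is not reproduced in this text: it is the cross-entropy of the softmax output plus `lam` times a binary cross-entropy that pulls the first hidden units towards their concept labels. The names `init_params`, `train_step` and `lam`, and the sigmoid hidden activation, are likewise hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(n_in, n_hidden, n_out, seed=0):
    """Step 1: randomly set the initial parameters."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.01, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.01, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def train_step(params, x, t_concept, t_out, lam=1.0, alpha=0.1):
    """One mini-batch update; returns the value of the assumed objective.

    ASSUMPTION: J = CE(output, t_out) + lam * BCE(conceptual neurons, t_concept),
    where the conceptual neurons are the first t_concept.shape[1] hidden units.
    """
    n, k = x.shape[0], t_concept.shape[1]
    eps = 1e-12
    # Feedforward pass.
    h = sigmoid(x @ params["W1"] + params["b1"])
    y = softmax(h @ params["W2"] + params["b2"])
    c = h[:, :k]
    loss_out = -np.sum(t_out * np.log(y + eps)) / n
    loss_con = -np.sum(t_concept * np.log(c + eps)
                       + (1.0 - t_concept) * np.log(1.0 - c + eps)) / n
    # Backward pass (chain rule).
    dz2 = (y - t_out) / n
    grads = {"W2": h.T @ dz2, "b2": dz2.sum(axis=0)}
    dz1 = (dz2 @ params["W2"].T) * h * (1.0 - h)
    dz1[:, :k] += lam * (c - t_concept) / n  # extra signal for conceptual neurons
    grads["W1"] = x.T @ dz1
    grads["b1"] = dz1.sum(axis=0)
    # Parameter update: w <- w - alpha * g_w, b <- b - alpha * g_b.
    for name in params:
        params[name] -= alpha * grads[name]
    return loss_out + lam * loss_con
```

In this sketch the free neurons receive gradients only from the output loss, while the conceptual neurons receive gradients from both terms; `lam` plays the role of a trade-off between alignment of the hidden layer and accuracy of the output layer.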
Dataset
The Fashion-MNIST dataset [27] is used in the experiments. It has the same format as the popular and overused MNIST dataset: a training set of 60,000 examples and a test set of 10,000 examples, where each example is a 28 × 28 grayscale image associated with a label from 10 classes. In contrast to the simple handwritten digits in MNIST, the examples in Fashion-MNIST are images of different clothes, trousers, shoes and bags, as shown in Table 1.
The labels and the examples of Fashion-MNIST dataset
Hierarchical label generation
To obtain hierarchical labels, we further group the ten classes into four classes: top, bottom, shoe and bag, as shown in Table 2. The top class includes T-shirt, pullover, dress, coat and shirt, corresponding to the original labels 0, 2, 3, 4 and 6, respectively. The bottom class includes trouser, corresponding to the original label 1. The shoe class includes sandal, sneaker and ankle boot, corresponding to the original labels 5, 7 and 9, respectively. The bag class includes bag, corresponding to the original label 8. A training dataset with two levels of labels is thus obtained. The original labels are used for training the conceptual neurons in the hidden layer, while the four new labels are used for training the neurons in the output layer.
The high-level concepts and the label definition
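The grouping of the ten Fashion-MNIST labels into four high-level labels can be sketched as a lookup table. The class names follow the Fashion-MNIST label definition; the integer codes chosen for the four high-level classes (top = 0, bottom = 1, shoe = 2, bag = 3) and the function name `to_hierarchical_labels` are assumptions made for illustration and may differ from the coding used in Table 2.

```python
import numpy as np

# Fashion-MNIST class names indexed by original label (0-9).
FINE_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
              "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# ASSUMPTION: the integer codes of the high-level classes are illustrative.
COARSE_NAMES = ["top", "bottom", "shoe", "bag"]

# Position i holds the high-level code of original label i.
FINE_TO_COARSE = np.array([0, 1, 0, 0, 0, 2, 0, 2, 3, 2])


def to_hierarchical_labels(fine_labels):
    """Return one-hot targets for the conceptual neurons and the output layer."""
    fine = np.asarray(fine_labels, dtype=np.int64)
    fine_onehot = np.eye(10)[fine]                    # targets for ten conceptual neurons
    coarse_onehot = np.eye(4)[FINE_TO_COARSE[fine]]   # targets for four output neurons
    return fine_onehot, coarse_onehot
```

For example, `to_hierarchical_labels([3, 9])` maps a dress to the top class and an ankle boot to the shoe class.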
The neurons in the input layer and the output layer are fixed in the experiments. The input layer has 784 neurons, one for each pixel of a 28 × 28 input image. The output layer has 4 neurons, one for each of the four generated labels shown in Table 2, and its activation function is softmax. A simple CADNN architecture

The illustration of architecture

The illustration of architecture
As shown in Table 2, if there are ten conceptual neurons corresponding to the ten original labels, the newly defined labels are completely represented by the ten conceptual neurons. To test the case of incomplete representation, another architecture

The illustration of architecture
The deeper architectures

The illustration of the architecture
The designed architectures are implemented and tested using TensorFlow [1]. TensorFlow (version 1.0) is installed on Python 3.5 (64-bit), and the computing environment is Windows 10 (64-bit) with a 2.30 GHz CPU (Intel i3-2350M) and 4 GB of memory. A training process of
The illustration of the training processes of architecture
In the extreme case, another architecture

The illustration of the performance of conceptual neurons in the architecture
To answer whether a free neuron can learn a representation aligned with the veiled concept of “ankle boot”, some experiments on architecture

The illustration of the training result of the architecture

The illustration of the training result of the architecture
As a comparison with architecture

The illustration of the training result of the architecture

The illustration of the training result of architecture

The illustration of a training result of the architecture
An object can be recognized in many ways, so it has numerous different representation spaces. The purpose of CADNNs is to align the representation space of DNNs with some human concepts, so as to enhance interpretability. Optimization algorithms cannot always find an effective representation space, particularly when training data are scarce. Introducing conceptual neurons in CADNNs encourages the architectures to converge to a desired representation space. The free neurons, in turn, can learn representations that are not captured by the conceptual neurons, so as to maintain the accuracy of the output layer. Moreover, CADNN architectures can be used to explore latent representations for recognition tasks, because it is easy to check how effective certain concepts are for recognizing certain objects. Interpretable CADNNs are also well suited to transfer learning tasks, which learn representations from one task and use them in other tasks. However, few existing training data sets have the hierarchical labels needed for training CADNNs. One solution is to build new data sets that contain hierarchical labels directly; another is to combine several related existing data sets to form hierarchical labels.
Valuable human knowledge, formed over a long evolutionary history, is built from a complex network of numerous effective concepts. Infusing effective human-formed concepts into computer systems is a promising way to boost artificial intelligence. It may even hold a key to understanding the essence of human consciousness. However, how to organize numerous concepts into a dynamic network is still unclear and needs more exploration. Although the work on CADNNs is only a preliminary exploration, we hope that this perspective will serve as a modest spur that encourages others to come forward with valuable contributions.
Conclusion
This paper proposes a kind of Deep Neural Network (DNN), termed Conceptual Alignment Deep Neural Networks (CADNNs). In the hidden layers of CADNNs, conceptual neurons are used for learning representations of human-formed concepts. Despite the extra constraints imposed by the conceptual neurons, CADNNs maintain accuracy comparable to that of DNNs in the experiments. Meanwhile, as the experiments show, the conceptual neurons align the representation space with human-formed concepts. Experiments also demonstrate that, in some cases, the free neurons of CADNNs can learn effective representations aligned with human-formed concepts. Moreover, hyper-parameters can be used to trade off the interpretability of the hidden layers against the accuracy of the output layer. However, training does not always converge to the expected solutions that have both good interpretability and high accuracy. How to choose appropriate hyper-parameters and activation functions still needs to be researched. Another challenge is to make the free neurons form new effective concepts, based on known concepts, that people can understand. Constraints and improved training methods for free neurons are left for future work. Furthermore, the CADNN framework could be extended to other structures, such as Recurrent Neural Networks (RNNs). More exploration is needed to make CADNNs more comprehensive and dynamic.
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China under Grant Numbers 61632009 and 61472451, the Guangdong Provincial Natural Science Foundation under Grant Number 2017A030308006, and the High Level Talents Program of Higher Education in Guangdong Province under Funding Support Number 2016ZJ01.
