Abstract
In this paper, we propose the Neural Knowledge DNA, a framework that tailors the ideas underlying the success of neural networks to the scope of knowledge representation. Knowledge representation is a fundamental field dedicated to representing information about the world in a form that computer systems can utilize to solve complex tasks. The proposed Neural Knowledge DNA is designed to support discovering, storing, reusing, improving, and sharing knowledge among machines and computing devices. It is constructed in a fashion similar to how DNA is formed: it is built up from four essential elements. As DNA produces phenotypes, the Neural Knowledge DNA carries information and knowledge via its four essential interrelated elements, namely Networks, Experiences, States, and Actions, which store the details of the artificial neural networks for training and reusing such knowledge. The novelty of this approach is that it uses previous decisional experience to collect and expand intelligence for the formalized support of future decision making. The experience-based collective computational techniques of the Set of Experience Knowledge Structure (SOEKS) and Decisional DNA (DDNA) are used to develop this decision support. Together with artificial neural networks and reinforcement learning, the proposed Neural Knowledge DNA is used to capture knowledge of a very simple maze problem, and the results suggest that the Neural Knowledge DNA is a promising knowledge representation approach for artificial neural network-based intelligent systems.
Introduction
Knowledge representation is a fundamental field dedicated to representing information about the world in a form that computer systems can utilize to solve complex tasks [4]. It is the study of thinking as a computational process. What, then, is knowledge? This question has been discussed by philosophers since the ancient Greeks, and it is still not fully resolved. Drucker defines it as “information that changes something or somebody - either by becoming grounds for actions, or by making an individual (or an institution) capable of different or more effective action” [5], while the Oxford Dictionary defines knowledge as “facts, information, and skills acquired through experience or education; the theoretical or practical understanding of a subject” [19]. O’Dell and Hubert claim that knowledge is not knowledge until the information inside it has been taken and used by people [18]. For scientists and researchers in the AI field, we can argue analogously that “knowledge is not knowledge until the information inside itself has been taken and used by computers, machines, and agents”.
Consequently, a good knowledge representation should be easy for different systems to use, so that knowledge can be stored, reused, improved, and shared among these systems. A survey [11] carried out by Liao found that there were generally seven categories of knowledge-based technologies and applications developed until 2002. Another study [14] analyzed 30 articles published between 2003 and 2010 in high-quality journals and identified nine core theories in the knowledge-based area. However, these technologies have limitations: most of them are designed for one specific kind of product; they do not have a standard knowledge representation; most systems lack the capability for information sharing and exchange; and most of them only support a particular stage of a product lifecycle [10].
Recent studies [6, 9] in artificial neural networks (ANN) and psychology have found that the image representations in ANNs are very similar to those in biological brains, which raises the question: why not organize and store knowledge in, or close to, the way it exists in the human brain?
In this paper, we propose the Neural Knowledge DNA (NK-DNA), a framework adapting ideas underlying the success of neural networks to the scope of knowledge representation for neural network-based knowledge discovering, storing, reusing, improving, and sharing.
Neural networks and deep learning
Machine learning is one of today’s most rapidly growing technical fields. It is the cornerstone of artificial intelligence and addresses the question of how to build computer systems that improve themselves automatically through experience [13]. Recent progress in theories and learning algorithms, especially artificial neural networks (ANN), has become the new driving force in machine learning.
An ANN is a biologically inspired programming paradigm that enables a computer to learn from observational data [15]. It consists of a network in which information is passed from one node to another; the nodes are called artificial neurons. The network is typically structured hierarchically, and its neurons are usually organized into layers such that each neuron in layer l connects to every neuron in layer l+1. Any layers between the input layer and the output layer are called hidden layers. In the forward pass of an ANN, information flows from the input layer, through any hidden layers, to the output layer. The ANN learns during the backward pass, which updates the connection weights of the network [13].
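As an illustration of the forward pass described above, the following minimal Python sketch propagates an input through fully connected layers. The sigmoid activation, the tiny 2-2-1 network, and all weight values are assumptions chosen for readability; they are not taken from the experiment in this paper.

```python
import math


def sigmoid(x):
    """Logistic activation (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_forward(inputs, weights, biases, activation):
    """One fully connected layer: every output neuron sees every input."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward_pass(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_forward(activations, weights, biases, sigmoid)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward_pass([1.0, 1.0], layers)
```

The backward pass, which adjusts `weights` and `biases`, is deliberately left out of this sketch.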
Deep learning is a powerful set of techniques for learning in ANN [15]. It allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction [8].
Deep learning discovers sophisticated structure in large data sets by using the backpropagation algorithm [7] to indicate how a neural network should change the internal parameters that are used to compute the representation in each layer from the representation in the previous layer [15]. The essential aspect of deep learning is that these layers of features are not human-designed: they are learned from data using a general-purpose learning procedure. Deep learning has dramatically improved the state of the art in image recognition, natural language processing, object detection, and many other domains such as drug discovery and genomics [8, 26].
Deep reinforcement learning
Reinforcement learning is a branch of machine learning that concentrates on using experience obtained by interacting with the world, together with evaluative feedback, to improve a system’s capability to make decisions [12, 17]. Reinforcement-learning algorithms [29] are mainly inspired by our understanding of human decision making, in which learning happens through reward signals given in response to the observed results of actions. It has been called the artificial intelligence problem in a microcosm because learning algorithms must operate autonomously to perform well and achieve their objectives. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as empirical methodology, exploration, planning, and generalization, leading to increasing applicability to real-life problems [16].
Reinforcement learning can be represented as an interaction between a learner (i.e. the decision making agent) and an environment that gives evaluative outcomes to the learner. The environment is often understood as a Markov decision process [1, 20].
A Markov decision process is composed of a set of actions A (the decisions the decision maker can choose) and a set of states S (the situations in which a decision can be made). The sets of actions and states can be finite, but continuous action and state spaces are often more valuable for capturing interactions in important reinforcement learning applications, such as performing physics tasks. The function P(s’ | s, a) defines the probability of the state changing from s to s’ when action a is taken [1, 16].
A reward function R(s, a) and a discount factor γ ∈ [0, 1] are used to describe the decision-making agent’s performance: at every time-step, the agent chooses an action, and the environment returns a reward and transitions into the next state. The goal is to maximize the cumulative discounted expected reward. Specifically, the agent is looking for a behavior policy $\pi^*(a_t \mid s_t; \theta)$ that maps states to actions and creates a reward sequence $r_0, r_1, r_2, r_3, \ldots$ such that $\sum_{t=0}^{\infty} \gamma^t r_t$ is as large as possible []. The relation between the cumulative discounted expected reward and the environmental interaction (state, action, reward, state, action, reward, ...) is captured by the Bellman equation [1] for the optimal state–action value function $Q^*$. The solution to the Bellman equation can be used to optimize the agent’s behavior by calculating $\pi^*(s) = \arg\max_a Q^*(s, a)$. The expected cumulative discounted reward for the policy that takes action a from state s and then behaves optimally thenceforth is the sum of the immediate reward received and the expected discounted value of the cumulative discounted expected reward from the resulting state s’, given that the best action is chosen [16, 20].
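For reference, the Bellman equation described in words above can be written in the notation of this section as follows. This is the standard textbook form for a discrete state space, given here as a reading aid and not quoted from [1]; the symbol a’ for the action chosen in the resulting state is introduced only for this display.

```latex
Q^*(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, \max_{a'} Q^*(s', a'),
\qquad
\pi^*(s) = \arg\max_{a} Q^*(s, a)
```

The first term is the immediate reward, and the second is the expected discounted value of acting optimally from the resulting state s’.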
Deep reinforcement learning uses deep neural networks (DNN) in combination with reinforcement learning to learn the environment and obtain the best control policy. DNNs can be used to directly approximate a control policy a = π(s) from example data points (s_i, a_i) generated by some other control process. Control policies based on DNNs have been learned to control agents in many ways [8, 17].
Set of experience knowledge structure and decisional DNA
The presented approach towards constructing the Neural Knowledge DNA is a vision that aims to address complex issues and challenges arising from the pervasive nature of digital technologies, as witnessed in recent years in our everyday life. One such major challenge is the need for nature-like cognitive blueprints for man-made systems, as required by the emerging semantic-focused society and the “Internet of Things” [3, 30]. Our past research addresses this challenge and, at the same time, provides the fundamental notion behind the proposed Neural Knowledge DNA: the Decisional DNA (DDNA) technology.
In a broader sense, the above research direction plays an important role in our effort to bridge the gap between current society and the one fully embedded in semantic networks. The fully linked Semantic Web concept offers a future vision of the Web where both humans and machines are able to communicate and exchange information and knowledge [2].
The Decisional DNA is a novel knowledge representation theory that carries, organizes, and manages experiential knowledge stored in the Set of Experience Knowledge Structure (SOEKS or SOE for short) as illustrated in Fig. 1 [21, 25].
The SOE has been developed to capture and store formal decision events in an explicit way [23]. It is a flexible, standard, and domain-independent knowledge representation structure [24]. It is a model based upon available and existing knowledge, and it must adapt to the decision event it was built from (i.e. it is a dynamic structure that depends on the information provided by a formal decision event); moreover, SOEKS can be stored in XML or OWL files as an ontology in order to make it transportable and shareable [21].
SOEKS consists of variables, functions, constraints, and rules associated in a DNA shape, enabling the integration of the Decisional DNA of an organization. Variables normally represent knowledge using an attribute-value language (i.e. a vector of variables and values), and they are the root and starting point of SOEKS. Functions represent relationships between a set of input variables and a dependent variable; in addition, functions can be applied for reasoning about optimal states. Constraints are another form of association among the variables. They are restrictions on the feasible solutions, limitations of the possibilities in a decision event, and factors that restrict the performance of a system.
Finally, rules are relationships between a consequence and a condition linked by the statements IF-THEN-ELSE. They are conditional relationships that control the universe of variables [21].
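The four components just described can be pictured with a small Python sketch. This is an illustrative data structure only, not the SOEKS specification or its XML/OWL schema: the class names `SetOfExperience` and `Rule`, the method names, and the temperature/pressure example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Variables = Dict[str, float]  # attribute-value pairs


@dataclass
class Rule:
    """IF condition THEN `then` ELSE `otherwise` (hypothetical encoding)."""
    condition: Callable[[Variables], bool]
    then: str
    otherwise: str


@dataclass
class SetOfExperience:
    variables: Variables
    # Functions map the variables to a dependent value.
    functions: Dict[str, Callable[[Variables], float]] = field(default_factory=dict)
    # Constraints restrict the feasible combinations of variable values.
    constraints: List[Callable[[Variables], bool]] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)

    def evaluate(self, name: str) -> float:
        return self.functions[name](self.variables)

    def is_feasible(self) -> bool:
        return all(check(self.variables) for check in self.constraints)

    def apply_rules(self) -> List[str]:
        return [
            rule.then if rule.condition(self.variables) else rule.otherwise
            for rule in self.rules
        ]


# Hypothetical decision event with two variables.
soe = SetOfExperience(
    variables={"temperature": 80.0, "pressure": 2.0},
    functions={"load": lambda v: v["temperature"] * v["pressure"]},
    constraints=[lambda v: v["pressure"] <= 5.0],
    rules=[Rule(lambda v: v["temperature"] > 75.0, "open_valve", "hold")],
)
```

Here the variables are the starting point, and the other three components are all expressed in terms of them, which mirrors the role of variables described above.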
Additionally, SOEKS resembles DNA in several important features. First, the combination of the four components of the SOE gives uniqueness, just as the combination of the four nucleotides of DNA does. Second, the elements of SOEKS are connected with each other in order to imitate a gene; each SOE can be classified and acts like a gene in DNA. As the gene produces phenotypes, the SOE yields values of decisions according to the combined elements. A decisional chromosome, which stores decisional “strategies” for a category, is then formed by a group of SOEs of the same category. Finally, a diverse group of SOE chromosomes comprises what is called the Decisional DNA (DDNA), as illustrated in Fig. 2 [21].
Our proposed DDNA empowers the vision of Neural Knowledge DNA by providing smart technology for experience-based storage of information and knowledge in intelligent systems.
The neural knowledge DNA
Recent progress in deep learning has notably improved the performance of artificial intelligence systems in continuous control and in decision making in high-dimensional spaces. However, the knowledge acquired in these systems is still isolated and is hard to access and reuse across different systems. Therefore, we propose the Neural Knowledge DNA (NK-DNA) to address this problem.
The Neural Knowledge DNA is proposed as a framework that tailors the ideas underlying the success of neural networks to the scope of knowledge representation. It is designed to store and represent knowledge captured in intelligent systems that use artificial neural networks as the central power of their intelligence. There are five main features of this idea.
Neural network-based
In deep learning systems, knowledge is generally acquired through training and is held in what is often called the model. The model usually stores the structure of the neural network together with the weights and biases of the connections between its neurons. Once the neural network is trained, given an input, the network returns a result computed from the input layer through to the output layer.
Similarly, our NK-DNA stores knowledge of an agent using the same idea. Figure 3 shows the concept of knowledge carried by the NK-DNA.
In the NK-DNA, a neural network is used to carry the relation between actions and states: as shown in Fig. 3, each state (represented as S1, S2, …, Sn) can have connections with a set of actions (represented as a1, a2, …, an). If an action is connected with a state, the action is available in that state; in other words, the agent can choose to perform that action when it is in that state.
The trained neural network provides the knowledge of which action is the best choice in a specific state. The states here are the inputs, which can be raw sensory data or data describing the current situation of the agent.
Experience oriented
Another important feature of this approach is that the NK-DNA uses previous decisional experience to collect and expand intelligence for the formalized support of future decision making.
Experience, as a kind of information gained from practice, is the ideal source for learning and for improving the performance of agents. Usually, the agent transitions from one state to another during its operation; it makes decisions (picks actions) in each state and receives feedback from its operation. These states, actions, feedback, and transitions make up what we call ‘experience’. Inspired by Markov decision processes [20], the experience of an agent is stored as e_t = (s_t, a_t, r_t, s_{t+1}) at each time-step t, where s_t is the current state at the time-step, a_t is the action the agent chooses at that time-step, r_t is the reward (feedback) for undertaking the action, and s_{t+1} is the state the agent transitions to after the chosen action.
As a result, experience is collected as the main source for learning in our NK-DNA. In essence, experiences are treated as samples for supervised learning. Additionally, experience can be shared among NK-DNA systems, which allows learning at a much larger scale in the cloud (discussed under the third feature, Sharable, below).
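A minimal Python sketch of how such experiences might be recorded is given below. The names `Experience` and `ExperienceStore` are hypothetical, and the bounded capacity is an assumption of this sketch: once the store is full, the oldest experience is dropped.

```python
from collections import deque, namedtuple

# One experience e_t = (s_t, a_t, r_t, s_{t+1}).
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state"])


class ExperienceStore:
    """Keeps only the most recent `capacity` experiences."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self._items.append(Experience(state, action, reward, next_state))

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


# Hypothetical usage: four experiences are added to a store of capacity three.
store = ExperienceStore(capacity=3)
for s in range(1, 5):
    store.add(state=s, action="up", reward=0, next_state=s + 1)
oldest = next(iter(store))
```

Because each record is a plain tuple, a stored experience can later be used as a training sample or passed to another agent without further conversion.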
Sharable
Much as in human society, the NK-DNA is designed to allow agents to share knowledge and experience with one another so that the knowledge and experience can be accessed and reused in a much larger scope.
Figure 4 shows the overview of the NK-DNA cloud platform. The platform integrates different agents and their tasks as illustrated in two major levels: local and global.
At the bottom of the platform, there is the local level storing an agent’s knowledge, while the global level is at the top storing knowledge from all NK-DNA based systems. Agents can share, download, and evolve their knowledge via this cloud platform. For more details about knowledge sharing, please refer to our previous work [25].
Flexible
Machine learning is the core of artificial intelligence; it addresses the question of how to build computer systems that can automatically improve themselves through experience. However, there are many different machine learning methods, and they are used for different problems. Therefore, the NK-DNA must be flexible enough to be used by different systems.
Because the NK-DNA is neural network-based, any machine learning method based on neural networks is suitable for use with it, such as fully connected neural networks, convolutional neural networks, and recurrent neural networks. To allow this flexibility, we designed a DNA-like structure that holds the details of the neural network in which an agent’s knowledge was acquired.
Consequently, any agent can reuse another agent’s knowledge as long as it has the information about that agent’s neural networks.
DNA-like structure
The NK-DNA is constructed in a fashion similar to how DNA is formed [28]: it is built up from four essential elements. As DNA produces phenotypes, the Neural Knowledge DNA carries information and knowledge via its four essential elements, namely States, Actions, Experiences, and Networks (see Fig. 5).
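The NK-DNA’s four-element combination is able to carry detailed information about reinforcement learning and Markov decision processes: States are the situations in which a decision can be made, Actions are the decisions the agent can choose, Experiences are the stored tuples (s_t, a_t, r_t, s_{t+1}) of the agent’s operation, and Networks hold the details of the artificial neural networks used for training and reusing the knowledge.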
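To make the four-element structure concrete, the sketch below models it as a small Python container that can be serialized and restored. It is an assumption-laden illustration, not the NK-DNA file format: the class name, the field layout, and the choice of JSON for serialization are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class NeuralKnowledgeDNA:
    """Hypothetical container for the four NK-DNA elements."""
    states: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)
    # Each experience is [state, action, reward, next_state].
    experiences: List[List[Any]] = field(default_factory=list)
    # Each entry describes one network: structure, parameters, framework.
    networks: List[Dict[str, Any]] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "NeuralKnowledgeDNA":
        return cls(**json.loads(text))


# Hypothetical content, loosely modelled on the maze experiment.
dna = NeuralKnowledgeDNA(
    states=["block 1", "block 2"],
    actions=["up", "down", "left", "right"],
    experiences=[["block 1", "up", 0, "block 2"]],
    networks=[{"framework": "TensorFlow", "layer_sizes": [8, 16, 4]}],
)
restored = NeuralKnowledgeDNA.from_json(dna.to_json())
```

A text serialization of this kind is one way an agent could hand its knowledge to another agent, provided both agree on the field layout.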
Initial experiment
We evaluated the proposed NK-DNA on a very simple maze problem. In this initial experiment, the agent is asked to learn, store, and reuse the knowledge of the maze (Fig. 6) by utilizing deep reinforcement learning and the NK-DNA.
Experiment overview
There are eight blocks in the maze, as shown in Fig. 6. At the beginning, the agent knows nothing about the maze, and it tries to explore and learn the maze by taking four possible actions: going up, going down, going left, and going right. At each block, the agent can take one of the four possible actions. At the end, the agent is expected to know the maze and to be able to show us the shortest way to get to block 8.
Learning
There are two stages for the agent to learn the maze: exploration and training.
Exploration
At the exploration stage, the agent explores the maze. It always starts at block 1, and it randomly takes one of the four possible actions introduced above. Every time it takes an action, the agent gets feedback from its environment (i.e. the maze). The feedback is composed of Reward, Terminal, and NextState. Reward is a value given by the maze after every action; here the maze gives a reward of 100 once the agent reaches block 8, and a reward of zero in any other situation. Terminal is a Boolean value indicating whether the agent has reached block 8: if it has, the terminal value is True and the agent restarts from block 1 to explore the maze again; otherwise, the terminal value is False. NextState represents the block where the agent is after taking the action; for example, if the agent goes up at block 1, the next state is block 2; however, if the agent goes left, it remains at block 1 because there is a wall on the left of block 1.
The agent keeps taking random actions until it reaches block 8. Meanwhile, during its exploration of the maze, the agent stores every action taken, together with the feedback from the maze, as an experience in the form (s_t, a_t, r_t, s_{t+1}): s_t is the state/block where the agent is at time step t; a_t is the action taken at that time step; r_t is the reward for taking that action; and s_{t+1} is the next state of the agent after action a_t. In this experiment, we let the agent take 1000 random actions for exploration.
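The exploration stage can be sketched in a few lines of Python. The maze layout below is a hypothetical stand-in for Fig. 6, which is not reproduced here: it only honours the transitions stated in the text (at block 1, going up leads to block 2, going right leads to block 4, and going left hits a wall; going right at block 4 leads to block 7; going up at block 7 reaches block 8), and everything else about the grid is an assumption. The names `step` and `explore` and the fixed random seed are illustrative.

```python
import random

ACTIONS = ("up", "down", "left", "right")
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

# Hypothetical layout: block -> (column, row); NOT the layout of Fig. 6.
POSITION_OF = {1: (0, 0), 2: (0, 1), 3: (0, 2), 4: (1, 0),
               5: (1, 1), 6: (1, 2), 7: (2, 0), 8: (2, 1)}
BLOCK_AT = {pos: block for block, pos in POSITION_OF.items()}
START, GOAL = 1, 8


def step(state, action):
    """Return (reward, terminal, next_state) for one action in the maze."""
    col, row = POSITION_OF[state]
    d_col, d_row = MOVES[action]
    # Moving into a wall leaves the agent where it is.
    next_state = BLOCK_AT.get((col + d_col, row + d_row), state)
    terminal = next_state == GOAL
    reward = 100 if terminal else 0
    return reward, terminal, next_state


def explore(num_actions, seed=0):
    """Take random actions and record (s_t, a_t, r_t, s_{t+1}) tuples."""
    rng = random.Random(seed)
    experiences = []
    state = START
    for _ in range(num_actions):
        action = rng.choice(ACTIONS)
        reward, terminal, next_state = step(state, action)
        experiences.append((state, action, reward, next_state))
        # Restart from block 1 whenever block 8 is reached.
        state = START if terminal else next_state
    return experiences


experiences = explore(1000)
```

With a different layout, only `POSITION_OF` would need to change; internal walls would require an explicit transition table instead of grid coordinates.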
Training
After the exploration, the training starts. To solve the simple maze problem, i.e. to find the shortest way to get to block 8, the agent needs to remember how the blocks are connected so that it can make better decisions to reach block 8. In other words, the goal of the agent is to select actions in a fashion that maximizes cumulative future reward.
In our case, the deep reinforcement learning introduced in Section 3 is applied to help the agent learn the maze. More formally, we use a neural network to approximate the optimal action-value function [17]
$$Q^*(s, a) = \max_{\pi} \mathbb{E}\left[ r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots \mid s_t = s,\ a_t = a,\ \pi \right],$$
which is the maximum sum of rewards r_t discounted by γ at each time-step t, achievable by a behaviour policy π(a|s), after making an observation (s) and taking an action (a).
Furthermore, the technique named experience replay [17] is used to improve the training. Experience replay randomly picks the agent’s experiences from the stored experience dataset to generate minibatch samples. The advantage of using experience replay is that it breaks the strong correlations between consecutive experiences and therefore reduces the variance of the updates. With this technique, the behavior distribution is averaged over many of the agent’s previous states, which smooths out learning and avoids oscillations in the parameters. As we can see from Fig. 7, training starts after 1000 time steps of exploration, and the cost of the action-value function decreases and then stays at around 0.5 during training. For more information about the algorithms and methods used in the training process, please refer to [12, 17].
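The idea of experience replay can be sketched as follows. To keep the example short and self-contained, a lookup table stands in for the neural network, so this is tabular Q-learning with replay and not the deep network used in the experiment. The discount factor 0.9, the learning rate 0.5, the minibatch size, the hand-written experiences, and the rule that reaching block 8 ends an episode are all assumptions of this sketch.

```python
import random

GAMMA = 0.9   # discount factor (assumed value)
ALPHA = 0.5   # learning rate (assumed value)
ACTIONS = ("up", "down", "left", "right")
GOAL = 8


def replay_update(q, experiences, batch_size, rng):
    """Sample a random minibatch and move Q towards the one-step targets."""
    batch = rng.sample(experiences, min(batch_size, len(experiences)))
    for state, action, reward, next_state in batch:
        if next_state == GOAL:
            # Terminal transition: no future reward beyond this step.
            target = reward
        else:
            target = reward + GAMMA * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (target - q[(state, action)])


# Hypothetical stored experiences (s_t, a_t, r_t, s_{t+1}).
experiences = [
    (1, "right", 0, 4), (4, "right", 0, 7), (7, "up", 100, 8),
    (1, "left", 0, 1), (4, "left", 0, 1), (7, "left", 0, 4),
]
q = {(s, a): 0.0 for s in range(1, 9) for a in ACTIONS}
rng = random.Random(0)
for _ in range(200):
    replay_update(q, experiences, batch_size=4, rng=rng)

best_at_block_1 = max(ACTIONS, key=lambda a: q[(1, a)])
```

Because the minibatch is drawn at random from the whole store, consecutive updates are not tied to consecutive time steps, which is the decorrelation effect described above.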
Storing
Knowledge learnt after exploration and training processes is stored in the NK-DNA as Actions, States, Experiences, and Networks.
Actions store the agent’s four possible actions. States indicate the eight blocks. Experiences keep the 5000 latest experiences of the agent. Networks carry the information of the neural network and the framework involved in learning the maze. In this initial experiment, a three-layer neural network is used to approximate the optimal action-value function (with 8 neurons in the input layer, 16 neurons in the hidden layer, and 4 neurons in the output layer), and Google’s TensorFlow is used to train the agent. Therefore, Networks contain the trained weights and biases of the network, plus information specifying the framework used (i.e. TensorFlow).
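As a rough illustration of what the Networks element has to hold for this 8-16-4 network, the sketch below counts its trainable parameters and shows one plausible input encoding. The one-hot encoding of the block number is an assumption (the encoding is not stated in the text), and the record's field names are hypothetical.

```python
LAYER_SIZES = (8, 16, 4)  # input, hidden, and output neurons, as in the text


def one_hot(block, num_blocks=8):
    """Assumed input encoding: a 1.0 at the position of the current block."""
    return [1.0 if i == block - 1 else 0.0 for i in range(num_blocks)]


def parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical Networks record; trained weights and biases would be added here.
networks_record = {
    "framework": "TensorFlow",
    "layer_sizes": list(LAYER_SIZES),
    "parameter_count": parameter_count(LAYER_SIZES),
}
```

Under these assumptions the network has 8·16 + 16 + 16·4 + 4 = 212 trainable values, so the stored knowledge for this maze is very small.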
Results
Finally, to test our approach, the agent is expected to show us the shortest way to get to block 8 by reusing its knowledge.
Therefore, we examined the agent by sending the seven possible states one by one to the agent’s neural network and checking the outputs, which represent the actions the agent should choose in the respective states.
As we can see from Fig. 8, the agent can tell us the shortest way to get to block 8 from block 1: go right at block 1 to reach block 4, go right again at block 4 to reach block 7, and then go up to reach block 8. It also knows how to get to block 8 from other blocks; for example, if it is at ‘block 2’, it knows that it should ‘go up’ first. Additionally, the structure of the NK-DNA allows the learnt knowledge to be shared among different neural network-based agents.
Conclusions and future work
In this paper, we proposed the Neural Knowledge DNA, a framework adapting ideas underlying the success of neural networks to knowledge representation for neural network-based knowledge discovering, storing, reusing, improving, and sharing. By taking advantage of neural networks and reinforcement learning, the NK-DNA stores the knowledge learnt through a domain’s daily operation and provides an easy way to access, reuse, and share such knowledge in the future. We tested the proposed idea in an initial experiment on a simple maze, and the results suggest that the NK-DNA is promising for knowledge representation, reuse, and sharing among neural network-based AI systems.
For future work, we plan: refinement and further development of the neural network engine; further design and development of the NK-DNA framework, especially to support a range of third-party deep learning frameworks; and design and development of the cloud server for NK-DNA knowledge management.
Acknowledgments
This work is supported by the Scientific Research Fund of Sichuan Provincial Science & Technology Department under Grant Nos. 2014GZ0009 and 2015JY0257, and by the Scientific Research Fund of Sichuan Provincial Education Department under Grant No. 14ZA0171.
