KinRob: An ontology based robot for solving kinematic problems

Abstract

Intelligent answering technology, which enables computers to solve problems automatically, is often used to develop tutorial systems and has a wide range of application prospects. However, owing to the lack of methods for linguistic analysis and understanding, there is little research on intelligent algorithms for solving kinematics problems. Developing such an algorithm is challenging because solving kinematics problems is a complex task that includes text understanding, problem analysis, and automatic solution. Handling these complexities requires background knowledge, and an automatic solver can perform these tasks only if it contains a powerful internal knowledge representation system. We therefore develop KinRob, a tutorial system that solves kinematics problems by combining a neural network with an ontology. Firstly, we propose an ontology for KinRob that defines the knowledge of kinematics and helps the robot understand a kinematics problem. Secondly, to match natural-language text with the ontology, we propose a novel tagging scheme for named entity recognition (NER) based on the kinematic problem understanding model. Finally, extensive experiments show that, on a dataset of kinematic problems from authoritative sources, the proposed method performs better than the baseline algorithms.

Keywords

Robot, kinematic problems, knowledge representation, ontology, named entity recognition

1. Introduction

Intelligent tutorial systems, a research field that started in the 1960s [1, 2, 3], have again become a research hotspot in recent years, driven by the rapid development of natural language processing (NLP) and reasoning technology. This type of solver, which aims to solve problems automatically as humans do [3], is the core of an intelligent education system [4] and has broad application prospects and practical value. However, research on intelligent algorithms for tutoring systems mainly focuses on mathematical problems [2, 5, 6, 7] and physical circuit problems [4, 8], and there are few solvers for kinematics problems. Moreover, the existing kinematics solvers are limited in intelligence by their algorithms. Developing a high-performance system for solving kinematics problems is therefore a pressing need.

Since the 1960s, many algorithms for solving mathematical problems have been proposed [8, 9, 10]. However, owing to the lack of methods for understanding kinematic problems, algorithms for solving them were not developed for a long time. This situation changed with the work of Professor Kurt VanLehn et al. [11], who developed Andes, the first guidance system for kinematic problems. Andes solves a problem by using a Bayesian network that computes a probabilistic assessment of three kinds of information: general knowledge about physics, specific knowledge about the current problem, and the abstract plans that the student may be pursuing to solve the current problem [12]. However, owing to the limitations of Bayesian networks, Andes is a semi-automatic tutorial system. Xue argued that the essence of solving a kinematic problem is converting text into formulas, and proposed a multiple representation model (MRPS) [13]. Based on this multiple representation theory, he developed a kinematic problem tutorial system called MR-PPSTS. Since users need to define variables manually and input the system equations when using MR-PPSTS, it is also a semi-automatic solver. The first automatic tutoring system for solving kinematics problems was developed by Yang [14]. It first uses a word segmentation tool to analyze the text and then outputs the solution through forward reasoning. However, its drawbacks are obvious: because it relies on word segmentation software, the system needs an additional corpus, and it can only solve simple problems that can be solved in one step. With the rapid development of deep learning, some researchers have introduced neural networks into the development of intelligent tutoring systems [15]. These deep learning-based algorithms directly convert word problems into equation templates through neural networks. However, this type of neural solver is not only inefficient but also unable to output a readable analysis and solution. The development of intelligent tutorial systems for kinematics problems is still in its infancy, and many challenges, such as kinematics problem understanding, problem solving, and readable solutions, remain to be addressed.

Solving a kinematic problem, which requires clarifying the problem and then solving it with physical knowledge, is a complex process. Generally, the process is as follows: 1) clarify the physical problem by reading it carefully, 2) analyze the known and unknown quantities, 3) clarify the relationships between the physical quantities according to kinematics knowledge, 4) choose appropriate kinematics formulas, 5) use mathematical methods to work out the answer. Each step involves kinematic or mathematical knowledge, and a tutorial system must make no error in any of these steps to solve a problem. A robust knowledge representation and problem modeling system is therefore essential for a high-performance solver of kinematic problems.

With the development of the Semantic Web (SW) [16], many knowledge-based systems represent complex knowledge through the Web Ontology Language (OWL) [17] and express the relationships between things by building ontology models [18, 19, 20, 21]. Therefore, we develop an ontology-based robot named KinRob for solving kinematics problems automatically. Firstly, we divide the solving of kinematics problems into two stages, understanding and solving, in which understanding covers the first three steps above and solving includes the last two. Secondly, we propose an ontology that allows the robot to understand and solve problems efficiently; the ontology supports the solver in both the understanding and the solving processes. Additionally, in order to map natural-language text to the ontology accurately, we propose a tagging scheme that adds kinematics information to the general scheme, and use it to complete the NER task for each problem.

In summary, our main contributions are as follows:

1) We propose a comprehension ontology and a reasoning ontology for kinematic problems and develop an ontology based solver to solve Chinese kinematic problems automatically.
2) To map the entity to the ontology, we propose a tagging scheme for entity recognition in kinematics. By integrating the understanding model into the traditional scheme, the solver can accurately map the entity to the proposed ontology and solve the challenge of entity overlap.
3) We compared our model with other models experimentally on a dataset from authoritative sources, and the results showed that our model performed better than others.

Figure 1.
Architecture of KinRob, which is composed of an NLP module, a knowledge base and an interpreter.

The rest of this paper is organized as follows: Section 2 reviews the related work of intelligent tutorial systems. Section 3 introduces the methodology in detail, including the system architecture, the developed ontology and the proposed tagging scheme. Section 4 introduces our system and experiments, in addition to the results and analysis. Finally, Section 5 is a summary of all the work.

2. Related works

Algorithms to develop automatic tutoring systems are mainly divided into four types: rule-based methods [22, 23, 24], statistics-based methods [9, 25, 26], tree-based methods [27, 28] and DL based methods [5, 6, 7, 9].

Tutoring systems developed with rule-based methods mainly appeared in early research. These methods understand or transform the problem by designing different rules or templates [23, 24]. Such solvers generalize poorly, and developing a rule base to understand the problem requires a great deal of manpower. Statistics-based methods use traditional machine learning models to identify entities, numbers, and operators in the text, and obtain numerical solutions through simple logical reasoning [25, 26]. However, to deal with mathematical problems that require multi-step arithmetic expressions, these methods need more advanced logic templates, and labeling these templates usually incurs additional labor costs. As a typical statistics-based work, Mitra et al. [9] use three templates (part to whole, variation, and comparison) to solve arithmetic problems. First, the system associates the problem with a predefined formula. Then a model with learned parameters is used to identify the formula and convert it into an equation. Tree-based methods build on the fact that arithmetic expressions can be naturally expressed as binary trees: operators with different precedence are placed at different positions, and the root of the tree is the operator with the lowest precedence. Roy et al. [29] proposed the first method using the expression tree structure, which transforms problem understanding into tree construction. First, the system constructs multiple expression sub-trees by judging the operator between every two values. Then, an expression tree is generated by merging these sub-trees. However, as the number of quantities in the problem increases, the search space of the tree-based algorithm grows exponentially, which is a significant shortcoming of this type of algorithm.

With the rapid development of deep learning, tutoring systems based on neural networks have become a research hotspot in recent years. Wang et al. [15] proposed DNS, a Sequence-to-Sequence (Seq2Seq) model based on RNNs for solving arithmetic problems, which is the first neural network-based framework. DNS directly converts the text problem into the corresponding equation template, then maps the numbers to the template and calculates the answer. They also proposed a hybrid framework to improve the performance of the solution. Wang et al. [30] proposed an equation normalization method that handles duplicated equations by considering the uniqueness of the expression tree. They also proposed an integrated model called Math-EN that combines the respective advantages of several models, so that each model contributes its own problem-solving expertise. Unlike the previous two works, Chiang and Chen proposed an Encoder-Decoder solver, S-Align [31], which considers the influence of semantic information on the answer. The algorithm extracts the semantic representation of each constant through the encoder and then constructs the equation through the decoder. Moreover, they proposed a semantic transformer to generate semantic representations of the new symbols produced by applying operators. Xie et al. [32] proposed a tree-structured neural model called GTS, which generates expression trees in a goal-driven manner. This algorithm uses a two-layer gated feed-forward network to decompose the target and uses an RNN to encode sub-trees into sub-tree embeddings. However, these neural network-based algorithms cannot output a readable solution, which is essential for a tutoring system. Additionally, it is difficult to realize understanding, analysis, reasoning and solving within a single unified neural network framework.

3. Methodology

Table 1
Information of the example, including system, object, state, given conditions, unknown conditions, and question. ULM represents uniform linear motion

System	Object	State	Given conditions	Unknown conditions	Question
Motion system composed of police car and the truck	Police car	ULM	Velocity: 12 m/s	Time; displacement	–
	Truck	ULM	Velocity: 8 m/s	Time; displacement	–
	Police car & truck	–	Distance: 400 m	–	Encounter time

Figure 2.

The structure of this example, which consists of five parts: Problem, Objects, Overall States, Specific States and Conditions. ULM denotes uniform linear motion and UN denotes unknown. In the Conditions layer, green rectangles denote Velocity, while yellow and orange rectangles denote Time and Displacement, respectively.

As shown in Fig. 1, we use two ontology slices, Ontology Slice O and Ontology Slice N, to support the solving service. The input of the solver is a kinematic problem in natural language, and the output is a readable solution. Firstly, when a user requests the service, the input problem is converted into a tagged entity sequence by the NLP module. Secondly, after unit conversion and completion of the unknown quantities, the tagged sequence is mapped into the knowledge base. Specifically, the ontology manager receives the tagged sequence and generates a trigger signal; the system then generates an ontology slice N based on ontology slice O, and the tagged entities held by the manager are sent to ontology slice N. Thirdly, the ontology manager directs the reasoning manager to reason with the rules stored in ontology slice N, which are expressed in the Semantic Web Rule Language (SWRL). Finally, the inference result is passed through the ontology manager to the interpreter, which processes it and outputs a readable analysis and solution.

3.1 A comprehension ontology for understanding kinematics problems

A comprehension model for kinematic problems. Each physical problem involves one or more objects or phenomena, which can be abstracted as physical models, and solving physical problems is a process of dealing with physical models. The characteristics of kinematics must be considered to understand a kinematics problem, so we model the kinematics problem through the object (object model) and its corresponding state (state model) in each problem. Generally, due to the nature of kinematics, any kinematic system can be regarded as composed of different object models and state models.

Take the following problem as an example: “A police car on duty was parked next to the highway. When the police officer discovered that a truck passing him at a constant speed of 8 m/s had violated regulations, the violating vehicle was already 400 m away from the police car. The officer decided to give chase, and the police car pursued at a constant speed of 12 m/s. Question: How long will it take for the police car to catch up with the violating vehicle?” The information of this example is shown in Table 1. According to Table 1, we represent this problem as the tree structure shown in Fig. 2 (essentially, every kinematic problem can be represented as a tree structure).

Figure 3.

The understanding ontology for solving kinematic problems.

From Fig. 2 we can see that each problem can be organized into five levels: the root is the problem and the leaves are the conditions. The middle three layers, namely the Object, the Overall State and the Specific States, are the core of a kinematics problem. We divide the middle into three layers instead of two because, although each research object may have multiple specific motion states, the combination of these states can be regarded as an overall state and used to calculate the variables of the object's entire motion. With this representation, which takes object and state as its core, the computer can understand every kinematic problem well. By expressing this example in the form of Fig. 2, the solver can calculate that the answer to this question is 100 seconds.
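
The arithmetic behind this answer can be sketched in a few lines of Python. The nested dictionary below is only an illustrative stand-in for the five-level tree of the police car example; its key names and the helper `catch_up_time` are assumptions made for this sketch and are not part of KinRob.

```python
# Illustrative five-level tree for the police car example (Table 1).
# The dictionary layout and all key names are assumptions for this sketch.
problem = {
    "question": "encounter_time",
    "objects": {
        "police_car": {
            "overall_state": {
                "ULM": {"velocity": 12.0, "time": None, "displacement": None}
            }
        },
        "truck": {
            "overall_state": {
                "ULM": {"velocity": 8.0, "time": None, "displacement": None}
            }
        },
    },
    "system_conditions": {"distance": 400.0},  # initial gap in metres
}


def catch_up_time(tree):
    """Time needed for the police car to close the gap at the relative speed."""
    chaser = tree["objects"]["police_car"]["overall_state"]["ULM"]["velocity"]
    leader = tree["objects"]["truck"]["overall_state"]["ULM"]["velocity"]
    closing_speed = chaser - leader  # both vehicles move in the same direction
    if closing_speed <= 0:
        raise ValueError("the chaser never catches up")
    return tree["system_conditions"]["distance"] / closing_speed


print(catch_up_time(problem))  # 400 / (12 - 8) = 100.0 seconds
```

The unknown time and displacement of each vehicle are left as `None`, mirroring the UN placeholders in the tree.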

A comprehension ontology for understanding kinematics problems. Inspired by the comprehension model for kinematic problems, we propose an ontology for KinRob. The whole ontology is composed of two parts: understanding and reasoning. We first introduce the understanding ontology in this section.

As shown in Fig. 3, we use a tool named Protégé [2] to display the ontology. The understanding ontology is also composed of five levels: Problem, Object, Overall State, State and Condition. As in the understanding model, the core of the ontology is the state and the object. A kinematics problem is expressed by mapping the conditions in the problem to the states and objects, which lays the foundation for the subsequent solution. Fig. 4 gives an example of a specific problem.

In Fig. 4, the yellow nodes are classes and the purple nodes are instances. In this example, UN 1 and UN 2 are instances of Velocity, while UN 3 and UN 4 are instances of Time and Displacement, respectively. The description of this example is “An object with a mass of 6 kg moves in a straight line. The distance covered by the first 5 s is 12 m, and the distance covered by the last 5 s is 18 m. What is its average speed during this period?”. The recognized entities are object, 6 kg, 5 s (1), 12 m, 5 s (2), 18 m and average speed. Although there is no state noun in this example, the solver automatically constructs a default state (uniform linear motion) based on the information contained in the named entities. In Fig. 4, problem k represents this problem, object 1 represents the object, and mass 1 represents 6 kg. In addition, the first 5 s and 12 m represent the time and displacement of the first motion state, while the second 5 s and 18 m describe the last state. Furthermore, except for the variable to be solved (the average speed, represented by velocity 1), the unknown conditions in all states are represented by instances starting with UN.

Figure 4.

Expression of “An object with a mass of 6kg moves in a straight line. The distance covered by the first 5 s is 12 m, and the distance covered by the last 5 s is 18 m. What is its average speed during this period?”.
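
To make the five-level structure concrete, the following Python sketch builds the instances of this example with plain dataclasses. It is not the OWL ontology itself: the class names, field names and the assignment of UN_3 and UN_4 to the overall state are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Condition:
    name: str                      # instance name, e.g. "time_1" or "UN_1"
    kind: str                      # ontology class: "Time", "Velocity", ...
    value: Optional[float] = None  # SI value; None marks an unknown


@dataclass
class State:
    name: str
    conditions: List[Condition] = field(default_factory=list)


@dataclass
class KinObject:
    name: str
    overall_state: State
    states: List[State]
    attributes: List[Condition] = field(default_factory=list)


@dataclass
class Problem:
    name: str
    objects: List[KinObject]
    question: str  # name of the condition to be solved


def unknowns(problem):
    """Names of the placeholder (UN) conditions over all states."""
    names = []
    for obj in problem.objects:
        for state in [obj.overall_state, *obj.states]:
            names += [c.name for c in state.conditions
                      if c.name.startswith("UN_")]
    return sorted(names)


# Default state constructed when the text contains no state noun.
state_1 = State("uniform linear motion 1", [
    Condition("time_1", "Time", 5.0),
    Condition("displacement_1", "Displacement", 12.0),
    Condition("UN_1", "Velocity"),
])
state_2 = State("uniform linear motion 2", [
    Condition("time_2", "Time", 5.0),
    Condition("displacement_2", "Displacement", 18.0),
    Condition("UN_2", "Velocity"),
])
overall = State("overall state", [
    Condition("UN_3", "Time"),
    Condition("UN_4", "Displacement"),
    Condition("velocity_1", "Velocity"),  # the quantity asked for
])
object_1 = KinObject("object_1", overall, [state_1, state_2],
                     [Condition("mass_1", "Mass", 6.0)])
problem_k = Problem("problem_k", [object_1], question="velocity_1")

print(unknowns(problem_k))  # ['UN_1', 'UN_2', 'UN_3', 'UN_4']
```

Keeping the two "5 s" entities in separate `State` instances is what lets them play different roles later in the solution.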

3.2 A reasoning ontology for solving kinematics problems

When people solve kinematic problems, they usually understand the problem first, and then use kinematics and mathematics knowledge to calculate the answer based on the given conditions of the problem. We transform the solving process into a reasoning task and use SWRL [9] to carry it out.

Reasoning infers new conclusions from existing facts and rules. When solving problems with knowledge of mathematics and physics, people regard the given conditions as existing facts and the physical formulas as rules. However, owing to the limitations of SWRL, the solver cannot calculate the final answer directly. We therefore add reasoning variables, which allow the solver to obtain the inference result and complete the calculation in the interpreter. The system first stores the meta-information of formulas and conditions, and the solver then uses this meta-information to make inferences when solving problems.
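
The division of labour described here, symbolic inference over formula meta-information followed by numeric evaluation in an interpreter, can be sketched as follows. This is a plain-Python analogue and not SWRL; the rule list, the formula names and the functions `infer` and `interpret` are hypothetical and use the 6 kg example (5 s and 12 m, then 5 s and 18 m).

```python
import operator

# Hypothetical formula meta-information:
# (output variable, formula name, input variables).
RULES = [
    ("UN_1", "state_velocity", ("displacement_1", "time_1")),
    ("UN_2", "state_velocity", ("displacement_2", "time_2")),
    ("UN_3", "total_time", ("time_1", "time_2")),
    ("UN_4", "total_displacement", ("displacement_1", "displacement_2")),
    ("velocity_1", "average_velocity", ("UN_4", "UN_3")),
]

# Numeric meaning of each formula, used only by the interpreter.
EVALUATORS = {
    "state_velocity": operator.truediv,
    "total_time": operator.add,
    "total_displacement": operator.add,
    "average_velocity": operator.truediv,
}


def infer(known_names, rules):
    """Symbolic step: find which variables become computable and by which formula."""
    computed = set(known_names)
    derivations = []
    changed = True
    while changed:
        changed = False
        for out, formula, inputs in rules:
            if out not in computed and all(i in computed for i in inputs):
                computed.add(out)
                derivations.append((out, formula, inputs))
                changed = True
    return derivations


def interpret(known_values, derivations):
    """Numeric step: evaluate the derivations in the order they were inferred."""
    values = dict(known_values)
    for out, formula, inputs in derivations:
        values[out] = EVALUATORS[formula](*(values[i] for i in inputs))
    return values


known = {"time_1": 5, "displacement_1": 12, "time_2": 5, "displacement_2": 18}
values = interpret(known, infer(known, RULES))
print(values["velocity_1"])  # (12 + 18) / (5 + 5) = 3.0
```

The inference step never touches a number, which is the role the reasoning variables play when the rule language itself cannot carry out the calculation.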

Figure 5.

The reasoning ontology for solving kinematic problems.

As shown in Fig. 5, the reasoning ontology contains five classes: Formula, Computed Condition, Unknown Condition, Known Condition and Reasoning Variable. The Formula class stores physical formulas, and the Reasoning Variable class stores the variables used in reasoning. The three Condition classes store the known conditions, the unknown conditions and the intermediate variables of the problem. Fig. 6 shows how the instances of the example in Fig. 4 are mapped to the reasoning ontology.

Figure 6.

The instances in the reasoning ontology of the example, including known condition, reasoning variable, unknown condition and formula.

Additionally, we define the Computed Condition class as holding the intermediate variables of a problem, which can be computed from the given conditions or from existing computed conditions. There is therefore no instance in the Computed Condition class before reasoning. The instances of the class before and after reasoning are shown in Fig. 7.

Figure 7.

The instances of Computed Condition. The left and right pictures show the instances before and after inference, respectively.

From Fig. 7 we can see that after inference, the instances in Computed Condition have been updated. Moreover, we stipulate that given conditions count as variables that can be calculated, so the instances in Known Condition have been added to Computed Condition. The query results of this example are shown in Fig. 8.

Figure 8.

The SPARQL query result of the example shown in Fig. 4.

As shown in Fig. 8, the query returns the variable, the formula and the expression. By matching the expression with the known conditions obtained from the NLP module, the answer can be calculated through formula reasoning; this task is completed by the external interpreter. However, some unknown variables, such as UN_3 and UN_4 in the figure, can be calculated by multiple formulas, and not all of these derivations are needed: for example, calculating UN_3 by formula_1 is not required in the correct solution. We therefore eliminate such useless query results in the interpreter by performing backward reasoning with the final question (velocity_1 in this example) as the goal.
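
A minimal sketch of such goal-directed pruning is given below. The derivation list imitates query results of the form (variable, formula, inputs); the formula names and the function `backward_plan` are hypothetical, and the alternative derivation of UN_3 is included only to show how a superfluous result is discarded.

```python
def backward_plan(goal, known, derivations, active=frozenset()):
    """Return the ordered derivations needed to compute `goal`, or None.

    `active` holds the goals currently being expanded, so that circular
    derivations (a variable that depends on itself) are rejected.
    """
    if goal in known:
        return []
    if goal in active:
        return None
    for var, formula, inputs in derivations:
        if var != goal:
            continue
        plan, usable = [], True
        for inp in inputs:
            sub = backward_plan(inp, known, derivations, active | {goal})
            if sub is None:
                usable = False
                break
            for step in sub:
                if step not in plan:
                    plan.append(step)
        if usable:
            return plan + [(var, formula, inputs)]
    return None


known = {"time_1", "time_2", "displacement_1", "displacement_2"}
derivations = [
    ("UN_1", "formula_state_velocity", ("displacement_1", "time_1")),
    ("UN_2", "formula_state_velocity", ("displacement_2", "time_2")),
    ("UN_3", "formula_time_from_velocity", ("UN_4", "velocity_1")),  # circular
    ("UN_3", "formula_time_sum", ("time_1", "time_2")),
    ("UN_4", "formula_displacement_sum", ("displacement_1", "displacement_2")),
    ("velocity_1", "formula_average_velocity", ("UN_4", "UN_3")),
]

plan = backward_plan("velocity_1", known, derivations)
print([formula for _, formula, _ in plan])
# ['formula_displacement_sum', 'formula_time_sum', 'formula_average_velocity']
```

Starting from velocity_1, the search keeps only the derivations on the path to the goal, so the per-state velocities UN_1 and UN_2 and the circular derivation of UN_3 never reach the interpreter.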

3.3 A novel tagging schema for NER of kinematics problems

Representing a kinematics problem as a tree structure is the basis for solving it. In this subsection, we introduce how to map a kinematic problem in natural language into this structure. As shown in Fig. 2, the prerequisite for this mapping is to correctly identify the entities related to kinematics, such as 12 m/s, 8 m/s and 400 m. We therefore adopt neural network-based entity extraction for this step, and integrate the understanding model into the traditional tagging scheme to obtain a new tagging scheme that solves the entity overlap problem.

Firstly, we give the task definition of NER. Let $S=\{c_{1},c_{2},\ldots,c_{n}\}$ be the input sentence, where $c_{i}\in S$ is the $i$-th character of $S$, and let $L=\{l_{1},l_{2},\ldots,l_{m}\}$ be a set of predefined entity types. Let $E=\{e_{1},e_{2},\ldots,e_{p}\}$ be all the possible entities in $S$. The goal of NER is then to assign an entity type $y_{l}(e_{i})\in L$ to every entity $e_{i}\in E$, or to indicate that the entity has no type: $y_{l}(e_{i})=n$.

However, since we adopt a problem representation with object and state as the core, mapping a kinematics problem into the tree structure requires the solver not only to recognize the entity category, as traditional NER algorithms do, but also to identify accurately the object and state to which each entity belongs. For example, consider the kinematic problem “An object with a mass of 5 kg moves along a straight line. The average speed of the first half is 4 m/s and the average speed of the second half is 8 m/s. The object moves 9 s in total. What is the displacement of the object?”. There are two velocity entities in this example, 4 m/s and 8 m/s, which obviously correspond to the first half and the second half of the motion, respectively. However, since a general tagging scheme can only identify the type of an entity, the solver cannot accurately match the entity with the understanding tree of the problem. Furthermore, a kinematic problem may contain overlapping entities, that is, entities with the same surface form but different meanings. This is a serious problem, because existing solvers, whether based on knowledge systems or on DL, cannot handle this situation well. For example, the entity “5 s” appears twice in the example shown in Fig. 4, so we say that “5 s” is overlapped in this example. The two occurrences refer to the time of the object in two different states and play different roles in the solution. Existing solvers, especially those based on DL, often produce a faulty solution in this situation.

Figure 9.

The tagging schema we propose to help the solver understand a kinematic problem. The upper row is the input sentence and the lower row is the corresponding label, which consists of four parts: boundary label, category label, object label and state label.

To help the computer capture the entity information that is crucial for mapping the problem to the tree structure, we propose a novel tagging schema that integrates the kinematic understanding model proposed in Subsection 3.1 into the general schema when tagging kinematic text. Specifically, we add the category information (Part 2 in Fig. 9), object information (Part 3 in Fig. 9) and state information (Part 4 in Fig. 9) of the understanding model to the BIOES tagging schema. The proposed tagging schema is shown in Fig. 9. With the object label and state label added to the general schema, the solver can identify entities together with their object and state information. Because the labeled entities carry this information, the system can map each entity accurately to the kinematics understanding ontology.
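
As an illustration of the four-part label, the sketch below generates BIOES tags extended with category, object and state fields. The concrete label format (`B-TIME-O1-S1`), the category abbreviations and the English word tokens are assumptions for this sketch; the model in the paper is character-based and its exact label strings are those of Fig. 9.

```python
def bioes_labels(n_tokens, entities):
    """Build one label per token.

    `entities` is a list of (start, end, category, obj, state) tuples with
    `end` exclusive. Tokens outside every entity are labelled "O".
    """
    labels = ["O"] * n_tokens
    for start, end, category, obj, state in entities:
        suffix = f"{category}-{obj}-{state}"
        if end - start == 1:
            labels[start] = f"S-{suffix}"  # single-token entity
            continue
        labels[start] = f"B-{suffix}"      # beginning
        for pos in range(start + 1, end - 1):
            labels[pos] = f"I-{suffix}"    # inside
        labels[end - 1] = f"E-{suffix}"    # end
    return labels


tokens = ["first", "5", "s", "covers", "12", "m", ",",
          "last", "5", "s", "covers", "18", "m"]
entities = [
    (1, 3, "TIME", "O1", "S1"),
    (4, 6, "DISP", "O1", "S1"),
    (8, 10, "TIME", "O1", "S2"),
    (11, 13, "DISP", "O1", "S2"),
]

for token, label in zip(tokens, bioes_labels(len(tokens), entities)):
    print(f"{token}\t{label}")
```

The two occurrences of "5 s" receive the same boundary and category parts but different state parts (S1 and S2), which is how the overlap is resolved when the entities are mapped to the ontology.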

Given the strong fitting ability of the pre-trained language model BERT [35], we use BERT as the bottom layer of BiLSTM-CRF [36] and use a character-based model to perform NER. Denote by $S=\{c_{0},c_{1},\ldots,c_{n+1}\}$ the input sequence with $n$ characters, where $c_{i}$ is the $i$-th character of the input sentence, $c_{0}$ denotes the start of the sentence and $c_{n+1}$ denotes its end. The input of the BiLSTM can then be expressed as:

$\displaystyle v_{i}=B(c_{i})$ (1)

where $v_{i}$ is the character vector of $c_{i}$ and $B$ denotes the BERT model.

The BiLSTM layer aims to capture the context of the sentence and sends its output to the CRF [37] layer. It processes the input sequence in forward and reverse order to obtain two different hidden representations, and then concatenates them to get the final hidden feature. Taking $L$ as the LSTM [38] function, the output of the BiLSTM layer is:

$\displaystyle o_{i},h_{i}=L(v_{i},h_{i-1})$
$\displaystyle h_{i}^{N}=[h_{i}^{F},h_{n-i+1}^{B}]$ (2)

where $o_{i}$ is the LSTM output and $h_{i}$ is the hidden state for the $i$-th vector, $h_{i}^{N}$ is the hidden state of the BiLSTM layer, $h_{i}^{F}$ is the hidden state of the forward LSTM for the $i$-th vector, and $h_{n-i+1}^{B}$ is the corresponding hidden state of the backward LSTM.

Although the BiLSTM layer can select the label with the largest probability at each position as its output, it cannot capture the dependencies between output labels, which may, for example, cause two identical tags to be adjacent. A CRF [37], in contrast, takes the order of the output tags into account. Therefore, a CRF layer is chosen as the decoding layer.

For the character sequence $x^{i}=(x_{1}^{i},x_{2}^{i},\ldots,x_{n}^{i})$ and the label sequence $y^{i}=(y_{1}^{i},y_{2}^{i},\ldots,y_{n}^{i})$, the predicted score can be expressed as:

$\displaystyle S(x^{i},y^{i})=\sum\limits_{j=0}^{n}{A_{y_{j}^{i},y_{j+1}^{i}}}+\sum\limits_{j=0}^{n}{P_{j,y_{j}^{i}}}$ (3)

where $j$ indexes the positions in the sequence, $P_{j,y_{j}^{i}}$ is the probability that the output of the $j$-th position is $y_{j}^{i}$, and $A_{y_{j}^{i},y_{j+1}^{i}}$ is the transition probability from the tag $y_{j}^{i}$ to $y_{j+1}^{i}$. Normalizing the scores of all possible tagged sequences with the softmax function gives:

$\displaystyle P(y^{i}|x^{i})=\frac{\exp(S(x^{i},y^{i}))}{\sum_{y^{i\sim}\in Y_{X}}{\exp(S(x^{i},y^{i\sim}))}}$ (4)

where $y^{i\sim}$ ranges over the candidate label sequences and $Y_{X}$ is the set of all possible label sequences. Therefore, the training objective (the log-likelihood of the true label sequence, whose negative serves as the loss function) is:

$\displaystyle\log(P(y^{i}|x^{i}))=S(x^{i},y^{i})-\log\Big(\sum_{y^{i\sim}\in Y_{X}}\exp(S(x^{i},y^{i\sim}))\Big)$ (5)

Finally, a Viterbi [39] decoding algorithm is used to get the final tags:

$\displaystyle y^{\ast}=\mathop{\arg\max}\limits_{y^{i\sim}\in Y_{X}}S(x^{i},y^{i\sim})$ (6)

where $y^{\ast}$ is the best tagged sequence.
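
The scoring, normalization and decoding steps of Eqs. (3) to (6) can be checked on a toy example. The sketch below is a pure-Python illustration with made-up emission and transition scores for three tags; it omits the start and end transitions and enumerates all label sequences to compute the normalizer, which is only feasible for very short inputs. The function names are assumptions for this sketch.

```python
import math
from itertools import product


def sequence_score(emissions, transitions, tags):
    """Eq. (3) without start/end tags: emission scores plus transition scores."""
    score = sum(emissions[pos][tag] for pos, tag in enumerate(tags))
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score


def log_likelihood(emissions, transitions, tags):
    """Eq. (5): score of `tags` minus the log-sum-exp over all sequences."""
    n_tags = len(transitions)
    scores = [sequence_score(emissions, transitions, cand)
              for cand in product(range(n_tags), repeat=len(emissions))]
    top = max(scores)  # subtract the maximum for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return sequence_score(emissions, transitions, tags) - log_z


def viterbi(emissions, transitions):
    """Eq. (6): highest-scoring tag sequence by dynamic programming."""
    n_tags = len(transitions)
    best = list(emissions[0])
    backpointers = []
    for pos in range(1, len(emissions)):
        new_best, pointers = [], []
        for cur in range(n_tags):
            prev = max(range(n_tags),
                       key=lambda p: best[p] + transitions[p][cur])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][cur]
                            + emissions[pos][cur])
        best = new_best
        backpointers.append(pointers)
    path = [max(range(n_tags), key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores: 3 positions, 3 tags (values are arbitrary).
emissions = [[1.0, 0.2, 0.1], [0.3, 1.5, 0.2], [0.1, 0.4, 1.2]]
transitions = [[0.5, 1.0, -1.0], [-1.0, 0.5, 1.0], [-0.5, -1.0, 0.5]]

print(viterbi(emissions, transitions))  # [0, 1, 2]
```

In practice the normalizer is computed with the forward algorithm rather than by enumeration; the brute-force version is used here only because it follows Eq. (4) literally.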

4. Experiments

4.1 Experimental setting

Training data. We collected text problems from the internet as the training data for our experiment. After the exercises were cleaned, de-duplicated and augmented, a total of 5632 text problems remained. We labeled the data manually, and the statistics of the dataset are shown in Table 2.

Table 2
Statistical information of our experimental dataset

Project	Training	Validation	Test
Characters	192706	63218	66838
Marked characters	14163	4586	4688
Problems	3380	1105	1147
Average characters	57.01	57.21	58.27

Table 3

Examples of each problem type: simple, multi state, encounter and catch up

Problem type	Example
Simple	In a certain curling competition, the curling stone stops 20 s after the athlete throws it. The position where the stone stops is 22 m away from the position where it was thrown. What is the average speed of the stone from the throw to the stop?
Multi state	A car is driving on a 100 km long highway. The speed of the first 50 km is 20 m/s and the speed of the second half is 10 m/s. What is the average speed of the car?
Encounter	Cars A and B set off at the same time from two places 800 km apart and traveled towards each other. Car A’s speed was 80 km/h, and the two cars met after 5 hours. Question: What is the speed of car B in meters per second?
Catch up	A motorcycle with a mass of 100 kg travelling at a speed of 90 km/h is chasing a truck that is 120 km in front of it and moving at a constant speed. It is known that it only catches up after 270 km. Question: What is the speed of the truck?

Test data. The test kinematics problems are collected from the textbooks, exercise books and examination papers used by students, giving a total of 432 test problems in our experiment. The questions of these problems involve displacement, time, speed, and distance. The problems are presented as text, without any graphic description. Furthermore, we define the entity conflict rate (ECR) as the ratio of nonidentical overlapping entity matches to all unique entities matched with the training data. The ECR of our test data is 33.76%.

Problem classification. According to the objects and states involved, we divide the collected kinematics problems into four classes: simple, multi state, encounter and catch up. A problem is simple only if it can be solved by a one-step operation. A problem is a multi state problem if it contains multiple motion states. An encounter problem contains multiple objects (two at junior high school level) that move toward each other. Finally, a catch up problem contains multiple objects that move in the same direction. Examples of each type are shown in Table 3.
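
These four classes can be expressed as a small decision rule. The sketch below is only an approximation under stated assumptions: the function name and its parameters are hypothetical, and "solvable in one step" is approximated by "one object with one motion state".

```python
def classify(num_objects, num_states, moving_toward_each_other=False):
    """Assign one of the four problem classes from object and state counts.

    num_states is the number of motion states of a single object;
    moving_toward_each_other only matters when there are several objects.
    """
    if num_objects > 1:
        return "encounter" if moving_toward_each_other else "catch up"
    if num_states > 1:
        return "multi state"
    return "simple"


# The four examples of Table 3, described by hand-picked counts.
print(classify(1, 1))                                 # curling stone
print(classify(1, 2))                                 # car, two halves
print(classify(2, 1, moving_toward_each_other=True))  # cars A and B
print(classify(2, 1))                                 # motorcycle and truck
```

The object check comes first because encounter and catch up problems are defined by the number of objects and their directions, regardless of how many states each object has.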

Table 4

Implementation details of every model. B denotes BERT, L denotes BiLSTM and C denotes CRF

	DNS	Math-EN	S-Align	GTS	Ours
Embedding	128	128	128	128	768
Epoch	80	80	80	80	40
Batch	32	32	32	32	8
Shuffle	True	True	True	True	True
LR	1e-3	1e-3	1e-3	1e-3	B: 5e-5; L: 1e-3; C: 1e-2
LR decay	1e-2	1e-2	1e-2	1e-2	1e-2
Min LR	1e-5	1e-5	1e-5	1e-5	1e-5
Dropout	0.5	0.5	0.5	0.5	0.5

Baseline algorithms. To evaluate the performance of the proposed algorithm, we run and analyze existing automatic solvers on the data we collected. Since there is no automatic algorithm for kinematics problems, we compare against open-source algorithms for mathematical word problems, all of which are deep learning based. We chose four baseline algorithms: DNS [15], Math-EN [30], S-Align [31] and GTS [6]. DNS, proposed by Wang et al. [15], is the first DL-based algorithm; it directly generates expressions for the problem through a Seq2Seq model. Math-EN was developed by Wang et al. [30] in 2018. Compared with DNS, it improves accuracy by reducing the target sample space through equation normalization. S-Align, developed by Chiang and Chen [31], is the first work that considers the impact of semantic information on the solution. Finally, GTS [6] is a goal-driven algorithm that decomposes the goal by decomposing expressions.

Implementation details. Since we use the BERT model, the dimension of our embedding layer is set to 768. The minimum learning rate of every model is set to 1e-5. Because the three layers of our model (BERT, BiLSTM and CRF) differ in fitting capability, we set a separate learning rate for each of them. Additionally, since DL based solvers cannot output readable given conditions and analysis, we only compare the calculation results. The parameters of each model (the NER model in our case) are shown in Table 4.

Table 5

An example detailing the solver's comprehension of a problem

Problem	How long does it take for a balloon rising at a constant speed of 7 m/s to reach a height of 14.5 m from the ground?
Given conditions	Object (balloon);
	State (Uniform linear motion);
	Time (How long);
	Velocity (7 m/s);
	Displacement (14.5 m);
	Has_Velocity (Uniform linear motion, 7 m/s);
	Has_Displacement (Uniform linear motion, 14.5 m)
Question	Has_Time(Uniform linear motion, How long)

Table 6

Experimental results, including Complete comprehension, Partial comprehension and Incomprehension

Problem type	Question	Number of problems	Number wrong	Complete comprehension (%)	Partial comprehension (%)
Multi state	Displacement	25	2	92.00	8.00
	Velocity	85	6	92.94	7.06
	Total	110	8	92.73	7.27
Encounter	Displacement	17	3	82.35	17.65
	Time	31	8	74.19	25.81
	Velocity	25	4	84.00	16.00
	Total	73	15	79.45	20.55
Simple	Displacement	51	2	96.08	3.92
	Time	45	1	97.78	2.22
	Velocity	65	2	96.92	3.08
	Total	161	5	96.89	3.11
Catch up	Distance	14	4	71.43	28.57
	Displacement	17	5	70.59	29.41
	Time	36	7	80.56	19.44
	Velocity	21	5	76.19	23.81
	Total	88	21	76.14	23.86
Total		432	49	88.66	11.34

Table 7

Problems of partial comprehension

Unrecognized item	Number of problems	Type and number
Object	20	Multi-state, 1; Encounter, 8; Simple, 1; Catch Up, 10;
States	8	Multi-state, 3; Encounter, 2; Simple, 1; Catch Up, 2;
Conditions	16	Multi-state, 3; Encounter, 3; Simple, 3; Catch Up, 7;
Questions	5	Multi-state, 1; Encounter, 2; Catch Up, 2;

Table 8

Comparison of solving results, where C is the number of correctly solved problems and T is the total number of problems

	Multi state (C/T)	Encounter (C/T)	Simple (C/T)	Catch up (C/T)	Total (C/T)
DNS	76/110	38/73	125/161	48/88	287/432
Math-EN	80/110	42/73	129/161	51/88	302/432
S-Align	99/110	51/73	142/161	57/88	349/432
GTS	99/110	49/73	148/161	58/88	354/432
Ours	102/110	58/73	156/161	67/88	383/432

As described in Table 4, the embedding dimension of the baseline algorithms is set to 128. The maximum number of training epochs and the batch size are set to 40 and 8 for our model, and to 80 and 32 for the baseline models. Shuffle, LR decay, minimum learning rate and dropout are the same for all models: True, 1e-2, 1e-5 and 0.5, respectively. The initial learning rates of BERT, BiLSTM and CRF are set to 5e-5, 1e-3 and 1e-2, respectively, and the initial learning rate of the baseline algorithms is set to 1e-3. All models use the Adam [40] optimizer.
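The settings of Table 4 for our model can be written down as a plain configuration, with one learning rate per layer. This is only an illustrative sketch: the dictionary keys, the `param_groups` helper and the parameter-name prefixes (`bert.`, `bilstm.`, `crf.`) are assumptions made for the example, and the decay schedule is left out because the text does not specify how LR decay is applied.

```python
# Values copied from Table 4; key names are hypothetical.
OURS = {
    "embedding_dim": 768,
    "epochs": 40,
    "batch_size": 8,
    "shuffle": True,
    "lr": {"bert": 5e-5, "bilstm": 1e-3, "crf": 1e-2},
    "lr_decay": 1e-2,
    "min_lr": 1e-5,
    "dropout": 0.5,
}


def param_groups(param_names, lr_by_layer):
    """Group parameter names by layer prefix and attach that layer's learning rate.

    Assumes every name starts with one of the layer keys followed by a dot;
    an unknown prefix raises KeyError.
    """
    groups = {layer: {"params": [], "lr": lr} for layer, lr in lr_by_layer.items()}
    for name in param_names:
        layer = name.split(".", 1)[0]
        groups[layer]["params"].append(name)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    names = ["bert.encoder.weight", "bilstm.weight_ih", "crf.transitions"]
    for group in param_groups(names, OURS["lr"]):
        print(group)
```

The returned list of dictionaries has the same shape as the per-group options that optimizers such as PyTorch's Adam accept, with tensors in place of the names used here.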

Evaluation index. We use three levels of comprehension (Complete comprehension, Partial comprehension and Incomprehension) to verify the feasibility of the method. To define the three levels, we state two conditions:

Identify the given conditions: the solver correctly obtains the number, boundary and type of each entity, such as object and state.

Identify the questions: the solver correctly acquires the reasoning goal.

If and only if the solver completely extracts both the given conditions and the questions, the computer is considered to fully comprehend the kinematic problem. Correspondingly, if and only if it extracts none of the above, it is considered not to understand the problem at all. When the system correctly recognizes one or several but not all of the above, the solver is considered to partially comprehend the problem.

Take “How long does it take for a balloon rising at a constant speed of 7 m/s to reach a height of 14.5 m from the ground?” as an example. As shown in Table 5, the recognized entities are “How long”, “balloon”, “7 m/s” and “14.5 m”. Since no state word is detected, the motion state defaults to uniform linear motion. The given conditions are that the object is the balloon and that its velocity and displacement are 7 m/s and 14.5 m, respectively. The question is the motion time of the balloon. Only when the acquired conditions are identical to the information in Table 5 do we consider that the solver completely comprehends this problem. When the system correctly recognizes one or several but not all of the items in Table 5, the solver partially comprehends this example. If the system fails to recognize every item in Table 5, it does not understand the problem.
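The three levels can be expressed as a comparison between the expected items and the items the solver extracts. The sketch below is an assumption-laden illustration rather than the paper's evaluation code: the `(type, text)` tuples, the `EXPECTED` set and the `comprehension_level` function are hypothetical, only the entities named in the balloon example are included, and any extra or wrong item is treated as breaking complete comprehension.

```python
# Entities of the balloon example, written as hypothetical (type, text) pairs.
EXPECTED = frozenset({
    ("Object", "balloon"),
    ("State", "Uniform linear motion"),
    ("Time", "How long"),
    ("Velocity", "7 m/s"),
    ("Displacement", "14.5 m"),
})


def comprehension_level(expected, extracted):
    """Return the comprehension level for one problem.

    expected:  set of items that should be recognized
    extracted: set of items the solver actually produced
    """
    if extracted == expected:
        return "Complete comprehension"
    if expected & extracted:
        # At least one, but not all, items were recognized correctly.
        return "Partial comprehension"
    return "Incomprehension"


if __name__ == "__main__":
    missed_object = EXPECTED - {("Object", "balloon")}
    print(comprehension_level(EXPECTED, EXPECTED))
    print(comprehension_level(EXPECTED, missed_object))
    print(comprehension_level(EXPECTED, set()))
```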

4.2 Main results

Performance of proposed algorithm. As shown in Table 6, the percentages of complete comprehension, partial comprehension and incomprehension for the multi state problems are 92.73%, 7.27% and 0, respectively. Among them, only 2 of the 25 problems with a displacement question and 6 of the 85 with a velocity question are partially comprehended. Among the 73 encounter problems, the percentages of Complete comprehension, Partial comprehension and Incomprehension are 79.45%, 20.55% and 0, respectively: 3 of the 17 problems with a displacement question are partially comprehended, 23 of the 31 with a time question are completely comprehended, and 4 of the 25 with a velocity question are partially comprehended. The percentage of complete comprehension of simple problems reaches 96.89%, which means that 156 of the 161 test problems are completely understood and solved. Among the 5 partially comprehended simple problems, 2 have a displacement question, 2 have a velocity question and only 1 has a time question. For the 88 catch up problems, the percentage of complete comprehension is 76.14%: 4 of the 14 distance questions, 5 of the 17 displacement questions and 7 of the 36 time questions are partially comprehended, while 16 of the 21 velocity questions are completely comprehended. In summary, of the 432 test problems, 49 (11.34%) are not fully comprehended and 383 (88.66%) are completely comprehended. Furthermore, we also count the distribution of the partially comprehended problems; the results are shown in Table 7.

The main reason that the solver only partially comprehends some problems is that it cannot correctly identify objects and given conditions. Of the 49 problems, 20 involve an incorrectly identified object and 16 involve incorrectly identified given conditions. In other words, these two failures account for 73.47% of the problems that are not completely understood. Moreover, the partially comprehended problems are concentrated in two of the four types collected in this paper, Encounter and Catch Up, which account for 30.61% (15 of 49) and 42.86% (21 of 49) of them respectively, with Catch Up problems forming the largest group.
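The shares discussed above follow directly from the counts in Table 7. As a rough check, assuming only the counts printed in that table, the tallies can be recomputed as below; the variable names and the `share` helper are introduced for this example.

```python
# Counts copied from Table 7: unrecognized item -> problem type -> number.
TABLE7 = {
    "Object":     {"Multi-state": 1, "Encounter": 8, "Simple": 1, "Catch Up": 10},
    "States":     {"Multi-state": 3, "Encounter": 2, "Simple": 1, "Catch Up": 2},
    "Conditions": {"Multi-state": 3, "Encounter": 3, "Simple": 3, "Catch Up": 7},
    "Questions":  {"Multi-state": 1, "Encounter": 2, "Catch Up": 2},
}

# Number of partially comprehended problems per unrecognized item.
by_item = {item: sum(counts.values()) for item, counts in TABLE7.items()}
total = sum(by_item.values())

# Number of partially comprehended problems per problem type.
by_type = {}
for counts in TABLE7.values():
    for problem_type, n in counts.items():
        by_type[problem_type] = by_type.get(problem_type, 0) + n


def share(n):
    """Percentage of the partially comprehended problems, to two decimals."""
    return round(100 * n / total, 2)


if __name__ == "__main__":
    print("total:", total)
    print("object + conditions share:", share(by_item["Object"] + by_item["Conditions"]))
    for problem_type, n in by_type.items():
        print(problem_type, n, share(n))
```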

Comparison results. To further evaluate the proposed algorithm, we test the selected baseline models on the data collected in this paper. The statistical results of the experiment are shown in Table 8.

In Table 8, the first column lists the compared algorithms and the remaining columns give the results for each type of kinematic problem. Using a Seq2Seq model, DNS correctly solves 287 of the 432 test problems. Math-EN solves 15 more problems than DNS by reducing the target space. S-Align and GTS perform similarly, solving 349 and 354 problems respectively, and both outperform Math-EN. The last row gives the result of the proposed model: by combining DL with ontology, it solves 383 problems, the most of all models.
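For readers who prefer accuracies to raw counts, the C/T entries of Table 8 can be converted to percentages. The snippet below is a small sketch that uses only the counts in the table; the dictionary layout and the `accuracy` helper are assumptions of this example.

```python
# Totals per problem type, in the column order of Table 8.
TOTALS = {"Multi state": 110, "Encounter": 73, "Simple": 161, "Catch up": 88}

# Correctly solved problems per model and type, copied from Table 8.
CORRECT = {
    "DNS":     {"Multi state": 76,  "Encounter": 38, "Simple": 125, "Catch up": 48},
    "Math-EN": {"Multi state": 80,  "Encounter": 42, "Simple": 129, "Catch up": 51},
    "S-Align": {"Multi state": 99,  "Encounter": 51, "Simple": 142, "Catch up": 57},
    "GTS":     {"Multi state": 99,  "Encounter": 49, "Simple": 148, "Catch up": 58},
    "Ours":    {"Multi state": 102, "Encounter": 58, "Simple": 156, "Catch up": 67},
}


def accuracy(model):
    """Overall accuracy of a model in percent, rounded to two decimals."""
    solved = sum(CORRECT[model].values())
    return round(100 * solved / sum(TOTALS.values()), 2)


if __name__ == "__main__":
    for model in CORRECT:
        print(f"{model}: {sum(CORRECT[model].values())}/432 = {accuracy(model)}%")
```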

5. Conclusions

In this paper, a method combining deep learning and ontology is proposed for automatically answering Chinese kinematics problems. First, we establish an understanding ontology and a reasoning ontology for kinematic problems through the kinematic problem understanding model. Then, to address the drawbacks of SWRL, we propose a method that uses an interpreter to display a readable solution process. Moreover, to accurately match the recognized entities with the proposed ontology, a novel tagging scheme that combines the kinematic understanding model with the traditional scheme is proposed for NER. Finally, extensive experiments are conducted, and the results show that the proposed method solves Chinese kinematics problems automatically with better performance than previous algorithms.

Intelligent answering technology is a hot research topic, but current research mainly focuses on mathematical problems and circuit problems. Research on kinematics problems is still in its infancy, and future work will proceed in two directions:

This article only considers uniform linear motion, which is an ideal state. In the future, we will develop algorithms that solve more complex physical kinematics problems, such as those requiring force analysis;

This article only considers text-only kinematics problems. However, many kinematics problems combine graphics and text, so developing algorithms that can solve such combined problems is also a future direction.

References

1. Chang, Lee. Symbolic logic and mechanical theorem proving. SIAM Review. 1973; 3(16): 403-407. doi: 10.1137/1016071.

2. Huang, Shi, Lin, Jian. Learning fine-grained expressions to solve math word problems. In: Hwa, Riedel, editors. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP2017). Copenhagen: Association for Computational Linguistics (ACL); 2017. pp. 2017-805.

3. Zhang, Wang, Lee, Lim. Graph-to-tree learning for solving math word problems. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL). Online: Association for Computational Linguistics (ACL); 2020. pp. 2020-3928. doi: 10.18653/v1/2020.acl-main.362.

4. Jian, Zhang. A relation based algorithm for solving direct current circuit problems. Applied Intelligence. 2020; 50(7): 2293-2309. doi: 10.1007/s10489-020-01667-7.

5. Sander. Arithmetic word problem solving. ANAE – Approche Neuropsychologique des Apprentissages chez l’Enfant. 2018; 30(156): 611-619.

6. Liang, Wong, Lin. A goal oriented meaning-based statistical multi-step math word problem solver with understanding, reasoning and explanation. In: Sierra, editor. Proceedings of 26th International Joint Conference on Artificial Intelligence (IJCAI). Melbourne: AAAI Press; 2017. pp. 2017-5235.

7. Wang, Zhang, Gao, Song, Guo, Shen. Mathdqn: Solving arithmetic word problems via deep reinforcement learning. In: McIlraith, Weinberger, editors. The 32nd AAAI Conference on Artificial Intelligence. Louisiana: AAAI Press; 2018. doi: 10.1609/aaai.v32i1.11981.

8. Botana, Hohenwarter, Janičić, Kovács, Petrović, et al. Automated theorem proving in geogebra: Current achievements. Journal of Automated Reasoning. 2015; 55(1): 39-59.

9. Mitra, Baral. Learning to use formulas to solve simple arithmetic problems. In: Kordoni, editor. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL). Berlin: Association for Computational Linguistics (ACL); 2016. pp. 2016-2144. doi: 10.18653/v1/P16-1202.

10. Huang, Lin. Explanation generation for a math word problem solver. International Journal of Computational Linguistics and Chinese Language Processing. 2015; 20(2): 27-44.

11. Schulze, Shelby, Treacy, Wintersgill. Andes: A coached learning environment for classical newtonian physics. In: Chambers, editors. 11th International Conference on College Teaching and Learning. Australia: Florida Community Coll.; 2000.

12. Gertner, Conati, Vanlehn. Procedural help in andes: Generating hints using a bayesian network student model. In: Mostow, Rich, Buchanan, editors. The 15th American Association for Artificial Intelligence. Madison Wisconsin: AAAI Press; 1998. pp. 106-111.

13. Xue. The design and development of A Multiply Representation based Tutoring System for Problem Solving in High School Physics. PhD thesis, East China Normal University, 2011.

14. Yang. The Research and Design of Humanoid Solver for Junior High School Physics Uniform Motion Calculation problems. PhD thesis, East China Normal University, 2015.

15. Yan, Liu, Shi. Deep neural solver for math word problems. In: Hwa, Riedel, editors. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP). Copenhagen: Association for Computational Linguistics (ACL); 2017. pp. 2017-855. doi: 10.18653/v1/D17-1088.

16. W3C Semantic Web Activity. Available from: http://www.w3.org/2001/sw/Activity.

17. Owl: Web ontology language. Springer Berlin Heidelberg; 2011.

18. Bhatia MPS, Kumar, Beniwal. Ontology based framework for detecting ambiguities in software requirements specification. In: 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom). New Delhi: IEEE; 2016.

19. Sitthithanasakul, Choosri. Using ontology to enhance requirement engineering in agile software process. In: 2016 10th International Conference on Software, Knowledge, Information Management & Applications (SKIMA). Chengdu: IEEE; 2016. doi: 10.1109/SKIMA.2016.7916218.

20. Murtazina, Avdeenko. An Ontology-based Approach to Support for Requirements Traceability in Agile Development. Procedia Computer Science. 2019; 150: 628-635. doi: 10.1016/j.procs.2019.02.044.

21. Pakdeetrakulwong, Wongthongtham, Khan. An Ontology-Based Multi-Agent System to Support Requirements Traceability in Multi-Site Software Development Environment. In: Proceedings of the ASWEC 2015 24th Australasian Software Engineering Conference. New York: Association for Computing Machinery; 2015. pp. 2015-96. doi: 10.1145/2811681.2811700.

22. Basic principles of mechanical theorem proving in elementary geometries. Journal of Automated Reasoning. 1986; 2(3): 221-252.

23. Bobrow. Natural language input for a computer problem solving system. DSpace@MIT; 1964. p. 1-130.

24. Kintsch, Greeno. Understanding and solving word arithmetic problems. Psychological Review. 1985; 92(1): 109-29. doi: 10.1037/0033-295X.92.1.109.

25. Roy, Vieira, Dan. Reasoning about quantities in natural language. PhD thesis, University of Illinois at Urbana-Champaign, 2017.

26. Hosseini, Hajishirzi, Etzioni, Kushman. Learning to solve arithmetic word problems with verb categorization. In: Moschitti, Pang, Daelemans, editors. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha: Association for Computational Linguistics (ACL); 2014. pp. 2014-523. doi: 10.3115/v1/D14-1058.

27. Roy, Dan. Unit dependency graph and its application to arithmetic word problem solving. In: Schuurmans, Wellman, editors. Proceedings of the AAAI Conference on Artificial Intelligence. Arizona: AAAI; 2016. doi: 10.1609/aaai.v31i1.10959.

28. Koncel-Kedziorski, Hajishirzi, Sabharwal, Etzioni, Ang. Parsing algebraic word problems into equations. Transactions of the Association for Computational Linguistics. 2015; 3(1): 585-597.

29. Roy, Dan. Solving general arithmetic word problems. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Lisbon: Association for Computational Linguistics (ACL); 2015. pp. 2015-1743. doi: 10.18653/v1/D15-1202.

30. Lei, Yan, Deng, Zhang, Liu. Translating a math word problem to an expression tree. In: Riloff, Chiang, Hockenmaier, Tsujii, editors. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP). Brussels: Association for Computational Linguistics (ACL); 2018. pp. 2018-1064. doi: 10.18653/v1/D18-1132.

31. Chiang, Chen. Semantically-aligned equation generation for solving and reasoning math word problems. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). Minneapolis: Association for Computational Linguistics (ACL); 2018. doi: 10.18653/v1/N19-1272.

32. Xie, Sun. A goal-driven tree-structured neural model for math word problems. In: 28th International Joint Conference on Artificial Intelligence IJCAI-19. Macao: International Joint Conference on Artificial Intelligence (IJCAI); 2019. pp. 5299-5305. doi: 10.24963/ijcai.2019/736.

33. Protégé. Available from: https://protege.stanford.edu/.

34. Lavbi, Bajec, Krisper. Semantic web rule language. Elektrotehniška zveza Slovenije. 2006.

35. Devlin, Chang, Lee, Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint. 2018. arXiv:1810.04805.

36. Lample, Ballesteros, Subramanian, Kawakami, Dyer. Neural architectures for named entity recognition. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). San Diego: Association for Computational Linguistics (ACL); 2016. pp. 2016-300.

37. Lafferty, Mccallum, Pereira FCN. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Brodley, Danyluk, editors. Proceedings 18th International Conference on Machine Learning. Morgan Kaufmann Publishers Inc.; 2001. pp. 2001-282.

38. Hochreiter, Schmidhuber. Long short-term memory. Neural Computation. 1997; 9(8): 1735-1780.

39. Viterbi. Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm. IEEE Transactions on Information Theory. 1967; 13(2): 260-269. doi: 10.1109/TIT.1967.1054010.

40. Kingma. Adam: A method for stochastic optimization. In: Bengio, Lecun, General, editors. The 3rd International Conference for Learning Representations. San Diego; 2014.

Table 1
Information of the example, including system, object, state, given conditions, unknown conditions, and question. ULM represents uniform linear motion

Table 2
Statistical information of our experimental dataset