Self-learning knowledge base using Naïve Bayes classifier

Abstract

With technology in abundance, digital natives have come to expect comfort and effortless interaction. They expect an instant, correct and crisp response to their queries. Keeping this in mind, this paper proposes an agent, HELP, which fetches unstructured data from the web, aligns it with the structured knowledge base by transforming it into rules, and updates the CLIPS knowledge base progressively. This helps to overcome the issue of format non-uniformity. The proposed model is simulated for a technical education university. The objective is to provide an interface for newcomers who are unfamiliar with the rules, regulations, and policies of the new environment and, at the same time, reluctant to pose questions to seniors or faculty.

Keywords

Self-learning knowledge base, Naïve Bayes classifier, learning agent, CLIPS knowledge base, automatic rules extraction, expert system

1. Introduction

Today everyone is driven by convenience. The daily use of abundant technology has changed the way digital natives think and interact. It has further changed their expectations. Now, they look for effortless engagement and immediate responses at any moment in their “round-the-clock” lifestyle [1]. Moreover, instead of reading a lengthy document, they are more interested in concise and to-the-point answers to their queries. The information that a person seeks in an environment already exists in the form of texts, either offline or online. The only problem is retrieving crisp and correct information on demand. There arises a need for a smart and intelligent agent which can search for the correct information easily and deliver it flawlessly. A robust knowledge base is a core requirement for such an agent. Such systems can be employed as an inquiry system or question answering system (QAS) in a large variety of domains such as schools, universities, organizations, customer care centers, hospitals, railways, airports, banks and entertainment.

Question answering systems (QAS) offer a good solution to these problems. Existing QAS answer queries either from text [1, 2, 3] or from a knowledge base [4, 5, 6]. While knowledge base (KB) methods are effective at answering compositional questions, their performance is often affected by the incompleteness of the KB. Moreover, they require knowledge engineering with the help of domain experts, which is costly and time-consuming. A lot of information is available online in textual form. However, it is highly unstructured and cannot be directly employed to answer queries. So, another challenge is knowledge extraction from unstructured text [7]. Thus, there is a strong need for aligning the structured knowledge base with the unstructured text in a common space. It is important to take the knowledge base into account when answering queries because of its generative capacity. At the same time, the unstructured text is important in order to keep the model updated with continuously changing and rapidly increasing data.

However, building a QAS that combines a structured knowledge base with unstructured text is challenging owing to the structural non-uniformity. Different researchers [8, 9, 10, 11, 12, 13, 14] address this problem partially by aligning text patterns with the KB. But the rich and ambiguous nature of language allows a fact to be expressed in many different forms, which these models fail to capture.

In this paper, we propose an agent, HELP, that fetches unstructured data from the web, aligns it with the structured knowledge base by transforming it into rules, and updates the CLIPS KB dynamically. This helps to overcome the issue of format non-uniformity. Moreover, a common inference engine can answer all kinds of queries, as the unstructured data are also represented in the form of rules and facts. The proposed model is simulated for a technical education university. The objective is to present an interface for newcomers who are unfamiliar with the rules, regulations, and policies of the new environment and, at the same time, reluctant to pose questions to seniors or faculty. The advantages of such a system are that an expert is not needed all the time, up-to-date information is available, and queries receive quick answers.

The paper is structured as follows. First, similar work in the same domain is discussed, followed by a detailed discussion of the proposed model and its architecture. Then, implementation and results are discussed. Finally, the paper concludes with limitations and directions for future work.

2. Related work

In recent years, several kinds of methods and algorithms have been suggested by researchers in various domains [1, 2, 15, 16, 17, 18]. One such system presents a structure for obtaining complex relations from codes and texts in the form of rules using natural language processing (NLP) tools and text matching methods. Their system works with existing knowledge and free texts. The knowledge is represented in the Web Ontology Language (OWL), whereas the free text is in an unstructured format. The unstructured input is converted to a format and then the grammatical category is identified using the Stanford Parser. The model is simulated for gynecology, a medical field. The limitation of the system is that, in order to extract rules, the knowledge base must be in the form of an OWL ontology, as the Semantic Web Rule Language (SWRL) works only with OWL [2].

Other researchers developed their own learning engine for testing real-time data. They also compared different association-based algorithms and proposed the best-suited algorithm for generating rules from the data. The association rule algorithms compared were Apriori, Predictive Apriori, FPGrowth and Tertius. From this analysis, the Tertius algorithm was found to be the most suitable, as it was able to generate valid rules in the shortest possible time. The algorithms were tested on real-time data by implementing the learning engine in an NTPC power plant. The drawback of the learning engine is that the information used for testing is already structured, and the knowledge base must be strictly in if-then form [1].

In another research work, the authors used supervised machine learning algorithms for detecting short messages initiated by mobile malware on the basis of characteristics derived from the content of those messages. They compared the detection abilities of various machine learning algorithms such as support vector machines (SVM), k-nearest neighbors (KNN), decision trees, random forests and Multinomial Naive Bayes. The algorithms were compared in three separate cases. In the first case, all short messages are treated as independent of each other, while in the second case, half of the messages are treated as the training dataset. In the last case, the classifier is trained on the dataset up to a particular point in time and is tested on the subsequent messages. The authors conclude that all the machine learning (ML) techniques perform extremely well, with an average accuracy of 98%. Random forest (RF) outperforms the other algorithms with an accuracy of 99.36% [15].

Diverse kinds of work have also been performed on the Hindi language for classification and simplification purposes. One such work presents a hybrid approach for identifying sentiment-bearing texts or phrases in Hindi text and categorizing them as positive, negative or neutral on the basis of their polarity. A dataset of 1000 sentences was prepared by collecting resources from websites, blogs, discussion forums, etc. The phases of the model include text pre-processing, part-of-speech (POS) tagging, and a hybrid approach which combines a rule-based model and a statistical model. The proposed classification model achieves 70% accuracy. The limitation of the method is that the experiment was conducted on a small dataset in a specific domain [16]. Another work on the Hindi language provides an improved annotation scheme for indirect anaphora in Hindi based on the Emille corpus. The methodology consists of four parts. First, a Hindi corpus was selected, followed by identification of the characteristics that define indirect anaphora. After that, the proposal was validated using ML techniques, and then a classification system for indirect anaphora was designed. The drawback of this model is that the authors were not able to produce the required results due to a lack of suitable rules. The small size of the dataset was another limitation of the work [17]. Along similar lines, research was carried out on the problem of simplifying complex sentences into multiple simple sentences using linguistic resources in the Hindi language. The linguistic resources used comprise a verb demand frame and a list of conjuncts. For testing, both human and automated evaluation were performed. Although the method achieved a good score during evaluation, it cannot handle complex predicates, as they are generative in nature [18].

3. Proposed model

Recognizing the need for an agent which can instantly respond to a query without expecting much effort from the user, we propose an agent, HELP, that combines a knowledge base with unstructured text to process queries. Initially, it has an underlying knowledge base (KB) developed by a domain expert [4]. Further, it extracts information from the web to keep its knowledge up-to-date, transforms the data into facts and (if-then) rules, and aligns them with the CLIPS KB to maintain format uniformity. Afterward, the CLIPS inference mechanism, along with a backward chaining algorithm (implemented in JAVA [4]), is used to respond to the user’s query. This way the agent, HELP, keeps its KB up-to-date without requiring the constant intervention of an expert. Moreover, all information (structured or unstructured) is aligned in a common place, which eliminates the problem of format non-uniformity and hence allows the same inference mechanism to deal with all types of information. Figure 1 displays the architecture of the proposed model.

Figure 1. Architecture of proposed model, HELP.

Here, the raw data are crawled from relevant online sources, preprocessed, and forwarded to the Classifier module. This module decides the category of each sentence. After categorization, the sentences are passed to the Knowledge Extractor. This module transforms the sentences into facts and rules. This structured information is then added to the knowledge base for further inferencing.

3.1 Crawler and preprocessor

The crawler crawls unstructured or semi-structured data from the web and passes it to the preprocessor. The preprocessor decomposes the complete text into sentences and tags each word with the appropriate part of speech. It also tags each sentence as simple, complex or compound. A simple sentence is commonly composed of a subject and a predicate. Sometimes, it can also have an object. A complex sentence is composed of several simple sentences which are dependent on each other, while two or more simple sentences combined into a single sentence with a coordinating conjunction form a compound sentence. Table 1 shows a few examples of the different sentence categories.

Table 1
Examples of sentence category

Type of sentence	Example	Explanation
Simple sentence	I run.	I: [subject] Run: [predicate]
	Ria ran into bathroom.	Ria: [subject] Ran into: [predicate] Bathroom: [object]
Compound sentence	Ria started on time but she arrived late.	Ria started on time: [sentence 1]
		But: [Coordinating conjunction]
		She arrived late: [sentence 2]
Complex sentence	I enjoyed the ice-cream that you brought for me.	I enjoyed the ice-cream: [sentence 1]
		you brought for me: [sentence 2]
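
The three categories in Table 1 can be illustrated with a rough keyword heuristic. This is only a sketch: the preprocessor described here works from part-of-speech tags, whereas the code below swaps in plain keyword matching, and the two conjunction lists are assumptions rather than anything specified in the paper. It will misclassify sentences in which a conjunction joins noun phrases instead of clauses.

```python
import re

# Assumed keyword lists (not from the paper). "for" is left out of the
# coordinating set because it is far more often a preposition.
COORDINATING = {"and", "but", "or", "nor", "yet", "so"}
SUBORDINATING = {"that", "because", "although", "when", "if", "while", "unless"}

def sentence_type(sentence):
    """Guess whether a sentence is simple, compound or complex from its keywords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if any(word in SUBORDINATING for word in words):
        return "complex"
    # A coordinating conjunction only joins clauses from the middle of a sentence.
    if any(word in COORDINATING for word in words[1:-1]):
        return "compound"
    return "simple"

for example in [
    "I run.",
    "Ria started on time but she arrived late.",
    "I enjoyed the ice-cream that you brought for me.",
]:
    print(sentence_type(example), "<-", example)
```

A tagger built on POS tags or a parser would be needed for anything beyond such short examples.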

Table 2

Analysis of classifiers

	Classifiers
	RF	SVM ${}_{\rm linear}$	SVM ${}_{\rm nonlinear}$	MNB
Time taken for training	5 min 27 seconds	31 seconds	4 min 58 seconds	46 seconds
Correctly classified instances	78	63	75	77
Accuracy	97.5%	78.75%	93.75%	96.25%

Table 3

Confusion matrix of Multinomial Naive Bayes classifier

	Confusion matrix
	What	What-happen	Where	When
What	18	0	2	0
What-happen	0	19	1	0
Where	0	0	20	0
When	0	0	0	20

Table 4

Examples of knowledge extraction and its structuring

Sentence: “A pregnant goldfish is called a twit.” Sentence category: WHAT-IS Structured knowledge: is-called (pregnant goldfish, twit). Explanation: It is composed of a simple sentence. We can directly extract the information and store it in the form of a fact in the knowledge base.

Sentence: “The tallest monument built in the US.” Sentence category: WHERE Structured knowledge: in (tallest monument, US). Explanation: It is composed of a simple sentence. A fact that the tallest monument is located in the US can be extracted from it.

Sentence: “When heated above 825 ${}^{\circ}$ C calcium carbonate converts calcium oxide.” Sentence category: WHAT-HAPPEN Structured knowledge: heated (825 ${}^{\circ}$ C calcium carbonate) $\to$ converts (calcium oxide). Explanation: It contains two sub-sentences which together form a rule. The first half is “heated above 825 ${}^{\circ}$ C calcium carbonate” and the second half is “converts calcium oxide”. Both halves were converted into subject, object and predicate form as follows: “heated”, “825 ${}^{\circ}$ C calcium carbonate” and “converts”, “calcium oxide”.

Sentence: “RIT act was passed in 2005” Sentence category: WHEN Structured knowledge: time (RIT act passed, 2005). Explanation: It is composed of a simple sentence. We can directly extract the information and store it in the form of a fact in the knowledge base. The fact states the time at which the RIT act was passed.

Table 5

Examples of knowledge extraction and its structuring in technical education university

Sentence	Sentence category	Structured knowledge
“The training and placement cell is situated in JIIT Noida.”	WHERE	have (training-placement-cell, jiit-noida)
“For more details, please visit http://www.ucrjiit.com”	WHERE	have (http://www.ucrjiit.com, more-details)
“Robotics is the science of using a machine (a robot) to perform repeated and hazardous task without human intervention.”	WHAT-IS	is (robotics, science-of-machine-a-robot-to-hazardous-task-without-human-intervention)
“KNUTH is a Programming Hub (one of the technical hubs of Department of CSE/IT) of JIIT Noida and was incepted in year 2011 with a vision to develop a community of coders/programmers at JIIT Noida.”	WHAT-IS	is (knuth, programming-hub-one-of-technical-hubs-of-department-of-cse/it-of-jiit-noida-in-year-2011-with-vision-to-community-of-coders/programmers-at-jiit-noida)
“Student failing in subject will be awarded F grade.”	WHAT-HAPPEN	failing (students, subject) $\to$ be-awarded (f-grade)
“If the attendance profile of a student is unsatisfactory (as given in the rules below), he/she will be debarred.”	WHAT-HAPPEN	is (attendance-profile-of-student, unsatisfactory-as-in-rules-below) $\to$ be-debarred (he/she)
“Eleventh IC3 conference is on 2-8-2018”	WHEN	time (Eleventh IC3 conference, 2-8-2018)

3.2 Classifier

The classifier module plays an important role in updating the knowledge base with new facts and rules. Upon investigation, it was found that the queries generally asked by users fall into four classes: WHAT-IS, WHERE, WHAT-HAPPEN and WHEN. This classification helps in deciding whether a sentence should be represented as a fact or a rule, and how. Sentences belonging to the “WHAT-IS” or “WHERE” query class are simple atomic sentences and can be used directly for inferencing. Hence, these sentences should be stored as facts in the knowledge base. Sentences belonging to the “WHAT-HAPPEN” or “WHEN” class are compound or complex sentences and should be stored as (if-then) rules in the knowledge base; the exception, discussed in Section 3.3, is a “WHEN” sentence that only associates an event with a time, which is atomic and is stored as a fact. Now, the challenging task is to identify the query class of any sentence. A classifier is required for this task. We evaluated Multinomial Naïve Bayes, random forest, and SVM with linear and non-linear kernels to select the best classifier for the proposed model. Section 4 discusses the pros and cons of these classifiers for the proposed model. Based upon the analysis, we selected the Multinomial Naïve Bayes classifier [19] for query-class identification. The Multinomial Naïve Bayes classifier estimates the conditional probability of a particular term given a class as the relative frequency of term $t$ in documents belonging to class $c$ :

$\displaystyle P(t|c)=\frac{T_{ct}}{\sum_{t^{\prime}\in V}T_{ct^{\prime}}}$ (1)

Here, $T_{ct}$ is the number of occurrences of the term $t$ in the training documents from class $c$ , including multiple occurrences of the term in a document. The denominator sums $T_{ct^{\prime}}$ over every term $t^{\prime}$ in the vocabulary $V$ , so it is the total number of term occurrences in the training documents from class $c$ . The Multinomial Naïve Bayes classifier was selected because it greatly reduces the training time while still achieving decent accuracy.
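
The estimate in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for HELP: the function name, the whitespace tokenization and the toy training set are all hypothetical, and no smoothing is applied because Eq. (1) as written has none.

```python
from collections import Counter, defaultdict

def estimate_term_probabilities(labelled_sentences):
    """Return {class: {term: P(t|c)}} using the relative frequency of Eq. (1)."""
    term_counts = defaultdict(Counter)  # T_ct: occurrences of term t in class c
    for sentence, query_class in labelled_sentences:
        # Assumption: lower-cased whitespace tokens stand in for the real preprocessing.
        term_counts[query_class].update(sentence.lower().split())

    probabilities = {}
    for query_class, counts in term_counts.items():
        total = sum(counts.values())  # sum of T_ct' over all terms t' seen in class c
        probabilities[query_class] = {t: n / total for t, n in counts.items()}
    return probabilities

# Hypothetical toy training set, not taken from SQuAD or WikiQA.
toy_training_set = [
    ("the cell is situated in noida", "WHERE"),
    ("the garden is located in singapore", "WHERE"),
    ("india won the world cup in 2011", "WHEN"),
]
p = estimate_term_probabilities(toy_training_set)
print(p["WHERE"]["in"], p["WHEN"]["in"])
```

With this estimate, a term never seen in a class has probability zero for that class, which is why practical classifiers usually add a smoothing term to the counts.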

For training the model, two question answering (QA) datasets, SQuAD [20] and WikiQA [21], are used. SQuAD is a widely used QA dataset comprising comprehension passages and question-answer pairs. WikiQA is a QA dataset containing questions and their answers as short sentences. Both datasets contain questions and answers. However, the proposed model needs only a sentence and its query class. Thus, the datasets are pre-processed to extract only these two attributes. They are further filtered to keep only the four specified query classes (WHAT-IS, WHERE, WHAT-HAPPEN and WHEN). After preprocessing, 1019 samples are obtained. The model is trained on 939 of these samples. The trained model is then tested on the remaining 80 samples and used to determine the query class of sentences crawled from the web.

3.3 Knowledge extractor

Once the query class is known, the sentence needs to be transformed and stored as knowledge in the knowledge base so that it can later be used by the inference mechanism. Here, transformation means converting the unstructured text into CLIPS-compatible facts and (if-then) rules with the help of POS tags. Upon analysis, it is found that sentences belonging to the same class exhibit some common patterns. Sentences categorized under the “WHAT-IS” query class are generally composed of two components: a subject and information related to the subject. These two components can be arranged either way: subject $+$ information or information $+$ subject. The most commonly used connectors are “is”, “is called” and “known as”. For example, consider “Lightning is a discharge of electricity into the atmosphere.” In this sentence, “Lightning” is the subject and “a discharge of electricity into the atmosphere” is the information related to the subject. A sentence of the “WHERE” query class is generally composed of two nouns separated by words or phrases such as “located in”, “situated at”, etc. For example, “The National Orchid Garden, located within the Singapore Botanic Gardens”. So, the basic structure of this kind of sentence is noun $+$ noun. Sentences classified under the “WHAT-IS” or “WHERE” query class can be represented as facts.
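
The subject $+$ connector $+$ information pattern of WHAT-IS sentences can be sketched as follows. This is a simplified illustration only: it splits on the connector strings named above instead of using POS tags as the knowledge extractor does, and the helper names and the article-stripping step are assumptions made for the example.

```python
import re

# Connectors named in the text; longer ones are tried first so that
# "is called" is not mistaken for a bare "is".
CONNECTORS = ["is called", "known as", "is"]

def strip_article(phrase):
    """Drop a leading article so that 'a twit' becomes 'twit' (assumed clean-up step)."""
    return re.sub(r"^(a|an|the)\s+", "", phrase.strip(), flags=re.IGNORECASE)

def extract_what_is_fact(sentence):
    """Split a WHAT-IS sentence at a connector and return 'predicate (subject, info)'."""
    text = sentence.strip().rstrip(".")
    for connector in CONNECTORS:
        match = re.search(rf"\b{re.escape(connector)}\b", text)
        if match:
            subject = strip_article(text[:match.start()])
            info = strip_article(text[match.end():])
            predicate = connector.replace(" ", "-")
            return f"{predicate} ({subject}, {info})"
    return None  # no known connector found

print(extract_what_is_fact("A pregnant goldfish is called a twit."))
print(extract_what_is_fact("Lightning is a discharge of electricity into the atmosphere."))
```

For the first sentence this yields `is-called (pregnant goldfish, twit)`, the same form as the corresponding entry in Table 4.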

Sentences which belong to the “WHAT-HAPPEN” query class are mainly complex or compound sentences. They either consist of two sub-sentences or are made up of several simple sentences joined with the help of conjunctions or other connecting words. These kinds of sentences should be represented as if-then rules. The next challenge is to identify the antecedent and consequent parts of the rule from the given sentence. For example, consider “If you keep a Goldfish in the dark room, it will eventually turn white.” In this sentence, the first half is the antecedent and the second half is the consequent. In contrast, in the sentence “Butterflies can only fly when their temperature is above 27 ${}^{\circ}$ C”, the second half is the antecedent while the first half is the consequent.
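
The two orderings of antecedent and consequent can be separated with a small pattern-matching sketch. The function name and the two regular expressions are assumptions for illustration; the paper does not specify how its line-splitting step is implemented, and only the markers “if” and “when” are handled here.

```python
import re

def split_rule(sentence):
    """Return (antecedent, consequent) for a conditional sentence, or None."""
    text = sentence.strip().rstrip(".")
    # Pattern 1: "If/When <antecedent>, <consequent>"
    leading = re.match(r"^(if|when)\s+(.+?),\s*(.+)$", text, flags=re.IGNORECASE)
    if leading:
        return leading.group(2), leading.group(3)
    # Pattern 2: "<consequent> if/when <antecedent>"
    trailing = re.match(r"^(.+?)\s+(if|when)\s+(.+)$", text, flags=re.IGNORECASE)
    if trailing:
        return trailing.group(3), trailing.group(1)
    return None

print(split_rule("If you keep a Goldfish in the dark room, it will eventually turn white."))
print(split_rule("Butterflies can only fly when their temperature is above 27 °C"))
```

A sentence that opens with “When” but has no comma between its clauses matches neither pattern and returns `None`; handling such cases would need the POS-based analysis described in the text.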

Algorithm: Knowledge Extraction & Structuring
Input: sentence, question class
Output: structured_info

s ← Sentence_preprocessing(sentence)
if question class = WHAT-IS
    first noun ← getFirstNoun(s)
    verb ← getVerb(s)
    second noun ← getSecondNoun(s)
    structured_info ← createStructure(first noun, verb, second noun)
else if question class = WHERE
    first noun ← getFirstNoun(s)
    second noun ← getSecondNoun(s)
    structured_info ← createStructure(first noun, second noun)
else if question class = WHEN
    first noun ← getFirstNoun(s)
    verb ← getVerb(s)
    time ← getNumber(s)
    structured_info ← createStructure(first noun, verb, time)
else if question class = WHAT-HAPPEN
    antecedent sentence, consequent sentence ← lineSplit(s)
    first noun ← getFirstNoun(antecedent sentence)
    verb ← getVerb(antecedent sentence)
    second noun ← getSecondNoun(antecedent sentence)
    antecedent ← createStructure(first noun, verb, second noun)
    first noun ← getFirstNoun(consequent sentence)
    verb ← getVerb(consequent sentence)
    second noun ← getSecondNoun(consequent sentence)
    consequent ← createStructure(first noun, verb, second noun)
    structured_info ← createRule(antecedent, consequent)
end if

Basically, two types of sentences can be categorized under the “WHEN” query class: sentences which are associated with a time and sentences which are associated with a situation. We mainly focus on the first category in this paper. Sentences of the “WHEN” query class are composed of three components: a noun, a verb, and a number or a phrase containing numbers. For example, in the sentence “India won the world cup in 2011”, “India” is a noun, “won” is a verb and “2011” is a number. It is observed that sentences of this category mostly have the word “in” followed by a number. These sentences contain atomic knowledge and can be stored as facts. Based on all these observations, we propose an algorithm for knowledge extraction and structuring.
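
The observation that time-related WHEN sentences mostly end in “in” followed by a number suggests a simple pattern, sketched below. The function name and the regular expression are assumptions for illustration; unlike the POS-based extraction, this version keeps the whole event phrase rather than selecting the noun and verb.

```python
import re

def extract_when_fact(sentence):
    """Return 'time (event, number)' for sentences of the form '<event> in <number>'."""
    text = sentence.strip().rstrip(".")
    # The event is everything before the final "in <number>" at the end of the sentence.
    match = re.match(r"^(.*?)\s+in\s+(\d[\d-]*)$", text)
    if match is None:
        return None
    event, time = match.group(1), match.group(2)
    return f"time ({event}, {time})"

print(extract_when_fact("India won the world cup in 2011"))
print(extract_when_fact("Eleventh IC3 conference is on 2-8-2018"))
```

The second call returns `None` because that sentence uses “on” rather than “in”, which shows how narrow the pattern is compared with an extractor that looks for any number in the sentence.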

Table 6
Confusion matrix of technical education university dataset

	Confusion matrix
	What	What-happen	Where	When
What	99	7	2	3
What-happen	3	76	1	0
Where	4	2	99	0
When	1	0	1	105

In this algorithm, the input is a sentence and its query class. First, the sentence is preprocessed to remove special characters and to make the words in the sentence uniform. Then, the sentence is decomposed according to its query class. The resulting components are passed as parameters to the createStructure() function, which returns a structured string representing either a fact or a rule. At the end, all the structured information is added to the knowledge base.
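
To make the last step concrete, the sketch below shows one way the structured strings could be rendered as CLIPS constructs: an ordered fact, and a `defrule` whose right-hand side asserts the consequent. The helper names and the rule name are hypothetical, and the exact templates used by HELP are not given in the paper, so this is an assumed mapping rather than the system's actual output format.

```python
def to_symbol(phrase):
    """Join a phrase into one lower-case, hyphenated CLIPS symbol."""
    return "-".join(phrase.lower().split())

def clips_fact(predicate, *arguments):
    """Render an ordered CLIPS fact such as (is-called pregnant-goldfish twit)."""
    fields = " ".join(to_symbol(a) for a in arguments)
    return f"({to_symbol(predicate)} {fields})"

def clips_rule(name, antecedent, consequent):
    """Render an if-then rule: when the antecedent fact matches, assert the consequent."""
    return f"(defrule {name}\n   {antecedent}\n   =>\n   (assert {consequent}))"

fact = clips_fact("is-called", "pregnant goldfish", "twit")
rule = clips_rule(
    "f-grade-rule",  # hypothetical rule name
    clips_fact("failing", "students", "subject"),
    clips_fact("be-awarded", "f-grade"),
)
print(fact)
print(rule)
```

Hyphenating multi-word phrases keeps each argument a single CLIPS symbol, which matches the style of the structured knowledge shown in Table 5.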

4. Implementation and results

The implementation is done using JAVA with CLIPS. The dataset consists of two attributes, i.e., the sentence and its query class. After training, the model is tested on the remaining 80 samples of the preprocessed dataset, with 20 sentences from each category.

To begin with, an experiment is conducted to select the most appropriate classifier for the proposed model. Here, the simulation is carried out using Multinomial Naive Bayes (MNB), random forest (RF), SVM with linear kernel (SVM ${}_{\rm linear}$ ) and SVM with non-linear kernel (SVM ${}_{\rm nonlinear}$ ). The detailed analysis is shown in Table 2.

It is observed that the training times of RF and SVM ${}_{\rm nonlinear}$ are very high compared to those of MNB and SVM ${}_{\rm linear}$ . SVM ${}_{\rm linear}$ is the fastest to train, but it reported an accuracy of 78.75%, which is much lower than that of all the other classifiers. Although the accuracy of RF (97.5%) is slightly higher than that of MNB, its training time of 5 min 27 s is about seven times longer than the 46 s needed by MNB. As a result, we selected Multinomial Naive Bayes as the classifier for the proposed model, which reported 96.25% accuracy in reasonable time. Table 3 shows the confusion matrix of the Multinomial Naive Bayes classifier.

The trained model is then used with the knowledge extraction module. This module uses the trained classifier to map the text into facts and rules. Table 4 shows some examples.

4.1 Simulation for a technical education university

The designed framework is also simulated for a technical education university. For this, the data are crawled from its website. The crawled information is preprocessed and classified into query classes using the already built classifier module. Note that, as we are working with the query class only, the dataset and the classifier module designed earlier can be reused in any domain to determine the query class of an English sentence. The knowledge is then extracted and fed to the knowledge base. Table 5 shows some examples. For example, the sentence “Student failing in a subject will be awarded F grade” is first cleaned and classified under the WHAT-HAPPEN query class. Then, it is converted into a CLIPS-compatible rule format for use in further inferencing. The model is tested on 403 sentences from this domain and reported an accuracy of 94.04%. Table 6 shows the corresponding confusion matrix.
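
The reported accuracy can be checked directly against the entries of Table 6. The sketch below copies the matrix from the table; the function names are illustrative, and since the paper does not state whether rows hold the true or the predicted classes, the per-row rate is labelled neutrally.

```python
LABELS = ["What", "What-happen", "Where", "When"]

# Rows and columns follow the order of LABELS; values are copied from Table 6.
TABLE_6 = [
    [99, 7, 2, 3],
    [3, 76, 1, 0],
    [4, 2, 99, 0],
    [1, 0, 1, 105],
]

def accuracy(matrix):
    """Fraction of all sentences that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def per_row_rate(matrix):
    """Diagonal entry divided by its row total, for each class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

print(f"accuracy = {accuracy(TABLE_6):.2%}")
for label, rate in zip(LABELS, per_row_rate(TABLE_6)):
    print(f"{label}: {rate:.2%}")
```

The diagonal sums to 379 of 403 sentences, which reproduces the 94.04% figure. If the rows are the true classes, the per-row rate is the recall of each class.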

5. Conclusion

The technologically savvy lifestyle of modern digital natives demands effortless engagement and comfort. Keeping this in mind, we proposed an agent, HELP. It begins by generating a knowledge base through knowledge acquisition. The data collection module acquires knowledge by crawling information from numerous online resources. Then, this module performs preprocessing to segregate the acquired information into the smallest meaningful units, i.e., sentences. These sentences are then tagged with a query class using a Multinomial Naive Bayes classifier. After tagging, the knowledge extractor converts the raw sentences into CLIPS-compatible facts and rules. This helps to overcome the issue of format non-uniformity. Moreover, a common inference engine can answer all kinds of queries, as the unstructured data are also represented in the form of rules and facts. The proposed model is simulated for the preprocessed dataset and the technical education university dataset. It reported accuracies of 96.25% and 94.04%, respectively.

The proposed model can be employed as an inquiry system or question answering system in a large variety of domains such as schools, universities, organizations, customer care centers, hospitals, railways, airports, banks and entertainment. Further, the model can be extended to cope with more query classes. It can also be extended to handle WHEN-class sentences that describe situations. Moreover, future work can address the scalability of the model.

References

1. Kaimal, Metkar, RG. Self learning real time expert system. Computer Science and Information Technology (CS&IT). 2013; 361-372.

2. Boufaida, Boufrida. Automatic rules extraction from medical texts. 2014 International Workshop on Advanced Information Systems for Enterprises, Tunis. 2014; 29-33.

3. Smith, Heilman, Hwa. Question generation as a competitive undergraduate course project. in: Proceedings of the NSF Workshop on the Question Generation Shared Task and Evaluation Challenge, Arlington, VA. September 2008.

4. Lodhi, Mishra, Jain, Bajaj. StuA: An intelligent student assistant. International Journal of Interactive Multimedia and Artificial Intelligence (in press), 2018; DOI: 10.9781/ijimai.2018.02.008.

5. Borgohain, Sanyal. Rule-based expert system for diagnosis of neuromuscular disorders. International Journal of Advanced Networking and Applications. 4: 2012.

6. Samy, Almursheidi. A knowledge based system for neck pain diagnosis. World Wide Journal of Multidisciplinary Research and Development (WWJMRD). 2016; 2(4): 12-18.

7. Das, Zaheer, Reddy, McCallum. Question answering on knowledge bases and text using universal schema and memory networks. arXiv:1704.08384. 2017.

8. Bunescu, Mooney. Learning to extract relations from the web using minimal supervision. in: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. 2007.

9. Mintz, Bills, Snow, Jurafsky. Distant supervision for relation extraction without labeled data. in: Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics. 2009.

10. Riedel, Yao, McCallum. Modeling relations and their mentions without labeled text. in: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD’10). 2010.

11. Yao, Riedel, McCallum. Collective cross-document relation extraction without labelled data. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. October 09–11, 2010; 1013-1023, Cambridge, Massachusetts.

12. Zeng, Liu, Chen, Zhao. Distant supervision for relation extraction via piecewise convolutional neural networks. in: EMNLP. 2015; 1753-1762.

13. Gardner, Krishnamurthy. Open vocabulary semantic parsing with both distributional statistics and formal knowledge. in: AAAI. 2017.

14. Verga, Belanger, Strubell, Roth, McCallum. Multilingual relation extraction using compositional universal schema. 2016.

15. Kühnel, Meyer. Classification of short messages initiated by mobile malware. 2016 11th International Conference on Availability, Reliability and Security (ARES), Salzburg. 2016; 618-627.

16. Malakar, Dwivedi, Kashyap. Sentiment classification of Hindi language using natural language processing techniques. International Journal of Research Studies in Computer Science and Engineering. August 2015; 39-42.

17. Dutta, Kaushik, Prakash. Machine learning approach for the classification of demonstrative pronouns for indirect anaphora in Hindi news items. The Prague Bulletin of Mathematical Linguistics. April 2011; (95): 33-50.

18. Soni, Jain, Sharma. Exploring verb frames for sentence simplification in Hindi. International Joint Conference on Natural Language Processing. 2013; 1082-1086.

19. Manning, Raghavan, Schütze. Introduction to Information Retrieval, Cambridge University Press. 2008.

20. Rajpurkar, Zhang, Lopyrev, Liang. SQuAD: 100,000+ questions for machine comprehension of text. in: Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2016.

21. Yang, Yih, Meek. WikiQA: A challenge dataset for open-domain question answering. in: Proceedings of the Conference on Empirical Methods in Natural Language Processing. September 2015; 2013-2018, Lisbon, Portugal.
