Abstract
Chatbots have become popular tools and are now used in sectors such as commerce, elderly care, tourism, and education. The COVID-19 pandemic forced many students and teachers to suspend face-to-face classes. Therefore, schools and governments found it necessary to continue education remotely, using the resources provided by the Internet. This situation has created greater interest in educational chatbots, and several projects have been proposed to develop these academic tools, each following its own way of implementation and addressing issues from different points of view. This paper presents a proposal for chatbot classification, following the Systematic Mapping Study and an iterative method to review and classify educational chatbots. We also discuss the resulting categories, their characteristics and limitations, and their possible uses by developers and researchers.
Introduction
A chatbot is a software program that interacts with humans using natural language [1]. In 1966, one of the pioneers in this field, Weizenbaum [2], presented his famous ELIZA chatbot, whose goal was to hold a conversation with a human being. ELIZA is based on templates and pattern matching techniques but lacks reasoning capabilities. The design of this chatbot was limited by the technology and knowledge of that time. However, this early work and the subsequent research are the foundations for the design of current chatbots.
Modern chatbots are capable of understanding the context of a conversation and learning from it, improving themselves over time thanks to the use of Machine Learning techniques [3] and novel computer architectures [4].
In customer service departments, the use of chatbots is constantly increasing. Thus, enterprises have plans to launch their own versions in the short term [5]. For this reason, companies do research in this field, in order to stay ahead and reduce costs [4].
With the arrival of the COVID-19 pandemic, governments were forced to implement mechanisms for students to continue with their education. Although the schools closed their facilities, the courses continued. The concern of governments and authorities in charge of education focused on not losing the school year. Consequently, students have had to adapt their academic training conditions and continue their classes online. This abrupt change has directly affected the traditional educational models that mainly focus on face-to-face classes [6].
Chatbots for the educational sector are not new at all. Before the COVID-19 pandemic, there were already proposals that involved chatbots in said domain, e.g.,
In the terminology of Gregor [15] and Nickerson et al. [16], the concepts of typology, framework, taxonomy, and classification are used interchangeably. In this paper, we use the term classification.
A classification of objects is a basic mechanism for organizing knowledge [17]. Some benefits are analyzing and understanding complex domains [16], providing structure and organization of knowledge in a field [18], implementing an order that allows establishing relationships between concepts [19], and understanding the differences between previous research works [20].
The study and development of chatbots is a topic of great importance and popularity today, and it seems unlikely that this trend will change in the coming years. That is why it is essential to identify the characteristics and classifications of proposals in the area, especially in the educational domain, as it is a fertile field in which artificial intelligence can have a significant impact. However, to do so, it is necessary to have solid methodologies that allow gathering a representative sample of the proposals in question, and it was thanks to the Nickerson et al. [16] method and the Systematic Mapping Study (SMS) [21] that we were able to achieve it.
The originality of our work lies in the classification per se, because educational chatbot classifications are practically non-existent. The few examples found in the state of the art are classifications that have been carried out with ad hoc or trivial methodologies, which leads to possibly biased results and very poor reproducibility.
We hope that the dimensions we identify will be of great value to specialists in the area, as they can serve as a guide to improve implementations of educational chatbots and for researchers who want to enter the field to obtain an overview of the work carried out.
This article is structured as follows. After presenting the related work in Section 2, Section 3 describes the iterative development of the adopted classification method. Then, in Section 4, we detail the resulting classification of chatbots, focusing on its dimensions and characteristics. In Section 5, we discuss our results. Finally, Section 6 presents conclusions and future work.
Related work
Chen et al. [22] classify chatbots into two main groups: task-oriented and non-task-oriented. The former chatbots help the user to complete a task, e.g., search for a product or have a short conversation in a closed domain. The latter chatbots interact with the user following a question/answer approach for ludic purposes, usually in an open domain.
Nuruzzaman and Hussain [23] divide chatbot applications into four categories: goal-based, knowledge-based, service-based, and response generated-based. Goal-based chatbots are designed for a specific task and a short conversation, e.g., answer/question approach or problem solving for customers in a website. Knowledge-based chatbots are intended for both open and closed domains, e.g., answer to general and particular topics, respectively. Service-based chatbots provide customers with facilities, e.g., orders to food stores. Response generated-based chatbots focus on how to answer user questions, i.e., a prioritized answer is returned, and it is chosen depending on what is established in the model’s policy.
Gnewuch et al. [24] propose a bidimensional classification: primary mode of communication and context. The former dimension indicates the modality of interaction with the chatbot, i.e., text or voice. The latter dimension focuses on a specific domain or a conversation topic with its users.
Ramesh et al. [25] suggest a classification into six groups: retrieval-based, generative-based, short-text conversation, long-text conversation, open domain, and closed domain. Retrieval-based chatbots have a set of predetermined answers and use heuristics to select an appropriate one. Generative-based chatbots do not depend on predefined answers and instead generate new ones. Short-text conversation refers to a succinct answer when the chatbot receives a specific question. Long-text conversation means that the chatbot can sustain a lasting chat. Open domain refers to the possibility of switching between different domains in a discussion. Closed domain indicates specific knowledge; therefore, an appropriate response is desirable.
Diederich et al. [26] present a taxonomy of eleven dimensions for chatbots: communication mode, context, language, intelligence, implementation, hosting, pricing model, reporting, sentiment detection, enterprise integration, and platform integration. The communication mode and context dimensions extend the classification of Gnewuch et al. [24]. Language indicates whether the chatbot supports one or more languages. Intelligence refers to whether the chatbot is based on rules, such as pattern matching, or has self-learning skills. Implementation indicates the technology used for the development of the chatbot. Hosting is the platform where the chatbot is deployed. Pricing model indicates the price to be paid for using the platform, e.g., Microsoft Azure Bot has a cost depending on the number of interactions and Dialogflow has a limited free version. Reporting considers whether the platform has a monitor to know the details of interactions, users, and number of conversations. Sentiment detection indicates whether the platform supports the detection of user sentiments in a conversation. Finally, enterprise integration means that the platform offers APIs (Application Programming Interfaces) or pre-built interfaces.
Nimavat and Champaneria [27] classify chatbots into four groups: knowledge domain, service provided, goals, and input processing and response generation method. Knowledge domain refers to whether the chatbot can interact in an open or closed domain. Service provided is based on a proxemics subclassification: interpersonal, intrapersonal, and interagent. Goals involve a subclassification based on a primary goal: informative, chatbot based/conversational, and task based. Input processing and response generation method indicates the procedures to handle inputs and generate answers.
Hussain et al. [28] establish the following groups for their classification: interaction mode (text-based or voice/speech-based), chatbot application (task-oriented or non-task-oriented), rule-based or AI (machine learning, deep learning or templates), and domain (specific or open).
Adamopoulou and Moussiades [29] propose seven categories of chatbots. The first four categories are the same as those presented by Nimavat and Champaneria [27]. The human-aid category refers to the need for flexibility, so the operations are carried out by the chatbot with human intervention. Permissions considers whether the chatbot is open source or commercial. Communication channel depends on the interaction modality: text, voice, or image.
Quiroga Pérez et al. [30] mention a classification of educational chatbots with two categories: service-oriented and teaching-oriented. The former chatbots are focused on service support, such as FAQs. The latter chatbots are divided into formal and informal ones.
As can be seen, there are relatively few chatbot classifications, with most works presented as surveys [22, 28–30]. This is a limitation because those classifications did not present in detail the process to get them.
On the other hand, some authors use the method proposed by Nickerson et al. [16] for the development of their classification, e.g., Diederich et al. [26], Feine et al. [31], Janssen et al. [32, 33], and Bittner et al. [34]. However, these are not educational chatbot classifications.
It is important to mention that educational chatbot classifications are almost nonexistent, except for the work by Quiroga Pérez et al. [30], who classify educational chatbots into two categories, as mentioned before. However, they do not give details of their classification process.
Iterative classification process
In this section, the Nickerson et al. [16] method is presented in a general way, highlighting important theoretical elements. Then, we show the step-by-step application of the method to obtain a classification of educational chatbots.
General theoretical elements
For the development of our classification, one of the most important points was the choice of the Nickerson et al. [16] method. We consider that this method is adequate to formalize the process because it is flexible and has a solid foundation that is established by objective and subjective ending conditions, steps, and iterations to obtain a concrete result. Nevertheless, it can be confusing when used for the first time because, according to Nickerson et al. [16], the subjective conditions are difficult to identify and apply, and evaluating them requires the insight, experience, and skills of researchers.
The method uses two key terms: dimensions and characteristics. Dimensions can be viewed as variables and characteristics as instances of variables.
Fig. 1 presents the seven-step method of Nickerson et al. [16]: (1) identifying the meta-characteristic, which refers to the purpose of the classification; (2) determining ending conditions, which serve as a guide to stop the classification process; (3) selecting an approach: empirical-to-conceptual (new objects are examined to determine whether features are sufficient or new features and possibly new dimensions are needed) or conceptual-to-empirical (it begins by conceptualizing dimensions without examining real objects); depending on the selected approach, steps 4 to 6 may change: (4e) identifying a subset of objects by recognizing the characteristics from a systematic sample; (5e) identifying common characteristics and grouping objects; (6e) grouping characteristics into dimensions to create a taxonomy; or (4c) conceptualizing characteristics and dimensions of objects; (5c) examining objects for these characteristics and dimensions; (6c) creating a taxonomy; and (7) asking whether the ending conditions have been met.

Nickerson et al. [16] seven-step method.
There are eight possible objective ending conditions: All objects, or a representative sample, have been examined. No object was merged with a similar object or split into multiple objects in the last iteration. At least one object corresponds to each characteristic of every dimension. No new characteristics or dimensions were added in the last iteration. No characteristics or dimensions were merged or split in the last iteration. Every dimension is unique. Every characteristic is unique within its dimension. Each cell (combination of characteristics) is unique.
As for the subjective ending conditions, there are five possibilities: Concise: it indicates the number of dimensions that allow the classification to be established. Robust: it means that the characteristics and dimensions identified are enough to differentiate the objects of interest. Comprehensive: it indicates that all analyzed objects can be classified within the domain under consideration or the classification includes all dimensions of the objects of interest. Extendible: it refers to the possibility of adding new characteristics and new dimensions. Explanatory: it provides a useful explanation of the objects of study.
It is worth mentioning that step 2 is determined by the objective and subjective ending conditions; once they are satisfied, the iteration process stops. To get to the end of the method, it is necessary to check that the conditions are met; otherwise a new iteration must be started from step 3. The method is flexible, since not all objective and subjective ending conditions have to be fulfilled, i.e., it is possible to choose from those conditions.
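The iteration logic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure of Nickerson et al. [16] itself: the names `build_taxonomy`, `replay`, `nothing_changed`, and `no_empty_dimension` are hypothetical, the intermediate taxonomies in `STAGES` are toy data, and only two objective ending conditions are modeled, since the method allows choosing a subset of them.

```python
from typing import Callable, Dict, List, Set, Tuple

# A taxonomy maps each dimension to its set of characteristics.
Taxonomy = Dict[str, Set[str]]
EndingCondition = Callable[[Taxonomy, Taxonomy], bool]


def build_taxonomy(
    revise: Callable[[Taxonomy, int], Taxonomy],
    ending_conditions: List[EndingCondition],
    max_iterations: int = 10,
) -> Tuple[Taxonomy, int]:
    """Repeat steps 3 to 7 until every chosen ending condition holds."""
    previous: Taxonomy = {}
    for iteration in range(1, max_iterations + 1):
        # Steps 3 to 6: choose an approach and revise the taxonomy.
        revised = revise(previous, iteration)
        # Step 7: compare the result with the previous iteration.
        if all(cond(previous, revised) for cond in ending_conditions):
            return revised, iteration
        previous = revised
    raise RuntimeError("ending conditions not met within max_iterations")


def nothing_changed(previous: Taxonomy, revised: Taxonomy) -> bool:
    # No dimension or characteristic was added, merged, or split.
    return previous == revised


def no_empty_dimension(previous: Taxonomy, revised: Taxonomy) -> bool:
    # Every dimension keeps at least one characteristic.
    return all(revised.values())


# Toy data (assumption): stands in for the researchers' manual analysis.
first = {"Student": {"Evaluation", "Feedback"}}
second = {"Student/Teacher-Oriented": {"Evaluation", "Feedback", "Reports"}}
STAGES = [first, second, dict(second)]


def replay(previous: Taxonomy, iteration: int) -> Taxonomy:
    return STAGES[min(iteration, len(STAGES)) - 1]


taxonomy, iterations = build_taxonomy(replay, [nothing_changed, no_empty_dimension])
print(iterations, sorted(taxonomy))
```

In this toy run the loop stops at the first iteration that introduces no change, which mirrors the stopping rule stated above; the subjective conditions are deliberately left out because they require the researchers' judgment.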
A classification may not be perfect, but it is useful if we want to explain the nature of the objects in a study [16].
Table 1 presents the iterations that we carried out and how the objective and subjective conditions were met in each one. We start by showing the development of the first iteration.
Detail of the iterations with respect to the objective and subjective ending conditions [16]
The research question (RQ) is: What kinds of chatbots are in the education domain?
The inclusion, exclusion and selection criteria are needed.
We define the following inclusion criteria: a) the proposal was published from 2006 onwards; b) the work contains relevant terms in the title, abstract, or keywords; and c) the proposal is focused on educational chatbots.
As for the exclusion criteria we have: a) the work is duplicated; b) the proposal does not satisfy the RQ; c) the work is not from a journal, conference or workshop; d) the proposal is not written in English or Spanish language; and e) the work is a survey or a systematic review.
Our selection criteria are: a) apply inclusion and exclusion criteria in documents; b) in the remaining proposals, read the conclusion to look for properties that we need and that were not identified in the abstract; and c) obtain a refined list of articles and proceed to read them taking into account the inclusion and exclusion criteria, but now on the main body of the paper.
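The inclusion and exclusion criteria above can be read as a filter over the search results. The following Python sketch shows one possible encoding, under stated assumptions: the `Paper` record and its fields are hypothetical, duplicates are detected by normalized title only, and the judgment calls (relevant terms, focus on educational chatbots, satisfying the RQ) are collapsed into a single boolean that a human reviewer would have to set.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paper:
    # Hypothetical record; the fields are assumptions for illustration.
    title: str
    year: int
    venue_type: str                   # e.g. "journal", "conference", "workshop"
    language: str
    is_survey: bool
    about_educational_chatbots: bool  # set by a human reviewer


ALLOWED_VENUES = {"journal", "conference", "workshop"}
ALLOWED_LANGUAGES = {"English", "Spanish"}


def meets_inclusion(paper: Paper) -> bool:
    # Inclusion a) and c); b) is assumed to be handled by the search itself.
    return paper.year >= 2006 and paper.about_educational_chatbots


def meets_exclusion(paper: Paper) -> bool:
    # Exclusion c), d), and e); duplicates (a) are handled in screen().
    return (
        paper.venue_type not in ALLOWED_VENUES
        or paper.language not in ALLOWED_LANGUAGES
        or paper.is_survey
    )


def screen(papers: List[Paper]) -> List[Paper]:
    seen_titles = set()
    kept = []
    for paper in papers:
        key = paper.title.strip().lower()
        if key in seen_titles:  # exclusion a): duplicated work
            continue
        seen_titles.add(key)
        if meets_inclusion(paper) and not meets_exclusion(paper):
            kept.append(paper)
    return kept


# Made-up sample entries, not papers from the actual search.
sample = [
    Paper("Chatbot tutor for algebra", 2019, "journal", "English", False, True),
    Paper("Chatbot tutor for algebra", 2019, "journal", "English", False, True),
    Paper("Early FAQ bot", 2004, "conference", "English", False, True),
    Paper("Survey of educational chatbots", 2020, "journal", "English", True, True),
    Paper("Campus assistant", 2018, "book chapter", "English", False, True),
]
kept_titles = [p.title for p in screen(sample)]
print(kept_titles)
```

The selection criteria would then apply the same filter twice: first on title, abstract, and keywords, and again after reading the conclusion and the main body.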
The search expression for the RQ is:
We use the ACM Digital Library, IEEE Xplore, Science Direct, Scopus, and Springer Link search databases to obtain our sample of proposals.
In the first search we got 1703 results. Applying the inclusion, exclusion, and selection criteria, the number was reduced to a total of 45 papers:
In the second iteration, we also select the empirical-to-conceptual approach for the same reason as in the first iteration. Continuing with steps 4 to 6, we identify the Teacher dimension. To make the process concise and comprehensive, we group and rename the Student dimension as Student/Teacher-Oriented, and it has the following new characteristics: Reports, Support, and Topics. Furthermore, we identify a new characteristic, Procedures, for the School Service-Oriented dimension. The ending conditions have not been met, because a new dimension has been created in this iteration. Therefore, a new iteration is necessary.
In the third iteration, we also select the empirical-to-conceptual approach, and no new dimensions or characteristics are found in the subsequent steps. Thus, the classification method does not need a new iteration, because the objective and subjective ending conditions were met.
After applying the Nickerson et al. [16] method, we found three dimensions and fifteen characteristics, which are presented in Fig. 2.

Structure of the chatbots classification and their characteristics in education domain.
The School Service-Oriented dimension groups chatbots whose main function is to provide general information, FAQs, schedules, or procedures. This type of chatbot is useful for educational institutions, as it can provide a complete service to both the community and external users who need more information about fees or the educational offer. The main advantages of this type of chatbot are 24/7 availability, reduction of workload for the staff, handling many students at the same time, and accessibility from a mobile device. The characteristics of this dimension are: Information: the chatbot provides users with information, e.g., educational offer, directory, and study plans. FAQ: questions and answers that are common for users. This characteristic is also found in the Student/Teacher-Oriented dimension. Procedures: a guide for students to carry out a specific procedure, e.g., how to enroll in a class or what is needed to get certified. This characteristic is also found in the Student/Teacher-Oriented dimension. Schedule: specific information on activities, e.g., academic events and evaluations.
The e-Learning-Oriented dimension covers chatbots that were designed for MOOCs (Massive Open Online Courses), LCMS (Learning Content Management Systems), and education-oriented software, e.g., Moodle, that students use remotely. This type of chatbot complements massive courses that do not necessarily belong to a formal academic environment, e.g., classes of English as a foreign language. The characteristics of this dimension are: Courses: the chatbot is an ad hoc implementation where users can enroll remotely, e.g., massive courses on foreign language learning. LCMS: chatbots that are designed to be integrated into an existing MOOC platform. Software: chatbots that are one more tool in a specialized system and can generally be accessed through the Web.
The Student/Teacher-Oriented dimension groups chatbots that interact not only with students but also with teachers. We found several elements, such as evaluation, FAQs, feedback, Q&A, reports, schedule, subjects, support, topics, and tutorships. The characteristics of this dimension are: Evaluation: assessment tools for students, e.g., exams, homework, quizzes, practices, and essays. Feedback: the chatbot provides feedback to students according to their progress in class. Q&A: it has specific questions and concrete answers for the student. Reports: details provided by the chatbot to the teacher about the progress of their students. Subjects: interacting with the student about the classes they have registered for. Support: it offers some kind of support to the student, e.g., how to connect an electronic device to the laboratory network. Topics: answering questions on specific topics, e.g., complexity of a sorting algorithm. Tutorships: it offers students some form of educational or personal orientation.
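The three dimensions and their characteristics can be written down as a small lookup structure. The sketch below is one possible encoding and not part of the original study: `CLASSIFICATION` and `dimensions_of` are hypothetical names, and each characteristic is listed once under the dimension whose list introduces it, even though FAQ and Procedures are noted above as also appearing in the Student/Teacher-Oriented dimension.

```python
from typing import Dict, Iterable, List

# Dimension -> characteristics, as listed in the three paragraphs above.
CLASSIFICATION: Dict[str, List[str]] = {
    "School Service-Oriented": ["Information", "FAQ", "Procedures", "Schedule"],
    "e-Learning-Oriented": ["Courses", "LCMS", "Software"],
    "Student/Teacher-Oriented": [
        "Evaluation", "Feedback", "Q&A", "Reports",
        "Subjects", "Support", "Topics", "Tutorships",
    ],
}


def dimensions_of(characteristics: Iterable[str]) -> List[str]:
    """Return the dimensions containing at least one given characteristic."""
    wanted = set(characteristics)
    return [
        dimension
        for dimension, members in CLASSIFICATION.items()
        if wanted.intersection(members)
    ]


total_characteristics = sum(len(members) for members in CLASSIFICATION.values())
print(total_characteristics)
print(dimensions_of(["FAQ", "Reports"]))
```

A chatbot described by several characteristics can therefore fall under more than one dimension, which is why the number of characteristic incidences can exceed the number of papers.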
Table 2 presents the relative and absolute frequencies of the characteristics that we identified in our classification. Of the 45 papers analyzed, 24.44% belong to School Service-Oriented, 22.22% to e-Learning-Oriented, and 53.34% to Student/Teacher-Oriented (see Fig. 3). As each paper can be identified with more than one characteristic, the number of characteristic incidences was 89. The Papers column represents the percentage of characteristics based on the number of papers in a given dimension (relative) and the total number of papers (absolute). Similarly, the Incidence column was calculated using the number of incidences in a given dimension (relative) and the total number of incidences (absolute).
Relative and absolute frequencies of the characteristics among 45 educational chatbot papers and 89 characteristic incidences
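As a rough illustration of how such percentages are obtained, the Python sketch below recomputes the dimension shares and shows the difference between a relative and an absolute frequency. The paper counts per dimension (11, 10, and 24) are an assumption back-computed from the reported percentages and the total of 45, and the characteristic count `k` is a made-up value, not a figure from Table 2.

```python
def percentage(count: int, total: int) -> float:
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


# Assumed counts, back-computed from the reported percentages (sum is 45).
papers_per_dimension = {
    "School Service-Oriented": 11,
    "e-Learning-Oriented": 10,
    "Student/Teacher-Oriented": 24,
}
total_papers = sum(papers_per_dimension.values())

dimension_share = {
    dimension: percentage(count, total_papers)
    for dimension, count in papers_per_dimension.items()
}

# Hypothetical characteristic found in k papers of the first dimension.
k = 6
relative = percentage(k, papers_per_dimension["School Service-Oriented"])
absolute = percentage(k, total_papers)

print(dimension_share)
print(relative, absolute)
```

Plain rounding gives 53.33 for the largest dimension, whereas the text reports 53.34, presumably so that the three shares add up to exactly 100.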

Education-focused chatbot classification.
As can be seen in Section 2, most of the works followed an ad hoc methodology, which prevents their use in other domains (e.g., educational chatbots) and makes it more challenging to replicate the results they obtained. Instead, the Nickerson et al. [16] method establishes general elements that facilitate its use in different domains. Thus, our proposal has the advantage of reproducibility, as the research question, search expression, and inclusion, exclusion, and selection criteria are well established and rest on sound foundations. Furthermore, choosing between the empirical/inductive and conceptual/deductive approaches gives us flexibility and versatility to develop our classification.
From our proposal, three groups emerge that classify chatbots according to their characteristics. We not only show how these groups were reached but also the particularities of each one. This is an advance over other works with much less detailed groups, such as the one proposed by Quiroga Pérez et al. [30]. Unlike those proposals that follow arbitrary or ad hoc methods, our classification allowed us to represent many more chatbots in a characteristic group without the need to create small clusters that can be trivial. That is because we used a systematic process [16].
Regarding the three classifications that resulted from our iterations (see Fig. 2), we can mention that they are related according to the granularity of the information that the chatbot can send and receive. The classification that contains the chatbots that are oriented to more general tasks and therefore handle basic information is found in School Service-Oriented since the target users usually include a general public looking for rudimentary information. On the contrary, the most specialized chatbots are classified as Student/Teacher-Oriented. They fill the need for more personalized interactions and carry out much more precise tasks, since they are aimed at users as individuals. The chatbots within e-Learning-Oriented are generally somewhere in the middle. Although they also serve a broad population, their use context is more limited.
In the classification of the works shown in Fig. 3, we can observe that there is a clear trend towards the development and study of chatbots that focus on students and teachers. This may be due to the way these chatbots interact with people. As we already mentioned, the need to provide more personalized attention to specific users in specific contexts opens up broader possibilities for research, since problems arise with the capabilities inherent to artificial intelligence and those related to software engineering, usability, and user experience.
According to Nimavat and Champaneria [27] and Adamopoulou and Moussiades [29], it is possible to have multiple categories for a chatbot, e.g., both
We can analyze the implications of our classification from two points of view. The first benefits developers, since we identify the basic interaction mechanisms they must implement according to their development context. For example, if they develop a chatbot for a high school class, which both students and teachers will use, then our Student/Teacher-Oriented group is relevant because it tells them that they have to create mechanisms that satisfy the characteristics of that classification.
On the other hand, our proposal also impacts research groups, since we create a base for more research on the classification of education-oriented chatbots. From our search expression, our inclusion, exclusion, and selection criteria, as well as our approach and ending conditions, researchers may find more feature groups in years to come.
A limitation of the Nickerson et al. [16] method is that it is qualitative, so it is not possible to carry out a formal analysis in a quantitative way. However, from the analysis that we carried out in Table 2, we can say that the most frequent characteristics of our sample of works were those of Information, Evaluation, Feedback, Subjects, Topics, and Q&A. This is fascinating because it represents a mixture of more automation-oriented functions with those that are much more advanced and require the chatbot to participate in the teaching/learning process actively.
On the other hand, research on the more rudimentary functions seems to be declining, since the least frequent features were Schedule, Procedures, Reports, FAQ, and Support. This may be because new technologies have already solved the problems in these functionalities. Other rare features were Courses, LCMS, Software, and Tutorships. We attribute this to the sheer cost and complexity of these systems.
We know that this quantitative analysis does not replace a formal method, which is a limitation of our proposal. To remedy this, we plan in the future to explore quantitative tools such as those offered by Likert scales [74], the framework of Szopinski et al. [75], the Fuzzy Comprehensive Evaluation [76], and the Analytic Hierarchy Process [77].
Conclusion and future work
We developed an educational chatbot classification using the Nickerson et al. [16] method. Through a series of iterations, we identified three dimensions: School Service-Oriented, e-Learning-Oriented, and Student/Teacher-Oriented. Furthermore, the SMS allowed us to cover a significant sample of papers for our proposal, giving us an advantage over other classifications that used ad hoc or trivial methods. In this way, we identified a tendency, as the most prevalent dimension was Student/Teacher-Oriented.
The passive role of chatbots as virtual assistants is being left behind. Conversational systems have taken a more active role in current education thanks to technological advances. Today, some applications help students through immediately graded exercises and give them adequate feedback in case of doubts or errors. On the other hand, teachers also take advantage of these tools to discover their students’ progress, which is essential since chatbots are not intended to replace educators in the classroom but rather are a tool for them to provide more specialized education for their students. Chatbots can also serve as an early identification tool since, based on the system reports, pedagogues and psychologists could identify emerging problems.
As for future work, we plan to create a new classification method that reduces uncertainties, because as we already mentioned, the Nickerson et al. [16] method may have a problematic entry barrier for inexperienced researchers. We found a knowledge gap in this aspect, since a much more precise methodology is needed to deal with collecting, identifying, and organizing works under qualitative and quantitative perspectives.
Acknowledgement
We thank CONACyT (Consejo Nacional de Ciencia y Tecnología) for funding José Fidel Urquiza Yllescas’s doctoral fellowship. Scholarship number: 331560.
The work described in this paper was funded by “Fondo SEP-CINVESTAV de Apoyo a la Investigación (Call 2018),” project number 120, titled “Desarrollo de un chatbot inteligente para asistir el proceso de enseñanza/aprendizaje en temas educativos y tecnológicos.”
José Fidel Urquiza-Yllescas thanks IEMS-CDMX for giving him the opportunity to continue his doctoral studies.
