Investigating the role of chatbot-based language tutors utilizing deep learning to facilitate English language acquisition in mobile applications

Abstract

In recent years, advancements in artificial intelligence (AI) and natural language processing (NLP) have significantly changed the landscape of education. Among the most promising developments is the emergence of chatbot-based language tutors, which leverage AI to offer personalized and interactive language learning experiences. These tutors can assist learners in mastering vocabulary, grammar, pronunciation, and conversation skills across various languages. This research examines the role of chatbot-based language tutors utilizing deep learning (DL) to facilitate English language acquisition in mobile applications. Intent categorization is a fundamental component of these systems, allowing chatbots to understand user questions and respond appropriately. To address related issues, the research created a proofreading chatbot designed to help academic authors with grammatical corrections. Data was collected from a publicly available chatbot-based English learning dataset. The data was preprocessed using stop word removal and tokenization. Term Frequency Inverse Document Frequency (TF-IDF) is utilized to extract features from the preprocessed data. Efficient pigeon inspired fused bidirectional long short-term memory (EPI-BiLSTM) is applied to classify the intent based on the text to determine the user’s intent. After the classification, to address data scarcity in grammatical error correction for the English language, back translation is employed as a data augmentation tool. Back translation involves translating error-prone sentences into a different language and then translating them back to the original language, generating parallel corpora with their corrected counterparts, derived from texts. The experimental results demonstrated that EPI-BiLSTM outperforms traditional algorithms based on domain (80.5%), intent (90.3%), entity (75.2%), and average accuracy (81.3%). 
These findings illustrate the potential of combining chatbot-based systems and DL techniques to address both proofreading and grammatical error correction challenges in mobile applications.

Keywords

English language acquisition, mobile applications, chatbot, term frequency inverse document frequency (TF-IDF), efficient pigeon inspired fused bidirectional long short-term memory (EPI-BiLSTM)

Introduction

The rapid advancements in mobile technologies have transformed language learning. Mobile technology allows learners to access resources, tools, and a platform that offers more convenience for acquiring a target language.¹ Among the revolutionary advances in this regard is a language learning module based on integrated artificial intelligence (AI) with deep learning (DL) approaches for creating intelligent, interactive language tutors. Language learning chatbots using DL algorithms serve as excellent trainers, providing a tailor-made, adaptive learning experience addressing the specific requirements and preferences of each learner in the English language acquisition process.² These learning chatbots provide an interactive experience for learners integrated into mobile apps, allowing students to practice live communication, grammar, pronunciation, and vocabulary exercises.³ Advances in machine learning (ML) and natural language processing (NLP) enable chatbots to comprehend, interpret, and respond to user input in a conversational manner, which has brought them to prominence in language learning.⁴ DL is a subset of ML that enhances the accuracy and performance of chatbots in language learning applications. DL uses large amounts of data and complex algorithms to recognize complex linguistic patterns, adapt to individual learning styles, and provide individualized feedback.⁵

Another crucial benefit of chatbot-based language tutors is that they are accessible to learners 24/7. Human instructors in a physical classroom are available only during certain hours, whereas chatbots in mobile applications are ready whenever a learner wants to practice. The immediate feedback given by these AI-driven tutors helps learners identify their mistakes and improve quickly, fostering a more effective and continuous learning process.⁶ Besides convenience, chatbot-based language tutors offer learners the opportunity to practice conversational English in a low-pressure environment. Constructive practice is necessary for learning, yet time and location constraints often prevent students from attending traditional classes or getting enough help from teachers after class. The difficulty is greater for non-native English speakers, who also face a domain barrier. Chatbots mimic real-life conversations, allowing learners to build confidence in their language skills.⁷

Speech recognition technology adds to these interactions. DL algorithms learn and improve with experience. Data on learners' language use, errors, preferences, and progress are analyzed to refine the chatbot’s responses and teaching strategy.⁸ The chatbot becomes more accurate at predicting what the learner needs and provides specific exercises that target their learning challenges. This data-driven method allows a chatbot to present customized material, such as vocabulary and grammar exercises and interactive dialogues, so that learners receive materials that fit their skill level and learning goals.⁹ The use of DL in chatbot-based language tutors therefore marks an important development in language acquisition. These AI-driven tutors offer innovative solutions for English language learners through personalized, interactive, and adaptable learning experiences. The flexibility to practice in a judgment-free environment, combined with the scalability and continuous improvement of DL models, makes chatbot-based language tutors a powerful tool in the ongoing evolution of language education in mobile applications.¹⁰

The primary objective is to investigate the role of chatbot-based language tutors utilizing DL techniques to facilitate English language acquisition in mobile applications. The research focuses on developing a chatbot system that aids in grammatical error correction, enhances vocabulary, and improves conversational skills through personalized interactions. The research also explores the integration of advanced AI models to address the challenges of intent categorization and data scarcity in language learning environments. The key contributions of the research are listed as follows.

(1) To introduce the EPI-BiLSTM model that improves intent classification accuracy in chatbot-based English language tutors.

(2) To collect data from a publicly available Chatbot-based English learning dataset and utilize tokenization and stop word removal for preprocessing.

(3) To employ back translation as a data augmentation technique, addressing data scarcity and enhancing chatbot-based proofreading capabilities in mobile applications.

(4) To use the Term Frequency Inverse Document Frequency (TF-IDF) for feature extraction.

Related works

Recent advancements in chatbot technology have demonstrated significant potential in enhancing language acquisition. Assayed et al.¹¹ developed HSchatbot, a system designed to classify intents from questions posed by high school students. Their approach employed Random Forest (RF) and Multinomial Naïve-Bayes classifiers, with RF achieving over 90% accuracy, outperforming the Multinomial Naïve-Bayes model. Notably, the study revealed that Multinomial Naïve-Bayes performed better with TF-IDF vectorization than with CountVectorizers. Dongbo et al.¹² proposed an AI-driven method for improving chatbot comprehension of customer inquiries. Their framework combined a Bidirectional Recurrent Neural Network (BRNN) with a Fuzzy Naïve-Bayes (FNB) classifier (BRNN-FNB) to generate real-time responses. The integration of sentiment analysis enhanced interaction precision, particularly for voice-based chatbots. The model achieved 92% accuracy without sequence-to-sequence (Seq2Seq) architecture and 93% with it, demonstrating applicability in digital marketing, education, and online forums. Addressing pandemic-induced communication challenges, Yang et al.¹³ developed DR-COVID, a multilingual NLP-based chatbot for COVID-19 information dissemination. Tested across multiple languages, DR-COVID exhibited high accuracy and rapid response times (1.12–2.15 seconds across devices), outperforming existing chatbots in both speed and precision. For Turkish misspelled word detection, Aytan and Şakar¹⁴ integrated a false-positive reduction model into a two-step deep learning framework. Their approach evaluated syllable-based, character-based, and byte-pair encoding tokenizers using LSTM and Bi-LSTM architectures. The multi-class dataset revealed that tokenization strategies significantly influenced error correction efficacy.

Despite advancements, Sayenju et al.¹⁵ highlighted persistent biases in NLP models such as BERT (Bidirectional Encoder Representations from Transformers). These biases arise from discrepancies between training corpora and domain-specific chatbot inputs, potentially compromising model generalizability in language learning contexts. Hew et al.¹⁶ investigated chatbots in online education through two studies. In Study 1, a SMART goal-setting chatbot guided learners in structured exercises, while Study 2 employed a “learning buddy” chatbot for EFL listening practice. Both trials reported positive learner experiences, emphasizing chatbots’ usability and pedagogical value in fostering engagement. Leveraging mobile sensor networks, Jingning¹⁷ enhanced speech recognition in an intelligent English learning system. By preprocessing voice signals and optimizing feature extraction, the system achieved robust performance against background noise and speaker variability, surpassing traditional methods in recognition accuracy. Imran et al.¹⁸ utilized a convolutional neural network (CNN) to classify English words into nine grammatical categories (e.g., nouns, verbs, adjectives). The model achieved 97.22% overall accuracy, with perfect classification for pronouns, determiners, verbs, adverbs, and prepositions. Chi-square tests validated its utility for non-native speakers, establishing benchmarks for automated grammatical analysis. Wang¹⁹ analyzed Duolingo, an AI-driven language tutoring system, in a study involving 125 students. The platform’s adaptive exercises and spaced repetition mechanisms improved vocabulary retention and grammar mastery, underscoring conversational AI’s role in scalable language education. Yang et al.²⁰ evaluated Ellie, a task-based speech chatbot for Korean EFL learners. Participants (n = 31, ages 10–15) completed three speaking tasks, achieving an 88.3% success rate. 
The chatbot facilitated dialogue-driven practice, addressing a critical gap in conventional EFL instruction. Chien et al.²¹ implemented LINE ChatBot to improve English speaking/listening skills among 73 students. While learning gains were modest, extrinsic motivation increased significantly during anonymous interactions. Incorporating competitive elements further boosted intrinsic motivation, highlighting chatbots’ potential for engagement. Albornoz-De Luise et al.²² developed a Rasa-based conversational agent for hypergraph problem-solving. The system achieved F1-scores of 0.965 (intent recognition) and 0.989 (entity extraction), demonstrating robust performance in natural language interaction for tutoring systems. Hsu et al.²³ designed TPBOT, a TOEIC-focused chatbot to reduce EFL learners’ speaking anxiety. Chinese students with TOEIC^® oral scores <100 reported improved confidence and satisfaction, with teachers endorsing its efficacy in enhancing oral proficiency. Rizou et al.²⁴ created a multilingual customer service chatbot using BiLSTM and Conditional Random Fields. Evaluated on the UniWay dataset, the model effectively processed user queries in Greek and English, showcasing cross-linguistic adaptability.

While AI-powered chatbots show promise in language acquisition, critical limitations persist in mobile learning applications. First, reliance on large labeled datasets¹¹,¹⁴,¹⁸ exacerbates data scarcity and impedes generalization across diverse learner contexts. Although models like CNNs¹⁸ and BiLSTMs²⁴ excel in intent classification and error detection, they struggle with real-time grammatical correction for non-native speakers.¹⁴,²⁰ Traditional approaches, including Seq2Seq architectures,¹² inadequately address the dynamism of conversational inputs, limiting pedagogical effectiveness. Furthermore, biases in domain-specific training data¹⁵ risk undermining model accuracy. To address these gaps, our study proposes an EPI-BiLSTM model for intent classification, augmented by back-translation techniques. This hybrid framework enhances grammatical feedback and enables context-aware language support, optimizing mobile English acquisition.

Methodology

This section describes the methods used to build a chatbot for language learning, including dataset preparation, data augmentation, feature extraction, and model optimization. Together, these steps improve the chatbot system’s intent categorization and grammatical error correction, strengthening it as a basic English learning tool. Figure 1 shows the flow of the methodology.

Figure 1.

Methodology flow.

Data set

The Chatbot-Based English Learning Dataset²⁵ is an open-source dataset available on Kaggle and intended to support language learning research, including AI-based work. It provides structured data for grammar error correction and intent categorization, which supports chatbot-based English tutoring. It has 200 rows, each comprising a language learning user inquiry, the corresponding chatbot intent category, an incorrectly written sentence, and the AI-generated corrected version of that sentence. Training chatbot models on this dataset can improve the grammar-checking and interactive language support capabilities of mobile applications. Table 1 gives the features and description of the dataset.

Table 1.

Data set structure of chatbot-based English learning.

Features	Description and examples
Sentence	User queries for intent classification (e.g., “Can you check my grammar?”)
Intent	Categorized chatbot responses (e.g., Grammar_Check and vocabulary assistance)
Incorrect sentence	Common grammatical errors in English writing
Corrected sentence	AI-corrected versions of the incorrect sentences

Preprocessing

Tokenization and stop word removal simplify the input text so that the model can focus on the informative parts. These preparation steps reduce noise and give the input a more regular structure, which improves the chatbot’s ability to understand and process user input. Tokenization and stop word removal are illustrated in Figure 2.

Figure 2.

Preprocessing techniques process.

Tokenization

Tokenization is a necessary preprocessing step that divides text into smaller pieces, usually words or subwords. It enables structured analysis of text, so the chatbot can more easily identify grammatical patterns, intent, and mistakes in user-generated utterances, which in turn improves user interaction and learning results. Tokenization also supports later operations such as word embeddings and syntactic parsing, which are crucial for DL-based language models.

Removing stop words

Effective text preprocessing is fundamental to the effectiveness of chatbot-based language tutors in supporting the learning of English as a second language. One basic preprocessing step is to eliminate frequently used words, such as “the,” “is,” and “in,” that carry little useful information for the analysis. Removing these words lets the model focus on the more important terms that support intent categorization and grammar correction. This stage reduces the number of irrelevant features the DL model needs to process, which both reduces noise in the dataset and saves computational resources.
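The two preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the experiments: the regular expression, the function names, and the very small stop word list are assumptions chosen for readability.

```python
import re

# Tiny illustrative stop word list (assumption); the list actually used is not given.
STOP_WORDS = {"the", "is", "in", "a", "an", "of", "to", "and"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop word list."""
    return [t for t in tokens if t not in stop_words]

tokens = tokenize("Can you check the grammar in my essay?")
filtered = remove_stop_words(tokens)
print(tokens)    # all word tokens, lowercased
print(filtered)  # the same tokens without "the" and "in"
```

In practice a library list (for example the NLTK or spaCy stop word sets mentioned in the system configuration) would replace the hand-written set.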

Data augmentation using back translation

To address data scarcity in grammatical error correction, we employ back translation as a data augmentation technique. This process involves selecting specific segments within the original erroneous sentence—primarily those with three or more tokens that are likely to contain grammatical errors. These segments are first translated into an intermediate language (e.g., Chinese, French, or Spanish) using a high-quality neural machine translation (NMT) model. Subsequently, the translated segment is re-translated back into English using the same or another NMT model. This two-step translation process produces paraphrased versions of the original segment, often with improved grammatical correctness and varied phrasing. The resulting paraphrased sentences serve as augmented data, providing the model with more diverse examples of correct and erroneous sentence structures. By focusing on context surrounding potential errors, back translation helps generate parallel corpora with both incorrect and corrected sentences, enabling the chatbot to learn more robust grammatical correction strategies.

Adding grammatically correct sentence variations to the data helps the chatbot-based system learn to fix grammatical errors more reliably. In mobile applications designed to assist in learning English, back translation is useful because it enlarges the training set, which strengthens the model’s ability to generalize.
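The augmentation loop described above can be sketched as follows. The translators are passed in as plain functions, and the lookup tables used here are toy stand-ins for NMT models (an assumption made so the example runs without any external service). The sketch also works on whole sentences rather than selected segments, which is a simplification.

```python
def back_translate(sentence, to_pivot, to_english):
    """Translate into a pivot language and back, returning the paraphrase."""
    return to_english(to_pivot(sentence))

def build_parallel_pairs(erroneous_sentences, to_pivot, to_english, min_tokens=3):
    """Pair each erroneous sentence with its back-translated version.

    Inputs with fewer than min_tokens tokens are skipped, mirroring the
    three-token threshold described in the text.
    """
    pairs = []
    for source in erroneous_sentences:
        if len(source.split()) < min_tokens:
            continue
        candidate = back_translate(source, to_pivot, to_english)
        if candidate != source:  # keep only pairs where something changed
            pairs.append((source, candidate))
    return pairs

# Toy stand-ins for NMT models (assumption): lookup tables, not real translators.
EN_TO_FR = {"she go to school": "elle va à l'école"}
FR_TO_EN = {"elle va à l'école": "she goes to school"}

pairs = build_parallel_pairs(
    ["she go to school", "hi there"],
    to_pivot=lambda s: EN_TO_FR.get(s, s),
    to_english=lambda s: FR_TO_EN.get(s, s),
)
print(pairs)
```

Each resulting pair holds an erroneous sentence and a candidate correction; with real translation models the candidates would still need filtering, since back translation can also change meaning.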

Extracting feature using Term Frequency Inverse Document Frequency (TF-IDF)

Chatbot-based language tutors must extract informative features from text to classify user intent and assist in grammatical correction. TF-IDF is a feature extraction method that measures the importance of words in a given corpus. The term frequency (TF) calculates how often a word appears in a document relative to the total number of words in that document, so that terms used frequently in a particular context are weighted accordingly. Because words that occur in many documents carry little discriminating information, the inverse document frequency (IDF) is also used: it gives higher weight to terms that appear in few documents and lower weight to terms that appear throughout the dataset.

The combination of TF (equation (1)) and IDF (equation (2)) helps the chatbot to focus more on words that are more relevant to intent classification and grammatical error detection. For instance, in learning the English language, TF-IDF (equation (3)) can help the chatbot distinguish between erroneous sentences and the correct ones according to grammar, thus providing better correction and recommendations. The capacity of a chatbot to evaluate phrase structures, identify mistakes, and give contextually relevant information can be improved by utilizing TF-IDF, which eventually improves the experience of learning by users.

TF = \frac{\text{Number of appearances of a word in a document}}{\text{Total number of words in the document}}

(1)

IDF = \log \frac{\text{Total number of documents}}{\text{Number of documents containing the word}}

(2)

TF\text{-}IDF = TF \times IDF

(3)
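Equations (1) to (3) translate directly into code. The sketch below is illustrative only: the toy corpus and function names are assumptions, the natural logarithm is used, and no smoothing is applied, so a term must occur in at least one document.

```python
import math

def tf(term, doc_tokens):
    """Equation (1): occurrences of the term divided by the document length."""
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    """Equation (2): log of (number of documents / documents containing the term).

    Assumes the term occurs in at least one document (no smoothing).
    """
    doc_freq = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / doc_freq)

def tf_idf(term, doc_tokens, corpus):
    """Equation (3): product of TF and IDF."""
    return tf(term, doc_tokens) * idf(term, corpus)

# Toy tokenized corpus (assumption), one list of tokens per user query.
corpus = [
    ["check", "my", "grammar"],
    ["explain", "this", "grammar", "rule"],
    ["suggest", "a", "synonym"],
]

for term in ("check", "grammar"):
    print(term, round(tf_idf(term, corpus[0], corpus), 4))
```

In the first query, "check" receives a higher weight than "grammar" because "grammar" also appears in the second query and is therefore less distinctive.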

Efficient pigeon inspired fused bidirectional long short-term memory (EPI-BiLSTM) to classify the intent based on the text

The proposed method integrates Bi-LSTM with EPI. The bidirectional network captures context from both the preceding and the following parts of a text sequence by processing it in the forward and backward directions. EPI uses swarm-intelligence navigation algorithms to improve the model’s error correction. Used together, they enhance the grammar correction capability of chatbots by taking context and information from earlier in the sequence into account, which supports more precise language learning.

Bidirectional long short-term memory (BiLSTM)

In grammatical error correction with chatbot-based language tutors, a limitation of the traditional LSTM cell is that it processes only the preceding content and cannot use the subsequent content. To address this limitation, bidirectional recurrent neural networks were proposed, in which two separate LSTM hidden layers running in opposite directions are connected to the same output. This gives the output layer information from both the past and the future. The bidirectional approach is particularly beneficial in tasks like grammatical error correction, where the context both before and after a word or phrase may be crucial for accurate correction.

The input sequence $x = (x_{1}, x_{2}, \ldots, x_{s}, \ldots, x_{m})$ is processed by the BiLSTM in two directions: forward, $\overrightarrow{g} = (\overrightarrow{g_{1}}, \overrightarrow{g_{2}}, \cdots, \overrightarrow{g_{m}})$, and backward, $\overleftarrow{g} = (\overleftarrow{g_{1}}, \overleftarrow{g_{2}}, \cdots, \overleftarrow{g_{m}})$. The forward and backward states combine to form the cell’s final output at each step, $Y_{s}$, giving the output sequence $Y = (Y_{1}, Y_{2}, \ldots, Y_{m})$. Figure 3 gives the structure of BiLSTM.

Figure 3.

BiLSTM structure.
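The bidirectional idea can be shown with a deliberately small example. The sketch below replaces the LSTM cell with a scalar tanh recurrent cell and shares one pair of weights between the two directions. Both are simplifications assumed for illustration, since a real BiLSTM uses full LSTM cells with separate weights per direction.

```python
import math

def run_direction(inputs, w_in, w_rec):
    """Run a simple tanh recurrent cell over the inputs and return every hidden state."""
    hidden, states = 0.0, []
    for x in inputs:
        hidden = math.tanh(w_in * x + w_rec * hidden)
        states.append(hidden)
    return states

def bidirectional(inputs, w_in=0.5, w_rec=0.3):
    """Pair the forward state and the backward state for every position."""
    forward = run_direction(inputs, w_in, w_rec)
    # Process the reversed sequence, then flip the states back into input order.
    backward = run_direction(inputs[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))

outputs = bidirectional([1.0, -0.5, 2.0])
for step, (fwd, bwd) in enumerate(outputs, start=1):
    print(step, round(fwd, 4), round(bwd, 4))
```

At every position the output pair combines a state that has seen only the earlier inputs with a state that has seen only the later ones, which is what lets a correction depend on words on both sides of an error.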

In deep networks, the choice of activation function (AF) significantly affects training dynamics and performance. For the classification model, the Swish AF, expressed as $\mathrm{swish}(y) = y \cdot \mathrm{sigmoid}(\beta y)$, was selected. A leaky rectified linear unit (Leaky ReLU) is added after output gating, and a $\tanh$ AF is incorporated into the cell propagation to solve the generic model’s cell divergence problem. Taken together, these changes eliminate negative outputs and reduce classification oscillation, as shown in equations (4)–(10):

e_{s} = \mathrm{swish} (U_{e} y_{s} + U_{ge} g_{s-1} + a_{e})

(4)

j_{s} = \mathrm{swish} (U_{j} y_{s} + U_{gj} g_{s-1} + a_{j})

(5)

p_{s} = \mathrm{swish} (U_{p} y_{s} + U_{gp} g_{s-1} + a_{p})

(6)

\tilde{d}_{s} = \tanh (U_{d} y_{s} + U_{gd} g_{s-1} + a_{d})

(7)

d_{s} = \tanh (e_{s} * d_{s-1} + j_{s} * \tilde{d}_{s})

(8)

g_{s} = p_{s} * d_{s}

(9)

x_{s} = \mathrm{LeakyReLU} (p_{s} * d_{s})

where $a_{e}, a_{j}, a_{p}, a_{d}$ and $U_{e}, U_{j}, U_{p}, U_{d}$ stand for the bias vectors and weight matrices, respectively, $g$ stands for the hidden value, and $e$, $j$, $p$, and $d$ represent the forget gate, input gate, output gate, and cell activation vectors, respectively. The memory cell’s input at time $s$ is $y_{s}$, and the candidate memory cell, the current memory cell, and the final output are denoted by $\tilde{d}_{s}$, $d_{s}$, and $x_{s}$, respectively.

The sophisticated AF and gating mechanisms, along with the BiLSTM’s bidirectional nature, enable efficient processing of text sequences for the detection and correction of grammatical errors. The model is better able to recognize and fix grammatical errors in a sentence by taking into account both past and future contexts. This enhances the chatbot-based language tutor’s overall performance in language acquisition and grammatical correction tasks.
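The two activation functions named in this section are easy to state in code. The sketch below is a scalar illustration only. The default $\beta = 1$, the Leaky ReLU slope of 0.01, and the example weights are assumptions, because the paper does not report these values.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def swish(y, beta=1.0):
    """Swish activation: y * sigmoid(beta * y)."""
    return y * sigmoid(beta * y)

def leaky_relu(y, slope=0.01):
    """Leaky ReLU: identity for positive inputs, a small slope for negative ones."""
    return y if y > 0 else slope * y

def gate(y_s, g_prev, u, u_g, a):
    """Scalar version of a swish gate such as equation (4)."""
    return swish(u * y_s + u_g * g_prev + a)

print(round(swish(1.0), 4), round(swish(-1.0), 4))
print(leaky_relu(3.0), leaky_relu(-2.0))
print(round(gate(y_s=0.8, g_prev=0.1, u=0.5, u_g=0.3, a=0.0), 4))
```

Unlike a sigmoid gate, a swish gate is not confined to the interval (0, 1): it is unbounded above and slightly negative for negative inputs.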

Efficient pigeon inspired (EPI) optimization

Pigeon Inspired Optimization (PIO) is a swarm intelligence method that mimics the collective behavior of homing pigeons, which find their way home using landmarks, the sun, and magnetic fields as navigational aids. This optimization approach can be applied to grammatical error correction within chatbot-based language tutors. Pigeons first rely mostly on compass-like tools to find their direction, although landmarks may eventually be used to continuously correct their course. PIO consists of two operators: the map and compass operator, in which pigeons use magnetoreception to sense the earth’s field and shape a map in their brains, and the landmark operator, which simulates pigeons searching for the path based on landmarks.

The map and compass operator: It is made up of two key components that are updated iteratively in the solution space: location and velocity. Equations (10) and (11) provide the updating formulae.

Z_{j}^{s} = Z_{j}^{s-1} \times e^{-RT} + \mathrm{rand} \times (Y_{h} - Y_{j}^{s-1})

(10)

Y_{j}^{s} = Y_{j}^{s-1} + Z_{j}^{s-1}

(11)

where $\mathrm{rand}$ is a random number, $R$ is the map and compass parameter, $Z_{j}^{s}$ is the velocity and $Y_{j}^{s}$ the position of pigeon $j$ at the $s$-th iteration, and $Y_{h}$ is the global optimal position, obtained by comparing the positions of all pigeons. Figure 4 further illustrates the map and compass operator model. Let A stand for the best pigeon and B, C, D, and E for the others. The other pigeons shift their flight paths toward A, switching from the thin arrows to the thick arrows.

Figure 4.

The models for the two operators of EPI.

This operator mimics how pigeons use landmarks and magnetic fields to navigate, just as the language model uses linguistic structures and contextual awareness to fix grammar mistakes.
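One iteration of the map and compass operator can be sketched as below, following equations (10) and (11) as printed. The function name, the use of one random number per pigeon, and the reading of $T$ as the iteration counter `t` are assumptions made for this illustration.

```python
import math
import random

def map_and_compass_step(positions, velocities, best, R, t, rng):
    """Apply equations (10) and (11) once to every pigeon.

    positions, velocities: lists of coordinate lists, one per pigeon.
    best: the current global best position Y_h.
    """
    decay = math.exp(-R * t)  # e^{-RT}, with T read as the iteration counter (assumption)
    new_positions, new_velocities = [], []
    for pos, vel in zip(positions, velocities):
        # Equation (11): move along the previous velocity.
        new_positions.append([p + v for p, v in zip(pos, vel)])
        # Equation (10): damp the old velocity and pull toward the global best.
        r = rng.random()
        new_velocities.append(
            [v * decay + r * (b - p) for v, p, b in zip(vel, pos, best)]
        )
    return new_positions, new_velocities

rng = random.Random(0)
positions = [[0.0, 0.0], [4.0, 1.0]]
velocities = [[1.0, -1.0], [0.5, 0.5]]
positions, velocities = map_and_compass_step(
    positions, velocities, best=[2.0, 2.0], R=0.5, t=1, rng=rng
)
print(positions)
print(velocities)
```

Because the decay factor shrinks as `t` grows, the pull toward the global best increasingly dominates the old velocity in later iterations.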

Landmark operator: It simulates pigeons searching for the path based on landmarks, with some pigeons being familiar with the landmarks and others following the familiar pigeons. According to equation (12), the number of pigeons is halved at each iteration, so the pigeons that are distant from their goal are discarded. Equation (13) defines the center of the remaining pigeons. The landmark (the center location) is then used to update the positions of every remaining pigeon, as shown in equation (14).

{TM}^{s} = \frac{{TM}^{s-1}}{2}

(12)

Y_{c}^{s} = \frac{\sum_{j=1}^{{TM}^{s}} Y_{j}^{s} \times f (Y_{j}^{s})}{{TM}^{s} \times \sum_{j=1}^{{TM}^{s}} f (Y_{j}^{s})}

(13)

Y_{j}^{s} = Y_{j}^{s-1} + \mathrm{rand} \times (Y_{c}^{s} - Y_{j}^{s-1})

(14)

Here ${TM}^{s}$ is the number of pigeons retained at the $s$-th iteration, $Y_{c}^{s}$ is the center of the remaining pigeons, and $f (Y_{j}^{s})$ is the objective function value of pigeon $j$ at its position in the $s$-th iteration. Each iteration of the landmark operator reduces the population size by half, which causes the remaining pigeons to travel swiftly to their destination. In the landmark operator’s model, pigeons inside the circle are close to the destination, while those outside the circle are far from it.

This operator mimics how pigeons look for landmarks to help them navigate, which is similar to how the chatbot system improves its grammar correction by considering a sentence’s context.
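A simplified landmark step is sketched below. It assumes that a larger objective value is better and that all objective values are positive. It also uses a plain fitness-weighted mean as the center, leaving out the extra population-size factor that appears in the denominator of equation (13). The function names and the toy objective are illustrative assumptions.

```python
import random

def landmark_step(positions, fitness, rng):
    """Halve the flock, compute a weighted center, and move the survivors toward it."""
    # Equation (12): keep the better half (larger fitness is better here).
    ranked = sorted(positions, key=fitness, reverse=True)
    kept = ranked[: max(1, len(ranked) // 2)]
    # Simplified equation (13): fitness-weighted mean of the kept positions.
    weights = [fitness(p) for p in kept]
    total = sum(weights)
    dims = len(kept[0])
    center = [
        sum(w * p[d] for w, p in zip(weights, kept)) / total for d in range(dims)
    ]
    # Equation (14): each survivor moves a random fraction of the way to the center.
    moved = []
    for p in kept:
        r = rng.random()
        moved.append([x + r * (c - x) for x, c in zip(p, center)])
    return moved, center

def toy_fitness(position):
    """Toy objective (assumption): highest when the coordinate equals 3."""
    return 1.0 / (1.0 + abs(position[0] - 3.0))

flock = [[0.0], [2.0], [4.0], [10.0]]
survivors, center = landmark_step(flock, toy_fitness, random.Random(0))
print(center)
print(survivors)
```

With this flock the two pigeons nearest the target survive, and their center lies midway between them.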

Object-based initialization method

Random initialization is frequently used to create the starting population for meta-heuristic algorithms, including in the context of improving grammatical error correction for chatbot-based language learning systems. However, given the wide variety of potential grammatical structures, random starting points can make it difficult to find a good region of the solution space in text-based tasks like grammatical error correction. An object-based initialization approach is employed instead: $o$ objects, each a $q$-dimensional vector, are chosen at random to serve as the $o$ cluster centroids. Concatenating these objects gives a position whose overall number of dimensions is $m = q * o$. Figure 5 displays the model of a pigeon’s position.

Figure 5.

Pigeon’s position model.

Using these context-aware clusters to start the population increases the likelihood that the chatbot will find correct grammatical corrections in real-world language learning scenarios by starting its optimization process on a more appropriate collection of error patterns.
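Object-based initialization can be sketched as follows, reading a pigeon position as the concatenation of several randomly chosen data objects that act as initial cluster centroids. The toy data, the function names, and the sampling without replacement are assumptions made for illustration.

```python
import random

def object_based_position(data, n_clusters, rng):
    """Build one position by concatenating n_clusters randomly chosen data objects."""
    chosen = rng.sample(data, n_clusters)  # distinct objects act as initial centroids
    return [value for obj in chosen for value in obj]

def init_population(data, n_clusters, n_pigeons, rng):
    """Create the initial flock, one concatenated position per pigeon."""
    return [object_based_position(data, n_clusters, rng) for _ in range(n_pigeons)]

# Toy feature vectors (assumption): four objects with q = 2 features each.
data = [[0.1, 0.2], [0.5, 0.4], [0.9, 0.8], [0.3, 0.7]]
population = init_population(data, n_clusters=3, n_pigeons=5, rng=random.Random(42))
print(len(population), len(population[0]))
```

Every position has 2 × 3 = 6 dimensions here, and every centroid in it is a real data object rather than an arbitrary point, so the search starts inside the region the data actually occupies.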

Parameter control strategy

EPI may encounter difficulties when used for language tasks in the field of grammatical error correction. In the basic operator, the factor $e^{- R T}$ decreases continuously over time. In clustering-based error correction, however, linguistic patterns call for more flexibility, so a fixed decay is not always appropriate. A parameter control technique is therefore introduced to modify the optimization parameters dynamically in response to the chatbot’s corrective performance. Specifically, an adaptive inertia weight $w$ is employed. By gradually modifying the rate of error corrections, this adaptive inertia enables the chatbot to better maintain context and prevents abrupt changes in its grammatical correction patterns. The inertia is maintained when the gap between the global ideal (the most exact grammatical correction) and the current position (the chatbot’s grammatical state) is small, enabling the chatbot to improve its corrections progressively.

The discrepancy between a pigeon’s present position and the existing global ideal position can be used to assess the quality of its initial velocity: the smaller the difference, the better the quality, and the more inertia the pigeon should keep so that it continues in the direction of that high-quality velocity. The factor $e^{- R | f (Y_{j}^{s - 1}) - f (Y_{h}) |}$, where $R$ is the map and compass parameter, therefore replaces $e^{- R T}$ in the new velocity update (equation (15)), so that its value adapts to each pigeon.

Z_{j}^{s} = Z_{j}^{s-1} \times e^{- R | f (Y_{j}^{s-1}) - f (Y_{h}) |} + \mathrm{rand} \times (Y_{h} - Y_{j}^{s-1})

(15)

where $f$ represents the fitness of the grammatical correction at each step and $Y_{j}^{s-1}$ is the previous state of the grammatical correction.
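The adaptive velocity update of equation (15) can be sketched as follows, reading the exponent as the absolute fitness gap between a pigeon and the global best scaled by $R$. The toy fitness function and the function names are assumptions for illustration.

```python
import math
import random

def adaptive_velocity(vel, pos, best, fitness, R, rng):
    """Velocity update in which inertia depends on the fitness gap to the global best."""
    gap = abs(fitness(pos) - fitness(best))
    inertia = math.exp(-R * gap)  # equals 1 when the gap is 0, shrinks as it grows
    r = rng.random()
    return [v * inertia + r * (b - p) for v, p, b in zip(vel, pos, best)]

def toy_fitness(position):
    """Toy objective (assumption): negative squared distance from the origin."""
    return -sum(x * x for x in position)

rng = random.Random(7)
best = [0.0, 0.0]
near = adaptive_velocity([1.0, 1.0], [0.1, 0.1], best, toy_fitness, R=0.5, rng=rng)
far = adaptive_velocity([1.0, 1.0], [3.0, 3.0], best, toy_fitness, R=0.5, rng=rng)
print(near)
print(far)
```

A pigeon close to the global best keeps almost all of its previous velocity, while one far from it has its old velocity strongly damped and is driven mainly by the pull toward the best position.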

EPI implementation procedure for grammatical error correction

A chatbot is guided through a series of adjustments by the map and compass operator and landmark operator as part of the EPI optimization process for grammatical error repair. By updating its corrections according to the most recent global best correction and error pattern, the map and compass operator assists the chatbot in navigating through potential corrections. By concentrating on certain corrections, the landmark operator enables the chatbot to modify grammar according to context. Over several cycles, the chatbot’s error correction is progressively improved by combining these operators. Based on the chatbot’s current location in the correction space, the number of function evaluations for the operators is modified. This procedure is repeated until the chatbot achieves a nearly ideal grammatical correction method, utilizing linguistic patterns for dynamic enhancements.

The EPI-BiLSTM model combines the strengths of EPI and BiLSTM. Using improved activation functions and gating techniques, the BiLSTM enhances chatbot-based language tutors’ grammatical error correction by using both past and future contexts. Imitating pigeon navigation behavior, EPI refines the corrections with the landmark operator and updates the global location with the map and compass operator. Together, these techniques provide iterative optimization and context-aware adjustments, improving the grammar correction capabilities of the chatbot. Algorithm 1 provides the process of the proposed EPI-BiLSTM method.

Experimental result

This section discusses the system configuration, hyperparameters, and detailed experimental results, including accuracy measurements for various features and a comparison with traditional models. The results indicate that the EPI-BiLSTM model performs better than the baseline across several language learning tasks.

System configuration

The experiments were run on a computer with 16 GB of RAM and a 512 GB SSD, which allowed efficient processing and training of models on the given data. An Intel Core i7 CPU provides the processing power needed for complex neural network topologies and large datasets. The environment is built on Python 3.8 with the libraries needed for DL, data processing, and model assessment, including TensorFlow, Keras, NumPy, and Pandas. NLTK and spaCy are used for NLP tasks such as stop word removal and tokenization. Additionally, an NVIDIA GeForce RTX 2060 GPU accelerates training, especially for DL models. Table 2 provides the hyperparameter setting.

Table 2.

Hyperparameter setting of the EPI-BiLSTM.

Hyperparameter	Tuning range	Optimal value
Number of LSTM layers	{1, 2, 3, 4}	2
Hidden units	{64, 128, 256, 512}	128
Activation function (AF)	{ReLU, Swish, Tanh, Leaky ReLU}	Swish
Learning rate	{0.001, 0.005, 0.01}	0.005
Batch size	{16, 32, 64, 128}	32
Dropout rate	{0.2, 0.3, 0.4, 0.5}	0.3
Sequence length	{50, 100, 150, 200}	100
EPI population size	{20, 50, 100, 150}	50
Map and compass factor (R)	{0.1, 0.3, 0.5, 0.7}	0.5
Inertia weight (w)	{0.4, 0.6, 0.8, 1.0}	0.6
Maximum iterations	{50, 100, 200, 500}	200
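
The tuning ranges in Table 2 can be written down as a small configuration sketch. The dictionary keys and helper names below are hypothetical, and the source does not state how the ranges were searched. The sketch only checks that each reported optimal value lies inside its range and counts how many configurations an exhaustive grid over all eleven ranges would contain.

```python
from math import prod

# Tuning ranges transcribed from Table 2 (key names are illustrative).
SEARCH_SPACE = {
    "lstm_layers": [1, 2, 3, 4],
    "hidden_units": [64, 128, 256, 512],
    "activation": ["relu", "swish", "tanh", "leaky_relu"],
    "learning_rate": [0.001, 0.005, 0.01],
    "batch_size": [16, 32, 64, 128],
    "dropout_rate": [0.2, 0.3, 0.4, 0.5],
    "sequence_length": [50, 100, 150, 200],
    "epi_population_size": [20, 50, 100, 150],
    "map_compass_factor_R": [0.1, 0.3, 0.5, 0.7],
    "inertia_weight_w": [0.4, 0.6, 0.8, 1.0],
    "max_iterations": [50, 100, 200, 500],
}

# Optimal values reported in Table 2.
SELECTED = {
    "lstm_layers": 2,
    "hidden_units": 128,
    "activation": "swish",
    "learning_rate": 0.005,
    "batch_size": 32,
    "dropout_rate": 0.3,
    "sequence_length": 100,
    "epi_population_size": 50,
    "map_compass_factor_R": 0.5,
    "inertia_weight_w": 0.6,
    "max_iterations": 200,
}


def out_of_range(space, selected):
    """Return the names whose selected value is not in its tuning range."""
    return [name for name, value in selected.items()
            if value not in space.get(name, [])]


def grid_size(space):
    """Number of configurations in an exhaustive grid over all ranges."""
    return prod(len(values) for values in space.values())


print(out_of_range(SEARCH_SPACE, SELECTED))  # [] when every value is in range
print(grid_size(SEARCH_SPACE))               # 4**10 * 3 = 3145728
```

With ten four-value ranges and one three-value range, a full grid would hold 3,145,728 configurations. A grid of that size is one plausible reason to tune with a guided search rather than exhaustively, although the source does not say which procedure was used.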

Output phase

Table 3 and Figure 6 report the accuracy of the chatbot-based English learning system across several features. The results indicate that the proposed EPI-BiLSTM performs well at both identifying user inquiries and correcting grammatical mistakes. The corrected sentence feature achieves the highest accuracy, 93.1%, reflecting the model’s ability to identify and correct errors in users’ sentences. Sentence recognition follows at 92.5%, showing how well the system interprets user inputs. Intent classification reaches 90.3%, indicating that the chatbot reliably determines the user’s intent from the input query. The incorrect sentence feature reaches 88.7%, showing how well the model recognizes common grammatical mistakes and provides suitable corrections.

Table 3.

Feature-wise accuracy of the EPI-BiLSTM.

Features	Accuracy (%)
Sentence	92.5
Intent	90.3
Incorrect sentence	88.7
Corrected sentence	93.1

Figure 6.

Performance accuracy across features.

Comparison phase

The proposed EPI-BiLSTM is compared with the traditional single semi-supervised multi-domain joint model (SEMI-MDJM)²⁶ on three tasks and on their average accuracy.

• Domain: Domain classification guides the chatbot to the correct area of knowledge (e.g., determining whether the user is asking about grammar or vocabulary). High accuracy on this task means that the chatbot classifies inquiries effectively and provides relevant material accordingly.

• Intent: Intent recognition helps the system understand what the user means by a message. The more precisely the system identifies the user’s intent, the more quickly and accurately it can respond. High intent accuracy is also required for meaningful, customized interaction.

• Entity: The chatbot can understand the content of the query and respond more appropriately by identifying entities. For example, if it identifies a name or location in a question, the chatbot can modify its response accordingly.

• Avg. accuracy: The average accuracy (Avg. accuracy) summarizes how effective the model is across all tasks. A higher average indicates consistent performance across components and, as a result, better-quality user interactions in language learning applications.

The EPI-BiLSTM model outperforms the SEMI-MDJM model on all tasks (Table 4 and Figure 7). In domain classification, EPI-BiLSTM reaches an accuracy of 80.5%, compared with 76.65% for SEMI-MDJM. In intent recognition, it achieves 90.3%, higher than SEMI-MDJM’s 84.84%. In entity recognition, it reaches 75.2%, against 59.37% for SEMI-MDJM. The overall average accuracy of the proposed EPI-BiLSTM is 81.3%, compared with 73.62% for the conventional SEMI-MDJM. The comparison supports the effectiveness of the EPI-BiLSTM model for improving both grammatical error correction and linguistic understanding in mobile-based language learning tools.

Table 4.

Comparative evaluation of SEMI-MDJM and EPI-BiLSTM in language learning task.

Task	SEMI-MDJM²⁶ (%)	EPI-BiLSTM [Proposed] (%)
Domain	76.65	80.5
Intent	84.84	90.3
Entity	59.37	75.2
Avg. accuracy	73.62	81.3

Figure 7.

Task-wise performance comparison of models.
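
The arithmetic behind Table 4 can be reproduced with a short sketch. The variable and function names are illustrative, and the source does not state how the average accuracy was computed, so the unweighted mean of the three task accuracies used here is an assumption.

```python
# Per-task accuracies (%) transcribed from Table 4.
SEMI_MDJM = {"domain": 76.65, "intent": 84.84, "entity": 59.37}
EPI_BILSTM = {"domain": 80.5, "intent": 90.3, "entity": 75.2}

# Average accuracies (%) as reported in Table 4.
REPORTED_AVG = {"SEMI-MDJM": 73.62, "EPI-BiLSTM": 81.3}


def unweighted_mean(scores):
    """Plain arithmetic mean of the task accuracies, rounded to 2 decimals."""
    return round(sum(scores.values()) / len(scores), 2)


def gains(baseline, proposed):
    """Percentage-point improvement of `proposed` over `baseline` per task."""
    return {task: round(proposed[task] - baseline[task], 2) for task in baseline}


print(gains(SEMI_MDJM, EPI_BILSTM))
print(unweighted_mean(SEMI_MDJM), REPORTED_AVG["SEMI-MDJM"])
print(unweighted_mean(EPI_BILSTM), REPORTED_AVG["EPI-BiLSTM"])
```

Under this assumption, the per-task gains are 3.85, 5.46, and 15.83 percentage points for domain, intent, and entity, with entity recognition showing the largest improvement. The unweighted mean for SEMI-MDJM matches the reported 73.62%. For EPI-BiLSTM the unweighted mean is 82.0%, slightly above the reported 81.3%, which suggests the reported average may rest on a different weighting or on unrounded task scores.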

Discussion

The proposed EPI-BiLSTM model improves the performance of the chatbot system in language learning applications by overcoming limitations of the existing model. Although SEMI-MDJM is effective at domain classification, intent identification, and entity extraction, it has several drawbacks. First, it depends on a large amount of labeled data, which takes considerable time and resources to gather. This dependency weakens its performance and scalability in practical applications where data labeling is not feasible. Second, SEMI-MDJM struggles to identify appropriate entities when queries are complicated or ambiguous, which leads to inferior responses. Third, because its multi-task handling does not adapt to multiple languages or domains, its overall accuracy in delivering customized solutions is further restricted. The proposed EPI-BiLSTM model addresses these flaws by integrating a BiLSTM network, which handles context efficiently and improves the comprehension of domain-specific questions. In addition, by using unlabeled conversational data, the EPI-BiLSTM model greatly reduces the need for manually annotated datasets. This enables better scalability and performance, especially in language learning applications, where domain classification and intent recognition improve significantly.

While our proposed EPI-BiLSTM framework demonstrates promising performance in intent classification and grammatical error correction within the English language context, several limitations warrant consideration. First, the reliance on high-quality translation models for back translation may pose challenges when extending to low-resource languages, potentially affecting the quality and diversity of augmented data. Second, the current approach’s scalability to larger datasets or real-time applications requires further validation, as the computational complexity of the EPI optimization and deep learning components may impact deployment efficiency. Third, the model’s adaptability to different learning contexts—such as varying proficiency levels or domain-specific language—may necessitate additional fine-tuning or domain adaptation techniques. Future research should explore multilingual extensions, optimize computational efficiency, and evaluate the approach in diverse educational settings to enhance its applicability and robustness.

Conclusion

The experiments demonstrate the value of combining DL models with chatbot-based language tutors to enhance English acquisition through mobile applications. The research highlights the need to interpret user inputs accurately, which the EPI-BiLSTM model addresses through intent classification. A chatbot-based English learning dataset was used, and the data were preprocessed with stop word removal and tokenization. TF-IDF-based feature extraction helps the chatbot work efficiently even with complex language patterns. Back translation, applied here as a data augmentation technique that creates parallel corpora, strengthens the grammatical error correction strategy and supports accurate language corrections. The experimental results indicate that EPI-BiLSTM surpasses traditional algorithms in domain (80.5%), intent (90.3%), entity (75.2%), and average accuracy (81.3%). This suggests that combining chatbot-based systems with advanced DL techniques can substantially improve grammatical error correction and language learning experiences. It also opens new avenues for developing intelligent AI-driven language acquisition tools, with implications for personalized learning and real-time language assistance in mobile applications. Future research will explore extending the system to support additional languages and grammar rules.

Footnotes

ORCID iD

Teli Chen

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The authors declare that the data supporting the findings of this study are available within the article. The raw/derived data supporting the findings of this study are available from the corresponding author upon request.

References

1. Opryshko, Novik, Smolinska, et al. Web-based applications in higher education: revolutionizing language learning in the digital age. Amazon Investig 2024; 13(73): 209–219.

2. Dai. Mobile-assisted pronunciation learning with feedback from peers and/or automatic speech recognition: a mixed-methods study. Comput Assist Lang Learn 2023; 36(5-6): 861–884. https://www-tandfonline-com-s.web.bisu.edu.cn/10.1080/09588221.2021.1952272

3. Zhou. Real-time feedback and E-learning intelligent entertainment experience in computer English communication based on deep learning. Entertainment Computing 2024; 51: 100752.

4. Harisha, Suminih, Oktaviana. The power of chatbots in English language learning: a new age of learning. Lingua 2024; 20(1): 19–31.

5. Zheng. The effects of Chatbot use on foreign language reading anxiety and reading performance among Chinese secondary school students. Comput Educ Artif Intell 2024; 7: 100271.

6. Annamalai, Ab Rashid, Hashmi, et al. Using chatbots for English language learning in higher education. Comput Educ Artif Intell 2023; 5: 100153.

7. Lam, Kong, et al. The design and evaluation of a digital learning-based English chatbot as an online self-learning method. Int J Eng Bus Manag 2023; 15: 18479790231176372.

8. Deng, Liang, et al. Using a smartphone-based Chatbot in EFL learners’ oral tasks. Int J Mob Blended Learn (IJMBL) 2022; 14(1): 1–7.

9. Zhang, Huang. The impact of chatbots based on large language models on second language vocabulary acquisition. Heliyon 2024; 10(3): e25370.

10. Zhang. Integrating chatbot technology into English language learning to enhance student engagement and interactive communication skills. J Comput Methods Sci Eng 2025; 25: 14727978241312992.

11. Assayed, Shaalan, Alkhatib. A chatbot intent classifier for supporting high school students. EAI Endorsed Scal Inf Syst 2023; 1: e1.

12. Dongbo, Miniaoui, Fen, et al. Intelligent chatbot interaction system capable of sentimental analysis using hybrid machine learning algorithms. Inf Process Manag 2023; 60(5): 103440.

13. Yang, Lei, et al. Development and testing of a multi-lingual Natural Language Processing-based deep learning system in 10 languages for COVID-19 pandemic crisis: a multi-center study. Front Public Health 2023; 11: 1063466.

14. Aytan, ŞAKAR. Deep learning-based Turkish spelling error detection with a multi-class false positive reduction model. Turk J Electr Eng Comput Sci 2023; 31(3): 581–595.

15. Sayenju, Aygun, Boardman, et al. Quantification and mitigation of directional pairwise class confusion bias in a chatbot intent classification model. Int J Semantic Comput (IJSC) 2022; 16(04): 497–520.

16. Hew, Huang, et al. Using chatbots to support student goal setting and social presence in fully online activities: learner engagement and perceptions. J Comput High Educ 2023; 35(1): 40–68.

17. Jingning. Speech recognition based on mobile sensor networks application in English education intelligent assisted learning system. Measurement: Sensors 2024; 32: 101084.

18. Imran, Qureshi, et al. Classification of English words into grammatical notations using deep learning technique. Information 2024; 15(12): 801.

19. Wang. AI-driven autonomous interactive English learning language tutoring system. J Comput Methods Sci Eng 2024; 25: 14727978241296719.

20. Yang, Kim, Lee, et al. Implementation of an AI chatbot as an English conversation partner in EFL speaking classes. ReCALL 2022; 34(3): 327–343.

21. Chien, Lai, et al. Investigation of the influence of artificial intelligence markup language-based LINE ChatBot in contextual English learning. Front Psychol 2022; 13: 785752.

22. Albornoz-De Luise, Arevalillo-Herráez, Arnau. On using conversational frameworks to support natural language interaction in intelligent tutoring systems. IEEE Trans Learn Technol 2023; 16(5): 722–735.

23. Hsu, Chen. Proposing a task-oriented chatbot system for EFL learners’ speaking practice. Interact Learn Environ 2023; 31(7): 4297–4308.

24. Rizou, Theofilatos, Paflioti, et al. Efficient intent classification and entity recognition for university administrative services employing deep learning models. Intell Sys App 2023; 19: 200247.

25. https://www.kaggle.com/datasets/ziya07/chatbot-based-english-learning-dataset

26. Uprety, Jeong. The impact of semi-supervised learning on the performance of intelligent Chatbot system. Comput Mater Continua (CMC) 2022; 71(2): 3937.