Neural networks application based on language features in the classification of complex English textbooks granularity

Abstract

The surge in modern information has led to a significant increase in text complexity. To meet the needs of various fields for effective information extraction, research on text complexity grading urgently is urgently needed. The study uses the Flesh-Kincaid Grade Level (FKGL) model to extract language features, selects English textbooks as training corpus, and introduces the Graph Convolutional Network of Attention Mechanism (GCN_ATT) model of attention mechanism to construct a text complexity grading model. The research results indicated that in the 10-fold crossover experiment, GCN_ATT’s accuracy, recall, and F1 all reached over 88%. Compared to multi class logistic regression models, GCN_ATT’s various performance indicators were approximately 2% to 3% higher. Meanwhile, GCN_ ATT’s F1 standard deviation decreased by 0.7% and 1.78% compared to the other two models. In addition, GCN_ATT’s fluctuation range of recall and accuracy was less than 20%, a decrease of 12% and 18% compared to the ordered multi classification regression model. Explanation based on GCN_ATT’s text complexity grading has higher accuracy and more stable performance, providing an effective method reference for current text complexity grading problems.

Keywords

Neural network text classification complexity grading training corpus feature extraction

1. Introduction

The growth of the Internet era has led to the increasing complexity of text information. As one of the basic tasks in language processing, text classification plays a crucial role in modern information processing. The traditional classification method is to manually classify according to established classification standards, but now it is no longer able to adapt to the rapid growth of information in the information age, and is gradually being phased out. By the 1990s, text classification methods based on machine learning and deep learning emerged and achieved relatively good classification results [1]. These algorithm models consist of two parts: Feature engineering and Classifier. Among them, feature engineering refers to the process of converting text into mathematical form of feature matrices, and classifier mainly includes graph neural algorithm, graph convolutional network (GCN), convolutional neural network, etc. The text classification method of machine learning has improved the classification accuracy to a certain extent, but there are problems such as the lack of representativeness in feature extraction during model construction [2]. In addition, the text information of several commonly used text classification methods is independent of each other, and now many texts have citation relationships between them, which leads to a certain negative impact on the accuracy of text classification. Therefore, the study introduces the Graph Convolutional Network of Attention mechanism (GCN_ATT) model, combined with more representative feature extraction methods and training corpus, and applies it to the establishment of text complexity models [3] to achieve better practical application results in text complexity grading.

2. Related works

With the arrival of the information age, the amount of information has grown rapidly. Text classification has great significance for real life, and domestic and foreign researchers have conducted relevant research on it. Liu Y et al. explored the impact of text complexity on translation difficulty, using the Likert scale to assess cognitive load, and using the NASA-TLX scale to assess. The results showed that the complexity measured by readability and non-literal meaning was consistent with the evaluation results of translation difficulty [4]. To explore the post editing process of neural machine translation paradigm, researchers such as Jia Y evaluated the output quality of NMT systems from English to Chinese, and the results proved that the text fluency and accuracy of NMT translation were better than other systems [5]. Aydın S proposed a method for complex emotional labeling to address the problem of discrete emotion classification caused by emotional videos. This method estimated the phase locking value of the left and right hemispheres by using the long-term and short-term memory network to observe the nine discrete emotions of clustering. The results showed that the classification accuracy of this algorithm was as high as 98% [6]. To explore the longitudinal language development trends of college students, Biber D’s team proposed a corpus-based method for long-term tracking of college student groups and collecting language features of subject writing. The results showed that phrase complexity has important guiding significance in academia [7]. Scholars such as Zhao Y designed an asymmetric denoising method to address the issue of traditional sequences overly relying on the number of parallel sentences. This method added different noises during encoder modeling, and the results showed that it effectively simplified the unsupervised model and enhances the competitiveness of the simplified system [8].

Chen W compared the quality of articles written by two classes of students after participating in collaborative writing activities to explore the benefits of second language collaborative writing activities for individual text writing. The results showed that the personal creative accuracy and quality of students who participated in the activity were better than those who did not participate in the practice [9]. On the basis of morphological complexity, researchers such as Lee JD proposed the morphological complexity index. This index mainly assisted native Italian and non-native English speakers in writing argumentative papers, and the results showed that there was no correlation between morphological complexity and other complexity measures [10]. Adhariani D’s team designed techniques such as reading ease to evaluate the readability of sustainability reports to investigate their readability. The results indicated that this method was beneficial for evaluating the readability of sustainable development reports and was easy to read and understand [11]. Katarya R’s team developed a new product recommendation system, CapsMF, to address the filtering and screening issues of a large amount of online information. The system utilized a new neural network architecture to optimize the text analysis model, and the results showed that the performance of the text analysis model using this system was significantly better than other models [12]. To distinguish the literariness of different novels, scholars such as Van Cranenburg A proposed a machine learning method, which divided the text into blocks and used neural file embedding to create vector space to predict the literary rating of novels. The results showed that this method had strong feasibility [13].

In summary, researchers have proposed many effective text classification techniques for various fields, and also introduced machine learning algorithms to improve the accuracy of text classification. However, the current algorithms are not precise enough for feature extraction in the model, which affects the accuracy of classification. Therefore, the model still needs further optimization.

3. Text complexity granularity classification based on language feature neural network

3.1 Text classification based on GCN

Text classification refers to the process of associating target text with one or more categories based on the characteristics of the text within a pre-defined classification system. The complexity grading of text belongs to the category of text classification, which constructs a classification model by learning language features from corpus of different complexity levels to determine which complexity level the text belongs to [14]. The study introduces classification of GCN based on deep learning algorithms and applies it to text complexity classification. The GCN model is shown in Fig. 1.

Figure 1.

GCN model.

The construction of GCN model for text classification requires the establishment of adjacency matrix and feature matrix. The adjacency matrix represents the weight of edges between each point [15]. The characteristic matrix refers to the node vector matrix representing the attributes of each node. The initialization vector of each node will be used as an identity matrix. Its representation is shown in Eq. (1).

$\displaystyle I=\left\{{\begin{array}[]{ll}0,&\text{where }a\neq b\\ 1,&\text{where }a=b\\ \end{array}}\right.$ (1)

In Eq. (1), $I$ means the identity matrix. $a$ and $b$ represent the node. In the GCN model, the text used for training will be combined with processed words to construct a graph structure of all nodes in the data. At the same time, an edge will be connected between words in the same sliding window and between all words contained in the text, which forms the graph structure data of the model and is represented by the adjacency matrix. The definition of elements of adjacency matrix is denoted in Eq. (2).

$\displaystyle A_{ab}=\left\{{\begin{array}[]{l}1,a=b\\ 0,\text{otherwise}\\ \textit{PMI}({a,b}),a,b\text{ are words},\textit{PMI}({a,b})>0\\ \textit{TF-IDF}_{ab},a\text{ is document},j\text{ is word}\\ \end{array}}\right.$ (2)

In Eq. (2), $A_{ab}$ stands for an element in the adjacency matrix. When $a$ and $b$ are words, Pointwise Mutual Information (PMI) is used to indicate the correlation between them. When $a$ is a document and $b$ is a word, the correlation between text and words is measured using word frequency inverse of text frequency [16]. When $a=b$ , the weight of the node itself is 1. The element values in other cases are represented by 0. Among them, the PMI algorithm is indicated in Eq. (3).

$\displaystyle\textit{PMI}({x,y})=\log\frac{p({x,y})}{p(y)p(x)}$ (3)

In Eq. (3), $x, y$ refer to the words in the corpus. $p({x,y})$ and $p(x)$ are calculated as shown in Eq. (4).

$\displaystyle\left\{{{\begin{array}[]{l}{p({x,y})=\frac{E_{w}({x,y})}{S_{w}}}% \\ {p(x)=\frac{E_{w}(x)}{S_{w}}}\\ \end{array}}}\right.$ (4)

In Eq. (4), $E_{w}(x)$ refers to the number of sliding windows for words, and $E_{w}({x,y})$ represents the amount of sliding windows containing words $x$ and $y$ . Among them, the higher the PMI value, the greater the correlation between words. TF-IDF consists of two parts: word frequency (TF) and inverse text frequency (IDF), where TF represents the frequency at which each word appears. The calculation is shown in Eq. (5).

$\displaystyle TF_{ab}=\frac{n_{ab}}{\sum\nolimits_{k}{n_{k,b}}}$ (5)

In Eq. (5), $n_{ab}$ refers to the number of occurrences of the word in the document, and ${\sum\nolimits_{k}{n_{k,b}}}$ represents the number of occurrences of all words in the document. IDF is the result obtained by dividing the total number of files by the number of files with that word and taking the logarithm. The higher the frequency of reverse text, the fewer files in the total file containing that word. The calculation is shown in Eq. (6).

$\displaystyle\textit{IDF}_{a}=\log\frac{|D|}{|{\{{b:t_{a}\in d_{b}}\}}|+1}$ (6)

In Eq. (6), $|D|$ represents the total number of files, and $d_{b}$ represents the files. $t_{a}$ refers to the words in the text, and $|{\{{b:t_{a}\in d_{b}}\}}|$ represents the number of files containing the word $t_{a}$ . The value of TF-IDF is equal to TF multiplied by IDF, indicating that the value of TF-IDF is directly proportional to the frequency of the word appearing in the document, and inversely proportional to its frequency appearing throughout the entire corpus. Word2vec is a high-quality word vector extraction tool that can convert the mathematical representation of words into an N-dimensional vector and calculate the similarity between two words using a cosine function [17]. Word2vec includes two models, namely the Skip-gram model and the Continuous Bag of Words (CBOW) model. The specific structure is shown in Fig. 2.

Figure 2.

Skip gram model and CBOW model structure.

In Fig. 2, $W_{(t)}$ denotes the word in the text. The CBOW model includes an input layer, a mapping layer, and an output layer. It scans the text from left to right through a sliding window and forms a sentence with the scanned words as the center word and the words on both sides representing the context. The calculation is shown in Eq. (7).

$\displaystyle p({\textit{wldocument}(w)})=\prod_{p}^{l^{w}}({d_{a}^{w}|{x_{w},% \alpha_{a-1}^{w}}})$ (7)

In Eq. (7), $w$ indicates the word in the text, and $p^{w}$ expresses the shortest path. $d_{a}^{w}$ refers to the Huffman encoding value of the $a$ -th node in path $p^{w}$ , and $\alpha_{a-1}^{w}$ represents the vector value of the $a$ -th non leaf node in path $p^{w}$ . $x_{w}$ refers to the concatenation value of $n-1$ words encountered by word $w$ in path order. The objective function of the CBOW model is derived from this, as shown in Eq. (8).

$\displaystyle\zeta=\sum\limits_{w\in C}\sum\limits_{1^{*}}^{b=2}{\{{\log[{% \theta({X_{w}^{T}\alpha_{a-1}^{w}})}]\cdot({1-d_{b}^{w}})+\log[{1-\theta({X_{w% }^{T}\alpha{}_{a-1}^{w}})}]\cdot d_{b}^{w}}\}}$ (8)

In Eq. (8), $\theta(x)$ is calculated as shown in Eq. (9).

$\displaystyle\theta(x)=\frac{1}{e^{-x}+1}$ (9)

The Skip gram model has a similar structure to CBOW, but the prediction direction is opposite. It mainly infers surrounding words based on the current input words. The calculation is shown in Eq. (10).

$\displaystyle\zeta=\sum\limits_{w\in C}\sum\limits_{w\in\textit{document}(w)}% \sum\limits_{1^{*}}^{b=2}{\{{\log[{\theta({v(w)^{T}\alpha_{b-1}^{u}})}]\cdot({% 1-d_{b}^{w}})+\log[{1-\theta({v(w)^{T}\alpha_{b-1}^{u}})}]\cdot d_{b}^{w}}\}}$ (10)

In Eq. (10), $v(w)$ represents the word vector corresponding to word $w$ . Further research has introduced graph attention mechanism into the GCN model and developed GCN_ATT model. This model uses a graph attention mechanism to redistribute the weight values of adjacent nodes, while increasing the weight of useful nodes to achieve effective data fine-grained extraction. The specific model diagram is shown in Fig. 3.

Figure 3.

GCN_ATT schematic diagram.

$G=\{{V,E}\}$ means a graph. $V$ and $E$ indicate the corresponding node set and edge set, $V=\{{v_{1},v_{2},\ldots v_{n}}\}$ . The feature matrix corresponding to each node as shown in Eq. (11).

$\displaystyle X=\{{x_{1},x_{2},\ldots,x_{n}}\},\in R^{F}$ (11)

In Eq. (11), $x_{i}$ refers to vertex features and $F$ expresses dimensions. To convert input features into high-level features, a learnable linear simple transformation is required to calculate the graph attention mechanism of nodes, as shown in Eq. (12).

$\displaystyle e_{ab}=c({W_{x_{a}},W_{x_{b}}})$ (12)

In Eq. (12), $W$ refers to the weighted matrix, and $c$ represents the graph attention mechanism. $e_{ab}$ means the importance of adjacent node $b$ to central node $a$ . When recalculating the weight value of the model, it is necessary to consider the neighboring nodes of the center node $a$ and use the softmax function for normalization, as shown in Eq. (13).

$\displaystyle\beta_{ab}=\textit{soft}\max({e_{ab}})=\frac{\exp({e_{ab}})}{\sum% \limits_{k\in\lambda_{a}}{\exp({e_{ak}})}}$ (13)

In Eq. (13), $\lambda_{a}$ refers to a certain domain of node $a$ . After calculating the weight values of adjacent nodes of node $a$ , the weighted sum is performed to obtain the characteristic values of node $a$ , as shown in Eq. (14).

$\displaystyle x^{\textit{att}}=\sum\limits_{b\in\lambda_{a}}{x_{a}\beta_{ab}}$ (14)

The above steps are repeated to gather information from all nodes in the text to obtain a new feature matrix, as shown in Eq. (15).

$\displaystyle X^{\textit{att}}=\{{x_{1}^{\textit{att}},x_{2}^{\textit{att}},% \ldots,x_{n}^{\textit{att}}}\}$ (15)

In Eq. (15), $X^{\textit{att}}$ represents the new feature matrix.

3.2 A text complexity granularity grading model construction

The purpose of constructing a text complexity grading model is to enable the target text content to be understood by machines and determine which complexity level the target text should belong to based on its displayed features. The specific steps are to first use appropriate text processing tools to extract language features that represent the complexity of the target text, and then perform principal component analysis to reduce dimensionality and optimize them. Then, a set of reasonably graded textbooks is selected as the training corpus for constructing the model. Finally, the best algorithm is selected by comparing the actual calculation results of several commonly used algorithms, and based on the training data of this algorithm, a hierarchical model is established and performance evaluation is conducted [18]. The specific steps are shown in Fig. 4.

Figure 4.

Steps for building a text complexity grading model.

The research introduces text processing software TAALES, TAASSC, and TAACO to extract the complexity features of the vocabulary, syntax, and discourse dimensions of the target text. At the same time, BNC baby is used as a reference corpus, and principal component analysis is used to provide corresponding indicators for various software for dimensionality reduction processing. Because the distribution of indicators of normal distribution is too sparse in the paper, the study first judges the data distribution state through the nonparametric Kolmogorov-Smirnov test and removes indicators of normal distribution. Then, multicollinearity test is performed on the remaining indicators to delete the highly relevant indicators to ensure the diversity of test variables. Finally, Kaiser Meyer Olkin (KMO) and Barlet sphericity are used to test the remaining indicators and complete the screening and extraction of principal components [19]. The selection of training corpus has a great impact on the performance of the model, and the higher its representativeness, the better the performance of the model. The research requires that the selected corpus must have reliable sources, standardized language, and accurately reflect differences in complexity. At the same time, the text complexity of the training corpus needs to increase step by step, and multiple texts with the same complexity level should have consistency. In addition, the size of the training corpus should not be too small, otherwise it will affect the effectiveness of the training results. The study selects the textbook New Concept English for analysis, and its unit clustering tree diagram is shown in Fig. 5.

Figure 5.

New Concept English unit clustering tree diagram.

In Fig. 5, the left and right numbers on the node represent two $p$ -values, namely AU and BP. The $p$ value of AU on the left indicates the possibility of cluster analysis. When the $p$ -value reaches 95 or above, it indicates that the probability of clustering existence reaches over 95%, and the clustering level is very significant. The closer the $p$ -value is to 100, the better the clustering effect. In Fig. 5, the clustering effect of each unit in Volumes 2, 3, and 4 of New Concept English is very ideal, and the clustering order also meets the requirements very well. In addition, although the clustering order of Volume 4 is not as unit by unit as in Volumes 2 and 3, the main purpose of writing Volume 4 is to help students understand the essence of English. Therefore, the text of this volume covers numerous disciplines, with a wide range of themes and diverse language forms. Different themes and genres will inevitably have a certain impact on the complexity of the text, and if we continue to follow the clustering method of the first two volumes, it is difficult to achieve the same effect. The complexity principal component features extracted through unsupervised clustering methods automatically classify the 13 units of New Concept English into the previous 3 volume levels, indicating a clear distinction between the textbooks. Therefore, taking into account various factors, the study selects the textbook New Concept English as the training corpus for the experiment and applies it to the text complexity grading model.

Text complexity grading is actually a text classification problem. Traditional classification methods include ordered multi classification regression and multi class logistic regression, which first make assumptions about the model data, then mathematically derive these assumptions, and finally obtain the model formula. The assumptions about the model directly determine its usability, making it relatively complex to operate. The different neural network is mainly from the objective point of view, using the idea of bionics to imitate the human brain to build models. The model structure can be seen as a nonlinear superposition of multiple linear regression models, which transforms the originally nonlinear problem into an approximately linear problem through layers of nonlinear changes. The study selects New Concept English for training and divides the texts of volumes 2, 3, and 4 into three complexity levels. The research involves the concurrent extraction of language features from text while performing dimensionality reduction. This is achieved by utilizing multi class logistic regression, ordered multi classification regression, and GCN_ATT algorithm. Additionally, a complex nonlinear function is fitted from features to level target values [20].

Model evaluation plays a key role in natural language processing. The study uses a 10-fold cross validation method to test the actual application effectiveness of the model. The specific steps are to first sort the 201 texts in the New Concept English into 3 levels based on complexity, and randomly divide them into 10 mutually exclusive subsets. Then, one subset each time is selected as one training set, and the union of the remaining nine subsets as another training set. The operation is repeated 10 times, and finally the mean of these 10 test data can be obtained. The specific process is shown in Fig. 6.

4. Improved neural network application in text complexity grading model

To construct a better performance text complexity grading model, the study first verified the effectiveness of text language readability features. The study used BFSU-HugeMind Readability Analyzer 2.0 to calculate six readability formulas, namely Automated Readability Index (ARI), Flesh Kincaid Grade Level (FKGL), Flesh Reading Ease Score (FRE), SMOG index, Gunning Fog index, Coleman-Liau index, and Coh-Metrix L2 Readability (RD2). At the same time, New Concept English was set as the training corpus, and after summarizing the readability indicators mentioned above, it was sequentially used as input features to construct a text complexity grading model. The performance indicators obtained through 10-fold cross validation are shown in Table 1.

Table 1
Performance analysis of different readability indicators

Feature	Number of indicators	Accuracy (%)	Recall (%)	F1 (%)	F1 standard deviation (%)
ARI	1	75.23	70.69	70.92	9.36
FKGL	1	75.23	72.54	71.24	8.21
FRE	1	68.59	68.74	67.03	12.23
SMOG	1	70.24	69.88	69.61	11.56
Fog	1	66.89	65.14	64.77	10.08
Coleman	1	70.20	66.21	66.36	8.56
RDL2	1	56.74	53.17	52.19	14.58

Figure 6.

10 fold cross validation flowchart.

In Table 1, compared to other readability formulas, the FKGL model had the highest accuracy, recall, and F1 value, reaching 75.23%, 70.69%, and 70.92% with the lowest standard deviation of F1 value being 8.21%. This indicated that the FKGL model had the best performance and stronger stability. The RDL2 model could accurately distinguish 59% of reading materials, which was significantly better than the FKGL model. However, its training corpus was generally non academic news texts, so the results of various indicators of the RDL2 model were not ideal. Due to the fact that the target corpus of the study is English textbook texts, factors such as sentence length and word length had a significant impact on the grading of textbook texts.

Corresponding performance tests were conducted on the algorithm model of the text complexity grading model in the experiment to verify its effectiveness. The specific experimental parameters were a sliding window size of 40 and a step size of 5. The embedding dimension value of words was 200, and dropout was set to 0.6. Each round of training was limited to a maximum of 400 times, and training stopped after ten rounds. The study selected several common text classification models, including CCN-rand and Bi_LSTM, TextGCN, and GCN_ATT models. And the model performance was compared with the F1 value and accuracy reflected on the 20NG and MR datasets. The experimental results are shown in Fig. 7.

Figure 7.

F1 value and accuracy of different models on the dataset.

Figure 8.

The influence of different factors on the accuracy of experimental results.

As shown in Fig. 7, compared to other models, GCN_ATT model had significant advantages in F1 values and accuracy on both datasets. Among them, on the 20NG dataset, The F1 of GCN_ATT was as high as 85.9%, with an accuracy of 86.8%, which was 14.5% and 13.7% higher than Bi_LSTM. In the MR dataset, the highest F1 value of this model was 75.5%, 4.3% higher than the TextGCN model. The study continued to conduct experimental validation on factors that may affect the accuracy of text classification. The validation results are shown in Fig. 8.

In Fig. 8a, before the sliding window size reaches 40, the accuracy of text classification increased with the increase of the window, reaching a maximum of 94.6%. But when it exceeded 40, the accuracy began to decline. The effective text information increased with the increase of the window, but when it exceeded a certain limit, the feature extraction effect would be affected. Therefore, it was necessary to select a sliding window of appropriate size to ensure that the training can proceed in an orderly manner. In Fig. 8b, there is a certain fluctuation in the accuracy of text classification. When the embedding dimension of text words was 200, the accuracy of text classification reached the maximum value of 94.28%. When it exceeded 200, the accuracy showed a downward trend. Within a certain range, the increase in word embedding dimension increased the effective information of the text. But exceeding a certain range would have a negative impact on the information aggregation performance of GCN, leading to a decrease in accuracy. As shown in Fig. 8c, as the number of convolutional layers increased, the accuracy of text classification showed a downward trend. The increase in the number of layers could improve computational accuracy for forward propagation, but it also led to an increase in computational complexity. For back-propagation, due to the long waiting time, the probability of overfitting phenomenon would increase significantly, thus affecting the final result. Therefore, the model generally selected 2 layers as the number of convolutional layers for research.

The study further utilized three algorithms for a 10 fold crossover experiment, namely GCN_ATT, multi class logistic regression, and ordered multi classification regression. Among them, the language input features of the text complexity grading model included 1 text length, 3 discourse complexity, 6 lexical complexity, and 9 syntactic complexity. At the same time, the input dataset was divided into 10 equal parts according to three text complexity levels, followed by 10-fold cross validation and comparison of the average performance of the three algorithms. The recall rate, accuracy, F1 value, and standard deviation of each model are shown in Fig. 9.

Figure 9.

Comparison of 10 fold crossover performance of three models.

As shown in Fig. 9, the performance of the ordered multi classification regression model was relatively weak, with none of the three indicators reaching 74%. The performance of GCN_ATT and multi class logistic regression was stronger, with indicators exceeding 88%. The performance advantages of these two models were mainly reflected in the fact that each dependent variable in the model was not affected by the position of the segmentation point. In addition, compared to multi class logistic regression models, the performance indicators of GCN_ATT were approximately 2% to 3% higher. Meanwhile, the F1 standard deviation was an indicator that reflected the volatility of the model. The larger the SD, the greater the volatility of the model, and its performance was also more unstable. Compared with multi class logistic regression and ordered multi classification regression models, the F1 standard deviation of the GCN_ATT model was reduced by 0.7 and 1.78. The performance comparison of the 10-fold cross validation model is shown in Fig. 10.

Figure 10.

Comparison of 10 fold cross validation model performance.

As can be intuitively seen from Fig. 10, the accuracy and recall of the GCN_ATT model were higher than the other two models in 10 validations. Meanwhile, the ordered multi classification regression model had the highest volatility, with a maximum range of 30% for recall and 38% for accuracy, indicating extremely unstable performance. The fluctuation range of recall and accuracy of the GCN_ATT model was less than 20%, indicating its superior stability. Continuing research on the error analysis of confusion matrix between GCN_ATT model and multi class logistic regression model is shown in Fig. 11.

Figure 11.

Model classification result confusion matrix.

As shown in Fig. 11, compared to GCN_ATT model, the multi classification regression model was only 0.82 in terms of imprecise text judgment for levels 2 and 3. Among them, the target text with incorrect judgments mostly belonged to the explanatory style. And GCN_ATT classified it into the correct level when predicting levels, thus avoiding judgment errors, resulting in a stable result of 0.89. The complexity grading model established by GCN_ATT was less susceptible to the influence of the target text style than the multi class logistic regression model. By comparing the performance indicators of the three algorithm models, GCN_ATT model had higher calculation accuracy, stronger stability, and better performance.

5. Conclusion

The construction of a language model is conducive to measuring and evaluating text complexity. The study selected New Concept English as the training corpus and develops a text complexity grading model based on GCN_ATT neural network algorithm. The experimental results showed that in the validation experiment of text language feature sets, compared with other readability formulas, the FKGL model had the highest accuracy, recall, and F1 values, reaching 75.23%, 70.69%, and 70.92%. At the same time, its F1 value had the lowest standard deviation, only 8.21%. In the comparison experiment of F1 value and accuracy on the 20NG and MR datasets, F1 value of the GCN_ATT model reached 85.9% and 75.5%, which were 2.7% and 4.3% higher than those of the TextGCN model. Meanwhile, the accuracy of the model reached 86.8% and 75.3%, 13.7% and 0.5% higher compared to Bi_LSTM model. In the experiment to verify the factors that affected the accuracy of text classification, the highest accuracy was 94.6% when the sliding window was 40, and the maximum accuracy of text classification was 94.28% when the embedding dimension of text words was 200. In addition, the accuracy of text classification decreased as the number of convolutional layers increased. In the error analysis experiment of confusion matrix, the accuracy of GCN_ATT in text judgment at levels 2 and 3 was stable at 0.89, 0.07 higher than the multi classification regression model. GCN_ATT model had high accuracy and excellent performance in text complexity grading. However, its drawbacks are relatively complex operations and high computational complexity, so further improvement is needed.

References

Crossley

Skalicky

Dascalu

. Moving beyond classic readability formulas: New methods and new models. Journal of Research in Reading. 2019; 42(3-4): 541-561.

Perski

Short

. Acceptability of digital health interventions: Embracing the complexity. Translational Behavioral Medicine. 2021; 11(7): 1473-1480.

Lupo

Tortorelli

Invernizzi

Ryoo

Strong

. An exploration of text difficulty and knowledge support on adolescents’ comprehension. Reading Research Quarterly. 2019; 54(4): 457-479.

Liu

Zheng

Zhou

. Measuring the difficulty of text translation: The combination of text-focused and translator-oriented approaches. Target. 2019; 31(1): 125-149.

Jia

Carl

Wang

. Post-editing neural machine translation versus phrase-based machine translation for English-Chinese. Machine Translation. 2019; 33(1-2): 9-29.

Aydın

. Deep learning classification of neuro-emotional phase domain complexity levels induced by affective video film clips. IEEE Journal of Biomedical and Health Informatics. 2019; 24(6): 1695-1702.

Biber

Reppen

Staples

Egbert

. Exploring the longitudinal development of grammatical complexity in the disciplinary writing of L2-English university students. International Journal of Learner Corpus Research. 2020; 6(1): 38-71.

Zhao

Chen

. Semi-supervised text simplification with back-translation and asymmetric denoising autoencoders. Proceedings of the AAAI Conference on Artificial Intelligence. 2020; 34(5): 9668-9675.

Chen

. An exploratory study on the role of L2 collaborative writing on learners’ subsequent individually composed texts. The Asia-Pacific Education Researcher. 2019; 28(6): 563-573.

10.

Lee

Kolodge

. Exploring trust in self-driving vehicles through text analysis. Human factors. 2020; 62(2): 260-277.

11.

Adhariani

Du Toit

. Readability of sustainability reports: Evidence from Indonesia. Journal of Accounting in Emerging Economies. 2020; 10(4): 621-636.

12.

Katarya

Arora

. Capsmf: A novel product recommender system using deep learning based text analysis model. Multimedia Tools and Applications. 2020; 79(47-48): 35927-35948.

13.

Van Cranenburgh

van Dalen-Oskam

van Zundert

. Vector space explorations of literary language. Language Resources and Evaluation. 2019; 53(4): 625-650.

14.

Abrao

Andres

Miller

Gingold

. AAGL 2021 endometriosis classification: an anatomy-based surgical complexity score. Journal of Minimally Invasive Gynecology. 2021; 28(11): 1941-1950.

15.

Ehret

Szmrecsanyi

. Compressing learner language: An information-theoretic measure of complexity in SLA production data. Second Language Research. 2019; 35(1): 23-45.

16.

Cheng

Bao

Zarifis

Gong

Mou

. Exploring consumers’ response to text-based chatbots in e-commerce: The moderating role of task complexity and chatbot disclosure. Internet Research. 2021; 32(2): 496-517.

17.

Khushik

Huhta

. Investigating Syntactic Complexity in EFL Learners’ Writing across Common European Framework of Reference Levels A1, A2, and B1. Applied Linguistics. 2020; 41(4): 506-532.

18.

Bacha

Ajina

. CSR performance and annual report readability: Evidence from France. Corporate Governance: The International Journal of Business in Society. 2020; 20(2): 201-215.

19.

Housen

De Clercq

Kuiken

Vedder

. Multiple approaches to complexity in second language research. Second Language Research. 2019; 35(1): 3-21.

20.

Jin