Nesterov-accelerated Adaptive Moment Estimation NADAM-LSTM based text summarization 1

Abstract

Automatic text summarization is the task of creating concise and fluent summaries without human intervention while preserving the meaning of the original text document. To increase the readability of the languages, a summary should be generated. In this paper, a novel Nesterov-accelerated Adaptive Moment Estimation Optimization based on Long Short-Term Memory [NADAM-LSTM] has been proposed to summarize the text. The proposed NADAM-LSTM model involves three stages namely pre-processing, summary generation, and parameter tuning. Initially, the Giga word Corpus dataset is pre-processed using Tokenization, Word Removal, Stemming, Lemmatization, and Normalization for removing irrelevant data. In the summary generation phase, the text is converted to the word-to-vector method. Further, the text is fed to LSTM to summarize the text. The parameter of the LSTM is then tuned using NADAM Optimization. The performance analysis of the proposed NADAM-LSTM is calculated based on parameters like accuracy, specificity, Recall, Precision, and F1 score. The suggested NADAM-LSTM achieves an accuracy range of 99.5%. The result illustrates that the proposed NADAM-LSTM enhances the overall accuracy better than 12%, 2.5%, and 1.5% in BERT, CNN-LSTM, and RNN respectively.

Keywords

Text summarization automatic text summarization Nesterov-accelerated Adaptive Moment Estimation Long Short-Term Memory

1 Introduction

In Text Summarization, web resources like user reviews, websites, current news, blogs, social media, and many other sources are available on the Internet and developed rapidly which is considered to be the major source of textual content, and after that came textual data proliferation does on many media, including news reports and magazines, legal records, biomedical files, research articles, etc. [1]. Consequently, the data retrieving process is time-consuming and highly complicated. It is not applicable for reading and understanding the text data of search outcomes [2].

Uncertainty leads to irregular data in the absolute context. Hence, summarizing the text is highly essential significant for better understanding. Generally, manual summarization is complicated and time-consuming due to the application of enormous textual content [3]. ATS is one of the best solutions to overcome predefined issues. The major responsibility of the ATS model is to generate a summary with effective suggestions and opinions for input documents with limited storage space and the reputation of a method has to be retained [4].

The ATS technique is quite helpful for people to extract the necessary and important information from background papers without having to read the entire document. Furthermore, an automatically created summary greatly benefits users as it consumes minimum time and work [5]. An automated summary is depicted as: ”The productive summary purifies an essential content from an original document and generates a compact version of significant details for a specific user(s) and operation (s)” [6].

A summary is simply called a text generated from maxi-mum texts, that represents the essential data as same as actual text(s). Here, the text involved can be applied to speech, multimedia documents, hypertext, and so on [7]. Therefore, the emulated summary is too small when compared to the input text and is composed of vital content [8, 9]. Single Document Summary (SDS) and Multiple Document Summary (MDS) ATS models are classified. The first model generates a synopsis from a single document, whereas the second generates a synopsis from a group of papers. [10].

An ATS contains three types they are Extractive, abstractive, and Hybrid. Initially, an extractive model is used for selecting significant sentences from the input documents and a summary has been generated [11, 12]. Secondly, the abstractive framework shows the input text in in-between for-mat and summarizes effectively with the drastic difference among the original text [13]. Finally, a combination of extractive and abstractive approaches is defined as a hybrid method. The typical structure of the ATS method is depicted in Fig. 1 and the different components are listed below [14, 15].

Fig. 1

Structure of ATS model.

The main contributions of the proposed NADAM-LSTM-based ATS technique are given asfollows.

Initially, the Giga word Corpus dataset is pre-processed utilizing Tokenization, Word Removal, Stemming, Lemmatization, and Normalization.

In the summary generation phase, the text is converted to the word-to-vector method. Further, the text is fed to LSTM to summarize the text. The parameter of the LSTM is then tuned using NADAM Optimization.

The performance analysis of the proposed NADAM-LSTM is calculated based on parameters like accuracy, specificity, Recall, Precision, and F1 score.

The remaining portion of the work has been followed by, Section 2, which discusses the literature review. Section 3 illustrates the proposed NADAM-LSTM-based TS method. Section 4 illustrates the Results and Discussion of the proposed technique. Section 5 discusses the conclusion.

2 Literature review

ATS is a technique for condensing long texts so that the summary contains all of the important points from the original document. This section discusses several text summarization methods that have been performed in this area due to the importance of Automatic text summarization.

In 2020 Mansoor, M., et al., [16] presented a deep-learning method for predicting duplicate question pairs. This method used CNN and LSTM networks, which are suitable for storing long-term dependencies. This suggested method of finding duplicate questions utilized DL models worked very well and gave better results than existing techniques. The suggested model was computed using the Quora dataset and achieved an acc of 87.50%.

In 2022 Sharma, J., et al. [17] developed a long-term solar PV power forecasting method based on the LSTM technique with the Nadam optimizer. Because it retains information from more time steps, the LSTM technique works better with time series data. The Nadam optimizer outperforms the autoregressive integrated moving average by 30.56%, the seasonal autoregressive integrated moving average by 47.48%, and models employing the RMSprop, Ada-max, Adam, Ftrl optimizer, SGD, Adagrad, and Adadelta by 1.43%, 1.35%, 3.51%, 11.84%, 4.88%, 50.69%, and 58.29%.

In 2022 Jain, A., et al. [18] devised a Hindi Health Data (HHD) corpus, an ATS method for the Hindi languages. ATS employs the Real Coded Genetic Algorithm (RCGA), which uses choice, designed to simulate Boolean Crossover (SBX), and Quadratic formula Mutation to optimize feature weights. The ATS extraction system reduces summary by 65% when compared to current summarizingtechniques.

In 2019 Thattinaphanich, S. and Prom-on, S., et al., [19] introduced a Bi-LSTM-CRF application with word. First, in this method prepare the text by symbolizing a sentence into a group of words. This model has been evaluated by the NER open-source corpus of the ThaiNLP Facebook group. The model results for precision, recall, and F1 score are 91.51%, 91.79%, and 91.65%, respectively.

In 2019 Song, S., et al., [20] created an ATS framework (ATSDL) built on LSTM-CNN that can build new sentences by finding pieces better than sentences, i.e., semantic sentences. Regarding semantic and syntactic structure, the ATSDL framework surpasses the state of the art and gets comparable results in terms of quality assessment, according to test findings on the CNN and Daily Mail datasets. Increase the use of written speaking.

In 2021 Jiang, J., et al. [21] suggested four new ATS models with sequence structure (Seq2Seq), using two-way attention-based LSTM, with additional improvements to increase the correlation between summary summaries generated and source text, solve out-of-vocabulary (OOV) problems, eliminate repetitive words, and prevent the propagation of errors that accumulate in generated text summaries.

In 2019 Rahman, M.M. and Siddiqui, F.H., et al., [22] presented an abstract text summarization model, LSTM (MAPCoL), a multilayer convolutional peephole ie automatically produces a summary from a long text. MAPCoL also outperforms traditional LSTM-based methods when it comes to semantic consistency in output summaries.

In 2023 Al Abdulwahid, A., et al., [23] presented a synthesis of short text with improved accuracy using a state-of-the-art algorithm called BERT. The suggested strategy, which aims to enhance the duration and accuracy of training data for short text summaries, using a confusion matrix to track and analyze results and enhance them, BERT+Transformer has been proven to be 97% accurate.

In 2019 Al Munzir, A., et al. [24] introduced an extracted text summarization technique based on a recurrent neural network (RNN) deep learning model to summarize a single document. Our method consists of classifying sentences as meaningful or not to the summary. We used an RNN based on short-term long-term memory (LSTM), and generalized repeat units (GRU). Overall, LSTM was found to be more promising, with average F1 scores of 0.63, 0.59, and 0.56 for Rouge-1, Rouge-2, and Rouge-3, respectively.

In 2018 Li, Y., et al. [25] suggested CS-GAN, a text-generation model that can capture sentence construction and to include categorization data helpful for supervised learning. Particularly with multi-category datasets, the proposed technique must be capable of supervised learning with produced phrases. The suggested working model was also tested exclusively on the sentiment classification task, where CS-GAN excelled with a range of phrase lengths, especially for a small labeled dataset containing brief sentences.

2.1 Review comparison

2.1.1 Summary of the literature review

According to the literature review, most approaches fail to achieve topic identification, interpretation, summary generation, and evaluation of the generated summary. To overcome these drawbacks, Nesterov-accelerated Adaptive Moment Estimation based Text Summarization (NADAM- LSTM) has been proposed, which increases the energy efficiency and network lifetime and eliminates redundant data.

3 Structure of the paper

Initially, the Giga word dataset is pre-processed using Tokenization, Word Removal, Stemming, Lemmatization, and Normalization. In the summary generation phase, the text is converted to word vector method. Further, the text is fed to LSTM to summarize the text. The parameter of the LSTM is then tuned using NADAM Optimization. The architectural structure of the proposed technique is illustrated in Fig. 2.

Fig. 2

Proposed NADAM-LSTM.

Table 1

Comparative analysis with existing techniques

Author with Year	Methods	Advantages	Disadvantages
In 2020 Mansoor, M.,	CNN and LSTM	Deep learning has the primary benefit of having just phrases as input, requiring little to no preprocessing, and saving time by eliminating sophisticated feature engineering.	Measuring text similarity is one of the crucial difficulties since, in comparison to numerical data, text data is comparatively big in quantity and variety.
In 2022 Sharma, J.,	LSTM	Two datasets use an effective solution	The dataset has some spelling mistakes
In 2022 Jain, A.,	HHD	TS is to reduce a lengthy document while maintaining the essential information.	The size and complexity provide a challenge to the Hindi language ATS processing.
In 2019 Thattinaphanich, S. and Prom-on, S.,	Bi-LSTM-CRF	This method was calculated by the NER open-source corpus from a Facebook group ThaiNLP.	Thai language development is nevertheless constrained by a lack of resources and linguistic problems, such as the absence of boundary markers, phrases, and sentences.
In 2019 Song, S.,	ATSDL	It consists of two primary phases: the first generates text summaries using deep learning, and the second extracts phrases from source sentences.	For humans, manually summarizing lengthy text documents is exceedingly challenging and time-consuming.
In 2021 Jiang, J.,	Seq2Seq	It is also much easier to find suitable words for copying from the source text.	It requires a lot of processing resources and has poor precision.
In 2019 Rahman, M.M. and Siddiqui, F.H.,	MAPCoL	Peephole Convolutional LSTM (PCLSTM) is a sort of LSTM that excels at making accurate predictions and classifications in a variety of situations.	The generation of summaries from lengthy texts via abstract text summarising is still a significant open issue in natural language processing. If the escape gate is shut, it is impossible to return to the prior cell condition.
In 2023 Al Abdulwahid, A.,	BERT	It gives high accuracy for using the confusion matrix	It required a high amount of data
In 2019 Al Munzir, A.,	RNN	Training this model with a huge number of high-quality data sets will undoubtedly improve its accuracy and performance.	Lots of training data, a lot of effort, and expensive training gear
In 2018 Li, Y.,	CS-GAN	It gives highly classified information.	It tends to overfit when the training dataset is small, and it is unable to accurately capture category information when the class number is large.

3.1 Preprocessing

The modifying or removing of data before use to ensure or improve performance is known as data preprocessing, which is an important step in the data mining process. In the starting phase, noise from the data has been eliminated from the social media by applying the given procedures:

Remove the URL through routine expression mapping. A regular expression is defined as a textual pattern that shows the exploring pattern of strings and text.

Restore “@Username” with “usr” by applying regular expression matching.

As “hash-tag (#)” offers helpful details, it eliminates # and retains the word as same. viz., “# Lee” is restored with “Lee”.

Avoid forward slash (/), parenthesis, backward slash (∖), and – from Twitter context.

Eliminate multiple white spaces by replacing single white spaces.

Pre-processing methods include tokenization, word Removal, Stemming, Lemmatization, and Normalization. The majority of these methods are typically applied during an ATS system’s pre-processing stage.

Tokenization:

Tokenization preserves the sense of the text while breaking it up into useful data. In this step, long paragraphs also referred to as pieces of text or tokens are divided into individual phrases. These phrases can also be divided into individual words.

Word Removal:

In this process, repeated words are removed from the text. There are several stop words, such as “are,” “of,” “the,” and “at.” As a result, these must be removed from the text.

Stemming:

Through the process of stemming, words in several tenses are reduced to their most basic forms, eliminating needless computations.

Lemmatization:

A lemmatization process combines two or more words into one single word. This process eliminates endings such as shocked to shock, caught to catch, etc based on the morphology of the word.

Normalization:

Many actions are completed at once to accomplish normalization. In the process, all text will be converted to upper- or lowercase, punctuation will be removed, and numerals will be replaced with words. Consequently, each text will undergo more uniform pre-processing.

3.2 DL-based summary generation

The Nesterov-accelerated Adaptive Moment Estimation (NAdam) technique augments the Adaptive Movement Estimation (Adam) optimization process with Nesterov’s Accelerated Gradient (NAG) or Nesterov momentum, which is an upgraded kind of momentum.

3.2.1 GloVe

GloVe, which stands for Global Vectors, is a distributed word representation approach. This model produces vector representations of words using an unsupervised learning process. GloVe stands for Global Vector, and GloVe Embedding is a distributed word representation method for text extracted from web pages that use unsupervised learning. Because GloVe is implemented in parallel, training on data is made simpler. It displays, in vector space, the semantic relationships between words. A global co-occurrence matrix X was built to train the GloVe word embedding model with words from the Wikipedia dataset. This post makes use of vectors with an embedding layer of size 100 that were generated using a vocabulary of 400,000 words, as well as pre-trained word vector data called glove6b.zip.

In the co-occurrence matrix, X: Xij represents the number of context words i that occur with word j. The GloVe model minimizes the following objective function: $I = Minimize \sum_{j = 1}^{U} \sum_{i = 1}^{U} (V_{j}^{g} V_{i} + a_{j} + a_{i} - log y_{ji})$ (1)

3.2.2 Architecture of LSTM

The LSTM network is a deep learning-related recurrent neural network (RNN) network development. LSTMs, as opposed to feature feedback links, are standard feed-forward networks. Since 1997, Juergen Schmidhuber and Sepp Hochreiter have been developing the LSTM to address the vanishing gradient problem with RNN. Nowadays, LSTM has a wide variety of applications because of the organization and popularization it underwent in the past by multiple people. LSTM networks are highly suited to categorize, analyze, and predict utilizing time series data since significant events in a time series may occur at unpredictable times apart. Output to determine what has to be remembered and what can be disregarded. The LSTM cell also has cell update, commonly known as the tanh layer, which is typically a component of the cell state, in addition to the three gates. There are three variables assigned to each LSTM cell: Xt, current input, ht–1, previous output, Ct–1 previous cell status. In contrast, two variables emerge from the cell: ht current output, and Ct current cell status Fig. 3 depicts the LSTM model’s structure. The LSTM cell must first be used before implementing the LSTM repetitive network.

Fig. 3

Architecture of LSTM.

$g_{s} = σ (V_{g} [f_{s - 1}, Y_{s}] + C_{g})$ (2) $j_{s} = σ (V_{g} [f_{s - 1}, Y_{s}] + C_{j})$ (3) ${\overset{⌣}{D}}_{s} = tan h (V_{d} [f_{s - 1}, Y_{s}] + c_{d})$ (4) $D_{s} = g_{s} \times D_{s - 1} + j_{s} \times tan h (V_{cd} [h_{t - 1}, X_{t}] + b_{c})$ (5) $P_{s} = σ (V_{p} [f_{s - 1}, Y_{s}] + c_{p})$ (6) $f_{s} = p_{s} \times tan (D_{s})$ (7)

The bias vector restrictions must be acquired during training. The integers d and h reflect the number of input features and hidden units, respectively.

The network learns new information by contrasting it with prior states while choosing whether to maintain the current input. The prior state vector, ht-1, contains the modifications to the previous input and output. Based on this, the first input gate’s sigmoid function range is a number between 0 and 1, which represents the range of new information accessible at the current input. The value produced is equal to the input value times 1. Become. As a result of the input gate values’ range of [0,+], the input vector stays clear of vanishing gradients for values in the same range, enabling the network to gain knowledge from each input. $j_{g} = y_{g} \times σ V_{j} . [h_{g - 1}, y_{g}] + a_{j}$ (8)

For the candidate vector $\tilde{C} t$ , the tanh function ranges from –1 to+1. The resulting value is multiplied by the input value to produce a candidate vector between (−∞,+∞) as shown in Equation (2). This improves the ability to learn and update new information in the cell state vector and reduces time series variation. ${\tilde{C}}_{g} = y_{g} \times \tanh V_{j} . [h_{g - 1}, y_{g}] + a_{q}$ (9)

The current hidden state (hs) and current memory state (Cs) are the LSTM’s memory cell outputs. Equations (3) and (4) reflect the LSTM neural network’s output gates and hidden states, respectively. $q_{g} = σ (V_{o} . [f_{g - 1}, y_{g}] + a_{o}$ (10) $f_{g} = q_{g} \times \tanh (C_{g})$ (11)

The LSTM cell’s first stage is to determine which information should be removed from the cell status. A SiLU layer also called the “Forget Gate Layer” makes this choice. (f_t). It checks the X_t and h_t-1, then assigns C_t-1 a cell status between 0 and 1 representing an output value. The expression “Forget This Fully” is expressed by the output value of 1, and the phrase “Get Rid of This Fully” is conveyed by the output value of 0.

Choosing what additional data should be added to the cell state is the next stage. There are two components to it. The “input layer,” a SiLU layer, chooses which values should be updated first. Next, a tan h layer generates a new applicant value vector, ${\tilde{C}}_{t}$ can be further added to the new state.

In the next level, to update the state, i_t and ${\tilde{C}}_{t}$ are combined. As a result, the choice of how much to update each value is scaled, yielding an applicant value. The previous decision was forgotten by multiplying C_t-1 by f_t.

Finally, a decision is taken about what will be produced. The output is influenced by the condition of the cell. It could, however, be a filter version. A SiLU layer is first conducted to decide where the output will be delivered in the cell state. The new value is multiplied by the output of the SiLU between (+1) and (–1) generated from the cell state C_t by the o_t tan h layer.

3.2.3 NADAM-optimized Long Short-term Memory (NADAM-LSTM) model

NADAM is constructed with a combination of ADAM and Nesterov Accelerated Gradient (NAG). NADAM is better than ADAM because NAG could achieve better results than classic momentum. The proposed NADAM-LSTM model involves three stages namely preprocessing, LSTM-based summary generation, and NADAM-based parameter tuning. The classical LSTM method is not applicable to capture longer distance semantic connections; even if it is capable of transferring semantic data among words. In the case of parameter training, a gradient is reduced slowly until it gets diminished. Finally, the length of the series data is mitigated. Figure 4 illustrates the performance ofNADAM-LSTM.

Fig. 4

Performance of NADAM-LSTM.

LSTM is a well-known model applied in resolving the gradient diminishing by using Input gate j, Output gate 0, Forget gate g as well as Memory cell. Here, $\vec{b}$ implies the distributed word vector equipped using word2vec. A weighted word vector $\vec{Q}$ developed in this study is depicted in the following: $\vec{Q} = v_{j} . \vec{b}$ (12)

Forget gate g estimates the data to be forgotten in the Memory cell at the consequent moment. The input is f (s-1) and y (t). Hence, output values among and 1. The estimation model is depicted as: $g_{s} = σ (V_{g} Y_{s} + Q_{g} f_{s - 1} + W_{g} D_{s - 1} + C_{g})$ (13)

Followed by, f_s-1 and Y_s shows the inputs of the LSTM unit. V_g denotes a connecting weight of Y_s and forget gate g. Q_g implies connecting the weight of f_s-1 as well as forget gate g. means a state of a Memory cell at the final moment. W_g depicts a connecting weight of D_s-1 and forget gate g. C_g refers to a bias term. 0 represents a SiLU activation function. The input gate computes the data to be upgraded in the Memory cell recently. The calculation approach is depicted as: ${\begin{matrix} j_{s} = σ (V_{j} Y_{s} + Q_{j} f_{s - 1} + W_{j} D_{s - 1} + C_{j}) \\ D_{i} n_{s} = tan f (V_{d} Y_{s} + Q_{d} F_{s - 1} + W_{d} D_{s - 1} + C_{D}) \\ D_{s} = g_{t} . D_{s - 1} + j_{s} . D_{i} n_{s} \end{matrix}$ (14)

Hence, V_j signifies a connecting weight of Y_s and j_s, Q_j denotes a connecting weight of f_s-1 and j_s, W_j implies a connecting weight of D_s-1 and j_s . V_d refers a connecting weight of Y_s and D_in_s shows a connecting weight of D_in_s and F_s-1. tan f defines a tan f activation function. g_t and j_s refers to the weights of D_s-1 and D_in_s . C_j and C_D indicates the bias terms. The output gate computes the final value of the LSTM unit. Henceforth, the calculation mechanism is illustrated as, ${\begin{matrix} P_{s} = σ (V_{p} Y_{s} + Q_{p} f_{s - 1} + W_{p} D_{s - 1} + C_{p}) \\ f_{s} = P_{s} tan f (D_{s}) \end{matrix}$ (15)

Where denotes the connecting weight of Y_s and P_s. Q_p means the connecting weight of f_s-1 and P_s W_p showcases a connecting weight of D_s-1 and P_s. C_p implies a bias term.

The model uses the pseudo-code of optimization algorithm Nadam, as shown in Algorithm.

Algorithm: NADAM

Input: α₀, . . . , α_R ; μ₀, . . . , μ_r ; u ; Hyperparameters: ɛ;

n_{0}; m_{0} \leftarrow 0 (\frac{first}{second} moment vectors)

Output: θ_r

While θ_r not converged do

h_r ← ∇ θ_r-1d_t (θ_r-1)

n_r ← μ_rn_r-1 + (1 - μ_r) h_r

m_{r} \leftarrow u n_{r - 1} + (1 - u) h_{r}^{2}

\hat{n} \leftarrow (\frac{μ_{r} + 1 n^{r}}{1 - \prod_{j = 1}^{r - 1} μ_{j}}) + (\frac{(1 - μ_{r}) h_{r}}{1 - \prod_{j = 1}^{r} μ_{j}})

\hat{m} \leftarrow \frac{u m_{r}}{1 - u^{r}}

θ_{r} \leftarrow θ_{r - 1} - \frac{α_{r}}{\sqrt{\hat{m_{r} + ɛ}}} \hat{n_{r}}

end while

Nesterov-accelerated adaptive moment estimation is attempted by the Nadam optimizer in the Adam. The major benefit of this integrative manner is that the adaptive moment estimation used allows for highly accurate steps in the transformed data via notifications of design variables with the momentum growth and developmental to gradient computation.

3.3 Sentence ranking

Sentences are rated and sorted according to the weights that have been given to them. The input material is parsed into its high-level paragraphs, a thorough summary is produced, and the summary is then recorded as audio. A significant variety of information retrieval tasks, including categorization, sentiment classification, and others, heavily rely on ranking. Sentence rating is a technique used by researchers to identify suspicious and original sentence pairings in texts that contain both original and suspicious sentences. The strategy is based on how similarly related the sentences are. Using this method, which eliminates superfluous words and streamlines execution, a list of source and suspect sentence pairings is generated. The idea of sentence ranking is discussed in this study. A dataset from the original with “n” unique words is kept as a starting point. Then, each sentence in the suspect document is compared to each sentence in the original document using a vector-matching technique. They determine the cosine angle between two words using a cosine similarity metric. The question and sentence vectors are increasingly similar the closer the cosine angle is near 1.0, which has a value between 0 and 1. As a consequence, they amass a collection of similarity measure values between each suspect paragraph and the phrases in the original text. We choose the paragraph pairings from the collection with the best accuracy value.

4 Result and discussion

The experimental arrangement of the proposed NADAM -LSTM-based TS was implemented utilizing MATLAB to identify Automatic Text Summarization. Accuracy, specificity, precision, and recall are the different metrics used to evaluate it. Several text summarizations are used to evaluate the suggested model’s effectiveness. A comparison of the suggested NADAM-LSTM Performance with BERT, CNN-LSTM, and RNN, is made. Regarding Recall, Precision, Accuracy, and F-Measure. MATLAB is used to run the simulation.

4.1 Dataset description

The Giga Word Dataset contains headline generation on a corpus of article pairs from Giga Word, which contains approximately 4 million articles in English. Text file with a size of 4M. The dataset contains just over one billion words drawn from a variety of sources dating back to 1950. It is intended to be representative of text summarization, which includes texts published between the 1950 s and the present. The most recent significant change in written summarization was the gradual elimination of subject-verb agreement. Even though most spoken varieties had lacked this distinction for centuries, the written standard required the use of distinct singular and plural forms of verbs.

4.2 Performance analysis

The effectiveness of the categorization strategy is assessed utilizing the following statistical parameters such as accuracy, recall, precision, and F1 score. $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$ (16) $Precision = \frac{TP}{TP + FP}$ (17) $Recall = \frac{TP}{TP + TN}$ (18) $F 1 score = 2 (\frac{Precision * Recall}{Precision + Recall})$ (19)

Where TN, FN depicts the true negative and false negatives and TP, FP depicts the true and false of the sample.

The proposed model exhibits good accuracy in both training and testing, as shown in Fig. 5. Raising the epoch value enhances the model’s performance. To demonstrate how the model loss lowers as the period size grows, the loss curve and epochs are displayed in Fig. 6. As a result, the model’s predictions of the outcomes were very precise.

Fig. 5

Training and Testing accuracy curve of NADAM-LSTM.

Fig. 6

Training and Testing accuracy curve of NADAM-LSTM.

For example, the recommended abstract size ratio should be 33–40%, but some abstracts have a size ratio of up to 80% of the specified text. About 40 items are divided into various sections, as shown in Fig. 7.

Fig. 7

Raw text line counts with line count output summaries.

Figure 8 shows the Sentence via Text Rank Score. The graph-based ranking method forms the foundation of the Text Rank algorithm. Google’s web searches are where it is most commonly utilized, but it has numerous other uses as well. To determine a vertex’s relevance, graph-based ranking algorithms consider data about the entire graph rather than just vertex-specific data.

Fig. 8

Sentence via Text Rank Score.

The knowledge of the connections (edges) between the vertices would be a typical piece of information. they must specify the vertices and edges we intend to use in the NLP scenario. In this instance, sentences will serve as the vertices and words as the edges of the connections. As a result, sentences containing words that are used in numerous other sentences are given more weight.

Figure 9 demonstrates the gradual reduction in training and test errors as well as the strong generalization of the qualified AE. Small quantities of random noise were added to the input representation to get rid of zeros and make ensembles easier. The outcomes show that simply eliminating ze-roes is ineffective; AE (Ltf) consistently outperforms Ltf-NAE. SKE data is included in the comparison because it is bigger and more varied. The average squared reconstruction error of the train and test sets for Ltf-NAE (Uniform) and AE(Ltf) for the SKE corpus, plotted against the number of epochs(iterations).

Fig. 9

Mean squared reconstruction error via the number of epochs.

4.3 Comparative analysis

A comparison of the suggested model and the present DL model is done in this part. We used accuracy, recall, F1 score, and performance comparisons of current approaches to demonstrate that the outcomes of the suggested strategy are more successful.

Case Study (i)

Figure 10 (a) shows the training and testing of Standard Input with the overall accuracy of the proposed method is 99.5%. A small paragraph is given to train the overall accuracy.

Fig. 10

(a) Case Study I Standard Input.

Case Study (ii)

Figure 11 (a) shows the training and testing of Standard Input with the overall accuracy of the proposed method is 89.5%. A small paragraph is given to train the overall accuracy.

Fig. 11

(b) Case II Customized training and testing.

Figure 12 depicts a ROUGE point with different epoch values. The epoch number specifies how many times the same data was utilized to train the algorithm. The number of epochs in this event ranges from 100 to 250. As the number of epochs grows, the efficacy of NADAM-LSTM improves. However, the efficiency increase from 100 to 200 epochs is significant and continues after that. When the number of epochs exceeds 200, the issue of overfitting arises. As a result, when the methods are trained for more than 200 epochs, their efficiency does not increase.

Fig. 12

Average performance of the proposed method on various epoch numbers.

4.4 Ablation study

An ablation study was made to assess the efficiency of the NADAM employed in the optimization stage. This experiment, with NADAM Optimization and without NADAM Optimization was illustrated in Fig. 13 according to a comparison of the optimization process of the accuracy, precision, and specificity. According to our findings, optimization utilizing ablation methods was typically less accurate than optimization using the NADAM method, which proves the usefulness of the NADAM method in the optimization process.

Fig. 13

Performance comparison in ablation study when removing optimization.

Initially, evaluated the effectiveness of the proposed NADAM-LSTM method and the ablation study was performed with and without NADAM. There is a possibility that the model without NADAM had the lowest accuracy in optimization. The optimization model the text summarization uses high-level optimization.

Figure 14 illustrates the Performance comparison in the ablation study. Evaluated the effectiveness of the proposed NADAM-LSTM technique and the ablation study is performed with and without Activation Layer. There is a possibility that the technique without sigmoid had the lowest accuracy in optimization. The technique for activation the text summarization uses a high-level Activation layer.

Fig. 14

Performance comparison in ablation study.

5 Conclusion

In this paper, a novel Nesterov-accelerated Adaptive Moment Estimation Optimization based on Long Short-Term Memory [NADAM-LSTM] has been proposed to summarize the text. The proposed NADAM-LSTM model involves three stages namely pre-processing, summary generation, and parameter tuning. Initially, the Giga word Corpus dataset is pre-processed using Tokenization, Word Removal, Stemming, Lemmatization, and Normalization. In the summary generation phase, the text is converted to the word-to-vector method. Further, the text is fed to LSTM to summarize the text. The parameter of the LSTM is then tuned using NADAM Optimization. The performance analysis of the proposed NADAM-LSTM is calculated based on parameters like accuracy, specificity, Recall, Precision, and F1 score. The suggested NADAM-LSTM achieves an accuracy range of 99.5%. The experimental result shows that the proposed NADAM-LSTM enhances the overall accuracy better than 12%, 2.5%, and 1.5% in CNN-LSTM, BERT, and RNN respectively. In the future consequences of expanding the extent of the model’s training data may be examined in further research. Using data from the same domain as the training data to train the fast Text shallow neural network and produce word embeddings for the same is another option that may be taken into account.

References

Mridha

M.F.

, Lima

A.A.

, Nur

, Das

S.C.

, Hasan

and Kabir

M.M.

, A survey of automatic text summarization: Progress, process and challenges, IEEE Access 9 (2021), 156043–156070.

Abu Nada

A.M.

, Alajrami

, Al-Saqqa

A.A.

and Abu-Naser

S.S.

, Arabic text summarization using arabert model using extractive text summarization approach, International Journal of Academic Information Systems Research (IJAISR) 4(8) (2020), 6–9.

Song

, Huang

and Ruan

, Abstractive text summarization using LSTM-CNN based deep learning, Multimedia Tools and Applications 78(1) (2019), 857–875.

Alomari

, Idris

, Sabri

A.Q.M.

and Alsmadi

, Deep reinforcement and transfer learning for abstractive text summarization: A review, Computer Speech & Language 71 (2022), 101276.

Srivastava

, Singh

, Rana

K.P.S.

and Kumar

, A topic modeled unsupervised approach to single document extractive text summarization, Knowledge-Based Systems 246 (2022), 108636.

Sun

, Fine-tuning transformers: assessments of (Doctoral Dissertation, California State University, Northridge), 2022.

Alshahrani

, A Hierarchical Extractive Text Summarization Approach (Doctoral dissertation, North Carolina Agricultural and Technical State University), 2021.

Choi

S.J.

, Choi

S.W.

, Kim

J.H.

and Lee

E.B.

, AI and text-mining applications for analyzing contractor’s risk in invitation to bid (ITB) and contracts for engineering procurement and construction (EPC) projects, Energies 14(15) (2021), 4632.

Souza

C.M.

, Meireles

M.R.

and Almeida

P.E.

, A comparative study of abstractive and extractive summarization techniques to label subgroups on patent dataset, Scientometrics 126(1), 135–156.

10.

Yang

, Song

, Sun

and Wang

, Research on TPS Code Automatic Generation Technology of Signal-oriented ATS, In, 2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC) IEEE 6 (2022), 462–465.

11.

Kumari

, Text Mining and Pre-Processing Methods for Social Media Data Extraction and Processing, In Handbook of Research on Opinion Mining and Text Analytics on Literary Works and social media, IGI Global (2022), 22–53.

12.

Putra

I.M.S.

, Adiwinata

, Putri

D.P.S.

and Sutramiani

N.P.

, Extractive Text Summarization of Student Essay Assignment Using Sentence Weight Features and Fuzzy C-Means, International Journal of Artificial Intelligence Research 5(1) (2021), 13–24.

13.

Rathi

, Raj

, Mohan

and Singh

Y.V.

, A review of state-of-the-art automatic text summarisation, International Journal of Creative Research Thoughts 2022 (2022).

14.

Lima

, Brito

, Nascimento

, Valença

and Pedrosa

, Using natural language processing to improve transparency byenhancing the understanding of legal decisions, EGOV-CeDEM-ePart 2022 (2022), 173.

15.

Eichstaedt

J.C.

, Kern

M.L.

, Yaden

D.B.

, Schwartz

H.A.

, Giorgi

, Park

, Hagan

C.A.

, Tobolsky

V.A.

, Smith

L.K.

, Buffone

and Iwry

, Closed-and open-vocabulary approaches to text analysis: A review, quantitative comparison, and recommendations, Psychological Methods 26(4) (2021), 398.

16.

Mansoor

, ur Rehman

, Shaheen

, Khan

M.A.

and Habib

, Deep learning based semantic similarity detection using text data, Information Technology and Control 49(4) (2020), 495–510.

17.

Srivastava

N.D.

and Sharma

, Combating Online Hate: A Comparative Study on Identification of Hate Speech and Offensive Content in social media Text, In 2020 IEEE Recent Advances in Intelligent Computational Systems (RAICS), IEEE (2020), 47–52.

18.

Jain

, Arora

, Morato

, Yadav

and Kumar

K.V.

, Automatic text summarization for Hindi using real coded genetic algorithm, Applied Sciences 12(13) (2022), 6584.

19.

Thattinaphanich

and Prom-on

, Thai named entity recognition using Bi-LSTM-CRF with word and character representation. In 2019 4th International Conference on Information Technology (InCIT) IEEE (2019), 149–154.

20.

Song

, Huang

and Ruan

, Abstractive text summarization using LSTM-CNN based deep learning, Multimedia Tools and Applications 78 (2019), 857–875.

21.

Jiang

, Zhang

, Dai

, Zhao

, Feng

, Ji

and Ganchev

, Enhancements of attention-based bidirectional lstm for hybrid automatic text summarization, IEEE Access 9 (2021), 123660–123671.

22.

Rahman

M.M.

and Siddiqui

F.H.

, An optimized abstractive text summarization model using peephole convolutional LSTM, Symmetry 11(10) (2019), 1290.

23.

, Abdulwahid, Software solution for text summarisation using machine learning based Bidirectional Encoder Representations from Transformers algorithm, IET Software (2023).

24.

Al Munzir

, Rahman

M.L.

, Abujar

and Hossain

S.A.

, Text analysis for Bengali text summarization using deep learning. In 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), IEEE. (2019), 1–6.

25.

, Pan

, Wang

, Yang

and Cambria

, A generative model for category text generation, Information Sciences 450 (2018), 301–315.