ACOA: Archimedes conditional autoregressive optimization algorithm based RMDL for web data classification

Abstract

Web data classification has become highly valuable because of the growth of high-quality data sources on the web and the need to make use of them. Deep learning requires a large quantity of well-labeled data, which is difficult and time-consuming to gather explicitly. To bridge this gap, a hybrid approach named Archimedes conditional autoregressive optimization algorithm (ACOA) is established for web data classification. Firstly, the input web page is fed into the Bidirectional Encoder Representations from Transformers (BERT) tokenization phase. Secondly, aspect term extraction (ATE) is performed on the tokens to identify the phrases targeted by opinion indicators in review sentences. Thirdly, feature extraction is performed to obtain suitable features. Lastly, web data classification is performed by Random Multimodel Deep Learning (RMDL), which is tuned using ACOA. ACOA integrates the Archimedes optimization algorithm (AOA) with the Conditional Autoregressive Value at Risk (CAViaR) model. The presented approach is evaluated using precision, recall, and F1-score, attaining maximum values of 91.6%, 94.7%, and 93.1%, respectively.

Keywords

bidirectional encoder representations from transformers; conditional autoregressive value at risk; random multimodel deep learning; web data classification; Archimedes optimization algorithm

1 Introduction

Natural language processing (NLP) is the study of how computers can understand and manipulate natural language text or speech to perform useful tasks. NLP researchers study how people understand and use language so that appropriate tools and techniques can be developed to help computers comprehend and manipulate natural languages to accomplish the required tasks. NLP draws on a variety of disciplines, including computer and information sciences, linguistics, mathematics, electrical and electronic engineering, artificial intelligence and robotics, and psychology. Machine translation, user interfaces, multilingual and cross-language information retrieval (CLIR), speech recognition, and expert systems are just a few of the areas in which NLP is applied.¹ Owing to the exponential growth of text data over time, NLP has attracted considerable interest from artificial intelligence (AI) researchers.² As an interdisciplinary field, NLP develops algorithms and systems that let computers comprehend and carry out tasks involving human language; it is also known as computational linguistics, computer speech and language processing, or human language technology.³ NLP analyzes spoken and written human language to derive instructions or practical knowledge.⁴ In information retrieval, a similarity measure assigns a ranking score between a query and each text in the corpus. Question-answering applications likewise need to identify similar questions and answers. Given the variety of natural language phrasing, evaluating sentiment-bearing terms is very challenging.⁵

Many natural language tasks, including word sense disambiguation, language modeling, synonym extraction, and automatic thesaurus extraction, make effective use of semantic similarity measures.⁶ The semantic similarity between concepts is the ability to quantify how similar, or how distant, two concepts are in terms of an ontology. In other words, semantic similarity identifies concepts that share comparable “characteristics”. Humans can judge whether two concepts are related even without knowing a formal definition of that relationship.⁷ With the rapid growth of the Web, semantic similarity measures have recently gained importance in many Web-related tasks, and they are crucial in business as well. In the sponsored search model, companies pay search engines to show advertisements for their websites alongside search results. When a business bids on a keyword, its advertisement is shown whenever that term is entered into the search engine. A significant issue in this process is “keyword generation”, in which a company running a campaign proposes keywords related to that campaign; keywords with a high degree of semantic similarity to the company can then be used as potential bid suggestions.⁸ The ability to compute meaningful measures of similarity between data and data models also facilitates several key tasks, including retrieval of structured data from partial specifications, data and schema integration based on the similarity of definitions in different sources, and similarity-based query answering on the integrated model.⁹

When defining similarity measures for Semantic Web data, it is important to keep in mind that such data is frequently described using ontological knowledge that encodes implicit information about the data. This implicit knowledge must also be taken into account when defining a meaningful similarity measure for such data.⁹ With the rise in popularity of the Internet over the past ten years, the number of websites has increased exponentially.³,¹⁰ Web data can be downloaded directly from the internet or gathered using the various APIs and web crawlers made available by third parties.¹¹ Using web data poses two main difficulties: images and their tags must be retrieved from a web data collection both accurately and in as large a number as possible; otherwise, the web data may not precisely represent the target theme, or only a small amount of relevant data may be available. Both issues arise from the complexity of web image content and the potential mismatch between the distribution of web image data and that of the target dataset.¹² As a result, classifying data gathered from various websites is a crucial task. Deep learning has significantly improved many computer vision tasks, including general image classification, object detection, and scene recognition. A faster and easier alternative to manual collection is to gather large numbers of images from the web. Although some noise in online data is unavoidable, the abundance of web data can compensate for this shortcoming.¹³

This section describes the primary goal of this paper. A novel framework named ACOA_RMDL is introduced for web data classification. Text-based input web data is passed to the BERT tokenization process. After tokenization, ATE is performed, and feature extraction is carried out using punctuation marks, negation, question marks, exclamation marks, bag of words, sentence length, numerical words, hashtags, all caps, emoticons, and semantic-based similarity to obtain suitable feature vectors. Finally, web data classification is performed by RMDL, which is trained using ACOA.

✓ Proposed ACOA_RMDL: A novel web data classification approach named ACOA_RMDL is established, in which the RMDL classifier is trained using ACOA. ACOA is obtained by integrating AOA with CAViaR.

The remainder of this paper is organized as follows. Section 2 reviews existing techniques and their limitations. Section 3 presents the proposed ACOA methodology, the structure of RMDL, and the training algorithm. Section 4 reports the performance and comparative analysis of the proposed technique. Finally, Section 5 concludes the paper, summarizing its effectiveness and outlining future work.

2 Literature review

Several online data-gathering methods provide a wide variety of information. However, noisy samples from different groups may share the same traits. Therefore, previous web data classification schemes are reviewed below to motivate the design of the new approach.

Baharlou and Aghamaleki¹⁴ developed a transfer learning approach. Two state-of-the-art networks were trained on a noisy dataset, achieving excellent recognition accuracy. However, the approach was unable to accomplish noise reduction when applied to a larger dataset with more classes. Gopianand and Jaganathan¹⁵ devised an optimal neural network (ONN) classifier. This strategy improved search efficiency and classification accuracy for different groups; however, a more effective algorithm was not adopted to further improve the results. Xiaoping Wu et al.¹⁶ designed a bidirectional self-paced learning (BiSPL) framework. Although this method reduced the impact of noise by learning from web data sources in a concise manner, it was not evaluated on more tasks or on large-scale datasets. Jia Li et al.¹⁷ established the Ubiquitous Reweighting Network (URNet). By reweighting the influence of the various classes, their labels, large instance bags, small instance bags, and their confidence, each instance can contribute positively while noise and bias are reduced. This gradually reduces the impact of noise and bias in web sources and improves URNet's performance over time.

Yang et al.¹² introduced a progressive filtering method, which successfully retrieved useful data from online data. However, the model's selection ability was severely limited at the smallest sizes. Li et al.³ devised a deep web data source classification method, which achieved high performance and was practical to implement; however, it increased labor consumption. Liu et al.¹⁸ developed the weighted frequency algorithm P-TF-IDF, which provided a data source for the visualization of earthquake emergency data. However, the data cleaning framework only partially cleaned the web pages of the earthquake alert network. Patel and Verma¹⁹ created supervised learning classifiers, among which the Support Vector Machine (SVM) and Support Vector Regression (SVR) classifiers offered higher precision. However, the K-Nearest Neighbor (K-NN) method was found to be very memory- and computation-intensive.

To overcome the drawbacks of existing data classification techniques, this paper introduces ACOA_RMDL for web data classification. BERT tokenization is used for pre-processing; BERT saves considerable time when building NLP models, and pre-trained versions are available in more languages than for other models. RMDL is used for data classification because it can be applied to any kind of dataset and can use any kind of optimizer. Hence, the proposed method offers better results than previous data classification methods.

3 Proposed Archimedes conditional autoregressive optimization algorithm_random multimodel deep learning (ACOA_RMDL) for web data classification

Although web data is inherently sparse, optimizing the number of classes for each classification is an especially important issue in web page analysis. The primary intention of this work is to establish ACOA_RMDL for web data classification. Initially, the input web page, taken from the datasets²⁰,²¹, is passed to BERT²² tokenization to break the sentences into tokens. After that, ATE²³ is performed on the tokens to identify the phrases targeted by opinion indicators in the review sentences. Feature extraction is then performed to obtain suitable features. Finally, web data classification is performed using RMDL,²⁴ which is trained by ACOA. ACOA is obtained by integrating AOA²⁵ with the CAViaR model.²⁶ Figure 1 shows the schematic view of ACOA_RMDL for web data classification.

Figure 1.

Modeled diagram of Archimedes conditional autoregressive optimization algorithm_random multimodel deep learning for web data classification.

3.1 Data acquisition

Consider a database W for web data classification, formulated as

W = \{W_{1}, W_{2}, \dots, W_{u}, \dots, W_{v}\}

(1)

where W denotes the database, v is the total number of web pages in it, and $W_{u}$, the u-th web page, is taken as the input for web data classification.

3.2 Bidirectional encoder representations from transformers tokenization

The text-based input web page data $W_{u}$ is passed to the BERT²² tokenization process, which breaks the phrases down into tokens. BERT tokenization splits words or sentences into sub-words or tokens. The BERT framework has two phases: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data over a variety of pre-training tasks. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and all parameters are then fine-tuned using labeled data from the downstream tasks. Each downstream task has its own fine-tuned model, although all are initialized with the same pre-trained parameters. The tokenized output is denoted as $B_{u}$.
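BERT's tokenizer relies on the WordPiece algorithm, which splits each word greedily into the longest sub-word units found in a fixed vocabulary. The sketch below illustrates greedy longest-match-first WordPiece splitting with a hypothetical toy vocabulary; in practice a pretrained BERT tokenizer would be loaded instead, and the names `basic_tokenize`, `wordpiece`, `bert_tokenize`, and `VOCAB` are illustrative only.

```python
import re

def basic_tokenize(text):
    # Lowercase and split into words and single punctuation characters
    # (a simplified version of BERT's uncased basic tokenizer).
    return re.findall(r"\w+|[^\w\s]", text.lower())

def wordpiece(word, vocab, unk="[UNK]"):
    # Greedy longest-match-first: repeatedly take the longest vocabulary entry
    # that prefixes the remaining characters; non-initial pieces carry "##".
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            cand = word[start:end] if start == 0 else "##" + word[start:end]
            if cand in vocab:
                match = cand
                break
            end -= 1
        if match is None:
            return [unk]  # no split exists, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

def bert_tokenize(text, vocab):
    tokens = [p for w in basic_tokenize(text) for p in wordpiece(w, vocab)]
    return ["[CLS]"] + tokens + ["[SEP]"]  # BERT's sequence boundary tokens

# Hypothetical toy vocabulary; real BERT models ship a vocabulary of about 30,000 entries.
VOCAB = {"the", "site", "load", "##s", "fast", "!"}
print(bert_tokenize("The site loads fast!", VOCAB))
# ['[CLS]', 'the', 'site', 'load', '##s', 'fast', '!', '[SEP]']
```

Splitting "loads" into "load" and "##s" shows how WordPiece keeps the vocabulary small while still covering inflected or unseen words.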

3.3 Aspect term extraction

The BERT tokenization output $B_{u}$ is subjected to ATE,²³ which extracts from a review sentence the aspect terms on which users have expressed their opinions. The ATE procedure is described as follows.

The BERT shared layers are enriched with a Local Context Feature Generator (LCFG) and a Global Context Feature Generator (GCFG), formulated as

H_{BERT}^{a} = BERT^{a}(O^{a})

(2)

where $O^{a}$ is the input of the GCFG and $H_{BERT}^{a}$ is its output.

The aspect polarity classifier (APC) performs head-pooling on the concatenated context features. Head-pooling extracts the hidden state at the position of the first token in the input sequence, and the sentiment polarity is then predicted with a Softmax operation:

ℑ_{pool}^{ba} = POOL(H_{FIL}^{ba})

(3)

H_{FIL}^{ba} = κ(H_{dense}^{ba})

(4)

H_{dense}^{ba} = J^{ba} \cdot H^{ba} + w^{ba}

(5)

Z_{pol} = \frac{\exp(ℑ_{pool}^{ba})}{\sum_{ε = 1}^{o} \exp(ℑ_{pool, ε}^{ba})}

(6)

where $H^{ba}$ is the concatenation of the local and global context features $H^{b}$ and $H^{a}$, $J^{ba}$ and $w^{ba}$ are the weight and bias vectors, κ denotes Multi-Head Self Attention (MHSA), o is the number of polarity classes, and $Z_{pol}$ is the polarity predicted by the APC.
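The APC pipeline in equations (3) to (6) can be sketched in NumPy as below, as a simplified illustration under stated assumptions: a single-head scaled dot-product self-attention stands in for MHSA (κ), features use a row-vector convention (so $J^{ba} \cdot H^{ba}$ becomes `h_ba @ J`), and a final projection `W_out` to the o polarity classes is assumed, since equation (6) applies Softmax directly to the pooled features. All weights are random placeholders, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h):
    # Single-head scaled dot-product self-attention: a simplified stand-in for MHSA (kappa).
    scores = h @ h.T / np.sqrt(h.shape[-1])
    return softmax(scores, axis=-1) @ h

def apc_polarity(h_local, h_global, J, w, W_out, b_out):
    h_ba = np.concatenate([h_local, h_global], axis=-1)  # H^{ba}: (seq_len, 2d)
    h_dense = h_ba @ J + w                                # Eq. (5), row-vector convention
    h_fil = self_attention(h_dense)                       # Eq. (4)
    pooled = h_fil[0]                                     # Eq. (3): head-pooling on the first token
    logits = pooled @ W_out + b_out                       # assumed projection to o classes
    return softmax(logits)                                # Eq. (6)

rng = np.random.default_rng(0)
seq_len, d, o = 6, 8, 3
probs = apc_polarity(
    rng.normal(size=(seq_len, d)), rng.normal(size=(seq_len, d)),  # H^b, H^a
    rng.normal(size=(2 * d, d)), np.zeros(d),                       # J^{ba}, w^{ba}
    rng.normal(size=(d, o)), np.zeros(o),                           # hypothetical output layer
)
print(probs, probs.sum())
```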

The aspect word extractor performs token-level classification for each token, calculated as

Z_{term} = \frac{\exp(H_{ζ})}{\sum_{ε = 1}^{ξ} \exp(H_{ζ, ε})}

(7)

where $H_{ζ}$ is the hidden state of the ζ-th token, ξ is the number of token categories, and $Z_{term}$ is the predicted token label. The output of ATE is denoted as $E_{u}$.
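Equation (7) yields a label distribution for each token; to obtain aspect terms, the predicted labels are then decoded into spans. The sketch below assumes an IOB labeling scheme (`O`, `B-ASP`, `I-ASP`), which the text does not specify, and uses hand-picked logits purely for illustration; `extract_aspects` is a hypothetical helper.

```python
import numpy as np

LABELS = ["O", "B-ASP", "I-ASP"]  # assumed IOB label set (xi = 3 categories)

def token_softmax(logits):
    # Row-wise softmax, i.e. Eq. (7) applied to every token.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def extract_aspects(tokens, logits):
    probs = token_softmax(np.asarray(logits, dtype=float))
    tags = [LABELS[k] for k in probs.argmax(axis=1)]
    aspects, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-ASP":                  # a new aspect term starts
            if current:
                aspects.append(" ".join(current))
            current = [tok]
        elif tag == "I-ASP" and current:    # continue the open aspect term
            current.append(tok)
        else:                               # "O", or a stray I-ASP with no preceding B-ASP
            if current:
                aspects.append(" ".join(current))
            current = []
    if current:
        aspects.append(" ".join(current))
    return aspects

tokens = ["the", "battery", "life", "is", "great"]
logits = [[3, 0, 0], [0, 3, 0], [0, 0, 3], [3, 0, 0], [3, 0, 0]]
print(extract_aspects(tokens, logits))
# ['battery life']
```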

3.4 Feature extraction

The ATE output $E_{u}$ is the input to the feature extraction process. The features used for web data classification are described below.

(a)
Punctuation Marks
Punctuation marks are the symbols used to structure text and clarify its content, primarily by separating or joining words. This feature is denoted as $f_{1}$.
(b)
Negation
This feature is the number of negations in the text, that is, constructions that reverse the truth value of a statement. It is denoted as $f_{2}$.
(c)
Question Marks
The question mark, or interrogation point, ends interrogative sentences and question tags. This feature is denoted as $f_{3}$.
(d)
Exclamation Marks
The exclamation mark, commonly called the exclamation point, denotes intense feelings and emotions and is used with interjections and in exclamatory sentences. This feature is denoted as $f_{4}$.
(e)
Bag of words
Bag of words is an NLP representation that describes a text by the occurrence counts of its words, ignoring word order. This feature is denoted as $f_{5}$.
(f)
Sentence Length
Sentence length, defined as the number of words in a sentence, has attracted considerable interest in linguistic and literary studies. This feature is denoted as $f_{6}$.
(g)
Numerical Words
This feature is the number of numeric words or digits used to express values, denoted as $f_{7}$.
(h)
Hash-tags
Hashtags are marked by the symbol # and are used to label content. Positive and negative hashtags are counted separately and included as two aspects of this feature, denoted as $f_{8}$.
(i)
All Caps
This feature counts the words written entirely in capital letters and is denoted as $f_{9}$:
$f_{9} = \sum_{l = 1}^{n} N_{bb}^{l}$
(8)
where $N_{bb}^{l}$ equals 1 if the l-th word is written in all capitals and 0 otherwise, and n is the number of words in the text.
(j)
Emoticons
Emoticons represent facial expressions using specific combinations of numbers, punctuation, and letters. This feature is denoted as $f_{10}$.
(k)
Semantic based similarity
This text-based measure uses the proximity or context of words or terms and is denoted as $f_{11}$.²⁷ It is computed as the cosine similarity
$B^{E}(ϖ_{1}, ϖ_{2}) = \frac{\sum_{δ = 1}^{\partial} ℓ_{ϖ_{1}, δ} ℓ_{ϖ_{2}, δ}}{\sqrt{\sum_{δ = 1}^{\partial} (ℓ_{ϖ_{1}, δ})^{2}} \sqrt{\sum_{δ = 1}^{\partial} (ℓ_{ϖ_{2}, δ})^{2}}}$
(9)
where $ϖ_{1}$ and $ϖ_{2}$ are the two words being compared, $ℓ_{ϖ, δ}$ is the δ-th component of the vector representing word ϖ, ∂ is the vector dimension, and $B^{E}$ is the similarity between the two words.
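Equation (9) is the standard cosine similarity between two word vectors. A minimal sketch follows; the three-dimensional word vectors are hypothetical values chosen only for illustration (a real system would take them from a trained embedding), and returning 0.0 for a zero vector is an assumed convention.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    # Eq. (9): dot product divided by the product of the Euclidean norms.
    v1, v2 = np.asarray(vec1, dtype=float), np.asarray(vec2, dtype=float)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    return float(v1 @ v2 / denom) if denom > 0 else 0.0

# Hypothetical word vectors, for illustration only.
vectors = {"fast": [0.9, 0.1, 0.0], "quick": [0.8, 0.2, 0.1], "price": [0.0, 0.1, 0.9]}
print(cosine_similarity(vectors["fast"], vectors["quick"]))
print(cosine_similarity(vectors["fast"], vectors["price"]))
```

Because the measure normalizes by vector length, it compares only the direction of the two word vectors, which is why related words such as "fast" and "quick" score close to 1.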

Finally, the feature vector is formulated as
$F_{u} = \{f_{1}, f_{2}, \dots, f_{11}\}$
(10)
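Assembling $F_{u}$ from a piece of text can be sketched as follows. This is an illustration under stated assumptions: the negation list, hashtag polarity lists, and emoticon pattern are hypothetical (the paper does not list the lexicons it used), sentence length is taken as the number of whitespace-separated tokens, $f_{5}$ is returned as a word-count mapping, $f_{8}$ as a (positive, negative) pair, and $f_{11}$ is passed in by the caller because it depends on word embeddings.

```python
import re
from collections import Counter

# Hypothetical lexicons and patterns; the paper does not list the ones it used.
NEGATIONS = {"not", "no", "never", "none", "nobody", "nothing", "neither", "nor", "cannot"}
POSITIVE_TAGS = {"#love", "#great", "#happy"}
NEGATIVE_TAGS = {"#fail", "#bad", "#sad"}
EMOTICON_RE = re.compile(r"[:;=8][-o*']?[)(\]\[dDpP/\\|]")
PUNCT_RE = re.compile(r"[^\w\s]")          # any non-word, non-space character
NUMBER_RE = re.compile(r"\d+([.,]\d+)?")

def extract_features(text, semantic_sim=0.0):
    words = text.split()
    bare = [w.lower().strip(".,!?;:") for w in words]   # words without edge punctuation
    tags = [w.lower() for w in words if w.startswith("#")]
    return {
        "f1_punctuation": len(PUNCT_RE.findall(text)),
        "f2_negation": sum(w in NEGATIONS or w.endswith("n't") for w in bare),
        "f3_question": text.count("?"),
        "f4_exclamation": text.count("!"),
        "f5_bag_of_words": Counter(re.findall(r"[a-z0-9']+", text.lower())),
        "f6_sentence_length": len(words),
        "f7_numerical": sum(bool(NUMBER_RE.fullmatch(w)) for w in bare),
        "f8_hashtags": (sum(t in POSITIVE_TAGS for t in tags), sum(t in NEGATIVE_TAGS for t in tags)),
        "f9_all_caps": sum(w.isalpha() and w.isupper() and len(w) > 1 for w in words),  # Eq. (8)
        "f10_emoticons": len(EMOTICON_RE.findall(text)),
        "f11_semantic_similarity": semantic_sim,  # supplied by the caller, e.g. via Eq. (9)
    }

features = extract_features("This site is NOT fast! Is it? #fail :)")
print(features)
```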
3.5 Web data classification

Web data can be easily gathered from the internet, but it may be cluttered with noise because of the way most search engines work. Therefore, selecting reliable data from noisy web data is the key to learning from web data. Moreover, several deep learning studies show that clean but hard examples benefit model training. The structure of RMDL is explained below.

(i)
Structure of Random Multimodel Deep Learning
The feature vectors $F_{u}$ are the input to web data classification, which is performed by the RMDL classifier trained using ACOA. RMDL consists of deep neural networks (DNNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs), which are explained below.
(a)
Deep Neural Networks
In a DNN, each hidden layer receives connections only from the preceding layer and provides connections only to the following layer. The sigmoid (11) and ReLU (12) functions serve as activation functions, and the softmax function (13) produces the output over Q classes:
$g(y) = \frac{1}{1 + e^{-y}} \in (0, 1)$
(11)

$g(y) = \max(0, y)$
(12)

$α(x)_{z} = \frac{e^{x_{z}}}{\sum_{q = 1}^{Q} e^{x_{q}}}, \quad \forall z \in \{1, \dots, Q\}$
(13)
The DNN output is denoted as $I_{ℏ}$.
(b)
Recurrent Neural Networks
The long short-term memory (LSTM) and gated recurrent unit (GRU) architectures were introduced to overcome the limitations of standard RNNs, such as vanishing gradients. LSTM is designed to retain patterns over long time spans, and GRU provides a simpler gating mechanism for handling complex sequences. Because an RNN gives additional weight to the preceding data points in a sequence, it is well suited to classifying text, strings, and other sequential data; it can also be used for image classification. The RNN state is updated as
$y_{ℏ} = G(y_{ℏ - 1}, μ_{ℏ}, β)$
(14)

$y_{ℏ} = M_{rec} α(y_{ℏ - 1}) + M_{in} μ_{ℏ} + ƛ$
(15)
where $y_{ℏ}$ is the state at time step ℏ, $μ_{ℏ}$ is the input at step ℏ, β denotes the model parameters, $M_{rec}$ is the recurrent weight matrix, $M_{in}$ is the input weight matrix, $ƛ$ is the bias, and $α$ is an element-wise function.

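Long Short-term Memory: LSTM maintains a strong connection between memories over time, which helps overcome the vanishing gradient problem. LSTM is more complex than a standard RNN, with multiple gates that regulate how much information passes into each cell. Equations (16) to (21) give, respectively, the input gate $j_{ℏ}$, the candidate memory cell value $\vec{d_{ℏ}}$, the forget gate activation $g_{ℏ}$, the new memory cell value $d_{ℏ}$, the output gate $c_{ℏ}$, and the hidden output $i_{ℏ}$, where $α$ denotes the sigmoid function, $[y_{ℏ}, i_{ℏ - 1}]$ concatenates the current input with the previous hidden output, and the $M$ and $ƛ$ terms are the corresponding weights and biases.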
$\begin{matrix} j_{ℏ} = α (M_{j} [y_{ℏ}, i_{ℏ - 1}] + ƛ_{j}) \end{matrix}$
(16)

$\begin{matrix} \vec{d_{ℏ}} = \tanh (M_{d} [y_{ℏ}, i_{ℏ - 1}] + ƛ_{d}) \end{matrix}$
(17)

$\begin{matrix} g_{ℏ} = α (M_{g} [y_{ℏ}, i_{ℏ - 1}] + ƛ_{g}) \end{matrix}$
(18)

$\begin{matrix} d_{ℏ} = j_{ℏ} * \vec{d_{ℏ}} + g_{ℏ} d_{ℏ - 1} \end{matrix}$
(19)

$\begin{matrix} c_{ℏ} = α (M_{c} [y_{ℏ}, i_{ℏ - 1}] + ƛ_{c}) \end{matrix}$
(20)

$\begin{matrix} i_{ℏ} = c_{ℏ} \tanh (d_{ℏ}) \end{matrix}$
(21)
Gated Recurrent Unit: Unlike LSTM, the GRU has no separate memory cell; its output is computed directly as
$i_{ℏ} = x_{ℏ} \circ i_{ℏ - 1} + (1 - x_{ℏ}) \circ α_{i}(M_{i} y_{ℏ} + S_{i}(tt_{ℏ} \circ i_{ℏ - 1}) + ƛ_{i})$
(22)
where $i_{ℏ}$ is the output vector, $tt_{ℏ}$ and $x_{ℏ}$ are the reset and update gate vectors, $M_{i}$ and $S_{i}$ are weight matrices, $ƛ_{i}$ is the bias, $\circ$ denotes element-wise multiplication, and $α_{i}$ is the hyperbolic tangent function.
(c)
Convolutional Neural Network
CNNs were developed for image classification and are also widely used for document and text classification; their design is loosely inspired by how the brain processes visual information. In RMDL, CNN models are applied to all the datasets, and the CNN output is denoted as $N_{ℏ}$.
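The gated recurrences in equations (16) to (22) can be sketched in NumPy as follows. Symbols mirror the text (j, d, g, c, and i for the LSTM; tt and x for the GRU reset and update gates). The GRU gate formulas are not given in the text, so the standard GRU definitions are assumed; the weights are small random placeholders, and `init_params` is a hypothetical helper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(y_t, i_prev, d_prev, p):
    """One LSTM step, Eqs. (16)-(21); i_prev and d_prev are the previous output and cell."""
    z = np.concatenate([y_t, i_prev])               # [y_t, i_{t-1}]
    j = sigmoid(p["M_j"] @ z + p["b_j"])            # input gate, Eq. (16)
    d_cand = np.tanh(p["M_d"] @ z + p["b_d"])       # candidate cell value, Eq. (17)
    g = sigmoid(p["M_g"] @ z + p["b_g"])            # forget gate, Eq. (18)
    d = j * d_cand + g * d_prev                     # new cell value, Eq. (19)
    c = sigmoid(p["M_c"] @ z + p["b_c"])            # output gate, Eq. (20)
    return c * np.tanh(d), d                        # hidden output, Eq. (21)

def gru_step(y_t, i_prev, p):
    """One GRU step in the form of Eq. (22); the gate formulas are the standard GRU ones (assumed)."""
    tt = sigmoid(p["W_r"] @ y_t + p["U_r"] @ i_prev + p["b_r"])   # reset gate
    x = sigmoid(p["W_z"] @ y_t + p["U_z"] @ i_prev + p["b_z"])    # update gate
    cand = np.tanh(p["M_i"] @ y_t + p["S_i"] @ (tt * i_prev) + p["b_i"])
    return x * i_prev + (1.0 - x) * cand                          # Eq. (22)

def init_params(rng, n_in, n_hid, scale=0.1):
    lstm = {f"M_{k}": rng.normal(0, scale, (n_hid, n_in + n_hid)) for k in "jdgc"}
    lstm.update({f"b_{k}": np.zeros(n_hid) for k in "jdgc"})
    gru = {name: rng.normal(0, scale, (n_hid, n_in)) for name in ("W_r", "W_z", "M_i")}
    gru.update({name: rng.normal(0, scale, (n_hid, n_hid)) for name in ("U_r", "U_z", "S_i")})
    gru.update({name: np.zeros(n_hid) for name in ("b_r", "b_z", "b_i")})
    return lstm, gru

rng = np.random.default_rng(0)
lstm_p, gru_p = init_params(rng, n_in=4, n_hid=3)
i_t, d_t = np.zeros(3), np.zeros(3)
for y_t in rng.normal(size=(5, 4)):      # a toy sequence of five 4-dimensional inputs
    i_t, d_t = lstm_step(y_t, i_t, d_t, lstm_p)
g_t = gru_step(np.ones(4), np.zeros(3), gru_p)
print(i_t, g_t)
```

Because the output gate lies in (0, 1) and tanh in (-1, 1), every LSTM hidden output stays strictly inside (-1, 1), which keeps the recurrence numerically bounded.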

The final outcome of web data classification produced by RMDL is denoted as $R_{u}$. Figure 2 shows the structure of RMDL.

Figure 2.
Structure of random multimodel deep learning.

(ii)
Archimedes conditional autoregressive optimization algorithm

In AOA, each individual in the population is treated as an object immersed in a fluid. AOA initializes the objects with random volumes, densities, and accelerations at the beginning of the search, and each object is placed at a random position in the fluid. After evaluating the fitness of the initial population, AOA iterates until the termination condition is satisfied. In each iteration, AOA updates the density and volume of every object, and an object's acceleration is updated depending on whether it collides with any neighboring object. The new position is then computed from the updated density, volume, and acceleration. The steps of ACOA are discussed below.
3.5.1 Solution encoding

Each candidate solution is encoded as a vector in the search space $(χ)$ so that the best result can be selected and evaluated. The solution encoding is represented in equation (23):

χ = [1 \times ω]

(23)

where ω is the number of RMDL weights being optimized, so that each solution is a 1 × ω vector of weights.

3.5.2 Fitness measure

The fitness measure evaluates each candidate solution; the solution with the lowest error is the best. It is formulated as

A = \frac{1}{v} \sum_{u = 1}^{v} [τ_{out, u} - F_{u}]^{2}

(24)

where A is the fitness value, v is the number of samples, $τ_{out, u}$ is the target output for the u-th sample, and $F_{u}$ is the output obtained for the u-th feature vector.
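Equations (23) and (24) can be illustrated with a minimal sketch in which a flat 1 × ω vector χ is decoded into the weights of a stand-in network and scored by mean squared error. The layer sizes in `SHAPES` (11 inputs matching $f_{1}$ to $f_{11}$, one hidden layer of 8 ReLU units, one output) are hypothetical and are not the paper's RMDL configuration; each feature is assumed to be summarized as a single number.

```python
import numpy as np

# Hypothetical layer shapes for a tiny stand-in network (not the paper's RMDL configuration).
SHAPES = [(11, 8), (8,), (8, 1), (1,)]
OMEGA = sum(int(np.prod(s)) for s in SHAPES)  # length omega of the 1 x omega solution vector

def decode(chi):
    # Slice the flat solution vector chi back into weight matrices and biases.
    params, start = [], 0
    for shape in SHAPES:
        size = int(np.prod(shape))
        params.append(chi[start:start + size].reshape(shape))
        start += size
    return params

def fitness(chi, X, targets):
    W1, b1, W2, b2 = decode(chi)
    hidden = np.maximum(0.0, X @ W1 + b1)      # ReLU, Eq. (12)
    outputs = (hidden @ W2 + b2).ravel()
    return np.mean((targets - outputs) ** 2)   # Eq. (24): mean squared error over v samples

print(OMEGA, fitness(np.zeros(OMEGA), np.ones((4, 11)), np.ones(4)))
```

Keeping the search space a plain vector means any population-based optimizer, ACOA included, only needs to call `fitness(chi, X, targets)` on each candidate.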

Step 1: Initialization

The position $C_{m}$, density $Dens_{m}$, volume $Volm_{m}$, and acceleration $Acc_{m}$ of every object m are initialized as follows:

C_{m} = L_{m} + rand \times (U_{m} - L_{m})

(25)

Dens_{m} = rand

(26)

Volm_{m} = rand

(27)

Acc_{m} = L_{m} + rand \times (U_{m} - L_{m})

(28)

where $U_{m}$ and $L_{m}$ are the upper and lower bounds of the search space for object m, and rand is a random number uniformly distributed in [0, 1].

Step 2: Evaluate Fitness Function

The fitness of each object is evaluated using equation (24), and the object with the best (lowest) fitness value is selected. Its density, volume, and acceleration are used as $Dens_{best}$, $Volm_{best}$, and $Acc_{best}$ in the following steps.

Step 3: Upgrade volume and density

The density and volume of each object are updated as

Dens_{m}^{r + 1} = Dens_{m}^{r} + rand \times (Dens_{best} - Dens_{m}^{r})

(29)

Volm_{m}^{r + 1} = Volm_{m}^{r} + rand \times (Volm_{best} - Volm_{m}^{r})

(30)

where $Dens_{best}$ and $Volm_{best}$ are the density and volume of the best object, and r is the current iteration.

Step 4: Evaluate transfer operator and density factor

Initially, the objects collide; after some time, they try to reach an equilibrium state. AOA models this with the transfer operator Tf, which switches the search from exploration to exploitation, together with the density factor Df:

Tf = \exp\left(\frac{r - r_{max}}{r_{max}}\right)

(31)

Df^{r + 1} = \exp\left(\frac{r_{max} - r}{r_{max}}\right) - \frac{r}{r_{max}}

(32)

where r is the current iteration, $r_{max}$ is the maximum number of iterations, and $Df^{r + 1}$ is the density factor, which decreases over time.
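To see how the transfer operator switches phases: Tf ≤ 0.5 exactly when (r - r_max)/r_max ≤ ln 0.5, that is, r ≤ r_max(1 + ln 0.5) ≈ 0.307 r_max. Roughly the first 31% of iterations therefore explore, while Df decreases from about e ≈ 2.718 toward 0 at r = r_max. The short sketch below (function names are illustrative) checks this numerically for r_max = 100.

```python
import math

def transfer_operator(r, r_max):
    return math.exp((r - r_max) / r_max)               # Eq. (31)

def density_factor(r, r_max):
    return math.exp((r_max - r) / r_max) - r / r_max   # Eq. (32)

def first_exploitation_iteration(r_max):
    # Smallest r with Tf > 0.5, i.e. r > r_max * (1 + ln 0.5).
    return next(r for r in range(1, r_max + 1) if transfer_operator(r, r_max) > 0.5)

for r in (1, 30, 31, 100):
    print(r, round(transfer_operator(r, 100), 4), round(density_factor(r, 100), 4))
print(first_exploitation_iteration(100))
```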

Step 5: Exploration phase

If $Tf \leq 0.5$, objects collide, and a random material is selected to update the acceleration:

Acc_{m}^{r + 1} = \frac{Dens_{ot} + Volm_{ot} \times Acc_{ot}}{Dens_{m}^{r + 1} \times Volm_{m}^{r + 1}}

(33)

where $Dens_{ot}$, $Volm_{ot}$, and $Acc_{ot}$ are the density, volume, and acceleration of the randomly selected material.

Step 6: Exploitation Phase

If $Tf > 0.5$, there is no collision between objects, and the acceleration is calculated as

Acc_{m}^{r + 1} = \frac{Dens_{best} + Volm_{best} \times Acc_{best}}{Dens_{m}^{r + 1} \times Volm_{m}^{r + 1}}

(34)

where $Acc_{best}$ is the acceleration of the best object.

Step 7: Compute acceleration

The acceleration is normalized to determine the percentage of change:

Acc_{m-nor}^{r + 1} = u \times \frac{Acc_{m}^{r + 1} - Min(Acc)}{Max(Acc) - Min(Acc)} + l

(35)

where u and l define the normalization range (set to 0.9 and 0.1, respectively, in the original AOA).

Step 8: Upgrade the position

The position is updated using the standard AOA expression:

Y_{m}^{r + 1} = Y_{m}^{r} + D_{1} \times rand \times Acc_{m-nor}^{r + 1} \times Df \times (Y_{rand} - Y_{m}^{r}), \quad \text{if } Tf \leq 0.5

(36)

Expanding the right-hand side of equation (36) gives

Y_{m}^{r + 1} = Y_{m}^{r}(1 - D_{1} \times rand \times Acc_{m-nor}^{r + 1} \times Df) + D_{1} \times rand \times Acc_{m-nor}^{r + 1} \times Df \times Y_{rand}

(37)

The CAViaR model is expressed as

Y_{m}^{r} = λ_{0} + \sum_{o = 1}^{p} λ_{o} Y_{m}^{r - o} + \sum_{s = 1}^{t} λ_{s} h(Y_{m}^{r - s})

(38)

Taking $p = t = 2$, the equation becomes

Y_{m}^{r} = λ_{0} + λ_{1} Y_{m}^{r - 1} + λ_{2} Y_{m}^{r - 2} + λ_{1} h(Y_{m}^{r - 1}) + λ_{2} h(Y_{m}^{r - 2})

(39)

Substituting equation (39) into equation (37) gives

\begin{aligned} Y_{m}^{r + 1} = {} & (λ_{0} + λ_{1} Y_{m}^{r - 1} + λ_{2} Y_{m}^{r - 2} + λ_{1} h(Y_{m}^{r - 1}) + λ_{2} h(Y_{m}^{r - 2}))(1 - D_{1} \times rand \times Acc_{m-nor}^{r + 1} \times Df) \\ & + D_{1} \times rand \times Acc_{m-nor}^{r + 1} \times Df \times Y_{rand} \end{aligned}

(40)

where $Y_{m}^{r - 1}$ and $Y_{m}^{r - 2}$ are the positions of the solution at the $(r - 1)^{th}$ and $(r - 2)^{th}$ iterations, $Y_{rand}$ is the position of a randomly selected object, and $D_{1}$ is a constant.

Step 9: Termination

Steps 2 to 8 are repeated until the maximum number of iterations $r_{max}$ is reached, and the best solution is returned. The pseudo code of ACOA is given in Algorithm 1.

Algorithm 1.

Pseudo code of Archimedes conditional autoregressive optimization algorithm _ Random Multimodel Deep Learning.

SL. No.	Pseudo code of ACOA_RMDL
	Input: Population size, maximum iteration $r_{max}$, and random numbers
	Output: Best solution $Y_{m}^{r + 1}$ (optimized RMDL weights)
1	Begin
2	Initialize the population (weights of RMDL) using equations (25), (26), (27), and (28)
3	Compute the fitness value using equation (24) and select the best object
4	Set $r = 1$
5	While $r \leq r_{max}$ do
6	for each object m do
7	Update density and volume using equations (29) and (30)
8	Update the transfer operator and density factor using equations (31) and (32)
9	if $Tf \leq 0.5$ then
10	Update acceleration using equations (33) and (35)
11	Update the position of the solution using equation (40)
12	else
13	Update acceleration using equations (34) and (35) and generate a new position
14	end if
15	end for
16	Evaluate the fitness using equation (24) and update the best object
17	$r = r + 1$
18	end while
19	Return the best solution
20	Terminate
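Algorithm 1 can be sketched in NumPy as follows. This is a hedged illustration, not the authors' implementation: the CAViaR coefficients `lambdas` are hypothetical values, h is assumed to be the absolute value (as in the symmetric absolute value CAViaR specification), the constants `d1`, `c2`, `c3`, `c4`, `u`, and `l` are assumed values commonly reported for the original AOA, Min(Acc) and Max(Acc) are taken over the whole population, and because the text does not give the position update for Tf > 0.5, the standard AOA exploitation update is assumed there. A sphere function stands in for the RMDL fitness of equation (24).

```python
import numpy as np

def acoa_minimize(fitness, dim, lower, upper, pop_size=20, r_max=100,
                  d1=2.0, c2=6.0, c3=2.0, c4=0.5, u=0.9, l=0.1,
                  lambdas=(0.0, 0.3, 0.2), seed=0):
    """Sketch of ACOA: AOA with the CAViaR-based position update of Eq. (40)."""
    rng = np.random.default_rng(seed)
    lam0, lam1, lam2 = lambdas            # hypothetical CAViaR coefficients
    h = np.abs                            # assumed CAViaR lag function h(.)
    shape = (pop_size, dim)
    # Step 1, Eqs. (25)-(28)
    Y = lower + rng.random(shape) * (upper - lower)
    dens, volm = rng.random(shape), rng.random(shape)
    acc = lower + rng.random(shape) * (upper - lower)
    prev1, prev2 = Y.copy(), Y.copy()     # Y^{r-1}, Y^{r-2}; no history yet
    # Step 2, Eq. (24)
    fit = np.array([fitness(y) for y in Y])
    b = int(np.argmin(fit))
    best = {"y": Y[b].copy(), "fit": fit[b], "dens": dens[b].copy(),
            "volm": volm[b].copy(), "acc": acc[b].copy()}
    history = [best["fit"]]
    for r in range(1, r_max + 1):
        # Step 3, Eqs. (29)-(30)
        dens = dens + rng.random(shape) * (best["dens"] - dens)
        volm = volm + rng.random(shape) * (best["volm"] - volm)
        # Step 4, Eqs. (31)-(32)
        tf = np.exp((r - r_max) / r_max)
        df = np.exp((r_max - r) / r_max) - r / r_max
        if tf <= 0.5:   # Step 5, Eq. (33): collision with a random material
            mr = rng.integers(pop_size, size=pop_size)
            acc = (dens[mr] + volm[mr] * acc[mr]) / (dens * volm)
        else:           # Step 6, Eq. (34)
            acc = (best["dens"] + best["volm"] * best["acc"]) / (dens * volm)
        # Step 7, Eq. (35): min-max normalization over the population
        acc_nor = u * (acc - acc.min()) / (acc.max() - acc.min() + 1e-12) + l
        # Step 8
        k = d1 * rng.random(shape) * acc_nor * df
        if tf <= 0.5:   # Eq. (40): CAViaR estimate (Eq. 39) replaces Y^r in Eq. (37)
            cav = lam0 + lam1 * prev1 + lam2 * prev2 + lam1 * h(prev1) + lam2 * h(prev2)
            y_rand = Y[rng.integers(pop_size, size=pop_size)]
            new_Y = cav * (1.0 - k) + k * y_rand
        else:           # standard AOA exploitation update (assumed; not given in the text)
            flag = np.where(2.0 * rng.random((pop_size, 1)) - c4 <= 0.5, 1.0, -1.0)
            new_Y = best["y"] + flag * c2 * rng.random(shape) * acc_nor * df * (c3 * tf * best["y"] - Y)
        prev2, prev1, Y = prev1, Y, np.clip(new_Y, lower, upper)
        # Re-evaluate the fitness (Eq. 24) and keep the best object
        fit = np.array([fitness(y) for y in Y])
        b = int(np.argmin(fit))
        if fit[b] < best["fit"]:
            best.update(y=Y[b].copy(), fit=fit[b], dens=dens[b].copy(),
                        volm=volm[b].copy(), acc=acc[b].copy())
        history.append(best["fit"])
    return best["y"], best["fit"], history

sphere = lambda y: float(np.sum(y ** 2))
best_y, best_fit, history = acoa_minimize(sphere, dim=3, lower=-5.0, upper=5.0)
print(best_y, best_fit)
```

Keeping `prev1` and `prev2` as explicit state is what lets the CAViaR term use the positions from the two previous iterations, as equation (40) requires.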

4 Results and discussion

This section presents the results of the proposed ACOA_RMDL technique and discusses its effectiveness. The experiments are performed on a personal computer using Python.

4.1 Evaluation metrics

The metrics used to evaluate ACOA_RMDL are described as follows.

(a)
Precision
Precision is the proportion of predicted positives that are true positives, formulated as
$Pr = \frac{T}{T + Υ}$
(41)
where $T$ and $Υ$ denote the true positives and false positives, and $Pr$ is the precision.
(b)
Recall
Recall is the ratio of correctly predicted positives to all actual positives, calculated as
$Re = \frac{T}{T + P}$
(42)
where $P$ denotes the false negatives and $Re$ is the recall.
(c)
F1-Score
The F1-score is the harmonic mean of precision and recall, which gives a more balanced evaluation. It is calculated as
$Fs = 2 \times \frac{Pr \times Re}{Pr + Re}$
(43)
where $Fs$ is the F1-score.
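Equations (41) to (43) translate directly into code. The sketch below uses toy counts (illustrative only) and also checks the abstract's figures: with Pr = 0.916 and Re = 0.947, equation (43) gives an F1-score of about 0.931, consistent with the reported 93.1%.

```python
def precision(tp, fp):
    return tp / (tp + fp)              # Eq. (41)

def recall(tp, fn):
    return tp / (tp + fn)              # Eq. (42)

def f1_score(pr, rec):
    return 2 * pr * rec / (pr + rec)   # Eq. (43): harmonic mean of precision and recall

# Toy counts (illustrative): T = 8 true positives, Y = 2 false positives, P = 1 false negative.
pr, rec = precision(8, 2), recall(8, 1)
print(pr, rec, f1_score(pr, rec))
# Consistency check against the abstract's precision and recall.
print(round(f1_score(0.916, 0.947), 3))
# 0.931
```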
4.2 Dataset description

4.2.1 Website classification dataset 1

Website classification dataset 1²⁰ is used for classifying online data. It was created by scraping various web pages and categorizing them according to the extracted text. The dataset is 7.51 MB in size.

4.2.2 Website classification dataset 2

Website classification dataset 2²¹ is used for URL-based classification. Open Directory Project (ODP) was another name for both the website and the community of people who kept it updated. Although it was built and maintained by a community of volunteer editors, it was controlled by AOL (now a subsidiary of Verizon Media). The dataset is 82.72 MB in size.

4.3 Performance analysis

This section presents the performance analysis of the proposed method using the metrics, such as precision, recall, and F1-score.

4.3.1 Performance analysis using website classification dataset 1

a.
Analysis by varying the number of hidden layers

Figure 3 presents the performance of the proposed ACOA_RMDL when varying the number of hidden layers. Figure 3(a) shows the precision of the proposed method: with a 60% learning set, the precision with 2, 4, 6, and 8 layers is 0.809, 0.828, 0.838, and 0.865, respectively. The precision increases as the learning set grows. Figure 3(b) shows the recall of the proposed ACOA_RMDL: with a 90% learning set, the recall is 0.889, 0.898, 0.918, and 0.936 with 2, 4, 6, and 8 layers, respectively. Similarly, Figure 3(c) shows the F1-score: with a 90% learning set, the proposed method attains F1-scores of 0.874, 0.883, 0.902, and 0.921 with 2, 4, 6, and 8 layers, respectively.

Figure 3.
Performance analysis by varying the number of hidden layers for website classification dataset 1.

b.
Analysis by varying the hidden neurons

Figure 4 shows the performance of the proposed ACOA_RMDL by varying the number of hidden neurons. Figure 4(a) depicts the precision of the proposed method. When the learning set is 60%, the precision of the proposed method with hidden neurons 5, 10, 15, and 20 is 0.828, 0.839, 0.858, and 0.865, respectively. The recall of the proposed ACOA_RMDL is shown in Figure 4(b). For 90% of the learning set, the proposed method has the recall of 0.849, 0.868, 0.877, and 0.897 with hidden neurons 5, 10, 15, and 20, respectively. The F1-score of the proposed method is depicted in Figure 4(c). The proposed method has the F1-score of 0.838, 0.853, 0.867, and 0.881 for 90% of the learning set with hidden neurons 5, 10, 15, and 20, respectively.

Figure 4.
Performance analysis by varying the number of hidden neurons for website classification dataset 1.

Figures 3 and 4 show that, among the tested settings, the proposed method performs best when the number of hidden layers is 8 and the number of hidden neurons is 20.
4.3.2 Performance analysis using website classification dataset 2

a.
Analysis by varying the number of hidden layers

The performance of the proposed ACOA_RMDL is analyzed by varying the number of hidden layers, and the results are plotted in Figure 5. Figure 5(a) shows the precision of the proposed method: with a 60% learning set, the precision with 2, 4, 6, and 8 layers is 0.788, 0.810, 0.818, and 0.838, respectively. Figure 5(b) illustrates the recall: with a 70% learning set, the recall is 0.818, 0.829, 0.849, and 0.876 with 2, 4, 6, and 8 layers, respectively. Likewise, Figure 5(c) shows the F1-score: with an 80% learning set, ACOA_RMDL attains F1-scores of 0.823, 0.839, 0.857, and 0.887 with 2, 4, 6, and 8 layers, respectively.

Figure 5.
Performance analysis by varying the number of hidden layers for website classification dataset 2.

b.
Analysis by varying the hidden neurons

Figure 6 shows the performance of the proposed ACOA_RMDL for varying numbers of hidden neurons. Figure 6(a) portrays the precision: with a 90% learning set, the proposed method attains a precision of 0.829, 0.859, 0.867 and 0.889 with 5, 10, 15 and 20 hidden neurons, respectively. Figure 6(b) shows the recall: with a 70% learning set, the recall is 0.818, 0.828, 0.847 and 0.876 with 5, 10, 15 and 20 hidden neurons, respectively. Figure 6(c) depicts the F1-score: with an 80% learning set, the F1-score is 0.829, 0.843, 0.863 and 0.887 with 5, 10, 15 and 20 hidden neurons, respectively.

Figure 6.
Performance analysis by varying the number of hidden neurons for website classification dataset 2.

Figures 5 and 6 show that ACOA_RMDL achieves high precision, recall and F1-score on website classification dataset 2, with the best performance obtained when the number of hidden layers is 8 and the number of hidden neurons is 20. These results indicate that the proposed method classifies the web data in this dataset effectively.
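Selecting the final configuration from these sweeps amounts to taking the setting with the highest score in each sweep. The sketch below does this with the dataset 2 F1-scores reported above for an 80% learning set. It assumes that each sweep varied one hyperparameter while the other was held fixed, which the text does not state explicitly; `best_setting` is a hypothetical helper.

```python
# F1-scores reported for website classification dataset 2 at an 80% learning set
f1_by_layers = {2: 0.823, 4: 0.839, 6: 0.857, 8: 0.887}
f1_by_neurons = {5: 0.829, 10: 0.843, 15: 0.863, 20: 0.887}


def best_setting(scores):
    """Return the hyperparameter value (dict key) with the highest score."""
    return max(scores, key=scores.get)


best_layers = best_setting(f1_by_layers)
best_neurons = best_setting(f1_by_neurons)
print(best_layers, best_neurons)  # 8 20
```

Both selected values lie at the upper end of the examined ranges, so these sweeps do not show whether more layers or neurons would help further.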
4.4 Comparative analysis

This section compares ACOA_RMDL with existing methods, namely Transfer learning,¹⁴ ONN,¹⁵ BiSPL¹⁶ and URNet,¹⁷ by varying the learning set and the K-fold value. The compared methods are summarized below.
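The two evaluation protocols differ in how the data are split. Under the learning-set protocol, a fixed percentage of the samples (for example 90%) is used for training and the remainder for testing; under K-fold evaluation (K = 8 in Figures 8 and 10), the data are divided into K folds and each fold serves once as the test set. The plain-Python sketch below shows both splits on sample indices. The function names are hypothetical, and the random shuffling and fold assignment are assumptions, since the paper does not describe them.

```python
import random


def learning_set_split(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and use the first `train_fraction` of them for training."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * n_samples)
    return idx[:cut], idx[cut:]


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists so that every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


train, test = learning_set_split(100, 0.9)
print(len(train), len(test))  # 90 10
for train, test in k_fold_splits(80, 8):
    print(len(train), len(test))  # 70 10 for each of the 8 folds
```

With the learning-set protocol a single model is trained per split, whereas K-fold evaluation trains K models and reports a score aggregated over the folds.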

Transfer learning: This method applies transfer learning to several deep architectures trained on a noisy dataset and examines their behaviour. The external noise in the collected dataset is then reduced with Isolation Forest, an unsupervised technique, and the resulting training outcomes are re-examined.

ONN: Features are extracted from a collection of XML documents, and a probability-based feature selection method retains the pertinent ones. The selected features are grouped with the weighted fuzzy C-means clustering algorithm (WFCM), and the data quality of the XML web data is assessed by considering several error categories. Finally, the features are fed into an optimal neural network classifier to rank the web pages, with the whale optimization algorithm used to select its weights.

BiSPL: The BiSPL framework has two steps. First, the distances between web samples and labelled source samples are computed, and the web samples that lie close to the source samples are selected and merged into a new training set. Second, deep models are trained on this new set, initially using both hard and easy samples for greater stability; as training proceeds, hard samples are gradually removed to reduce noise. Alternating these two steps iteratively lets the deep models converge to an optimal solution.

URNet: URNet learns an image classification model from large volumes of noisy web data. It assumes that every training instance can contribute positively, and it reweights each instance's influence according to factors such as its label, small instance bags, large instance clusters, confidence and class size. This progressively reduces the effect of noise and bias in the web data and improves URNet's performance over time.

a.
Performance based on learning set for Website classification dataset 1
Figure 7 presents the evaluation of ACOA_RMDL for different learning-set sizes. Figure 7(a) shows the precision: with a 90% learning set, ACOA_RMDL achieves 0.908, an improvement of 10.133%, 6.584%, 5.483% and 3.550% over Transfer learning, ONN, BiSPL and URNet, respectively. Figure 7(b) shows the recall: with a 90% learning set, ACOA_RMDL achieves 0.936, an improvement of 9.500%, 8.196%, 5.347% and 3.114% over the same methods. Figure 7(c) shows the F1-score: with a 90% learning set, ACOA_RMDL achieves 0.921, an improvement of 9.822%, 7.385%, 5.417% and 3.336% over Transfer learning, ONN, BiSPL and URNet, respectively.

b.
Performance based on K-fold for Website classification dataset 1

Figure 7.
Evaluation of the Archimedes conditional autoregressive optimization algorithm-based random multimodel deep learning (ACOA_RMDL) by learning set for website classification dataset 1.
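The percentage improvements quoted for Figure 7 appear to be expressed relative to the ACOA_RMDL score, that is, (proposed - baseline) / proposed × 100. For example, the Transfer learning precision gives (0.908 - 0.816) / 0.908 ≈ 10.13%, in line with the reported 10.133%. This convention is inferred from the numbers rather than stated in the text, and recomputing from the rounded Table 1 scores reproduces the other percentages only to within about 0.1 percentage point. The sketch below computes both this convention and the more common baseline-relative one; the function names are hypothetical.

```python
# Table 1 scores, website classification dataset 1, 90% learning set
proposed = {"precision": 0.908, "recall": 0.936, "f1": 0.921}
baselines = {
    "Transfer learning": {"precision": 0.816, "recall": 0.847, "f1": 0.831},
    "ONN": {"precision": 0.848, "recall": 0.859, "f1": 0.853},
    "BiSPL": {"precision": 0.858, "recall": 0.886, "f1": 0.872},
    "URNet": {"precision": 0.875, "recall": 0.907, "f1": 0.891},
}


def gain_vs_proposed(proposed_score, baseline_score):
    """Improvement as a percentage of the proposed score (convention inferred above)."""
    return 100.0 * (proposed_score - baseline_score) / proposed_score


def gain_vs_baseline(proposed_score, baseline_score):
    """Improvement as a percentage of the baseline score (the more common convention)."""
    return 100.0 * (proposed_score - baseline_score) / baseline_score


for name, scores in baselines.items():
    for metric, value in scores.items():
        a = gain_vs_proposed(proposed[metric], value)
        b = gain_vs_baseline(proposed[metric], value)
        print(f"{name:17s} {metric:9s} {a:6.3f}% {b:6.3f}%")
```

Under the baseline-relative convention, the Transfer learning precision gain would be about 11.27% rather than 10.13%, so the convention matters when comparing with other papers.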

Figure 8 presents the evaluation of ACOA_RMDL based on K-fold. Figure 8(a) shows the precision: with K-fold = 8, ACOA_RMDL attains 0.916, an improvement of 9.829%, 6.311%, 5.497% and 3.303% over Transfer learning, ONN, BiSPL and URNet, respectively. Figure 8(b) shows the recall: with K-fold = 8, ACOA_RMDL attains 0.947, an improvement of 9.388%, 8.591%, 5.316% and 3.390% over Transfer learning, ONN, BiSPL and URNet, respectively. Figure 8(c) shows the F1-score: with K-fold = 8, ACOA_RMDL attains 0.931, an improvement of 9.612%, 7.446%, 5.408% and 3.346% over the same methods.

c.
Performance based on learning set for Website classification dataset 2

Figure 8.
Evaluation of the Archimedes conditional autoregressive optimization algorithm-based random multimodel deep learning (ACOA_RMDL) by K-fold for website classification dataset 1.

Figure 9 presents the evaluation of ACOA_RMDL for different learning-set sizes. Figure 9(a) shows the precision: with a 90% learning set, ACOA_RMDL achieves 0.889, whereas Transfer learning, ONN, BiSPL and URNet obtain 0.788, 0.808, 0.818 and 0.837, respectively. Figure 9(b) shows the recall: with a 90% learning set, ACOA_RMDL achieves 0.908, compared with 0.817, 0.838, 0.858 and 0.866 for the existing methods. Figure 9(c) shows the F1-score: with a 90% learning set, ACOA_RMDL achieves 0.898, whereas Transfer learning, ONN, BiSPL and URNet obtain 0.802, 0.823, 0.837 and 0.851, respectively.

d.
Performance based on K-fold for Website classification dataset 2

Figure 9.
Evaluation of the Archimedes conditional autoregressive optimization algorithm-based random multimodel deep learning (ACOA_RMDL) by learning set for website classification dataset 2.

Figure 10 presents the evaluation of ACOA_RMDL based on K-fold. Figure 10(a) shows the precision: with K-fold = 8, ACOA_RMDL attains 0.908, whereas Transfer learning, ONN, BiSPL and URNet obtain 0.799, 0.840, 0.849 and 0.866, respectively. Figure 10(b) shows the recall: with K-fold = 8, ACOA_RMDL attains 0.927, compared with 0.858 for Transfer learning, 0.877 for ONN, 0.898 for BiSPL and 0.908 for URNet. Figure 10(c) shows the F1-score: with K-fold = 8, ACOA_RMDL attains 0.917, whereas the existing approaches obtain 0.827, 0.858, 0.873 and 0.886, respectively.

Figure 10.
Evaluation of the Archimedes conditional autoregressive optimization algorithm-based random multimodel deep learning (ACOA_RMDL) by K-fold for website classification dataset 2.
4.5 Algorithmic analysis

The optimization algorithm is evaluated by comparing the proposed ACOA_RMDL with RMDL tuned by COOT,²⁸ SFO,²⁹ DOX³⁰ and AOA²⁵ for different population sizes.
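Population size is the number of candidate solutions that a population-based optimizer maintains and updates in each iteration, so it trades search breadth against cost: every iteration evaluates the objective once per candidate. The sketch below is a generic greedy population search, not AOA or ACOA, whose update rules are not reproduced here; the toy objective merely stands in for a quantity such as the validation error of a tuned classifier, and all names are hypothetical.

```python
import random


def population_search(objective, low, high, pop_size, iterations, seed=0):
    """Minimise `objective` on [low, high] with a generic greedy population search.

    This is NOT AOA or ACOA; it only illustrates the role of the population size.
    """
    rng = random.Random(seed)
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    fitness = [objective(x) for x in pop]
    evaluations = pop_size
    for _ in range(iterations):
        best = pop[fitness.index(min(fitness))]
        for i in range(pop_size):
            # Move towards the current best solution, plus a small random perturbation
            step = rng.random() * (best - pop[i]) + rng.gauss(0.0, 0.05 * (high - low))
            candidate = min(max(pop[i] + step, low), high)  # keep within bounds
            value = objective(candidate)
            evaluations += 1
            if value < fitness[i]:  # greedy replacement
                pop[i], fitness[i] = candidate, value
    i_best = fitness.index(min(fitness))
    return pop[i_best], fitness[i_best], evaluations


# Toy objective with its minimum at x = 0.3
for size in (5, 10, 15, 20):
    x, f, n = population_search(lambda v: (v - 0.3) ** 2, 0.0, 1.0, size, 50)
    print(size, round(x, 4), n)
```

The evaluation count grows linearly with the population size (pop_size × (iterations + 1)), which is why larger populations usually find better solutions at a higher computational cost.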

a.
Performance based on Website classification dataset 1
Figure 11 presents the evaluation of ACOA_RMDL for different population sizes. Figure 11(a) shows the precision: with a population size of 5, ACOA_RMDL obtains 0.858, whereas the compared algorithms obtain 0.775, 0.787, 0.808 and 0.826. With a population size of 20, ACOA_RMDL obtains 0.908, an improvement of 10.244%, 9.152%, 6.706% and 4.773% over COOT, SFO, DOX and AOA, respectively. Figure 11(b) shows the recall: with a population size of 5, ACOA_RMDL obtains 0.897, whereas the compared algorithms obtain 0.825, 0.847, 0.854 and 0.875. Figure 11(c) shows the F1-score: with a population size of 5, ACOA_RMDL obtains 0.877, whereas COOT obtains 0.799, SFO 0.816, DOX 0.830 and AOA 0.850.

b.
Performance based on Website classification dataset 2

Figure 11.
Algorithmic evaluation of the Archimedes conditional autoregressive optimization algorithm-based random multimodel deep learning (ACOA_RMDL) by population size for website classification dataset 1.

Figure 12 presents the evaluation of ACOA_RMDL for different population sizes. Figure 12(a) illustrates the precision: with a population size of 20, ACOA_RMDL obtains 0.889, whereas COOT + RMDL, SFO + RMDL, DOX + RMDL and AOA + RMDL obtain 0.788, 0.809, 0.829 and 0.848, respectively. Figure 12(b) shows the recall: with a population size of 20, ACOA_RMDL obtains 0.908, compared with 0.829, 0.848, 0.865 and 0.877 for COOT + RMDL, SFO + RMDL, DOX + RMDL and AOA + RMDL, respectively. Figure 12(c) shows the F1-score: with a population size of 20, ACOA_RMDL obtains 0.898, compared with 0.808 for COOT + RMDL, 0.828 for SFO + RMDL, 0.847 for DOX + RMDL and 0.862 for AOA + RMDL.

Figure 12.
Algorithmic evaluation of the Archimedes conditional autoregressive optimization algorithm-based random multimodel deep learning (ACOA_RMDL) by population size for website classification dataset 2.
4.6 Comparative discussion

Table 1 compares ACOA_RMDL with the existing approaches. For website classification dataset 1, ACOA_RMDL attains maximum precision, recall and F1-score of 91.6%, 94.7% and 93.1%, respectively; for website classification dataset 2, the corresponding maximum values are 90.8%, 92.7% and 91.7%. Both sets of maxima are obtained under K-fold evaluation.

Table 1.
Comparative discussion.

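Dataset	Evaluation	Metric	Transfer learning	ONN	BiSPL	URNet	Proposed ACOA_RMDL
Website classification dataset 1	Learning set	Precision	0.816	0.848	0.858	0.875	0.908
Website classification dataset 1	Learning set	Recall	0.847	0.859	0.886	0.907	0.936
Website classification dataset 1	Learning set	F1 score	0.831	0.853	0.872	0.891	0.921
Website classification dataset 1	K-fold	Precision	0.826	0.858	0.865	0.885	0.916
Website classification dataset 1	K-fold	Recall	0.858	0.865	0.896	0.915	0.947
Website classification dataset 1	K-fold	F1 score	0.841	0.862	0.881	0.900	0.931
Website classification dataset 2	Learning set	Precision	0.788	0.808	0.818	0.837	0.889
Website classification dataset 2	Learning set	Recall	0.817	0.838	0.858	0.866	0.908
Website classification dataset 2	Learning set	F1 score	0.802	0.823	0.837	0.851	0.898
Website classification dataset 2	K-fold	Precision	0.799	0.840	0.849	0.866	0.908
Website classification dataset 2	K-fold	Recall	0.858	0.877	0.898	0.908	0.927
Website classification dataset 2	K-fold	F1 score	0.827	0.858	0.873	0.886	0.917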
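The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Assuming the F1 values in Table 1 were computed from the tabulated (aggregated) precision and recall, each row should satisfy this identity up to rounding; a per-class macro-averaged F1 would not have to. The sketch below checks every row of Table 1; `f1_from` is a hypothetical helper.

```python
# (dataset, evaluation, method, precision, recall, reported F1) from Table 1
TABLE_1 = [
    ("1", "Learning set", "Transfer learning", 0.816, 0.847, 0.831),
    ("1", "Learning set", "ONN", 0.848, 0.859, 0.853),
    ("1", "Learning set", "BiSPL", 0.858, 0.886, 0.872),
    ("1", "Learning set", "URNet", 0.875, 0.907, 0.891),
    ("1", "Learning set", "ACOA_RMDL", 0.908, 0.936, 0.921),
    ("1", "K-fold", "Transfer learning", 0.826, 0.858, 0.841),
    ("1", "K-fold", "ONN", 0.858, 0.865, 0.862),
    ("1", "K-fold", "BiSPL", 0.865, 0.896, 0.881),
    ("1", "K-fold", "URNet", 0.885, 0.915, 0.900),
    ("1", "K-fold", "ACOA_RMDL", 0.916, 0.947, 0.931),
    ("2", "Learning set", "Transfer learning", 0.788, 0.817, 0.802),
    ("2", "Learning set", "ONN", 0.808, 0.838, 0.823),
    ("2", "Learning set", "BiSPL", 0.818, 0.858, 0.837),
    ("2", "Learning set", "URNet", 0.837, 0.866, 0.851),
    ("2", "Learning set", "ACOA_RMDL", 0.889, 0.908, 0.898),
    ("2", "K-fold", "Transfer learning", 0.799, 0.858, 0.827),
    ("2", "K-fold", "ONN", 0.840, 0.877, 0.858),
    ("2", "K-fold", "BiSPL", 0.849, 0.898, 0.873),
    ("2", "K-fold", "URNet", 0.866, 0.908, 0.886),
    ("2", "K-fold", "ACOA_RMDL", 0.908, 0.927, 0.917),
]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


max_gap = 0.0
for dataset, protocol, method, p, r, f1 in TABLE_1:
    recomputed = f1_from(p, r)
    max_gap = max(max_gap, abs(recomputed - f1))
    print(f"dataset {dataset} {protocol:12s} {method:17s} {recomputed:.4f} vs {f1:.3f}")
print(f"largest gap: {max_gap:.4f}")
```

Every row agrees to within about 0.001, which is consistent with the reported F1-scores being the harmonic mean of the tabulated precision and recall up to rounding.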

4.7 Training time

Table 2 lists the training times of the proposed and existing methods. The proposed method trains in 9.456 s, whereas Transfer learning, ONN, BiSPL and URNet take 12.432 s, 11.865 s, 10.986 s and 10.356 s, respectively.

Table 2.
Training time.

Method	Training time (s)
Transfer learning	12.432
ONN	11.865
BiSPL	10.986
URNet	10.356
Proposed ACOA_RMDL	9.456
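Expressed relative to each baseline, the training-time savings in Table 2 follow from simple arithmetic, which the sketch below computes. It does not capture the hardware or data size on which the times were measured, since the text does not specify them; `time_reduction` is a hypothetical helper.

```python
# Training times from Table 2, in seconds
training_time_s = {
    "Transfer learning": 12.432,
    "ONN": 11.865,
    "BiSPL": 10.986,
    "URNet": 10.356,
}
proposed_s = 9.456


def time_reduction(baseline_s, proposed_s):
    """Percentage of the baseline training time saved by the proposed method."""
    return 100.0 * (baseline_s - proposed_s) / baseline_s


for name, t in training_time_s.items():
    print(f"{name:17s} saves {time_reduction(t, proposed_s):5.2f}%")
```

This gives reductions of roughly 24%, 20%, 14% and 9% relative to Transfer learning, ONN, BiSPL and URNet, respectively.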

4.8 Statistical analysis

Statistical hypothesis testing is one of the most effective techniques for comparing experimental results.³¹ It draws inferences about a hypothesis from observations of processes modeled as random variables, with the aim of determining whether a sample of results supports a specific hypothesis and whether the conclusions can be generalized beyond the tested scenarios. The P-value (probability value) is the probability, assuming the null hypothesis is true, of obtaining results at least as extreme as those observed; a smaller P-value therefore provides stronger evidence against the null hypothesis. P-values lie between 0 and 1 and are compared with a significance level, usually 0.05: if the P-value is below the significance level, the null hypothesis is rejected; otherwise, it is not rejected. Here, three tests, namely Analysis of Variance (ANOVA), Levene and Shapiro-Wilk, are conducted, and the resulting P-values are provided in Table 3.

Table 3.
Statistical analysis.

Dataset	Metric	ANOVA P-value	Levene P-value	Shapiro-Wilk P-value
Dataset-1	Precision	0.000119	0.001568	0.006179
Dataset-1	Recall	0.000026	0.00367	0.006455
Dataset-1	F1 score	0.000053	0.003568	0.006664
Dataset-2	Precision	0.000027	0.000632	0.008829
Dataset-2	Recall	0.00008	0.000848	0.007009
Dataset-2	F1 score	0.000043	0.00381	0.008379


The ANOVA test determines whether there are significant differences between the means of three or more groups. The Levene test assesses whether multiple groups have equal population variances; its null hypothesis is that the variances of the compared samples are equal. The Shapiro-Wilk test examines whether a data set follows a normal distribution; its null hypothesis is that the data are normally distributed, so a high P-value means normality cannot be rejected, while a low P-value indicates a departure from normality. As Table 3 shows, every P-value is below 0.05, so the null hypothesis of each test is rejected for both datasets and all three metrics: the group means differ significantly (ANOVA), the group variances are unequal (Levene), and the scores depart from normality (Shapiro-Wilk).
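To show what the ANOVA and Levene statistics measure, the sketch below computes both in plain Python: the one-way ANOVA F statistic divides the between-group mean square by the within-group mean square, and the Levene statistic is the same F statistic applied to each value's absolute deviation from its group centre. The text does not state which samples formed the groups in Table 3, so the toy groups here are hypothetical. In practice, `scipy.stats.f_oneway`, `scipy.stats.levene` and `scipy.stats.shapiro` return these statistics together with their P-values; note that `scipy.stats.levene` centres on the median by default (the Brown-Forsythe variant), whereas this sketch defaults to the mean.

```python
from statistics import mean, median


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def levene_w(*groups, center=mean):
    """Levene statistic: ANOVA F on absolute deviations from each group's centre."""
    deviations = [[abs(x - center(g)) for x in g] for g in groups]
    return anova_f(*deviations)


# Hypothetical scores of three methods
a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(anova_f(a, b, c))   # 27.0
print(levene_w(a, b, c))  # 0.0 (identical spreads)
print(levene_w(a, b, c, center=median))  # Brown-Forsythe variant
```

The P-value then follows from the F distribution with (k - 1, n - k) degrees of freedom, which is what the SciPy functions compute internally.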

4.9 Scalability analysis

Scalability analysis evaluates how the classifier's performance changes as the size or complexity of the dataset grows, that is, its ability to handle larger volumes of data, more features and more complex patterns without significant degradation. Figure 13 shows the running time of the proposed and existing methods for various data sizes. For a data size of 7 MB, Transfer learning, ONN, BiSPL and URNet take 74.745 s, 60.963 s, 41.173 s and 21.716 s, respectively, whereas the proposed method takes 6.177 s. The running time of all methods increases with the data size: for 78 MB, Transfer learning, ONN, BiSPL, URNet and ACOA_RMDL take 108.212 s, 92.736 s, 71.275 s, 53.761 s and 28.818 s, respectively. The proposed method has the lowest running time at both reported data sizes, indicating that it remains efficient as the data size grows.

Figure 13.

Scalability analysis.

5 Conclusion

Owing to the Internet's growing popularity, the number of websites has increased dramatically in recent years. Web data offer a promising way to address the shortage of training data for deep learning (DL); however, web data such as web images frequently carry incorrect labels, which can undermine DL models. Hence, this research designed a hybrid framework, ACOA-enabled RMDL, for web data classification. First, the text-based input web data are tokenized with BERT, after which aspect term extraction is performed. Next, feature vectors are obtained by extracting features based on punctuation marks, negation, question marks, exclamation marks, bag of units, sentence length, numerical words, hashtags, all caps, emoticons and semantic similarity. Finally, web data classification is performed by RMDL, which is trained by ACOA, an integration of AOA and CAViaR. ACOA_RMDL is evaluated with precision, recall and F1-score, attaining maximum values of 91.6%, 94.7% and 93.1%, respectively, on website classification dataset 1. In future work, the approach will be extended to classification tools that improve the processing and handling of sensitive data.

Footnotes

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Appendix

References

1. Chowdhury. Natural language processing. Annu Rev Inform Sci 2005; 37: 51–89.

2. Chandrasekaran, Mago. Evolution of semantic similarity—a survey. ACM Comput Surv 2021; 54: 1–37.

3. Wang. Deep web data source classification based on text feature extension and extraction. Infocommunications J 2019; 11: 42–49.

4. Sintoris, Vergidis. Extracting business process models using natural language processing (NLP) techniques. In Proceedings of 2017 IEEE 19th conference on business informatics (CBI), Thessaloniki, Greece, 2017, vol. 1, pp.135–139.

5. Ali, Alfayez, Alquhayz. Semantic similarity measures between words: a brief survey. Sci Int 2018; 30: 907–914.

6. Bollegala, Matsuo, Ishizuka. Websim: a web-based semantic similarity measure. In Proceedings of 21st annual conference of the Japanese society of artificial intelligence, 2007, pp.757–766.

7. Slimani. Description and evaluation of semantic similarity measures approaches. arXiv preprint arXiv:1310.8059, 2013.

8. Luo, et al. Measuring semantic similarity between words by removing noise and redundancy in web snippets. Concurr Comput 2011; 23: 2496–2510.

9. Stuckenschmidt. A semantic similarity measure for ontology-based information. In Proceedings of international conference on flexible query answering systems, Springer, Berlin, Heidelberg, 2009, pp.406–417.

10. Sanchez, Arrieta, Corchado. Visual content-based web page categorization with deep transfer learning and metric learning. Neurocomputing 2019; 338: 418–431.

11. Wang, Tong VJC, Chin. Enhancing machine-learning methods for sentiment classification of web data. In Proceedings of information retrieval technology: 10th Asia information retrieval societies conference, AIRS 2014, Kuching, Malaysia, December 3–5, 2014, proceedings 10, Springer International Publishing, 2014, pp.394–405.

12. Yang, Sun, Lai, et al. Recognition from web data: a progressive filtering approach. IEEE Trans Image Process 2018; 27: 5303–5315.

13. Sun, Chen, Yang. Learning from web data using adversarial discriminative neural networks for fine-grained classification. In Proceedings of the AAAI conference on artificial intelligence, 2019, vol. 33, pp.273–280.

14. Baharlou, Aghamaleki. Transfer learning approach for classification and noise reduction on noisy web data. Expert Syst Appl 2018; 105: 221–232.

15. Gopianand, Jaganathan. An effective quality analysis of XML web data using hybrid clustering and classification approach. Soft Comput 2020; 24: 2139–2150.

16. Chang, Lai, et al. BiSPL: bidirectional self-paced learning for recognition from web data. IEEE Trans Image Process 2021; 30: 6512–6527.

17. Song, Zhu, et al. Learning from large-scale noisy web data with ubiquitous reweighting for image classification. IEEE Trans Pattern Anal Mach Intell 2019; 43: 1808–1814.

18. Liu, Huang, et al. An earthquake emergency web data cleaning and classification method based on word frequency and position weighting. Comput Intell Neurosci 2022; 2022: 1–10.

19. Patel, Verma. Data classification in web usage mining using SVM, SVR and K-NN. J Innov Eng Res 2021; 4: 18–22.

20. Website classification dataset. https://www.kaggle.com/datasets/hetulmehta/website-classification?resource=download (accessed March 2023).

21. Website classification dataset. https://www.kaggle.com/datasets/shaurov/website-classification-using-url (accessed April 2024).

22. Devlin, Chang, Lee, et al. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.

23. Yang, Zeng, Yang, et al. A multi-task learning model for Chinese-oriented aspect polarity classification and aspect term extraction. Neurocomputing 2021; 419: 344–356.

24. Kowsari, Heidarysafa, Brown, et al. RMDL: random multimodel deep learning for classification. In Proceedings of the 2nd international conference on information system and data mining, 2018, pp.19–28.

25. Hashim, Hussain, Houssein, et al. Archimedes optimization algorithm: a new metaheuristic algorithm for solving optimization problems. Appl Intell 2021; 51: 1531–1551.

26. Engle, Manganelli. CAViaR: conditional autoregressive value at risk by regression quantiles. J Bus Econ Stat 2004; 22: 367–381.

27. Iosif, Potamianos. Unsupervised semantic similarity computation between terms using web documents. IEEE Trans Knowl Data Eng 2009; 22: 1637–1647.

28. Naruei, Keynia. A new optimization method based on COOT bird natural life model. Expert Syst Appl 2021; 183: 115352.

29. Shadravan, Naji, Bardsiri. The sailfish optimizer: a novel nature-inspired metaheuristic algorithm for solving constrained engineering optimization problems. Eng Appl Artif Intell 2019; 80: 20–34.

30. Bairwa, Joshi, Singh. Dingo optimizer: a nature-inspired metaheuristic approach for engineering problems. Math Probl Eng 2021; 2021: 1–12.

31. Rodríguez-Fdez, Canosa, Mucientes, et al. STAC: a web platform for the comparison of algorithms using statistical tests. In Proceedings of the 2015 IEEE international conference on fuzzy systems (FUZZ-IEEE), 2015.
