Effective community detection with topic modeling in article recommender systems using LS-SLM and PCC-LDA

Abstract

This paper introduces the LS-SLM (Linear Scaling based Smart Local Moving) technique for improving the efficiency of article recommendation systems based on community detection and topic modeling. The methodology is evaluated on a dataset extracted from the “dblp.v12.json” citation network. The experimental results show that LS-SLM outperforms established algorithms, namely the Louvain Algorithm (LA), Stochastic Block Model (SBM), Fast Greedy Algorithm (FGA), and Smart Local Moving (SLM). The evaluation metrics are accuracy, precision, specificity, recall, F-Score, modularity, Normalized Mutual Information (NMI), betweenness centrality (BTC), and community detection time, and LS-SLM outperforms the existing solutions on all of them. For instance, the proposed methodology achieves an accuracy of 96.32%, surpassing LA by 16% and SBM by 10.6%. Precision, a critical measure of relevance, stands at 96.32%, a significant advance over GCR-GAN (61.7%) and CR-HBNE (45.9%). LS-SLM also achieves the highest sensitivity (recall), 96.5487%, outperforming LA by 14.2%, together with a specificity of 96.5478%. Its modularity of 95.6119% significantly outpaces SLM, FGA, SBM, and LA. Furthermore, LS-SLM completes community detection in 38,652 ms, an efficiency gain over the existing techniques. The BTC analysis indicates that LS-SLM achieves a value of 94.6650%, demonstrating its proficiency in controlling information flow within the network.

Keywords

Recommender Systems (RS); Bag of Words (BoW); Pearson Correlation Co-efficient based Latent Dirichlet Allocation (PCC-LDA); Linear Scaling based Smart Local Moving (LS-SLM); Term Frequency and Inverse Document Frequency (TF-IDF); Community detection

1 Introduction

Advances in technology have produced a new generation of services, such as wikis, blogs, Google+, Facebook, YouTube, Wikipedia, Twitter, Amazon, and Flickr. These services hold vast amounts of information in various formats, which causes information overload and thereby limits their usability [1]. Archival materials are also being digitized and made available online, either free of charge or for a fee. Although this is a significant improvement that gives people easy access to more knowledge, it also creates an information overload problem, particularly in the academic world [2]. To address this problem, a Recommender System (RS) filters out irrelevant data and presents items that are close to the user’s interests [3]. Various RSs have been designed and implemented for several kinds of items, such as newspapers, research papers, and emails, and they are used in fields such as economics, education, and scientific research [4]. To deal with the information overload of the modern era, the information retrieval community regards RSs grounded on a community of users as an alternative technique for retrieving information [5].

Digital libraries commonly used by researchers include Google Scholar, the IEEE Digital Library, and Science Direct [6]. Access to digital sources is also growing steadily as the number of digital libraries increases and universities support the deployment of digital resources. However, as scientific output grows, researchers struggle to find the articles they are looking for among a vast number of articles [7].

Recommendation techniques are categorized into four key divisions: (A) Content-Based Filtering (CBF), (B) Collaborative Filtering (CF), (C) Graph-Based (GB) methods, and (D) hybrid methods. Other RS techniques include latent factor models and topic regression matrix factorization [8]. Nevertheless, owing to the nature of the data being used, these techniques suffer from weaknesses including (A) cold start, (B) data sparsity, and (C) overspecialized recommendations. Thus, community detection-centric RSs are deployed. Paper RSs have become an indispensable tool in the academic field [9]. Recommending similar scientific articles to researchers in the scientific community is termed scientific paper recommendation. Based on the latent interests in the researchers’ publication profiles, a community detection-based RS provides relevant information to researchers and helps them gain a better understanding of a particular research area. Further, recommendation algorithms are updated constantly, so their accuracy also improves over time [10]. However, a major disadvantage is that the same articles are recommended to all researchers. Thus, an efficient article RS based on community detection with topic modelling using LS-SLM and PCC-LDA is proposed.

Contributions of the Proposed Algorithm:

Domain-Specific Article Selection:

Addressing the crucial aspect of relevance in scientific article recommendations by leveraging domain-specificity. Achieved through the strategic partitioning of the citation graph into communities, ensuring that selected articles align with user interests.

Scalability Improvement:

Tackling scalability challenges prevalent in existing algorithms for community detection. The proposed Linear Scaling based Smart Local Moving (LS-SLM) algorithm focuses on enhancing scalability by optimizing computational time while maintaining a high degree of accuracy, modularity, Normalized Mutual Information (NMI), and betweenness centrality.

Efficient Preprocessing Techniques:

Introducing advanced preprocessing strategies to mitigate network size and complexity issues without compromising essential information. Techniques such as graph coarsening and node sampling are employed to streamline the citation graph, paving the way for more efficient community detection.

Performance Superiority:

Demonstrating the superior performance of the proposed algorithm compared to existing counterparts. Notable improvements are observed in terms of relevance, as the algorithm excels in providing communities that yield meaningful and pertinent topics, influencing the quality of recommended articles.

Impact on Topic Modelling:

Integrating the PCC-LDA (Pearson Correlation Coefficient based Latent Dirichlet Allocation) algorithm for effective topic modelling. PCC-LDA contributes by offering an explicit measure of association between terms and topics through Pearson’s Correlation Coefficient. This strategic choice enhances the algorithm’s capability to identify and rank strongly associated terms under each topic.

Enhanced Computational Efficiency:

Prioritizing computational time as a key contribution. The proposed algorithm significantly improves computational efficiency, addressing a critical concern in scientific article recommendation systems. This efficiency gain is essential for user satisfaction and system responsiveness.

Holistic Improvement Before Recommendation:

Offering a holistic enhancement approach before the actual recommendation of scientific articles. The algorithm achieves a balance between computational efficiency and result quality, ensuring that the recommended articles are not only timely but also strongly associated with relevant topics.

Identification of Strong Associations:

Unveiling a noteworthy contribution in the identification of strongly associated terms under each topic. This outcome is critical for enhancing the depth and specificity of the recommendations, contributing to a more refined and personalized user experience.

2 Problem definition

In the quest to improve article recommendation systems, this paper identifies the need for advanced methodologies that effectively leverage community detection and topic modelling. Although existing research methodologies offer various advantages in the field of article RS, they lack the desired accuracy and efficiency, which prompted the formulation of the LS-SLM technique. Their drawbacks are listed as follows:

CF is the most successful technique in the RS field; it creates suggestions based on the preferences of a user’s neighbours. However, it suffers from poor accuracy, poor scalability, and cold start problems.

CF also suffers from ‘sparsity’ and ‘new user’ problems.

In CBF, the item’s feature representation is hand-engineered to some extent; this methodology requires extensive domain knowledge.

A CBF recommendation system does not consider what others think of the item; thus, lower-quality item recommendations may occur.

Existing unigram- and bigram-centric citation approaches are based only on word counts and do not consider the content similarity of the articles.

To alleviate the aforementioned issues, this paper proposes an efficient LS-SLM and PCC-LDA-based article RS.

The remainder of the paper is arranged as follows: related works are reviewed in Section 3; the details and fundamental concepts of the proposed methodology are described in Section 4; the efficiency of the proposed work compared with existing methods is presented in Section 5; and the paper is wrapped up with future work in Section 6.

3 Literature survey

Jelodar et al., 2021 presented a semantic mining system based on topic modelling for building a recommendation system that extracts the research fields of interest to scholars from conference publications. The topics produced by natural language processing-based Latent Dirichlet Allocation (NLP-based LDA) were more distinctive for the semantic description of a set of papers than descriptions based on single words. Combining machine learning with topic modelling approaches might be effective for professional recommendation systems; nevertheless, the system did not use such a combination [11].

Chaudhuri et al., 2021 explored a hidden feature identification scheme to design an effective research article recommendation system. To characterize a research article, four indirect features were derived: (A) keyword diversification, (B) text complexity, (C) citation analysis over time, and (D) scientific quality measurement. As per the outcomes, the indirect features defined a research article better than the direct features. However, only a few hidden features were used [12].

Dai et al., 2020 recommended a neural network system for context-aware citation recommendation that fuses Stacked De-noising Auto Encoders (SDAE) and Bi-directional Long Short-Term Memory (Bi-LSTM). By employing attentive information from the citation context, SDAE was extended into Attentive SDAE (ASDAE) to obtain an effective embedding for cited papers, which enhanced the learning ability of the original SDAE. Citations were suggested, and suitable citation contexts were extracted for the long term. However, just a few effective link functions were used [13].

Wang et al., 2019 introduced an automatic title generation methodology that fuses personalized recommendations with topic trend analysis techniques. Hierarchical latent tree analysis was deployed to detect the users’ interests in a topic structure together with its representative keywords in existing studies. As per the experimental outcomes, adding Google Trend indicators and personal factors improved the performance of the topic recommendations. The research scope was restricted to journal papers, so the words and phrases used were predominantly existing ones; consequently, the maximal degree of novelty and innovation could not be attained [14].

Habib & Afzal, 2019 explored a paper recommendation system that extended conventional bibliographic coupling by integrating in-text citation analysis and the occurrence of in-text citations in the logical sections of the research paper. As per the experimental outcomes, this system improved considerably on conventional bibliographic coupling and content-centric research paper recommendation. However, the weights were not allocated automatically [15].

Z. Ali et al., 2022 presented a heterogeneous network embedding system that jointly learns node representations by exploiting semantics corresponding to the (A) author, (B) time, (C) context, (D) field of study, (E) citations, and (F) topics. On the DBLP datasets, this system showed improvements of 10% in Mean Average Precision (MAP) and 12% in normalized Discounted Cumulative Gain (nDCG). Nevertheless, this system used only a few factors and contextual information when introducing attention mechanisms [16].

Liu et al., 2020 developed a keyword-driven and popularity-aware paper recommendation approach, called PR_{keyword+pop}, based on an undirected paper citation graph. An undirected paper citation graph was constructed, and the users’ keyword query was treated as a Steiner tree problem. In searching for a set of satisfactory papers, this system obtained better outcomes than other competitive techniques. However, the system faced the sparsity issue of the existing paper citation graph [17].

Nassar et al., 2020 recommended a multi-criteria CF based on deep learning. The system had two parts: the features of the users and items were obtained and then fed into a deep neural network that predicted the criteria ratings. This system was general, easy to implement, and model-independent. However, only a few deep learning techniques were deployed, and more complex deep networks or other representation techniques were not employed [18].

Sharma et al., 2022 explored a hybrid book recommendation system that anticipates recommendations. This system was a fusion of CF and CBF and was described in three phases. It aided developers and other stakeholders in building successful e-commerce businesses. However, the system was evaluated only on offline data sets, which did not permit monitoring the real user acceptance level for the given recommendations [19].

S. Ali et al., 2022 introduced an architecture that generates semantic recommendations with the assistance of virtual agents based on user requirements and preferences, thus helping users find suitable courses in a real-world setting. The E-learning Recommendation Architecture (ELRA) improved skills, accomplishments, and learning success (by more than 90%). However, in an online learning environment, this system had low course quality [20].

Pradhan & Pal, 2020b developed the Content and Network-centric Academic VEnue Recommender system (CNAVER). An integrated network was built through a rank-centric mixture of the Paper-Paper Peer Network (PPPN) and Venue-Venue Peer Network (VVPN) systems. Compared with other best-in-class methodologies, this system exhibited higher precision, accuracy, Mean Reciprocal Rank (MRR), and diversity scores, and CNAVER recommended top-notch venues in terms of the H5-index. However, only a few machine learning methodologies were deployed, which constrained the random walker not to go too far from the primary venue of interest [21].

Pradhan et al., 2020 recommended a unified architecture that included Bi-LSTM and a Hierarchical Attention Network (HAN). This system, called the modularized Hierarchical Attention-centric Scholarly Venue Recommender system (HASVRec), requires only the abstract, title, keywords, field of study, and authors of a new paper to recommend scholarly venues. The modularized structure information and attention were useful for making precise venue predictions. However, just a few meta-path features were deployed, and only a few deep-learning techniques were used to combine the embeddings of multiple paths [22].

Jain et al., 2018 explored a Journal Recommendation System (JRS) that resolved the publication issue for several authors. Here, CBF was deployed. LDA was used for dimensionality reduction, and semantic analysis was also performed. As per the outcomes, the system helped authors find suitable journals, speed up their submission process, and improve the user experience. However, only a few basic similarity measures were used to construct the recommendation system [23].

Pradhan & Pal, 2020a developed a Diversified yet Integrated Social network analysis and Contextual similarity-centric scholarly Venue Recommender (DISCOVER). The suggested system included (A) centrality measure calculation, (B) citation and co-citation analysis, (C) topic modeling-centric contextual similarity, and (D) key-route identification-centered key path analysis of a bibliographic citation network. Compared with existing methodologies, DISCOVER recommended higher-quality venues. However, only a few disciplines were covered, and the system was evaluated only on the specified dataset [24].

Yu et al., 2018 recommended Personalized Academic Venue recommendation Exploiting co-publication networks (PAVE). This system ran a random walk with restart on a co-publication network that encompasses two sorts of associations, namely coauthor and author-venue relations. The system performed better in suggesting academic venues for researchers with fewer publications, that is, junior researchers. However, just three academic factors were exploited in the co-publication networks; features such as citation relations could also be included [25].

Z. Ali et al., 2021 presented a network embedding system called Global Citation Recommendation using a Generative Adversarial Network (GCR-GAN). The Heterogeneous Bibliographic Network (HBN) was exploited to generate personalized citation recommendations. The system surpassed existing methodologies, with improvements of 11% in Mean Average Precision (MAP) and 12% in normalized Discounted Cumulative Gain (nDCG). However, this system did not perform well for relevant labels [26].

[30] Abdelrazek et al. (2023): Abdelrazek et al.’s comprehensive survey on topic modeling algorithms and applications provides valuable insights into various techniques and their effectiveness in different contexts. This survey particularly enriches our understanding of topic modeling approaches, aiding in the refinement of our PCC-LDA method. Their review of diverse topic modeling algorithms has enabled us to draw parallels and highlight the strengths of our approach in handling the intricacies of article recommendation systems.

[31] Jiang et al. (2022): In their study, Jiang et al. explore user interest community detection on social media using collaborative filtering. This work is particularly relevant to our LS-SLM method as it shares the underlying principle of detecting communities based on user interests, albeit in a different application domain. The collaborative filtering approach they discuss provides us with additional perspectives on community detection, allowing us to further validate the effectiveness of our method in identifying and recommending relevant academic articles.

[32] Vanchinathan et al. (2022): Although Vanchinathan et al.’s work on numerical simulation and experimental verification of a fractional-order controller using a whale optimization algorithm is in a different technical domain, their methodological rigor and the use of optimization algorithms offer valuable insights. Their application of advanced optimization techniques in engineering systems has parallels in our use of the LS-SLM method for optimizing community detection in large datasets. This connection enriches our discussion on the methodological advancements we bring to the field of recommender systems.

Critical analysis of literature review

Diversity of Techniques:

The references cover a wide range of recommendation techniques, including collaborative filtering, content-based filtering, hybrid methods, and novel algorithms like Fuzzy Routing-Forwarding (FCNS) and Fuzzy Reasoning Routing-Forwarding (FRRF).

Hybrid models, such as those presented in [1, 6], and [24], leverage multiple recommendation strategies for improved accuracy.

Context-Aware Approaches:

Some references, like [2, 24], and [27], delve into context-aware personalized recommendation systems, acknowledging the importance of considering user context and preferences.

Community and Network Analysis:

References [4, 26], and [25] explore community-based and network embedding approaches, recognizing the influence of social networks and community structures in article recommendations.

Incorporation of Advanced Technologies:

The papers [13] and [9] introduce advanced technologies like stacked denoising autoencoders and knowledge-guided article embedding refinement, showcasing the integration of cutting-edge technologies in recommendation systems.

Domain-Specific Solutions:

Some references, like [3, 11], and [21], propose domain-specific solutions, tailoring recommendation algorithms for specific fields such as research articles, academic venues, and e-learning.

Evaluation Metrics and Validation:

Several papers, including [6, 22], and [26], present comprehensive evaluations using various metrics such as accuracy, precision, and F-score, emphasizing the importance of thorough validation.

Challenges and Opportunities:

Some references, like [15] and [5], acknowledge challenges in citation recommendation and propose novel methods to overcome these challenges.

Innovation in Routing Algorithms:

References [27] and [28] propose innovative fuzzy routing algorithms, showcasing a broader application of fuzzy reasoning in mobile edge computing-based opportunistic social networks.

4 Proposed article recommender system

Researchers usually have to read scientific articles associated with their interests intensively to generate novel study ideas and to write good articles. Nevertheless, analysing a large number of articles and selecting among them is complicated and time-consuming work. Thus, this paper proposes an efficient article RS, whose structure is displayed in Fig. 1.

Fig. 1

Block diagram of the proposed methodology.

4.1 Convert JSON to CSV

Initially, the input sources are collected from the DBLPv12 dataset, which is provided as JSON files (J^s). The dataset contains the id, title, authors, venue, year, keywords, references, Digital Object Identifier (DOI), abstract, volume, issue, and n_citation of each article. Since processing JSON files is time-consuming, the data is converted into CSV format (ς^s) for further processing.
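The conversion step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the column subset in `FIELDS`, the `;` separator for list-valued fields, and the helper name `json_records_to_csv` are assumptions made for the example.

```python
import csv
import io
import json

# Assumed column subset, taken from the attributes the text lists for DBLPv12.
FIELDS = ["id", "title", "year", "references", "abstract"]

def json_records_to_csv(records, out_file, fields=FIELDS):
    """Write an iterable of dict records to CSV, one row per article."""
    writer = csv.DictWriter(out_file, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        row = {}
        for name in fields:
            value = rec.get(name, "")
            # List-valued fields (e.g. references) are flattened to one string.
            if isinstance(value, list):
                value = ";".join(str(v) for v in value)
            row[name] = value
        writer.writerow(row)

# Tiny illustrative input; a real dump would be read from disk instead.
sample = json.loads(
    '[{"id": 1, "title": "A", "year": 2019, "references": [2, 3]},'
    ' {"id": 2, "title": "B", "year": 2018, "abstract": "text"}]'
)
buffer = io.StringIO()
json_records_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

Missing fields are written as empty cells, so articles without an abstract or reference list still produce a well-formed row.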

4.2 Attribute extraction

Then, the essential attributes are extracted to support a better recommendation process. The process of extracting significant information from the large amount of data available in a document is termed attribute extraction. The id, title, references, year, and abstract are some of the handcrafted attributes extracted from the CSV file.

4.3 Construction of citation graph

Here, the extracted attributes are represented graphically. Attributes such as the ID, title, references, and year are represented as nodes and edges. Let g (n, e) represent the graph with n nodes and e edges.
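One way to picture g (n, e) is as a node table plus a set of citation edges. The sketch below is an illustration under assumptions: the record layout (`id`, `title`, `year`, `references`), the function names, and the choice to drop references that point outside the dataset are not taken from the paper.

```python
from collections import defaultdict

def build_citation_graph(articles):
    """Return (nodes, edges); an edge (a, b) means article a cites article b."""
    nodes = {a["id"]: {"title": a.get("title", ""), "year": a.get("year")}
             for a in articles}
    edges = set()
    for a in articles:
        for ref in a.get("references", []):
            # Assumption: keep only citations to articles inside the dataset.
            if ref in nodes and ref != a["id"]:
                edges.add((a["id"], ref))
    return nodes, edges

def undirected_neighbours(edges):
    """Adjacency sets that ignore edge direction, as community detection needs."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj

articles = [
    {"id": 1, "title": "A", "year": 2020, "references": [2, 3, 99]},
    {"id": 2, "title": "B", "year": 2018, "references": []},
    {"id": 3, "title": "C", "year": 2019, "references": [2]},
]
nodes, edges = build_citation_graph(articles)
adjacency = undirected_neighbours(edges)
```

Here n is `len(nodes)` and e is `len(edges)`; the reference to the unknown id 99 is discarded.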

4.4 Abstract filtering

Now, standard abstracts ( $\bar{c}$ ) are filtered from the extracted attributes. Usually, an abstract summarizes the objective of the proposed work within a specific length. For effective filtering of the standard abstracts, the following condition is checked. $α^{a} = \begin{cases} \bar{c}, & 900 < c < 1500 \\ 0, & \text{otherwise} \end{cases}$ (1)

Where, α^a implies the abstract filtering process, and c signifies the word extracted from the filtered abstract.
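Equation (1) amounts to a length filter. The sketch below is illustrative only; the text does not state the unit of c, so the default of measuring length in characters is an assumption, and the `length` parameter lets a word count be substituted.

```python
def filter_standard_abstracts(abstracts, lower=900, upper=1500, length=len):
    """Keep abstracts whose length c satisfies lower < c < upper (Equation 1).

    `length` defaults to the character count (an assumption); pass
    `lambda a: len(a.split())` to measure in words instead.
    """
    return [a for a in abstracts if lower < length(a) < upper]

short_abstract = "x" * 100
standard_abstract = "y" * 1000
boundary_abstract = "z" * 1500   # excluded: the upper bound is strict

kept = filter_standard_abstracts(
    [short_abstract, standard_abstract, boundary_abstract]
)
```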

4.5 Pre-processing

Here, the obtained standard abstracts are pre-processed. Pre-processing transforms the raw data into an understandable format for further analysis. It comprises (A) tokenization, (B) stop word removal, (C) stemming, and (D) lemmatization, which are discussed below.

a. Tokenization

Effective processing is difficult because the filtered abstract is in the form of a paragraph. Thus, tokenization is utilized. Tokenization breaks a stream of text into words, phrases, symbols, or other meaningful elements termed tokens. The resulting list of tokens, depicted as ( $δ^{\bar{c}}$ ), becomes the input for further processing.

b. Stop word Removal

Then, stop word removal takes place to remove the most frequently occurring words that carry little meaning. “And”, “are”, and “this” are some of the most common stop words in textual documents. The presence of such stop words hinders effective article recommendation and degrades the proposed technique’s performance. The output obtained after the stop word removal process is denoted as ( $χ^{\bar{c}}$ ).

c. Stemming and Lemmatization

Next, the stemming process is carried out to strip suffixes and reduce word variants to the same root word. For instance, words like “continue”, “continuously”, and “continued” are reduced to the single word “continue”. Hence, the document size is minimized by the stemming process, which improves the computation efficiency. The output obtained after the stemming and lemmatization process is denoted as ( $S^{\bar{c}}$ ). The pre-processing (ρ^p) stage is mathematically expressed as $ρ^{p} = \{δ^{\bar{c}}, χ^{\bar{c}}, S^{\bar{c}}\}$ .
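The three pre-processing stages can be sketched as a small pipeline. This is a toy illustration, not the authors' implementation: the stop word list and the suffix list are deliberately tiny assumptions, the suffix stripper stands in for a real stemmer and produces truncated stems such as "continu", and lemmatization is omitted because it needs a dictionary resource.

```python
import re

# Tiny illustrative stop word list (assumption); real lists are much longer.
STOP_WORDS = {"and", "are", "this", "the", "is", "of", "a", "in", "to"}
# Toy suffix list checked in order (assumption); not the Porter algorithm.
SUFFIXES = ("ously", "ing", "ed", "ly", "s", "e")

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]

tokens = preprocess("Communities are continuously detected and this continued")
```

With this toy stemmer, "continue", "continuously", and "continued" all collapse to one shared stem, which is the effect the text describes.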

4.6 Feature extraction with TF-IDF

Then, features are extracted from the pre-processed output. The process of extracting crucial features from the pre-processed data is termed feature extraction. Here, feature extraction is carried out with the TF-IDF technique, which extracts the crucial features from a document by evaluating the importance of each word to the whole corpus. The TF-IDF operation is detailed below.

The probability of occurrence of a term in the document is defined in Term Frequency (TF) in the TF-IDF algorithm. Here, the bias in the lengthy document is usually reduced by normalizing the term in the range [0, 1]. TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document).

Moreover, by using Inverse Document Frequency (IDF), the specificity of a particular term or a word in the document is computed. The most frequently occurring words are not considered for analysis in IDF because those words that occur more frequently act as stop words. Thus, uncommon words are only considered. The IDF (ℑ (ρ, φ)) process is explained as [14],

$ℑ (ρ, φ) = \log \frac{| φ |}{| \{ δ^{'} \in φ : ρ \in δ^{'} \} |}$ (2)

Here, ρ denotes the term, φ implies the document corpus, |φ| signifies the total number of documents, and the denominator counts the documents δ' that contain ρ. Thus, the set of extracted features (X_I) is X_I = {X₁, X₂, …, X_N}. Here, N denotes the number of features extracted.
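The TF and IDF definitions above can be combined as in the following sketch. It is a minimal illustration, assuming the natural logarithm and the usual product TF × IDF as the feature weight; the function names and the three-document corpus are made up for the example.

```python
import math
from collections import Counter

def term_frequency(tokens):
    """TF(t) = occurrences of t in the document / total terms in the document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: c / total for t, c in counts.items()}

def inverse_document_frequency(corpus):
    """IDF(t) = log(|corpus| / number of documents containing t), Equation (2)."""
    n_docs = len(corpus)
    doc_freq = Counter(t for doc in corpus for t in set(doc))
    return {t: math.log(n_docs / df) for t, df in doc_freq.items()}

def tf_idf(corpus):
    """Return one {term: weight} dict per document."""
    idf = inverse_document_frequency(corpus)
    return [{t: tf * idf[t] for t, tf in term_frequency(doc).items()}
            for doc in corpus]

corpus = [
    ["graph", "community", "graph"],
    ["topic", "community"],
    ["topic", "model"],
]
weights = tf_idf(corpus)
```

A term that occurs in every document receives IDF log(1) = 0, which matches the remark that very frequent words behave like stop words.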

4.7 Word2Vector using BoW

Word2Vector is the next step. The extracted features are fed as input to the Word2Vector process, which produces a corresponding vector representation for each unique word. It transforms words into vectors by considering the syntactic and semantic features of words and is mainly used as a prediction model for learning word embeddings. For Word2Vector representation, BoW is the most commonly used approach. BoW represents the textual information as a histogram. The working procedure of BoW is described below,

Vocabulary generation: It is the primary step in BoW methodology. Here, the extracted features are quantized or clustered in the form of a visual term. These terms are nothing but the codes in the codebook.

Defining terms: Here, by employing the Nearest Neighbour (NN) technique, features or vectors are assigned to the terms generated in the previous step.

Generating term vector: Finally, the term vector is represented in the form of a histogram by counting the number of times each term appears in the document. Thus, the output obtained using the BoW method is defined as H (F^e).
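The three BoW steps above can be sketched for text as follows. This is a simplified illustration, not the authors' implementation: it assumes that each token is assigned to the vocabulary term that matches it exactly, which stands in for the nearest-neighbour assignment step, and the function names are hypothetical.

```python
from collections import Counter

def build_vocabulary(documents):
    """Vocabulary generation: the sorted set of distinct terms (the codebook)."""
    return sorted({t for doc in documents for t in doc})

def bow_vector(tokens, vocabulary):
    """Term vector: how many times each vocabulary term occurs in the document."""
    counts = Counter(tokens)
    return [counts[t] for t in vocabulary]

documents = [
    ["graph", "community", "graph"],
    ["topic", "community"],
]
vocabulary = build_vocabulary(documents)
histograms = [bow_vector(doc, vocabulary) for doc in documents]
```

Each histogram has one entry per vocabulary term, so all documents map to vectors of the same length.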

4.8 Similarity score via LSE

Thus, for the vector representation obtained, the similarity score is determined. The Levenshtein Similarity Evaluation (LSE) approach is used in the proposed work for the similarity score measurement. By performing certain operations like Insertion, Deletion, and Substitution, LSE measures the similarity between the two strings in different abstracts. Thus, LSE is otherwise called edit distance. The process involved in LSE-based similarity score measurement is detailed further down,

Step 1: Let M and N be the two strings of length u and v. Initially, the edit distance between the two strings is calculated through the creation of a matrix ℓ [u, v] using Equation (3) [29]. $ℓ (u, v) = \begin{cases} 0, & u = 0, v = 0 \\ v, & u = 0, v > 0 \\ u, & u > 0, v = 0 \\ min, & u > 0, v > 0 \end{cases}$ (3)

Here, min implies the minimum distance computed using the formula given below [29]. $min = MIN \{ℓ (u - 1, v) + 1, ℓ (u, v - 1) + 1, ℓ (u - 1, v - 1) + F (u, v)\}$ (4)

Where F (u, v) implies the similarity parameter, which is obtained as follows [29]. $F (u, v) = \begin{cases} 1, & M^{(u)} \neq N^{(v)} \\ 0, & \text{otherwise} \end{cases}$ (5)

Where M^(u) is the u^th word of M, and N^(v) refers to the v^th word of N.

Step 2: Then, the similarity between the two strings M and N is evaluated and is expressed as follows [29], $S (M, N) = 1 - \frac{ℓ^{d}}{MAX (u, v)}$ (6)

Here, (S (M, N)) implies the similarity between two strings. As the Levenshtein distance (ℓ ^d) increases, the similarity becomes smaller. Thus, the similarity score measured using LSE is modelled as χ^s.
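Equations (3) to (6) correspond to the standard dynamic-programming edit distance followed by a normalization. The sketch below is a minimal illustration; the function names are hypothetical, and returning a similarity of 1.0 for two empty inputs is an assumption, since Equation (6) is undefined when MAX (u, v) = 0.

```python
def levenshtein_distance(m, n):
    """Edit distance between two sequences (strings or lists of words)."""
    u, v = len(m), len(n)
    # dist[i][j] holds the distance between the first i items of m and j of n.
    dist = [[0] * (v + 1) for _ in range(u + 1)]
    for i in range(u + 1):
        dist[i][0] = i                      # i deletions
    for j in range(v + 1):
        dist[0][j] = j                      # j insertions
    for i in range(1, u + 1):
        for j in range(1, v + 1):
            cost = 0 if m[i - 1] == n[j - 1] else 1   # F(u, v) in Equation (5)
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[u][v]

def levenshtein_similarity(m, n):
    """S(M, N) = 1 - distance / MAX(u, v), as in Equation (6)."""
    longest = max(len(m), len(n))
    if longest == 0:
        return 1.0   # assumption: two empty sequences count as identical
    return 1 - levenshtein_distance(m, n) / longest

score = levenshtein_similarity("kitten", "sitting")
```

Because the functions only index and compare items, they work on lists of words as well as on character strings.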

4.9 Community detection by LS-SLM

For effective community detection, the constructed graph (g (n, e)) and the evaluated similarity score (χ^s) are fed into LS-SLM. The Smart Local Moving (SLM) algorithm is a widely used technique for detecting optimal communities based on the merging of communities and the movement of individual nodes. Moreover, its adapted local moving heuristic increases the modularity through a community splitting process. However, SLM fails to detect the hidden relations among the nodes in the network. Thus, the Linear Scaling (LS) technique is incorporated into the conventional SLM. This modification of the traditional SLM is termed LS-SLM, and the process is described below.

Primarily, an improved community structure is obtained by moving every node to its neighbouring community through the execution of the modularity function (ℵ), which is expressed in (7).

$ℵ = \frac{1}{2 e} \sum_{pq} (E_{pq} - W_{pq}) \, β \left( \frac{E_{pq} - ς^{p}}{ς^{p} - ς^{q}} \right)$ (7)

$β = \{g (n, e), χ^{s}\}$ (8)

Here, p and q denote vertices, E_pq signifies the number of edges lying between p and q, W_pq denotes the expected number of edges, β (·) captures the hidden relations among the nodes in the network, and ς^p and ς^q denote the communities containing vertices p and q. Depending on the resulting value of the modularity function, the node is either kept in its new community or moved back to its previous community.
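For orientation, the quantity that local moving tries to increase can be sketched with the standard Newman modularity. Note the substitution: this sketch uses a plain same-community indicator in place of the linear-scaling term β (·), whose exact form is not fully specified here, so it is not the LS-SLM objective itself. Using the edge weight to carry the similarity score χ^s is likewise an assumption.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Standard modularity of a partition of an undirected weighted graph.

    edges: iterable of (p, q, weight); community_of: {node: community label}.
    Computed per community c as internal_weight / m - (degree_sum / 2m)^2.
    """
    degree = defaultdict(float)
    internal = defaultdict(float)
    total = 0.0
    for p, q, w in edges:
        total += w
        degree[p] += w
        degree[q] += w
        if community_of[p] == community_of[q]:
            internal[community_of[p]] += w
    comm_degree = defaultdict(float)
    for node, d in degree.items():
        comm_degree[community_of[node]] += d
    return sum(internal[c] / total - (comm_degree[c] / (2 * total)) ** 2
               for c in comm_degree)

# Two triangles joined by a single bridge edge, all weights 1.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

A local move is accepted only if it raises this value; here the two-triangle split scores higher than placing all nodes in one community.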

Next, the local moving heuristic step is applied to the community structure. Here, each node in the subnetwork is assigned to its own singleton community.

The network is then reduced: after the community structure has been constructed, the communities detected in the subnetwork become the nodes of the reduced network, and the detected communities are placed in the corresponding subnetworks. The reduced network is passed as input to the recursive call, and the process continues until a final network structure that cannot be reduced further is obtained. The optimal community detected is denoted as ζ^(d). Finally, all the detected communities are visualized and topic modelled using PCC-LDA. The pseudocode for the proposed PCC-LDA is given below:

PSEUDO CODE FOR PCC-LDA:
Input: Detected Communities
Output: Visualized and Topic modelled communities
Begin
  Initialize multinomial distribution for topic (η^(τ)), multinomial distribution for document (η^(δ)), multinomial distribution for word (η^{ω^a}), maximum iteration (l_max)
  Set l = 1
  While (l ≤ l_max) do
    Select multinomial distribution for the topic (η^(τ))
    Select multinomial distribution for the document (η^(δ))
    Select multinomial distribution for the word (η^{ω^a})
    If (τ ∈ { 1, 2, 3, . . . , t }) {
      Select a topic x^a from η^(δ)
    } Else {
      Repeat
    } End if
    If (a ∈ { 1, 2, 3, . . . , A^δ }) {
      Select a word ω^a from (η^{ω^a})
    } Else {
      Repeat
    } End if
    Evaluate probability of the Pearson Correlation Co-efficient (PCC)
    $P (ω | s, t) = \int P (υ | s) (\frac{(\prod_{a = 1}^{A} P (x^{a} | s) - ℏ) (\prod_{a = 1}^{A} P (ω | υ, t) - ξ)}{\sqrt{\prod_{a = 1}^{A} P {(x^{a} | s)}^{2} P {(ω | υ, t)}^{2}}}) ds$
    If (PCC ≠ satisfied) {
      Set l = l + 1
    } Else {
      Terminate
    } End if
  End while
  Return Visualized and Topic modelled communities
End

4.10 Visualization and Topic Modeling by PCC-LDA

Visualization: Next, the detected communities are visualized. Commonly used visualization techniques include graph-based representation, matrix-based representation, and a combination of the two. Despite their advantages, each representation has drawbacks, such as edge crossings and the superposition of nodes and edges. Thus, for an effective visualization process, the networkx/Gephi/iGraph tooling is utilized with Python in the proposed work.

Topic Modeling by PCC-LDA:

Also, by employing PCC-LDA, the topics are extracted from the detected communities. Latent Dirichlet Allocation (LDA) is a well-known topic modelling methodology that represents each document as a random mixture over a set of latent topics, with each topic depicted as a probability distribution over the words of a vocabulary. The words with greater probabilities provide a better idea of the particular topic. LDA relies on word co-occurrences within documents, which might affect the modeling accuracy. Thus, to alleviate this issue, the correlation between the words in a topic is determined by employing the Pearson Correlation Coefficient (PCC), and the steps in PCC-LDA are described below.

Primarily, a multinomial distribution η^(τ) with parameter φ is selected from the Dirichlet distribution for each topic τ (τ ∈ { 1, 2, 3, . . . , t }).

Then, a multinomial distribution η^(δ) with parameter φ is selected for each document δ (δ ∈ { 1, 2, 3, . . . , T }).

Next, for each word position a (a ∈ { 1, 2, 3, . . . , A^δ }), starting from the first document, a topic x^a is selected from η^(δ) and then a word ω^a is selected from η^{x^a}. Thus, the probability of the corpus is modelled in the upcoming formula [11],

$\begin{matrix} P (ω | s, t) = \int P (υ | s) \\ (\frac{(\prod_{a = 1}^{A} P (x^{a} | s) - ℏ) (\prod_{a = 1}^{A} P (ω | υ, t) - ξ)}{\sqrt{\prod_{a = 1}^{A} P {(x^{a} | s)}^{2} P {(ω | υ, t)}^{2}}}) ds \end{matrix}$ (9)

Where, s and t imply the corpus-level parameters, υ signifies the variables in the document, and the term inside the bracket signifies the correlation determined using PCC. Hence, the topics are efficiently modelled. The standard and proposed pseudocode for LS-SLM, the community detection step of Section 4.9, are listed below.
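The generative steps and the correlation term can be illustrated with a small Python sketch that uses only the standard library. It is a toy under stated assumptions, not the PCC-LDA procedure of this paper: the helper names `dirichlet`, `generate_document`, and `pearson` are hypothetical, the Dirichlet parameters and topic-word probabilities are made-up values, and the way the correlation would feed back into Eq. (9) is not shown.

```python
import math
import random


def dirichlet(alpha, rng):
    """Sample a probability vector from a Dirichlet distribution."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, doc_alpha, length, rng):
    """Draw one document: a topic per position, then a word from that topic."""
    doc_topic = dirichlet(doc_alpha, rng)            # document-topic mixture
    words = []
    for _ in range(length):
        topic = rng.choices(range(len(doc_topic)), weights=doc_topic)[0]
        vocab = range(len(topic_word[topic]))
        words.append(rng.choices(vocab, weights=topic_word[topic])[0])
    return words


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0                                   # undefined case mapped to 0
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
# Hypothetical topic-word distributions over a 4-word vocabulary.
topic_word = [[0.6, 0.3, 0.05, 0.05], [0.05, 0.05, 0.3, 0.6]]
docs = [generate_document(topic_word, [0.5, 0.5], 30, rng) for _ in range(20)]
# Per-document counts of word 0 and word 1, then their correlation.
counts0 = [doc.count(0) for doc in docs]
counts1 = [doc.count(1) for doc in docs]
print(pearson(counts0, counts1))
```

Words that belong to the same topic tend to rise and fall together across documents, which is the kind of relationship a correlation measure over word counts can expose.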

Standard pseudo code for LS-SLM

Procedure LS-SLM
  Initialize modularity function, hidden relations among nodes, number of expected edges, maximum iteration, final network structure, and reduced network
  Set iteration counter = 1
  While iteration counter <= maximum iteration
    For each node in the graph
      Move the node to its neighboring community
      Execute the modularity function
      If modularity increases
        Maintain the current community
      Else
        Move the node back to the previous community
      End If
    End For
    Construct community structure
    Reduce the network
    Place detected communities in the corresponding subnetworks
    If convergence criteria are met
      Terminate the algorithm
    Else
      Increment iteration counter
    End If
  End While
  Return detected optimal communities
End Procedure

PROPOSED PSEUDO CODE FOR LS-SLM:
Input: Constructed graph and similarity score
Output: Detected optimal community
Begin
  Initialize modularity function (ℵ), hidden relation among the nodes β (·), number of expected edges (W_pq), maximum iteration (j_max), final network structure (F_s), reduced network (R_n)
  Set j = 1
  While (j ≤ j_max) do
    Move every node to its neighbouring community
    Execute the modularity function (ℵ),
    $ℵ = \frac{1}{2 e} \sum_{pq} (E_{pq} - W_{pq}) β ((E_{pq} - ς^{p}) / (ς^{p} - ς^{q}))$
    Obtain an improved community structure
    If (ℵ == good quality) {
      Maintain own community
    } Else {
      Move back to the previous community
    } End if
    Construct community structure
    Reduce the network
    Place detected communities in the corresponding subnetworks
    If (F_s = R_n) {
      Terminate
    } Else {
      Set j = j + 1
    } End if
  End while
  Return detected optimal communities
End
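The node-moving loop in the pseudocode can be sketched in Python as below. This covers only a greedy local moving pass driven by the standard modularity; the network reduction, the subnetwork step, and the β (·) term of LS-SLM are deliberately left out, so it should be read as a simplified illustration rather than the proposed algorithm. The names `modularity` and `local_moving` are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Standard modularity of an undirected, unweighted graph."""
    e = len(edges)
    degree, internal = defaultdict(int), defaultdict(int)
    for p, q in edges:
        degree[p] += 1
        degree[q] += 1
        if community[p] == community[q]:
            internal[community[p]] += 1
    total = defaultdict(int)
    for node, k in degree.items():
        total[community[node]] += k
    return sum(internal[c] / e - (total[c] / (2 * e)) ** 2 for c in total)


def local_moving(edges, max_iter=10):
    """Move each node to the neighbouring community that most raises modularity."""
    nodes = sorted({x for edge in edges for x in edge})
    neighbours = defaultdict(set)
    for p, q in edges:
        neighbours[p].add(q)
        neighbours[q].add(p)
    community = {n: n for n in nodes}            # start from singletons
    best = modularity(edges, community)
    for _ in range(max_iter):
        moved = False
        for node in nodes:
            original = community[node]
            best_target = original
            for target in sorted({community[nb] for nb in neighbours[node]}):
                if target == original:
                    continue
                community[node] = target         # tentative move
                score = modularity(edges, community)
                if score > best + 1e-12:         # keep only strict improvements
                    best, best_target = score, target
            community[node] = best_target        # commit, or move back
            moved = moved or best_target != original
        if not moved:                            # no node changed: converged
            break
    return community, best


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(local_moving(edges))
```

On the toy graph of two triangles joined by one edge, the loop settles on the two triangles as separate communities. Recomputing the full modularity for every tentative move keeps the sketch short but would be too slow for a citation network of the size used here.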

5 Results and discussion

This section presents the performance analysis of the proposed technique, which is implemented in Python. Figs. 2 and 3 show the simulation graphs for the whole community detection analysis of the proposed methodology.

Fig. 2

Citation (directed) graph.

Fig. 3

Simulation output of the community detection for various methods (a) SLM (b) SBM (c) FGA (d) LA and (e) the proposed method.

Fig. 2 depicts the citation (directed) graph produced for the proposed article recommender system. A citation graph is a directed graph that describes the citations within a collection of documents. The citation graph was constructed from the id, title, reference, and year attributes. From all abstracts, the standard abstracts are retained by filtering on length, keeping those with more than 900 and fewer than 1500 characters.
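A minimal Python sketch of these two preprocessing steps is given here. The record layout is an assumption made for illustration: the keys "id", "references", and "abstract" are hypothetical stand-ins for the attributes named above, and `build_citation_graph` and `filter_abstracts` are not functions from the proposed system.

```python
def build_citation_graph(papers):
    """Directed graph as adjacency lists: an edge p -> q means p cites q."""
    known = {paper["id"] for paper in papers}
    graph = {paper["id"]: [] for paper in papers}
    for paper in papers:
        for ref in paper.get("references", []):
            if ref in known:                 # ignore citations outside the set
                graph[paper["id"]].append(ref)
    return graph


def filter_abstracts(papers, low=900, high=1500):
    """Keep papers whose abstract has more than `low` and fewer than `high` characters."""
    return [p for p in papers if low < len(p.get("abstract", "")) < high]


# Toy records with made-up content.
papers = [
    {"id": 1, "title": "A", "year": 2019, "references": [2, 99], "abstract": "x" * 1000},
    {"id": 2, "title": "B", "year": 2018, "references": [], "abstract": "x" * 500},
    {"id": 3, "title": "C", "year": 2020, "references": [1, 2], "abstract": "x" * 1500},
]
print(build_citation_graph(papers))
print([p["id"] for p in filter_abstracts(papers)])
```

Dropping references that point outside the loaded set is a design choice of this sketch; whether the original pipeline keeps such dangling citations is not stated.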

Fig. 3 (e) shows the community detection output for the proposed LS-SLM method. The inputs for community detection are the citation (directed) graph and the similarity score. Community detection was performed using the LS-SLM technique in order to reveal the hidden relations among the nodes in the network.

5.1 Database description

The citation network dataset “dblp. v12.json” is deployed. The citation data is extracted from (A) DataBase systems and Logic Programming (DBLP), (B) Association for Computing Machinery (ACM), (C) Microsoft Academic Graph (MAG), and (D) other sources. The 1^st version encompasses 629,814 papers along with 632,752 citations. Every paper is associated with (A) abstract, (B) authors, (C) year, (D) venue, and (E) title. This data set can be used for clustering with network side information and for studying influence in citation networks. It is also utilized for detecting the most influential papers and for topic modeling analysis.

5.2 Performance analysis

The proposed LS-SLM technique’s performance is compared with the existing Louvain Algorithm (LA), Stochastic Block Model (SBM), Fast Greedy Algorithm (FGA), and Smart Local Moving (SLM) with respect to several performance metrics.

Fig. 4 depicts the performance analysis of the proposed LS-SLM centred on the accuracy and precision metrics. For accuracy, the LS-SLM obtained 96.083%, which is 3.26% higher than the existing SLM and 16% higher than the existing LA; LA shows the lowest accuracy and the proposed LS-SLM the highest. For precision, the LS-SLM attained 96.3265%, which is 8.6% higher than the existing FGA. Thus, for accuracy and precision, the LS-SLM attains superior results.

Fig. 4

Performance analysis based on accuracy and precision.
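For reference, the classification-style metrics used in this section follow the usual confusion-matrix definitions, sketched below in Python. How true and false positives are counted for detected communities is not specified here, so the counts in the example are made-up values and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (returned as fractions)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # recall and sensitivity are the same quantity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, for illustration only.
for name, value in classification_metrics(tp=90, fp=10, tn=85, fn=15).items():
    print(f"{name}: {100 * value:.2f}%")
```

Because sensitivity and recall share one definition, they necessarily take the same value for a given method.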

In Table 1, the proposed LS-SLM technique’s modularity performance is shown. For modularity, the LS-SLM acquires 95.6119%; while the prevailing SLM, FGA, SBM, and LA attain values of 92.7733%, 87.2515%, 85.9740%, and 81.5393%, respectively. Thus, when compared to the other existing methods, the LS-SLM technique attains better modularity value.

Table 1

Modularity (MOD) analysis for the proposed technique with the existing techniques

Techniques	Modularity (%)
Proposed LS-SLM	95.6119
SLM	92.7733
FGA	87.2515
SBM	85.9740
LA	81.5393

In Fig. 5, the proposed LS-SLM’s sensitivity analysis is shown. The proposed LS-SLM technique obtains the highest sensitivity value of 96.5487% and the existing LA technique obtains the lowest sensitivity value of 82.3256%. Thus, the proposed LS-SLM technique achieves a good sensitivity value.

Fig. 5

Sensitivity analysis for the proposed LS-SLM technique with the existing techniques.

In Fig. 6, the graphical representation of the proposed LS-SLM technique based on specificity and recall is shown. For specificity, the LS-SLM acquired 96.5478%, which is 7.6% higher than the existing FGA technique and 16% higher than the existing LA technique. For recall, the LS-SLM attained 96.5487%, which is 13% superior to the existing SBM technique. Hence, the proposed LS-SLM technique attains a better result for both the specificity and recall metrics.

Fig. 6

Demonstration of the proposed technique with the existing techniques based on specificity and recall.

The normalized mutual information (NMI) analysis for the proposed LS-SLM technique is depicted in Table 2. The NMI value for the proposed LS-SLM is 96.6365%. The existing SLM attains 93.5668%, which is about 2.7 percentage points higher than the existing FGA technique (90.8520%) but about 3.1 percentage points lower than the proposed LS-SLM technique. Thus, when compared to the prevailing methodologies, the LS-SLM achieves superior results.

Table 2

Analysis of the proposed technique with existing techniques based on Normalized Mutual Information (NMI)

Techniques	NMI (%)
Proposed LS-SLM	96.6365
SLM	93.5668
FGA	90.8520
SBM	87.5772
LA	83.3958
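As a reminder of what NMI measures, the Python sketch below compares two community assignments of the same nodes. It uses the arithmetic-mean normalisation, 2 I(A; B) / (H(A) + H(B)), which is one of several conventions in use; the convention behind Table 2 is not stated, and the function name `nmi` is hypothetical.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same items."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum((c / n) * math.log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if entropy_a == 0 and entropy_b == 0:
        return 1.0                   # both labelings are a single community
    return 2 * mutual / (entropy_a + entropy_b)


# Same partition under different label names, then an unrelated partition.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

The score depends only on how the nodes are grouped, not on the label names, so relabelled but identical partitions score 1 and independent partitions score 0.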

In Fig. 7, the performance analysis for the proposed LS-SLM technique based on F-Score is displayed. For F-Score, the LS-SLM acquired 95.4512%, while the prevailing SLM, FGA, SBM, and LA attained 91.3769%, 88.4143%, 84.5976%, and 83.2021%, respectively. Thus, the LS-SLM achieves better results for the F-Score performance metric.

Fig. 7

Performance analysis based on F-Score.

The graphical representation of community detection time analysis for the proposed LS-SLM technique is depicted in Fig. 8. For community detection, the LS-SLM attained 38652 ms, which is 14895 ms lower than the existing SBM, 19995 ms lower than the existing LA technique, 8562 ms lower than the existing FGA, and 3504 ms lower than the existing SLM technique. Thus, the proposed technique achieves lower community detection time.

Fig. 8

Graphical representation of the proposed LS-SLM technique with existing techniques based on community detection time.

In Table 3, the BTC analysis for the proposed LS-SLM technique is shown. Regarding BTC, the proposed LS-SLM obtains a value of 94.6650%, which is 5.74% higher than the existing FGA technique and 15.1% higher than the existing LA technique. Thus, the LS-SLM attains a good BTC value.

Table 3

Betweenness Centrality (BTC) analysis

Techniques	BTC (%)
Proposed LS-SLM	94.6650
SLM	91.9825
FGA	89.5220
SBM	86.6836
LA	82.2351
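For readers unfamiliar with the measure, the sketch below computes unnormalised betweenness centrality with Brandes' algorithm on a small undirected, unweighted graph in Python. How node-level centrality is aggregated into the percentage reported in Table 3 is not described, so this only illustrates the underlying quantity; the function name `betweenness_centrality` and the adjacency-list input are hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm; adj maps each node to a list of its neighbours."""
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        order = []                               # nodes in order of discovery
        preds = {v: [] for v in adj}             # shortest-path predecessors
        sigma = {v: 0 for v in adj}              # number of shortest paths
        dist = {v: -1 for v in adj}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                             # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:                             # accumulate dependencies
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


# A star graph: every shortest path between leaves passes through node 0.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(betweenness_centrality(star))
```

A node with high betweenness lies on many shortest paths and therefore controls much of the information flow between other nodes.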

Fig. 9 depicts the detected communities along with the generated topics and the corresponding article IDs. For example, the topic “context-dependent automatic textile image annotation employing networked knowledge” is generated with ID number 24270 under the image processing community. Likewise, various other topics are covered under other communities. The detected communities are then ranked, and the high-ranked communities are preferred for topic modeling, which in turn is utilized for future reference.

Fig. 9

Performance measurement based on the communities detected.

5.3 Comparative analysis with the literature papers

The proposed LS-SLM’s performance is compared with results taken from the literature, namely Global Citation Recommendation employing Generative Adversarial Network (GCR-GAN) [26] and Citation Recommendation wielding Heterogeneous Bibliographic Network Embedding (CR-HBNE) [16].

In Table 4, the proposed LS-SLM technique is compared with these techniques from the literature based on precision. For precision, the LS-SLM acquires 96.32%, which is 56.11% higher in relative terms than the existing GCR-GAN (61.7%). Thus, the LS-SLM achieves a better result for precision.

Table 4

A comparative analysis of the proposed LS-SLM technique based on precision

Techniques	Precision (%)
Proposed LS-SLM	96.32
GCR-GAN	61.7
CR-HBNE	45.9

Benchmarking Against Established Methods: Our LS-SLM and PCC-LDA methods have been compared with the Louvain Algorithm, Stochastic Block Model, Fast Greedy Algorithm, and Smart Local Moving. These methods are benchmarks in the field, and our comparison aligns with practices seen in related studies. For example, the approach mirrors that of Pradhan and Pal [21], where a content and network-based academic venue recommender system is evaluated against leading methodologies.

Evaluation Using Key Performance Metrics: The evaluation of our methods hinges on multiple performance metrics, namely accuracy, precision, sensitivity, specificity, recall, modularity, Normalized Mutual Information (NMI), and community detection time. This multi-faceted approach is crucial for a holistic assessment of algorithmic performance, as also emphasized by Nassar, Jafar, and Rahhal [18] in their assessment of a deep multi-criteria collaborative filtering model.

Superiority of Proposed Methods: Our comparative study shows that the LS-SLM and PCC-LDA methods outperform the existing techniques in several key areas, including accuracy and precision, which are critical for reliable recommender systems. This is in line with Ali et al. [16], who underscore similar metrics in their citation recommendation study. Furthermore, our methods perform well in modularity and NMI, akin to the metrics prioritized in Yu et al. [25].

Efficiency in Community Detection Time: A notable aspect where our methods demonstrate superiority is community detection time, which is pivotal for processing large datasets efficiently. This echoes the scalability concerns addressed by Deebak and Al-Turjman [4], highlighting the relevance of our methods in big data contexts.

“FCNS: A Fuzzy Routing-Forwarding Algorithm Exploiting Comprehensive Node Similarity in Opportunistic Social Networks” [27]

Innovation and Contributions: FCNS proposes a fuzzy routing-forwarding algorithm considering both mobile and social similarities for relay node selection in Opportunistic Social Networks (OSNs). It excels in achieving an average delivery ratio of 0.85, with minimized routing delay and network overhead. FCNS contributes by improving delivery ratios and minimizing routing delay and network overhead in OSNs.

“FRRF: A Fuzzy Reasoning Routing-Forwarding Algorithm Using Mobile Device Similarity in Mobile Edge Computing-Based Opportunistic Mobile Social Networks” [28]

Innovation and Contributions: FRRF introduces a fuzzy reasoning routing-forwarding algorithm that prioritizes transmission based on movement and social similarity in Mobile Edge Computing-Based Opportunistic Mobile Social Networks (MEC-based OMSNs). Its innovation lies in achieving efficiency in energy consumption, delay, and transmission efficiency. The algorithm significantly contributes to enhancing transmission efficiency and energy efficiency in MEC-based OMSNs.

Linear Scaling Smart Local Moving (LS-SLM) Algorithm:

Innovations:

Linear Scale Optimization: LS-SLM incorporates linear scale optimization, ensuring computational efficiency in large-scale citation networks.

Smart Local Moving (SLM) Technique: LS-SLM builds on SLM-based local moving for uncovering hidden relationships, enhancing accuracy and precision in community detection.

Modularity Emphasis: LS-SLM prioritizes modularity for robust community structures in scientific citation networks.

Applicability and Limitations Addressed:

Problem-Specific Adaptation: LS-SLM tailors its approach to the unique characteristics of scientific citation graphs, addressing the limitations of generic similarity computation methods.

Network Size and Complexity: Linear scale optimization in LS-SLM adapts to the challenges posed by large-scale citation networks.

Hidden Relationship Exploration: The local moving step in LS-SLM captures local dynamics, addressing the limitation of generic similarity measures in uncovering hidden relationships.

The LS-SLM algorithm’s innovations make it a powerful tool for community detection, addressing limitations observed in existing strategies applied to opportunistic social networks and mobile edge computing-based networks, as evidenced by the advancements introduced in FRRF and FCNS.

Enhanced Comparative Analysis

This enhanced analysis is crucial for demonstrating the effectiveness and superiority of LS-SLM in the realm of community detection and topic modelling.

Modularity and Algorithm Comparison (Fig. 4 and Table 1): Our method’s comparison with established techniques like the Louvain Algorithm, Stochastic Block Model, Fast Greedy Algorithm, and Smart Local Moving, particularly focusing on modularity, aligns with the approach adopted by Nassar et al. [18] in their exploration of deep multi-criteria collaborative filtering models. Though their study concentrates on a different application area, the principle of evaluating sophisticated algorithms using key metrics is a common thread.

Sensitivity, Specificity, and NMI Analysis (Figs. 5, 6, and Table 2): The comparison on sensitivity, specificity, and Normalized Mutual Information mirrors the methodology of Ali et al. [16] in their examination of citation recommendation using heterogeneous bibliographic network embedding. The essence of the comparison lies in evaluating these algorithms on critical performance metrics, a methodological approach shared across different research contexts.

F-Score and Community Detection Time (Figs. 7, 8, and Table 3): Further, the extension of our comparison to include F-Score and community detection time is conceptually parallel to the work of H. Li [8] on personalized recommendation systems. Like Li’s study, our approach emphasizes algorithmic evaluation through a range of metrics, underpinning the robustness and versatility of the LS-SLM technique.

This comprehensive and enhanced comparative analysis underscores the LS-SLM technique’s potential and efficiency. It not only affirms the effectiveness of our proposed method but also contributes significantly to advancing the field of community detection and topic modelling, providing a reliable benchmark for future research.

6 Conclusions

This paper proposes an efficient article recommender system (RS) based on community detection with topic modelling using LS-SLM and PCC-LDA. The performance of the proposed LS-SLM technique is analysed against the prevailing SLM, FGA, SBM, and LA with respect to accuracy, precision, specificity, recall, F-Score, community detection time, modularity, Normalized Mutual Information, and Betweenness Centrality. The paper focuses on the challenge of recommending scientific articles to users based on the relevance of articles within domain-specific communities. The proposed algorithm aims to improve upon existing methods by addressing scalability issues and providing better results in terms of modularity, accuracy, NMI, and betweenness centrality.

Significance of the Problem: The problem is significant because existing algorithms face scalability issues in community detection, particularly in large networks. Efficient article recommendation requires identifying relevant topics within communities, and scalability problems hinder this process. Addressing these issues can significantly enhance the quality of article recommendations.

Appropriateness of the Method: The proposed Linear Scaling Smart Local Moving (LS-SLM) algorithm is deemed appropriate because it specifically targets scalability issues in community detection. By focusing on factors like modularity, accuracy, NMI, and betweenness centrality, the algorithm aims to provide efficient and accurate community detection, thus improving the quality of article recommendations.

Outcomes: The outcomes include improved accuracy (96.32%), precision (96.3265%), modularity (95.6119%), and sensitivity (96.5487%) compared to existing algorithms. Additionally, the proposed LS-SLM technique demonstrates superior community detection time (38652 ms) and normalized mutual information (96.6365%).

Implications of Outcomes: The improved performance metrics imply that the LS-SLM algorithm can significantly enhance the quality and efficiency of scientific article recommendation systems. This has implications for researchers and practitioners who rely on such systems for staying updated on relevant literature.

References

1. Aslan and Kaya, A Hybrid recommendation system in co-Authorship networks, in IDAP, 2019 International Conference on Artificial Intelligence and Data Processing Symposium 2019 (2019), 1–5. doi: 10.1109/IDAP.2019.8875989.

2. Renjith, Sreekumar and Jathavedan, An extensive study on the evolution of context-aware personalized travel recommender systems, Inf. Process. Manag 57(1) (2020), 102078. https://doi.org/10.1016/j.ipm.2019.102078.

3. Chaudhuri, Sarma and Samanta, Advanced Feature Identification towards Research Article Recommendation: A Machine Learning Based Approach, in TENCON 2019–2019 IEEE Region 10 Conference (TENCON) (2019), 7–12. doi: 10.1109/TENCON.2019.8929386.

4. Deebak B.D. and Al-Turjman, A novel community-based trust aware recommender systems for big data cloud service networks, Sustain. Cities Soc 61 (2020), 102274. https://doi.org/10.1016/j.scs.2020.102274.

5. Liang, et al. A Community-Based Collaborative Filtering Method for Social Recommender Systems, in 2019 IEEE International Conference on Web Services (ICWS) (2019), 159–162. doi: 10.1109/ICWS.2019.00036.

6. Waheed, Imran, Raza, Malik A.K. and Khattak H.A., A hybrid approach toward research paper recommendation using centrality measures and author ranking, IEEE Access 7 (2019), 33155–33158. doi: 10.1109/ACCESS.2019.2900520.

7. Magara M.B., Ojo and Zuva, Toward Altmetric-Driven Research-Paper Recommender System Framework, in 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS) (2017), 63–68. doi: 10.1109/SITIS.2017.21.

8. Li H., Research on Personalized Recommendation System Based on Big Data Mining Technology, in 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE) (2020), 203–206. doi: 10.1109/ICMCCE51767.2020.00052.

9. Sheu H.-S.S., Chu, Qi and Li, Knowledge-guided article embedding refinement for session-based news recommendation, IEEE Trans. Neural Networks Learn. Syst 33(12) (2022), 7921–7927. doi: 10.1109/TNNLS.2021.3084958.

10. De Medio, Limongelli, Sciarrone and Temperini, MoodleREC: A recommendation system for creating courses using the moodle e-learning platform, Comput. Human Behav 104 (2020), 106168. https://doi.org/10.1016/j.chb.2019.106168.

11. Jelodar, et al. Recommendation system based on semantic scholar mining and topic modeling on conference publications, Soft Comput 25 (2021), 3675–3696.

12. Chaudhuri, Sinhababu, Sarma and Samanta, Hidden features identification for designing an efficient research article recommendation system, Int. J. Digit. Libr 22(2) (2021), 233–249.

13. Dai, Zhu, Wang and Carley K.M., Attentive stacked denoising autoencoder with bi-lstm for personalized context-aware citation recommendation, IEEE/ACM Trans. Audio, Speech, Lang. Process 28 (2019), 553–568.

14. Wang H.-C.C., Hsu T.-T.T. and Sari, Personal research idea recommendation using research trends and a hierarchical topic model, Scientometrics 121(3) (2019), 1385–1406. doi: 10.1007/s11192-019-03258-x.

15. Habib and Afzal M.T., Sections-based bibliographic coupling for research paper recommendation, Scientometrics 119(2) (2019), 643–656.

16. Ali, Qi, Muhammad, Bhattacharyya, Ullah and Abro, Citation recommendation employing heterogeneous bibliographic network embedding, Neural Comput. Appl 34(13) (2022), 10229–10242.

17. Liu, Kou, Yan and Qi, Keywords-driven and popularity-aware paper recommendation based on undirected paper citation graph, Complexity 2020 (2020).

18. Nassar, Jafar and Rahhal, A novel deep multi-criteria collaborative filtering model for recommendation system, Knowledge-Based Syst 187 (2020), 104811.

19. Sharma, Rana and Malhotra, Automatic recommendation system based on hybrid filtering algorithm, Educ. Inf. Technol (2022), 1–16.

20. Ali, Hafeez, Humayun, Jamail N.S.M., Aqib and Nawaz, Enabling recommendation system architecture in virtualized environment for e-learning, Egypt. Informatics J 23(1) (2022), 33–45.

21. Pradhan and Pal, CNAVER: A content and network-based academic venue recommender system, Knowledge-Based Syst 189 (2020), 105092.

22. Pradhan, Gupta and Pal, Hasvrec: A modularized hierarchical attention-based scholarly venue recommender system, Knowledge-Based Syst 204 (2020), 106181.

23. Jain, Khangarot and Singh, Content-Based Filtering. Springer Singapore (2018). doi: 10.1007/978-1-4939-7131-2_100201.

24. Pradhan and Pal, A hybrid personalized scholarly venue recommender system integrating social network analysis and contextual similarity, Futur. Gener. Comput. Syst 110 (2020), 1139–1166.

25. Yu, et al. PAVE: Personalized Academic Venue recommendation Exploiting co-publication networks, J. Netw. Comput. Appl 104 (2018), 38–47.

26. Ali, Qi, Muhammad, Kefalas and Khusro, Global citation recommendation employing generative adversarial network, Expert Syst. Appl 180 (2021), 114888.

27. Liu, Chen, Wu and Wang, FCNS: A fuzzy routing-forwarding algorithm exploiting comprehensive node similarity in opportunistic social networks, Symmetry 10(8) (2018), 338.

28. Zhang, Chen, Wu and Liu, FRRF: A fuzzy reasoning routing-forwarding algorithm using mobile device similarity in mobile edge computing-based opportunistic mobile social networks, IEEE Access 7 (2019), 35874–35889.

29. Zhang, Hu and Bian, Research on string similarity algorithm based on Levenshtein Distance, in 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) (2017), 2247–2251.

30. Abdelrazek, Eid, Gawish, Medhat and Hassan, Topic modeling algorithms and applications: A survey, Information Systems 112 (2023), 102131.

31. Jiang, Shi, Liu, Yao and Ali M.E., User interest community detection on social media using collaborative filtering, Wireless Networks (2022), 1–7.

32. Vanchinathan, Valluvan K.R., Gnanavel and Gokul, Numerical simulation and experimental verification of fractional-order PI λ controller for solar PV fed sensorless brushless DC motor using whale optimization algorithm, Electric Power Components and Systems 50(1–2) (2022), 64–80.
