A cognitive domain specific framework integrating large language model for COVID-19 vaccine sentiment analysis

Abstract

In this cognitive era, vast amounts of data are accumulated every day, and analysing such unstructured information to obtain insights is challenging. Large language models (LLMs) have been developed to support the analysis of extensive data corpora. However, they tend to hallucinate when they lack proper knowledge sources, and the challenge grows in domains such as health care or finance because of the lack of domain specificity. COVID-19 sentiment analysis is an important and complex task for governments, which need to know public opinion in order to take necessary measures. This paper presents COVID-19 retrieval-augmented fine-tuning (RAFT), a novel framework that analyses COVID-19 vaccine tweets through retrieval-augmented approaches. The framework integrates domain-specific knowledge from external knowledge sources through retrieval-augmented generation. We employ a transformer-based semantic approach for embedding generation, with the embeddings stored in a vector database. The framework exhibited generalizability when integrated with domain knowledge. It uses parameter-efficient fine-tuning with quantization to reduce the number of trainable parameters of the large language model, which allows the model to be used on resource-constrained devices. The framework achieved an accuracy of 0.886 on the Twitter dataset containing tweets specific to the Indian region and 0.912 on the Twitter dataset with tweets from the global region.

Keywords

Textual sentiment analysis; language model; retrieval augmented generation; fine tuning

1. Introduction

1.1 Background

Comprehending public attitudes regarding COVID-19 vaccines is essential for governments, healthcare organizations and researchers to evaluate vaccine hesitancy, the dissemination of misinformation, and the level of public confidence in vaccination initiatives. Textual sentiment analysis (TSA) of COVID-19 vaccines provides valuable insights into public perceptions, which reveal a mix of hesitancy, positivity and misinformation.¹,² Nevertheless, evaluating such large-scale unstructured text data poses a number of difficulties, such as bias, inconsistent facts and a lack of domain-specific expertise.

Currently, large language models (LLMs) such as GPT³ and LLaMA⁴ play a pivotal role in shaping the retrieval of information from massive amounts of data. Many LLMs perform well in downstream language processing tasks, and zero-shot and one-shot learning are notable settings in which these models offer remarkable results. However, directly applying these models to social media data may lead to hallucination.⁵ To maximize the performance gains from these models, domain-specific data and fine-tuning are essential.

Two important strategies are being investigated extensively to reduce hallucinations and enhance factual consistency:

Fine-tuning LLMs on domain-specific data to match task-specific specifications.⁶

Retrieval-Augmented Generation (RAG), which incorporates retrieval of external knowledge to ground LLM responses in reliable sources such as government-approved data,⁷ WHO reports and CDC reports.

For COVID-19 vaccine sentiment analysis, fine-tuning enables the model to understand vaccine-related discussions from social media. However, depending only on fine-tuning could lead to inaccurate information and restricted generalization. On the other hand, if RAG is used without any fine-tuning, domain relevance may be diminished since the model might place too much emphasis on external knowledge and ignore the context found in user-generated content. To address these issues, we introduce a framework called Retrieval-Augmented Fine-Tuning (RAFT), which combines the two methods to improve domain-specific accuracy while preserving the consistency of the model. To enhance decision-making, our RAFT approach integrates sentiment categorization of tweets with retrieval from reliable sources, including government health recommendations and WHO reports. We also improve the RAG mechanism by adding confidence-weighted retrieval, which prioritizes highly relevant and reliable information.

A vector database is used to store the embeddings generated from the documents. The embedding of the question (i.e., the prompt) is compared with the embeddings of the document chunks, and the best-matching chunks are fed to the LLM to generate an answer. Our RAG model is instructed to answer with ‘Positive’, ‘Negative’ or ‘Neutral’. By striking a balance between domain-based retrieval and user sentiment extraction, this hybrid technique maximizes the model's performance.
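The retrieval step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the bag-of-words `embed` function merely stands in for a transformer sentence embedding, the in-memory list stands in for a vector database, and the chunk texts, function names and prompt wording are hypothetical placeholders.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a transformer sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_rag_prompt(tweet, chunks, k=2):
    """Combine the retrieved context with the tweet and restrict the answer space."""
    context = "\n".join(retrieve(tweet, chunks, k))
    return (
        "Context:\n" + context + "\n\n"
        "Tweet: " + tweet + "\n"
        "Answer with exactly one of: Positive, Negative, Neutral."
    )

# Hypothetical placeholder chunks; real chunks would come from the knowledge sources.
CHUNKS = [
    "Vaccines reduce the risk of severe illness from COVID-19.",
    "Wash hands regularly and wear a mask in crowded places.",
    "Mild side effects such as fever can follow vaccination.",
]

if __name__ == "__main__":
    print(build_rag_prompt("are covid-19 vaccines safe", CHUNKS, k=1))
```

In a full system, the sorted scan would be replaced by a nearest-neighbour query against the vector database, and the prompt would be passed to the LLM.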

Although domain-specific datasets can be used for sentiment analysis of COVID-19 vaccinations, depending solely on user-generated content presents a number of difficulties, such as bias, inaccuracy and a lack of scientific context. Conventional fine-tuning on these datasets runs the risk of overfitting and of failing to generalize to fresh conversations about vaccines. We propose a hybrid COVID Retrieval-Augmented Fine-Tuning (RAFT) framework, which combines model fine-tuning and external knowledge retrieval to address these problems by allowing the model to learn from reliable sources such as the WHO and scientific literature. Because RAFT improves factual consistency and adaptability relative to conventional fine-tuning techniques, it makes sentiment classification more dependable for practical uses in public health and policy analysis, helping it stay robust, explainable and resistant to false information.

1.2 Our contribution

In this proposed investigation, our main contributions can be distilled as follows:

▪ We propose a novel RAFT framework customized for textual sentiment analysis of COVID-19 vaccines. By providing external domain-specific resources, the hallucinations of LLMs are reduced.
▪ Fine-tuning LLMs with parameter-efficient tuning and Quantized LoRA⁸ on social media tweets enables the model to understand public perceptions of the COVID-19 vaccine.
▪ We include Retrieval-Augmented Generation with a customized semantic sentence similarity.
▪ RAFT provides better domain adaptation, which enhances the reliability of sentiment classification.

1.3 Organization of paper

The remainder of the paper is organized as follows: Section 2 discusses related work. Section 3 describes the proposed system design. Section 4 discusses the evaluation approach for the system. Section 5 presents the conclusion and summary of the work.

2. Related work

This section discusses domain-specific and LLM-based sentiment analysis approaches in three categories. The first category includes approaches that use domain-specific knowledge from different domains, such as health and finance. The second category includes approaches that use deep learning and transfer learning-related methods. The third category includes approaches that use an LLM as a pretrained model, involving fine-tuning or RAG.

2.1 Domain specific approaches

Aye Hninn Khine et al. proposed a new word embedding technique that learns the context from medical-related⁹ sources and generates a domain-specific embedding, which is then passed to the convolution layer to learn the features. It achieves a maximum accuracy of 99.1% when the concatenated embeddings are tested with the medical review dataset. A new method to address word sense disambiguation in the Portuguese language was proposed¹⁰ by using a fine-tuned version of BERT, which was trained on a task- and language-specific dataset and achieved 84% accuracy. To improve the domain-specific task, packages such as RSENTIMENT and STNZA have been developed, and the related evaluation measures were discussed by Amin Mahmoudi et al.¹¹ Bi-LSTM with capsule networks trained on a specific knowledge base performed well in hate speech identification and achieved an F1 score of 0.94.¹² A GUI-based tool was developed to support the coreference resolution of entities and events in multiple languages via node-based special annotations.¹³ To increase the robustness of sentiment analysis, perturbations must be handled to leverage quantification. The method used to address this problem involves a fusion attention mechanism with graph embedding, which interprets the alteration and improves the accuracy by 93%.¹⁴ Semantic linguistic rules also play a vital role in addition to semantic embedding via an attention mechanism, and they increase the performance of the system.¹⁵ Hongtu Xie et al.¹⁶ proposed a dialogue generation model based on an external knowledge base by preserving entity-aware intention in dialogs with a two-stage training method and achieved 94% accuracy. To enhance the contextual understanding of e-commerce Chinese text, a unique embedding attention capsule network¹⁷ was developed, and it achieved 82% accuracy. The context of sentiment also changes over time.
To meet this challenge, a method was proposed to utilize knowledge graph embedding, which was concatenated with BiLSTM embedding and further processed with attention, and it obtained 84.72% accuracy with the REST14 dataset.¹⁸ Perception capture systems have also focused on specific cases, such as depression detection and vulnerability detection. In today's AI landscape, there is a significant demand for enhancing depression detection in social media forums. Detection, particularly in Weibo comments, was demonstrated by using domain knowledge-enhanced lexicons and an ensemble of eight machine learning models and achieved 94% accuracy.¹⁹ During the COVID-19 pandemic, depression rates were high due to constraints such as social isolation and financial instability. A multimodal framework was developed to detect depression by using extrinsic features taken from URLs and a visual feature vector generated by mining a user's post and related tweets, resulting in 91% accuracy.²⁰ A new lexicon generation model was created to incorporate vocabulary extension and porting to improve bilingual emotion detection, with an accuracy of 73.24.²¹ The financial domain significantly contributes to sentiment analysis by providing insights into market trends and economic indicators. This case demands financial domain-specific lexicons or models. Prompt fine-tuning based on a lexicon was crafted instead of using an external knowledge base. In addition, the model also uses a pretrained model for fine-tuning, yielding the lowest MAE of 0.384.²² To handle domain-specific texts, emotion tagging with multiple labels,²³ the weighted voting technique,²⁴ and explainable specific lexicons²⁵ were established.

Extracting valuable information from COVID-19 pandemic or vaccine-related tweets demands specialized domain-specific techniques. The mask attention model²⁶ was designed to identify the presence of personal health-related words. Cong Li et al. analysed the epidemic season and developed an information diffusion model using a Markov chain to understand the impact of vaccination.²⁷ Sentiment analysis is crucial during epidemic periods.²⁸ Language-specific medical entities were infused into the traditional Unified Medical Language System to enhance sentiment detection.²⁹ Xiangpeng Wan et al. discussed the evolving conversation around COVID-19 through topic modeling and analysed American digital news via the BART algorithm.³⁰

Other domain-related aspects were also prominently featured in the research, shedding light on their significance and implications. Noor Afiza Mat Razali et al. created a methodology employing hybrid lexicons and machine learning models for predicting political security threats.³¹ A DNA fine-tuned language model³² was built to determine the sequence specificity of DNA-binding proteins since the majority of DNA dependencies were eliminated via standard one-hot encoding. The Holy Books-specific model³³ and language-specific models³⁴ have garnered attention for their specialized approach. The Portuguese language-specific perception capture system yielded an F1 score of 0.91. The modified switch-based transformer³⁵ was designed to address Arabic sentiment analysis and attained an F1 score of 0.83.³⁶

2.2 Transformer-based and deep learning-based approaches

Table 1 highlights the related methods that use deep learning and transformer-based models employed in sentiment analysis.

Table 1.
Related methods based on transformer based models.

Ref	Model	Dataset	Significant factor	Evaluation Measures
³⁷	Offensive Language Detection	Crawled from many sources	Fine tuned BERT handled bilingual data	91% accuracy
³⁸	Neutrosophic Logic (NL) for handling ambiguity	Dataset which contains Tweets related to World Football club	uncertainty classifiers are tested with Type 1 and Type 2 Fuzzy logic	16.02% low level ambiguity recorded in NL
³⁹	sarcasm identification in Arabic language	4 dataset which contains the tweet in Arabic language	Hybrid DL model consists of BERT and Word2Vec	Highest F1-score 0.867
⁴⁰	Threat detection	Crawled data from different sources	Transformer based Deep learning model	97% accuracy with BERT
⁴¹	Fake review Detection	YelpZIP,YelpChi,OTT dataset	sentiment intensity was analysed from semi supervised positive unlabeled data	87% accuracy
⁴²	Method to handle Catastrophic forgetting in neural network	SemEval14,HL5Domain,Ding9Domain dataset	Tuning method and sentiment polarity voting method was combined	91% accuracy
⁴³	Hyperparameter tuning	KOSHA classification dataset	achieved using Grid search and cross validation	78% accuracy
⁴⁴	Deep sentiment analysis	Kaggle dataset which contains product reviews	RNN with decision based classification includes aspect and priority	89% accuracy
⁴⁵	Aspect level sentiment detection	SemEval-2014, SemEval-2015, SemEval-2016	BILSTM model with dependency weights	Recall 0.94
⁴⁶	context vector construction	Crawled dataset	two levels of fine tuning that can handle coreference resolution	90% accuracy
⁴⁷	Deal with Neural Coreference resolution	OntoNotes dataset	ELECTRA-Large model was employed	Average F1-score 81.2
⁴⁸	Few shot learning for Stance detection	P-Stance dataset	used multimodal embedding with graph embedding	Average F1 score 0.71
⁴⁹	Sentiment analysis for Image – text pair	MVSA dataset	Memory enhanced cross attention was employed	78.27% accuracy
⁵⁰	Classifying sentiment in Political Communication	ISEAR dataset	Fine tuned BERT with Hungarian language	0.71 F1-score
⁵¹	Enhanced sentiment detection by handling lengthy words	Crawled dataset	unique lexicon was crafted	0.96 F1-score
⁵²	Solution for coreference resolution	MUC,B3 dataset	CNN with GRU, attention mechanism was used	72.8 F1-score
⁵³	traceability links prediction	CodeSearchNet dataset	Fine tuned CodeBERT	F1-score 0.769
⁵⁴	Patent Classification	USPTO-3M dataset – crawled from Patent sources	fine tuned BERT	81.75 Precision
⁵⁵	Analyse tweet related to energy prices	crawled data source, SemEval-14	BERTopic and Deep learning model BiLSTM and CNN was used	81.94 with semEval14
⁵⁶	Text Analysis tool for Arabic	Dataset for basic NLP tasks in Arabic	Linguistic words related to the language was constructed	Lexical density was measured
⁵⁷	Topic enhanced sentiment classification	semEval 2014 dataset	Fine tuned BERT	85.65% accuracy
⁵⁸	COVID vaccine tweet analysis	4 publicly available Kaggle dataset was used	deep learning models	91.25% accuracy
⁵⁹	COVID vaccine tweet analysis across countries	twitter and Weibo posts	Across country, sentiment was analysed	Changes in emotion level was measured
⁶⁰	Detection of COVID 19 related Rumors in Arabic	ArCov dataset	CNN and Bi-LSTM models with attention was employed	91.5% accuracy
⁶¹	Emotional analysis on COVID 19 news	Crawled dataset (REACT dataset)	Topic modelling and deep learning models were used	Emotion was measured at different aspects.
⁶²	COVID 19 vaccine tweet analysis in Italy	crawled tweets from Jan 2021 to Feb 2022	Used lexicon based approach	sentiment trends were analysed
⁶³	COVID 19 vaccine tweet analysis	Kaggle COVID 19 vaccine dataset	LDA with Clustering algorithm	Topics were clustered
²⁸	Health related topic analysis during pandemic	Twitter dataset	Review of related techniques	Mental health during pandemic was discussed

Recent advancements in transformer architectures have significantly influenced both vision and language domains, driven by their ability to model long-range dependencies, dynamic context interactions, and hierarchical attention patterns. In the visual domain, WaveFormer⁶⁴ incorporates wavelet-based decomposition into transformer attention to enhance noise resilience in video inpainting, while ViGT⁶⁵ uses learnable tokens that dynamically attend to key regions to exhibit proposal-free video grounding. BVINet⁶⁶ takes this further by achieving blind video inpainting with zero annotations, leveraging masked token modeling similar to BERT-style self-supervised learning. In order to condition transformer-based image editing, Prompt-Aware Controllable Shadow Removal⁶⁷ incorporates textual prompts into the vision pipeline, demonstrating cross-modal flexibility.

Additionally, transformers can capture complicated temporal patterns and refine ambiguous samples using class prototypes, as demonstrated by hybrid temporal modeling for repetitive action counting⁶⁸ and prototypical calibration for micro-action detection.⁶⁹ When taken as a whole, these pieces show a developing trend in transformer adaptation for context-sensitive, fine-grained tasks in both spatial and temporal dimensions.

Parallel to these developments in vision, the textual domain has seen complementary innovations that similarly enhance transformer capabilities through retrieval, pretraining, and few-shot adaptation. While KILT⁷⁰ created a benchmark suite for knowledge-intensive activities to standardize evaluation, REALM⁷¹ and RETRO⁷² offered retrieval-augmented generation paradigms that integrate external knowledge into language models to increase factual accuracy and domain generalization. RAG frameworks for claim verification in the scientific realm were expanded by SciFact-RAG,⁷³ which demonstrated the value of integrating retrieval with generative reasoning in specific contexts.

DeBERTaV3 with prompt tuning⁷⁴ also showed that soft prompt-based transformers can perform well in few-shot sentiment categorization, particularly in domain-specific or low-resource settings. Together with the architectural patterns seen in vision, these pieces show a path toward convergence where token-level control, context-aware embedding, and prototype-informed inference are advantageous for both modalities. By combining retrieval-aware representation learning with semantic generation specifically designed for textual data, our work builds a domain-specific RAG transformer for sentiment analysis based on these common ideas.

2.3 LLM- and RAG-related approaches

With the boom of large language models, there has been a significant transformation in various fields, including NLP and artificial intelligence. As these models are continuously evolving, the integration of these into various aspects of society will lead to further innovation and shape the future of technology.

Llama, a model released by Meta, was employed with PEFT by Junyi Lu et al. to review code automatically.⁷⁵ On this generation task, it achieved a BLEU score of 81.87. Mohaimenul Azam Khan Raiaan et al. discussed⁷⁶ the trends and challenges of large language models from BERT to Llama, such as the complexity of fine tuning, contextual constraints and complexity in evaluation. Zhe Li et al. developed⁷⁷ a fine-tuned language model for understanding agglutinative languages. The performance of LLMs such as GPT and FLAN-T5 was analysed across different benchmark datasets in sentiment detection,⁷⁸ which revealed the significance of prompt learning in the NLP field. Zero-shot learning performance was discussed with various language models across crypto-related tweets, among which DistilBERT achieved 91.87% accuracy.⁷⁹ LLMs also play significant roles in energy modelling.⁸⁰ A language model fine-tuned with LoRA was employed for tasks such as question answering and chat, with a remarkable impact.⁸¹ Large language models can be fine-tuned with biomedical-related sources such as PubMed⁸² to perform domain-specific downstream tasks. Xinjie Sun et al. constructed a domain-specific knowledge base by means of pretrained language models and incorporated it for aspect-based sentiment detection of product reviews, which yielded an accuracy of 88.6%.⁸³ For low-resource languages, data augmentation was performed via language models and achieved significant performance improvement.⁸⁴ Rui Cao et al.
designed a dictionary-based prompt tuning method using pretrained language models to emphasize the detection of domain-specific words.⁸⁵ Prompt-based learning has demonstrated improved performance in GLEU evaluation results when fine-tuned for specific languages.⁸⁶ This approach also excelled in code generation tasks across various programming languages, such as Go, Ruby, and Python, particularly when parameter-efficient fine-tuning was employed in low-resource settings.⁸⁷ Apart from the performance of LLM in summarization or generation,⁸⁸ it also assists in finding domain-related specific topic importance detection. ShaoBo Sun et al. used prompt learning to detect topic importance with respect to sentiment for financial news. For this purpose, the system⁸⁹ used Chinese financial news with a specific prompt-templated-based learning method involving multiple tasks and yielded 88.74% accuracy. Clinical nursing notes provide more domain-specific information. LLMs such as Llama performed well, with 89% accuracy in testing with these nursing records, which included ICD-10 codes and other patient information.⁹⁰ Konstantinos I. Roumeliotis et al. employed models such as Llama and GPT for training with e-commerce-related tweets and analysed their impact.⁹¹ Fine-tuned models are employed across a diverse range of fields, highlighting their versatility and adaptability. In the cybersecurity domain,⁹² they are used to detect specific terms related to threats and vulnerabilities, aiding in sentiment analysis and topic modelling to better understand and anticipate potential risks. In the financial sector,⁹³ these models analyse data to identify patterns indicative of crises, providing critical insights for preventive measures and strategic planning. Legal professionals⁹⁴ utilize fine-tuned models to evaluate and interpret complex regulations and rules, streamlining the analysis process and ensuring compliance. 
In the education sector, these models assess student responses,⁹⁵ generating scores that reflect understanding and performance, thereby contributing to more personalized and effective learning experiences. This broad application demonstrates the significant impact of fine-tuned models in enhancing decision-making and efficiency across various industries.

According to earlier research, transformer-based models require a lot of labeled data and have trouble adapting to specific domains. By including domain-specific retrieval, RAFT lessens this problem and enables improved adaptation with fewer fine-tuning parameters.

3. Proposed COVID-19 vaccine textual sentiment analysis

Our proposed system for the COVID-19 vaccine TSA consists of three main modules: the first is the fine-tuned LLM, the second is the RAG with custom similarity, and the third is the combination of fine-tuning and the customized RAG. The proposed work is shown in Figure 1.

Figure 1.

Proposed framework for COVID-19 vaccine TSA.

3.1. Dataset

The dataset used for this research was collected from the X platform (formerly known as Twitter) via COVID-19 vaccine-related keywords (TD dataset). To obtain API access, developers must provide detailed use case examples and explain how tweets will be accessed for the chosen use case. Tweets were collected from July 2020 to March 2022 in India via keywords such as ‘COVID-19’, ‘Corona’, ‘COVID-19 vaccine’, ‘vaccinated’, ‘vaccination’, ‘pandemic’, ‘corona outbreak’, ‘corona wave’, and ‘vaccination campaign’. A total of 2,00,940 tweets were collected based on these keywords.

The TD dataset contains tweets from the Indian region. We also included a second dataset from Kaggle,⁹⁶ which contains 228,207 tweets from different parts of the world, to examine the sentiment towards COVID-19 vaccination globally (KG dataset). The majority of the tweets in the dataset are in English. Both datasets have a target column that gives the polarity of each tweet: ‘Positive’, ‘Negative’ or ‘Neutral’. Table 2 lists the complete information of the datasets. VADER (Valence Aware Dictionary and sEntiment Reasoner), a popular lexicon-based sentiment analysis tool,⁶ was used to perform the annotation. VADER is appropriate for social media text analysis since it uses linguistic heuristics to assign sentiment scores.
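As an illustration of how a VADER score can be turned into one of the three polarity labels, the sketch below maps the compound score (a value in [-1, 1]) to a label. The cut-offs are not stated in this paper; the ±0.05 thresholds are the ones conventionally suggested in the VADER documentation and are assumed here, and the function name is hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a polarity label.

    The +/-0.05 cut-offs are an assumption (the conventional VADER thresholds),
    not values reported in this paper.
    """
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# With the vaderSentiment package installed, the compound score could be obtained as:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]

if __name__ == "__main__":
    for score in (0.6, -0.3, 0.0):
        print(score, label_from_compound(score))
```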

Table 2.
Details of the dataset.

Dataset	Quantity of tweets	Quantity of Positive tweets	Quantity of Negative tweets	Quantity of Neutral Tweets	Source
TD	2,00,940	88,149	44,768	68,023	Crawled from X
KG	2,28,207	84,025	35,885	1,08,297	Kaggle
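The class balance implied by Table 2 can be checked with a short calculation. The sketch below uses only the counts reported in the table; the variable and function names are illustrative.

```python
# Counts copied from Table 2.
TABLE_2 = {
    "TD": {"Positive": 88149, "Negative": 44768, "Neutral": 68023},
    "KG": {"Positive": 84025, "Negative": 35885, "Neutral": 108297},
}

def class_shares(counts):
    """Return each class's share of the dataset as a percentage (one decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

if __name__ == "__main__":
    for name, counts in TABLE_2.items():
        print(name, sum(counts.values()), class_shares(counts))
```

The per-class counts sum to the reported totals for both datasets. Positive is the largest class in TD, whereas Neutral is the largest in KG, and Negative is the smallest class in both, so metrics beyond plain accuracy may be informative.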

3.2. Fine-tuning LLMs

Fine tuning allows the model to perform task-specific analysis. Since the proposed system focused on the COVID-19 vaccine TSA, it is an essential step in the learning process. It consists of the following steps:

Preprocessing the dataset

Base LLM initialization

Creating a prompt template

Configuration of Parameter Efficient Fine Tuning with Quantization

Fine-tuning process

3.2.1 Preprocessing the dataset

The dataset has columns such as id, name of the user, location of the user, description of the user, number of followers of the user, number of user friends, user_favorites, verified user or not, date, tweeted text, used hashtags, relevant source, retweet information, favourites, retweets or not and sentiment of the tweet. From these, only two columns were used for mining: ‘text’, which contains the text content of the tweet, and ‘sentiment’, which contains the target label. Hyperlinks and emojis were removed from the text column since the chosen base LLM was trained to analyse unstructured text content. This preprocessing was implemented using the NLTK package. Figure 2 shows word clouds for the positive, negative and neutral classes from the Global Tweets dataset. The pre-processed data was given as input to the base language model for training after the necessary configuration and prompt template creation were completed.
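The cleaning step can be sketched as follows. The paper's preprocessing uses NLTK; this illustration instead uses only Python's standard `re` module, and the Unicode ranges chosen for emoji are an approximation assumed for the example rather than an exhaustive definition.

```python
import re

# Matches http(s) links and bare www links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Approximate emoji coverage (an assumption, not an exhaustive list).
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\U0001F1E6-\U0001F1FF"  # regional indicator (flag) letters
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]+"
)

def clean_tweet(text):
    """Remove hyperlinks and emojis, then collapse repeated whitespace."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

if __name__ == "__main__":
    print(clean_tweet("Got my jab 💉 today! https://t.co/abc123 😀"))
```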

Figure 2.

Word cloud representation of Tweets for Twitter dataset.

3.2.2 Base LLM initialization

Owing to the abundance of large language models, the proposed framework uses the llama-3-8B model for fine-tuning and the RAG framework. This choice capitalized on the immense capacity of the llama-3-8B model to understand and process complex languages, providing a strong starting point for fine-tuning the model towards sentiment analysis of COVID-19 vaccines. By utilizing the RAG framework, the fine-tuning process can be further enhanced, potentially leading to more accurate sentiment analysis. This can be particularly useful for understanding language or references that the model might not have encountered in its pretraining data.

3.2.3 Creating a prompt template

This step creates a consistent and well-defined prompt template for COVID-19 vaccine TSA. It leverages predefined instructions and incorporates the text and sentiment labels from the training data to guide the large language model during training. An illustration of the creation of a prompt template from the dataset is shown in Figure 3. The prompt template instructs the LLM to perform the specific task. It is built from static strings: the Introduction_string, Instruction, Input, Response and End keys. The Introduction_string provides the basic idea to the base LLM. Instruction specifies the task; Input is combined with the text column, and Response is combined with the target column. End marks the end of the prompt.
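A minimal sketch of such a template is given below. The five parts follow the description above, but the exact wording of each string and the `###` markers are hypothetical; the paper does not list the strings it used.

```python
# All string values are illustrative assumptions, not the paper's actual prompt text.
INTRO = "Below is a tweet about COVID-19 vaccines."
INSTRUCTION_KEY = "### Instruction:"
INSTRUCTION = "Classify the sentiment of the tweet as Positive, Negative or Neutral."
INPUT_KEY = "### Input:"
RESPONSE_KEY = "### Response:"
END_KEY = "### End"

def build_prompt(text, sentiment=None):
    """Build a training prompt (with label) or an inference prompt (without)."""
    parts = [INTRO, INSTRUCTION_KEY, INSTRUCTION, INPUT_KEY, text, RESPONSE_KEY]
    if sentiment is not None:
        # During training the target label and the end marker are appended.
        parts += [sentiment, END_KEY]
    return "\n".join(parts)

if __name__ == "__main__":
    print(build_prompt("Feeling fine after dose two", "Positive"))
```

At inference time the label is omitted, so the prompt ends at the response key and the model is expected to generate the label followed by the end marker.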

Figure 3.

Workflow illustrating the creation of prompt templates derived from the dataset.
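A prompt template of the kind described in Section 3.2.3 can be sketched as follows. The wording of every string (the introduction, the instruction and the `###` keys) is hypothetical, since the exact strings are not given in the text; only the overall structure of introduction, instruction, input, response and end key follows the description.

```python
# All strings below are illustrative assumptions, not the authors' wording.
INTRO = ("Below is an instruction that describes a task, paired with an input. "
         "Write a response that completes the request.")
INSTRUCTION_KEY = "### Instruction:"
INPUT_KEY = "### Input:"
RESPONSE_KEY = "### Response:"
END_KEY = "### End"
INSTRUCTION = ("Classify the sentiment of the following COVID-19 vaccine tweet "
               "as positive, negative or neutral.")


def build_prompt(text, sentiment=None):
    """Build a training prompt (with label) or an inference prompt (without)."""
    parts = [
        INTRO,
        f"{INSTRUCTION_KEY}\n{INSTRUCTION}",
        f"{INPUT_KEY}\n{text}",            # the 'text' column
    ]
    if sentiment is None:
        parts.append(RESPONSE_KEY)         # left open for the model to complete
    else:
        parts.append(f"{RESPONSE_KEY}\n{sentiment}")  # the 'sentiment' column
        parts.append(END_KEY)
    return "\n\n".join(parts)


print(build_prompt("Feeling great after my second dose!", "positive"))
```

Calling `build_prompt(text)` without a label yields the same prompt truncated after the response key, which is the form one would plausibly use at inference time.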

3.2.4 Configuration of parameter efficient fine tuning with quantization

Training a massive number of parameters increases training time and energy consumption, which makes the model difficult to train on resource-constrained devices. QLoRA addresses this issue by introducing a small set of trainable “adapters” that capture task-specific information, which keeps the number of trainable parameters minimal. PEFT with QLoRA allows the powerful Llama 3 model to be fine-tuned on specific tasks even with limited resources. This approach strikes a balance between efficiency and performance, making large language models more accessible and manageable. By training only the adapter layers, QLoRA focuses on learning sentiment-related patterns instead of retraining everything. This can help mitigate the influence of biases and misinformation present in the training data.
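The parameter saving from low-rank adapters can be illustrated with simple arithmetic. For a weight matrix of shape d_out × d_in, a LoRA adapter of rank r adds two matrices of shapes d_out × r and r × d_in, that is r(d_in + d_out) trainable parameters. The sketch below uses rank 16 as in Table 4 and a 4096 × 4096 projection as an illustrative assumption; which layers were actually adapted is not stated in the text.

```python
def full_param_count(d_in, d_out):
    """Parameters in a dense weight matrix of shape d_out x d_in."""
    return d_in * d_out


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by a rank-`rank` LoRA adapter (A and B)."""
    return rank * (d_in + d_out)


d_in = d_out = 4096   # assumed projection size, for illustration only
rank = 16             # LoRA rank reported in Table 4

full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)
print(full, lora, f"{100 * lora / full:.2f}%")
```

Under these assumptions the adapter trains fewer than 1% of the parameters of the matrix it adapts, while the base matrix itself stays frozen.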

3.2.5 Fine-tuning process

After completing the necessary configurations, the language model was trained with the supervised fine-tuning module. Training iterates through batches of text snippets from the dataset. Each snippet is passed through the pretrained model and then through the adapter layers. The loss is calculated from the predicted sentiment and the labeled sentiment in the dataset. The loss is then backpropagated to update the weights of the adapter layers, while the quantized base weights stay frozen, as is standard in QLoRA. This iterative process continues until the model converges or a predefined number of training epochs is completed. As the RAFT framework is employed, fine tuning is integrated with the RAG for the final loss calculation.

3.2.6 Naïve RAFT framework

The system works with a naïve RAG framework for enhancing domain-specific knowledge from external sources. The following are the crucial components of this framework: knowledge sources, embedding generation and RAFT retrieval.

3.2.6.1 External knowledge sources

To understand the facts related to COVID-19 vaccines, the system depends on the domain-specific external knowledge sources listed in Table 3. PubMed articles are not considered here, since the objective is to analyze public opinion, which does not require extensive scientific literature. The main sources of information include the following:

Table 3.
External knowledge sources.

Source	Document Title	Pages
CDC (Centers for Disease Control and Prevention)	2024–2025 COVID-19 Immunization Schedule	10
https://www.cdc.gov/vaccines/covid-19	2024–2025 Formula Moderna COVID-19 Vaccine Standing Orders	5
	2024–2025 Formula Moderna COVID-19 Vaccine At-A-Glance	3
	2024–2025 Formula Pfizer-BioNTech COVID-19 Vaccine At-A-Glance	3
	2024–2025 Formula Pfizer-BioNTech COVID-19 Vaccine Standing Orders	4
MoHFW (Ministry of Health and Family Welfare, India)	COVID-19 Vaccination Operational Guidelines	148
https://www.mohfw.gov.in	Guidelines for Adverse Events Following Immunization (AEFI)	316
WHO (World Health Organization)	COVID-19 Vaccines: Safety Surveillance Manual	26
https://www.who.int	Global COVID-19 Vaccination Strategy	36
	At a Glance Report: Assessing Country Readiness for COVID-19 Vaccine	5
	Evaluation of COVID-19 vaccine effectiveness	70
EMA (European Medicines Agency)	Comirnaty (Pfizer-BioNTech) COVID-19 Vaccine: EPAR	140
https://www.ema.europa.eu/	Spikevax (Moderna) COVID-19 Vaccine: EPAR	169
	Vaxzevria (AstraZeneca) COVID-19 Vaccine: EPAR	181
	COVID-19 Vaccine Janssen: EPAR	218
	Nuvaxovid (Novavax) COVID-19 Vaccine: EPAR	168

COVID-19 vaccine regulations:

This source contained documents related to COVID-19 vaccine regulations released by the WHO. It provides internal guidelines and regulatory information including key aspects such as dosage and booster doses.

Vaccine approval:

These documents were released by the Ministry of Health and Family Welfare, India. They contain information related to all approved vaccines, including their dosage information and possible side effects.

COVID-19 vaccine operational guidelines:

This source was released by the Ministry of Health and Family Welfare, India. It covers information about communication and social mobilization before and after vaccination, adverse events following immunization, details of the Co-WIN platform and the overall vaccination process. It also highlights the preventive measures to be taken during the pandemic. In addition to these sources, relevant literature on COVID-19 vaccines was incorporated into the RAG framework. All these documents contained the factual information that was used when the LLM was fine-tuned with a retrieval framework.

3.2.6.2 Embedding generation

The naïve RAG framework uses cosine similarity or the Euclidean distance for semantic search. In this approach, we improve the semantic search by generating embeddings with Sentence-BERT (SBERT). SBERT is a pretrained model that outperforms many state-of-the-art approaches.⁹⁷ Our RAG framework leverages SBERT for embedding generation. The generated embeddings were stored in a vector database, which is a common storage layer for vectors in the LLM retrieval process. In our framework, we use ChromaDB as the vector database. Documents from the knowledge sources were split into chunks, and for each chunk an embedding was generated and stored in ChromaDB, which in turn enables efficient retrieval.
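The chunk, embed, store and retrieve flow described in Section 3.2.6.2 can be sketched as follows. To stay self-contained, the sketch swaps in a toy hashed bag-of-words embedding for SBERT and a plain Python list for ChromaDB; both substitutions, as well as the chunk size, overlap and all function names, are assumptions made for illustration only.

```python
import math
import zlib


def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def toy_embed(text, dim=64):
    """Hashed bag-of-words vector, unit-normalised. A stand-in for SBERT."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two unit-normalised vectors (a dot product)."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, chunks, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    q = toy_embed(query)
    index = [(chunk, toy_embed(chunk)) for chunk in chunks]  # in-memory store
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


store = chunk_words("booster doses are recommended " * 30, size=50, overlap=10)
print(len(store), top_k("booster doses", store, k=1)[0][:40])
```

In a real pipeline the embedding function would be replaced by a sentence-embedding model and the list by a vector database collection; the ranking logic stays conceptually the same.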

3.2.6.3 RAFT retrieval

Embeddings were generated for both the query and the retrieved documents. These embeddings are dense vector representations that capture the semantic meaning of the text. During the fine-tuning process, the model uses the embeddings stored in ChromaDB to retrieve the result. This enhances the ability of the model to learn from a large amount of domain-related data for improved performance on downstream tasks such as sentiment analysis. An attention mechanism was used to determine the most important and relevant portions of the retrieved documents. The internal parameters of the model were updated on the basis of the input data and the content of the retrieved documents, guided by the attention weights. RAFT retrieval allows the model to overcome the limitations of its pretraining data by accessing and leveraging information from ChromaDB. This approach benefits tasks that require domain-specific knowledge.

Compared with retrieval-based approaches, RAFT reduces factually incorrect outputs by integrating relevant knowledge directly into the model during fine-tuning, which eliminates the need for external document lookups during generation.

3.3 Algorithmic framework

The proposed fine-tuned COVID RAFT for domain-specific knowledge is depicted in Algorithm 1. The inputs were queries, domain-specific documents and data, which were used to fine-tune the base LLM model. Here, the base LLM model was chosen as Llama 3.

Equation (1) represents the initial query embedding, which has d = 768 dimensions and lies in the $\mathbb{R}^{d}$ vector space. Equation (2) represents the document embeddings, each with 768 dimensions, for n documents. Here, n is the number of documents considered for adapting the COVID-19 vaccine domain knowledge. Equations (3)–(4) represent the process of vector database indexing and retrieval. Documents are indexed in the Chroma vector database for efficient retrieval. The top k documents most similar to the generated query embedding were retrieved from this vector store. Equation (5) represents the concatenation process, where the query embedding and the document embeddings are concatenated into the $\mathbb{R}^{(k + 1) \times d}$ vector space by stacking the query embedding as the first row followed by the k document embeddings. Since the column size is d, the resulting dimension is $(k + 1) \times d$. Equations (6)–(7) describe the parameter tuning with the quantization process. PEFT minimizes the number of trainable parameters while optimizing the model performance, which can be described as follows:

$\theta_{\text{ftuned}} = \arg\min_{\theta} L(\theta)$, where $L(\theta)$ represents the loss function.

The quantization process fine-tunes the model to optimize computational and memory efficiency. This involves quantizing the model parameters while applying low-rank factorization, which can be expressed as follows:

$\theta_{\text{QLoRA}} = \arg\min_{\theta} \left[ L(\theta) + \text{Regularization}(\theta) \right]$

where $\text{Regularization}(\theta)$ involves quantization with low-rank constraints.
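The concatenation step of Equation (5), in which the query embedding is stacked on top of the k retrieved document embeddings, can be sketched as follows. The function name and the use of plain Python lists are assumptions for illustration; the dimension d = 768 follows the text, and k = 3 is an arbitrary example value.

```python
def stack_query_and_docs(query_emb, doc_embs):
    """Stack the query embedding as row 0, followed by k document embeddings.

    Returns a (k + 1) x d matrix represented as a list of rows.
    """
    d = len(query_emb)
    if any(len(emb) != d for emb in doc_embs):
        raise ValueError("all embeddings must share the same dimension d")
    return [list(query_emb)] + [list(emb) for emb in doc_embs]


d, k = 768, 3                               # d from the text; k is illustrative
query = [0.0] * d
docs = [[float(i)] * d for i in range(1, k + 1)]

matrix = stack_query_and_docs(query, docs)
print(len(matrix), len(matrix[0]))          # rows = k + 1, columns = d
```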

After the integration of the RAG, the process moved to response generation from the fine-tuned and optimized LLM, as depicted in Equation (8). The generated response falls within three possibilities, since the prompt input was designed to elicit a classification of positive, negative or neutral. Equations (9) and (10) represent result fetching and evaluation with respect to the test data, respectively.
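Mapping the generated text onto one of the three classes can be sketched as follows. The parsing rule (take the label that appears earliest in the generated text, case-insensitively) and the function name are assumptions for illustration; the text does not describe how the authors post-processed the model output.

```python
LABELS = ("positive", "negative", "neutral")


def parse_sentiment(generated):
    """Return the first label mentioned in the generated text, or None."""
    lowered = generated.lower()
    hits = [(lowered.find(label), label) for label in LABELS if label in lowered]
    return min(hits)[1] if hits else None


print(parse_sentiment("Positive\n\n### End"))
print(parse_sentiment("The tweet is NEGATIVE, not neutral."))
print(parse_sentiment("unclear"))
```

Returning `None` for unparseable output keeps such cases visible, so they can be counted as errors during evaluation rather than silently assigned to a class.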

4. Experimental results and discussions

In this section, we compare the proposed COVID-19 RAFT system with various other language models using fine-tuning and RAG methods. The base LLM for this approach was Llama-3-8B. It was trained on an NVIDIA A40 GPU with quantization hyperparameters. The number of training epochs was fixed at 10, and the total training time was approximately 65 min. The model weights were subsequently stored and used for the inference process. The hyperparameter values are listed in Table 4.

Table 4.
Hyperparameter configurations applied in the model training phase.

Hyperparameter	Value
Learning rate	$2 \times 10^{-4}$
Epoch	10
LoRA rank	16
LoRA Alpha	64
beta1 of adam	0.9
beta2 of adam	0.99
epsilon in adam	$1.00 \times 10^{-8}$
Batch Size	5
gradient accumulation steps	4
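Two derived quantities follow directly from Table 4 and may help in reading it. With gradient accumulation, the optimizer updates once every few mini-batches, so the effective batch size is the product of the batch size and the accumulation steps. In the standard LoRA formulation, the adapter update is scaled by alpha divided by the rank. The sketch below assumes a single device and the standard (not rank-stabilised) scaling; neither detail is stated in the text.

```python
def effective_batch_size(batch_size, grad_accum_steps, num_devices=1):
    """Samples contributing to each optimizer update."""
    return batch_size * grad_accum_steps * num_devices


def lora_scaling(alpha, rank):
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / rank


# Values taken from Table 4; single-device training is an assumption.
print(effective_batch_size(batch_size=5, grad_accum_steps=4))  # 20
print(lora_scaling(alpha=64, rank=16))                         # 4.0
```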

Some sample test cases are as follows.

Consider the tweet: “I got my COVID-19 vaccine today! Feeling great and happy to be protected. #VaccinesWork”. For this case, the retrieved documents are:

WHO – COVID-19 Vaccines: Safety Surveillance Manual, which confirms that vaccines are safe and effective.

CDC – 2024–2025 COVID-19 Immunization Schedule, which lists recommended vaccines and schedules.

MoHFW – COVID-19 Vaccination Operational Guidelines, which provides guidance on vaccine administration.

With the help of these retrieved documents, the fine-tuned LLM evaluated the sentiment as Positive.

Consider the tweet: “The COVID-19 vaccine contains microchips to track people! Don’t trust the government!” For this case, the retrieved documents are:

WHO – Global COVID-19 Vaccination Strategy, which addresses vaccine misinformation.

EMA – Comirnaty (Pfizer-BioNTech) COVID-19 Vaccine: EPAR, which provides scientific details on vaccine composition (no microchips).

MoHFW – Guidelines for Adverse Events Following Immunization, which describes real side effects, not conspiracy claims.

With these documents, the fine-tuned LLM evaluated the sentiment as Negative.

We compared our model with several other LLMs, with fine tuning and with RAG separately. For comparison, we used the Mistral-7B model, the Llama2-7B model and BART. Table 5 reports the evaluation metrics, including accuracy and F1 score. The evaluation was performed under three categories. First, the models were evaluated after fine tuning on the dataset. Second, the models were integrated with RAG and tested with user prompts. Finally, the models were integrated with the RAFT framework. Table 5 shows that Llama 3 and Llama 2 performed better in fine tuning and RAG than the other models did. Within the RAFT framework, Llama 3 outperformed the other language models, which demonstrates its effectiveness among all evaluated models.

Table 5.

Performance comparison of the proposed model with other learning models using fine-tuning and retrieval-augmented generation techniques.

		TD Dataset		KG dataset
Model	Category	Accuracy	F1-Score	Accuracy	F1-Score
Llama2-7B	Fine Tuning	0.601	0.531	0.776	0.642
Mistral-7B		0.723	0.543	0.821	0.762
Llama3		0.834	0.801	0.834	0.801
BART		0.783	0.696	0.812	0.791
Llama2-7B	RAG	0.712	0.771	0.801	0.712
Mistral-7B		0.705	0.760	0.760	0.760
Llama3		0.795	0.768	0.806	0.800
Llama2-7B	Fine tuning with RAG	0.831	0.781	0.812	0.728
Mistral-7B		0.713	0.634	0.831	0.804
COVID RAFT		0.886	0.871	0.912	0.890
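The two metrics reported in Table 5 can be computed for a three-class task as sketched below. The text does not state which F1 averaging was used; macro averaging over the three classes is assumed here purely for illustration, and the toy labels are invented for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=("positive", "negative", "neutral")):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for the example.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 3))
```

In this toy case the missed neutral tweet lowers macro F1 more than accuracy, which is one reason the two columns in Table 5 can diverge.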

Since the framework was integrated with a quantized low-rank adapter for efficient fine tuning, the model could be trained on an NVIDIA A40 GPU. For the evaluation of RAG, the RAGAS metrics (Shahul Es, 2023) were used. In particular, we make use of Context Precision (CP), Answer Relevance (Ans_Rel), Answer Correctness (Ans_Correct), Answer Similarity (Ans_Sim) and Context Recall (CR). Cosine similarity is used to produce the Answer Relevance measure, which evaluates how relevant the generated response is to the query. Table 6 shows the evaluation of RAG with the COVID RAFT framework.

Table 6.

Evaluation results of the retrieval-augmented generation framework.

Model	Domain	Dataset	RAGAS_metric	Score
COVID RAFT	COVID Vaccine	COVID Vaccine Tweet Datasets and External knowledge sources	Ans_Sim	0.892
			Ans_Correct	0.913
			Ans_Rel	0.791
			CR	0.812
			CP	0.928

$\text{Ans\_Sim} = \frac{1}{n} \sum_{i = 1}^{n} \text{cosine\_sim}(Emb_{ans}, Emb_{ques})$

where $n$ denotes the number of generated responses, $Emb_{ans}$ denotes the embedding of the generated response and $Emb_{ques}$ denotes the embedding of the input query.

$\text{Ans\_correct} = \alpha \times \text{SemanticSim} + (1 - \alpha) \times \text{fact\_sim}$

where $\text{fact\_sim}$ is the overlap of factual data between the generated response and the ground truth, and $\text{SemanticSim}$ is the cosine similarity between the embeddings of the generated answer and the ground truth.
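The two formulas above can be evaluated as in the following sketch. It implements the expressions exactly as written in this section and is not the RAGAS library implementation; the weight alpha is not reported in the text, so the default of 0.5 is a hypothetical value, and the small vectors are invented for the example.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ans_sim(ans_embs, ques_embs):
    """Mean cosine similarity over n (answer, query) embedding pairs."""
    pairs = list(zip(ans_embs, ques_embs))
    return sum(cosine_sim(a, q) for a, q in pairs) / len(pairs)


def ans_correct(semantic_sim, fact_sim, alpha=0.5):
    """Weighted combination of semantic and factual similarity."""
    return alpha * semantic_sim + (1 - alpha) * fact_sim


# Invented two-dimensional embeddings for two responses.
answers = [[1.0, 0.0], [0.0, 1.0]]
queries = [[1.0, 0.0], [1.0, 0.0]]
print(ans_sim(answers, queries))             # mean of 1.0 and 0.0
print(ans_correct(0.8, 0.6, alpha=0.5))
```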

RAG alone shows lower performance across all the models since it relies only on factual knowledge. When it encountered a tweet, it failed to capture the appropriate context. Figure 4 shows the loss curve of COVID RAFT.

Figure 4.

Comparison of training and validation loss curves for the Twitter_Indian and Twitter_Global datasets.

Plots of the model's performance over epochs reveal a moderate drop in loss and a small amount of overfitting following epoch 35. Figure 5 shows the RAFT framework results for the language models. The proposed COVID-19 RAFT framework outperformed the remaining models. Comparatively, the KG dataset yielded better performance than the TD dataset since the TD dataset had minimal language complexity. Since our model focused on a single language, English, it skipped code-mixed data. To address code-mixed data, a specific language model must be developed with additional knowledge bases for low-resource languages.

Figure 5.

Language model comparison with RAG.

5. Conclusion

This work presented COVID-19 RAFT, a framework that uses LLM fine-tuning for COVID-19 vaccine TSA on Twitter. In addition to standard fine-tuning data, COVID RAFT leverages a RAG method to incorporate domain-specific knowledge. External sources such as documents released by the WHO and the Ministry of Health and Family Welfare, together with literature focused on the COVID-19 vaccine, were used as knowledge bases for the RAG framework. Fine tuning leveraged the KG and TD datasets, which focus on COVID-19 vaccine tweet analysis. After a fine-tuned LLM that integrates SFT with PEFT was developed, the model used the embeddings from the vector database. Query tweet embeddings were generated and compared with the vector database embeddings via BERT-based semantic search rather than cosine similarity to enhance the contextual understanding of the COVID vaccine domain. We compared our approach with other language models in terms of fine-tuning, RAG and RAFT. On the TD dataset, it achieved an accuracy of 0.886 and an F1 score of 0.871. The KG dataset yielded an accuracy of 0.912 and an F1 score of 0.890. To our knowledge, this is the first approach in the COVID-19 vaccine domain to leverage domain knowledge for the sentiment analysis task. The accuracy attained on the Twitter_Global dataset (0.912) was greater than that on the Twitter_Indian dataset (0.886). Language diversity, dialectal variances, or regionally distinctive expressions in Indian tweets that might not be as well represented in the underlying knowledge base could be the cause of this disparity. Furthermore, worldwide tweets might have more uniform linguistic patterns, which would facilitate more precise classification. In the future, improving retrieval algorithms to better capture regional language nuances will help to boost the accuracy. Moreover, the RAFT framework can be extended in the future to fact checking and misinformation detection.

Footnotes

ORCID iDs

L Prasika

S Edward Rajan

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

This work presents a dataset of COVID-19 vaccine tweets related to both India and global regions. This dataset is available upon reasonable request from the corresponding author. Additionally, developers can collect similar data using the Twitter API, accessible through a Twitter developer account, available at

References

1. Hussain, Tahir, Hussain, et al. Artificial intelligence-enabled analysis of public attitudes on Facebook and Twitter toward COVID-19 vaccinations in the United Kingdom and the United States. J Med Internet Res 2021; 23: e26627.

2. Lyu, Han, Luli. COVID-19 vaccine–related discussion on Twitter: topic modeling and sentiment analysis. J Med Internet Res 2022; 24: e31726.

3. Brown, Mann, Ryder, et al. Language models are few-shot learners. Adv Neural Inf Process Syst 2020; 33: 1877–1901.

4. Touvron, Martin, Stone, et al. LLaMA: Open and efficient foundation language models. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2023. https://doi.org/10.48550/arXiv.2302.13971

5. Lee, Frieske, et al. Survey of hallucination in natural language generation. ACM Comput Surv 2023; 55: 1–38.

6. Gao, et al. Knowledge-enhanced retrieval-augmented generation. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023, pp. 2027–2039. https://doi.org/10.48550/arXiv.2305.13242

7. Lewis, Perez, Piktus, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv Neural Inf Process Syst 2020; 33: 9459–9474.

8. Shen, Wallis, et al. LoRA: low-rank adaptation of large language models. Adv Neural Inf Process Syst 2021; 34: 3457–3470.

9. Khine, Wettayaprasit, Duangsuwan. A new word embedding model integrated with medical knowledge for deep learning-based sentiment classification. Artif Intell Med 2024; 148: 102758.

10. Do Nascimento, Garcia, Araujo RdA. A word sense disambiguation method applied to natural language processing for the Portuguese language. IEEE Open J Comput Soc 2024; 5: 268–277.

11. Mahmoudi, Jemielniak, Ciechanowski. Assessing accuracy: a study of lexicon and rule-based packages in R and Python for sentiment analysis. IEEE Access 2024; 12: 20169–20180.

12. Kamal, Anwar, Sejwal, et al. Bicapshate: attention to the linguistic context of hate via bidirectional capsules and hatebase. IEEE Trans Comput Soc Syst 2024; 11: 1781–1792.

13. Xia, Wan, et al. CDCAT: A multi-language cross-document entity and event co reference annotation tool, http://dh.fbk.eu/resources/cat-content-annotation-tool (n.d.).

14. Tang, Liao, Shen, et al. Confidence-aware sentiment quantification via sentiment perturbation modeling. IEEE Trans Affect Comput 2023; 15: 736–750.

15. Khan, Ahmad, Khalid, et al. Sentiment and context-aware hybrid DNN with attention for text sentiment classification. IEEE Access 2023; 11: 28162–28179.

16. Xie, Chen, Lin, et al. External knowledge document retrieval strategy based on intention-guided and meta-learning for task-oriented dialogues. Adv Eng Inf 2023; 56: 102020.

17. Zhang, Zhu, Xie. Fine-Grained sentiment analysis of cross-domain Chinese E-commerce texts based on SKEP-gram-CDNN. IEEE Access 2023; 11: 74058–74070.

18. Liu, et al. Graph augmentation networks based on dynamic sentiment knowledge and static external knowledge graphs for aspect-based sentiment analysis. Expert Syst Appl 2024; 251: 123981.

19. Guo, Ding, Zhai, et al. Leveraging domain knowledge to improve depression detection on Chinese social media. IEEE Trans Comput Soc Syst 2023; 10: 1528–1536.

20. Anshul, Pranav, Rehman MZU, et al. A multimodal framework for depression detection during COVID-19 via harvesting social media. IEEE Trans Comput Soc Syst 2024; 11: 2872–2888.

21. Araque, Gatti, Staiano, et al. Depechemood: a bilingual emotion lexicon built through simple yet powerful techniques. IEEE Trans Affect Comput 2022; 13: 496–507.

22. Lin, Liao. Lexicon-based prompt for financial dimensional sentiment analysis. Expert Syst Appl 2024; 244: 122936.

23. Chan SWK. Multilabel emotion tagging for domain-specific texts. IEEE Trans Comput Soc Syst 2022; 9: 1197–1210.

24. Matrane, Benabbou, Banou. Wevote: a weighted voting technique for automatic sentiment annotation of Moroccan dialect comments. IEEE Access 2024; 12: 16276–16298.

25. Rizinski, Peshov, Mishev, et al. Sentiment analysis in finance: from transformers back to eXplainable lexicons (XLex). IEEE Access 2024; 12: 7170–7198.

26. Luo, Wang. Identifying COVID-19 personal health mentions from tweets using masked attention model. IEEE Access 2022; 10: 59068–59077.

27. Dai. Imperfect vaccination evolutionary game incorporating individual social difference and subjective perception. IEEE Trans Comput Soc Syst 2024; 11: 2369–2382.

28. Arias, Zambrano Nunez, Guerra-Adames, et al. Sentiment analysis of public social media as a tool for health-related topics. IEEE Access 2022; 10: 74850–74872.

29. Chen, et al. TeaBERT: an efficient knowledge infused cross-lingual language model for mapping Chinese medical entities to the unified medical language system. IEEE J Biomed Health Inform 2023; 27: 6029–6038.

30. Wan, Lucic, Ghazzai, et al. Topic modeling and progression of American digital news media during the onset of the COVID-19 pandemic. IEEE Trans Technol Soc 2021; 3: 111–120.

31. Razali NAM, Malizan, Hasbullah, et al. Political security threat prediction framework using hybrid lexicon-based approach and machine learning technique. IEEE Access 2023; 11: 17151–17164.

32. Zhang, Wang, et al. Predicting the sequence specificities of DNA-binding proteins by DNA fine-tuned language model with decaying learning rates. IEEE/ACM Trans Comput Biol Bioinf 2023; 20: 616–624.

33. Chandra, Kulkarni. Semantic and sentiment analysis of selected Bhagavad Gita translations using BERT-based language framework. IEEE Access 2022; 10: 21291–21315.

34. Barrows, Haig, Conduit. Sentiment and objectivity in Iranian state-sponsored propaganda on Twitter. IEEE Trans Comput Soc Syst 2024; 11: 2359–2368.

35. Goularte, Martins, et al. SentPT: a customized solution for multi-genre sentiment analysis of Portuguese-language texts. Expert Syst Appl 2024; 245: 123075.

36. Shah SMAH, Shah SFH, Ullah, et al. Arabic sentiment analysis and sarcasm detection using probabilistic projections-based variational switch transformer. IEEE Access 2023; 11: 67865–67881.

37. El-Alami, Ouatik El Alaoui, En Nahnahi. A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model. J King Saud Univ Comput Inf Sci 2022; 34: 6048–6056.

38. Essameldin, Ismail, Darwish. An opinion mining approach to handle perspectivism and ambiguity: moving toward neutrosophic logic. IEEE Access 2022; 10: 63314–63328.

39. Galal, Hassan Yousef, Zayed, et al. Arabic sarcasm detection: an enhanced fine-tuned language model approach. Ain Shams Eng J 2024; 15: 102736.

40. Kumbale, Singh, Poornalatha, et al. BREE-HD: a transformer-based model to identify threats on Twitter. IEEE Access 2023; 11: 67180–67190.

41. Shunxiang, Aoqiang, Guangli, et al. Building fake review detection model based on sentiment intensity and PU learning. IEEE Trans Neural Networks Learn Syst 2023; 34: 6926–6939.

42. Chen, Huang, Wen, et al. CAT: continual adapter tuning for aspect sentiment classification. Neurocomputing 2024; 580: 127423.

43. Kumi, Jeong. Data-driven automatic classification model for construction accident cases using natural language processing with hyperparameter tuning. Autom Constr 2024; 164: 105458.

44. Durga, Godavarthi. Deep-Sentiment: an effective deep sentiment analysis using a decision-based recurrent neural network (D-RNN). IEEE Access 2023; 11: 108433–108447.

45. Zhang, Cai, et al. Detecting dependency-related sentiment features for aspect-level sentiment classification. IEEE Trans Affect Comput 2023; 14: 196–210.

46. Le Thi, Tran, Phan. Deep learning using context vectors to identify implicit aspects. IEEE Access 2023; 11: 39385–39393.

47. Gargiulo, Minutolo, Guarasci, et al. An ELECTRA-based model for neural coreference resolution. IEEE Access 2022; 10: 75144–75157.

48. Khiabani, Zubiaga. Few-Shot learning for cross-target stance detection by aggregating multimodal embeddings. IEEE Trans Comput Soc Syst 2024; 11: 2081–2090.

49. Xiao, Zhou, et al. Collaborative fine-grained interaction learning for image–text sentiment analysis. Knowl Based Syst 2023; 279: 110951.

50. Uveges, Ring. HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication. IEEE Access 2023; 11: 60267–60278.

51. Kukkar, Mohana, Sharma, et al. Improving sentiment analysis in social media by handling lengthened words. IEEE Access 2023; 11: 9775–9788.

52. Bianbadroma, Zhao, Wang, et al. Multi-Level attention based co reference resolution with gated recurrent unit and convolutional neural networks. IEEE Access 2023; 11: 4895–4904.

53. Majidzadeh, Ashtiani, Zakeri-Nasrabadi. Multi-type requirements traceability prediction by code data augmentation and fine-tuning MS-CodeBERT. Computer Standards and Interfaces 2024; 90: 103850.

54. Lee, Hsiang. Patent classification by fine-tuning BERT language model. World Pat Inf 2020; 61: 101965.

55. Kastrati, Imran, Daudpota, et al. Soaring energy prices: understanding public engagement on Twitter using sentiment analysis and topic modeling with transformers. IEEE Access 2023; 11: 26541–26553.

56. Himdi, Assiri. Tasaheel: an arabic automative textual analysis tool - all in one. IEEE Access 2023; 11: 139979–139992.

57. Zhou, Liao, Gao, et al. TopicBERT: a topic-enhanced neural language model fine-tuned for sentiment classification. IEEE Trans Neural Networks Learn Syst 2023; 34: 380–393.

58. Jayasurya, Kumar, Singh, et al. Analysis of public sentiment on COVID-19 vaccination using Twitter. IEEE Trans Comput Soc Syst 2022; 9: 1101–1111.

59. Cai, et al. Exploring public sentiment during COVID-19: a cross country analysis. IEEE Trans Comput Soc Syst 2023; 10: 1083–1094.

60. Almars, Almaliki, Noor, et al. HANN: hybrid attention neural network for detecting COVID-19 related rumors. IEEE Access 2022; 10: 12334–12344.

61.

Oliveira

Haque

Mougouei

, et al. Investigating the emotional response to COVID-19 news on twitter: a topic modelling and emotion classification approach. IEEE Access 2022; 10: 16883–16897.

62.

Catelli

Pelosi

Comito

, et al. Lexicon-based sentiment analysis to detect opinions and attitude towards COVID-19 vaccines on twitter in Italy. Comput Biol Med 2023; 158: 106876.

63.

Faizah

Lin . Visualizing change and correlation of topics with LDA and agglomerative clustering on COVID-19 vaccine tweets. IEEE Access 2023; 11: 51647–51656.

64.

Peng

Guo

, et al. Repetitive action counting with hybrid temporal relation modeling. IEEE Trans Multimed 2025. DOI: 10.48550/arXiv.2412.07233

65.

Guo

Wang

. ViGT: Proposal-Free Video Grounding with a Learnable Token in the Transformer. Proc Int Conf Smart Comput Inf Sci (SCIS) 2023; 66(10).

66.

Wang

, et al. Wave former: wavelet transformer for noise-robust video inpainting. Proceedings of the AAAI Conference on Artificial Intelligence 2024; 38: 6180–6188.

67.

Shi

, et al. Prompt-aware controllable shadow removal. Axiver, https://arxiv.org/abs/2501.15043 (2025).

68.

Chen

, et al. BVINet: Unlocking Blind Video Inpainting with Zero Annotations, Axiver, (2025).

69.

Rao

, et al. Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition. In: in Proc. AAAI Conf. on Artificial Intelligence, 2025.

70.

Petroni

, et al. KILT: A benchmark for knowledge-intensive language tasks. In Proc. 2021 Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021.

71.

Guu

, et al. REALM: Retrieval-Augmented Language Model Pre-Training. In: Proc. 37th Int. Conf. on Machine Learning (ICML), 2020.

72.

Borgeaud

, et al. Improving language models by retrieving from trillions of tokens (RETRO). In Advances in Neural Information Processing Systems (NeurIPS), Vol. 35, 2022.

73.

Wadden

, et al. Fact or fiction: Verifying scientific claims. In Proc. 2020 Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2020.

74.

Liu

, et al. Pre-train prompt tuning for few-shot sentiment analysis. In Proc. 60th Annual Meeting of the Association for Computational Linguistics (ACL), 2022.

75.

, et al. LLaMA-reviewer: Advancing code review automation with large language models through parameter-efficient fine-tuning, http://arxiv.org/abs/2308.11148 (2023).

76. Raiaan MAK, Mukta MSH, Fatema, et al. A review on large language models: architectures, applications, taxonomies, open issues and challenges. IEEE Access 2024; 12: 26839–26874.

77. Sheng, et al. Agglutifit: efficient low-resource agglutinative language model fine-tuning. IEEE Access 2020; 8: 148489–148499.

78. Liu. Efficient utilization of pre-trained models: a review of sentiment analysis via prompt learning. Knowl Based Syst 2024; 283: 111148.

79. Wahidur RSM, Tashdeed, Kaur, et al. Enhancing zero-shot crypto sentiment with fine-tuned language model and prompt engineering. IEEE Access 2024; 12: 10146–10159.

80. Jiang, Zhang, et al. EPlus-LLM: a large language model-based computing platform for automated building energy modeling. Appl Energy 2024; 367: 123431.

81. Gao, Liu, et al. FashionGPT: LLM instruction fine-tuning with multiple LoRA-adapter fusion. Knowl Based Syst 2024; 299: 112043.

82. Tinn, Cheng, et al. Fine-tuning large neural language models for biomedical natural language processing. Patterns 2023; 4: 100729.

83. Sun, Zhang, Liu, et al. Harnessing domain insights: a prompt knowledge tuning method for aspect-based sentiment analysis. Knowl Based Syst 2024; 298: 111975.

84. Silva, Barbosa. Improving dense retrieval models with LLM augmented data for dataset search. Knowl Based Syst 2024; 294: 111740.

85. Cao, Wang, Gao, et al. Dictprompt: comprehensive dictionary-integrated prompt tuning for pre-trained language model. Knowl Based Syst 2023; 273: 110605.

86. Kao. Masked siamese prompt tuning for few-shot natural language understanding. IEEE Trans Artif Intell 2024; 5: 624–633.

87. Wang, Yang, Gao, et al. Prompt tuning in code intelligence: an experimental evaluation. IEEE Trans Software Eng 2023; 49: 4869–4885.

88. Guo, Qiu, Leroy, et al. Retrieval augmentation of large language models for lay language generation. J Biomed Inform 2024; 149: 104580.

89. Sun, Pan, Yang, et al. STID-Prompt: prompt learning for sentiment-topic-importance detection in financial news. Knowl Based Syst 2024; 284: 111347.

90. Vaid, Landi, Nadkarni, et al. Using fine-tuned large language models to parse clinical notes in musculoskeletal pain disorders. The Lancet Digital Health 2023; 5: e855–e858.

91. Roumeliotis, Tselikas, Nasiopoulos. LLMs in e-commerce: a comparative analysis of GPT and LLaMA models in product review evaluation. Nat Lang Process J 2024; 6: 100056.

92. Okey, Udo, Rosa, et al. Investigating ChatGPT and cybersecurity: a perspective on topic modeling and sentiment analysis. Comput Secur 2023; 135: 103476.

93. Ardekani, Bertz, Bryce, et al. FinSentGPT: a universal financial sentiment engine? Int Rev Financ Anal 2024; 94: 103291.

94. Liga, Robaldo. Fine-tuning GPT-3 for legal rule classification. Comput Law Secur Rev 2023; 51: 105864.

95. Latif, Zhai. Fine-tuning ChatGPT for automatic scoring. Comput Educ Artif Intell 2024; 6: 100210.

96. Preda. All COVID-19 vaccines Tweets dataset. Kaggle, https://www.kaggle.com/datasets/gpreda/all-covid19-vaccines-tweets/data (2021).

97. Reimers, Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084, https://arxiv.org/abs/1908.10084 (2019).

Table 2. Details of the dataset.

| Dataset | Total tweets | Positive tweets | Negative tweets | Neutral tweets | Source |
|---|---|---|---|---|---|
| TD | 200,940 | 88,149 | 44,768 | 68,023 | Crawled from X |
| KG | 228,207 | 84,025 | 35,885 | 108,297 | Kaggle |
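The class balance implied by Table 2 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and the helper name `class_shares` are assumptions, while the counts are copied from the table. It confirms that the per-class counts add up to the reported totals and prints each class's share of its dataset.

```python
# Counts copied from Table 2; the data layout and names are illustrative assumptions.
counts = {
    "TD": {"positive": 88_149, "negative": 44_768, "neutral": 68_023},
    "KG": {"positive": 84_025, "negative": 35_885, "neutral": 108_297},
}
reported_totals = {"TD": 200_940, "KG": 228_207}


def class_shares(label_counts):
    """Return each label's fraction of the dataset."""
    total = sum(label_counts.values())
    return {label: n / total for label, n in label_counts.items()}


for name, label_counts in counts.items():
    # The per-class counts should sum to the total reported in the table.
    assert sum(label_counts.values()) == reported_totals[name]
    shares = class_shares(label_counts)
    print(name, {label: round(share, 3) for label, share in shares.items()})
```

Positive tweets form the largest class in TD and neutral tweets the largest in KG, so per-class metrics are worth reporting alongside accuracy.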

Table 4. Hyperparameter configurations applied in the model training phase.

| Hyperparameter | Value |
|---|---|
| Learning rate | $2 \times 10^{-4}$ |
| Epochs | 10 |
| LoRA rank | 16 |
| LoRA alpha | 64 |
| Adam beta1 | 0.9 |
| Adam beta2 | 0.99 |
| Adam epsilon | $1.00 \times 10^{-8}$ |
| Batch size | 5 |
| Gradient accumulation steps | 4 |
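Two quantities follow directly from the values in Table 4 and may help when reproducing the setup. The sketch below is a minimal illustration, not the authors' training code: the class name `TrainingConfig` and its field names are hypothetical, and it assumes single-device training and the standard LoRA convention of scaling the low-rank update by alpha / rank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values copied from Table 4; the field names are hypothetical."""

    learning_rate: float = 2e-4
    epochs: int = 10
    lora_rank: int = 16
    lora_alpha: int = 64
    adam_beta1: float = 0.9
    adam_beta2: float = 0.99
    adam_epsilon: float = 1e-8
    batch_size: int = 5
    gradient_accumulation_steps: int = 4

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer step (single device assumed).
        return self.batch_size * self.gradient_accumulation_steps

    @property
    def lora_scaling(self) -> float:
        # Assumes the standard LoRA scaling factor alpha / rank.
        return self.lora_alpha / self.lora_rank


config = TrainingConfig()
print(config.effective_batch_size)  # 20
print(config.lora_scaling)          # 4.0
```

Under these assumptions, each optimizer step aggregates gradients from 20 tweets, and the adapter update is scaled by a factor of 4.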