Abstract
Social media has become an important part of day-to-day activity. Platforms are used daily, and young adults in particular use them regularly, even in the midst of an emergency. Individuals, businesses, and governments all rely on social media to communicate. During disasters, many people try to reach loved ones living in affected regions out of concern for their well-being, and those affected look for necessities such as food, help, medicines, lodging, and transportation. In a crisis, telecommunication networks may break down or fail to accommodate a sudden spike in the number of users trying to connect, and the same applies to the widely used short messaging service (SMS). Online social media platforms can regulate this flow of communication effectively because they are technologically scalable. In crisis situations, a platform that enables two-way communication can outperform conventional one-way channels such as radio and television. The proliferation of network technologies has increased the emphasis on examining the features of network components, mitigating the effects of these components, and rapidly restoring operations after disasters.
Social media platforms can make emergency communication more efficient, dependable, and participatory. Consequently, crises have become an integral part of the modern social media ecosystem.
Keywords
Introduction
Social media is now an indispensable part of day-to-day life. According to data released by the individual platforms on their global user bases, Facebook has approximately 2.92 billion users as of the end of January 2022, YouTube has approximately 2.6 billion users, WhatsApp has approximately 2 billion users, Instagram has approximately 1.38 billion users, Twitter has approximately 229 million users, and LinkedIn has approximately 830 million users (in India, the user base is 467 million for YouTube, 330 million for Facebook, 23.6 million for Twitter, 390 million for WhatsApp, and 230 million for Instagram). People use social media not only in their daily lives but also in times of crisis and calamity. Individuals, particularly young adults, use social media consistently or frequently in daily life, as stated by Jin et al. (2014) [29], and its use increases during a crisis, whether natural or caused by humans. Individuals, corporations, and government agencies all use social media to communicate with the general public. People who are concerned about family members and friends in disaster-affected areas try to contact them and express concern for their condition and safety, while affected individuals seek or exchange information about necessities such as food, accommodation, and transportation. It has been observed that when telecommunication networks see a rapid spike in a crisis, with thousands of people attempting to connect at the same time, the network either fails or cannot handle the surge. The same is true for short messaging service (SMS) text messages.
Online social media, on the other hand, can fulfil this function and help manage the increase in communication volume, because these platforms are technologically scalable. Because the platform allows two-way contact, it can overcome the barrier created by one-way channels such as radio and television in an emergency. Radio and television services can also be disrupted when electricity is cut, particularly during natural catastrophes. Social media users can share information in real time from a variety of devices, including smartphones, tablets, and comparable gadgets [12]. People can interact with friends and family to discuss their health and locations. In a crisis, it is essential to remember that every person can act as a useful source of vital information, given that they are able to collect and disseminate it [2]. One may therefore claim that online social media platforms are of greater significance in emergency and crisis situations [17]. The widespread use of smartphones has been cited as a crucial catalyst that lets individuals access social media sites quickly. Over the past few years, Twitter has emerged as a key venue for disseminating a wide variety of information, including updates on fatalities and devastation, notifications, and fundraising initiatives.
Furthermore, Twitter makes it easier to share multimedia content such as videos and photographs [4, 8]. Social media platforms act as a continuous source of information that can assist emergency management [10, 26]. Recently, researchers have turned their attention to examining the characteristics of network components and developing ways to reduce the impact of such components while speeding up their recovery from severe incidents [21, 22]. Through its inherent capacity, reliability, and interactive nature, social media has emerged as a promising and quickly developing communication medium that enables better emergency communication [6]. As a consequence, crises have become an unavoidable, intrinsic, and lasting component of the contemporary social media ecosystem, and are no longer a rare or random phenomenon there.
Related work
Extensive research has been carried out on the use of social media in crisis management. This review covers both published scholarly work and publicly available material. Users turn to social media platforms to express their opinions and thoughts online. The central idea of emotion analysis is the categorisation of texts as positive, negative, or neutral [30]. Sentiment analysis and opinion mining are similar terms that also refer to emotion analysis. Lexicon-based strategies calculate the orientation of a text from the semantic orientation of the words or phrases it contains [11]. The first step in using a lexicon-based classifier, as done by Wang et al. (2014) [31], is the construction of a lexicon dictionary. The authors used Linguistic Inquiry and Word Count (LIWC), which provides a quantitative analysis of how often particular types of words are used in a wide range of texts such as emails, speeches, poetry, and everyday conversations, to construct an improved sentiment classification method for identifying irregularities in the sentiment patterns of social media [13]. First, they separated the tweets into their respective categories using eight annotators, who agreed in eighty percent of the cases. The authors then used their methodology to carry out automated categorisation. Velev and Zlateva (2012) [19] studied the use of social media in the aftermath of natural catastrophes. They identified features of social media platforms that encourage people to use them during crises, particularly when traditional means of communication are rendered ineffective.
After classifying the ways in which users communicate on social media during a crisis, the authors suggested how social media could assist crisis management by improving the dissemination of information, improving planning, and reducing the impact of the disaster. Dufty et al. (2016) [5] reviewed the published literature in Australia to determine the principal applications of Twitter in disaster management over the past ten years. Twitter is mostly used in disaster management as an additional communication channel, for situation mapping and feedback, for understanding the mood of those affected, and for delivering real-time information between the community and emergency management. Hassan et al. [7] used the Circumplex model, which records emotional experience along two dimensions, valence and arousal, to determine the emotions present in Twitter interactions. The authors compiled an emotional lexicon dictionary from terms in Linguistic Inquiry and Word Count (LIWC). They built supervised machine learning classifiers from the extracted features, which included unigrams, emoticons, negations, and punctuation, and achieved an accuracy of eighty percent on tweets. The investigators in [9] conducted a very similar study, which uses Twitter to examine how people’s feelings changed in response to the COVID-19 outbreak. A random sample of 18,000 tweets was evaluated for positive and negative sentiment as well as for eight emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust.
Because most tweets contained both fearful and reassuring wording, the results suggested roughly equal amounts of positive and negative sentiment. Palen [26] described in detail how individuals used social media to obtain more specific and localised information about wildfires, in contrast to the information available through formal channels, which was more general and could be misleading. Kankanamge et al. and Yigitcanlar et al. [1, 16] searched and analysed social data using a variety of search terms. The primary terms investigated were “earthquakes and social media,” “floods and social media,” “tsunamis and social media,” “landslides and social media,” “cyclones and social media,” “natural disasters and social media,” “volcanic eruptions and social media,” “natural hazard and social media,” “man-made disaster and social media,” and similar phrases. Anita S. et al. (2020) [3] classified social data into three fundamental categories: location, emotion, and tweet time. They examined each category in more depth and used a variety of methods to extract pertinent and actionable information. Singh et al. (2017) [18] proposed an approach for categorising tweets and detecting their locations in order to recognise tweets sent by disaster victims requesting aid, together with the victims’ location.
Text mining comprises a number of tasks, one of which is information extraction. The overarching goal of information extraction is to discover structured data within unstructured or semi-structured material. Entity recognition and relation extraction are two of its most important procedures [27]. Vieweg et al. (2010) [23] investigated the manual identification of relevant information during emergencies, focusing on geo-locations, location referencing, and situational updates to achieve greater precision. After a comprehensive investigation of two emergency situations, they derived a set of criteria that can be applied to any information extraction (IE) method to determine which tweets are significant. Chatfield et al. [28] investigated the pattern of information exchange between affected citizens and government-affiliated institutions. The research also highlights the benefits of social media in disaster response, both for affected individuals and for government institutions. The support vector machine (SVM) is a supervised machine learning technique used for classification and regression analysis. In text categorisation, SVM has been reported to outperform other learning algorithms consistently [24]. Joachims and colleagues [25] demonstrated that SVMs eliminate the need for feature dimension reduction and offer an auto-tuning property that suits text categorisation. Sharma et al. [14] proposed a hybrid technique that combines Chi-square and point-wise mutual information to select the most appropriate linguistic features. Verma et al. [15] built a model for sentiment analysis that automatically identified and categorised tweets concerning situational awareness along several dimensions, including subjectivity, formal or informal language, and personal or impersonal style.
Crisis related categories.
The most critical task is to capture pertinent information about the disaster and to interpret the other interdependent components, such as location, intensity, accessibility, and stress level. This study, however, focuses only on effectively obtaining and classifying social data and pinpointing the specific location of affected individuals. In such situations, users typically publish or create their own hashtags (#), and other affected individuals begin using or referring to them. Social media platforms such as Twitter then let people post disaster-related hashtags that reach everyone involved in the rescue operation.
In crises, both natural and man-made, there are common challenges that demand significant attention during and after the event. Identifying and resolving these difficulties promptly is imperative to mitigate their impact and minimise long-term consequences. These efforts centre primarily on the fundamental necessities of the affected individuals, such as food, housing, access to clean water, urgent medical care, and rescue operations. The Streaming Twitter API is used to acquire a portion of the tweets used in this study. Historical data cannot be acquired through the Streaming Twitter API, as Twitter only permits the collection of data from the previous seven days. The remaining data are gathered using a third-party tool named “Modern Research” created by “Sprinklr Inc.”
List of crisis categories and associated keywords (specific to this study)
Summary of categories and associated keywords
A magnitude 7.8 earthquake affected southern and central
Summary of crises and the volume of tweets processed for this study
Turkey earthquake: Social mentions.
Among social media platforms, Twitter is advantageous during crises because it offers more open data access than other social platforms. The following section presents an analysis of share of voice, sentiment, word cloud, and peak time for the Turkey earthquake, the Joshimath landslide, and the 2022 Silchar flood in Assam.
Turkey sentiment.
Turkey earthquake: Word cloud.
Turkey quake chatter trendline.
Joshimath landslide: Social mentions.
Joshimath sentiment.
Joshimath landslide: word cloud.
Joshimath chatter trendline.
Silchar flood: Social mentions.
Silchar flood sentiment dist.
Silchar flood: Word cloud.
Silchar flood chatter trendline.
Social media data is highly important during a crisis because it allows information to spread rapidly and the crisis to be escalated quickly, which prompts the necessary measures. The proposed data flow selects Twitter data exclusively from the wide array of social media platforms and then cleans the data so that machine learning algorithms can produce highly accurate results. It describes how a basic tweet is transformed into a valuable, actionable insight. In the processing stage, rules and linguistic features are applied to the data to achieve accurate classification. The data is then used as input to the machine learning algorithm, specifically the Support Vector Machine (SVM), with the aim of attaining optimal accuracy and generating actionable insights. The final result can be sent as notifications to stakeholders or authorities to initiate follow-up measures for assistance, support, rescue, and related matters.
Proposed hybrid model using support vector machine (SVM).
The proposed model uses keyword filtering to find and analyse Twitter messages that contain pertinent information. Initially, the tweets are filtered on terms that refer to being in danger or stranded. Tweets that contain the keywords “earthquake”, “flood”, “landslide”, “fire”, “death”, “stranded”, “help”, “medics”, “water”, “shelter” and “caught” are retained. A few additional terms that are frequently encountered online, such as risk, hazard, danger, evacuation, and critical, are also included. The resulting list is referred to as the “Crisis Keywords.” After filtering, 187,499 of the 232,862 tweets contain a term from the crisis keyword list.
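The filtering step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the keyword set is taken from the text, while the function names, the lowercase tokenisation, and the exact-token matching are assumptions.

```python
import re

# Keywords listed in the text; the matching strategy below is an assumption.
CRISIS_KEYWORDS = {
    "earthquake", "flood", "landslide", "fire", "death", "stranded",
    "help", "medics", "water", "shelter", "caught",
    "risk", "hazard", "danger", "evacuation", "critical",
}

def tokenize(tweet):
    """Lowercase the tweet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", tweet.lower())

def contains_crisis_keyword(tweet):
    """Return True if at least one token is in the crisis keyword list."""
    return any(token in CRISIS_KEYWORDS for token in tokenize(tweet))

def filter_tweets(tweets):
    """Keep only the tweets that mention a crisis keyword."""
    return [t for t in tweets if contains_crisis_keyword(t)]
```

Exact-token matching keeps the sketch simple; it would miss inflected forms such as "flooding", so a real pipeline would likely add stemming or pattern matching.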
Part-of-speech (POS) tagging is the task of assigning the correct grammatical category to each word in a sentence. Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions, and their subcategories. Most POS tagging methods fall into three main approaches: rule-based, stochastic, and transformation-based tagging. This study employs rule-based POS tagging. Several tagsets and models are available for processing tweets, such as the Penn Treebank tagset, Maximum Entropy models, and Hidden Markov Models. The Penn Treebank POS model is used in this study to classify the data into 36 distinct part-of-speech labels.
The first rule, based on Penn Treebank tags, substitutes each personal pronoun (PRP) with the “@” symbol, creating a collection of antecedents. When these antecedents are followed by any crisis-related term, the tweet is categorised as positive. A second rule substitutes all crisis-related keywords with the “%” sign; when this sign is followed by a preposition or subordinating conjunction, the tweet is also categorised as positive. All tweets that meet neither criterion are categorised as negative.
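One possible reading of these two rules is sketched below. The input is assumed to be already tagged with Penn Treebank labels, `CRISIS_TERMS` is a shortened illustrative list, and the function name `apply_rules` is hypothetical. The source does not say whether "followed by" means immediately or anywhere later in the tweet, so the interpretation of each rule is marked as an assumption in the comments.

```python
# Shortened illustrative keyword list (assumption, not the full study list).
CRISIS_TERMS = {"earthquake", "flood", "landslide", "stranded", "help", "shelter", "water"}

def apply_rules(tagged_tokens):
    """Rewrite tagged tokens into rule symbols and classify the tweet.

    tagged_tokens: list of (word, penn_treebank_tag) pairs.
    Returns (symbols, label), where label is "positive" or "negative".
    """
    symbols = []
    for word, tag in tagged_tokens:
        if tag == "PRP":
            symbols.append("@")    # personal pronoun becomes an antecedent marker
        elif word.lower() in CRISIS_TERMS:
            symbols.append("%")    # crisis keyword marker
        elif tag == "IN":
            symbols.append("IN")   # preposition or subordinating conjunction
        else:
            symbols.append("_")    # any other token
    # Rule 1 (assumed reading): an antecedent appears somewhere before a crisis term.
    rule1 = "@" in symbols and "%" in symbols[symbols.index("@") + 1:]
    # Rule 2 (assumed reading): a crisis term is immediately followed by an IN token.
    rule2 = any(a == "%" and b == "IN" for a, b in zip(symbols, symbols[1:]))
    return symbols, ("positive" if rule1 or rule2 else "negative")
```

For example, the tagged tweet "We/PRP are/VBP stranded/VBN in/IN Silchar/NNP" maps to the symbols `@ _ % IN _` and satisfies both rules under these assumptions.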
NER Tagging
NER algorithms use statistical models to capture the semantic and contextual aspects of words. Knowledge graphs reinforce the connections between entities and support a fuller understanding of the underlying material. Named entity recognition (NER) is therefore an important component of sentiment analysis.
Additional information such as the names of individuals, locations, times, dates, and objects is of utmost significance. This data makes it possible to identify entities and classify them into distinct categories.
NER data tagging flow.
Example NER tagging with data and type.
During a crisis, affected individuals often experience heightened distress that can lead to panic. Consequently, the information they post may lack depth or fail to follow established formatting conventions. Such limited information makes it difficult to draw firm conclusions from the data. Linguistic features can address this difficulty by enriching the semantic value of the data. Both statistical and linguistic features exist; however, several studies have indicated that linguistic features yield higher accuracy in classifying textual data [31]. Linguistic strategies include bag of words, N-grams, and lexicons for emotions, negations, and feelings. This study uses the bag-of-words approach.
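As a rough illustration of the bag-of-words representation mentioned above, the sketch below turns a tweet into a vector of token counts over a fixed vocabulary. The helper names and the simple regular-expression tokeniser are assumptions made for this example, not details from the study.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def bag_of_words(text, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector
```

Word order is discarded by design, which is why the rule-based POS patterns can carry complementary information.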
SVM – Machine learning algorithm for text classification
Numerous machine learning approaches are used for text categorisation, including the Naïve Bayes classifier, K-Neighbors, Decision Tree, and Support Vector Machine (SVM). SVM is a supervised machine learning method used for both classification and regression. For text classification, SVM assigns categories to a dataset by locating the hyperplane, or boundary, that best divides the text data into the predetermined groups. Conceptually, many hyperplanes can separate the two groups, and SVM selects the one that maximises the distance to the nearest data points of each class. Support vectors, the data points that lie closest to the hyperplane, largely determine the position and margin of the optimal hyperplane. Using supervised learning, SVM thus determines a hyperplane that separates positive from negative text samples by maximising the margin.
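The margin criterion can be made concrete with a small sketch. The code below does not train an SVM; it only scores a handful of hand-picked candidate hyperplanes on toy data and keeps the one whose closest point is farthest away, which is the quantity an SVM solver maximises. All names and the toy data are assumptions for illustration.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b, points, labels):
    """Smallest label-signed distance; negative if any point is misclassified."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))

def best_hyperplane(candidates, points, labels):
    """Pick the candidate (w, b) with the largest margin."""
    return max(candidates, key=lambda wb: margin(wb[0], wb[1], points, labels))

# Toy data: two negative and two positive points on a line.
points = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (5.0, 5.0)]
labels = [-1, -1, 1, 1]
candidates = [((1.0, 1.0), -6.0), ((1.0, 1.0), -5.0), ((1.0, 0.0), -3.0)]
best = best_hyperplane(candidates, points, labels)
```

Here the points (2, 2) and (4, 4) play the role of support vectors: they are the ones that determine the margin of the winning candidate.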
Results and discussion
When part-of-speech (POS) and linguistic features are used with a Support Vector Machine (SVM), they are typically added alongside the conventional input features. The conventional SVM formulation optimises a decision boundary over the input features; this can be extended by including linguistic information in the feature vector. Consider a simple case in which the input feature vector is augmented with POS and linguistic information. Given a set of training examples $\{(x_i, y_i)\}_{i=1}^{N}$, the augmented feature vector is

$$\tilde{x}_i = [\,x_i,\; p_i,\; l_i\,]$$

The variable $x_i$ is the original input feature vector. The variable $p_i$ holds the POS features and the variable $l_i$ holds the linguistic features of the $i$-th example.
The support vector machine (SVM) formulation using these enhanced features closely resembles the conventional SVM formulation, written here in standard soft-margin form, but takes into account the expanded feature vector:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i\,(w^\top \tilde{x}_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0$$

The equation can be rewritten as follows:

$$f(\tilde{x}) = \operatorname{sign}\Big(\sum_{i=1}^{N} \alpha_i\, y_i\, K(\tilde{x}_i, \tilde{x}) + b\Big)$$

Where:
The decision function for the Support Vector Machine (SVM) with the augmented feature vector is denoted as $f(\tilde{x})$. The variable $K(\tilde{x}_i, \tilde{x})$ is the kernel function. The Lagrange multipliers obtained during training are denoted by $\alpha_i$. The corresponding class labels are denoted by $y_i$.
The variable $b$ is the bias term.
In our work, we applied this formulation to a pre-processed dataset that includes part-of-speech (POS) and linguistic features, in order to improve the accuracy of the final classification. Table 3 gives an overview of the data volume processed in the study. It shows a significant presence of negative sentiment; the negative tweets form the corpus for the study and serve as input to the machine learning algorithms. Tweets expressing positive sentiment are omitted from the analysis because they mainly express gratitude for the prompt assistance, rescue efforts, and medical care provided during the disaster. From the three crises and their percentages of negative sentiment, it can be inferred that flood and landslide incidents yield a relatively smaller volume of data but a higher proportion of negative sentiment than the earthquake data.
Summary of data used after processing and sanity
The data in Table 3 is classified by the sentiment identified during training. It shows that during a crisis a large number of people use phrases that are inherently negative, which is why eighty percent of tweets are classed as negative.
For evaluating a crisis-related dataset, the approach specifies a collection of rules and algorithms designed to reach the highest possible accuracy and efficiency. Five algorithms were used to analyse the data presented above: Decision Tree, Support Vector Machine (SVM), K-Neighbors Classifier, Random Forest (RF), and AdaBoost.
Summary of benchmarking of result before applying POS, LING
Comparison of benchmarking data for algorithms.
The training-to-testing data ratio is kept at 80:20 for all algorithms. To begin, the data is fed into the algorithms and the outcomes are documented as described in Table 3. These results, obtained before applying POS, LING, and the POS+LING combination, are referred to as the benchmark data for the study. The study places particular emphasis on the performance indicators accuracy, precision, recall, and F1-score.
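An 80:20 split of this kind can be sketched as below. The source does not state whether the split was shuffled or stratified, so the seeded shuffle, the function name, and its parameters are assumptions for illustration only.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(100)))
```

Using the same seed for every algorithm keeps the comparison fair, since each classifier then sees identical training and test tweets.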
Several iterations were carried out in which the hyperparameter values of each algorithm were repeatedly adjusted and the results recorded. Table 4 lists the optimal settings selected to obtain better results and performance from the algorithms.
Random Forest (RF) and AdaBoost achieved higher recall on the dataset, whereas Support Vector Machine (SVM) and Random Forest achieved better accuracy.
The SVM technique outperforms the other techniques in terms of precision, recall, F1-score, and accuracy, which implies that it is superior in overall performance.
Summary of hyperparameters and their optimal values for various algorithms used for this study
In addition, the two SVM kernels considered, the linear kernel and the Radial Basis Function (RBF) kernel, are analysed. A comparison of the results across all combinations shows that the linear SVM kernel performs better when it is applied to the rules.
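For reference, the two kernels compared here have the following standard definitions, where $\gamma > 0$ is the RBF width parameter (the Gamma hyperparameter); the symbols $x_i$ and $x_j$ denote any two feature vectors and are notation chosen for this note.

```latex
K_{\text{linear}}(x_i, x_j) = x_i^\top x_j
\qquad
K_{\text{RBF}}(x_i, x_j) = \exp\!\left(-\gamma \,\lVert x_i - x_j \rVert^2\right)
```

The linear kernel has no extra parameter beyond the regularisation constant, which is one common reason it works well on high-dimensional, sparse text features.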
The weighting and the number of neighbours were adjusted to determine how effective the method was. We compared the K-Nearest Neighbours (KNN) algorithm with its weighted counterpart, Weighted K-Nearest Neighbours (Weighted KNN), and found that the weighted approach performs better than standard KNN. For weighted KNN, our experiments showed that the hyperparameter K should be set to
For further discussion and comparison of the results, we carried KNN, Decision Tree, linear SVM, and Random Forest forward for further analysis.
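The difference between standard and weighted KNN discussed earlier can be illustrated with a short sketch. Inverse-distance weighting is assumed here because it is a common choice; the source does not specify the weighting scheme, and the function name and toy data are hypothetical.

```python
import math
from collections import defaultdict

def knn_predict(train, query, k=3, weighted=True):
    """Classify query from (vector, label) pairs in train.

    With weighted=True each of the k nearest neighbours votes with
    weight 1 / (distance + eps); otherwise every neighbour votes equally.
    """
    eps = 1e-9  # avoids division by zero for exact matches
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for vector, label in nearest:
        d = math.dist(vector, query)
        votes[label] += 1.0 / (d + eps) if weighted else 1.0
    return max(votes, key=votes.get)

# Toy data: one close "neg" point and two farther "pos" points.
train = [((0.0, 0.0), "neg"), ((3.0, 0.0), "pos"), ((3.5, 0.0), "pos")]
query = (1.0, 0.0)
```

On this toy data the unweighted vote follows the two distant neighbours, while the weighted vote follows the single close one, which shows how weighting can change the prediction.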
When the clean and sanitised corpus was supplied to KNN, Decision Tree, SVM, and Random Forest, the results improved in comparison with the benchmark. Precision, recall, and F1-score were recorded separately after each execution, and Table 5 lists these readings for the algorithms. Performance improved for all algorithms, most significantly for linear SVM. Although Random Forest has the highest recall on the given dataset, its overall change in F1-score and accuracy is not substantial compared with the SVM algorithm. Below are the mathematical formulae used to calculate the precision, recall and F1-Score, where $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
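As a hedged illustration, the conventional definitions of these metrics can be computed from true and predicted labels as follows. The function name and the choice to return 0.0 when a denominator is zero are assumptions made for this sketch.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1-score, and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

metrics = classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

In this small example there are two true positives, one false positive, one false negative, and one true negative, so precision, recall, and F1-score all equal 2/3 and accuracy is 3/5.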
Summary of results obtained for classifiers K-Neighbors, decision tree, SVM and random forest
P: Precision, R: Recall, F1: F1-score.
Summary of results obtained for classifiers K-Neighbors, decision tree, SVM and random forest
P: Precision, R: Recall, F1: F1-score.
Comparison between benchmarking data and the POS results for algorithms.
Summary of results obtained for classifiers K-Neighbors, decision tree, SVM and random forest
P: Precision, R: Recall, F1: F1-score.
Comparison between benchmarking data and the LING results for algorithms.
Comparison between benchmarking data and the combined POS+LING results for algorithms.
Table 9 compares the performance of each algorithm with and without the hybrid SVM approach. The results are more promising and more precise when rules and a linguistic approach are applied to the data before it is passed to the SVM algorithm for classification.
Comparison of ML benchmarking with combined approach.
The hybrid configuration sequentially combines rule-derived features with a machine learning approach, taking into account the different mechanisms that underlie each of these two feature types. The output of the rule-based technique serves as input to a machine learning algorithm that performs the text classification. In the rule-based methodology, a collection of documents is treated as a mining field from which patterns, referred to as antecedents, are extracted. The patterns are generated from a set of predefined templates by analysing a large number of texts that belong to the same category. The rule-based classifier then uses these pattern sets to identify the distinct categories present across the corpus. In contrast, the support vector machine (SVM) classifier and other machine learning algorithms treat the document collection as a set of distinct features. After weights are assigned to the individual features, each document is represented as a set of weighted features. When a training set is supplied to the SVM, the learning algorithm optimises the feature weights, and hyperparameters such as C and gamma are tuned, in order to produce an optimal model. The classifier then uses this model to classify previously unseen data.
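The sequential combination described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two rule patterns, the tokeniser and all function names are hypothetical, since the actual rule templates are not given in the text. Each text is turned into rule flags followed by bag-of-words counts, and the resulting vector is what a classifier such as a linear SVM would receive.

```python
import re
from collections import Counter

# Hypothetical rule patterns (antecedents); the paper's real templates are not shown.
RULES = {
    "trapped": re.compile(r"\b(trapped|stuck|stranded)\b", re.IGNORECASE),
    "need_help": re.compile(
        r"\b(need|send|please)\b.*\b(help|food|water|rescue)\b", re.IGNORECASE
    ),
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rule_features(text):
    """Return one binary flag per rule: 1 if the pattern matches, else 0."""
    return [1 if pattern.search(text) else 0 for pattern in RULES.values()]


def build_vocabulary(texts):
    """Map every distinct token in the corpus to a column index."""
    words = sorted({word for text in texts for word in tokenize(text)})
    return {word: index for index, word in enumerate(words)}


def bow_features(text, vocabulary):
    """Return bag-of-words counts for the text over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


def hybrid_features(text, vocabulary):
    """Concatenate rule flags and bag-of-words counts into one feature vector."""
    return rule_features(text) + bow_features(text, vocabulary)


# Hypothetical example tweets, for illustration only.
texts = [
    "We are trapped on the roof, please send help",
    "Praying for everyone affected by the flood",
]
vocabulary = build_vocabulary(texts)
vectors = [hybrid_features(text, vocabulary) for text in texts]
```

In a full pipeline, `vectors` together with the positive and negative labels would be passed to the learning algorithm, for example scikit-learn's `LinearSVC`, which is named here only as one possible choice.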
Combining a rule-based method with an SVM classifier was found to produce improved results, because it allows texts to be processed and analysed using several methodologies at once.
In the event of a catastrophe, the hybrid method provides a systematic way to identify individuals who are trapped and fighting for their lives, so that they can be rescued. Disaster data obtained from Twitter is used to train and test the approach. To isolate the texts written by the affected population, the text classification system uses a hybrid technique that combines a rule-based methodology with a machine learning algorithm.
The work done so far indicates that information collected through social media is highly relevant to those responsible for providing emergency services. In any catastrophe, the members of the emergency response team need a well-defined operational procedure, which gives them better awareness of the current situation. In any crisis, the people directly affected are typically the first responders, assisting those in their immediate vicinity while protecting themselves. They also assist the rescue workers, both directly and indirectly. Social media sites hold a wealth of information that can be used to organise disaster response and recovery efforts. During and after a crisis, a growing number of people turn to social media to share their experiences, describe their emotions, talk with others, and express their displeasure at the lack of basic necessities. Advanced technologies such as big data and parallel programming have made these large volumes of data more manageable. Nevertheless, several challenges must still be addressed before the data can be used in disaster response and recovery, including the difficulty of classifying the data into relevant categories and the inability to trust the data. Because of the nature of social media, not everyone who posts such data is among the people affected. In many cases, other people who are worried about those affected take part in extensive conversations about the crisis on social media.
Such conversations generate unsolicited text messages, which need to be vetted and discarded because they could otherwise cause confusion in relief operations. To isolate the texts from the affected individuals, the proposed method labels texts from affected individuals as positive and all other texts as negative. Combining the rules with bag of words and the support vector machine (SVM) leads to increased classification accuracy in the proposed hybrid combinations. The most important contribution of the hybrid technique is the separation of texts written by people who are affected by the disaster and are attempting to survive.
Although many studies analyse the emotions that people experience during a disaster, our research differs in that it uses sentiment analysis to differentiate text messages sent by those who are trapped from those sent by people who are not. In particular, rules are proposed to classify tweets as either positive or negative, which will help rescue workers locate and rescue individuals who are on the verge of becoming stranded. One of the most significant challenges encountered in this study was the lack of sufficient data in the positive category. The imbalance between the numbers of positive and negative texts affects the classification accuracy of the machine learning algorithm. Although positive texts are proportionately fewer, it is essential to separate them, because doing so not only helps to save the lives of those affected but also brings relief to people who have been struggling for a long time without food and water. Despite the challenges of processing the data, the proposed research can be regarded as a pilot study. Future work may focus on building a dataset that combines social media data from a variety of disasters and includes a sizeable amount of positive data. In addition, the proposed rules can be extended on the basis of the positive data acquired, which has the potential to improve classification accuracy.
Furthermore, a comprehensive study of each category of people's basic needs would support effective classification, which would in turn make the work of emergency responders easier. Although text categorisation involves a number of challenges, the hybrid model presented here can serve as a foundation for identifying real-time tweets sent by people who are concerned about their safety. This helps rescue teams locate and identify people who are trapped, which reduces the number of victims at the disaster site while raising the confidence of those affected.
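The effect of class imbalance on the reported measures can be illustrated with a small worked example. The counts below are hypothetical and are not taken from the study's dataset: they assume 5 positive and 95 negative texts and a trivial classifier that labels every text as negative. The function names are likewise illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 for the positive class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def accuracy(tp, tn, fp, fn):
    """Fraction of all texts that were labelled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical corpus: 5 positive (trapped) texts and 95 negative texts.
# A classifier that predicts "negative" for everything misses all 5 positives.
tp, fp, fn, tn = 0, 0, 5, 95

baseline_accuracy = accuracy(tp, tn, fp, fn)
baseline_prf = precision_recall_f1(tp, fp, fn)

print(f"accuracy={baseline_accuracy:.2f}")
print("precision={:.2f} recall={:.2f} f1={:.2f}".format(*baseline_prf))
```

Under these assumed counts the accuracy is high even though no trapped person is identified, which is why precision, recall and F1-score for the positive category are more informative than accuracy alone when positive texts are scarce.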
The scope can be expanded by using data sources other than Twitter, such as blogs, forums, and other social media platforms. This could help extend the feature extraction and classification to support longer texts (at present, Twitter limits a standard tweet to 280 characters).
The scope of the study could also be broadened to include image-related information, which is an essential component of data sharing on social media. The present study and its implementation are limited to textual data and its classification. Image feature extraction has the potential to deliver more accurate and reliable information about an incident and its severity. Extracting spatiotemporal and location information is difficult with the currently available dataset, and image feature extraction is likewise something that could be addressed to some degree.
Additionally, the research primarily supports English. Its scope could be extended to other languages by employing natural language processing techniques and recently developed large language models (LLMs).
Conclusion
This study proposes a hybrid approach, based on real-time text classification and analysis, for identifying vulnerable individuals during and after a disaster. The hybrid approach classifies texts using a rule-based strategy, linguistic features, and a machine learning algorithm. Several implementations of the hybrid technique are investigated to identify the combination that yields the highest text classification accuracy. The experimental results indicate that the effectiveness of the text classification system is improved by using rules, linguistic features, and a linear support vector machine. The proposed real-time hybrid methodology collects data from social media platforms and classifies the textual content that originates from individuals affected by a disaster. By monitoring social media platforms during a disaster, the proposed technique has the potential to reduce the number of casualties. Its findings help emergency responders in their efforts to rescue people who are in danger.
