Using community information for natural disaster alerts

Abstract

Recently, the steady rise in the global average temperature has led to extreme climates in which natural disasters, such as droughts, hurricanes, earthquakes and floods, are becoming increasingly serious. Recent research has found that social media typically reflects disasters earlier than official communication channels. This study proposes collecting information on the flood disasters that occur in a city during typhoons and heavy rains from the plain text messages posted on social media, using a term frequency (TF) and sliding window approach. The dataset analysed here contains a total of 292 articles and 12,484 tweets. The research shows how to establish a warning mechanism that adds a notification time for flooding disasters and how to provide relevant disaster relief personnel with references. The article contributes by combining social media data with emergency management information cloud (EMIC) data to build a warning mechanism for flooding disasters. According to the experimental results, a sliding window of 90 min and a sliding gap of 10 min obtained the best F-measure value (F = 0.315). The event used as a case study was Typhoon Megi (September 2016), which caused major flooding in Tainan. For the Typhoon Megi event, the flood disaster location database had 161 streets available for matching. Based on the experimental results, it is possible to obtain a high accuracy rate (90% or higher) for flooding locations from real-time tweet data by exploiting a social media dataset.

Keywords

Data mining; disaster text mining; floods; time-series analysis

1. Introduction

In recent years, due to the impact of global climate change, the frequency of extreme weather and natural disasters has increased. Social media can play a critical role in propagating disaster information that can be leveraged for disaster relief [1]. A growing body of research has addressed crisis problems using social media information [2], and the application of social media to natural disasters in particular has begun to attract the attention of researchers [3]. Social media plays an important role in coordinating relief [4]. With the popularity of smartphones, people can post microblogs, images and videos on social media sites in real time. These posts can provide useful, up-to-the-minute information, especially in relation to natural disasters such as earthquakes, floods and tornadoes [5]. For example, after the Haiti earthquake on 12 January 2010, people posted text and photos of their earthquake experiences on social media sites [1]. Consequently, technologies such as crowdsourcing are increasingly applied to flood disaster risk reduction (DRR). In addition, social media services, such as Twitter and Facebook, are often used for flood, earthquake and fire applications [6].

Today, social media has become an important channel for information dissemination, even replacing the role of traditional media. It can also support raising awareness and coordinating relief efforts, such as in earthquakes [7], floods [4,5] and droughts [8].

Droughts have the greatest impact among the world’s natural disasters such as floods and storms [6]. One example is Hurricane Sandy, which occurred on 22–29 October 2012 and caused at least 147 direct deaths; initial losses were estimated to be close to US$50 billion, making Sandy the second most costly hurricane since 1900 [9]. Another example is Typhoon Haiyan on 8 November 2013, which seriously damaged the infrastructure of the Philippines and caused heavy casualties [10]. If the response to such disasters can be activated as early as possible, more people are likely to be rescued.

Through social media platforms, users can follow the behaviour of other users and the content they are currently publishing, and they repeatedly return to share relevant information and see what others are doing. In other words, an important characteristic of social media is its real-time nature, and it is increasingly used for emergency communications and help-related requests [11].

Scientific data on earthquakes, floods and other phenomena can be integrated with user data in models that predict the timing and location of future requests and requirements [1]. Deep learning technology can be used to classify the topic of upcoming events [12], and it can also be applied to the classification of pictures from social media [13]. During a flood event, the location of the flood is often mentioned in social media messages, but only approximately 1% of tweets in social media contain a geographic tag [5]. To address this problem, this study extracts the flooded street locations from the text content. In this article, three modules were designed. First, corpora were crawled from a large number of tweets related to the target events during flooding disasters. Second, the term frequency (TF, the number of times a word appears in a document) and a time-series sliding window method were used to extract the event attributes and event locations. Finally, this research developed a method for establishing a warning mechanism that extracts notification times for flooding disasters from tweets.

The first module extracts the tweets related to the target event, which are collected using event-related terms. The proposed method applies semantic analysis to the tweet content. For example, users might post tweets such as ‘is there flooding on Yunong Road and Yongda Road?’ (裕農路跟永大路有淹水嗎) or ‘extremely heavy rain causing flooding of four thousand homes’ (豪雨成災台南逾四千戶淹水), for which ‘flooding’ (淹水) or ‘extremely heavy rain’ (豪雨) could be keywords, and ‘Yunong Road’ (裕農路) and ‘Yongda Road’ (永大路) could be the locations.

The principle of this approach is to assign a weight to each related term; the weight represents how well the term fits the target event and is computed with the TF and sliding window approach. In this study, if the matched position and notification time are the same as the emergency management information cloud (EMIC) label, the result is considered a true positive. We calculate the precision, recall and F-measure to assess the overall performance of our method. In addition to well-known social platforms such as Facebook and Twitter, similar local platforms are also popular in Taiwan. One of them is PTT¹ (批踢踢), a bulletin board system (BBS) established by National Taiwan University (see https://en.wikipedia.org/wiki/PTT_Bulletin_Board_System). It has millions of registered users, and tens of thousands of people publish and browse articles and tweets online at the same time. Like Twitter, the platform lets users post articles and respond to them with short tweets.

The data corpus used in this research was taken from PTT (https://www.ptt.cc/bbs/hotboards.html), the largest BBS (electronic bulletin board service) forum in Taiwan. This article uses the text messages (tweets) of the Tainan board on the PTT platform together with the EMIC disaster report data for further analysis.

According to the 2005 World Bank Hazard Management Unit publication on natural disaster hotspots, more than 95% of the regions of Taiwan with a relatively high total population face two or more natural disaster risks [14]. Taiwan has the highest economic exposure to natural hazards [15]. For example, Morakot, which hit Taiwan in 2009, was one of the most serious typhoons in its history. Typhoon Morakot caused severe floods and very large mudslides in southern Taiwan, leading to many deaths and considerable property damage [16]. To reduce this risk, this research chooses flooding in Tainan as the target event for establishing a flooding warning mechanism in flooding-prone regions. During a disaster, it is important to make the right decisions to help the affected people meet their needs. Because it lacks direct communication with the victims, the disaster management team relies on incomplete or incorrect information most of the time [11]. In this case, information from social media can help the rescue team take the best action [11]. This research analyses community information during disasters and helps build early warning models based on people’s disaster information. The proposed model can help rescuers understand the disaster situation.

The disaster warning system should be developed to prioritise rescue time and obtain disaster locations to improve rescue efficiency. In recent years, there have been few studies on the application of text exploration technology to natural disaster early warning. Middleton et al. [5] indicated that only approximately 1% of tweets in social media contain geographic tags of disaster events. The existing research does not use this information although it is very important for understanding the impact of disasters [17]. Therefore, this research proposes a simple and fast warning mechanism that is more suitable for natural disasters, and extracts the locations of floods on streets from the text content to improve the efficiency of disaster relief.

The goals of this article are listed as follows:

Building a flooding disaster notification and warning system. This article presents the idea of how to establish a warning mechanism with notification time for flooding disasters and provides relevant disaster relief personnel with references.

Prioritise rescue time and obtain disaster locations to improve rescue efficiency.

2. Related work

In this article, two key factors for building a disaster warning system are of concern, namely, the speed of capturing the data stream and the representative keywords. The sliding window and TF methods adopted to address them are discussed below.

2.1. Sliding window

Online news in social media is constantly updated, providing valuable knowledge through data analysis [18]. After an incident, a large amount of relevant text information is usually released in a short period of time. Therefore, how much information should be used for analysis is an important question. Many studies have focused on improving efficiency to obtain good performance in time-series data mining [19], for example by employing sliding window [20–22] and deep learning [13,23] approaches. Deep learning models can take into account the characteristics of time-series data for prediction [23], but training deep networks is usually a very time-consuming task [24]. In contrast, the sliding window method of time-series segmentation divides a long time series into several short sub-sequences [25], which is effective for processing data streams [20–22]. Therefore, this study divides tweets into slices with a fixed time window size. By scanning all sliding windows, a series of new word frequency datasets can be identified.

2.2. Term frequency

A term-weighting scheme describes the relevance between terms and documents. The frequency of terms in documents (e.g. term occurrence frequency, TF) and term relevance (e.g. term frequency–inverse document frequency, TF-IDF) are both commonly used [26]. TF-IDF filters out frequently occurring terms and keeps the important ones. However, because TF-IDF emphasises the importance of a term within a document, text-specific topic words are likely to be given a lower TF-IDF weight than rare words [27]. The TF factor is usually used to measure the frequency of terms in a query text [28], which can improve text clustering [29,30]. When a disaster event occurs, the TF weight is therefore more suitable than the TF-IDF weight for finding the subject terms or keywords of that specific event. TF has also been shown to work well in fake news detection applications [31].

2.3. Warning system

In this section, the application of text mining in early warning scenarios is reviewed. Examples include the notification from the earthquake reporting system developed using social media data that is faster than that from the Japan Meteorological Agency [7]; the prediction or early warning of student performance [32]; the use of social media data and probabilistic weather forecasts for early flood detection [33]; and topic analysis of the coronavirus literature that proposed a framework method for scientific foresight analysis [34]. These studies were utilised for early warning systems and extracted useful information.

Many researchers have published studies that analyse social media (e.g. Twitter) content. Most of these studies address event detection [13], the use of geographic coordinates to identify disaster locations [12,13,35], and emotion and topic analysis [3], but few have examined early warning mechanisms for natural disasters and the locations of such disasters. Some of these studies use deep learning for model training [12,13,17,35]. In practice, most deep learning approaches require a large amount of manually prepared data to train the model, and the training process is very time-consuming [24,36]. These requirements greatly restrict their application to real-world natural disaster events. To overcome these limitations, this study uses the sliding window and TF methods because they are simpler, faster and more suitable for natural disaster warning mechanisms.

3. Methodology

This section describes the module methods used in the proposed system.

3.1. Research architecture

Our approach further analyses the relationship between flood information and official rescue time from social media data and official flood disaster notification data. The goal of this article is to analyse social media text to warn of and identify flooding disaster locations. Figure 1 shows the system architecture for the text mining process used in our study:

In the preprocessing module, the raw data were processed mainly by word segmentation, filtering stop words and assigning attributes. The feature extraction is driven by a web crawler tasked with a set of keywords.

This article used TF and a sliding window to identify the disaster trends and to find appropriate parameters from those tweets and to predict the locations of the events.

Finally, we used statistical analysis techniques to raise alerts for disaster locations by evaluating the threshold of the warning mechanism for the government office.

Figure 1.

System architecture and information flow.

3.2. Data preparation

The system starts by transforming raw data into a usable format, mainly by word segmentation, stop-word filtering and feature selection. To identify messages that contain relevant information, this article retrieved the messages that referred to the flooding event. In this research, a web crawler was used to collect the flooding tweets from PTT, and the corresponding flooding disaster notifications were obtained from the government EMIC. Figure 2 shows how the candidate terms are extracted from our example. In the segmentation step, the content was broken up into terms with the Chinese Knowledge Information Processing Group (CKIP) system, which was constructed by Academia Sinica. After that, the high-frequency words of the article were selected once the noise words and stop words (such as is, are, you and I) had been removed. Finally, the resulting data were stored in a database for analysis and visualisation.
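The stop-word filtering and high-frequency term selection described above can be sketched as follows. This is only a minimal illustration: it assumes the text has already been segmented into terms (for example by CKIP, which is not called here), and the stop-word list and the function name `candidate_terms` are hypothetical rather than taken from the study.

```python
from collections import Counter

# Hypothetical stop-word list; the list actually used in the study is not given.
STOP_WORDS = {"的", "了", "是", "我", "你", "嗎"}

def candidate_terms(tokens, top_n=3):
    """Drop stop words and return the top_n most frequent remaining terms."""
    kept = [t for t in tokens if t.strip() and t not in STOP_WORDS]
    return Counter(kept).most_common(top_n)

# Pre-segmented example tokens (the segmentation step itself is assumed).
tokens = ["裕農路", "跟", "永大路", "有", "淹水", "嗎", "淹水", "了"]
print(candidate_terms(tokens, top_n=2))
```

In a sketch like this, the quality of the candidate terms depends almost entirely on the segmentation and on the stop-word list, so both would need to be tuned for the PTT corpus.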

Figure 2.

Extraction of the candidate terms with CKIP.

During the process, each article $a_{i}$ ∈ article set $A$ is represented as $a_{i}$ = { ${aId}_{i}^{a}$, ${title}_{i}^{a}$, ${content}_{i}^{a}$, ${poDateTime}_{i}^{a}$ }, where ${title}_{i}^{a}$ is the title of the ith article, ${content}_{i}^{a}$ is the content of the posted message and ${poDateTime}_{i}^{a}$ is the post date and time of the ith article. Each tweet $p_{ij}$ ∈ tweet set $P$ is represented as $p_{ij}$ = { ${aId}_{ij}^{a}$, ${pushContent}_{ij}^{p}$, ${pushDateTime}_{ij}^{p}$ }, where ${pushContent}_{ij}^{p}$ is the content of the jth tweet and ${pushDateTime}_{ij}^{p}$ is the date and time of that tweet.

Each actual flooding disaster notification $d_{k}$ ∈ actual flooding disaster set $D$ from EMIC is represented as $d_{k}$ = { ${town}_{k}^{d}$, ${location}_{k}^{d}$, ${happenTime}_{k}^{d}$, ${description}_{k}^{d}$ }, where ${town}_{k}^{d}$ is the administrative area, ${location}_{k}^{d}$ is the location of the disaster, ${happenTime}_{k}^{d}$ is the reported time of occurrence and ${description}_{k}^{d}$ is the disaster description.
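The three record types defined above map naturally onto simple data classes. The sketch below is an assumption about how they might be represented in Python; the class and field names merely mirror the symbols in the text, and the example values (including the description string) are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Article:            # a_i
    a_id: str             # aId (field meaning assumed from its name)
    title: str            # title
    content: str          # content
    po_datetime: datetime # poDateTime

@dataclass
class Tweet:              # p_ij
    a_id: str             # aId of the article the tweet belongs to (assumed)
    push_content: str     # pushContent
    push_datetime: datetime  # pushDateTime

@dataclass
class Notification:       # d_k
    town: str             # administrative area
    location: str         # location of the disaster
    happen_time: datetime # reported time of occurrence
    description: str      # disaster description

# Illustrative record only; the description text is made up for the example.
note = Notification("安平", "運河", datetime(2016, 9, 27, 21, 22, 27), "canal overflow")
print(note.town, note.happen_time.isoformat())
```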

3.3. Sliding window and term frequency

In this module, the temporal series of documents is grouped into time slots, and a term weight is calculated to evaluate each term’s importance in each slot. The sliding window is used to calculate the frequency of each keyword in the tweets and to sort the terms by frequency in order to identify each flood disaster event [25]. Keywords are the high-frequency terms related to disaster events. After the terms of each flooding disaster have been identified, the possible location and time of the flooding disaster are estimated. In this model, the variables are the event time t, the sliding window size k and the sliding interval s. The total number of sliding windows is $N_{s} = (t - k)/s + 1$. The concept is shown in Figure 3.
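As a quick check of the window-count formula, the following sketch enumerates the windows for given t, k and s, all expressed in minutes. The function name `sliding_windows` is hypothetical, and the use of integer division (so that a trailing partial window is dropped) is an assumption, since the text does not say how a remainder is handled.

```python
def sliding_windows(t, k, s):
    """Return (start, end) minute offsets of every full window of size k, stepping by s."""
    if k > t or s <= 0:
        return []
    n = (t - k) // s + 1  # N_s = (t - k)/s + 1, rounded down (assumption)
    return [(i * s, i * s + k) for i in range(n)]

# A 180-min event with a 90-min window and a 10-min sliding interval.
windows = sliding_windows(t=180, k=90, s=10)
print(len(windows))            # (180 - 90) / 10 + 1 = 10
print(windows[0], windows[-1])
```

With these values, consecutive windows overlap by 80 min, which is what lets a gradual rise in keyword counts show up in several windows in a row.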

Figure 3.

Sliding window.

Some studies [28–30] suggested that a term’s importance can be evaluated by measuring how frequently the term occurs in the document or query texts. To process the Chinese text, we used CKIP to segment each sentence and analyse its parts of speech. This article then applied standard stop-word elimination and sorted the terms by TF. Example 0809 refers to an extremely heavy rain event in 2014. This research records a set of keywords as the flood event feature, as shown in Figure 4 (e.g. keyword set W is ‘flood’, ‘water’, ‘rain’ and ‘flooding’, which in Chinese are ‘淹’, ‘水’, ‘雨’ and ‘淹水’). In the figure, 永康 (Yongkang), 東區 (East Dist.), 仁德 (Rende), 南區 (South Dist.) and 安平 (Anping) are location names in Tainan.
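Counting the TF of the keyword set W over segmented text can be sketched as below. The keyword set is the one quoted in the text; everything else (the function name `keyword_tf`, the example tokens, and the choice to match whole segmented terms rather than substrings) is an assumption made for illustration.

```python
from collections import Counter

# Keyword set W as given in the text: flood, water, rain, flooding.
KEYWORDS = {"淹", "水", "雨", "淹水"}

def keyword_tf(tokens, keywords=KEYWORDS):
    """Count how often each keyword occurs among already-segmented terms."""
    return Counter(t for t in tokens if t in keywords)

# Hypothetical pre-segmented tokens from a few tweets.
tokens = ["永康", "淹水", "了", "雨", "很", "大", "安平", "淹水"]
tf = keyword_tf(tokens)
print(tf.most_common())  # keywords sorted by descending frequency
```

Matching whole terms avoids counting ‘淹水’ a second time as ‘淹’ and ‘水’, which a plain substring search would do.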

Figure 4.

Keyword set of example 0809 of extremely heavy rain in 2014.

Figure 5 shows the number of tweets that correspond to the target event, flooding; each tweet has its own publishing time. The figure uses the actual data for the 0906 example of extremely heavy rain in 2016, for which we broke down the tweets that our web crawler recorded during the 48 h of the flooding event.

Figure 5.

Example 0906 of extremely heavy rain in 2016. The figure shows the number of keywords and the actual flooding disaster notification of the disaster statistics.

The tweets from the PTT platform include users’ firsthand posts, retweets and comments. From the captured tweet keyword dataset and the EMIC notification dataset, it was observed that people posted tweets about flood disasters at specific locations, roads and geographic areas. Finally, the keywords and window sizes to be used as thresholds are suggested after the experiments.

4. Experiments and results

The results of this study are presented in the following sections. These experiments also suggest parameter values for the proposed model.

4.1. Dataset collection

Social media can be an important resource for information. In this experiment, PTT, the largest Taiwan-based BBS forum, was selected. PTT has many boards that cover a multitude of topics, and many articles and comments are posted each day. According to the 2014 report by the Market Intelligence & Consulting Institute [4], most Taiwanese users prefer to share information through multimedia, followed by online forums, which they also use often. Among them, PTT is one of the network forums in Taiwan [25]. Therefore, this article chose the PTT Tainan Board as the scope of the research data and flooding disaster information as the target event. We use PTT community data because PTT is a text-based social networking service that is popular, offers a large quantity of data and represents local social media.

This article analysed disaster information for six typhoon and extremely heavy rain events, drawn from a total collection of 292 articles and 12,484 tweets (see Table 1). The data were collected from the PTT Tainan Board, a Taiwanese social media platform, using a Python web crawler. Tweets were collected when the flood theme was mentioned in the title or the tweet.

Table 1.

Sampling of PTT tweets during a disaster.

Year	Event name	Articles	Tweets
2014	0809 extremely heavy rain	91	2512
2015	Dujuan typhoon	10	2857
2016	0611 extremely heavy rain	31	1413
2016	Nepartak typhoon	10	674
2016	0906 extremely heavy rain	21	952
2016	Megi typhoon	129	4076

PTT provides neither geographic location tags (i.e. geotags) nor hashtags (i.e. keywords preceded by #). This research therefore collected a digital atlas of the transportation road network from the Ministry of Transportation and Communications (MOTC) to identify flooded roads (see Figure 6). Each tweet contains its creation time, content, subject, URL and user information, as shown in Figure 7. The official disaster events are taken from the EMIC datasets of the National Fire Agency of Taiwan’s Ministry of the Interior, through which notifications about flooding disasters can be posted during events. The EMIC datasets contained a total of 3598 flooding disaster notifications (see Table 2). Recall, precision and the F-measure are used to evaluate the reliability of the system:

$\mathrm{Precision}\ (P) = \frac{tp}{tp + fp}$ (1)

$\mathrm{Recall}\ (R) = \frac{tp}{tp + fn}$ (2)

$F\text{-measure} = \frac{2 \times P \times R}{P + R}$ (3)

where tp is the number of true positives, fp is the number of false positives and fn is the number of false negatives; the number of true negatives (tn) does not enter any of the three measures.
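Equations (1) to (3) translate directly into code. The helper below is a minimal sketch; the function name `precision_recall_f` and the example counts are illustrative and are not results from the study, and returning 0.0 when a denominator is zero is an assumption.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) following equations (1)-(3)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure

# Illustrative counts only: 8 correct alerts, 2 false alerts, 8 missed notifications.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)
print(round(p, 3), round(r, 3), round(f, 3))
```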

Figure 6.

Tainan transportation road network.

Figure 7.

Tainan flooding disaster article and tweet of PTT (resource: https://www.ptt.cc).

Table 2.

EMIC flooding disaster notifications.

Year	Event name	Number of notifications
2015	Dujuan typhoon	308
2016	0611 extremely heavy rain	73
2016	Nepartak typhoon	454
2016	0906 extremely heavy rain	202
2016	Megi typhoon	2561

4.2. Flood disaster prediction based on keyword frequencies

In this research, we prepared the 3598 flooding disaster notifications, each of which reports a flooding occurrence, as a training set of positive examples. This step extracts the candidate terms: the most appropriate terms for the flood disaster area must be chosen, so ranking them is important. For this purpose, this research selected the TF measure to perform the ranking. In addition to using the flooding feature dataset, this section combines the flooding disasters with the geoparsing of locations in Tainan, as shown in Figure 8. Figure 9 shows and compares the extraction results of the candidate terms with the best ranking measurements.
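The geoparsing step, in which street names are looked up in the tweet text, might be sketched as a simple gazetteer match. This is an assumption about one possible implementation, not the study’s own code: the function name `find_streets` is hypothetical, and the three-entry gazetteer stands in for the MOTC road network data.

```python
def find_streets(text, gazetteer):
    """Return every gazetteer street name that appears verbatim in the text."""
    return [street for street in gazetteer if street in text]

# Tiny illustrative gazetteer; the real list would come from the road network atlas.
gazetteer = ["裕農路", "永大路", "中華路"]

tweet = "裕農路跟永大路有淹水嗎"
print(find_streets(tweet, gazetteer))
```

Plain substring matching is fast but can over-match when one street name is contained in another, so a production version would probably prefer the longest match or match on segmented terms.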

Figure 8.

PTT dataset of flooding location information for at-flood areas and EMIC actual flooding disaster notification. The words are written in Chinese.

Figure 9.

Keyword set.

To ensure the keyword dataset availability, this article verified the experiments of Typhoon Megi and the 0906 example of extremely heavy rain, and the results of the experiments are shown in Figures 10 and 11.

Figure 10.

Typhoon Megi in 2016. The figure shows the number of keywords and the actual flooding disaster notification of the disaster statistics.

Figure 11.

Example 0906 of extremely heavy rain in 2016. The figure shows the number of keywords and the actual flooding disaster notifications of the disaster statistics.

The results show that the keywords had almost zero occurrences before the flooding disasters occurred. When the number of keywords began to increase rapidly, an actual disaster notification to EMIC was observed within the next 2–3 h.

4.3. Evaluation results of the flood disaster warning

In this section, by comparing and analysing the pattern generation results for each window size, better metrics are identified and suggested thresholds are proposed.

We evaluate each flooding location and each sliding window separately so that we can determine the threshold level for the tweets. We calculate precision, recall and F-measure for parameters such as the sliding gap (e.g. 10 min) and the time window size (e.g. 30 min).

In this experiment, to analyse flooding disasters and flooded roads in the different cases, we evaluated how the sliding gap (time interval) and the sliding window size (window size) affect the precision, recall and F-measure of the flooding disaster notification, using a time interval of 10 min and window sizes of 30, 60 and 90 min.
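One way to turn alerts and EMIC notifications into tp, fp and fn counts is sketched below. The matching rule is an assumption made for illustration: an alert counts as a true positive if an unmatched notification for the same location follows it within the horizon (30, 60 or 90 min), and each notification can be matched only once. The function name and the example records are hypothetical.

```python
from datetime import datetime, timedelta

def evaluate_alerts(alerts, notifications, horizon_min=90):
    """alerts, notifications: lists of (location, datetime). Returns (tp, fp, fn)."""
    horizon = timedelta(minutes=horizon_min)
    matched = set()
    tp = fp = 0
    for location, t_alert in alerts:
        # First unmatched notification at the same location within the horizon.
        hit = next(
            (i for i, (n_loc, t_note) in enumerate(notifications)
             if i not in matched
             and n_loc == location
             and timedelta(0) <= t_note - t_alert <= horizon),
            None,
        )
        if hit is None:
            fp += 1
        else:
            matched.add(hit)
            tp += 1
    fn = len(notifications) - len(matched)  # notifications never anticipated
    return tp, fp, fn

# Illustrative records only (not data from the study).
day = datetime(2016, 9, 27)
alerts = [("安平", day.replace(hour=19, minute=50)),
          ("永康", day.replace(hour=18, minute=0))]
notifications = [("安平", day.replace(hour=21, minute=10)),
                 ("仁德", day.replace(hour=20, minute=0))]
tp, fp, fn = evaluate_alerts(alerts, notifications)
print(tp, fp, fn)
```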

According to the feature data on the five flooding events, the flooding disasters evaluated by the precision, recall and F-measure are shown in Figure 12.

Figure 12.

Flooding disasters evaluated by metrics in the next 30, 60 and 90 min.

To assess the accuracy of the flooded street locations reported by PTT users in the crawled tweets, we compare the city street data with the streets in the EMIC notifications to obtain the tweet dataset for the disaster areas in Tainan. We evaluate the flooded roads in terms of precision, recall and F-measure, as shown in Figure 13.

Figure 13.

Flooding roads with metrics in the next 30, 60 and 90 min.

According to the results shown in Figure 12, among the three window sizes analysed, the F-measure shows an increasing (better) trend when the window size is 90 min and the threshold increases to 20. This combination obtains the best F-measure value. In the next section, this research therefore uses a window size of 90 min and a threshold of 20 as the parameters for the case study.
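The suggested parameters can be combined into a simple alert rule: slide a 90-min window in 10-min steps and raise an alert whenever the keyword count in the window reaches 20. The sketch below is one possible reading of that rule, not the study’s implementation; the function name `alert_windows`, the treatment of each timestamp as one keyword occurrence, and the synthetic timestamps are all assumptions.

```python
from datetime import datetime, timedelta

def alert_windows(timestamps, start, end, window_min=90, gap_min=10, threshold=20):
    """Return the start times of all windows whose keyword count reaches the threshold."""
    window = timedelta(minutes=window_min)
    gap = timedelta(minutes=gap_min)
    alerts = []
    w_start = start
    while w_start + window <= end:
        w_end = w_start + window
        count = sum(1 for ts in timestamps if w_start <= ts < w_end)
        if count >= threshold:
            alerts.append(w_start)
        w_start += gap
    return alerts

# Synthetic data: one keyword occurrence per minute for 25 min from 18:10.
start = datetime(2016, 9, 27, 17, 0)
end = datetime(2016, 9, 27, 20, 0)
burst = [datetime(2016, 9, 27, 18, 10) + timedelta(minutes=i) for i in range(25)]

alerts = alert_windows(burst, start, end)
print(len(alerts), alerts[0].time(), alerts[-1].time())
```

Because the windows overlap heavily, one burst of keywords triggers several consecutive windows; a deployed system would likely collapse these into a single alert per location.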

4.4. Case study

In this section, the parameters derived above are applied to an actual case to assess how much disaster relief information can be obtained from the PTT community media. The event studied was Typhoon Megi (September 2016), which caused major flooding in Tainan. For this event, this article applied the sliding window approach with the 90-min window selected in Section 4.3 and a sampling rate of 10 min. The tweet dataset contained 4076 tweets, of which 161 mentioned streets. Our Tainan location database has 2040 streets available for matching.

This research compared each detected flooding disaster with the actual disaster notifications in the official EMIC of the National Fire Agency. Figure 14 shows the threshold and the number of times that flooded roads were effectively predicted. Most flood disaster notifications occur where tweets occur, although some flood hazard notifications appear only in EMIC.

Figure 14.

Typhoon Megi in 2016. The figure shows the threshold and number of disaster road times for effective prediction.

The experimental results show that 50 areas for which the system raised warnings had no corresponding report in the official system, whereas 161 appeared in the official reports. Table 3 presents the number of cases that met the threshold for the actual disaster notification, and for the actual disaster notification and location, in the next 90 min. Compared with the official report, the accuracy rates are 67.04% and 93.79%, respectively. These results indicate that the PTT Tainan Board was a reliable source of information. The case results show that geographic areas (flooding locations) can be identified with a high accuracy rate (90% or higher) from real-time PTT data in combination with existing urban road data and flood zone information. The results also show that severely flooded areas have more posts about floods.

Table 3.

Study case results on Typhoon Megi.

Type	Actual disaster notification in the next 90 min	Actual disaster notification and location in the next 90 min
Meets the threshold (A)	179 datasets	151 datasets
Does not meet the threshold (B)	88 datasets	10 datasets
Total (C = A + B)	267 datasets	161 datasets
Accuracy rate (D = A/C)	67.04%	93.79%
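The accuracy rates in the last row of Table 3 follow directly from D = A/C. The short calculation below simply reproduces the two table entries; the subscripts are labels introduced here for readability and are not notation from the study.

```latex
\begin{align*}
D_{\text{notification}} &= \frac{A}{C} = \frac{179}{267} \approx 0.6704 = 67.04\% \\
D_{\text{notification and location}} &= \frac{A}{C} = \frac{151}{161} \approx 0.9379 = 93.79\%
\end{align*}
```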

Examining the tweet volumes over time highlighted important trends. Figure 14 shows that the rise in tweet volume accompanied the disaster notification trend, with the keyword statistics increasing from 18:10 to 19:40. The keyword frequency increased by up to 245 times. The experimental results show that a disaster alert occurred on 27 September 2016 at 18:10:00. The PTT Tainan Board shows that the overflow of the canal in Anping District, Tainan City, was reported at 19:49:56 (see Figure 15), whereas the same overflow was recorded in the official EMIC report at 21:22:27, about 1.5 h later. This finding means that the proposed system automatically captured the disaster location and can be used as a warning system.
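The time differences between the three timestamps quoted above can be computed as follows. The sketch assumes that all three events fall on 27 September 2016, since the text gives an explicit date only for the system alert; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps quoted in the text (same-day assumption for the last two).
system_alert = datetime(2016, 9, 27, 18, 10, 0)   # alert raised by the system
ptt_report = datetime(2016, 9, 27, 19, 49, 56)    # canal overflow posted on PTT
emic_record = datetime(2016, 9, 27, 21, 22, 27)   # overflow recorded in EMIC

lead_ptt = emic_record - ptt_report      # how far the PTT post precedes EMIC
lead_alert = emic_record - system_alert  # how far the system alert precedes EMIC

print(lead_ptt)    # 1:32:31
print(lead_alert)  # 3:12:27
```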

Figure 15.

Typhoon Megi in 2016: (a) social media flooding pictures and (b) PTT Tainan Board of flooding information (data resource: PTT).

5. Discussion

When applying this method, it is important to consider where the flood disaster occurs because the number of people who publish information about the disaster area is directly related to accuracy. This research selected Tainan as the target area because Taiwan has the highest economic exposure to natural hazards. This article adopted the popular local social media platform PTT as the source of plain text messages and assessed their relevance to the actual disaster situation using the notifications released by the government.

This method can extract useful information for relevant disaster relief personnel during flood disasters, and the experimental results support this claim. Because this research uses official geographical data, it was possible to identify streets that could be flooded.

In this study, we used the EMIC dataset to verify the flooding information in PTT social media text messages. Figure 9 shows the representative keyword set for flooding disasters. We collected only PTT messages that involved flood disasters and not any other natural disasters. The entire collection of tweets during the flooding event period was used for comparison with the EMIC data. This study did not consider the degree of flooding or other factors. In further work, the severity of the disaster could be considered using opinion mining.

According to the statistics on the five flooding events, the highest average F-values for geoparsing of the flooding locations under the different window sizes were 0.315, 0.257 and 0.126. The highest of these, 0.315, was obtained with a window size of 90 min and a TF threshold of 20 occurrences. Therefore, we recommend a threshold of 20 occurrences and a window size of 90 min.

According to the experimental results in Section 4.3, the F-measure increases with the window size (the larger the sliding window, the better the performance). This research used the same time interval with different window sizes, and the precision, recall and F-measure differed significantly between the flood disaster and flood road experiments (see Figures 12 and 13). The flood road experiment had the relatively higher accuracy rate. The flooded road information provided by the PTT social media platform was therefore a reliable source of information.

In summary, this study revealed that (1) community data can be used for natural disaster early warnings; (2) the system can automatically capture disaster locations; and (3) in the Typhoon Megi case, the official notification was recorded approximately 90 min later than the report posted by the public. In other words, this discovery can help the government improve the efficiency of disaster relief. Moreover, social media makes information dissemination faster, less expensive and more convenient. This convenience also brings disadvantages, such as the fake news produced by social media users [31], and it has led to the prevalence of fake news. Many studies have been conducted on fake news prediction [37] and detection [31,38], in which the main goal is to prevent fake news from having a significant impact on society. False news spreads farther, faster, deeper and more broadly than the truth, and these effects are more pronounced for false political news [39]. In particular, social media (e.g. Facebook and Twitter) is an important channel for fake news websites [40]. The reliability of community information, and its acceptance by government agencies, can thus be called into question. This study therefore assumes that the collected messages contain no falsified information.

6. Conclusion

Based on the experimental observations, this study showed that social media trends can signal disasters earlier than official communication channels. A warning mechanism was established using text mining [41]. However, a growing number of politicians act as spokespersons on the occurrence and development of disasters [42], and fake news is becoming increasingly prevalent, which may damage the credibility of information [40]. Many studies [31,37,38] have discussed and analysed the reliability of social media data, for example by filtering or detecting fake news, but this research does not currently address it. In future work, we could build a fake news prediction model [37] or promote misinformation containment policies [39], drawing on additional social information (such as online photos and videos) to improve the reliability of the information.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: The research is based on work supported by the Taiwan Ministry of Science and Technology under Grant Nos. MOST 107-2410-H-006-040-MY3 and MOST 108-2511-H-006-009. We would like to thank the Center of Innovative Fintech Business Models, Taiwan for a research grant to support this research.

ORCID iD

Chun Chieh Chen

References

1. Gao, Barbier, Goolsby. Harnessing the crowdsourcing power of social media for disaster relief. IEEE Intel Syst 2011; 26: 10–14.

2. de Albuquerque, Herfort, Brenning, et al. A geographic approach for combining social media and authoritative data towards identifying useful information for disaster management. Int J Geograph Inform Sci 2015; 29: 667–689.

3. Karami, Shah, Vaezi, et al. Twitter speaks: a case of national disaster situational awareness. J Inform Sci 2020; 46: 313–324.

4. Takahashi, Tandoc, Carmichael. Communicating on Twitter during a disaster: an analysis of tweets during Typhoon Haiyan in the Philippines. Comput Human Behav 2015; 50: 392–398.

5. Middleton, Modafferi. Real-time crisis mapping of natural disasters using social media. IEEE Intel Syst 2014; 29: 9–17.

6. McCallum, Liu, See, et al. Technologies to support community flood disaster risk reduction. Int J Disaster Risk Sci 2016; 7: 198–204.

7. Sakaki, Okazaki, Matsuo. Tweet analysis for real-time event detection and earthquake reporting system development. IEEE Trans Knowl Data Eng 2013; 25: 919–931.

8. Tang, Zhang, et al. Examining the role of social media in California’s drought risk management in 2014. Nat Hazards 2015; 79: 171–193.

9. Blake, Kimberlain, Berg, et al. Tropical cyclone report: Hurricane Sandy. National Hurricane Center, 12 February 2013, http://www.nhc.noaa.gov/data/tcr/AL182012_Sandy.pdf

10. Dashen, Suchowski. A guide to social media emergency management analytics: understanding its place through Typhoon Haiyan Tweets by C. Graham, C. Thompson, M. Wolcott, J. Pollack and M. Tran. Stat J IAOS 2015; 31: 483–484.

11. Ragini, Anand PMR, Bhaskar. Big data analytics for disaster response and recovery through sentiment analysis. Int J Inform Manag 2018; 42: 13–24.

12. Huang, Qin, et al. Deep learning for real-time social media text classification for situation awareness – using Hurricanes Sandy, Harvey, and Irma as case studies. Int J Digital Earth 2019; 12: 1230–1247.

13. Wang, Zhou, et al. Tracking flooding phase transitions and establishing a passive hotline with AI-enabled social media data. IEEE Access 2020; 8: 103395–103404.

14. Yetman. Natural disaster hotspots: a global risk analysis. Washington, DC: World Bank, 2005.

15. Maplecroft. Asia natural hazards risk – economic exposure 2012, 2011, https://reliefweb.int/map/world/asia-natural-hazards-risk-economic-exposure-2012

16. Chen, Hong, Cao, et al. Performance evaluation of radar and satellite rainfalls for typhoon Morakot over Taiwan: are remote-sensing products ready for gauge denial scenario of extreme events? J Hydrol 2013; 506: 4–13.

17. Mao, Thakur, Sparks, et al. Mapping near-real-time power outages from social media. Int J Digital Earth 2019; 12: 1285–1299.

18. Chang, Lee. EstWin: online data stream mining of recent frequent itemsets by sliding window method. J Inform Sci 2005; 31: 76–90.

19. On-line and dynamic time warping for time series data mining. Int J Mach Learn Cybernetic 2015; 6: 145–153.

20. Tanbeer, Ahmed, Jeong, et al. Sliding window-based frequent pattern mining over data streams. Informa Sci 2009; 179: 3843–3865.

21. Dallachiesa, Jacques-Silva, Gedik, et al. Sliding windows over uncertain data streams. Knowl Inform Syst 2015; 45: 159–190.

22. Yun, Lee. Sliding window based weighted erasable stream pattern mining for stream data applications. Future Gener Comput Syst 2016; 59: 1–20.

23. Liu, Hsaio. Time series classification with multivariate convolutional neural network. IEEE T Ind Electr 2019; 66: 4788–4797.

24. Shi, Wang, et al. Benchmarking state-of-the-art deep learning software tools. In: 2016 7th international conference on cloud computing and big data (CCBD), Macau, China, 16–18 November 2016, pp.99–104. New York: IEEE.

25. Chu CSJ. Time-series segmentation – a sliding window approach. Informa Sci 1995; 85: 147–173.

26. Escalante, García-Limón, Morales-Reyes, et al. Term-weighting learning via genetic programming for text classification. Knowled Based Syst 2015; 83: 176–189.

27. Chen, Zhang, Long, et al. Turning from TF-IDF to TF-IGM for term weighting in text classification. Expert Syst Appl 2016; 66: 245–260.

28. Salton, Buckley. Term-weighting approaches in automatic text retrieval. Inform Proces Manag 1988; 24: 513–523.

29. Tran, TTN, et al. Text clustering using frequent weighted utility itemsets. Cybernet Syst 2017; 48: 193–209.

30. Shehata, Karray, Kamel. An efficient concept-based mining model for enhancing text clustering. IEEE Trans Knowl Data Eng 2010; 22: 1360–1371.

31. Ozbay, Alatas. Fake news detection within online social media using supervised artificial intelligence algorithms. Physica A: Stat Mech Appl 2020; 540: 123174.

32. Hung, Rice, Kepka, et al. Improving predictive power through deep learning analysis of K-12 online student behaviors and discussion board content. Inform Discover Deliv 2020; 48: 199–212.

33. Rossi, Acerbo, Ylinen, et al. Early detection and information extraction for weather-induced floods using social media streams. Int J Disaster Risk Reduc 2018; 30: 145–157.

34. Cheng, Cao, Liao. An overview of literature on COVID-19, MERS and SARS: using text mining and latent Dirichlet allocation. J Inform Sci. Epub ahead of print 31 August 2020. DOI: 10.1177/0165551520954674.

35. Hernandez-Suarez, Sanchez-Perez, Toscano-Medina, et al. Using Twitter data to monitor natural disaster social dynamics: a recurrent neural network approach with word embeddings and kernel density estimation. Sensors 2019; 19: 1746.

36. Montagnon, Cerny, Cadrin-Chênevert, et al. Deep learning workflow in radiology: a primer. Insight Imag 2020; 11: 22.

37. Reddy, Raj, Gala, et al. Text-mining-based fake news detection using ensemble methods. Int J Automa Comput 2020; 17: 210–221.

38. Gravanis, Vakali, Diamantaras, et al. Behind the cues: a benchmarking study for fake news detection. Expert Syst Appl 2019; 128: 201–213.

39. Vosoughi, Roy, Aral. The spread of true and false news online. Science 2018; 359: 1146.

40. Lazer DMJ, Baum, Benkler, et al. The science of fake news. Science 2018; 359: 1094.

41. Rezaeian, Montazeri, Loonen RCGM. Science foresight using life-cycle analysis, text mining and clustering: a case study on natural ventilation. Technol Forecast Social Change 2017; 118: 270–280.

42. Ewart, McLean, Ames. Political communication and disasters: a four-country analysis of how politicians should talk before, during and after disasters. Discourse Context Media 2016; 11: 3–11.