A survey on real-time event detection from the Twitter data stream

Abstract

The proliferation of social networking services has resulted in a rapid growth of their user base, spanning across the world. The collective information generated from these online platforms is overwhelming, in terms of both the amount of content produced every moment and the diversity of topics discussed. The real-time nature of the information produced by users has prompted researchers to analyse this content, in order to gain timely insight into the current state of affairs. Specifically, the microblogging service Twitter has been a recent focus of researchers to gather information on events occurring in real time. This article presents a survey of a wide variety of event detection methods applied to streaming Twitter data, classifying them according to shared common traits, and then discusses different aspects of the subtasks and challenges involved in event detection. We believe this survey will act as a guide and starting point for aspiring researchers to gain a structured view on state-of-the-art real-time event detection and spur further research in this direction.

Keywords

Event detection microblog social media survey Twitter

1. Introduction

With around 310 million monthly-active Twitter users¹ producing content from all over the world, Twitter has essentially become a host of sensors for events as they happen. An event, in the context of social media, is an occurrence of interest in the real world which instigates a discussion on the event-associated topic by various users of social media, either soon after the occurrence or, sometimes, in anticipation of it.

Similar to our definition, Dou et al. [1] defined an event as ‘An occurrence causing change in the volume of text data that discusses the associated topic at a specific time. This occurrence is characterized by topic and time, and often associated with entities such as people and location’. In the collection on Topic Detection and Tracking (TDT) [2], an event is defined as ‘Something that happens at specific time and place along with all necessary conditions and unavoidable consequences’. Dou et al. [1] state that ‘Collectively, events serve as a succinct summary of social media streams. Individually, an event and its sub-events, reveal the evolution of certain social phenomena over time’. In accordance with these definitions, the series of coordinated terrorist attacks in Paris on 13 November 2015 were an event in the context of social media, prompting a high volume of tweets as soon as the attacks took place. The various aspects of this event were tweeted by general users and news agencies all over the world, for example, ‘BREAKING.This is what we know: 35 dead, 100 hostages taken at a concert venue. Various drive by shootings. Explosions at a #Paris stadium’. The given example was a high-profile event of global consequence; however, an event detection system should also be able to detect newsworthy events at a smaller scale, with a smaller burst of tweets at any given time.

Petrovic et al. [3] analysed major and minor events reported within a 2-month period in the newswire and on Twitter streams, finding that major events were equally covered by both the traditional newswire providers and Twitter. In addition, they found Twitter has better coverage of events related to sports, unpredictable high impact phenomena, and small or local events which fall outside the radar of newswire sources. Petrovic et al. [3] also noted that, in some cases, Twitter leads on reporting events related to politics and business.

Osborne and Dredze [4] compared the coverage and latency of breaking news reported on Facebook, Google Plus and Twitter, based on major events² identified from Wikipedia which occurred between the 10 and 31 December 2013. These social media sites were also compared based on the long-tailed events detected by the event detection system proposed by Petrovic et al. [5]. Osborne and Dredze [4] found Twitter to be faster in reporting breaking news than other social media sites, but was still outperformed by the newswires.

The findings by Petrovic et al. [3] and Osborne and Dredze [4] confirm the utility of an efficient Twitter-centric event detection system which is capable of detecting events, tracking event-related updates and providing meaningful summaries of the detected events. The challenge of doing so, however, is the limited context provided by tweets resulting from the length restriction of 140 characters imposed on each tweet. On top of that, the majority of information propagated on Twitter is irrelevant for the event detection task, and the noise generated by spammers and the use of an informal language, coupled with spelling and grammatical errors, can adversely affect the event detection process.

An early study on event detection techniques by Atefeh and Khreich [6] mostly focussed on the literature from before the year 2012, but there has since been rapid and wide-ranging development in this research area. The work done by Li et al. [7], Gaglio et al. [8], Stilo and Velardi [9], Xie et al. [10], Zhou et al. [11], McMinn and Jose [12], De Boom et al. [13] and Guille and Favre [14] are examples of contemporary developments in this area, employing a wide variety of techniques.

In contrast to our work, the organising of content in the study conducted by Atefeh and Khreich [6] is based on three major categories: (1) the type of event being detected by an event detection system (i.e. specified events and unspecified events), (2) the detection task (i.e. New Event Detection and Retrospective Event Detection) and (3) event detection methods (i.e. supervised, unsupervised and hybrid). As the focus of our study is on event detection methods, we believe that a different method of organising content is required instead of simply dividing the event detection methods based on whether they are supervised, unsupervised or a hybrid.

In this article, we classify the various detection methods based on the common traits they share (i.e. using probabilistic topic modelling, identifying interesting properties in a tweet’s keywords/terms and using incremental clustering). These categories will allow readers to gain a perspective on each of the general research directions, along with the finer details which separate one event detection technique from the other. In the study by Atefeh and Khreich [6], a total of 16 different research articles were discussed in detail. Our focus in this study is to provide a wider range of coverage on different event detection methods, as necessitated by the large number of recent studies. We intend to introduce readers to the recent literature, with a self-contained summary of each of the surveyed works and a comparison of the different proposed methods. This article does not contain an exhaustive review of the Twitter-centric event detection literature; however, representative methods are discussed from various fields, such as information extraction and retrieval, machine learning, data mining and natural language processing.

The remainder of this article is organised as follows: Section 2 contains a discussion of the general event detection framework and its components. Sections 3–6 contain discussions on the event detection techniques, categorised into approaches based on term interestingness, topic modelling, incremental clustering and miscellaneous approaches, respectively. We then briefly describe the different methods employed in the literature for system performance evaluation in Section 7. Pre-processing techniques are discussed in Section 8 and different event summarisation techniques are discussed in Section 9. Finally, a number of general observations on different event detection approaches are presented in Section 10. Section 11 concludes the article with a brief summary.

2. Event detection systems: a general framework

In order to detect an event which adheres to the definitions we have sketched in the introduction, a general event detection system framework (Figure 1) is presented containing a number of components to deal with the different stages of operation in event detection. A pre-processing component receives tweets from a streaming endpoint using a Twitter streaming API and processes them to be used in the subsequent stages by different components. Part-of-Speech (POS) tagging, slang-words conversion and Named Entity Recognition (NER) are all examples of pre-processing performed on tweets. A detailed discussion of the common pre-processing techniques is provided in Section 8. After the pre-processing stage, the Detecting Events component employs an event detection technique to detect events from the processed tweets. The event detection technique employed by this component belongs to any one of the general event detection approaches based on term interestingness, topic modelling and incremental clustering. There are also miscellaneous event detection techniques found in some studies in the literature which do not directly belong to the above-mentioned three general approaches, but are used in the Detecting Events component. The Detecting Events component is responsible for grouping tweets that are related, and each group/cluster of tweets corresponds to a candidate event.

Figure 1.

A general framework for event detection systems.

Once the candidate events are detected, the Ranking Events component ranks them based on criteria to determine the events that are of interest in the real world. In some cases, a threshold value or a supervised method is used to filter out trivial events using the Ranking Events component. In Section 3, we provide a discussion of the techniques applied by the Detecting Events and the Ranking Events component within the context of term-interestingness-based approaches. Similarly, these two components are discussed in Sections 4–6 within the context of topic modelling, incremental clustering and miscellaneous approaches, respectively. In addition, a selective number of methods which are commonly used to evaluate the performance of the event detection techniques studied in Sections 3–6 are discussed in Section 7.

Some event detection systems also include a Summarising Events component which creates a summary of tweets associated with an event. This component is responsible for presenting the events to the users in a meaningful way. Further discussion of the Summarising Events component is presented in Section 9. Finally, the detected newsworthy events are rendered by a desktop or web-based application. A sample event from the streaming Twitter data is displayed in Table 1 as an example.

Table 1.

Sample event: the 2012 Nobel prize winner in literature

Date/time	Event-related tweets
4:50 p.m./11 October 2012	Nobel literature jury to announce 2012 winner: After the announcements of three science awards, the Nobel Prizes … http://bit.ly/WUc1EH
4:51 p.m./11 October 2012	Chinese author Mo Yan wins Nobel Prize for Literature 2012 (first ever Chinese author to win it) … says he’s ‘overjoyed & scared’:)
4:51 p.m./11 October 2012	Chinese writer ‘Don’t Speak’ Mo Yan has been named the winner of the Nobel Prize in literature http://ti.me/Q0FYhhMT@TIME
4:51 p.m./11 October 2012	China’s Mo Yan feeds off suffering to win Nobel literature prize: BEIJING (Reuters) – Mo Yan, who has won the No … http://yhoo.it/TAJIMM
4:52 p.m./11 October 2012	Nobel Prize for literature awarded – Mo Yan of China won the prize for his novel ‘Frog’, which explores the traditio … http://ow.ly/2sCWey

3. Term-interestingness-based approaches

This section discusses the event detection methods which rely on tracking the terms (from the Twitter data stream) likely to be related to an event. These methods are summarised in Table 2, focussing on the approaches taken to determine term interestingness, clustering techniques used to group the tweets related to an event and the techniques employed to rank the events that were detected by an event detection system.

Table 2.

Summary of term-interestingness-based approaches

Articles	Interestingness detection	Clustering techniques	Ranking events
Li et al. [7] (Twevent)	Top-k bursty word segments are identified from the word segment frequency and the user frequency, within a specific time window.	Jarvis-Patrick clustering algorithm [15]	Events are ranked using newsworthiness score, calculated exploiting Wikipedia.
Marcus et al. [16] (TwitInfo)	User provided keywords are used to identify temporal peaks in tweet frequency utilising a weighted moving average and variance.	N/A	N/A
Mathioudakis and Koudas [17] (TwitterMonitor)	Detects high-frequency terms within a specific time window.	Tweets are grouped based on term co-occurrence, following a greedy strategy.	N/A
Alvanaki et al. [18] (enBlogue)	Shift in correlation values of tag pairs are identified.	Grouping is performed based on co-occurrence of tag pairs in a minimum number of documents.	Events are ranked based on average burstiness of a topic and average number of documents containing all tags of a topic.
Gaglio et al. [8] (TLDF)	A specialised term scoring measure is utilised to retrieve top-K terms from a dynamic temporal window.	A modified SFPM algorithm [19, 20] is used to group related terms in a topic.	N/A
Cataldi et al. [21]	Emergent terms are identified leveraging the user authority and the usage of the terms in previous time intervals.	Strongly connected components containing emerging terms are identified, from a graph constructed using a term correlation vector.	Average usage of emergent terms in a topic is utilised to rank an event.
Stilo and Velardi [9] (SAX*)	Temporal series of terms within a specific time window are made discrete using Symbolic Aggregate ApproXimation (SAX) [22] and a regular expression learned from Wikipedia Events⁴ is applied to remove non-event terms.	Bottom-up hierarchical clustering with complete linkage.	N/A
Parikh and Karlapalem [23] (ET)	Increase in frequency and appearance pattern of terms are used to determine important terms.	Agglomerative hierarchical clustering algorithm.	The number of frequent terms in a cluster is used to rank an event.
Weng and Lee [24] (EDCoW)	Discrete wavelet signals built from individual terms are leveraged to filter out trivial words.	Modularity-based graph partitioning.	Non-trivial events are detected based on the number of words and the cross-correlation among the words related to an event.
Zhang et al. [25]	Bursty words are identified by detecting an increase in augmented normalised term frequency [26] and user authority.	A word relation graph is utilised to cluster bursty words.	Event popularity is computed based on a linear spread prediction function.

SFPM: Soft Frequent Pattern Mining; TLDF: Twitter Live Detection Framework.

The event detection system ‘Twevent’ [7] initially extracts continuous and non-overlapping word segments (single words or phrases) from each tweet. Statistical information obtained from the Microsoft Web N-gram Service³ and Wikipedia is used to detect nontrivial word segments. The top-k bursty event segments within a fixed time window are then calculated from the frequency of bursty segments in conjunction with the user frequency of the bursty segments. Finally, a variant of the Jarvis-Patrick clustering algorithm [15] is used to group related event segments by using the content of their associated tweets and the frequency pattern of the segments within the specified time window. The events are then filtered based on their newsworthiness score, $μ (e)$ , which is calculated using Wikipedia as a knowledge base for an event, e, containing a set of event segments, $e_{s} = {s}$ , using the following equation

μ (e) = \frac{\sum_{s \in e_{s}} μ (s)}{| e_{s} |} \cdot \frac{\sum_{g \in E_{e}} sim (g)}{| e_{s} |}

(1)

where E_e denotes the set of edges obtained after applying the clustering algorithm, with each edge representing the similarity between the event segment nodes they connect; sim (g) is the similarity score of an edge, $g \in E_{e}$ , between the two event segments, calculated based on the associated tweets and their temporal frequency patterns. The newsworthiness score, $μ (s)$ , of a segment, s, is calculated based on the prior probability of a sub-phrase, l, of segment, s, appearing as an anchor text in the Wikipedia articles containing l

μ (s) = max_{l \in s} e^{Q (l)} - 1

(2)

Events detected by Twevent [7] are highly dependent on the Microsoft Web N-gram Service and Wikipedia. This dependency can yield a set of events that are influenced by the service itself. Moreover, events that are not yet reported in Wikipedia might not be detected.

TwitInfo [16] allows a user to input event-related keywords to track an event. The system then starts logging tweets which match the user-specified keywords, detects spikes in tweet data as sub-events and automatically labels them with frequently occurring meaningful terms from the tweets. The approach taken by TwitInfo, to detect sub-events based on the peak in conversation about a topic, is inspired by the online algorithm used in Transmission Control Protocol (TCP) congestion control to detect unusual delays in packet transmission [27]. Tweets that arrive within a fixed time frame are binned, and the frequency of tweets within the bins is measured to detect spikes in the tweet frequency over time. Analogous to the approach used by TCP to determine an unusual delay in the transmitted packet being acknowledged, TwitInfo uses a weighted moving average and variance to determine an unusually large number of tweets in a bin. A significant increase in the tweet arrival rate, with respect to the weighted average of the historical mean tweet rate, is registered as a spike by the system. The local maximum of the spike is identified using a hill climbing algorithm to detect the time frame of the sub-event. The top-ranking $tf - idf$ weighted terms are used to provide meaningful labels for the sub-event, where the $tf - idf$ value for a term is obtained by multiplying the term frequency with the inverse document frequency.

TwitInfo allows aggregate sentiment visualisation using a Naïve Bayes classifier for sentiment analysis and employs a recall-normalised approach to provide an overview on the sentiment associated with an event. TwitInfo also allows users to drill down through a graphical interface into the sub-events of an event, although it is limited by the fact that the system is intended to track only the events specified by users and cannot distinguish between overlapping events.

TwitterMonitor [17] detects emergent topics by utilising an elementary queuing model to identify the high-frequency (bursty) terms from the tweets within a small time window. The bursty terms detected by the system as co-occurring in a large number of tweets are placed in the same group, where each group qualifies as a trend. A greedy search strategy is used to generate groupings, in order to avoid the high computational cost of enumerating all possible groups. To provide an accurate description of the detected trends, TwitterMonitor uses context extraction algorithms [28], which find terms that are not necessarily bursty in nature, but are correlated to the trend.

EnBlogue [18] detects emergent topics from blogs and Twitter data by computing statistical values for hashtag pairs within a given window of time and monitoring unusual shifts in their correlations. The strength of these shifts in hashtag pairs is used to rank emergent topics, and the top-k ranked topics are returned by the system. There are three major stages of system operation in enBlogue: seed hashtag selection, correlation tracking and shift detection.

EnBlogue reduces the computational cost by considering only popular hashtags as a seed. The popular hashtags are determined based on a sliding-window average of elements found in the document stream. The extracted named entities are also included as part of the seed. Any combination of the hashtags which contains at least one of the seed hashtags is further considered for the calculation of the correlation. The correlation between the hashtag pairs in a given time window is measured based on the average number of documents containing both the hashtags and the similarity between these documents. The hashtag pairs are then monitored to detect a shift in their correlation and the strength of the shift is measured using exponential smoothing as a forecasting technique. The exponential smoothing equation computes the score for possible emergent topics by giving higher weights to the most recently observed correlation values and lower weights to observations from the distant past. Finally, a post-processing phase is employed to group hashtag pairs which refer to the same event. To minimise the occurrence of multiple hashtag pairs referring to the same event, two hashtag pairs that co-exist in at least 80% of tweets are grouped in the same event.

The Twitter Live Detection Framework (TLDF) [8] adapts the Soft Frequent Pattern Mining (SFPM) algorithm [19], to detect relevant topics within a generic macro-event while addressing the dynamic nature of the Twitter data stream. Unlike enBlogue [18] and TwitterMonitor [17], TLDF uses a sigmoidal function-dependent dynamic temporal window, sized to detect events based on term co-occurrence in real time. The dynamic temporal window allows the TLDF to adapt its event detection behaviour based on the actual volume of tweets related to an event. To reduce the number of terms to be considered, the term selection method in the modified SFPM generates top-K terms from the set of tweets within the current time window. Each term is weighted based on a value which is decided depending on, first, the term being recognised as a named entity (i.e. persons, organisations and locations) by the NER [29] and its $tf - idf$ score and, second, the highest ratio of the likelihood of appearance for a term in the current time window and the reference body of randomly collected tweets. The combination of the likelihood ratio of appearance and the $tf - idf$ score of a term filters out the common terms, but retains the terms which are relevant in the current time window and are present in past topics. Moreover, during the term-weighting process, named entities are boosted by a factor of 1.5 to account for their importance in event detection.

The event detection scheme proposed by Cataldi et al. [21] initially creates a term vector representation from the tweets within a given time window. The term weights are represented as augmented normalised term frequencies [21] to reduce noise. In the next step, the system determines the authority of a user based on the number of followers of the user and the authority of the followers. The approach taken to determine user authority is similar to the PageRank algorithm [30]. A life-cycle-based content model is applied within a given time interval to determine the life cycle of each of the detected terms. The life cycle of a term is modelled by calculating the energy of the term. Energy is computed by leveraging the user authority and the usage of the term in previous time intervals. Afterwards, a threshold is dynamically set at each time interval; the terms with a higher energy value than the threshold are considered as emergent terms for that interval. Finally, a correlation vector for each of the terms is created. It represents the semantic association of a term with all the other terms in the corpus. By leveraging this correlation vector, a directed and edge-weighted graph is constructed. This allows for topic detection from the graph by identifying strongly connected sub-graphs containing an emergent term.

Stilo and Velardi [9] use temporal co-occurrence of terms instead of contextual co-occurrence to detect events. The temporal co-occurrence-based approach has been adopted to overcome the problem of limited context provided by individual tweets. The Stilo and Velardi [9] have utilised the Symbolic Aggregate ApproXimation (SAX) [22] technique to provide a temporal characterisation of events within a specific temporal window, W, by converting the time series associated with terms into a sequence of symbols. Anomalous behaviours are detected by learning a regular expression on a subset of the Wikipedia Events,⁴ which reduces the number of terms to be considered in the subsequent clustering phase. A bottom-up hierarchical clustering technique with complete linkage is applied on the remaining terms to group the terms that share similar string representations within window, W. The overall computational complexity of the event detection system is $O (Dt + L + L' W + (L' - 1) (W^{2} L') + K)$ , where D is the number of tweets in W, t is the average tweet length, L is the vocabulary dimension in W, L′ is the vocabulary dimension after pruning and K is the number of discovered events.

The event detection system, Events from Tweets (ET), proposed by Parikh and Karlapalem [23] extracts event-representative terms based on the increased frequency of the terms in consecutive time intervals. A list of intervals are associated with each term, in which they are frequent. A combination of frequency increase and appearance pattern of the terms is utilised to filter out terms that are bursty in nature but not related to any event. ET uses bigrams as candidate terms to reduce the computational cost incurred by a large number of bursty unigrams. Similar terms (bigrams) are clustered using an agglomerative hierarchical clustering technique which does not require the number of clusters to be specified in advance. The clustering algorithm utilises a similarity matrix which contains the similarity score between each pair of terms and starts by merging the terms with the highest similarity score. After a merge operation is applied, the similarity matrix is updated; this process continues until there are no pairs of terms/clusters with a similarity score above a certain threshold remain. The overall similarity between two terms is calculated based on content similarity, $C_Sim (k_{1}, k_{2})$ , and appearance similarity, $A_Sim (k_{1}, k_{2})$

Sim (k_{1}, k_{2}) = α \cdot C_Sim (k_{1}, k_{2}) + β \cdot A_Sim (k_{1}, k_{2})

(3)

The content similarity, $C_Sim (k_{1}, k_{2})$ , between the terms $k_{1}$ and $k_{2}$ is calculated using the frequently co-occurring bigrams (FCB)

C_Sim (k_{1}, k_{2}) = \frac{FCB (k_{1}) \cap FCB (k_{2})}{FCB (k_{1}) \cup FCB (k_{2})}

(4)

The appearance similarity is calculated based on the frequent time intervals (FI) of $k_{1}$ and $k_{2}$

A_Sim (k_{1}, k_{2}) = \frac{FI (k_{1}) \cap FI (k_{2})}{FI (k_{1}) \cup FI (k_{2})}

(5)

Bursty terms are detected in Event Detection with Clustering of Wavelet-based Signals (EDCoW) [24] by building a signal for each word using wavelet theory [31]. Modularity-based graph partitioning has been utilised to detect events by calculating cross-correlation among the signals generated from each word. A drawback of this method is that the process of wavelet generation for each word and the calculation of correlations are computationally expensive.

The event detection system proposed by Zhang et al. [25] detects bursty terms, where a term burst is identified from an increase in the term weight within a given time window. Each term is weighted by a combination of augmented normalised term frequency [26] and user authority. User authority is incorporated into term weighting using the assumption that the use of a term in a microblog post by an influential user (with many followers) will boost the use of the term through their followers. The approach taken to estimate user authority is similar to the PageRank algorithm [30].

Zhang et al. [25] have used a two-state automata model based on the Hidden Markov Model to detect bursty terms, proposing an incremental strategy with a complexity of $O (n)$ . From the set of bursty terms in a given time window, a term relation-directed graph is constructed. Each vertex in this graph is a bursty term, and an edge between the terms represents a co-occurrence relation. Subsequently, a graph-based clustering technique is applied to extract the strongly connected components, each of which represents an event. The system proposed by Zhang et al. [25] also predicts the popularity of an event based on the spread model, which relies on determining the user’s influence, the user’s interest in the event and the tweet volume of the event.

4. Topic-modelling-based approaches

In this section, we discuss the event detection methods which are dependent on the probabilistic topic models to detect real-world events by identifying latent topics from the Twitter data stream. The topic-modelling-based approaches for event detection associate each tweet with a probability distribution over various latent topics to find the hidden semantic structures from a collection of tweets used to guide the event detection task. These methods rely on sophisticated models to infer latent topics. In the following, we describe the motivations for using more complex modelling to account for the different aspects of topic detection with respect to event detection.

The topic-modelling-based approaches discussed in this section revolve around complex mathematical models to estimate the probability distribution of the latent topic variables. We have focussed our discussion on the different tweet-related attributes which were used to estimate the joint probability distribution, along with the other unique aspects of these approaches which distinguish them.

TwiCal [32] populates an open-domain calendar for important events in order to provide a structured representation of significant events extracted from the Twitter data stream. TwiCal extracts the named entities [33], along with associated event phrases and dates [34], from each of the streaming tweets. A supervised approach to extracting event phrases by using Conditional Random Fields [35] was adopted. A multitude of features including, but not limited to, contextual, dictionary and orthographic features were used to guide the learning and inference of event phrases. Due to the highly volatile nature of topics in Twitter data, a latent variable model is used to discover the important event types from a large body of tweets. From the discovered event types, only the coherent types found during inspection were retained and manually annotated with informative labels. These event types can be used to categorise event phrases extracted from subsequent new data. Events are ranked by measuring the association strength between an entity and a specific date, based on the $G^{2}$ log likelihood ratio statistic. This approach – identifying events based on the association between an entity and a specific date/time – does not work well for events that are unexpected or smaller in terms of significance.

The Latent Event and Category Model (LECM) [11] uses a latent variable model similar to TwiCal [32], which detects events in a structured manner. In the LECM, each tweet is modelled as a joint distribution over the named entities, date/time and location of the event occurrence, and the event-related terms. This distribution is expected to ensure that two events occurring in the same place and time are distinguishable. To categorise events of different types, each named entity is mapped to its related semantic concepts using Freebase API.⁵ Afterwards, each tweet is modelled as a joint distribution of the event-related terms and the semantic concepts of the named entities. The LECM uses the Collapsed Gibbs sampling algorithm to infer the parameters used in the model, the event types and their semantic class. Unlike TwiCal [32], the LECM uses a Bayesian modelling approach which can extract event-related keywords directly from the tweets without requiring supervised learning. Moreover, the LECM model can produce a structured representation of events directly, whereas TwiCal depends on the $G^{2}$ log likelihood ratio statistic to measure the association strength between an entity and a specific date.

The General and Event-related Aspects Model (GEAM) [36] is a hierarchical Bayesian model based on Latent Dirichlet allocation (LDA). It is similar to the LECM [11], but each tweet in the GEAM is modelled not only on event-related aspects (i.e. time, locations and entities) but also on general topics. Each named entity or hashtag in a tweet is assigned an event-related aspect. Other terms in a tweet are assigned a general topic or an event-related aspect based on a switching variable drawn from a binomial distribution. The Collapsed Gibbs sampling algorithm is used to estimate the multinomial (i.e. events, general topics and event’s aspect) and binomial distributions.

TopicSketch [10, 37] detects bursty events by maintaining a novel data sketch to detect acceleration in three quantities: the whole Twitter stream, every word and every pair of words. The system provides a low-cost solution for maintaining and updating this information. The sketch-based topic modelling approach triggers topic inference when an acceleration on these stream quantities is detected. As this strategy will result in data with dimensions in the order of millions, a set of the most recently encountered active words monitored and a hashing-based dimension reduction scheme is utilised to address this issue. A lazy maintenance scheme is employed using H hash functions to limit the number of accelerations to be updated to $O (H . | d |^{2})$ , where $| d |$ represents the average number of words in a tweet. The hashing scheme can handle the high dimensionality of data in TopicSketch, which originates from the high number of distinct words encountered by reducing the dimensionality of data to $O (H . B . K)$ , where K refers to the number of active latent topics and B refers to the number of buckets where the words are mapped using the H hash functions.

Bursty Event dEtection (BEE+) [38] is an incremental topic model which discovers bursty events by modelling the temporal information of events. It utilises a distributed execution framework to achieve real-time processing capability. Compared with documents processed in the standard topic models, where a large document has a probability distribution over a mixture of topics, a post in a microblog containing only a few sentences is more likely to be associated with a single topic, and thus a single event. Unlike most of the other topic models for event detection, in BEE+ a microblog post related to an event is determined by associating only a single hidden variable with an additional background topic which has a distribution over the common words. Burst detection in BEE+ is similar to the approach used in TwitInfo [16], which was inspired by the TCP congestion control algorithm. The process to estimate the parameters in BEE+ has a faster convergence time than traditional topic models. Moreover, the incremental parameter update process in BEE+ is able to keep track of topic drifts over time.

Spatio-Temporal Multimodal TwitterLDA (STM-TwitterLDA) [39] is a topic-model-based framework for event detection which extracts text, image, location, timestamp and hashtag-based Twitter features from each tweet in the Twitter data stream as input and then jointly models the probability distribution of these features to detect events. Note that these features, apart from plain text, may not always be present in a tweet. STM-TwitterLDA employs an Support Vector Machine (SVM) classifier to remove noisy images, a latent filter to remove general images and words, and Convolutional Neural Networks (CNN) [40] to extract visual features from images to leverage in event detection. Finally, maximum-weighted bipartite graph-matching is applied in consecutive periods to track the evolution of the detected events.

Shepard [41] proposed a theoretical framework for retrospective event detection. The proposed system combines the unified model proposed by Diao and Jiang [42] with the Nonparametric Pachinko Allocation Method (NPAM) [43] in order to detect nested hierarchies of topics, called super-topics and sub-topics. Here, the super-topics correspond to long lasting super events and sub-topics are the smaller sub-events contained within the super-topics.

Madani et al. [44] applied a nonparametric Bayesian model, called Hierarchical Dirichlet Processes (HDP) [45], to detect trending topics from the Twitter data stream. For this method, a vector of topics is initially discovered by applying the generative model of HDP on tweets so that, for each tweet, a distribution of topics is calculated by exploiting the vector of topics. The topic with the highest probability in a distribution of topics is considered a trending topic. Tweets with similar trending topics are grouped into clusters. The authors used the Gibbs sampling algorithm to estimate the parameters used in their model. In addition, the system utilises a thesaurus of terms, created using the YAGO (Yet Another Great Ontology),⁶ to incorporate semantic information which can aid in the clustering of trending topics.

5. Incremental-clustering-based approaches

As traditional clustering algorithms usually require that the total number of clusters be fixed, it is difficult to predict the total number of expected event clusters in advance for high-volume, real-time Twitter data where a wide variety of topics are discussed. The event detection methods discussed in this section follow, at their very core, a clustering strategy, which is incremental in nature in order to avoid having a fixed number of clusters. These methods are summarised in Table 3, focussing on the approaches taken to determine the term weights to generate a tweet vector, methods applied along with the incremental clustering to group event-related tweets, and the techniques employed to rank events that were detected by an event detection system.

Table 3.

Summary of incremental-clustering-based approaches

Articles	Tweet representation	Clustering techniques	Ranking events
Hasan et al. [46] (TwitterNews+)	tf − idf	Similarity-threshold-based incremental clustering, with a defragmentation sub-module applied simultaneously to handle cluster fragmentation.	A combination of filters is used to discard the trivial event clusters.
Petrovic et al. [5]	tf − idf	Similarity-threshold-based incremental clustering, exploiting locality-sensitive hashing and cluster expiration after a specific time to achieve scalability.	Events are ranked on the number of unique user posts and entropy information in a cluster.
Osborne et al. [47] (ReDites)	tf − idf	LSH-based incremental clustering proposed by Petrovic et al. [5].	Events are reported based on a content classifier, then an adapted VDM model [48] is used to classify security-related events.
Becker et al. [49]	tf – idf, boosted weight on hashtags.	Similarity-threshold-based incremental clustering, with a periodic second pass to handle cluster fragmentation.	Events are ranked using the confidence score assigned by the SVM classifier.
De Boom et al. [13]	tf − idf	Incremental clustering approach in Becker et al. [49], augmented with hashtag-level semantics.	A logistic regression classifier [50] is used to filter out trivial events.
Phuvipadawat and Murata [51]	tf − idf, boosted weight on proper nouns.	Similarity-threshold-based incremental clustering.	Popularity, reliability and message-freshness factors are exploited to rank events.
McMinn and Jose [12]	tf − idf	Size and similarity-threshold-based incremental clustering, utilising inverted indexes for named entities, coupled with aggressive tweet filtering for scalability.	N/A
Unankard et al. [52] (LSED)	Augmented normalised term frequency [26] with slang conversion and synonym expansion.	Leader–follower clustering algorithm [53], exploiting content and concept similarity.	Events are ranked on location correlation score, and tweet burst in a cluster.
Kaleel and Abhari [54]	tf − idf	Incremental clustering utilising LSH-based scheme with prefix-tree data structure.	N/A
Lee and Chien [55]	Dynamic term-weighting scheme based on the arrival rate and the occurrence probability of a term within a sliding window [56].	Modified IncrementalDBSCAN clustering algorithm [57, 58].	Events are ranked by applying an energy function over the topic clusters [59].

LSH: locality-sensitive hashing technique; VDM: Violence Detection Model.

TwitterNews+ [46] employs a two-stage operation for event detection, where the first stage involves detecting a burst in the number of tweets discussing an event, while the second stage involves clustering the tweets which discuss the same event. The first stage of operation in TwitterNews+ is handled by a Search Module. For each tweet, the Search Module facilitates the fast retrieval of similar tweets which are obtained from the set of most recent tweets maintained by TwitterNews+. The Search Module utilises an inverted index where each entry contains a term and a regularly updated, finite list of the most recent tweets containing the term. For each tweet in the system, the top-k $tf - idf$ weighted terms are used to find the previously encountered tweets from the inverted index containing these terms. A cosine similarity comparison is performed with the retrieved tweets to provide a decision on the novelty of the tweet, with a computational complexity of $O (1)$ . A tweet decided as ‘not unique’ by the Search Module means that similar tweets have been encountered before, so, therefore, the tweet is bursty and needs to be tracked in the second stage of the system. Here, similar tweets which talk about the same events are grouped together using an incremental clustering approach. The operation of the second stage of TwitterNews+ is handled by the EventCluster Module, which is responsible for taking quick decisions on assigning a tweet to an event cluster using an incremental-clustering-based approach. The EventCluster Module also maintains an inverted index where each entry contains a term and a regularly updated finite list of IDs of the most recent event clusters in which the term appeared. For each tweet decided as ‘not unique’, the top-k $tf - idf$ weighted terms of the tweet are used to find the event clusters from the inverted index containing these terms. A cosine similarity comparison with the centroids of these event clusters is performed, with a computational complexity of $O (1)$ , to find the maximum matching cluster above a certain threshold to which the tweet will be assigned. If no such cluster is found, then a new cluster is created and the tweet is assigned to the newly created cluster.

Petrovic et al. [5] have implemented a First Story Detection (FSD) system based on an adapted variant of the locality-sensitive hashing technique (LSH). The LSH-based approach provides a fast way to compare the similarity of the input tweet with previously encountered tweets to determine its novelty while complying with the constant time and space requirements of a streaming FSD system. A novel tweet represents a new story, which will be assigned to a newly created cluster. However, a tweet determined as ‘not novel’ will be assigned to an existing cluster containing the nearest neighbour. The proposed system uses the number of unique user posts and the entropy information in a cluster to rank event clusters. Besides relying on the aforementioned redundancies, the system lacks the capability to distinguish between significant and trivial events. In their later work [60], the authors have used Wikipedia as an external source to identify significant events. The approach taken to track unusual spikes in Wikipedia page views suffers from latency means that real-time event detection is delayed, as the Wikipedia stream has a substantial time lag compared with the Twitter stream.

ReDites [47] is built on the LSH-based FSD system [5], tailored for the security domain to detect events related to security (e.g. violent events, natural disasters and emergency situations). In order to improve the precision of the LSH-based event detection system [5], a content classifier was built on a passive-aggressive algorithm and trained on manually labelled events that were automatically discovered by the FSD system. Out of the entire set of detected events, security-related events are extracted, using a weakly supervised Bayesian modelling approach based on the Violence Detection Model (VDM) proposed by Cano et al. [48]. The VDM starts with a security-related word lexicon created from DBpedia⁷ and subsequently discovers new words related to security events for better classification.

Becker et al. [49] have used an incremental clustering algorithm to detect events from the Twitter stream. For each tweet, its similarity is computed against each of the existing clusters. If the similarity of a tweet is not higher than a specific threshold in any of the existing clusters, a new cluster is created. Otherwise, the tweet is assigned to a cluster with the highest similarity. Once the clusters are formulated, a Support-Vector-Machine-based classifier is used to distinguish between real-world events and non-events. The classifier is trained on temporal, social, topical and Twitter-centric features. Events are ranked based on the confidence score assigned by the classifier.

De Boom et al. [13] augmented semantic information with individual tweets in the incremental clustering approach adopted by Becker et al. [49]. The authors used TwitterLDA [61] to assign a semantic topic to each tweet and reported that, instead of using semantic topics for individual tweets, assigning semantic labels based on a coarser hashtag-level provides a significant gain in precision and recall over the Becker et al. [49] baseline. However, the hashtag-level semantics will work only when the event-related tweets contain hashtags. A similar observation was pointed out by Mehrotra et al. [62], mentioning the problem with hashtag-based pooling when a lot of tweets do not contain any hashtags, although automatic hashtag labelling can improve this situation to some extent.

An incremental-clustering-based approach similar to Becker et al. [49] is taken by Phuvipadawat and Murata [51]. The authors used pre-defined search queries, such as #breakingnews, to fetch a more focussed set of tweets from Twitter for the event detection task and boosted the $tf - idf$ scores on proper nouns to obtain a better grouping of event-related tweets. Event groups are ranked based on the popularity, reliability and freshness of the tweets in each group. The popularity is measured by using the total number of retweets in a group, and the reliability is measured based on the total number of followers of all the users within the group. Finally, the group score is adjusted by assigning more weight to recent tweets.

McMinn and Jose [12] worked on the idea that named entities are the building blocks of events and implemented a system that identifies bursty named entities for the detection and tracking of events. A drawback of this approach is that it is dependent on the accuracy of the underlying Named Entity Recognizer [63] it uses. A Tweet is clustered based on the named entities it contains; thus, the same tweet might be placed in clusters for each of the named entities contained in the tweet. A unique inverted index for each entity is maintained, where an entity is associated with a cluster of tweets that are near neighbours. This is done to reduce the computational complexity to find the nearest neighbour of a tweet based on the entity it contains. By leveraging the inverted indices, a tweet is assigned to the cluster associated with an entity where its nearest neighbour belongs. If no such cluster exists, a new one is created where both the tweet and its nearest neighbour are assigned, and the tweet is added to the inverted index of the entity. A burst on a cluster associated with an entity in a given time window is detected based on the Three Sigma rule [64], which states that, with an empirical near-certainty of 99.7%, all values in a normal distribution lie within 3 standard deviations of the mean. If the number of tweets in a cluster within a given time window exceeds 3 standard deviations from the mean, the entity is considered bursty based on the Three Sigma rule.

Once a burst is detected on an entity, an event associated with the bursty entity is created. If the average timestamp of the tweets in the cluster occurs after the initial burst, the cluster of tweets is assigned to an event. An event is considered to be expired when all the entities associated with it are no longer bursty.

Location Sensitive Emerging Event Detection (LSED) [52] detects emerging hotspot events within a sliding time frame by identifying strong correlations between the user locations and event locations. LSED clusters similar tweets from a combination of content similarity and concept similarity, using an incremental clustering approach called leader–follower clustering [53]. Tweet clustering is triggered when the timestamp of the incoming tweet is greater than the current sliding temporal window. Terms are weighted based on the augmented normalised term frequency [21], and then WordNet [65] is used for the synonym expansion of nouns and verbs from the top 20% weighted terms in a tweet, in order to ensure efficient clustering.

Content similarity is calculated based on the cosine similarity measure. In order to reduce duplicate clusters, the top terms are extended with the hypernyms derived from WordNet, calculating the concept similarity based on the Jaccard similarity measure [66]. Once the clusters are formed by the LSED system, each candidate event cluster, $e,$ is considered to detect the emerging hotspot events based on a combination of the location correlation score and the tweet burst in a cluster

EmergScor e_{e} = (1 + LocScor e_{e}) \times \frac{N_{e}}{Mea n_{Prev} + 2 S D_{Prev}}

(6)

where $LocScor e_{e}$ is the score for correlations between the user locations and the event locations in $e$ ; $N_{e}$ is the number of tweets in $e$ in the current time slot; and $Mea n_{Prev}$ and $S D_{Prev}$ are the mean and the standard deviation of the number of tweets in the previous time slots in $e$ , respectively.

Kaleel and Abhari [54] used a locality-sensitive-hashing-based scheme with a prefix-tree data structure to discover the event-related clusters. Each cluster is labelled with the most frequent terms which represent the cluster centroid. A new cluster is created for a tweet that does not belong to any cluster. Lee and Chien [55] proposed an event detection system which incorporates a dynamic term-weighting scheme [56] to handle the concept drifts in a topic. It adapts the IncrementalDBSCAN algorithm [57, 58] which utilises neighbourhood relations among the tweets to detect events.

6. Miscellaneous approaches

This section discusses the event detection methods that adopt hybrid techniques, which do not directly fall under the three categories discussed above.

Chierichetti et al. [67] proposed a linear classifier to detect events, which relies on mathematically formalising the Twitter users’ communication behaviour, instead of using textual information. The proposed system follows the users’ activity around a key event by monitoring the shift in the amount of tweets/retweets produced and the level of communication between individual users.

Guille and Favre [14] proposed an event detection system called MABED (Mention-Anomaly-Based Event Detection), which detects events by exploiting the statistical values calculated from the textual contents of tweets and the frequency of user interactions through user-name mentions. Each event produced by MABED is characterised from the duration of the event, a main event word and its related weighted words, and the magnitude of impact of the event on Twitter users.

Huang et al. [68] proposed a pattern-mining-based approach to detect events by clustering event-representative High Utility Patterns (HUPs) from microblog texts. An HUP is determined based on the importance of the terms contained in a pattern. Patterns are generated using the FP-Growth algorithm [69], which provides an efficient and scalable method for mining the complete set of frequent patterns by exploiting an adapted prefix-tree structure. To reduce the computational and the memory cost incurred with a high number of generated patterns, a top-K HUP mining algorithm is proposed that performs HUP selection and pattern generation simultaneously. HUP mining is performed from a set of microblog texts within a fixed time window by maximising the detected pattern utilities where the overlap-degree between each pair of patterns is always below a certain threshold.

Once the HUPs are detected, an incremental pattern clustering is applied to group-related patterns in clusters. As the HUPs contain little text, additional texts associated with HUPs are used while measuring pattern similarity. The clustering process is a combination of kNN classification and modularity-based clustering [70], which allows simultaneous identification of emergent and coherent topics.

Adedoyin-Olowe et al. [71] focussed on detecting events from the sport and political domains by extracting newsworthy hashtag keywords. They used a Transaction-based Rule Change Mining framework built on the Apriori method for association rule mining [72]. Dang et al. [73] proposed a Dynamic-Bayesian-Network-based model [74], which utilises the networks formed by the retweet chains on a topic and the follower–followee relations among the users, in order to infer the emergent keywords from the Twitter data stream within a specific time interval. Finally, the emergent topics are detected by applying the DBSCAN algorithm [75] to cluster the emergent keywords based on their co-occurrence relations. Fang et al. [76] proposed a MultiView Topic Detection (MVTD) framework that fuses semantic, hashtag and temporal relations among tweets to perform Co-training-based Multiview Clustering (CMC) [77 –79] for topic detection. The MVTD framework uses a suffix-tree-based vector space model as a document similarity measure.

Thapen et al. [80] proposed an event detection system to detect disease outbreaks by adapting the existing bio-surveillance early aberration reporting system (EARS) algorithm [81] to detect location-sensitive spikes in Twitter data. A linear SVM classifier was used to remove tweets that do not discuss disease outbreaks. Robinson et al. [82] applied statistical process control methods, such as dynamic bi-plots [83], Adaptive Exponentially Weighted Moving Averages (AEWMA) and Adaptive Cumulative Sums (ACUSUM) [84], on the tweets posted by the users of Twitter in Australia to identify unusual tweets that correspond to potential disease outbreaks.

Yin et al. [85] proposed an event detection framework for emergency situation awareness that utilises a probabilistic burst-detection module [86] to filter out the trivial tweets and perform incremental clustering on the tweets containing bursty features. A similar specialised event detection technique is proposed by Sakaki et al. [87], which predicts earthquakes and provides early warnings. In their work, each Twitter user is considered to be a sensor that propagates tweets as sensory information (i.e. text, time and location). A probabilistic approximation algorithm, called a particle filter [88], is used to estimate the centre and trajectory of the event location.

7. Evaluation methods

In this section, we discuss the widely employed evaluation methods from a selected number of articles (summarised in Table 4). This discussion focusses on the various performance metrics and the information regarding the tweet data sets used in different articles. As there is no standard evaluation method, the articles discussed in this section were selected to cover the measures most commonly employed (with small variations) by researchers, along with a few exceptional ones. Note that the different data sets collected by the researchers are usually from a small sample (around 1%) of the public data made available by the Twitter Streaming API, which can be filtered by location, keywords, language and so on.

Table 4.

Summary of the evaluation approaches

Articles	Ground truth	Public corpus	Corpus collection	Evaluation results
				Method	Precision	Recall	DERate
Li et al. [7] (Twevent)	No	No	Streaming data with filter	Baseline [24]	0.762	13/21	0.231
				Twevent	0.861	75/101	0.16
Marcus et al. [16] (TwitInfo)	Yes	No	Streaming data with filter	Baseline	x	x	x
				TwitInfo	0.80–0.95	0.80–0.95	x
Gaglio et al. [8] (TLDF)	No	No	Streaming data with filter	Baseline1 [18]	0.604–0.687	x	x
				Baseline2 [17]	0.546–0.753	x	x
				TLDF	0.923	x	x
Stilo and Velardi [9] (SAX*)	No	No	Streaming data	Baseline	x	x	x
				SAX*	0.79–0.91	x	x
Unankard et al. [52] (LSED)	No	No	Query-based using Search API	Baseline1 [89]	0.870	12/54	x
				Baseline2 [90]	0.913	13/949	x
				Baseline3 [18]	0.864	82/1024	x
				Baseline4 [91]	0.967	90/151	x
				LSED	0.973	90/136	x
De Boom et al. [13]	Yes	No	Streaming data with filter and query on Twitter REST API	Baseline [49]	0.649	0.363	x
				Proposed [13]	0.64	0.377	x
McMinn and Jose [12]	Yes	Yes	Streaming data with filters	Baseline1 [92]	0.048	32/506	x
				Baseline2 [5]	0.285	156/506	x
				Proposed [12]	0.636	194/506	x
Zhou et al. [11] (LECM)	Yes	No	Streaming data	Baseline [32]	0.642	x	x
				LECM	70.49%	x	x
Li et al. [38] (BEE+)	No	No	A crawler is used to obtain the data set	Baseline1 [17]	0.50	0.800	x
				Baseline2 [93]	0.50	0.733	x
				Baseline3 [94]	0.60	0.733	x
				BEE+	0.70	0.733	x
Parikh and Karlapalem [23] (ET)	No	No	Tweets collected from devices with GPS capabilities	Baseline	X	x	x
				ET	0.91	21/23	x
Guille and Favre [14] (MABED)	No	No	Streaming data with filter	Baseline1 [95]	0.600	0.456	0.250
				Baseline2 [23]	0.575	0.575	0
				Baseline3 [14]	0.625	0.525	0.160
				MABED	0.775	0.608	0.167

LSED: Location Sensitive Emerging Event Detection; SAX*: Symbolic Aggregate ApproXimation; TLDF: The Twitter Live Detection Framework; LECM: The Latent Event and Category Model; BEE+: Bursty Event dEtection; MABED: Mention-Anomaly-Based Event Detection.

Twevent [7] is evaluated using a collection of 4.3 million tweets published by Singapore-based users, collected over a period of 1 month. To evaluate Twevent, Li et al. [7] defined ‘precision’ as the fraction of detected events that are related to realistic events, ‘recall’ as the number of distinct realistic events detected from the data set on a daily basis and the ‘Duplicate Event Rate’ (DERate) as the percentage of events among all the detected realistic events which are duplicates. The system proposed by Weng and Lee (EDCoW) [24] is used as the baseline for Twevent. Marcus et al. [16] used tweets related to three soccer games and 1 month of earthquakes to evaluate their system. A human annotator was used to produce the ground truth for evaluation, using game videos from online, web-based game summaries and US Geological Survey on major earthquakes, in order to identify major soccer events and earthquakes within a specific time period.

TLDF [8] was evaluated on the data collected from Twitter, during the 64 matches of FIFA World Cup 2014. The precision is defined as the number of distinct events the system is able to detect compared with the actual number of distinct events observed during a session. The evaluation is also performed to determine redundancy, which is defined as the complimentary of the number of distinct events the system is able to detect, compared with the total number of detected events. TwitterMonitor [17] and enBlogue [18] were used as the baselines. Stilo and Velardi [9] propose an evaluation based on forming a web query using the detected event clusters and their associated dates. A candidate event cluster is considered as an actual event if a direct match is found using the web query formulated from the candidate event cluster. The evaluation was performed on 1% of a 1-year Twitter stream, with system parameters tuned to reduce the total number of events to be manually labelled for evaluation.

Unankard et al. [52] evaluated their system based on 1 week of around 200,000 tweets. No ground truth was used for the evaluation. The recall and precision were reported by manually inspecting the events detected by the system. Location-based emerging event detection (LEED) [91], enBlogue [18] and the systems proposed by Sayyadi et al. [89] and Ozdikis et al. [90] were used as the baselines. De Boom et al. [13] gathered 2 weeks’ worth of tweets collected from Flanders, Belgium-based users. A pool of hashtags gathered from the data set was used to query Twitter REST API to collect additional tweets. Finally, all the tweets in the data set were clustered to generate the ground truth; only the events containing hashtags were considered for the evaluation. The system proposed by Becker et al. [49] was used as the baseline.

McMinn and Jose [12] used the Events2012 corpus for evaluation. The corpus contains 120 million tweets and relevance judgments for 506 events. Crowdsourcing was utilised to evaluate the performance of their system, and the LSH approach of Petrovic et al. [5] and Cluster Summarisation approach of Aggarwal and Subbian [92] were used as the baselines. Zhou et al. [11] evaluated their work on a data set of 64 million tweets collected in December, 2010, with TwiCal [32] as the baseline. Only the precision of the system was reported; however, the authors used a small manually labelled data set to evaluate the system recall and reported a recall of 25.73%. BEE+ [38] was evaluated on a Weibo (Chinese Microblog) data set, with PLSA [93], TwitterMonitor [17] and BEE [94] as baselines. The authors reported the best result achieved by their system with the maximum number of topics (clusters) set to 10. Parikh and Karlapalem [23] used a small data set of 33579 tweets. No ground truth was used for the evaluation. The recall and precision were reported based on a manual inspection of the events detected by the system.

Guille and Favre [14] evaluated their system on an English language text corpus containing around 1.4 million tweets, published in November, 2009. TS [95], ET [23], and $α - MABED$ (a variant of MABED [14]) were used as the baselines for their system. The top 40 events reported by each system were manually inspected to report the precision of the detected events. Recall is reported as the fraction of distinct significant events among all the detected events, and DERate [7] is reported as the percentage of events that are duplicates among all the significant events. Xie et al. [10, 37] performed a comparison with Twevent [7] as part of their evaluation. The comparison was performed based on a manual analysis of the events detected by both the systems, in order to identify the differences between their results.

8. Pre-processing

The common pre-processing techniques applied on the Twitter data stream are as follows: POS tagging, NER, resolving temporal expressions, slang-word conversion, tweet filtering based on specific criteria (i.e. discarding retweets and/or non-English language tweets), and removing stop words, URL and user-name mentions from tweets.

The named entity tagger called T-NER [33] was developed to deal with the noisy data present in a tweet. The authors have reported that T-NER outperforms the Stanford tagger by a 52% increase in the F1 score. T-NER was used for named entity tagging in TwiCal [32] and also by the GEAM event detection system [36]. McMinn and Jose [12] used the GATE Twitter POS model [63] for POS tagging and the NER task. Zhou et al. [11] used a Twitter-trained POS tagger by Gimpel et al. [96] to perform POS tagging, and named entity tagging was performed using T-NER [33].

A temporal expression extracted from a tweet in TwiCal [32] is resolved into an unambiguous calendar reference by using a tool called TempEx [34]. LECM [11] used SUTime [97] to resolve temporal expressions. You et al. [36] used the natural language date parser – Natty⁸– to extract time information.

Unankard et al. [52] performed slang conversion based on the Internet slang dictionary.⁹ Each term is searched in the dictionary to convert the matching slang into a proper English word. Madani et al. [44] used the Porter algorithm [98] to transform variants of words into a single stem by removing suffixes or prefixes. Different word forms are mapped to their base forms using WordNet [99] for lemmatisation.

McMinn and Jose [12] use different filtering approaches as part of their pre-processing. They discard retweets and all the tweets which do not contain any named entities. Term-level filters are also employed to remove the tweets which contain terms that are usually associated with spam and noise (e.g. ‘follow’ and ‘watch’). Collectively, these filters remove more than 90% of tweets, significantly reducing the amount of data to be processed by the system. You et al. [36] discard tweets that do not contain any named entities or hashtags. Zhou et al. [11] have experimented with two different tweet filtering techniques. The first one involved creating a word lexicon based on the terms extracted from the newspaper articles that were published around the same time as the collected body. The second approach was to build an SVM-based classifier using a multitude of word features and event-related features. They have reported that the word-lexicon-based approach gives a higher precision result than the SVM-based approach.

9. Event summarisation

Once an event is detected from the Twitter data stream, it needs to be presented to the user in a meaningful and informative way, as the tweets themselves do not necessarily provide a good summary of an event.

Sharifi et al. [100] developed an algorithm that can automatically generate one-line summaries from tweets related to a given topic. Provided with a topic, the proposed Phrase Reinforcement (PR) algorithm retrieves tweets related to the topic. From each tweet, the longest sentence containing the topic is extracted. These sentences act as an input to the PR algorithm, allowing construction of a graph which represents the common sequences of words encompassing the topic. The topic is the root node or the central node, and each input sentence is a sequence of nodes containing a single word. The weight of a node is calculated based on the frequency of the word contained in the node and the distance of the node from the root node. Once the graph is generated, the PR algorithm finds the path with the highest total weight from the root node to a leaf node in order to generate the summary.

Madani et al. [44] used TextRank [101], a graph-based, unsupervised ranking model for natural language processing which derives extractive summaries from the given texts. From the tweets associated with a topic cluster, a dependency graph is created where the nodes are the tweets in the cluster and the edges between the nodes correspond to the similarity between tweets. The similarity between two tweets $T_{i}$ and $T_{j}$ , based on the set of terms $k$ belonging to the tweets, is defined as

Sim (T_{i}, T_{j}) = \frac{\sum_{k \in T_{i}, T_{j}} (freq (k, T_{i}) + freq (k, T_{j}))}{(\log (| T_{i} |) + \log (| T_{j} |))}

(7)

TextRank is applied on the tweet dependency graph created from a cluster and the score of a given tweet $T_{i}$ within the cluster, in order to rank the tweets and generate a summary. It is calculated in the following equation

S (T_{i}) = \sum_{T_{j} \in nested (T_{i})} \frac{Sim (T_{i}, T_{j})}{\sum_{T_{k} \in nested (T_{j})} Sim (T_{k}, T_{j})} S (T_{j})

(8)

where $T_{j} \in nested (T_{i})$ refers to the nodes that are connected to $T_{i}$ , and $T_{k} \in nested (T_{j})$ refers to the nodes that are connected to $T_{j}$ .

The ReDites system [47] contains a Tracking and Summarization (TaS) component. When an event is created, the real-time tracking sub-component retrieves new tweets about the event based on the most informative terms contained in the event-related tweet. The summarisation sub-component removes tweets that are textually similar to generate an extractive summary of tweets based on their temporal order.

10. Discussion

Not all approaches discussed in this survey may be applicable in the real world. This is mostly due to an issue of scalability, as some event detection methods are not equipped to handle the high volume of streaming data. Moreover, the use of a clustering approach which requires the total number of clusters to be fixed is often not useful for a streaming data environment, where a wide variety of topics are discussed, making it difficult to predict the total number of expected event clusters in advance.

Term-interestingness-based approaches usually differ in the term selection methods they employ, as well as in the way term correlations are computed and changes in term correlations are tracked. The approaches based on term interestingness can often capture misleading term correlations, while measuring term correlations can be computationally prohibitive in an online setting.

Topic-modelling-based approaches work under the assumption that some latent topics always exist in the tweets that are processed. Tweets are modelled as a mixture of topics, and each topic has a probability distribution over the terms contained in those tweets. LDA [102] has been the most used probabilistic topic model, where the topic distribution is assumed to have a Dirichlet prior. Due to the limit imposed on the length of a tweet, capturing good topics from the limited context is a problem that needs to be addressed. Moreover, the topic-modelling-based approaches usually incur too high a computational cost to be effective in a streaming setting and are not quite effective in handling the events that are reported in parallel [103]. Stilo and Velardi [9] noted that LDA-based methods usually only work in an offline manner, as the temporal aspect of the events are not often considered.

Incremental-clustering-based approaches are prone to fragmentation and are usually unable to distinguish between two similar events taking place around the same time. The fragmentation refers to phenomena where the same event is detected multiple times as a new event. Becker et al. [49] and McMinn and Jose [12] incorporated a separate component to address the fragmentation issue. Incremental-clustering-based approaches usually employ a similarity threshold value to perform clustering. Caution needs to be taken while empirically setting the threshold for determining the similarity of a tweet with a cluster, to ensure that the threshold value works well with the dynamic nature of fast changing topics in the Twitter stream.

Lehmann et al. [104] and Yang and Leskovec [105] have shown the existence of a number of different temporal patterns of events besides the event-pattern with bursty characteristics; therefore, event detection approaches which are solely dependent on identifying the bursty characteristic can fail to detect other event-patterns such as events with a relatively slow but lengthy burst of related tweets [9].

Some papers incorporate specific measures into rank events to directly address the task of identifying real-world events from a set of candidate events which were generated by their system. The supervised approaches adopted in De Boom et al., [13] Osborne et al. [47] and Becker et al. [49] might not be able to account for situations where there is a concept drift in the Twitter data stream. In addition, training a classifier in an offline manner, by labelling the Twitter data manually, can be very time-consuming. There are also unsupervised approaches [5, 24, 25, 21, 51, 52] adopted by the researchers with varying degrees of success, which rank events for the purpose of filtering trivial events out of the system-generated candidate event set.

Event ranking depends a good deal on the system’s ability to filter out spam tweets. Determining the credibility of tweets can also help in the event ranking process. Castillo et al. [106] adopted a supervised learning approach to determine the credibility of a given set of tweets, with a maximum precision and recall of nearly 80%. The features used in their supervised learning technique were based on the information found in the microblogging social media platform. The most important features to determine tweet credibility were, generally, message-based, topic-based, user-based and propagation-based features. First, TwitterMonitor [17], along with a J48 decision tree algorithm implemented in the Weka data mining tool, is used to automatically identify news events from a Twitter data set. Once the news events are identified, supervised classification is used to assess the credibility of the news. Gupta and Kumaraguru [107] have also used a supervised learning and pseudo-relevance-feedback-based approach to determine the credibility score of tweets. Regression analysis was used to discover the best features for credibility assessment. Tweet credibility detection can be used as part of the pre-processing stage, or even at the post-processing stage, to detect the credibility of the related tweets of a detected event.

Although most of the proposed approaches are evaluated using tweet data sets collected from varying locations and durations, the unavailability of a publicly available text corpus, along with the ground truth for a system performance evaluation, is a major setback for the event detection techniques discussed in our survey. To the best of our knowledge, at the time of writing this survey, the only publicly available corpus of data with the ground truth for evaluation purposes is the one provided by McMinn et al. [108]. The use of this can help us conduct a fair performance comparison among the different event detection approaches proposed by the researchers.

The survey conducted by Atefeh and Khreich [6] discussed the event detection techniques as either unsupervised or supervised event detection approaches. The Atefeh and Khreich [6] pointed out the benefit for unsupervised approaches of not requiring any labelled data, over supervised approaches which require a manually labelled data set to train a classifier. However, they suggested that the unsupervised clustering approaches can be further optimised to avoid cluster fragmentation and improve clustering efficiency by incorporating additional indicators such as location proximity for clustering, besides the temporal aspect of the tweets. Using geotags as a measure of location proximity can determine whether the tweets under consideration belong to the same event. The problem of sparsely labelled data used in supervised clustering can be handled by semi-supervised learning [109], where a small set of labelled data can be coupled with a large unlabelled data set to build a classifier, or by transfer learning [110], where useful information can be extracted from different, but related, domains. Atefeh and Khreich also stressed the need for representative data sets and a common test bed for the performance evaluation of event detection systems. In addition, combining different social-media-based information sources such as Facebook, Flickr and YouTube, along with Twitter, was suggested to provide a more robust solution for event detection.

11. Conclusion

We have conducted a survey on the noteworthy and recent event detection techniques that focus on detecting real-world events of global and/or local interest from the Twitter data stream, broadly categorising different approaches based on term interestingness, topic modelling, incremental clustering and hybrid methods. We have discussed the specific detection methodologies employed by researchers and the measures taken to distinguish between real-world events and trivial events of no consequence. This article has also briefly discussed the evaluation methods adopted by different researchers and the common Twitter data pre-processing techniques involved in the proposed systems. In addition, we have discussed a few techniques that focus on summarising the tweets of an event. We note that efficient temporal summarisation of event-related tweets is a less explored research area and an important avenue for future work.

Further work is required to propose effective measures to filter out spam and trivial events. In addition, it is imperative that proposed event detection systems are evaluated on a publicly available text corpus of data, with a rich collection of a variety of events, to facilitate a fair comparison of different systems. This article provides a structured representation of the various aspects of Twitter-centric event detection techniques, which can be a helpful starting point for researchers endeavouring to work in this field.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

M.H. has been supported by a Macquarie University Research Excellence Scholarship (MQRES).

Notes

References

Dou

Wang

Ribarsky

. Event detection in social media data. In: Proceedings of the IEEE VisWeek workshop on interactive visual text analytics – task driven analytics of social media content. 2012, pp. 971–980, http://sites.computer.org/debull/A13sept/p51.pdf

Allan

. Topic detection and tracking: event-based information organization, vol. 12. Berlin: Springer, 2012.

Petrovic

Osborne

McCreadie

. Can Twitter replace newswire for breaking news? In: Proceedings of the international AAAI conference on web and social media (ICWSM), 2013, http://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/view/6066/6331

Osborne

Dredze

. Facebook, Twitter and Google Plus for breaking news: is there a winner? In:Proceedings of the international AAAI conference on web and social media (ICWSM), 2014, http://homepages.inf.ed.ac.uk/miles/papers/icwsm14.pdf

Petrovic

Osborne

Lavrenko

. Streaming first story detection with application to Twitter. In: Proceedings of the annual conference of the North American chapter of the association for computational linguistics: human language technologies (HLT ’10), Stroudsburg, PA: ACL, 2010, pp. 181–189.

Atefeh

Khreich

. A survey of techniques for event detection in Twitter. Comput Intell2015; 31(1): 132–164.

Sun

Datta

. Twevent: segment-based event detection from Tweets. In: Proceedings of the ACM international conference on information and knowledge management (CIKM ’12), Maui, HI, 29 October–2 November 2012, pp. 155–164. New York: ACM.

Gaglio

Morana

. A framework for real-time Twitter data analysis. Comput Commun2016; 73: 236–242.

Stilo

Velardi

. Efficient temporal mining of micro-blog texts and its application to event discovery. Data Min Knowl Disc2016; 30: 372–402.

10.

Xie

Zhu

Jiang

. TopicSketch: real-time bursty topic detection from Twitter. In: Proceedings of the IEEE 13th international conference on data mining (ICDM), 7–10 December 2013, pp. 837–846. New York: IEEE.

11.

Zhou

Chen

. An unsupervised framework of exploring events on Twitter: filtering, extraction and categorization. In: Proceedings of the AAAI conference on artificial intelligence, 2015, pp. 2468–2475, http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9413/9542

12.

McMinn

Jose

. Real-time entity-based event detection for Twitter. In: Mothe

Savoy

Kamps

. (eds) Experimental IR meets multilinguality, multimodality, and interaction: 6th international conference of the CLEF association (CLEF ’15). Berlin: Springer, 2015, pp. 65–77.

13.

De Boom

Van Canneyt

Dhoedt

. Semantics-driven event clustering in Twitter feeds. In: Proceedings of the 5th workshop on making sense of microposts, vol. 1395, 2015, pp. 2–9, CEUR, https://biblio.ugent.be/publication/6887623/file/6887635.pdf

14.

Guille

Favre

. Event detection, tracking, and visualization in Twitter: a mention-anomaly-based approach. Soc Netw Anal Mining2015; 5(1): 1–18.

15.

Jarvis

Patrick

. Clustering using a similarity measure based on shared near neighbors. IEEE T Comput1973; 100(11): 1025–1034.

16.

Marcus

Bernstein

Badar

. TwitInfo: aggregating and visualizing microblogs for event exploration. In: Proceedings of the SIGCHI conference on human factors in computing systems (CHI’11), 2011, pp. 227–236. New York: ACM, http://people.csail.mit.edu/msbernst/papers/twitinfo-chi2011.pdf

17.

Mathioudakis

Koudas

. TwitterMonitor: trend detection over the Twitter stream. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD ’10), 2010, pp. 1155–1158. New York: ACM, http://snap.stanford.edu/class/cs224w-readings/mathioudakis10twitter.pdf

18.

Alvanaki

Sebastian

Ramamritham

. EnBlogue: emergent topic detection in Web 2.0 streams. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD ’11), Athens, 12–16 June 2011, pp. 1271–1274. New York: ACM.

19.

Petkos

Papadopoulos

Aiello

. A soft frequent pattern mining approach for textual topic detection. In: Proceedings of the 4th international conference on web intelligence, mining and semantics (WIMS), Thessaloniki, 2–4 June 2014, pp. 25-1–25-10. New York: ACM.

20.

Gaglio

Lo Re

Morana

. Real-time detection of Twitter social events from the user’s perspective. In: Proceedings of the IEEE international conference on communications (ICC), London, U.K., 8–12 June 2015, pp. 1207–1212. New York: IEEE.

21.

Cataldi

Di Caro

Schifanella

. Emerging topic detection on Twitter based on temporal and social terms evaluation. In: Proceedings of the tenth international workshop on multimedia data mining (MDMKDD ’10), Washington, DC, 25 July 2010, pp. 4-1–4-10. New York: ACM.

22.

Lin

Keogh

Wei

. A novel symbolic representation of time series. Data Min Knowl Disc2007; 15(2): 107–144.

23.

Parikh

Karlapalem

. ET: events from tweets. In: Proceedings of the 22nd international conference on world WideWeb (WWW’13 Companion), 2013, pp. 613–620, New York: ACM, http://www2013.org/companion/p613.pdf

24.

Weng

Lee

. Event detection in Twitter. In: Proceedings of the international AAAI conference on web and social media (ICWSM), vol. 11, 2011, pp. 401–408, http://www.hpl.hp.com/techreports/2011/HPL-2011-98.pdf

25.

Zhang

Chen

. Event detection and popularity prediction in microblogging. Neurocomput2015; 149: 1469–1480.

26.

Salton

Buckley

. Term-weighting approaches in automatic text retrieval. Inform Process Manage1988; 24(5): 513–523.

27.

IETF NetworkWorking Group. Rfc 2988: computing TCPs retransmission timer, 2015, http://tools.ietf.org/html/rfc2988

28.

Deerwester

Dumais

Furnas

. Indexing by latent semantic analysis. J Am Soc Inform Sci1990; 41(6): 391.

29.

Finkel

Grenager

Manning

. Incorporating non-local information into information extraction systems by Gibbs sampling. In: Proceedings of the 43rd annual meeting on association for computational linguistics (ACL), Ann Arbor, MI, 25–30 June 2005, pp. 363–370. New York: ACM.

30.

Page

Brin

Motwani

. The PageRank citation ranking: bringing order to the web. Stanford, CA: Stanford InfoLab, 1999.

31.

Kaiser

. A friendly guide to wavelets. Cambridge, MA: Birkhauser Boston, 1994.

32.

Ritter

Mausam Etzioni

Clark

. Open domain event extraction from Twitter. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’12), Beijing, China, 12–16 August 2012, pp. 1104–1112. New York: ACM.

33.

Ritter

Clark

Mausam

. Named entity recognition in Tweets: an experimental study. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP ’11), Edinburgh, 27–31 July 2011, pp. 1524–1534. Stroudsburg, PA: ACL.

34.

Mani

Wilson

. Robust temporal processing of news. In: Proceedings of the 38th annual meeting on association for computational linguistics, Hong Kong, 3–6 October 2000, pp. 69–76. Stroudsburg, PA: ACL.

35.

Lafferty

McCallum

Pereira

FCN

. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the eighteenth international conference on machine learning (ICML ’01), Williamstown, MA, USA, 28 June–1 July 2001, pp. 282–289. San Francisco, CA: Morgan Kaufmann Publishers.

36.

You

Huang

Cao

. GEAM: a general and event-related aspects model for Twitter event detection. In: Lin

Manolopoulos

Srivastava

(eds) Web information systems engineering (WISE ’13). Nanjing, China, Berlin: Springer, 2013, pp. 319–332.

37.

Xie

Zhu

. CLEar: a real-time online observatory for bursty and viral events. Proc VLDB Endowment2014; 7(13): 1637–1640.

38.

Wen

Tai

. Bursty event detection from microblog: a distributed and incremental approach. Concurren Comput Pract Experience2015; 28: 3115–3130.

39.

Cai

Yang

. What are popular: exploring Twitter features for event detection, tracking and visualization. In: Proceedings of the 23rd annual ACM conference on multimedia conference, Brisbane, QLD, Australia, 26–30 October 2015, pp. 89–98. New York: ACM.

40.

Jia

Shelhamer

Donahue

. Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM international conference on multimedia, Orlando, FL, 3–7 November 2014, pp. 675–678. New York: ACM.

41.

Shepard

. Nonparametric Bayes Pachinko allocation for super-event detection in Twitter. In: Proceedings of the IEEE region 10 conference (TENCON), Bangkok, Thailand, 22–25 October, 2014, p. 1–5. New York: IEEE.

42.

Diao

Jiang

. A unified model for topics, events and users on Twitter. In: Proceedings of the conference on empirical methods in natural language processing, 2013, http://www.aclweb.org/anthology/D13-1192

43.

Blei

McCallum

. Nonparametric Bayes Pachinko allocation. arXiv preprint arXiv:12065270, 2012.

44.

Madani

Boussaid

Zegour

. Real-time trending topics detection and description from Twitter content. Soc Netw Anal Mining2015; 5(1): 1–13.

45.

Teh

Jordan

Beal

. Hierarchical Dirichlet processes. J Am Stat Assoc2012; 101: 1566–1581.

46.

Hasan

Orgun

Schwitter

. TwitterNews+: a framework for real time event detection from the Twitter data stream. In: Proceedings of the 8th international conference on social informatics (SocInfo ’16) Bellevue, WA, USA, Berlin: Springer, 2016, pp. 224–239.

47.

Osborne

Moran

McCreadie

. Real-time detection tracking. monitoring of automatically discovered events in social media. In: Proceedings of the 52nd annual meeting of the association for computational linguistics. Association for Computational Linguistics, 2014, http://homepages.inf.ed.ac.uk/miles/papers/acl14a.pdf

48.

Cano Basave

Liu

. A weakly supervised Bayesian model for violence detection in social media. In: Proceedings of the international joint conference on natural language processing (IJCNLP). Asian Federation of Natural Language Processing, 2013, http://www.aclweb.org/anthology/I13-1013

49.

Becker

Naaman

Gravano

. Beyond trending topics: real-world event identification on Twitter. In: Proceedings of the fifth international AAAI conference on weblogs and social media (ICWSM), 2011, https://mice.cs.columbia.edu/getTechreport.php?techreportID=1462&format=pdf&

50.

Fan

Chang

Hsieh

. A library for large linear classification. J Mach Learn Res2008; 9: 1871–1874.

51.

Phuvipadawat

Murata

. Breaking news detection and tracking in Twitter. In: Proceedings of the IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology. Toronto, Canada, vol. 3 of WI-IAT ’10, 31 August–3 September, Washington, DC: IEEE Computer Society, 2010, pp. 120–123.

52.

Unankard

Sharaf

. Emerging event detection in social networks with location sensitivity. World Wide Web2015; 18(5): 1393–1417.

53.

Duda

Hart

. Pattern classification and scene analysis, vol. 3. New York: Wiley, 1973.

54.

Kaleel

Abhari

. Cluster-discovery of Twitter messages for event detection and trending. J Comput Sci2015; 6: 47–57.

55.

Lee

Chien

. Leveraging microblogging big data with a modified density-based clustering approach for event awareness and topic ranking. J Inf Sci2013; 39: 523–543.

56.

Lee

. Mining spatio-temporal information on microblogging streams using a density-based online clustering method. Expert Syst Appl2012; 39(10): 9623–9641.

57.

Ester

Kriegel

Sander

. Incremental clustering for mining in a data warehousing environment. In: Proceedings of the 24th international conference on very large databases (VLDB ’98), New York, 24–27 August 1998, pp. 323–333. New York: ACM.

58.

Lee

Yang

Chien

. A novel Algorithm for extracting real-time microblogging topics. In: Proceedings of the 23rd international conference on computer applications in industry and engineering, Las Vegas, NV, 8–10 November 2010, pp. 55–60. Winona, MN: ISCA

59.

Lee

Chien

Yang

. An automatic topic ranking approach for event detection on microblogging messages. In: Proceedings of the international conference on systems, man, and cybernetics (SMC ’11), Anchorage, AK, 9–12 October 2011, pp. 1358–1363. New York: IEEE.

60.

Osborne

Petrovic

McCreadie

. Bieber no more: first story detection using Twitter and Wikipedia. In: Proceedings of the SIGIR workshop on time-aware information access, 2012, http://homepages.inf.ed.ac.uk/miles/papers/TwitterWikipedia.pdf

61.

Zhao

Jiang

. Topical keyphrase extraction from Twitter. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies (HLT ’11), Portland, OR, 19–24 June 2011, vol. 1, pp. 379–388. New York: ACM.

62.

Mehrotra

Sanner

Buntine

. Improving LDA topic models for microblogs via tweet pooling and automatic labeling. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval, Dublin, 28 July–1 August 2013, pp. 889–892. New York: ACM.

63.

Derczynski

Ritter

Clark

. Twitter part-of-speech tagging for all: overcoming sparse and noisy data. In: Proceedings of the recent advances in natural language processing (RANLP), 2013, pp. 198–206, http://www.aclweb.org/anthology/R13-1026

64.

Pukelsheim

. The three sigma rule. Am Statist1994; 48(2): 88–91.

65.

Miller

. WordNet: a lexical database for English. Commun ACM1995; 38(11): 39–41.

66.

Tan

Steinbach

Kumar

. Introduction to data mining. 1st ed. Boston, MA: Addison-Wesley Longman Publishing, 2005.

67.

Chierichetti

Kleinberg

Kumar

. Event detection via communication pattern analysis. In: Proceedings of the international AAAI conference on web and social media (ICWSM), 2014, https://www.cs.cornell.edu/home/kleinber/icwsm14-event.pdf

68.

Huang

Peng

Wang

. Topic detection from large scale of microblog stream with high utility pattern clustering. In: Proceedings of the 8th workshop on Ph.D. workshop in information and knowledge management (PIKM ’15), Melbourne, VIC, Australia, 19 October 2015, pp. 3–10. New York: ACM.

69.

Han

Pei

Yin

. Mining frequent patterns without candidate generation. Sigmod Rec2000; 29(2): 1–12.

70.

Newman

. Modularity and community structure in networks. P Natl Acad Sci2006; 103(23): 8577–8582.

71.

Adedoyin-Olowe

Gaber

Dancausa

. A rule dynamics approach to event detection in Twitter with its application to sports and politics. Expert Syst Appl2016; 55: 351–360.

72.

Adedoyin-Olowe

Gaber

Stahl

. TRCM: a methodology for temporal analysis of evolving concepts in Twitter. In: Rutkowski

Korytkowski

Scherer

. (eds) Proceedings of the international conference on artificial intelligence and soft computing. Berlin: Springer, 2013, pp. 135–145.

73.

Dang

Gao

Zhou

. Early detection method for emerging topics based on dynamic Bayesian networks in micro-blogging networks. Expert Syst Appl2016; 57: 285–295.

74.

Murphy

. Dynamic Bayesian networks. Probabilistic Graphical Models (ed M

Jordan

), 2002, p. 7, https://ibug.doc.ic.ac.uk/media/uploads/documents/courses/DBN-PhDthesis-LongTutorail-Murphy.pdf

75.

Ester

Kriegel

Sander

. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the international conference on knowledge discovery and data mining, 1996, pp. 226–231, http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD-96.final.frame.pdf

76.

Fang

Zhang

. Detecting hot topics from Twitter: a multiview approach. J Inf Sci2014; 40(5): 78–593.

77.

Kumar

Daumé

. A co-training approach for multi-view spectral clustering. In: Proceedings of the 28th international conference on machine learning (ICML ’11), 2011, pp. 393–400, https://pdfs.semanticscholar.org/b166/9cc730f70f4e58f39fb39d04a504c148606b.pdf

78.

Jordan

Weiss

. On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems 2. MIT Press, 2002, pp. 849–856, http://ai.stanford.edu/~ang/papers/nips01-spectral.pdf

79.

Kumar

Rai

Daume

. Co-regularized multi-view spectral clustering. In: Advances in neural information processing systems 24. Curran Associates, 2011, pp. 1413–1421, https://www.cse.cuhk.edu.hk/irwin.king/_media/presentations/co-regularized_multi-view_spectral_clustering.pdf

80.

Thapen

Simmie

Hankin

. The early bird catches the term: combining Twitter and news data for event detection and situational awareness. arXiv preprint arXiv:150402335, 2015, https://arxiv.org/abs/1504.02335

81.

Hutwagner

Thompson

Seeman

. The bioterrorism preparedness and response early aberration reporting system (EARS). J Urban Health2003; 80(1): i89–i96.

82.

Robinson

Sparks

Power

. Social media monitoring for health indicators. In: Proceedings of the 21st international congress on modelling and simulation (MODSIM ’15), Modelling and Simulation Society of Australia and New Zealand, 2015, pp. 490–496, https://www.mssanz.org.au/modsim2015/K1/robinson.pdf

83.

Sparks

Adolphson

Phatak

. Multivariate process monitoring using the dynamic biplot. Int Stat Rev1997; 65(3): 325–349.

84.

Sparks

Carter

Graham

. Understanding sources of variation in syndromic surveillance for early warning of natural or intentional disease outbreaks. IIE Trans2010; 42(9): 613–631.

85.

Yin

Lampert

Cameron

. Using social media to enhance emergency situation awareness. IEEE Intell Syst2012; 27(6): 52–59.

86.

Fung

GPC

. Parameter free bursty events detection in text streams. In: Proceedings of the 31st international conference on very large databases VLDB endowment, Trondheim, 30 August–2 September 2005, pp. 181–192. New York: ACM.

87.

Sakaki

Okazaki

Matsuo

. Tweet analysis for real-time event detection and earthquake reporting system development. IEEE T Knowl Data En2013; 25(4): 919–931.

88.

Fox

Hightower

Liao

. Bayesian filtering for location estimation. IEEE Pervas Comput2003; 3: 24–33.

89.

Sayyadi

Hurst

Maykov

. Event detection and tracking in social streams. In: Proceedings of the international AAAI conference on web and social media (ICWSM), 2009, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.7972&rep=rep1&type=pdf

90.

Ozdikis

Senkul

Oguztuzun

. Semantic expansion of hashtags for enhanced event detection in Twitter. In: Proceedings of the 1st international workshop on online social systems, 2012, http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=411048DC5F6E2CF15BAB68365B8E5F5A?doi=10.1.1.455.5135&rep=rep1&type=pdf

91.

Unankard

Sharaf

. Location-based emerging event detection in social networks. In: Ishikawa

Wang

. (eds) Web technologies and applications, vol. 7808. Berlin: Springer, 2013, pp. 280–291.

92.

Aggarwal

Subbian

. Event detection in social streams. In: Proceedings of the SIAM international conference on data mining (SDM), 2012, vol. 12, pp. 624–635. SIAM, http://charuaggarwal.net/sdm2012-camera.pdf

93.

Hofmann

. Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, Berkeley, CA, 15–19 August 1999, pp. 50–57. New York: ACM.

94.

Tai

Zhang

. Online bursty event detection from microblog. In: Proceedings of the IEEE/ACM 7th international conference on utility and cloud computing (UCC), London, 8–11 December 2014, pp. 865–870. New York: IEEE.

95.

Benhardus

Kalita

. Streaming trend detection in Twitter. Int J Web Based Commun2013; 9(1): 122–139.

96.

Gimpel

Schneider

O’Connor

. Part-of-speech tagging for Twitter: annotation, features, and experiments. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies (HLT ’11), Portland, OR, 19–24 June 2011, vol. 2, pp. 42–47. Stroudsburg, PA: ACL.

97.

Chang

Manning

. SUTime: a library for recognizing and normalizing time expressions. In: Proceedings of the language resources and evaluation conference (LREC), 2012, pp. 3735–3740, http://nlp.stanford.edu/pubs/lrec2012-sutime.pdf

98.

Porter

. An algorithm for suffix stripping. Program1980; 14(3): 130–137.

99.

Miller

Fellbaum

. WordNet: An electronic lexical database. Cambridge, MA: MIT Press, 1998.

100.

Sharifi

Hutton

Kalita

. Summarizing microblogs automatically. In: Proceedings of the annual conference of the North American chapter of the association for computational linguistics: human language technologies (HLT ’10), Los Angeles, CA, 2–4 June 2010, pp. 685–688. Stroudsburg, PA: ACL.

101.

Mihalcea

Tarau

. TextRank: bringing order into texts. In: Proceedings of the conference on empirical methods in natural language processing, 2004, ACL, https://digital.library.unt.edu/ark:/67531/metadc30962/m2/1/high_res_d/Mihalcea-2004-TextRank-Bringing_Order_into_Texts.pdf

102.

Blei

Jordan

. Latent Dirichlet allocation. J Mach Learn Res2003; 3: 993–1022.

103.

Aiello

Petkos

Martin

. Sensing trending topics in Twitter. IEEE T Multimedia2013; 15(6): 1268–1282.

104.

Lehmann

Gonçalves

Ramasco

. Dynamical classes of collective attention in Twitter. In: Proceedings of the international conference on World Wide Web, Lyon, 16–20 April 2012, pp. 251–260. New York: ACM.

105.

Yang

Leskovec

. Patterns of temporal variation in online media. In: Proceedings of the 4th ACM international conference on web search and data mining, Hong Kong, China, 9–12 February 2011, pp. 177–186. New York: ACM.

106.

Castillo

Mendoza

Poblete

. Information credibility on Twitter. In: Proceedings of the 20th international conference on World Wide Web (WWW ’11), Hyderabad, India, 28 March–1 April 2011, pp. 675–684. New York: ACM.

107.

Gupta

Kumaraguru

. Credibility ranking of Tweets during high impact events. In: Proceedings of the 1st workshop on privacy and security in online social media (PSOSM ’12), Lyon, 17 April 2012, pp. 2-2–2-8. New York: ACM.

108.

McMinn

Moshfeghi

Jose

. Building a large-scale corpus for evaluating event detection on Twitter. In: Proceedings of the 22nd ACM international conference on information and knowledge management (CIKM ’13), San Francisco, CA, 27 October–1 November 2013, pp. 409–418, New York: ACM.

109.

Chapelle

Schölkopf

Zien

. Semi-supervised learning (Adaptive computation and machine learning series). Cambridge, MA: The MIT Press, 2006.

110.

Pan

Yang

. A survey on transfer learning. IEEE T Knowl Data En2010; 22(10): 1345–1359.