Abstract
Transit providers have used social media (e.g., Twitter) as a powerful platform to shape public perception and provide essential information, especially during times of disruption and disaster. This work examines how transit agencies used Twitter during the COVID-19 pandemic to communicate with riders and how the content and general activity influence rider interaction and Twitter handle popularity. We analyzed 654,345 tweets generated by the top 40 transit agencies in the US, based on Vehicles Operated in Annual Maximum Service (VOM), from January 2020 to August 2021. We developed an analysis framework, using advanced machine learning and natural language processing models, to understand how agencies’ tweeting patterns are associated with rider interaction outcomes during the pandemic. From the transit agency perspective, we find smaller agencies tend to generate a higher percentage of COVID-related tweets and some agencies are more repetitive than their peers. Six topics (i.e., face covering, essential service appreciation, free resources, social distancing, cleaning, and service updates) were identified in the COVID-related tweets. From the followers’ interaction perspective, most agencies gained followers after the start of the pandemic (i.e., March 2020). The percentage of follower gains is positively correlated with the percentage of COVID-related tweets, tweets replying to followers, and tweets using outlinks. The average like counts per COVID-related tweet is positively correlated with the percentage of COVID-related tweets and negatively correlated with the percentage of tweets discussing social distancing and agency repetitiveness. This work can inform transportation planners and transit agencies on how to use Twitter to effectively communicate with riders to improve public perception of health and safety as it relates to transit ridership during delays and long-term disruptions such as those created by the COVID-19 public health crisis.
Introduction
COVID-19 has disrupted the daily way of life for everyone working and living in the United States, and globally. Transit agencies have suffered as ridership and farebox revenue declined (EBP, 2021). During this time, many transit riders have faced barriers to their trip planning, including reductions in transit service, restrictions on transit capacity, and general concerns about health and safety. Additionally, as states and communities introduced stay-at-home orders, transportation needs shifted dramatically for large portions of the population (Liu et al., 2020).
Prior research suggests that while social media can provide a larger, immediate platform for communication, it can also increase the spread of misinformation, promote negative perceptions and attitudes towards public services, and change the behaviors of individuals (Evans-Cowley and Griffin, 2012; Gordon et al., 2013; Schweitzer, 2014). However, during times of crisis, disaster, and disruption, “just-in-time” communication is critically important to disseminate information and these types of platforms can prove to be part of an essential communication strategy (Chan and Schofer, 2014). Transit agencies rely on a multitude of communication platforms, including social media sites, to maintain active lines of communication with riders and other agencies. Twitter remains the most popular social media platform used by transit agencies (Liu et al., 2016).
Existing works on transit agencies’ use of social media found that Twitter has been widely used by transit agencies for both daily and disaster communication purposes (Bregman, 2012a, 2012b; Liu et al., 2016; Raymond and O’Hara, 2014) and that communication patterns are correlated with user interaction outcomes (Hosseini et al., 2018). More recent studies used advanced machine learning and text mining techniques to understand transit social media use patterns and rider satisfaction (Hosseini et al., 2018; Schweitzer, 2014). These studies consistently found that tweeting patterns (e.g., the number of tweets and communication strategies, such as replies to riders, content, and network size) are correlated with follower interaction outcomes (e.g., like counts and sentiments). Multiple studies suggest social media has the advantage of disseminating essential information in real time, which helps reduce public uncertainties and concerns in a time of crisis (Diaz et al., 2021; Harazeen, 2011; Pender et al., 2014).
This work advances the existing knowledge by examining the use of Twitter as a tool for communication during the COVID-19 pandemic, through developing and applying a transit tweet analysis framework to systematically characterize transit agencies’ tweeting patterns and rider interaction outcomes. The framework further explores how the two components are correlated with each other, using advanced machine learning and natural language processing models for the top 40 agencies in the U.S. Specifically, we used our framework to address three research questions: (1) How do general tweeting activity patterns and content vary across transit agencies in the U.S. throughout the pandemic? (2) How do rider/follower interactions contribute to trends in engagement and popularity of transit agency Twitter handles? and (3) What tweeting communication patterns used by transit agencies are associated with the outcomes measured in RQ2? Using web scraping, machine learning, and natural language processing also provided an opportunity to do a cross-comparative and longitudinal analysis at a size and scope not previously attempted for transit Twitter handles in the U.S.
Literature review
Transit agencies’ use of social media
A growing number of public transportation agencies are already using a variety of social media platforms, such as Facebook, blogging, Twitter, media and document sharing sites, social curation, geolocation, and crowdsourcing to inform and engage riders (Bregman, 2012a; Liu et al., 2016). Based on a historical survey of 43 transport agencies, conducted by The Urban Transportation Monitor in 2011, 54% of these agencies used Facebook, 51% used Twitter, and 37% used YouTube (The Urban Transportation Monitor, 2011). Bregman’s 2012 survey of 35 transit agencies revealed that 91% of the agencies reported Twitter as one of their top three social media platforms and 77% of the agencies used Twitter to disseminate agency announcements and service updates (Bregman, 2012a). A more recent survey of 27 transit agencies across the U.S. reported that 100% of agencies used Twitter and over 40% of these agencies had full-time social media staff (Liu et al., 2016). There have been no peer-reviewed scholarly works published with updated estimates on transit agency use of Twitter since 2016. Given the age of these surveys, we assembled the dataset for this project by determining which of the 40 largest transit providers in the United States (ranked by VOM) had an active Twitter handle. We found that all of the top 40 transit agencies were using Twitter to communicate with riders. This finding supports those of Bregman (2012a) and Liu et al. (2016). However, the frequency of activity should be expected to differ across agencies.
Bregman (2012b) also examined the social media use of six U.S. transit agencies and identified five categories of social media use by transit agencies, including (1) timely updates (e.g., agencies may use social media to share real-time service information and advisories with riders); (2) public information (e.g., use of social media to provide the public with information about services, project planning, and fares); (3) citizen engagement (e.g., leveraging the interactive ability of social media to connect with passengers); (4) employee recognition (e.g., use of social media to recognize current workers and recruit new employees); and (5) entertainment (e.g., informal means of entertainment, such as songs, videos, and contests). Additionally, Raymond and O’Hara (2014) synthesized social media usage by transit agencies into the four “E”s of social marketing: to entice participation, to exchange information, to engage, and to experience. The top ten transit agencies use a variety of marketing techniques and strategies to communicate with riders. Many of them contract with private social media analytics companies (e.g., HootSuite, Sprout Social, and Meltwater) to measure the impact of tweets, Facebook posts, and Instagram content and to determine the return on investment based on a set of marketing-driven indicators (click-through rates, views, etc.). Transit agencies now consider social media platforms as essential tools within broader marketing strategies (DART Daily, 2021; Doyle, 2021).
A myriad of studies have mined social media and applied sentiment analysis to determine transit riders’ satisfaction and opinions. Collins et al. (2013) first applied sentiment analysis to tweets generated by Chicago Transit Authority and found that a large volume of negative tweets is associated with service disruptions. Similar analyses have since been reported elsewhere (Luong and Houston, 2015), including in the UK (Transport Focus, 2015) and Colombia (Casas and Delmelle, 2017). Most existing empirical studies have applied in-depth analysis to one city and/or transit agency, and few studies have compared transit agencies. One study compared the communication strategies of six transit agencies with other government agencies and tweet accounts of “celebrities” (Schweitzer, 2014) and found that agencies that directly respond to questions and comments tend to receive more positive statements compared to those who mostly generate “one-way” announcements. Two more recent studies examined Twitter communication patterns and their impacts on online network formulation (Hosseini et al., 2018) and riders’ sentiments (El-Diraby et al., 2019) using data collected from three transit agencies in Canada. None of these studies has tried to systematically understand how various communication patterns on Twitter are correlated with riders’ interactions and/or satisfaction.
Transit agency use of social media during disruption
Transit riders have three informational needs when faced with unplanned transit disruptions: an accurate prediction of the length of delay, the reason for the delay, and alternative travel options (Cottrill et al., 2017; Transport Focus, 2011; Yates and Paquette, 2011). Receiving this information in real time allows those directly impacted to make informed decisions about alternative travel (Pender et al., 2014), reduces uncertainty during disruptions, and avoids the frustration that a lack of information causes transit users (Cheng, 2010; Harazeen, 2011). Meanwhile, a review of prior practices also shows that public disaster communication on social media may be plagued by misinformation and miscommunication (Pender et al., 2014). Much of the research into transit agency use of social media during a crisis has focused on communication during natural disasters, and, to a limited extent, during times of delay and long-term disruptions. Chan and Schofer (2014) examined the use of Twitter by New York transit agencies, including Metropolitan Transit Authority, NYC (MTA NYC), New Jersey Transit, and PATH, during Hurricane Sandy. The study shows two categories of disaster communication strategies: general announcements regarding system status, plans, and miscellaneous information, and communication with individual riders. Additionally, the study found that MTA NYC did not directly respond to individuals on Twitter but instead directed them to alternative response teams. All agencies generated significantly more tweets during the hurricane and gained more followers during the crisis. More recently, Diaz et al. (2021) manually examined 1401 tweets generated by 25 Canadian transit agencies during the COVID-19 pandemic and classified the tweet contents into social distancing, essential workers, face covering, and encouragement to stay home.
As very little research exists that examines the use of Twitter by transit agencies or alternatively the use of social media by public transit to communicate travel delays, disruptions, or other emergency conditions (Pender et al., 2014), a large-scale comparison and systematic study to close the aforementioned research gaps is needed. This study takes a broader approach, by assessing findings generated from the Twitter activity of 40 transit agencies across the U.S. to better understand how communication on social media platforms might affect rider interaction outcomes during the pandemic. We compare how different transit agencies used Twitter using the latest machine learning (ML) and natural language processing (NLP) algorithms to generate lessons that can be applied to transit agency communication during delays and long-term disruptions broadly. Additionally, we link disruption communication patterns with user interaction outcomes (e.g., the number of follower gains and retweet and like counts), during the pandemic.
Data and methodology
Transit Twitter data scraping
We collected tweet handles for the 40 largest transit agencies (i.e., size of agency defined by vehicles operated in annual maximum service [VOMs]) in the U.S. based on the National Transit Database (NTD). The number of tweet handles differs across agencies (see Figure S1). For example, Southeastern Pennsylvania Transportation Authority (SEPTA) has 30 tweet handles to manage each transit line they operate, as well as a channel devoted to social engagement. While most small agencies manage only one handle, larger agencies have two to three handles to cover topics related to operation/services, general announcements, and public engagement. In addition to official handles, we also included grass-root transit handles, such as “MARTA army.” We collected and analyzed tweets associated with these tweet handles, using Python 3.9 and the Tweepy (Roesslein, 2020) and snscrape (JustAnotherArchivist, 2021) packages. We collected tweets posted between January 2020 and August 2021, resulting in 654,345 scraped tweets.
Transit Twitter data analysis framework
We developed a transit Twitter data analysis framework that captures social media communication patterns from two perspectives: the transit agency and the rider/follower (see Figure 1). From the transit agency’s perspective, we captured tweet account general activities, such as the number of tweets, replies to followers, and the use of outlinks/URL in tweets, based on prior transit social media use studies (Riquelme and González-Cantergiani, 2016; Pal and Counts, 2011; Schweitzer, 2014). ML and NLP models were also used to capture the content of tweets, based on emerging transit social media analytics literature (Haghighi et al., 2018; Riquelme and González-Cantergiani, 2016). Specifically, we developed supervised text classification models to extract COVID-related tweets. We then applied the Latent Dirichlet allocation (LDA) topic model to explore discussion themes among COVID-related tweets and used Cosine distance measure to examine the repetitiveness of tweets.
Figure 1. Transit agency Twitter analysis framework (*correlation analysis only applied to COVID-related tweets in this study).
Metrics, such as the number of followers, changes in followers since the start of the pandemic, as well as average number of likes, retweets, and reply counts, are used to capture the influence and popularity of tweets. A cursory review of transit agency marketing reports obtained from two of top ten largest transit providers in the U.S. suggests that transit agencies also use these indicators to understand return on investment and to examine how social media communication strategies can cultivate customer loyalty, engagement, and trust. All evaluation metrics are tracked over time starting January 2020. Finally, using linear regression models, we examined how a transit agency’s general tweeting activities and tweet content are correlated with user interactions, especially the percentage of change in the number of followers since the onset of the pandemic and average number of like counts per COVID-related tweets.
We classified the COVID-19 pandemic into six periods and aggregated the analysis results by period to understand if the tweeting patterns varied throughout the pandemic. The specific categories include the beginning of the pandemic (Mar.–Apr. 2020), lockdown (May–Jun. 2020), reopening (Jul.–Aug. 2020), wave I (the surge in COVID case in Dec. 2020–Jan. 2021), wave II (the surge in COVID case due to Delta variant, Jul.–Aug.2021), and off-peak time periods (any month between Mar.2020 and Aug.2021 that are not included in the prior categories).
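The period definitions above can be written as a small lookup. The following is a minimal sketch, not the authors' code: the function name `pandemic_period` and the "pre-pandemic" label for January and February 2020 are assumptions introduced here for illustration.

```python
from datetime import date

# (year, month) pairs for each named period, taken from the text above.
PERIODS = {
    "beginning": [(2020, 3), (2020, 4)],
    "lockdown": [(2020, 5), (2020, 6)],
    "reopening": [(2020, 7), (2020, 8)],
    "wave I": [(2020, 12), (2021, 1)],
    "wave II": [(2021, 7), (2021, 8)],
}


def pandemic_period(d: date) -> str:
    """Map a tweet date to one of the six pandemic periods."""
    ym = (d.year, d.month)
    for name, months in PERIODS.items():
        if ym in months:
            return name
    # Any other month between Mar. 2020 and Aug. 2021 is off-peak.
    if (2020, 3) <= ym <= (2021, 8):
        return "off-peak"
    # Assumption: months before March 2020 get their own label.
    return "pre-pandemic"
```

Tweet-level results can then be aggregated by the returned label, for example by counting COVID-related tweets per period.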
Tweet content analysis
Text vectorization
We preprocessed the text data to prepare it for text classification and topic models, following the procedures used in prior topic modeling studies (Aman et al., 2021; Gupta et al., 2021). First, we tokenized the tweets, converted all tokens into lowercase, and removed stop words, numbers, punctuation, emoticons, hashtags, and URLs. We made an exception for conversational words that are often treated as stop words, such as “hi,” “please,” and “thank,” and retained them because these words can help distinguish conversation tweets from general announcements. We also lemmatized the tokens by part-of-speech tags, namely nouns, adjectives, verbs, and adverbs, and created bigrams and trigrams, using the Python NLTK package (Bird et al., 2009).
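The cleaning steps can be illustrated with a standard-library sketch. It is not the NLTK pipeline used in the study: lemmatization is omitted, and the stop-word list, the regular expressions, and the names `preprocess` and `ngrams` are hypothetical choices made for this example.

```python
import re

# Illustrative stop-word list (an assumption, far shorter than NLTK's).
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "on", "for", "in"}
# Conversational words retained even if a stop-word list contains them.
KEEP_WORDS = {"hi", "please", "thank"}

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(tweet: str) -> list[str]:
    """Lowercase, strip URLs and hashtags, tokenize, and drop stop words."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    # Keeping only runs of letters drops numbers, punctuation, and emoticons.
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t in KEEP_WORDS or t not in STOP_WORDS]


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Join each run of n consecutive tokens into one n-gram feature."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Calling `ngrams(tokens, 2)` and `ngrams(tokens, 3)` on the cleaned tokens yields the bigram and trigram features mentioned above.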
We vectorized tweets from the preprocessed text data using the Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BoW) extraction methods from the Scikit-learn 1.0 package (Pedregosa et al., 2011). We used the TF-IDF extraction method to prepare data for COVID-related tweet classification because the approach is widely used in text classification applications (Qi et al., 2020). We vectorized the tweets using the BoW approach to prepare the data for the LDA model, which does not require the weighting of the TF-IDF text representation (Blei et al., 2003). Mathematical formulations for both approaches can be found in the Supp M1 section.
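To make the difference between the two representations concrete, the sketch below builds both from tokenized tweets using only the standard library. It uses the textbook weighting tf × ln(N/df), which is an assumption: the authors' exact formulation is in Supp M1, and Scikit-learn's `TfidfVectorizer` by default applies a smoothed IDF and L2 normalization, so its values would differ. The function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw-count vector per document."""
    vocab = sorted({token for doc in docs for token in doc})
    counts = [Counter(doc) for doc in docs]
    return vocab, [[c[token] for token in vocab] for c in counts]


def tf_idf(docs):
    """Weight raw counts by ln(N / df), the textbook inverse document frequency."""
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df_j) for df_j in df]
    return vocab, [[row[j] * idf[j] for j in range(len(vocab))] for row in bow]
```

A term that appears in every tweet receives zero TF-IDF weight under this definition, while its BoW count is unchanged.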
COVID-related tweets classification
We first manually labeled 3000 sampled tweets to create training and testing sets for text classification modeling. To create a more balanced training set (i.e., the number of COVID-related tweets roughly equal to the number of non-COVID-related tweets), we labeled 1500 randomly sampled tweets that contain keywords, namely, “covid,” “pandemic,” “sanitize,” “mask,” and “social distance” and 1500 tweets that did not contain these keywords. We then trained a variety of ML classifiers, including logistic regression, support vector classifier, random forest, and gradient boosting classifiers. We fine-tuned model hyperparameters (see Table S1) using a randomly assigned training set (80% of labeled data). We then selected the classifier that performed the best on the testing set (20% of labeled data) as the final classifier to identify COVID-related tweets. The ML models were implemented using Scikit-learn 1.0 package and hyperparameters were heuristically optimized using the scikit-optimize package (Head et al., 2018).
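The sampling and splitting steps can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: keyword matching is done by case-insensitive substring search, and the helper names `has_keyword`, `balanced_sample`, and `train_test_split` (a plain-Python stand-in, not the Scikit-learn function of the same name) are hypothetical.

```python
import random

# Keywords listed in the text above.
KEYWORDS = ("covid", "pandemic", "sanitize", "mask", "social distance")


def has_keyword(text):
    """Assumption: a tweet matches if any keyword occurs as a substring."""
    lowered = text.lower()
    return any(k in lowered for k in KEYWORDS)


def balanced_sample(tweets, per_group, seed=0):
    """Draw equal numbers of keyword and non-keyword tweets for labeling."""
    rng = random.Random(seed)
    with_kw = [t for t in tweets if has_keyword(t)]
    without_kw = [t for t in tweets if not has_keyword(t)]
    return rng.sample(with_kw, per_group) + rng.sample(without_kw, per_group)


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle and split into (train, test), with 20% held out by default."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

With `per_group=1500`, the first helper reproduces the 3000-tweet labeling pool described above; the manual labels, not the keyword match, remain the ground truth.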
Topic modeling
We used Latent Dirichlet allocation (LDA) model to explore topics discussed in the COVID-related tweets. LDA is a generative probabilistic approach that has been widely used to discover topics from a collection of documents (Aman et al., 2021; Sun and Yin, 2017). LDA model details can be found in the Supp M2 section for reference.
Repetitiveness analysis
We manually reviewed selected tweets from different transit agencies and found that some agencies include similar or the same set of words in the announcements. Therefore, we also captured the transit agencies’ tweeting repetitiveness using the text cosine distance measure in the Scikit-learn 1.0 package, which is the L2-normalized dot product of text vectors (Schütze et al., 2008). In this study, we used the cosine distance measure because we want to capture if words have been repeatedly used rather than the sequence of the words, which are better captured using other distance measures, such as Euclidean and Manhattan distance (Wang and Dong, 2020). The mathematical formulation of cosine distance measure can be found in the Supp M3 section.
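As an illustration of the measure, the sketch below computes the L2-normalized dot product of two token-count vectors, which is the cosine similarity (cosine distance is one minus this value). Averaging over all pairs of an agency's tweets to obtain a single repetitiveness score is an assumption made here; the authors' exact formulation is in Supp M3. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(tokens_a, tokens_b):
    """L2-normalized dot product of two token-count vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def repetitiveness(tweets):
    """Assumed score: mean pairwise cosine similarity across tokenized tweets."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```

Because the vectors are word counts, two tweets containing the same words in a different order score as identical, which matches the stated goal of capturing repeated wording rather than word sequence.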
Results
COVID-related tweets classification results
The ML experiments show that the random forest classifier provides the best classification results among all examined classifiers, with an overall weighted accuracy of 93% on the test set. The classifier yields a recall of 0.91 (i.e., the probability that a COVID-related tweet can be correctly labeled as COVID-related by the classifier). The classifier precision is 0.95 (i.e., if the model predicts that a tweet is COVID-related then there is 95% chance the prediction is correct). We then applied the trained random forest classifier to the entire tweet database to identify COVID-related tweets. 24,494 tweets (i.e., 6%) were classified as COVID-related tweets.
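For reference, recall and precision as defined above can be computed from labeled and predicted classes as in this minimal sketch. The function name and the encoding of COVID-related tweets as 1 are assumptions for illustration.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class, encoded as 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Precision answers "of the tweets flagged as COVID-related, how many truly are," while recall answers "of the truly COVID-related tweets, how many were flagged."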
General activities
General activities include the number of tweets, replies to riders, and use of URL (i.e., outlinks). Figure 2 shows the number of tweets made by the top 40 transit agencies during the pandemic. The results indicate that larger agencies generate more COVID-related tweets compared with smaller agencies.
Figure 2. Number of COVID-related tweets (left) and % of COVID-related tweets (right), aggregated by month (*the agencies are ranked by VOMs in descending order).
The reply tweets refer to tweets generated by transit agencies in response to tweets that are not created by transit agency handles. Typically, a rider will create a tweet to @ a transit agency handle when they have questions and/or issues to report. Although the average percentage of reply tweets is similar for COVID-related and non-COVID-related tweets (i.e., 40.3% vs. 38.2%), our findings revealed the use of two different communication strategies during the pandemic: pandemic-related announcements and agency-rider conversations. Some agencies, such as Metropolitan Transportation Authority New York City (MTA NYC), Washington Metropolitan Area Transit Authority (WMATA), Chicago Transit Authority (CTA), Regional Transportation Commission of Southern Nevada (RTCSN), and San Francisco Bay Area Rapid Transit (SFBART), were more likely to be engaged in rider-initiated agency-rider conversations during the pandemic (i.e., a higher percentage of reply tweets in COVID-related than non-COVID-related tweets, see Figure S2). Other agencies (e.g., NJ Transit, King County Metro [KCM], Denver RTD, and Utah Transit) preferred to make pandemic-related announcements (i.e., agency-initiated one-way communication) rather than engage in agency-rider interactions. Most transit agencies (75%) were more likely to use outlinks in COVID-related tweets to introduce riders to new pandemic resources and guidance (see Figure S3). Some agencies (e.g., KCM, Dallas Area Rapid Transit [DART], and Utah Transit) used more outlinks in COVID-related tweets.
Summary of Tweeting patterns by period.
Content analysis
Topic model results, relevant words, and example Tweets.
The temporal distribution of tweet topics was quite consistent, with more tweeting happening towards the beginning of the 20-month data collection period, peaking in March/April 2020, and less tweeting as the pandemic dragged on, as shown in the stream graph from Figure 3 (left). Most of the tweets generated by agencies were replies to riders’ concerns regarding the enforcement of face covering (i.e., dark blue), which includes drivers and passengers not wearing masks and questions regarding who should be responsible for enforcing face coverings on board transit. This topic was especially popular during the reopening period (i.e., July–August 2020). Transit agencies also actively interacted with riders by replying to their concerns regarding social distancing (i.e., yellow), which includes changes in boarding and loading policies. Service scheduling changes (i.e., dark orange) and announcements regarding fleet cleaning and sanitizing (i.e., orange) are also frequently mentioned throughout the pandemic. Free resources (i.e., light green) and essential workers and services appreciation (i.e., dark green) are the least discussed topics in COVID-related tweets. Additionally, the number of COVID-related tweets diminishes in general over time.
Figure 3. (Left) Stream graph of topic distribution over time during the pandemic. Tweet counts per topic are smoothed using a Gaussian distribution for visualization purposes; see Supp M4 for details. (Right) Percentage of tweets by topic by period.
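The Gaussian smoothing mentioned in the Figure 3 caption can be illustrated with a simple kernel-weighted moving average. This is a sketch under assumptions, not the procedure in Supp M4: the bandwidth `sigma`, the three-sigma window, and the renormalization at the series edges are choices made here for illustration.

```python
import math


def gaussian_smooth(counts, sigma=1.0):
    """Smooth a series of per-period tweet counts with a Gaussian kernel."""
    radius = int(3 * sigma)  # assumed window: three standard deviations
    smoothed = []
    for i in range(len(counts)):
        weighted_sum = 0.0
        weight_total = 0.0
        lo = max(0, i - radius)
        hi = min(len(counts), i + radius + 1)
        for j in range(lo, hi):
            w = math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
            weighted_sum += w * counts[j]
            weight_total += w
        # Dividing by the weights actually used keeps edge values unbiased.
        smoothed.append(weighted_sum / weight_total)
    return smoothed
```

A larger `sigma` produces a smoother stream graph at the cost of flattening short-lived peaks such as the one in March/April 2020.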
The percentage of tweets discussing face covering and service updates changes across periods, as shown in Figure 3 (right). The percentage of face covering-related tweets peaks during the reopening and wave II when there were increases in ridership. The percentage of service updates-related tweets is the highest in the first four months and during wave I, due to operational changes during these periods. The percentage of other topics remained stable over time.
The distributions of topics vary across transit agencies with different levels of VOMs. We divided agencies by VOMs into four groups (i.e., Top10, Top11-20, Top21-30, and 31–40) and calculated the percentage of tweets belonging to each topic by group (see Figure S5). Small to medium-size agencies tend to generate a higher percentage of tweets on Topic 1 (i.e., communicate with riders regarding face covering enforcement), especially during the second wave of the pandemic (July–August 2021), and a smaller portion of tweets on Topic 2 (i.e., announcements regarding service and workers appreciation). Large agencies with top ten VOMs tend to generate more service and schedule-related updates during the pandemic. Such a result is expected because larger agencies have more lines and services to manage.
Some agencies, such as Maryland Transit Administration (MTA Maryland), KCM, Metropolitan Transit Authority of Harris County (METRO Houston), and Metropolitan Atlanta Rapid Transit Authority (MARTA) are more repetitive than their peers, with higher average repetitiveness scores (see Figure S6). MTA Maryland and KCM re-emphasize COVID-related alerts at the beginning of the day to remind riders to only travel for essential trips, emphasize requirements of face coverings while riding transit, and CDC tips about preventing the spread of COVID virus. METRO Houston added safety tips to all tweets (e.g., “SAFETY TIP: avoid touching your face with unwashed hands”).
Rider interactions
We tracked the number of followers by tweet handle before and during the pandemic (i.e., 2020/01–2021/08) from the Social Blade website. The website only tracks historical followers by month for tweet handles with more than 200 followers. Therefore, we were only able to obtain historical follower counts for 47 of the 89 handles we collected for the top 40 transit agencies. Most agencies ended up with more followers by August 2021 (see Figure S7). Smaller agencies (i.e., yellow to green lines in Figure S7) have a much larger percent gain compared with larger peers. Additionally, we find that the average numbers of likes, retweets, and replies (i.e., followers replying to a tweet created by a transit agency) are higher for COVID-related tweets compared with non-COVID-related tweets (see Figures S8–S10).
Multiple linear regression results. a
*p < .1; **p < .05; ***p < .01
a The VIF values for the included variables are [1.26, 1.29, 1.25, 1.31] and [1.07, 1.34, 1.50, 1.18], respectively.
b Some agencies lost followers during the examined period (03/2020-08/2021). The largest reduction is 3%. We added 0.03 to observations before log transformation.
c Three tweet handle observations are removed as outliers, based on Cook’s distance chart, namely, Regional Public Transportation Authority’s ‘DARTAlerts’ and Southeastern Pennsylvania Transportation Authority’s ‘SEPTA_CYN’ and ‘SEPTA_NHSL’.
d Four tweet handle observations are removed as outliers, based on Cook’s distance chart, namely, Central Florida Regional Transportation Authority’s ‘SFBARTalert’, Northeast Illinois Regional Commuter Railroad Corporation’s ‘dartmedia’, Capital Metropolitan Transportation Authority’s ‘oatstransit’, and San Francisco Bay Area Rapid Transit District’s ‘PierceTransit’.
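The shifted log transformation described in footnote b can be sketched as below. The sketch assumes that follower change is expressed as a fraction (so a 3% loss is -0.03), which is inferred from the 0.03 shift rather than stated in the source; the function names are hypothetical.

```python
import math


def fractional_change(followers_before, followers_after):
    """Follower change as a fraction of the pre-pandemic follower count."""
    return (followers_after - followers_before) / followers_before


def shifted_log(change, shift=0.03):
    """Natural log of (change + shift); the shift keeps small losses positive."""
    shifted = change + shift
    if shifted <= 0:
        # A loss of 3% or more would still be undefined under this shift.
        raise ValueError("change + shift must be positive for the log transform")
    return math.log(shifted)
```

The shift is needed because the logarithm is undefined for zero or negative values, and handles that lost followers would otherwise have to be dropped from the regression.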
The linear regression model results show that the percent change in the number of followers is positively correlated with the percentage of COVID-related tweets that agencies generate; a one percent increase in COVID-related tweets is associated with a 0.057% increase in followers gained. The percentage of tweets replying to followers and the use of outlinks are also positively correlated with follower gains during the pandemic. A one percent increase in replies to followers is associated with a 0.016% increase in the percentage of followers gained. A one percent increase in the use of outlinks is associated with a 0.024% increase in followers gained. A one percent increase in the number of pre-pandemic followers is associated with a 0.02% decrease in followers gained.
The average like count per COVID-related tweet is positively correlated with the percentage of COVID-related tweets and the number of base followers. Specifically, a one percent increase in COVID-related tweets is associated with a 0.96% increase in the average number of likes per tweet. A one percent increase in tweets discussing social distancing is associated with a 0.19% decrease in the average like count per tweet, which may indicate that riders are dissatisfied with social distancing enforcement conditions on board. A one percent increase in average repetitiveness is associated with a 1.05% reduction in average like counts.
Discussion
The general activity analysis results show that transit agencies of different sizes have varied communication strategies. Larger agencies generate more COVID-related tweets compared with smaller agencies. This may be because larger agencies have more resources allocated toward social media management, such as full-time staff (Liu et al., 2016) and a larger audience to speak to. Smaller agencies, on the other hand, generate a higher percentage of COVID-related tweets. This may be because smaller agencies have fewer resources and, therefore, rely more on Twitter (a free platform) to disseminate COVID-related information. Additionally, we found that some agencies are more likely to be engaged in riders/followers initiated conversations, while other agencies rely more on one-way announcements, which is consistent with prior findings (Chan and Schofer, 2014).
We found that the prevailing topics discussed in COVID-related communication on Twitter were face coverings, service updates, social distancing and cleaning. The popularity of these topics varied by pandemic periods. “Face covering” and “social distancing” were most frequent during reopening periods, while “service updates” prevailed when there was a change in the pandemic situation (either the lockdown or reopening periods). These results align with findings from a prior study that applied manual labeling techniques to a small sample of tweets collected from Canadian transit agencies (Diaz et al., 2021). The topic distributions also varied by transit agency size, as the top ten agencies focus more on service updates and essential worker appreciations.
The user interaction analysis results showed that smaller agencies gained a higher percentage of followers during the pandemic. This may be because a smaller proportion of their riders were followers to start with and because they relied more on Twitter to disseminate COVID-related information, given budget constraints on investing in physical signage and digital development (e.g., web and app updates). Additionally, our results show that pandemic communication on Twitter is positively perceived by users, given that COVID-related tweets received a higher number of likes per tweet compared to non-COVID-related tweets. The exceptions are some smaller agencies, whose average like counts for non-COVID-related tweets seem to have been influenced by outliers (e.g., an agency retweeting a tweet about a football event that attracted a lot of attention).
Our results show that communication patterns on Twitter are associated with different user interaction outcomes. After controlling for the size of the agency (i.e., the number of followers before the pandemic), we found that proactive communication patterns (e.g., a higher percentage of COVID-related tweets, forwarding pandemic information from other sources, and actively replying to riders) are positively associated with the percentage of followers gained, indicating that developing a history of active response and two-way communication on Twitter can help build a larger follower base in the long run. Meanwhile, repetitiveness is negatively perceived, suggesting that agencies should be cautious about relying on such a strategy for information dissemination. Finally, the change in followers is related to the number of followers that an agency starts with. It appears, for example, that larger agencies had already devoted resources to cultivating Twitter followers and to developing transit apps that riders otherwise rely on for information, so there was less potential to gain new followers during the pandemic. Smaller agencies gained proportionally more followers, likely because their smaller pre-pandemic follower base left more room for Twitter adoption among riders. The models, however, are limited, as we did not control for agency-specific social media campaigns or online marketing investments, which merit future examination.
Though Twitter allowed for the exploration of different communication strategies and outcomes of transit agencies during the COVID-19 pandemic, we acknowledge that the use and analysis of Twitter data introduces some bias. First, perceptions and opinions expressed on Twitter are by no means exhaustive. In this case, the communication strategies and outcomes captured are limited to the top 40 transit agencies. The arbitrary selection of this threshold may introduce systematic bias into the results (Cihon and Yasseri, 2016). Additionally, Twitter users tend to be younger, wealthier, more educated, and more liberal than the general public and users of other social media (e.g., Facebook) (Wojcik and Hughes, 2019). We recognize that during the COVID-19 pandemic, when those who continued to ride transit were likely transportation-disadvantaged or captive riders, conversations between agencies and users on Twitter were likely representative of only a small portion of riders. We also note that, much as 80% of tweets are generated by 10% of Twitter users (Wojcik and Hughes, 2019), the majority of tweets analyzed were generated by a small subset of agencies, especially the top ten agencies.
Finally, in social media research, the question of how much data is enough continues to elude researchers. In this study, we accessed all tweets generated by the Twitter handles managed by the top 40 transit agencies. However, we did not analyze the tweets generated by other Twitter handles that interacted with the transit agencies. For example, if a rider tweeted at a transit agency to complain about face covering enforcement and the agency handle replied, we analyzed only a portion of the conversation (i.e., the tweets generated by the transit agency). Future studies may leverage Twitter API v2 to analyze tweets by conversation rather than by Twitter handle. As Twitter continues to expand its reach within academic audiences, we will all have to continue to grapple with the inherent biases within this data.
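A conversation-level analysis of this kind might begin by grouping tweets into threads. The sketch below assumes tweet records have already been retrieved as dictionaries carrying the conversation_id, author_id, and created_at fields that Twitter API v2 can return for each tweet; the sample records and the function name group_by_conversation are hypothetical, and no API calls are made.

```python
from collections import defaultdict

def group_by_conversation(tweets):
    """Group tweet records into threads keyed by conversation_id.

    Each thread is sorted chronologically. Timestamps are assumed to be
    ISO 8601 UTC strings in one format, so string order matches time order.
    """
    threads = defaultdict(list)
    for tweet in tweets:
        threads[tweet["conversation_id"]].append(tweet)
    for thread in threads.values():
        thread.sort(key=lambda t: t["created_at"])
    return dict(threads)

# Hypothetical records: a rider complaint with an agency reply, plus an announcement.
tweets = [
    {"id": "3", "conversation_id": "1", "author_id": "agency_1",
     "created_at": "2020-07-01T12:10:00Z", "text": "Thanks for letting us know."},
    {"id": "1", "conversation_id": "1", "author_id": "rider_1",
     "created_at": "2020-07-01T12:00:00Z", "text": "No one is wearing a mask on my bus."},
    {"id": "2", "conversation_id": "2", "author_id": "agency_1",
     "created_at": "2020-07-01T12:05:00Z", "text": "Service update for this weekend."},
]
threads = group_by_conversation(tweets)
```

Grouping by conversation rather than by handle keeps the rider's original tweet next to the agency's reply, so both sides of an exchange can be analyzed together.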
Conclusion
In this study, we developed a tweet analysis framework to systematically understand how agencies’ tweeting patterns are associated with rider interaction outcomes during the pandemic, using advanced ML and NLP models. We applied the framework to tweets generated by the Twitter accounts of the top 40 transit agencies in the U.S. The results show that there were similarities in agencies’ general activity, tweet content, and user interaction on their Twitter feeds. Most agencies generated more COVID-related tweets during the beginning, lockdown, and reopening periods of the pandemic and relied on other resources (e.g., the use of outlinks) to update riders on COVID-related protocols. Most agencies also gained followers as the pandemic progressed. This indicates that transit riders did turn to Twitter during the pandemic to access the latest service information and to connect with the community.
Our results also show differences in communication patterns and user interaction outcomes, especially between smaller and larger transit agencies. Smaller agencies (ranked by VOM) generated a higher percentage of COVID-related tweets during the time period analyzed, possibly because they have fewer marketing and communication resources available to them. Larger agencies devoted more effort to interacting directly with riders about social distancing enforcement issues and service updates than smaller agencies did. Tweets made by some agencies were also more repetitive than those made by others. From the user interaction perspective, larger agencies that entered the pandemic with an already large number of followers did not gain as many new followers as those that began with a smaller number of followers. The increase in followers for agencies that began the pandemic with fewer followers may be related to our finding that COVID-related tweets generated more user interactions (more retweets and favorites per tweet).
The linear regression results show that proactive communication strategies, such as more COVID-related tweets, actively replying to followers, and disseminating information citing outside links, are positively associated with the percentage change in the number of followers during the pandemic, after controlling for the pre-pandemic Twitter penetration level (i.e., the number of followers in January 2020). This suggests that to use Twitter effectively to expand the reach of communication and messaging during disruptions, agencies might consider proactively communicating and replying to riders to gain a larger follower base. Smaller agencies are more likely to gain followers during the pandemic than larger agencies, regardless of communication patterns, likely due to lower Twitter adoption rates among active riders. The average like count per COVID-related tweet is positively associated with the percentage of COVID-related tweets and negatively associated with the percentage of tweets communicating concerns regarding social distancing enforcement and with the average repetitiveness of tweets, while controlling for the number of followers pre-pandemic. This may be because tweets about social distancing are often replies to complaints and involve one-to-one conversations. The results suggest agencies might consider making proactive COVID-related announcements while reducing tweet repetitiveness to encourage the user interactions measured in this analysis (i.e., likes).
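The idea of estimating an association while controlling for the pre-pandemic follower base can be sketched with an ordinary least squares fit. The example below is illustrative only: the data are synthetic and noise-free, the variable names and the log-transformed follower control are assumptions, and it does not reproduce the model specification or the estimates reported in this study.

```python
def ols(X, y):
    """Fit ordinary least squares by solving the normal equations (X'X) b = X'y.

    X: list of predictor rows (without an intercept column); y: list of responses.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Synthetic predictors per agency (hypothetical values):
# (% COVID-related tweets, % reply tweets, log10 of followers in January 2020)
X = [
    (10, 5, 3.0), (20, 2, 4.2), (15, 8, 3.1),
    (30, 1, 5.0), (25, 6, 3.8), (12, 9, 2.9),
]
# Synthetic response: % follower gain built from known coefficients, no noise.
true_coefs = [5.0, 0.8, 0.5, -2.0]
y = [true_coefs[0] + sum(b * x for b, x in zip(true_coefs[1:], row)) for row in X]

coefs = ols(X, y)  # recovers true_coefs up to floating-point error
```

Because the baseline follower count enters the model as its own predictor, the coefficients on the communication variables describe their association with follower gain among agencies of comparable starting size. In practice, a statistics package that also reports standard errors would be the more natural choice.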
This study has several limitations that merit future research. First, the framework developed does not include rider/follower sentiment analysis, which is also a critical component of user interactions. Future studies may also examine how communication patterns are associated with rider sentiment/satisfaction during the pandemic. Second, we applied the framework only to tweets generated by transit agencies during the COVID-19 pandemic, without baseline data. Therefore, it is hard to identify the effect of the pandemic on user interactions. To gain a more systematic understanding, we recommend collecting baseline data on topics other than COVID (e.g., extreme climate events) and processing those topics in parallel for comparative analysis. Third, this study did not examine changes in Twitter conversations during the pandemic, which is also critical for encouraging healthy and trustworthy conversations in online communities and is feasible using Twitter API v2. Transit agencies and planners may benefit from more in-depth conversation analysis to determine how virtual communication can be translated into real-world actions (i.e., transit ridership), to justify social media return on investment, and to contribute to the speedy recovery of transit ridership post-pandemic. Additionally, the authors acknowledge the systematic bias introduced by imposing a threshold on the number of agencies selected for this study. We also acknowledge that differences in Twitter communication strategies likely exist between larger, highly resourced transit agencies and their smaller counterparts, as can be seen in the difference in the volume of tweets generated. Lastly, as this work is expanded to include more agency-rider discourse analysis and sentiment analysis, we recommend incorporating other methods of data collection that target the transportation disadvantaged.
Supplemental Material
Supplemental Material - Transit communication via Twitter during the COVID-19 pandemic
Supplemental Material for Transit communication via Twitter during the COVID-19 pandemic by Wenwen Zhang, Camille Barchers, and Janille Smith-Colin in Environment and Planning B: Urban Analytics and City Science
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.