Abstract
Introduction
ADHD is a condition marked by a combination of impulsiveness, hyperactivity, and inattention. About 8 million adults in the United States are known to be diagnosed with ADHD, and only one-quarter of the adults who were diagnosed when they were children are reported as having received treatment for their symptoms in the past year (Kessler et al., 2006). Fewer than half of the persons with ADHD symptoms are believed to have ever received a clinical diagnosis (Chamberlain et al., 2017). Similar to other psychiatric conditions, ADHD is severely underdiagnosed in adults (Ali, Teich, Woodward, & Han, 2016). Moreover, despite the significant personal and societal burden associated with ADHD, many contextual factors, in both the professional and community domains, contribute to its consistent underdiagnosis (Asherson et al., 2012). In addition to educating health care professionals about appropriate screening and diagnostic evaluation tools and strategies, there is the ongoing issue of gathering corroborative data with which to document the symptoms and related impairments to a level that meets diagnostic criteria (Ramsay, 2017). Various technologies have allowed for the ecological momentary assessment (EMA) of some clinically relevant behaviors associated with ADHD and have provided novel insights, such as the antecedents and consequences of cigarette smoking in adults with ADHD (Mitchell et al., 2014). Although no such technologies are currently used in conventional diagnostic evaluations for ADHD, behavioral assessment technologies of this kind may someday at least be considered for use in identifying ADHD.
Social media represents an emerging technology that may provide useful signals for identifying ADHD in adolescents and adults and/or providing insights about the condition not available through traditional modes of inquiry. At the very least, it provides a new, relatively untapped avenue through which to gain naturalistic data on the day-to-day experiences and public expressions of individuals living with ADHD. With the widespread availability of internet and mobile technologies, an increasing number of teens and adults use social media every day. A recent study (Lenhart, 2015) showed that about 89% of 1,060 teens between the ages 13 and 17 years old who were interviewed used social media, with 71% of them having accounts on more than one platform. Studies have shown that several characteristics such as valence and arousal (Preoţiuc-Pietro et al., 2016); temporal orientation (Park et al., 2017); users’ personality (Guntuku et al., 2017a; Schwartz et al., 2013); perception of traits such as age, gender, and personality (Guntuku, Qiu, Roy, Lin, & Jakhetiya, 2015; Preoţiuc-Pietro, Guntuku, & Ungar, 2017); and even mental health (Guntuku, Yaden, Kern, Ungar, & Eichstaedt, 2017b) can be inferred based on users’ social media footprint. Moreover, social media is being increasingly used for communicating about mental health (Schein, Wilson, & Keelan, 2010). This makes social media a promising platform for researchers to study the behavior of users with ADHD, gaining potentially useful insights for researchers and clinicians not otherwise available to them, and a possible avenue for feedback to patients with ADHD about how they are doing rather than relying solely on traditional office visits.
Twitter is a widely used, free social media platform used by about 21% of U.S. adults. Users post 140-character messages called “tweets,” which are public and can be viewed by other Twitter users by default (unless marked as private by the user). On average, 500 million tweets are posted per day. Tweets convey the views of individuals on a variety of topics and, Twitter being a social networking platform, their reach has a great impact through social multiplier effects.
In this work, we used Twitter to analyze the language of users who report a diagnosis of ADHD and, in turn, to gain insights into the behavioral symptoms associated with ADHD as they are manifested on social media. Our fundamental hypothesis was that the language of people who self-identify as having ADHD would be significantly different from that of matched controls, that this language would reveal differences in characteristics such as personality and temporal orientation between ADHD and controls, and that the language usage patterns would both confirm existing understanding of ADHD and give new insights into the daily lives of those with ADHD. Toward this goal, we took an inductive approach of computationally analyzing large volumes of social media data with the aim of better understanding the varying manifestations of ADHD and generating new hypotheses. This data-driven approach is not an alternative to hypothesis-driven studies; the two are complementary and iterative partners in knowledge discovery (Kell & Oliver, 2004). Data-driven approaches such as the one used here suggest hypotheses for confirmation (or refutation). Our expectation, based on previous work (Guntuku et al., 2017b; Park et al., 2015), is that social media language can be used to help find previously unrecognized symptoms and comorbidities.
Related Work
A number of studies have analyzed social media content on mental health–related topics including depression, suicidal ideation, and schizophrenia. A majority of these studies used a closed-vocabulary approach to analyze the language of users (Coppersmith et al., 2015a). In this method, words are grouped into a set of categories (such as pronouns, positive words, food words, etc.) and the relative frequency of each category within the text is obtained. Linguistic Inquiry and Word Count (LIWC; Pennebaker, Boyd, Jordan, & Blackburn, 2015) is one of the most popular implementations, and has over 60 psychologically relevant categories, such as social processes (words referring to family, friends, and humans), affective processes (positive and negative emotions, anger, sadness), perceptual processes (seeing, hearing, and feeling), and so on. However, the closed-vocabulary approach relies on scientists making conjectures in advance about which words are likely to be relevant, and often, particularly in social media, misses many words predictive of the outcome of interest (Schwartz et al., 2013).
Researchers are increasingly turning to open-vocabulary methods for analyzing social media (Schwartz et al., 2013). The Computational Linguistics and Clinical Psychology (CLPsych) workshop was started in 2014 to foster cooperation between clinical psychologists and computer scientists. Data sets were made available and “shared tasks” designed to explore and evaluate different solutions to a shared problem. In the 2015 workshop, participants were asked to predict if a user had PTSD or depression based on self-declared diagnoses (PTSD = 246, depression = 327, with the same number of age- and gender-matched controls; Coppersmith, Dredze, Harman, Hollingshead, & Mitchell, 2015b). On this data set, and other similar data sets consisting of multiple mental illness conditions, researchers have used open-vocabulary methods to predict conditions such as anxiety, depression, and PTSD based on self-declared diagnoses in English. One such work (Coppersmith, Dredze, Harman, & Hollingshead, 2015a) examined the differences in 10 comorbid mental illness conditions including ADHD. However, their analysis was based on a sample of 100 users with the condition of ADHD. While findings showed that some of the conditions show differences in generic categories of word usage such as auxiliary verbs, functional words, and those related to cognitive mechanisms, death, and so on between the users with condition and control groups, it is not clear what exactly the users with ADHD talk about, and if these differences in linguistic topics can be exploited to extract signal for ADHD to inform researchers and clinicians working in the domain. While Coppersmith and colleagues (2015a) studied comorbidity of the 10 conditions they examine, we attempt to provide specific insights into the language of ADHD.
We study the language of users with ADHD on social media in three ways:
We apply open-vocabulary differential language analysis to understand the topics that users with ADHD talk about, using a large-scale data set that is an order of magnitude larger than that of previous work;
We study the relationship between ADHD and other traits such as users’ personality, temporal orientation, and word usage;
We build models to predict whether a user has ADHD based on the language they use on social media.
Materials and Method
Data
All research procedures for this study were approved by the University’s Institutional Review Board. Our materials are public messages posted on Twitter. Many users self-identify their health condition on Twitter. Searching for statements such as “I am/was diagnosed with ADHD/ADD” returned about 1,900 users. Of these, 1,399 users were retained after verification of the tweets, which removed ads, spam, messages, retweets, tweets talking about others having ADHD, and so on. We cannot be certain that the users were actually diagnosed with ADHD; we can only check whether their statement of diagnosis appears to be genuine based on self-descriptions of a diagnosis. Previous studies indicate good interrater reliability for this form of data collection (κ = .77; Coppersmith, Dredze, & Harman, 2014). Using the Twitter API, we collected 1.3 million public posts made by all of these users between January 1, 2012, and October 30, 2016. We do not have access to private messages. As we are focusing on the language of users with ADHD themselves, we excluded retweets. We excluded non-English tweets and non-U.S. users to avoid cross-cultural artefacts associated with health-related discussion, and also removed the tweets that were used to assess the diagnosis. In our data set, we found 11 users who also reported a diagnosis of depression, five users who reported a diagnosis of anxiety, and four who reported a diagnosis of bipolar disorder, in addition to ADHD. We did not exclude them from our analysis, as understanding the sequelae of the comorbidities might be important in identifying persons with the actual condition.
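The search for self-identification statements can be illustrated with a short sketch. The regular expression and the retweet check below are assumptions written to approximate the quoted phrase “I am/was diagnosed with ADHD/ADD”; the exact query used in the study is not given, and the verification step described above would still be needed.

```python
import re

# Hypothetical pattern approximating the quoted search phrase;
# not the query actually used in the study.
DIAGNOSIS_RE = re.compile(
    r"\bi\s+(?:am|was)\s+diagnosed\s+with\s+(?:adhd|add)\b",
    re.IGNORECASE,
)


def is_candidate(tweet: str) -> bool:
    """Return True for an original post that contains a self-report phrase."""
    if tweet.startswith("RT @"):  # crude retweet check (assumption)
        return False
    return DIAGNOSIS_RE.search(tweet) is not None


tweets = [
    "I was diagnosed with ADHD last year and it explains a lot",
    "RT @someone: I was diagnosed with ADHD",
    "my brother was diagnosed with ADHD",
    "I am diagnosed with ADD, ask me anything",
]
candidates = [t for t in tweets if is_candidate(t)]
```

A pattern like this only produces candidates; statements that are jokes, quotes, or about other people still have to be screened out by the verification described in the text.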
As the prevalence of mental health conditions varies based on age and gender (Dos Reis & Culotta, 2015), we formed a control group of users by matching each user in our ADHD database to another user by age, gender, and period of activity. We obtained the age and gender estimates by using lexica developed by Sap et al. (2014). An overview of the data pipeline is shown in Figure 1. Then, we selected users with a minimum of 500 words across all their posts so that we have sufficient language for analysis. This step retained 1,032 users with ADHD and 1,029 users in the control group. The average age in the cohorts was 23 years, with 637 females in the ADHD group and 639 in the control group.

Figure 1. Overview of the data collection pipeline.
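The control matching and the 500-word minimum can be sketched as follows. This is a minimal illustration under stated assumptions: the greedy nearest-age strategy, the `max_age_gap` parameter, and the record fields are hypothetical, and matching on period of activity is omitted for brevity. The study does not describe its matching algorithm in this detail.

```python
def match_controls(cases, pool, max_age_gap=2.0):
    """Greedily pair each case with an unused pool user of the same
    gender and closest estimated age (hypothetical strategy)."""
    available = list(pool)
    pairs = []
    for case in cases:
        same_gender = [u for u in available if u["gender"] == case["gender"]]
        if not same_gender:
            continue
        best = min(same_gender, key=lambda u: abs(u["age"] - case["age"]))
        if abs(best["age"] - case["age"]) <= max_age_gap:
            pairs.append((case["id"], best["id"]))
            available.remove(best)  # each control is used at most once
    return pairs


def has_enough_language(posts, min_words=500):
    """Keep a user only if their posts contain at least min_words words."""
    return sum(len(p.split()) for p in posts) >= min_words


cases = [
    {"id": "a1", "age": 22.0, "gender": "f"},
    {"id": "a2", "age": 30.0, "gender": "m"},
]
pool = [
    {"id": "c1", "age": 23.0, "gender": "f"},
    {"id": "c2", "age": 21.5, "gender": "f"},
    {"id": "c3", "age": 31.0, "gender": "m"},
    {"id": "c4", "age": 45.0, "gender": "m"},
]
pairs = match_controls(cases, pool)
```

Greedy matching is order dependent; an optimal assignment method could be substituted where the pool of candidate controls is small.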
Linguistic Analysis
We automatically extracted the relative frequency of single words, phrases (consisting of two or three consecutive words), topics, and LIWC features (Pennebaker et al., 2015) across all tweets of all users (see Schwartz et al., 2013, for details of the method). First, all words used by fewer than 1% of users were removed from analysis so as to remove uncommonly used words (outliers). Additionally, all messages with the phrase used to identify diagnosis (“I was diagnosed with ADHD”) were removed prior to further analysis.
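The feature extraction described above can be sketched in a few lines. The whitespace tokenizer, the per-order normalization, and the function names are assumptions made for illustration; the study relies on the method detailed in Schwartz et al. (2013), which uses a tokenizer designed for social media text.

```python
from collections import Counter


def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def user_relative_freqs(messages, max_n=3):
    """Relative frequency of each 1- to max_n-gram for one user,
    normalized within each n-gram order (an assumption)."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for msg in messages:
        tokens = msg.lower().split()  # naive whitespace tokenizer
        for n in counts:
            counts[n].update(ngrams(tokens, n))
    freqs = {}
    for counter in counts.values():
        total = sum(counter.values())
        for gram, k in counter.items():
            freqs[gram] = k / total
    return freqs


def drop_rare_features(freqs_by_user, min_user_share=0.01):
    """Remove features used by fewer than min_user_share of users."""
    n_users = len(freqs_by_user)
    users_with = Counter()
    for freqs in freqs_by_user.values():
        users_with.update(freqs.keys())
    keep = {g for g, k in users_with.items() if k / n_users >= min_user_share}
    return {
        user: {g: v for g, v in freqs.items() if g in keep}
        for user, freqs in freqs_by_user.items()
    }


messages_by_user = {
    "u1": ["so tired today"],
    "u2": ["so tired"],
    "u3": ["hello world"],
}
freqs_by_user = {u: user_relative_freqs(m) for u, m in messages_by_user.items()}
# A 50% threshold is used here only because the toy data has three users.
filtered = drop_rare_features(freqs_by_user, min_user_share=0.5)
```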
Two-hundred topics were generated using tweets across all users in the data set of ADHD and non-ADHD users using the Mallet implementation of Latent Dirichlet allocation (LDA; Blei, Ng, & Jordan, 2003). The topic distribution of each user aggregated across all the messages was then calculated.
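One way to aggregate a user's topic distribution, given a fitted topic model, is sketched below. It assumes the weighting described in the cited method (Schwartz et al., 2013), in which a user's usage of a topic is the sum over words of p(topic | word) × p(word | user). The topic names and probabilities are made-up toy values, not output from the Mallet model used in the study.

```python
def user_topic_usage(word_freqs, topic_given_word):
    """Sum p(topic | word) * p(word | user) over the user's words."""
    usage = {}
    for word, p_word in word_freqs.items():
        # Words absent from the topic model contribute nothing.
        for topic, p_topic in topic_given_word.get(word, {}).items():
            usage[topic] = usage.get(topic, 0.0) + p_topic * p_word
    return usage


# Hypothetical p(topic | word) values for two toy topics.
topic_given_word = {
    "tired": {"exhaustion": 0.8, "sleep": 0.2},
    "bed": {"sleep": 0.9, "exhaustion": 0.1},
}
# Relative word frequencies for one hypothetical user.
word_freqs = {"tired": 0.5, "bed": 0.25, "the": 0.25}

usage = user_topic_usage(word_freqs, topic_given_word)
```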
We isolated the patterns in users’ language using these features by correlating the words/phrases with the condition of ADHD being present or not. We used Simes p-correction to control the false discovery rate and used p < .001 as the threshold for meaningful correlations.
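The correlation and correction steps can be sketched as follows. The sketch assumes that the Simes p-correction refers to the Benjamini-Hochberg step-up procedure, which builds on the Simes inequality and controls the false discovery rate; the function names and toy numbers are illustrative only, and computing the p-value of each correlation is left out.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; with a 0/1 label in xs this is the
    point-biserial correlation between group and feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def benjamini_hochberg(p_values):
    """Return FDR-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


has_adhd = [0, 0, 1, 1]                # toy group labels
feature_freq = [0.1, 0.2, 0.3, 0.4]    # toy relative frequencies
r = pearson_r(has_adhd, feature_freq)

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.001 for p in adjusted]
```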
User Trait Prediction
We used automatic text-regression methods to assign each user scores on the Big Five personality traits (Gosling, Rentfrow, & Swann, 2003) and on temporal orientation. The personality model was trained on a sample of over 70,000 Facebook users, using tokens and topics extracted from status updates as features, and achieved a validation predictive performance of r ≈ .35 on average across the five traits (Schwartz et al., 2013), which is considered a high correlation in psychology, especially when measuring internal states (Meyer et al., 2001). We used a temporal orientation model (Park et al., 2015), which was trained on 6,000 Facebook and Twitter statuses with a predictive accuracy of 71.8%, to classify users into past, present, and future temporal orientation categories. We also calculated a few statistics, such as the number of posts that the user has made, the number of followers and friends that the user has, and the number of characters used to describe the user’s profile on Twitter.
Results
In this section, we first present the results obtained from closed-vocabulary analysis (LIWC) followed by the insights derived from differential language analysis based on open-vocabulary approach (including the correlations between different user traits and the condition of ADHD), and then the prediction performance obtained by different features.
LIWC Categories
For each user, we measure the proportion of word tokens that fall into a given LIWC category. Then, we compare it against the word tokens from the control data using an empirical distribution of the proportion of language attributable to each LIWC category. The correlations between word tokens in different LIWC categories with the condition of ADHD are shown in Table 1.
Table 1. Correlations Between LIWC Categories and ADHD.
Note. All correlations are significant at p < .001 and Simes corrected. LIWC = linguistic inquiry and word count.
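The per-user category proportions can be illustrated with a toy lexicon. LIWC itself is a proprietary dictionary, so the categories and word lists below are hypothetical stand-ins; only the counting logic (exact entries plus prefix wildcards ending in `*`) is meant to reflect how a closed-vocabulary lexicon is applied.

```python
# Hypothetical mini-lexicon; not the actual LIWC dictionary.
TOY_LEXICON = {
    "tentative": ["maybe", "perhaps", "guess*"],
    "anger": ["hate*", "annoy*"],
}


def matches(token, entry):
    """Exact match, or prefix match when the entry ends with '*'."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_proportions(tokens, lexicon=TOY_LEXICON):
    """Share of a user's tokens that fall into each category."""
    total = len(tokens)
    proportions = {}
    for category, entries in lexicon.items():
        hits = sum(any(matches(t, e) for e in entries) for t in tokens)
        proportions[category] = hits / total if total else 0.0
    return proportions


tokens = "maybe i hate mondays i guess".split()
proportions = category_proportions(tokens)
```

These per-user proportions are the values that would then be correlated with the ADHD label.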
Cognitive Processes
It is interesting to see that words indicating uncertainty and hedging behavior are highly correlated with ADHD. The presence of conditional statements, along with words such as “think,” “know,” and so on, may be suggestive of misgivings or a sense of unpredictability among these users. For instance, they might know that they have to do something, and that doing it at that point in time is the right thing to do, but might end up not doing it, which is consistent with the executive dysfunction and motivational deficits characteristic of ADHD (Faraone et al., 2015; Ramsay & Rostain, 2016).
Affective Processes
Anxiety is a common comorbid condition with ADHD (Nigg, 2013), and words indicating the condition are highly correlated with ADHD on social media. Day-to-day aspects of living, such as paying bills, or arriving at a place on time and many other such tasks are more difficult for ADHD adults to manage than for non-ADHD adults. Also, words representing anger are used more frequently, which is a known feature of emotional impulsivity for many people with ADHD (Barkley, 2010; Faraone et al., 2015; Surman et al., 2011).
Informal Language
Informal language such as fillers might be associated with insecurities of people with ADHD and/or difficulties adequately organizing and expressing ideas within the parameters of Twitter, and swearing might be a result of emotional impulsivity.
Open-Vocabulary Approach
As closed-vocabulary approaches like LIWC cover only a small subset of the entire language used on social media, we use an open-vocabulary approach to improve the coverage and find topics that people with ADHD talk about. Figure 2 shows some of the most prominent words and phrases in the messages posted by self-identified users with ADHD on Twitter. Users with ADHD swear significantly more than the control group, and also talk about others (“they,” “people,” “they want”) and about issues (“control,” “problem_with”). Interestingly, they also talk about Pokémon Go.

Figure 2. Words/phrases more likely to be posted by Twitter users with self-reported diagnoses of ADHD compared with the control group.
To understand the themes underlying the language of ADHD, we then use the 200 topics created using LDA, to obtain the Pearson correlation between the topic distribution and ADHD (shown in Figure 3).

Figure 3. Highly correlated topics with ADHD.
Most of the LDA topics have messages relating to lack of focus and self-regulation (Barkley, 2010), emotional dysregulation (Faraone et al., 2015; Surman et al., 2011), intention and failure, negation, self-criticism, and expressions of mental, physical, and emotional exhaustion.
User Traits
We predicted users’ personality traits and temporal orientation associated with their messages, and also calculated a few statistics, such as the number of posts that the user has made, the number of followers and friends that the user has, and the number of characters used to describe the user’s profile on Twitter. These are shown in Table 2.
Table 2. Correlations Between User Traits and ADHD.
Note. All correlations are significant at p < .001, Simes p-corrected.
Personality
We find that users with ADHD are more open (r = .22; Gomez & Corr, 2014; Van Dijk et al., 2017). They are also less agreeable (r = −.27), which corroborates our previous findings on high correlation with the LIWC category of swearing.
Posting Characteristics
Users with ADHD tend to post much more frequently (r = .53), and tend to have more followers (r = .4). Also, when compared with users who do not have ADHD, they tend to post a significantly higher number of tweets during the night (12 a.m.–6 a.m.), as seen in Figure 4. These characteristics in posting may reflect self-regulatory difficulties in terms of engaging in a more immediately rewarding (but low priority) activity and doing so during hours typically devoted to sleep.

Figure 4. Proportion of posts across all users in the data set at different times (by hour) in the day.
Temporal Orientation
Social media posts associated with users diagnosed with ADHD tend to be focused on the past, with very little future-oriented content.
Use of Drug Words
We also looked at the distribution of words pertaining to drugs in the posts of users with ADHD, and found that they talk significantly more about dope (r = .102), wax, lean, smoke, adderall, and weed, which are known to be consumed by those diagnosed with ADHD (Mitchell, Sweitzer, Tunno, Kollins, & McClernon, 2016; Wilens, 2004), running the gamut from prescribed medications for treatment to recreational drug use and recreational drugs that are viewed as self-medication for ADHD symptoms.
Predicting ADHD
We then looked at the feasibility of predicting whether a user has ADHD or not based on their social media language. Automated analysis of social media is accomplished by building predictive models, which use “features,” or variables that have been extracted from social media data. In this work, we used LIWC and Topics as features. Features are then treated as independent variables in an algorithm to predict the dependent variable of an outcome of interest (e.g., users’ having ADHD or not). Predictive models are trained, using an algorithm, on part of the data (the training set) and then evaluated on the other part (the test set) to guard against overfitting; repeating this over rotating partitions of the data is called cross-validation. We tried several algorithms, such as Logistic Regression, Random Forests, and Support Vector Classification, and found that Support Vector Classification showed marginally superior performance over the others. As there was not a large difference in performance, we report results only using Support Vector Classification. The prediction performances are reported using several metrics (Table 3) in an out-of-sample fivefold cross-validation setting. Topics outperform LIWC by ~5%, giving the best accuracy of ~76% and an Area Under the Curve (AUC) of 0.836, which is base-rate independent. The sensitivity and specificity of the trained algorithm are shown in Figure 5.
Table 3. Performance of Different Features at Predicting ADHD, Reported on an Out-of-Sample Fivefold Cross-Validation Setting.
Note. AUC = area under the curve; LIWC = linguistic inquiry and word count.

Figure 5. ROC curve of the classifier trained on LIWC + Topics.
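The evaluation protocol can be sketched without committing to a particular classifier. The code below shows, under stated assumptions, how fivefold splits can be generated and how AUC can be computed as the probability that a randomly chosen ADHD user receives a higher score than a randomly chosen control; the function names, the fixed seed, and the toy scores are hypothetical, and the study's own implementation is not described at this level.

```python
import random


def kfold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists; each item is tested exactly once."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly,
    counting ties as half. It depends only on the ranking of scores,
    which is why it does not change with the base rate."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


splits = list(kfold_indices(10, k=5))

# Toy out-of-sample scores: higher means "more likely ADHD".
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

In a full pipeline, a model would be fitted on each `train` split and scored on the matching `test` split, and the pooled out-of-sample scores would be passed to `roc_auc`.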
Discussion
In this work, we studied the behavior of adults with self-reported diagnoses of ADHD as described by them on social media posts using Twitter. We compared their language with a control group using closed-vocabulary instrument (LIWC) and open-vocabulary approach (words/phrases and topics), and found significant differences in descriptions of self-efficacy, emotional dysregulation, negation, self-criticism, substance use, and expressions of mental, physical, and emotional exhaustion based on a review of linguistic themes of posts. We also found that users with ADHD are more open and less agreeable (Gomez & Corr, 2014; Van Dijk et al., 2017), and have a significantly different posting behavior than controls in terms of number of posts and the timing of posts, and tend to have low future-orientation, posting more about the past than controls. While some of these online behaviors can be attributed to known symptoms of ADHD, we also discovered certain novel findings as highlighted in the previous section. These insights could be used by researchers and clinicians for informing hypotheses generation to understand the varying manifestations of ADHD, and social media could potentially be used as a complementary feedback tool to give patients suffering with ADHD personal insights.
One of the features of social media posts is that they provide a naturalistic expression of a user’s particular mindset at that time. Hence, the linguistic analyses utilized in this study provide a novel adjunct to recent research examining the cognitions of adults with ADHD. Existing studies have reported that adults with ADHD endorse significantly more negative or maladaptive thoughts than control groups, both in cases of ADHD with and without comorbid diagnoses (Knouse, Zvorsky, & Safren, 2013; Mitchell, Benson, Knouse, Kimbrel, & Anastopoulos, 2013; Strohmeier, Rosenfield, DiTomasso, & Ramsay, 2016). A recent empirically based adult ADHD cognition scale comprises maladaptive positive thoughts about tasks that function as rationalizations for deferring a task, consistent with the observation of hedging themes in the user language in the current study (Knouse, Mitchell, Kimbrel, & Anastopoulos, 2017).
The more frequent use of words reflecting themes of uncertainty and hedging, as well as various descriptions of exhaustion, self-criticism, and negation (i.e., unfulfilled endeavors) may be consistent with preliminary prospective studies of the maladaptive thoughts and beliefs of adults with ADHD, with failure and related themes being the most commonly endorsed, which likely reflect the effects of the core features of ADHD on the self-efficacy of those with the diagnosis (Miklósi, Máté, Somogyi, & Szabó, 2016; Philipsen et al., 2017). Lastly, descriptors of mental, physical, and emotional exhaustion reflect the toll exacted by living with ADHD on the overall wherewithal of those affected, consistent with recent findings of elevated fatigue and lower self-efficacy in adults with ADHD compared with healthy controls (Rogers, Dittner, Rimes, & Chalder, 2017).
The posting behavior of adults with self-described ADHD is consistent with the features of the disorder. That ADHD adults posted more frequently on a platform that provides immediate and quick feedback makes sense; such a platform should be desirable to individuals who, as a group, have difficulties delaying gratification. Going beyond the data of this study, we might conjecture that social media provides a compelling source of distraction from other higher priority tasks. One of the tasks from which adults with ADHD may be prone to distraction is that of sleep, with this group being more likely to post during hours typically devoted to sleep. The temporal orientation of posts was much more likely to include content focused on the past than the present or future. An aspect of ADHD has been described as temporal myopia or difficulties planning, organizing, and efficiently working toward future-focused goals. This may be reflected in the emphasis on posting content about the past; users with ADHD may be more likely to look back on past endeavors with after-the-fact regret and frustration due to the effects of ADHD on following through on plans. Lastly, even the finding of ADHD adults using more filler words, netspeak, and other informal language may reflect difficulties related to self-expression in terms of developing, organizing, and expressing one’s ideas. In sum, many of the findings herein can be viewed as being consistent with ADHD behaviors seen in other settings as well as providing novel insights into the experience and mindset of adults with ADHD in a forum not available in traditional research or clinical settings, though these are hypotheses to be tested in future studies.
While the motivation of this work is to understand the manifestation of ADHD symptoms in online spaces, it is only a preliminary step toward considering the possible role of online platforms in the detection of and intervention with social media users with ADHD. We found that social media language is predictive of ADHD with an out-of-sample accuracy of 76% (.836 AUC). The use of a balanced data set for our analysis (an equal number of ADHD users and control users) allowed us to investigate the emergent linguistic markers of ADHD, but is inadequate for screening detection of a condition estimated to affect less than 10% of the U.S. population. Future studies should examine the integration of both social media data and clinician judgment for ADHD screening and/or diagnosis, such as with a semistructured interview delivered by a clinician (Diagnostic and Statistical Manual of Mental Disorders [5th ed.; DSM-5]; American Psychiatric Association, 2013).
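The effect of the base rate on screening can be made concrete with a short calculation. The sensitivity and specificity of 0.76 and the 5% prevalence used below are illustrative assumptions; the study reports accuracy and AUC on a balanced sample, not an operating point for screening.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive predictions that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point: sensitivity = specificity = 0.76.
ppv_balanced = positive_predictive_value(0.76, 0.76, prevalence=0.50)
ppv_screening = positive_predictive_value(0.76, 0.76, prevalence=0.05)
# ppv_balanced is 0.76; ppv_screening is about 0.14
```

Under these assumed values, roughly six of every seven users flagged in a general-population screen would not have ADHD, which illustrates why results from a balanced sample do not transfer directly to screening.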
There are several other limitations of this study. Although we identified individuals who described having been diagnosed with ADHD, there is no means to verify whether they were formally diagnosed by a mental health professional and, even if so, the quality of the evaluation, including identifying comorbid conditions. The nature of the Twitter platform limits users to 140-character posts, which puts a limit on expression that may in turn limit the content and manner of posts (such as the necessity to use netspeak to stay within the character limit) and, thus, the conclusions that can be drawn. Moreover, although we used established protocols for gathering data and models of ADHD for examining the results, interpreting the meaning and intent of word clusters used in posts in this sort of research requires a degree of empirically and clinically informed conjecture.
The feasibility of social-media-based assessment of ADHD also raises ethical questions. Employers and insurance companies, for example, may be motivated to assess people using their social media. As ADHD may be viewed as a “mental illness” and carry social stigma and may engender discrimination, data protection and ownership frameworks are needed to make sure the data are not used against the users’ interest (McKee, 2013). Few users realize the amount of mental-health-related information that can be gleaned from their digital traces, so transparency about which indicators are derived by whom for what purpose should be part of ethical and policy discourse.
The current study reflects the use of a novel source of information about the experience of adults with ADHD. It involved the examination of social media posts by adults self-described as having been diagnosed with ADHD compared with those of non-ADHD controls. These results have provided preliminary insights into the thoughts, beliefs, and emotional experiences of adults living with ADHD based on the language-based analysis of content they chose to post on a public forum. These results were considered in light of their correspondence with existing research on adult ADHD, including research on common thoughts and beliefs endorsed by ADHD adults, and in light of the linguistic categories and themes from online posts in this study. It is hoped that this line of research, in general, and the findings reported in this study, specifically, offer a new and distinct avenue for developing a better understanding of adults with ADHD, with potential for using social media in ongoing research as well as in screening and clinical uses yet to be developed.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
