Abstract
Objective tools to assess suicide risk are needed to determine when someone is at imminent risk. In this pilot laboratory investigation, we used a within-subjects design to identify patterns in text messaging (short message service) unique to high-risk periods preceding suicide attempts. Individuals reporting a history of suicide attempt (N = 33) retrospectively identified past attempts and periods of lower risk (e.g., suicide ideation). Language analysis software scored 189,478 text messages to capture three psychological constructs: self-focus, sentiment, and social engagement. Mixed-effects models tested whether these constructs differed in general (means) and over time (slopes) 2 weeks before a suicide attempt relative to lower-risk periods. Regarding mean differences, no language features uniquely differentiated suicide attempts from other episodes. However, when examining patterns over time, anger increased and positive emotion decreased to a greater extent as participants approached a suicide attempt. Results suggest private electronic communication has the potential to provide real-time digital markers of suicide risk.
Suicide is a serious public health problem and a leading cause of death around the world. In fact, more deaths occur by suicide than by all other interpersonal violence, including war and homicide, combined (World Health Organization, 2009). Despite growing awareness of and research into suicide, rates today are very similar to those from the 1950s (Centers for Disease Control and Prevention, 2014), which indicates a critical need for better ways to identify and intervene with individuals at risk of suicide. Using a novel within-subjects design, the current pilot study sought to analyze private text messages (short message service, SMS) of nonfatal suicide attempters and identify patterns uniquely indicative of acute suicide risk (i.e., communication patterns immediately preceding suicide attempts vs. during periods of lower risk, e.g., suicide ideation only or depressed mood). To this end, we aimed to improve our ability to assess suicide risk dynamically in real time.
Need for Better Identification of Acute Suicide Risk
Given the staggering toll suicide takes on society, it is surprising that our methods for identifying when individuals are at highest risk of suicide remain ineffective. One reason for this is that suicide researchers over the past several decades have focused primarily on identifying general risk factors for suicide. As a result, the ability to identify groups of individuals at risk is impressive for a significant yet relatively rare clinical outcome. A recent model that used data from the World Health Organization, which included known risk factors for suicide attempt, accounted for 80.3% of the variance (Nock, Borges, & Ono, 2012). However, such general risk factors fail to tell us when someone is at imminent risk of suicidal behaviors. In other words, even if we know which individuals may be most vulnerable at some point, we currently lack the tools to assess whether or when that individual will actually take action to make a suicide attempt. By comparing multiple periods of high as opposed to low suicide risk, all within a sample of suicide attempters, we attempted to trace how suicide risk changes dynamically to better understand proximal risk for a suicide attempt.
Our chief method for assessing acute or short-term suicide risk remains clinicians’ judgments, which, unfortunately, have been shown not to accurately predict future suicidal behaviors (Nock, Park, et al., 2010). Difficulty in clinically assessing risk stems from the near universal reliance on self-report, which is highly problematic for several reasons, including the fact that those at greatest risk may be motivated to conceal their thoughts (e.g., to avoid or gain release from hospitalization) and that people may lack the ability to accurately assess the factors affecting their current risk. Thus, there is an urgent need for novel, data-driven tools to assess acute suicide risk. In light of these challenges, recent work has sought to use behavioral tools to overcome problems associated with self-report. We sought to take a similar behavioral approach via a within-person examination of electronic personal communications. This approach helps avoid the complications inherent to either patient or clinician self-report and moves away from traditional between-subjects comparisons, in which many features distinguish people who have as opposed to have not attempted suicide (beyond just suicide attempt status).
Novel Approaches Needed to Study Low-Base-Rate Behaviors
Despite frequent calls to develop better ways to identify risk of serious suicidal behaviors, researchers studying suicide face tremendous methodological challenges, several of which we addressed in this study. First, suicidal behaviors have low base rates, which makes it difficult to obtain sample sizes large enough to prospectively predict future suicidal behaviors. To illustrate the problem, one recent study estimated the 12-month prevalence of suicide attempt for adults at 0.3% (Borges et al., 2010), which would mean that more than 300 unselected individuals would be required to produce a single suicide attempt during a 1-year follow-up period. Further complicating this problem is the fact that the progression to a suicide attempt (e.g., decision-making and planning associated with a suicide attempt) usually begins less than a week before the attempt (Millner, Lee, & Nock, 2016), which suggests the critical period to examine is very narrow. By reconstructing the timeline of recent suicidal behaviors through a retrospective clinical interview, we used a pseudo-prospective research design to understand which features in text communications predicted suicide attempt, which overcomes the power issues of a “true” prospective study design.
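The base-rate arithmetic in the preceding paragraph can be checked with a short sketch. It is an illustration only: the 0.3% figure comes from the text, whereas the function name and the independence assumption are ours and are not part of the study.

```python
from fractions import Fraction
import math

# 12-month prevalence of suicide attempt cited in the text (0.3%).
prevalence = Fraction(3, 1000)

# Smallest unselected sample whose expected number of attempts reaches 1.
n_for_one_expected = math.ceil(1 / prevalence)


def prob_at_least_one(n, p):
    """Chance that a sample of n people contains at least one attempt.

    Assumes individuals are independent, which is a simplifying
    assumption made for this illustration.
    """
    return 1 - (1 - float(p)) ** n


print(n_for_one_expected)
print(round(prob_at_least_one(300, prevalence), 2))
```

Under these assumptions, about 334 people are needed before one attempt is expected, and a sample of 300 would still have a substantial chance of containing no attempts at all.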
Second, a problem in suicide research is that the subjects of greatest interest, suicide completers, cannot be directly studied (Millner et al., 2015). Therefore, researchers must rely on individuals with nonlethal forms of suicidal thoughts and behaviors. Because of low base rates, many studies have used less severe forms of self-harm, such as suicide ideation, to serve as outcome measures. The shortcoming of this approach is that prior research suggests that risk factors associated with suicide ideation differ from those of more serious suicidal behaviors, such as suicide attempts (May & Klonsky, 2016; Nock, Hwang, Sampson, & Kessler, 2010). By recruiting only individuals with a history of acting on their suicidal thoughts (i.e., actual suicide attempts) or on the cusp of acting on their thoughts (i.e., aborted or interrupted suicide attempts), we focused on those behaviors most strongly associated with and predictive of suicide completion.
Rise of Digital Text Data
The rising use of smartphones and content-sharing services, such as e-mail, blogs, crowd-source review sites, and social media, has resulted in a proliferation of unstructured textual data that provide a rich source of information that can be analyzed to extract characteristics of the individual (Cambria, Schuller, Xia, & Havasi, 2013; Kagan, Rossini, & Sapounas, 2013). For example, researchers in the field of sentiment analysis have used machine learning and natural language processing to capture an author’s intended sentiment (e.g., attitude, opinion, or emotional state) from subjective textual data. Such analytic approaches have recently garnered the attention of the mental health community, including suicide researchers. Recent studies have focused on using text-analytic approaches on clinical notes to determine long-term predictors for suicide (Hammond & Laundry, 2014; Kagan et al., 2013) and emotions predictive of suicide (Pestian, Matykiewicz, & Linn-Gust, 2012; Sohn et al., 2012; Yang, Willis, De Roeck, & Nuseibeh, 2012). Key findings include that early childhood abuse reported in clinical records predicts suicide attempts and that computer algorithms are more accurate than clinicians in distinguishing genuine from fake suicide notes.
Again, however, such approaches are better suited to tell us who is at risk, not when. The ubiquity of mobile technology, including private text data, offers ripe opportunities for the real-time quantification of individual-level human behavior, known as “digital phenotyping” (Onnela & Rauch, 2016; Torous, Kiang, Lorme, & Onnela, 2016). In this study, we collected participants’ smartphone text-messaging data and examined this use of language over time, which allowed for insight into how communication patterns changed as an individual drew closer to a suicide attempt (Gunn & Lester, 2012). We used a tool developed by James Pennebaker—Linguistic Inquiry and Word Count, or LIWC (Pennebaker, Boyd, Jordan, & Blackburn, 2015)—to analyze various properties of text communications. LIWC searches a text file and calculates normalized word counts for nearly 6,400 words, word stems, and emoticons that have been previously categorized into a number of linguistic and psychological dimensions. Private digital communication, which is one among numerous data streams potentially useful for digital phenotyping, was an ideal source of data for suicide research because it provides ecologically valid data that accumulate automatically and are thus resistant to biases common to the research process, such as demand characteristics or efforts at impression management. Ours is the first study, to our knowledge, to collect and analyze private text-messaging data for suicide (or any other clinical outcome).
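As a rough illustration of dictionary-based scoring of the kind described above, the sketch below computes the percentage of words in a message that fall into each category. The mini-dictionary, the tokenizer, and all function names are hypothetical and exist only for this example; the LIWC dictionary itself is proprietary and far larger, and this is not its implementation.

```python
import re

# Hypothetical mini-dictionary for illustration only. A trailing "*"
# marks a word stem, so "hate*" matches "hate", "hated", "hateful".
TOY_DICTIONARY = {
    "i": ["i", "me", "my", "mine", "myself"],
    "anger": ["angry", "hate*", "mad"],
}


def tokenize(message):
    """Lowercase the message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def matches(token, entry):
    """Return True if a token matches a dictionary word or word stem."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_percentages(message, dictionary=TOY_DICTIONARY):
    """Percentage of tokens in the message that belong to each category."""
    tokens = tokenize(message)
    if not tokens:
        return {category: 0.0 for category in dictionary}
    scores = {}
    for category, entries in dictionary.items():
        hits = sum(any(matches(t, e) for e in entries) for t in tokens)
        scores[category] = 100.0 * hits / len(tokens)
    return scores


print(category_percentages("I am so angry and I hate this"))
```

In this toy message, two of the eight tokens are first-person pronouns and two are anger words, so each category scores 25.0.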
Communication Features of Interest: Self-Focus, Sentiment, and Social Engagement
The main objective of this study was to test whether features of text-messaging data could identify and differentiate increasingly severe levels of suicide risk. We tested both whether these characteristics differed in general (mean differences) and over time (slope differences) between periods of high suicide risk (before suicide attempts) as opposed to periods of lower suicide risk (in decreasing order of risk: suicide ideation, depressed mood, or positive mood). The subset of communication characteristics we focused on was chosen on the basis of theoretical interest and prior research supporting their relevance in suicide-related outcomes.
First, one theory of suicide posits that suicide is a means to escape from negative self-focus (Baumeister, 1990). It follows that the feedback loop of increasing self-focus and painful recognition of self-failures may lead someone to act on suicidal thoughts as a means of escape. Indeed, previous research has shown that suicidal individuals tend to be more self-focused in their communication. One study found that poets who completed suicide relied on first-person pronouns to a much greater extent than poets who did not (Stirman & Pennebaker, 2001). Furthermore, in transcribed verbal interviews with suicidal and control adolescent inpatients, use of first-person pronouns was significantly higher for suicidal participants compared with control participants (Venek, Scherer, Morency, Rizzo, & Pestian, 2014). Lastly, first-person pronoun use was associated with a transition from a general mental health forum to a suicide forum among users of a popular online forum (De Choudhury, Kiciman, Dredze, Coppersmith, & Kumar, 2016). In the current study, we tested whether use of first-person pronouns (as an indicator of self-focus) was greater before suicide attempts compared with episodes of suicide ideation, depressed mood, or positive mood.
Second, prior research has identified depressed affect (Bulik, Carpenter, Kupfer, & Frank, 1990), hopelessness (Hawton, Casanas, Haw, & Saunders, 2013; Smith, Alloy, & Abramson, 2006), and anxiety (Nock, Deming, et al., 2012) as important risk factors for suicide, which suggests that individuals at suicide risk may use language expressing these emotions at higher rates and with greater negative valence. In support of this idea, use of positive and negative emotion words in transcribed verbal interviews was significantly different among suicidal adolescent inpatients compared with control participants (Venek et al., 2014). In our study, we tested whether attempt episodes demonstrated significantly greater use of negative emotion words and less use of positive emotion words (as indicators of negative sentiment). We also tested whether attempt episodes involved greater use of words related to the concept of death.
Third, according to the interpersonal theory of suicide (Joiner, 2005; Van Orden et al., 2010), suicide may result from feelings of perceived burdensomeness and thwarted belongingness. In theory, social support should combat such feelings and increase feelings of connectedness. Previous research examining between-subjects differences in perceived social support indirectly supports this hypothesis. In a study of Twitter users, those who were currently suicidal reported significantly less belongingness and higher burdensomeness than nonsuicidal users (Braithwaite, Giraud-Carrier, West, Barnes, & Hanson, 2016). In another study, perceived social support from family was lower for hospital emergency department patients with as opposed to without a past suicide attempt (Thompson, Kaslow, Short, & Wyckoff, 2002). In the current study, we tested whether suicide attempters demonstrated greater signs of disengagement from and burdensomeness on their social support networks before suicide attempts compared with other episodes by examining patterns in outgoing compared with incoming messages.
Overview and Hypotheses
In this pilot investigation, we proposed a novel way of using digital data streams, specifically private electronic communication, to identify unique textual patterns that occur in advance of suicide attempts and during periods of heightened suicide risk. Specifically, we asked participants with a history of suicide attempt or attempts to retrospectively identify and characterize different periods of their lives: suicide attempts (defined as actual, interrupted, and aborted attempts), suicide ideation, depressive episodes, and periods of positive mood. We then quantitatively compared whether and in what ways their text messages from periods of acute suicide risk (2 weeks preceding a suicide attempt as a means to capture the period of escalation toward a suicide attempt) differed from other periods of moderate (suicide ideation) or minimal/no suicide risk (depressed/positive mood).
We analyzed and compared text messages during episodes within-persons on the basis of a number of features selected a priori and tested not only for overall mean differences between episodes but also for differences in day-to-day change over time. In this way, we aimed to combine a rich digital data set and quantitative text-analytic methods with laboratory-research methodology to address a critical public health problem in the service of improving the ability to assess and identify suicide risk in real time.
Although no studies to date have examined text-messaging content or any other private (i.e., not publicly available) electronic personal communications, we had several hypotheses based on the psychological theories of suicide previously discussed. Using a within-subjects approach, we hypothesized that characteristics of text-messaging content during periods of higher suicide risk would differ from those of lesser risk. Specifically, messages before a suicide attempt would demonstrate increased self-focus (i.e., greater singular first-person pronoun usage), greater negative emotional content (i.e., greater frequency of negative affect words in general and specifically related to anxiety, anger, and sadness and lesser frequency of positive affect words), and decreased social engagement (i.e., lower ratio of sent vs. received text messages).
Numerous aspects of this research were exploratory by nature. Given the lack of prior research examining when one might expect any differences to emerge before a suicide attempt, we did not have hypotheses on whether language differences would be observed for episodes overall (means) but not changes over time (slopes) or vice versa. We also did not have hypotheses about the pattern of any observed differences among episode-type comparisons, such as whether differences would be unique to suicide attempts (differentiated from all other episode types) or shared between suicide attempt and ideation episodes (differentiated only from depressed and positive episode types).
In this study, we aimed to use ecological private SMS data to gain insight into possible novel, real-time digital biomarkers of suicidal behaviors. By better understanding how language differs and changes as suicide risk increases, it may eventually become possible to develop more accurate and objective tools to determine level of suicide risk in real time and get individuals the help they need before they attempt suicide.
Method
Participants and recruitment
A sample of 33 participants with at least one reported past suicide attempt was recruited from the University of Virginia Psychology Department’s participant pool and from the Charlottesville community. To reach the target recruitment of the lab study, 2,377 individuals were screened online, and 77 individuals were screened by phone. See Figure S1 (in the Supplemental Material available online) for a CONSORT diagram detailing specific numbers and reasons for exclusion.
Materials
Prelab study screeners
Online screening surveys
Participant pool participants were selected on the basis of two surveys (full screening surveys and other study materials are available on OSF at https://osf.io/9f3v2/). On the participant pool pretest administered at the beginning of the semester (Survey 1), participants were asked, “Have you ever had a period of sadness in the past during which you felt hopeless?” and if so, they were then asked whether they would like to be contacted about possible participation in studies that ask more questions about this period of time in their life. Those who said yes to both pretest questions were e-mailed a link to an additional two-question survey (Survey 2), which asked, “Have you ever made a suicide attempt?” and “Have you ever had thoughts of wanting to kill yourself?” Those endorsing a past suicide attempt were e-mailed and invited to participate in a phone screen to determine whether they qualified for the study.
Phone screen
The purpose of the phone screen was to provide potential participants with more information about the study and to ensure inclusion criteria were met. Inclusion criteria were (a) group status confirmed by report of past suicide attempt, (b) adult status (≥ 18 years old), (c) availability of and access to personal-messaging data dating back before significant life events (e.g., suicide attempt), and (d) minimal or no current desire to die (i.e., ≤ 5 on a 0–10 Likert scale and no current suicide plan/intent). Any participants with intense thoughts of suicide who were determined to be at “high risk” or “imminent risk” for suicidal behavior (as determined by a suicide risk assessment instrument) were excluded from study participation and referred for clinical care. Given that we were interested in collecting and analyzing text communications made before suicidal or other events, participants were excluded if they did not have access to at least one data service type (e.g., text messages, Facebook) dating back to before their most recent suicide attempt.
Communications data collection
Participants downloaded their communication data in the lab with the experimenter’s assistance, which ensured transparency throughout the process. SMS text message data from iPhone and Android phones were accessed using third-party software or phone applications. Specifically, participants with iPhones were instructed to download their SMS text messages using software programs iExplorer (https://macroplant.com/iexplorer), SynciOS Manager, and SynciOS Data Recovery (https://www.syncios.com), and those with Android devices were instructed to download their messages using Android mobile apps SMS Backup & Restore (https://synctech.com.au/sms-backup-restore/) and SMS to Text (discontinued). Although most text data were successfully extracted from both Android and iPhone devices, a number of iPhone users had encryption settings on their phones that prevented third-party iPhone software from accessing text data.
Participants were asked to bring into the lab as many other devices as they thought might contain digital data (e.g., laptops with iTunes backup, older phones), and all available SMS data from each device were downloaded (i.e., not those from only certain dates or certain recipients). Attempts were also made to retrieve data via cloud storage (e.g., iCloud) for messages not available on physical devices (e.g., if a participant had a relatively new phone). Although participants were generally interested in providing as much information as possible, efforts to collect additional messages outside of their current mobile device were minimally successful. Only a small number of participants brought in or had access to old phones. In addition, many participants did not know whether they had laptop or cloud backups, and among those who did, a number could not remember their encryption password necessary to access iTunes/iCloud.
Additional forms of personal digital data, including phone call history, Google data (Gmail, Hangouts, and Chrome browser history), Facebook messages, and Twitter messages, were collected for the purposes of future analyses but are not part of the current study. After the lab session, the raw downloaded text data were transferred to and stored on a secure server intended for the storage of sensitive information. This data storage plan was reviewed and approved by the institutional review board.
Interview and episode identification
The goal of the laboratory-based interview was to learn about past suicidal and nonsuicidal events in greater detail so that digital communication made during these events, just before them, or both could later be compared using text-analytic techniques. During the interview, participants were asked to identify a number of specific events or episodes in the past and the calendar dates during which the episodes took place. Episodes included (a) past actual, interrupted, or aborted suicide attempts, using the 2-week period before the attempt as a “suicide attempt episode”; (b) 2-week episodes of suicide ideation (not resulting in a suicide attempt); (c) 2-week episodes of depressed mood or high stress (not resulting in suicide ideation or attempt); and (d) 2-week episodes of positive mood (i.e., more positive mood than usual and no ideation or attempt). (Note that reported “suicide attempts” also included incidents in which no physical attempt was enacted but in which participants considered their planning or actions to constitute a higher level of suicidal intent than “suicide ideation,” so they subjectively endorsed making an attempt on the screener questions.) The decision to set each episode at 2 weeks long was made in a conservative effort to capture the critical period of increased ideation, planning, and intent leading up to a suicide attempt (Millner et al., 2016). The question of the optimal time window was also assessed empirically, using visual inspection of descriptive figures to identify when the rate of change appeared to be most pronounced. Specifically, temporal visualizations, using a smoother (loess) function, were constructed for each variable of interest and consisted of graphs plotting daily means for the given variable during the 30 days before and after the attempt. Although the pattern differed considerably from one variable to the next, changes tended to occur most notably and often around 7 days before the suicide attempt.
A total of 3 to 12 episodes were collected for each participant depending on the presence and number of reported events. For each episode type, we asked about a maximum of three episodes (e.g., three suicide attempts if a participant had three or more lifetime attempts).
Critically, classifications of these reported episodes became the basis of the study’s within-subjects design; episode type served as our main predictor variable and language characteristics of text messages during the episodes as the outcome variables. In terms of suicide risk levels, attempt episodes were considered “high risk,” ideation episodes were considered “moderate risk,” and depressed and positive episodes were considered “minimal/no risk.” Participants answered additional questions about each reported episode as a way to minimize the risk of misclassification, including episode-specific questions about suicidal thoughts and behaviors, depression/anxiety symptoms, and state mood. Additional descriptive characteristics are available as Supplement 1 in the Supplemental Material.
General questionnaires
Participants completed a number of general questionnaires at the end of the study, which were used to characterize the sample and were not tied to specific episodes. Specifically, participants provided information about their age, gender, race/ethnicity, citizenship, education, marital status, employment status, and living situation. Participants also reported on current and past treatment experience (e.g., medications, therapy) and psychiatric diagnoses. Some items related to symptom history were adapted from the screener sections of the World Health Organization World Mental Health-Composite International Diagnostic Interview (Kessler et al., 2004; World Health Organization, 2014).
The Self-Injurious Thoughts and Behaviors Interview (Nock, Holmberg, Photos, & Michel, 2007) was used to assess participants’ history of self-injurious thoughts and behaviors. Participants were asked to rate the presence and frequency of each behavior (i.e., nonsuicidal self-injury, suicide ideation, suicide plan, suicide attempts, subset of suicide attempts requiring medical attention) within the past month, the past year, the past 3 years, and lifetime.
Procedure
Participants were invited into the laboratory for one 2- to 2.5-hr session to complete several tasks. First, participants were instructed to download their private data sources (e.g., SMS). Second, participants were interviewed by the experimenter and asked to identify a number of episodes, including dates of past suicide attempts (and interrupted or aborted attempts) and 2-week episodes of suicide ideation, depressive mood, and positive mood. Participants were then asked to describe specific details of and context surrounding each time period. Last, participants completed the aforementioned questionnaires.
Risk assessment
Prior research indicates that asking young adults with previous suicide attempts about suicide does not cause an increase in psychological distress or increased suicidal thoughts or behaviors either immediately following an assessment (Gould et al., 2005) or several years after an assessment (Reynolds, Lindenboim, Comtois, Murray, & Linehan, 2006). However, as a precautionary measure, participants were asked two questions regarding negative mood and desire to die both at the beginning and at the conclusion of the lab session to assess any changes as a consequence of the interview and study visit. Those who significantly increased in negative affect or suicidality (i.e., any increase of 2 points or greater on the 0–10 negative mood and desire to die rating scales), were elevated in current suicidality (i.e., score greater than 3 on question about desire to hurt self), or both were administered a formal suicide risk assessment and assigned a risk level on the basis of their answers. The protocol was that those considered at moderate risk would be assisted in developing a safety plan, or a series of steps to take to keep one safe when feeling suicidal; those considered at high risk or imminent risk of suicide would be asked to provide a contact number and would be immediately contacted by the laboratory director, a licensed clinical psychologist (although no participants ended up being at high or imminent risk).
Plan for analyses
Data preparation and scoring
Given that the format of SMS data differed between iPhone and Android phones, participants’ SMS data files were individually cleaned using Python to standardize the encoding of messages and naming of variables across all participants. SMS data were then merged with the participant and episode information collected during the lab study. Each individual SMS message was inputted into and scored using the 2015 version of LIWC, a language-analysis software package that calculates numeric values on the basis of the properties of the text (Pennebaker et al., 2015). The majority of LIWC variables calculate scores using the percentage of words belonging to a given category (e.g., score of 60.0 indicates 60% of words in the message belonged to the given category). These LIWC percentage scores were converted into counts and then entered as proportions in our models (i.e., number of words in a given text message belonging to vs. not belonging to the LIWC category); this approach was taken to more precisely capture the constructs of interest and appropriately weight LIWC values given the frequency of words in a given message. Other LIWC variables include counts of words (e.g., word count of message, words per sentence) and several proprietary, “nontransparent” variables (e.g., tone); for the specific LIWC variables calculated for the three psychological constructs, see Table S3 in the Supplemental Material. After this information was appended, the individual SMS files were then compiled, and identifying information (e.g., message content, sender/recipient names) was removed before analysis.
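The conversion from percentage scores to counts described above might look like the following minimal sketch. The function name and the rounding rule are our assumptions for illustration; the source does not specify how the conversion was implemented.

```python
def percent_to_counts(liwc_percent, word_count):
    """Convert a percentage score into (in-category, out-of-category) counts.

    Rounding to the nearest whole word and clamping to the message length
    are assumptions made for this sketch; reported percentages are rounded,
    so the product is not always an exact integer.
    """
    in_category = round(liwc_percent * word_count / 100)
    in_category = max(0, min(word_count, in_category))
    return in_category, word_count - in_category


# A 5-word message scored at 60.0 has 3 category words and 2 other words.
print(percent_to_counts(60.0, 5))
```

Keeping both counts, rather than the percentage alone, lets a binomial model weight a long message more heavily than a short one with the same percentage.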
Preliminary analyses
Preliminary descriptive analyses on demographic information, mental health and suicide history, and other information pertinent to the primary analyses (e.g., iatrogenic effects of the lab study) were performed on the sample of participants contributing at least one episode of messaging data.
Primary analyses
We performed inferential analyses using mixed-effects models to examine within-subjects (between-episodes) differences among suicide attempters in several communication features. In these analyses, we focused on testing for both mean and slope differences between suicide attempt and other episode types (suicide ideation, depressed mood, positive mood) for the three previously discussed psychological constructs given our interest in understanding whether and how communication patterns differ across episodes and whether there are language patterns unique to being in an imminent suicidal state.
Advantages of mixed-effects models
A mixed-effects method was selected because of its well-established advantages in terms of producing more accurate effect estimates and its ability to handle missing data, nonnormal outcome data, and unbalanced classes (Baayen, Davidson, & Bates, 2008; Dixon, 2008; Jaeger, 2008). Using mixed models allowed us to account for variability among participants, episodes, and messages, leading to more accurate and generalizable population estimates for within-subjects effects of episode type and resolving nonindependence of the nested data. This approach also allowed us to maximize sources of variance by analyzing on the message level rather than only mean values by episode and was especially appropriate for this data set given the amount of “missing data” (i.e., participants varied widely in terms of the number of episode types they reported and for which they had text data).
Selection of random effects and specification of models
In line with recommendations by Barr, Levy, Scheepers, and Tily (2013), a maximal-random-effects model, which included random intercepts of participant, episode, and message, and random by-participant slopes, was used to boost generalizability of the findings and protect against inflated Type I error rates. Models were fitted using the lme4 package (Version 1.1-21; Bates, Maechler, Bolker, & Walker, 2019) for the R software environment (Version 3.6.1; R Core Team, 2019). Generalized linear mixed models used the glmer function with a logit link function appropriate for binomially distributed data, which transformed parameters into log-odds units. The estimated regression coefficients produced were therefore on a log-odds scale. For this set of analyses, a significant result means that the odds of a category-specific word appearing in text messages differed as a function of episode type. For outcome variables that were not proportion scores (e.g., number of messages sent per day), the raw continuous variable was used in a linear mixed-effects model, using the lmer function. (For additional information about mixed-effects models, see Supplement 2 in the Supplemental Material.)
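To make the log-odds units concrete, the following sketch converts a coefficient from a logit-link model into an odds ratio and converts a log-odds value back into a probability. The function names and the example coefficient are hypothetical and are not estimates from this study.

```python
import math


def odds_ratio(log_odds_coefficient):
    """Exponentiate a log-odds coefficient to obtain an odds ratio."""
    return math.exp(log_odds_coefficient)


def inverse_logit(log_odds):
    """Map a log-odds value back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficient for illustration, not a result from the study.
example_coefficient = 0.25
print(round(odds_ratio(example_coefficient), 2))
```

A coefficient of 0 corresponds to an odds ratio of 1 (no difference between episode types), positive coefficients correspond to higher odds that a word belongs to the category, and negative coefficients to lower odds.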
Plan for primary analysis 1: Does language differ between episode types?
A series of mixed-effects models were performed with episode type as a within-subjects fixed effect (four levels: attempt, ideation, depressed, positive) and the language feature of interest as the outcome variable. A random by-participant slope for episode type and random intercepts of participant, participant-episode, and message were included as the random effects. Likelihood-ratio (Wald χ2) tests were performed to compare goodness of fit for models including and excluding the fixed effect of episode type. A significant test is conceptually similar to an omnibus test for a predictor in an analysis of variance and therefore indicates whether the inclusion of the fixed effect significantly improves model fit. Any significant tests were followed up with pairwise comparisons between each of the episode types (using z and t statistics for binary/proportion and continuous outcome variables, respectively). Mixed-effects models do not yield straightforward effect size statistics, unlike other regression models (e.g., R2), and there is not a consensus on the most appropriate approach to take (see Peugh, 2010). Here, β (standardized values of the model parameter estimates in log odds units) are reported and serve as effect sizes.
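The model-comparison step can be illustrated with a short sketch of a likelihood-ratio test, assuming the two nested models have already been fitted and their log-likelihoods are available. The log-likelihood values below are hypothetical; the closed-form tail probability applies only to a χ2 distribution with 3 degrees of freedom, which matches the four-level episode-type factor.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Uses the closed form for df = 3 only:
    erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def likelihood_ratio_stat(loglik_reduced: float, loglik_full: float) -> float:
    """Twice the gain in log-likelihood from adding the fixed effect."""
    return 2.0 * (loglik_full - loglik_reduced)


# Hypothetical log-likelihoods for models without and with episode type.
ll_without_episode_type = -1250.00
ll_with_episode_type = -1245.63

stat = likelihood_ratio_stat(ll_without_episode_type, ll_with_episode_type)
p_value = chi2_sf_df3(stat)
print("chi-square(3) =", round(stat, 2), "p =", round(p_value, 3))
```

A small p value indicates that including episode type improves model fit beyond what would be expected by chance.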
Plan for primary analysis 2: Does language approaching a suicide attempt change differently over time relative to language changes during other episode types?
Similar to the first set of analyses, the second set focused on testing for differences between suicide attempt episodes and other episode types but did so by examining differences in changes in communication over time (rather than overall mean differences) between episode types. The purpose of this second set of analyses was to examine whether communication changed differently during the 14 days leading up to a suicide attempt compared with changes during other 2-week periods for episode types of lower suicide risk (for which there is no theoretical expectation of temporal change). A series of mixed-effects models was performed for each language feature with three fixed effects: episode type (four levels: attempt, ideation, depressed, positive), day of episode (numerical factor ranging from −14 to 0), and the interaction of episode type and day. The maximal-random-effects model appropriate for these data included random intercepts of participant, participant-episode, and message and random by-participant slopes for episode type, day of episode, and Episode Type × Day interaction. Models with the full set of random slopes did not converge; therefore, only a by-participant slope of episode type was included in the final model. Although episode type and day were included as fixed effects in the models, we were interested only in the interaction term given that we evaluated episode type separately already and did not have a theoretical interest in time as an independent variable. The same procedures were used to evaluate goodness of fit of the fixed effects and pairwise contrasts for any significant interactive effects between attempt by time and other Episode Type × Time interactions.
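The logic of the Episode Type × Day interaction can be sketched in simplified form as a difference between slopes. The example below uses ordinary least squares on made-up daily rates and ignores all random effects, so it is a conceptual illustration rather than the mixed-effects model described above; all names and data are hypothetical.

```python
def ols_slope(days, values):
    """Ordinary least-squares slope of values regressed on days."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    sxx = sum((x - mean_x) ** 2 for x in days)
    return sxy / sxx


# Day of episode, from 14 days before the index date (-14) to the index date (0).
days = list(range(-14, 1))

# Made-up daily proportions of category words (illustration only).
attempt_rates = [0.020 + 0.001 * (d + 14) for d in days]  # rises toward day 0
ideation_rates = [0.030 for _ in days]                    # flat over the period

slope_attempt = ols_slope(days, attempt_rates)
slope_ideation = ols_slope(days, ideation_rates)

# The interaction contrast asks whether the two slopes differ.
slope_difference = slope_attempt - slope_ideation
print("attempt slope:", round(slope_attempt, 5))
print("ideation slope:", round(slope_ideation, 5))
print("difference in slopes:", round(slope_difference, 5))
```

In the full analysis, the interaction term plays the role of this slope difference while also accounting for variability among participants, episodes, and messages.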
Condition comparisons using alternate data subsets
Additional analyses were performed to examine the same set of outcome variables but in different ways to examine the robustness of the primary results and to see whether the pattern of results changed. Those analyses are available in Supplement 3 in the Supplemental Material.
Sample size and power considerations for mixed-effects designs
Given the large effect sizes reported in a number of studies examining some of the same LIWC variables, such as self-focus (ds = 1.06–1.31; Stirman & Pennebaker, 2001; Venek et al., 2014), sentiment (ds = 0.88–1.21; Venek et al., 2014), and constructs related to social engagement, such as belongingness (d = 1.52; Braithwaite et al., 2016), we conducted a power analysis that was based on the assumption of large effect sizes (although no prior studies have examined within-subjects differences, which may or may not vary substantially from between-subjects comparisons). We determined that a sample size of 30 suicide attempters (representing 60 attempts but only 20 with collected, usable SMS data, divided by the design effect) would provide enough power to detect only large effect sizes (Cramer’s V = 0.29) for χ2 tests of mixed-effects models comparing suicide attempts with other types of episodes, assuming 80% power and a significance level of .05. Therefore, it should be noted that our study was underpowered to detect small- or medium-sized effects and therefore prone to Type II error. In addition, even among those variables for which there is theoretical reason to expect large effects, most of this research relied on between-subjects rather than within-subjects designs. Thus, it was unknown whether previously observed effect sizes would hold when we compared within-subjects episodes as we did in this pilot study.
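As a rough illustration of why clustered observations carry less information than independent ones, the sketch below applies the standard design-effect formula, 1 + (m − 1) × ICC. The formula is assumed here for illustration; the text does not state which design-effect calculation was used, and the cluster size and intraclass correlation shown are hypothetical.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Standard design effect for clustered data: 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n_observations: float, avg_cluster_size: float, icc: float) -> float:
    """Number of independent observations that the clustered sample is worth."""
    return n_observations / design_effect(avg_cluster_size, icc)


# Hypothetical inputs for illustration only.
n_episodes = 60             # total episodes across participants
episodes_per_person = 2.0   # assumed average cluster size
assumed_icc = 0.5           # assumed within-person correlation

print("design effect:", design_effect(episodes_per_person, assumed_icc))
print("effective n:", effective_sample_size(n_episodes, episodes_per_person, assumed_icc))
```

The stronger the within-person correlation, the larger the design effect and the smaller the effective sample size available for detecting an effect.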
Results
Preliminary analyses
Sample characteristics
As shown in Table 1, 33 participants reported having made an enacted, interrupted, or aborted suicide attempt in the past. Most of the participants were women, White, and college-aged. As expected on the basis of recruitment criteria, all participants reported a history of at least one actual/enacted, interrupted, or aborted suicide attempt (about 80% reported making an actual suicide attempt), and about half reported a history of nonsuicidal self-injury. The majority of participants reported having struggled with a mental health problem during their lifetime, and a little over half reported having a diagnosis, most commonly a mood disorder, anxiety disorder, or both.
Table 1. Participant Characteristics (N = 33)
Note: Values are percentages unless otherwise noted.
Episode characteristics
A total of 293 episodes were collected across all participants. Slightly more episodes were reported for nonsuicidal periods (i.e., depressed/positive mood) compared with suicidal periods (attempt/ideation). Among the episodes queried during the lab interview, 134 (46%) episodes contained SMS data collected from 27 different participants. Among these 27 participants, 15 had data from at least one reported suicide attempt; the other 12 participants contributed data from only suicide-ideation, depressed-mood, and/or positive-mood episodes and were still included in the analyses. Participants reported and had SMS data for approximately 1.5 episodes for each episode type on average. Across all episodes, 189,478 text messages were collected and analyzed, ranging between about 1,200 and 1,600 text messages per episode on average. Participants were generally confident about the accuracy of the dates they selected and rated 90.3% of episodes as very certain (i.e., exact days) or somewhat certain (i.e., may be off by a few days). Consistent with expectations, severity of suicide risk level of episode type was associated with greater suicide ideation and negative mood symptoms (for additional descriptive information and analyses of episodes, see Supplement 1 in the Supplemental Material).
Characteristics of suicide attempts
Additional descriptive information about reported actual, interrupted, or aborted suicide attempts was collected to better understand the methods used and circumstances leading to the attempts as well as to validate that the level of suicidal intent and lethality associated with the attempts was high (for additional information, see Table S4 in the Supplemental Material). Among attempters with SMS data, 13 participants reported a single lifetime attempt, and 14 reported multiple attempts (M = 1.74, SD = 0.86). Among the 21 attempts with corresponding SMS data, the most common methods reportedly used or considered/aborted were medications/overdose (43%), hanging/suffocation (38%), and jumping from a height (24%). On the basis of participants’ description of the objective lethality of the attempt, 29% of incidents involved some action being taken in which some physical harm was caused (e.g., taking a higher than normal dose of medication, resulting in nausea and light-headedness), whereas 71% of incidents did not result in any physical harm (e.g., driving to a high place, such as a bridge, but deciding not to jump). Notwithstanding the low rates of actual physical harm, participants reported fairly high intent to kill themselves (M = 3.95, SD = 0.86), and subjective judgment of the lethality of the suicide method was reasonably high (M = 3.38, SD = 0.86), which confirmed the serious nature of the attempt episodes. Furthermore, a specific suicide plan (i.e., time and place) was present for most of the attempts (76%).
Iatrogenic effects
Regarding possible iatrogenic effects of the study, mood reported before the study (M = 6.33, SD = 1.38) did not significantly change afterward (M = 6.12, SD = 1.17), t(32) = 1.05, p = .304, but desire to die had a slight but significant decrease from before the study (M = 0.82, SD = 1.10) to afterward (M = 0.61, SD = 0.90), t(32) = 2.23, p = .033.
Results of primary analysis 1: Does language differ between episode types?
When testing whether language use was associated with episode type, analyses revealed no significant fixed effect of episode type for variables reflecting self-focus and social engagement (see Tables 2 and 3 for full results).
Table 2. Analysis of Deviance Table for Fixed Effect of Episode Type on Self-Focus, Sentiment, and Social-Engagement Variables
Note: Pairwise comparisons performed only for significant fixed effects. In all cases, χ2 test df = 3. Values in boldface type are statistically significant.
Table 3. Follow-Up Pairwise Comparisons for Sadness
Note: Values in boldface type are statistically significant.
Regarding sentiment, there was a significant effect of episode type on use of sad words, χ2(3) = 8.74, p = .033. Pairwise comparisons revealed that attempt episodes were significantly higher in sad words compared with ideation (z = 2.64, p = .008) but not compared with depressed or positive mood (zs = 1.07–1.09, ps = .277–.285). No significant effects by episode type emerged for more general emotional content (i.e., positive emotion, negative emotion, emotional tone) or other specific emotions/constructs (i.e., anxiety, anger, death).
Taken together, these results suggest that when examining overall mean differences between episode types, there are few differences, although suicide attempts appear to be associated with greater use of language indicating sadness. However, language indicating sadness was not unique to suicide attempts (i.e., attempts were higher in sad words compared with ideation but not depressed or positive mood).
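For readers who wish to check how the pairwise z statistics relate to their p values, the following sketch converts a z statistic to a two-tailed p value under a standard normal distribution. The function name is illustrative. Because the z values reported above are rounded to two decimals, recomputed p values may differ from the reported ones in the third decimal.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p value for a z statistic under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z statistics reported above for the sadness comparisons.
for z in (2.64, 1.07, 1.09):
    print("z =", z, "p =", round(two_tailed_p(z), 3))
```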
Results of primary analysis 2: Does language approaching a suicide attempt change differently over time relative to language changes during other episode types?
When testing whether language use changed over the course of the 2 weeks leading up to a suicide attempt differently relative to change during the 2 weeks identified for other episode types, no significant interactions emerged for any of the variables related to social engagement (i.e., daily word/message counts for and ratio between outgoing and incoming messages). However, analyses revealed several significant Episode Type × Time interaction effects for the other two constructs of interest (see Table 4 for full results).
Table 4. Analysis of Deviance for Interactive Effects of Episode Type and Time for Self-Focus, Sentiment, and Social Engagement Variables
Note: Pairwise comparisons were performed only for significant interactions. Follow-up pairwise comparisons are double-indented. Values in boldface type are statistically significant.
Regarding self-focus, there was a significant interaction for singular first-person pronoun use, χ2(3) = 11.57, p = .009, and pairwise comparisons revealed that the change over time approaching a suicide attempt significantly differed from change over time for ideation (z = 2.99, p = .003) and depressed mood (z = 2.10, p = .035) but not for positive mood (z = 0.89, p = .374). As shown in Figure 1a, self-focus tended to increase preceding an attempt, whereas depression and positive episodes were flatter, and ideation appeared to show change over time in a downward direction.
Fig. 1. Differences in Episode Type × Day of Episode interaction for (a) first-person pronoun use, (b) positive emotion, (c) anger, (d) death words, and (e) emotional tone.
Regarding sentiment, the use of words indicating positive emotion, anger, death, and emotional tone changed over time differently as a function of episode type. There was a significant interaction for positive emotion, χ2(3) = 41.67, p < .001, in which positive emotion decreased more steeply during attempt episodes compared with all other episode types, including ideation (z = 1.96, p = .049), depressed (z = 5.92, p < .001), and positive (z = 3.50, p < .001) episodes (Fig. 1b). There was also a significant interaction for anger words in which anger increased more steeply during attempt compared with all other episode types, including ideation (z = 2.00, p = .046), depressed (z = 2.49, p = .013), and positive (z = 2.54, p = .011) episodes (Fig. 1c). There was also a significant interaction for death words, χ2(3) = 9.47, p = .024. However, there were no significant differences in change over time for attempt compared with all other episodes (zs = 0.33–1.83, ps = .067–.739; Fig. 1d); ideation episodes appeared to show a steeper decrease in death words compared with both attempt and positive mood episodes. Lastly, there was a significant Episode Type × Time interaction effect for emotional tone, χ2(3) = 26.81, p < .001, in which attempt episodes decreased in the level of positive, upbeat language over time compared with depressed episodes, t = 4.27, p < .001, but not compared with ideation episodes, t = 1.48, p = .140, or positive episodes, t = 0.86, p = .390 (Fig. 1e).
Taken together, the results suggest that communication may change in different ways during the time leading up to a suicide attempt compared with other times of lesser suicide risk. Unique to attempts, positive emotion decreased and anger increased to a greater extent as participants approached a suicide attempt relative to the other episode types. In addition, self-focus appeared to change over time differently for attempts compared with ideation and depressed but not positive episodes, which suggests that self-focus did not uniquely distinguish attempts from other episodes.
Discussion
In this pilot study, we examined private electronic communication from past suicide attempters as a potential source of real-time digital biomarkers of heightened suicide risk. We employed a within-subjects design to evaluate how language use in text messages differed and changed over time just before a suicide attempt (high risk) relative to other periods when participants had suicidal thoughts but did not attempt (moderate risk), or were depressed but not suicidal, or during periods of positive mood (low/minimal risk). We used an automated language-analysis software (LIWC) to produce scores on a set of variables intended to capture three psychological constructs of interest—self-focus, sentiment, and social engagement—and then tested both for overall mean differences in language use and for differences in changes over time during the 2 weeks before a suicide attempt relative to during other episode types.
In terms of overall mean differences, few reliable differences emerged, although results indicated that the period of high risk just before a suicide attempt was associated with messages indicating greater anxiety and sadness. However, none of these differences in language use were uniquely associated with suicide attempt episodes and are therefore not specific characteristics of a high suicide risk state. Although language use was different between attempt and ideation episodes on a number of language features, such as sadness, these differences did not hold when we compared attempts with other nonsuicidal episodes. Therefore, these mean differences analyses were unable to identify language features in text messages that could reliably identify high or moderate suicide risk states.
However, when examining differences in patterns over time, results suggested that communication changed in different ways during the time leading up to a suicide attempt compared with other periods of lesser risk. Unique to attempts, anger increased and positive emotion decreased to a greater extent during the 2 weeks before suicide attempts relative to the other episode types. Language indicating self-focus tended to increase over time during attempt episodes, although the trajectories for these variables could not reliably differentiate high suicide risk from other risk states. Overall, these results indicate that a small set of specific private text communication habits, particularly tied to use of emotional language, potentially provide clues into the suicidal mind and may serve as temporally sensitive markers of suicide risk.
Digital communication patterns as novel markers of risk
Self-focus
Operationalized by singular first-person pronoun use (“I” words), we hypothesized that self-focus would be greater during suicide attempt episodes relative to other episodes. This hypothesis was based on a theory construing suicide as a means to escape from negative self-focus (Baumeister, 1990) and the vicious feedback loop created from increasing self-focus and recognition of self-failures as well as previous research demonstrating more self-focused communication among suicidal individuals (Stirman & Pennebaker, 2001; Venek et al., 2014). Results indicate that self-focus was not especially pronounced for attempt episodes when considering the entire 2-week episode, but self-focus appeared to increase during those 2 weeks. However, this increase was steeper relative only to ideation and depressed mood, not relative to positive mood.
This result ran contrary to our hypothesis and the general premise that degree of self-focus would be expected to map onto risk levels (i.e., no, low, moderate, high) in a linear manner. However, prior research suggests a possible reason why a language variable such as self-focus may behave similarly for attempt and positive episodes despite them being on opposite ends of the risk scale. Agitation and anxiety are better predictors of suicide attempt than a clinical diagnosis of depression (Busch, Fawcett, & Jacobs, 2003; Nock, Hwang, et al., 2010), which is in line with the thinking that a suicide attempt, in contrast to suicide ideation or depression, is a behavior that requires energy and activation to enact. Accordingly, it is plausible that there are similarities between psychological states during attempt and during positive episodes, which may be reflected by similarities in language use. This result potentially suggests that self-focus may be especially sensitive to changes in risk related to increased energy or activation and that the “signal” may be detectable only by examining subtle temporal changes. Even so, this interpretation is based on speculation, and change in self-focus over time did not differentiate high risk from other lower risk states, which reflects limits to its current utility as a means to identify individuals at risk of suicide attempt.
Sentiment
We hypothesized that text message communication before suicide attempts (relative to other time periods) would exhibit more negative sentiment (higher negative emotion and lower positive emotion) given prior research identifying a number of affective and emotional factors associated with suicide risk. One question we had was whether language reflecting negative emotion (or lack of positive emotion), which is a common feature of many psychological disorders (Brown, Chorpita, & Barlow, 1998), could be used to make fine-grained distinctions between levels of suicide risk. Although suicide attempts were generally associated with higher levels of sadness and anxiety (but with significant overlap with other episodes), suicide attempts were uniquely associated with changes over time in both positive emotion and anger. Specifically, decreases over time in positive emotion and increases over time in anger were markedly steeper leading up to a suicide attempt compared with other episodes.
A difference between episode types in anger is not entirely surprising given prior research has found an association between trait anger and suicide attempts (Ammerman, Kleiman, Uyeji, Knorr, & McCloskey, 2015; Daniel, Goldston, Erkanli, Franklin, & Mayfield, 2009; Hawkins & Cougle, 2013) and as previously discussed, the fact that suicide attempts may require an increase in activation to enact. However, the fact that such a difference emerges only when looking at language use over time (i.e., leading up to a suicide attempt) underscores the potential importance and utility of examining risk factors for suicide attempts dynamically. Likewise, it was somewhat unexpected that differences between suicide attempt and other episodes for lower positive emotion emerged only when examining change over time given that one might expect a more persistent lack of positive emotion in language during the 2 weeks before a suicide attempt. These findings raise the intriguing possibility that psychological constructs such as sentiment, which do not seem especially specific to suicide, may serve as unique indicators of high suicide risk when examined over time in high-risk populations.
Use of death-related words did not significantly differ between episodes, which suggests the need to identify hidden and more subtle signs of risk beyond explicit endorsements of suicide-specific language. Given that this was a small pilot study, more research is necessary, but these findings underscore the utility of identifying risk markers using real-time data and raise the possibility that specific, temporally sensitive markers of suicide risk may be found in seemingly general, trait-like psychological constructs, such as sentiment and emotion.
Social engagement
The interpersonal theory of suicide (Joiner, 2005; Van Orden et al., 2010) proposes that the motivation behind suicide is driven by feelings of thwarted belongingness and perceived burdensomeness. In theory, social support should combat such feelings (of thwarted belongingness and perceived burdensomeness) and increase feelings of connectedness. We hypothesized that suicide attempt episodes would demonstrate greater signs of social disengagement. Results did not support this hypothesis. Differences in counts of and ratios between sent and received text messages between attempts and other episodes did not emerge whether we looked at episodes overall or over time.
Although these data do not provide evidence that communication habits, separate from language content, may be useful indicators of suicide risk, it is possible that these particular methods for capturing social engagement were too basic to detect any meaningful signal. For example, we examined only aggregate information about incoming and outgoing messages and were not able to examine more fine-grained details about these interactions, such as who was initiating conversations, whether certain texts to participants were going unanswered, or whether the content of texts may have indicated signs of social distress or rejection. Future studies on these or other data could examine more intricate interpersonal dynamics to better understand whether other social factors may help identify signs of heightened suicide risk.
Future directions in textual analyses to enhance suicide prediction
There are a number of exciting questions to pursue in the future. In this study, we limited our linguistic analyses to a handful of categories according to the presence of single words in a custom dictionary (LIWC). One weakness of this approach is that examining single words in isolation can fail to capture the semantic context of the word (e.g., “not happy” would count as positive emotion because negation is not accounted for); in addition, the dictionary has not been modified to include language categories tailored toward constructs of interest for suicide specifically. Future studies could examine two-word or three-word phrases (called n-grams) or word embeddings, popular natural-language processing methods, to capture more semantic meaning. Even more, it may be possible to take a qualitative coding approach whereby researchers could develop a codebook of themes (e.g., relationship distress, hopelessness) and raters could then blindly code episodes to see whether episodes differ thematically. Furthermore, data-driven methods for understanding attempt episodes include unsupervised learning techniques, such as topic modeling, to identify themes in communication before suicide attempts that might provide richer descriptions of the themes and inform future predictors of interest (for a useful review of methods for analyzing social media language, see Kern et al., 2016). A future study could also examine how language and communication habits change following a suicide attempt, which could provide potentially valuable insight into the psychological effects of suicidal behaviors. It should be acknowledged, however, that data collection of SMS and other private digital communication is likely to remain an ongoing challenge for researchers, particularly given that younger populations transition to closed ecosystems and proprietary platforms (e.g., Snapchat) for their interpersonal communication.
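The negation problem with single-word counting, and how two-word phrases can help, may be sketched as follows. The word lists are a hypothetical toy lexicon created for illustration and are not the LIWC dictionary; the negation rule (skipping a positive word when the preceding word is a negator) is likewise a simplifying assumption.

```python
import re

# Hypothetical toy lexicon for illustration only (not the LIWC dictionary).
POSITIVE_WORDS = {"happy", "good", "great"}
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_positive_count(tokens):
    """Count positive words one at a time, ignoring context."""
    return sum(1 for token in tokens if token in POSITIVE_WORDS)


def bigrams(tokens):
    """Return all adjacent two-word sequences."""
    return list(zip(tokens, tokens[1:]))


def negation_aware_positive_count(tokens):
    """Count positive words unless the immediately preceding word is a negator."""
    return sum(
        1
        for first, second in bigrams(["<start>"] + tokens)
        if second in POSITIVE_WORDS and first not in NEGATORS
    )


message = "I am not happy today"
tokens = tokenize(message)
print("single-word count:", unigram_positive_count(tokens))
print("negation-aware count:", negation_aware_positive_count(tokens))
```

In this example, single-word counting scores the message as containing a positive-emotion word, whereas the two-word view recognizes "not happy" as a negated phrase.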
The results of the current study could also serve as the basis for building a machine-learning model to automate identification of text features associated with suicide risk. Machine learning refers to a set of algorithms designed to predict membership in a class on the basis of a set of features (James, Witten, Hastie, & Tibshirani, 2013). Machine-learning techniques may be particularly useful for predicting low-base-rate behaviors such as suicide attempts. In a study of Army soldiers, Kessler and colleagues (2015) created a machine-learning classifier that used known risk factors and found that 5% of individuals assigned by the classifier to the highest risk category accounted for over 50% of suicide deaths at follow-up. In another study, Walsh, Ribeiro, and Franklin (2017) used electronic health records to develop machine learning algorithms that predicted future suicide attempts among adult patients. The strong accuracy achieved in these studies demonstrates the ability to use an ensemble of predictors, which on their own would carry trivial predictive value, to predict a complex multifactorial clinical outcome. The current results could guide development of a machine-learning model to predict and classify episode types by identifying text features associated with suicide risk, which our research group has already begun to explore. Having such a rich data set (text messages) offers the opportunity to see whether a bottom-up, data-driven approach can detect signals that are statistically related to suicide but that are not known theoretically and that we as humans would otherwise not detect. In a recently published study, our group (Nobles, Glenn, Kowsari, Teachman, & Barnes, 2018) used a deep neural-net, machine-learning classifier to model within-subjects episode-type differences between attempt/ideation episodes and depressed mood episodes using an atheoretical set of communication variables (n-grams), including ones not analyzed in this current study. 
Sensitivity and specificity were moderate to strong, which indicates that the algorithm performed fairly well at classifying episodes. These findings suggest the promise of detecting “hidden” but meaningful signals using predictive models even when only a small number of classification units are available. Using this machine-learning framework, future studies could track people in real time (vs. retrospectively) to determine sensitivity and specificity of suicide risk predictions as well as fuse SMS with other smartphone-based digital data streams (e.g., voice samples via microphone, spatial trajectories via GPS) to provide more robust digital phenotyping (Torous et al., 2016).
Clinical implications
Despite decades of research, judgments of imminent suicide risk remain low in accuracy, partly because of a reliance on at-risk individuals’ subjective self-report, which is prone to efforts to conceal, an inability to accurately assess one’s current state, or both. Knowledge gained from this study could put us one step closer to the development of an objective monitoring tool capable of tracking individuals’ communication “behind the scenes” that would notify suicidal individuals, their clinicians, their families, or a combination of all three if their patterns of communication indicate increasing levels of suicide risk. To further increase precision, it may even be possible to someday develop a machine-learning algorithm to “learn” how a given individual differs in general from a normative sample to increase and individualize predictive accuracy. This kind of approach that is temporally sensitive and takes into account individual differences could have profound implications for predicting when a person, not just who, is at risk of suicide attempt.
Although the possible future clinical applications of this work could help address a major public health burden, the development of a predictive tool would raise a number of important ethical challenges. Such considerations include determining how the consent or permission process for users would work, who would get notified if text messages included elevated risk, who gets access to model predictions (e.g., insurance companies), how data are deidentified and stored, and what intervention would be undertaken. Similar to diagnostic medical tests that produce certain levels of false positives and false negatives, decisions would need to be made regarding the most appropriate threshold for what would be considered “elevated risk” deserving of intervention. For example, is it preferable to flag more individuals but with less certainty of risk (producing more false positives) or fewer individuals but with greater certainty of risk (producing more false negatives)? The field would also need to grapple with questions related to mandated reporting and involuntary hospitalization. For example, what is the most appropriate action for someone who denies having suicidal thoughts, plans, or intent but whose text messages indicate elevated risk? Would such a situation warrant hospitalization? These and many more ethical questions will need to be addressed if a predictive monitoring tool for suicide risk is to be effectively implemented.
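The threshold trade-off described above can be made concrete with a small sketch that computes sensitivity and specificity at two cutoffs. The risk scores and outcome labels are invented for illustration, and the function names are hypothetical; no actual model or data from this study are represented.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives and negatives when flagging scores >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label:
            tp += 1
        elif flagged and not label:
            fp += 1
        elif not flagged and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Proportion of truly high-risk cases that were flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of lower-risk cases that were correctly not flagged."""
    return tn / (tn + fp)


# Invented risk scores and outcomes (1 = high-risk episode) for illustration only.
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.90]
labels = [0, 0, 0, 1, 0, 1, 0, 1]

for threshold in (0.3, 0.7):
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    print(
        "threshold", threshold,
        "sensitivity", round(sensitivity(tp, fn), 2),
        "specificity", round(specificity(tn, fp), 2),
    )
```

With these invented data, the lower threshold flags every high-risk case at the cost of more false positives, whereas the higher threshold reduces false positives but misses more high-risk cases.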
Limitations and conclusion
There are several methodological limitations to acknowledge. First, dates and information regarding episodes relied on retrospective self-report, which may not have been entirely reliable, especially for less recent episodes. A prospective design in which participants were assessed frequently for the presence of suicidal thoughts or behaviors would resolve some concerns about self-report. However, such a design was not practical given the very low base rate of suicide attempts and would necessitate more participants than is feasible for a laboratory study. Furthermore, concerns about this retrospective report are somewhat minimized because although suicide history was reported retrospectively, the actual communication data used for analyses were not and thus are ecologically valid and not prone to demand characteristics.
Second, classification of episode type depended on participants’ interpretations of whether their behaviors qualified as a specific type of event. Prior research has shown that single-item self-report questions can lead to misclassification (Millner et al., 2015). To overcome this potential limitation, efforts were made to ensure the language used was precise, and multiple follow-up questions were asked to assess suicidality of each episode beyond a yes/no question (e.g., asking for suicide ideation severity using a continuous scale). However, future studies could use more strictly objective measures, such as clinical charts, to categorize events (although two suicide attempts with the same level of medical severity or lethality do not necessarily entail the same extent of planning, intent, and desire to die associated with the act).
Third, a strength of this study design is its emphasis on differentiating suicide attempts from ideation given that we are ultimately concerned with preventing suicidal behaviors, not just thoughts. However, logistics of accessing and downloading communication data necessitated enrolling participants with nonlethal attempts. It is possible that characteristics of suicide attempters differ from those of suicide completers, which has been borne out somewhat by prior research (e.g., DeJong, Overholser, & Stockmeier, 2010; Joiner, Pettit, Walker, & Voelz, 2002). For example, suicide completers compared with nonlethal attempters demonstrate higher levels of perceived burdensomeness and are more likely to have experienced job and financial stress and used alcohol or drugs before their attempt. Furthermore, although more women attempt suicide, men are about 4 times more likely to die by suicide (Murphy, Xu, & Kochanek, 2013). Note that we used a broad definition of what we considered a “suicide attempt episode,” including not only attempts in which some concrete action was initiated (e.g., at least one pill swallowed) but also attempts that were interrupted or aborted (e.g., traveled to and strongly considered jumping from a height). This approach may have changed or decreased the size of observed effects.
Fourth, the numbers of participants and reported episodes in the pilot study were small and provided only enough power to detect large effect sizes. In addition, the random-effects structure of the mixed-effects models was elaborate to maximize generalizability of the results, but this plausibly resulted in further loss of power. Therefore, there is an increased possibility of Type II error. Even so, it is important to consider the trade-offs associated with various research designs.
Fifth, our analyses involved a fairly large number of tests, which increases the chance of falsely rejecting at least one null hypothesis. Because this is a pilot study with a small and already potentially underpowered sample, and because we did not have concrete directional hypotheses for many of the tests, we decided not to artificially suppress Type I error (i.e., by using a multiple comparisons correction) but rather to view any significant results in light of these caveats (Rothman, 1990). In addition, using the maximal (or near-maximal) random effects in our models likely decreased the chances of Type I error given that prior research has argued that maximal-random-effects structures can be overly conservative and lead to a significant loss of power (Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017).
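To illustrate why the number of tests matters, the following minimal sketch computes the probability of at least one false positive across a set of tests. It assumes, for simplicity, that the tests are independent and that each uses the same alpha level; the test counts are hypothetical and are not the number of tests run in this study. The function names `familywise_error_rate` and `bonferroni_alpha` are introduced here for illustration, and the Bonferroni adjustment is shown only as an example of the kind of correction that was not applied.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha under a Bonferroni correction (one common, conservative option)."""
    return alpha / n_tests

# Hypothetical test counts, chosen only to show how quickly the rate grows.
for n_tests in (1, 10, 20):
    fwer = familywise_error_rate(0.05, n_tests)
    adjusted = bonferroni_alpha(0.05, n_tests)
    print(f"{n_tests:>2} tests: familywise error rate={fwer:.3f}, Bonferroni alpha={adjusted:.4f}")
```

Under these assumptions, the familywise error rate rises well above the nominal .05 as tests accumulate, whereas the corrected per-test alpha shrinks, which lowers power further in an already small sample. That tension is the trade-off weighed in the paragraph above.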
We used a novel, within-subjects, laboratory-based research design to identify and better understand real-time patterns in communication unique to periods preceding suicide attempts. This is the first research study, to our knowledge, to examine the association between private text-messaging data and mental health outcomes, suicide or otherwise. This laboratory investigation identified novel predictors of suicidal behaviors, which may be used in the future by machine-learning models to predict acute suicide risk and identify whether and where an individual is on the pathway from thinking about suicide to acting on those thoughts. It is our hope that this research puts us one step closer to developing more objective, effective ways to predict and prevent future suicide-related behaviors.
Supplemental Material
Supplemental material, Glenn_Open_Practices_Disclosure for Can Text Messages Identify Suicide Risk in Real Time?: A Within-Subjects Pilot Examination of Temporally Sensitive Markers of Suicide Risk by Jeffrey J. Glenn, Alicia L. Nobles, Laura E. Barnes and Bethany A. Teachman in Clinical Psychological Science
Supplemental material, Glenn_Supplemental_Material for Can Text Messages Identify Suicide Risk in Real Time?: A Within-Subjects Pilot Examination of Temporally Sensitive Markers of Suicide Risk by Jeffrey J. Glenn, Alicia L. Nobles, Laura E. Barnes and Bethany A. Teachman in Clinical Psychological Science
Footnotes
Acknowledgements
We thank Tara Saunders, Austin Smith, and Abbie Starns for their help with study design and data collection, the Teachman and Barnes labs for their feedback, and Clay Ford and the Research Data Services team at the University of Virginia library, Courtney Soderberg and the Center for Open Science, and Eric Turkheimer for their consultation on data-analytic issues. We also acknowledge the support and conceptual contributions by Charlene Deming, Karthik Dinakar, Adam Jaroszewski, Alex Millner, Evan Kleiman, and Matthew Nock.
Transparency
Action Editor: Christopher G. Beevers
Editor: Scott O. Lilienfeld
Author Contributions
All of the authors contributed to the study concept and design. Recruitment and data collection were performed by J. J. Glenn. A. L. Nobles and J. J. Glenn processed and formatted the data, and J. J. Glenn performed the data analysis under the supervision of B. A. Teachman. J. J. Glenn drafted the manuscript, A. L. Nobles and L. E. Barnes provided critical revisions, and B. A. Teachman provided extensive revisions. All of the authors approved the final version for submission.