Abstract
Issues in applied survey research, including minimizing respondent burden and ensuring measures’ brevity for smartphone administration, have intensified efforts to create short measures. We conducted two studies on the psychometric properties of single-item satisfaction, love, conflict, and commitment measures. Study 1 was longitudinal, surveying college-age dating couples at three monthly waves (n = 121, 84, and 68 couples at the respective waves). Partners completed single- and multi-item measures of the four constructs, along with other variables, to examine test–retest reliability and convergent, concurrent, and predictive validity. Single-item measures of satisfaction, love, and commitment exhibited impressive psychometric qualities, but our single-item conflict measure performed somewhat less strongly. Study 2, a cross-sectional online survey (n = 280), showed strong convergent validity of the single-item measures, including that of conflict.
Social scientists often confront the need to assess multiple constructs in large research projects. Many prominent frameworks, such as self-determination theory (Ryan et al., 2008), the theory of planned behavior (Ajzen, 2012), and the health belief model (Janz & Becker, 1984) invoke five or more constructs. Thus, even if a survey measured only the minimum variables necessary for a model, the questionnaire could still be lengthy. Because lengthy questionnaires can hinder participant recruitment and retention, as well as timely survey completion, the present studies investigated the psychometric soundness of single-item measures of four major close-relationship constructs: love, satisfaction, conflict, and commitment.
Questionnaire Length
Long questionnaires can pose many problems. They can increase respondent burden (Bowling, 2005) and may lead respondents to skip items if a survey seems too long and repetitive (Robins et al., 2001). Indeed, at least three meta-analyses have linked greater questionnaire length with lower participation. Burgard et al. (2020) reported a correlation of –.19 between number of items and response rate across 20 studies (mean survey length = 54 items). Rolstad et al. (2011) examined 20 experiments in which each original researcher created two survey lengths that they deemed “longer” and “shorter” (e.g., five- vs. 15-page questionnaires) and compared the response rates. These researchers found significantly lower response rates to longer questionnaires (effect size equivalent to an odds ratio of 1.14). Finally, Mercer et al. (2015) found 20% lower response to what they considered high (vs. low) burden surveys (one criterion was whether an interview took less than or greater than 1 hr, so that this result is less applicable to brief surveys).
Even if participants begin a survey, they may stop answering midway through (i.e., “breakoff”). Mavletova and Couper (2015) meta-analyzed associations between survey length and breakoff in 39 mobile web-survey samples. There was a marginal (p = .07) effect of survey length, with the authors elaborating that, “A 10-minute survey increases the odds of breakoffs by 1.09 and a 30-minute survey by 1.42 compared to a 5-minute mobile web survey” (p. 89). Thus, even when comparing two short survey lengths (10 vs. 5 minutes), differences in breakoff rate are discernible.
With these problems in mind, researchers have long sought to shorten lengthy measures. Examples include Cacioppo et al.’s (1984) reduction of the Need for Cognition scale from 34 to 18 items; Carver’s (1997) halving of COPE subscales from four to two items; Hendrick et al.’s (1998) shortening of Love Attitude Scale subscales from seven to four items; and Gosling et al.’s (2003) Ten-Item Personality Inventory, which assesses each of the Big Five factors with two items. Some measures even feature just one item, such as the Single-Item Self-Esteem Scale (Robins et al., 2001), single-item measures of experiences in emerging adulthood (Arnett, 2016), and single-item measures of relationship satisfaction and life satisfaction (Dyrenforth et al., 2010).
A newer concern also warranting brief measures is the emergence of mobile smartphones (in 2007; Twenge, 2017) as a means for taking surveys. An American Association for Public Opinion Research task force (Link et al., 2014) stated plainly that “the adage ‘shorter is better’ applies to mobile surveys for a number of reasons” (p. 782). These include difficulty in responding due to the small screen and keyboard, and respondents’ expectation that smartphone transactions be brief (e.g., texting or using apps). Furthermore, even if a researcher intends an online survey to be completed on a desktop or laptop, a “nonignorable and growing percentage” of people, estimated as of 2014 to range from 8 to 23%, are deciding to take such surveys on mobile devices (Link et al., 2014, p. 782). Relatedly, daily diary studies often use electronic technology and can also create respondent burden due to frequent response occasions (Chatzitheochari et al., 2018). Keeping questionnaires as short as possible is, therefore, an important applied issue in survey research practice.
Psychometric Theory and Scale Dimensionality
Scale length also, of course, implicates psychometric theory and practice. Greater numbers of items on a scale tend to make it more reliable (Robins et al., 2001). Robins et al. note that, “By aggregating over multiple items, errors—if they are random—cancel out” (p. 152). However, an assumption of random or independent errors across items may be questionable (Robins et al., 2001). Robins et al. add that, instead, “asking more or less the same question 10 times may in fact compound systematic errors” (p. 152). Another basis for determining whether a single-item measure is appropriate is how conceptually faceted a measure is. Robins et al. use the personality trait conscientiousness to illustrate a multifaceted construct, as conscientiousness taps into characteristics such as punctuality, orderliness, and responsibility and, therefore, would almost certainly require multiple items to measure well. On the other hand, if a construct appears to represent an “unambiguous or narrow” (Allen et al., 2022, p. 2), “concrete” (Diamantopoulos et al., 2012, p. 446), unidimensional, or homogeneous idea, a single item may be sufficient. Fletcher et al. (2000) noted, as well, that narrowly focused measures can enhance face validity and avoid item-content overlap with other related constructs. Sometimes, however, a concept’s suitability for single-item measurement will be unclear without testing. Health, for example, would appear to be multidimensional, consisting of physical well-being, mental well-being, presence or absence of disease or activity restriction, and so on (Bowling, 2005). Yet, a single-item measure asking people to rate their own health as excellent, very good, good, fair, or poor has been shown to predict mortality, suggesting that people can mentally integrate different aspects of health (Bowling, 2005).
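The claim that longer scales tend to be more reliable can be illustrated with the classical Spearman-Brown prophecy formula. The sketch below is our illustration rather than an analysis from the studies; the function name and the starting reliability of .50 are hypothetical, and the formula assumes parallel items with uncorrelated errors, which is exactly the assumption Robins et al. (2001) caution may not hold.

```python
def spearman_brown(single_item_reliability: float, k: float) -> float:
    """Predicted reliability of a k-item scale built from parallel items.

    Assumes every item has the same reliability and that item errors are
    random and uncorrelated (an assumption, not a finding of the studies).
    """
    r = single_item_reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical single-item reliability of .50, lengthened to 3 and 10 items.
for k in (1, 3, 10):
    print(k, round(spearman_brown(0.50, k), 3))
```

Under these assumptions, the gain from each added item shrinks as the scale grows, which is one reason a well-chosen single item for a narrow construct may lose less than intuition suggests.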
Diamantopoulos et al. (2012) raised several other psychometric issues pertaining to the use of single- vs. multi-item scales in the context of predictive validity in the field of marketing. Given that single-item measures sometimes do function similarly to their multi-item counterparts, Diamantopoulos et al. conducted a large simulation study to pinpoint the conditions under which single-item measures may perform well. They manipulated seven factors (e.g., number of items in the multi-item measures, interitem correlations of the multi-item measures, sample size). Overall, single-item measures only rarely (14% of the time) exhibited greater predictive validity than did their multi-item counterparts (in another 26% of tests, single- and multi-item measures did not differ significantly). However, when multi-item instruments had extremely high interitem correlation (e.g., r = .80–.90), single items’ predictive validity exceeded that of multi-item scales roughly 30% of the time and the two fared similarly around 20% of the time. For small sample sizes (i.e., N = 50), multi-item scales outperformed their single-item counterparts only around half the time. Single-item measures also fared well when relatively small associations (~ r = .30) with other variables were expected. Number of items in the multi-item scales did not affect single items’ performance. Though single-item measures clearly will not outperform their multi-item counterparts across the board, Diamantopoulos et al.’s results suggest that they likely have their niches. The degree to which single-item measures perform well for romantic-relationship constructs, specifically, is the question we address conceptually and empirically in the following sections.
Need for Brief Measures in Studying Close Relationships
Among areas in social-personality psychology, the study of close relationships would be particularly likely to benefit from the development of additional brief measures. Relationship researchers frequently administer many constructs within a single study, spanning both relationship processes (e.g., communication, conflict) and relationship quality or outcomes (e.g., satisfaction, love, commitment) (Joel et al., 2020). In fact, Joel and colleagues identified 27 relationship constructs that had each been used 10 or more times in a pool of 43 dyadic longitudinal datasets. Hence, single-item versions of relationship-relevant constructs may help mitigate respondent burden and attrition. In addition, rapid, repeated-measurement approaches, such as daily diaries and ecological momentary assessment, fit naturally with the aim of studying changes in relationship functioning and quality (Smyth et al., 2017), but lengthy measures would impede this aim.
Three-item scales are available to study close relationships, but these may not be preferable to single-item measures. First, items on these three-item scales tend to have some degree of redundancy. As one example, Schumm et al.’s (1986) Kansas Marital Satisfaction Scale contains the items “How satisfied are you with your husband/wife as a spouse?” and “How satisfied are you with your relationship with your husband/wife?” As another example, the Commitment subscale of Fletcher et al.’s (2000) Perceived Relationship Quality Components (PRQC) Inventory repeats the sentence structure, “How _____ are you to your relationship?,” with the three items differing via the insertion of “committed,” “dedicated,” and “devoted.” Second, with the PRQC containing six subscales (satisfaction, commitment, intimacy, trust, passion, and love), even with three-item subscales, the full instrument is somewhat lengthy at 18 items.
Although there is great breadth in relationship constructs, we argue that four of the most central constructs within the field are love, satisfaction, conflict, and commitment. Love has one of the longest histories of empirical assessment among relationship constructs (Rubin, 1970), along with a record of continued theoretical development (e.g., Clark et al., 2019; Lee, 1973; Rempel & Burris, 2005; Sternberg, 1986). Satisfaction and commitment are two of the most commonly studied relationship outcomes (Joel et al., 2020). Finally, conflict is perhaps the most widely studied negative phenomenon in couples, having potential deleterious effects on relationships, psychological well-being, and physiological health (Aloia & Solomon, 2015). We review the four constructs further in the following sections.
Love
Although some constructs in close relationships have been conceptualized as multidimensional, they may practically and empirically be closer to having only one or two dimensions. Love, for example, has been conceptualized in many ways. Lee (1973) proposed a six-factor model, using Greek and Latin terminology (e.g., eros for passionate love; storge for love growing out of friendship; and agape for selfless, overarching love; see Hendrick & Hendrick, 1986, 1989 for elaboration). Sternberg (1986) proposed a three-factor model of love, consisting of intimacy, passion, and decision-commitment. Hendrick and Hendrick (1989) factor analyzed a large pool of purported love measures. These included attachment styles (Hazan & Shaver, 1987), Lee’s six love styles (Hendrick & Hendrick, 1986), Sternberg’s three facets, Hatfield and Sprecher’s (1986) Passionate Love Scale, and Davis’s (e.g., Davis & Latty-Mann, 1987) Relationship Rating Form, which assesses six dimensions (e.g., intimacy, passion, care/support). The factor analysis (using scale-level variables) yielded a five-factor solution, with a dominant first factor accounting for 32% of the variance. The first factor featured high loadings (.39–.85) on Lee’s eros, mania, and agape; Sternberg’s intimacy, passion, and commitment; Hatfield and Sprecher’s passionate love; and Davis’s viability, intimacy, passion, care, and satisfaction. The second factor (14% of variance) suggested a warmth/authenticity theme, whereas none of the remaining factors had more than two high-loading items. Hendrick and Hendrick (1989) interpreted the first factor in these terms: “Passionate love was certainly a major component of the factor, but intimacy, commitment, satisfaction, and aspects of caring love also appeared to be important” (p. 792). Hence, although subscales with different contents loaded on Factor 1, the analysis still supported the idea of a dominant, general love factor.
Our single-item love measure (“To what extent do you love ____ [your partner] at this stage?”) captures this general dimension.
Satisfaction
Relationship satisfaction has mainly been conceptualized as a single dimension, centered on positive affective evaluation (Fincham et al., 2018). Fincham and colleagues advocate adding a second dimension to encompass negative affective evaluation, for which additional research would be necessary to develop a single-item version. Other terms related to, but conceptually distinct from, satisfaction include adjustment and quality. Adjustment refers to a smooth functioning relationship, consisting of good communication, few and readily resolved conflicts, and the attitude or feeling that one is satisfied with the partner and relationship, among other things (Sabatelli, 1988). Quality has sometimes been defined similarly to satisfaction and sometimes similarly to adjustment. Although relationship satisfaction and similar constructs have been conceptualized in multidimensional terms, studies have supported a dimensional solution for relationship quality that “encompasses differences in degree rather than differences in kind” (Kliem et al., 2015, p. 1197) and the notion that satisfaction has a strong positive association with markers of relationship adjustment (e.g., intimacy, agreement; Hassebrauck & Fehr, 2002). Our single-item satisfaction measure (“How satisfied are you with your relationship?”) fits best with the notion of satisfaction as a global evaluative or attitudinal reservoir of positivity toward the relationship (Fincham et al., 2018; Sabatelli, 1988).
Conflict
Conflict has been conceptualized as multidimensional, as described by Zacchilli et al. (2009). After reviewing the literature on interpersonal conflict and strategies for its resolution, Zacchilli and colleagues composed items reflecting different aspects. They then conducted exploratory factor analyses, yielding six facets they labeled compromise, domination, avoidance, separation, submission, and interactional reactivity. These six facets reflect primarily conflict resolution strategies, rather than the experience of conflict per se. An older, though still prominent, multi-item conflict measure is that from Braiker and Kelley (1979). It assesses areas, such as frequency of arguing and feeling angry toward one’s partner. Other forms of negative social interaction have also been documented, such as one person acting unsupportive toward a stressed individual or undermining another person’s goal pursuit. Brooks and Dunkel-Schetter (2011) propose an overarching concept of social negativity, consisting of three broad dimensions: conflict, insensitivity, and interference. Of these, insensitivity and interference can be considered instigators of conflict, not an overall assessment of conflict. Our single-item conflict measure corresponds to the first of the three dimensions.
Commitment
Finally, commitment has been conceptualized as consisting of two larger dimensions, dedication (i.e., intrinsic desire to be with one’s partner) and constraint (e.g., remaining with one’s partner due to hardships, guilt over violating a personal moral code, etc., that would result from leaving) (Johnson, 1982; Stanley & Markman, 1992). Constraint items typically do not inquire about constraint directly (e.g., “I am committed to my partner because breaking up would mean I had to move out”); rather, they include only the latter portion (“I would have to move out if we broke up”). Rusbult et al.’s (1998) commitment model also invokes the presence or absence of attractive alternatives to one’s current relationship. Attractive alternatives can weaken one’s dedication to the current partner, but lack of attractive alternatives could also be considered a constraint that keeps someone in a relationship. Our dataset included separate multi-item commitment inventories tapping into dedication and alternatives, respectively (see Table 1), so we used both to enrich our analyses of this construct. Our single-item commitment measure (“How committed are you to your partner?”) leaves it to participants to decide which factors (dedication, constraint, etc.) are the driving forces. However, for laypersons, dedication may be the interpretation that most readily comes to mind. On that assumption, our single-item measure resembles typical dedication items and, thus, can stand in for that aspect of commitment.
Description of SI and MI Measures, and Predictive Criteria (Study 1).
Note. Item “To what extent do you love your partner at this stage” on MI measure, which resembled SI version, was retained, as its presence minimally affected the results (see Table 3). SI = single-item; MI = multi-item.
a Alphas computed separately for men and women and for each time-point (ranges appear for each alpha). b Item “Rewarding-Disappointing” omitted to avoid overlap with disillusionment. c Item on sexual intimacy omitted, as it measures behavior whereas other items measure attitudes.
The Present Research
The present research evaluated the psychometric properties of single-item measures of relationship satisfaction, love, and commitment used by Sprecher (1999) and of conflict, which we developed. (Sprecher mainly analyzed the satisfaction, love, and commitment items together as a three-item index.) We conducted two studies, one longitudinal and the other cross-sectional. Following the approach of Robins et al. (2001), Study 1 investigated whether these single-item measures were as (or comparably) reliable (test–retest 1 ) and valid (convergent, concurrent, and predictive) as multi-item measures of the same constructs. Convergent validity involved examining correlations of single-item measures with their corresponding multiple-item scales. Performance of the single-item measures was also assessed via standard errors of measurement (gauging the precision of the single items relative to the full multi-item scales) and item response theory (IRT; gauging the single items’ functioning relative to their underlying constructs). Study 2 sought to enhance the generalizability of findings using a very different participant population than in Study 1, and also allowed for modification of any single-item measures whose performance in Study 1 was substandard.
Tests for concurrent and predictive (longitudinal) validity were similar, except for using one vs. multiple assessment waves. Each of the single-item measures (and its full-length counterpart) was examined for correlation, concurrently and predictively, with a criterion variable tailored to the single-item measure. The following groupings of variables were used:
The single-item and full-length measures of relationship satisfaction were tested for (negative) correlation with the criterion of romantic disillusionment (Niehuis et al., 2019, 2021). Disillusionment has been conceptualized, in part, in terms of perceived declining satisfaction (Huston et al., 2001; Niehuis et al., 2011) and has been related empirically to self-perceived declines in favorable relationship qualities (Niehuis & Bartell, 2006).
The single-item and full-length measures of love were tested for (positive) correlations with the criterion of the focal respondent’s affectionate behavior as perceived by the partner (Niehuis et al., 2016; Wills et al., 1974), as love and affectionate behavior are closely related (Schoenfeld et al., 2012).
The single-item and full-length measures of conflict (i.e., active disagreement and disharmony between people) were tested for (positive) correlations with the criterion of the focal respondent’s overt negative behavior (e.g., doing something intentionally to irritate the partner), as perceived by the partner (Niehuis et al., 2006, 2016; Wills et al., 1974). Whereas overt negativity can occur regardless of whether active conflict is present, conflict and negatively intentioned behaviors both fit within Brooks and Dunkel-Schetter’s (2011) larger notion of social negativity.
The single-item and two full-length measures of commitment were tested in relation to the criterion of ambivalence (Braiker & Kelley, 1979), as ambivalence would likely detract from commitment (or lack of commitment would foster ambivalence). Huston (1994) theorized that ambivalence was a central hindrance to developing commitment. In addition, Ogolsky et al.’s (2016) cluster analysis of dating couples’ commitment processes found couples with the types of commitment associated with advancing relationship seriousness (partner-focused and socially involved) to have low ambivalence.
In all analyses, the focus was on comparing the reliability and validity of a single-item version of a construct (e.g., satisfaction) vs. its corresponding full-length version. In Study 1, with college dating couples, we expected the multi-item version of a construct to exhibit greater test–retest reliability and concurrent and predictive validity correlations with criterion variables than would the corresponding single-item version (Hypothesis 1 or H1), due to the generally superior performance of aggregated items (Epstein, 1983; Robins et al., 2001). We also expected large positive convergent-validity correlations between the single- and multi-item measures of the same construct (H2–Study 1). In Study 2, with adults recruited through Mechanical Turk and Facebook, we tested only convergent validity, which we again expected to be strong (H2–Study 2). Instances in which multi-item measures did not significantly outperform their single-item counterparts would suggest that the single-item versions were viable for research use.
Overall Open Science Information
Codebooks/questionnaires and statistical supplements for both studies are stored at the Open Science Framework (OSF) at https://osf.io/yfn3r/. Our statement on data sharing appears there, as well. Most analyses were extremely basic (i.e., descriptive statistics, Pearson’s correlations), so no programming code is provided for these. Code/syntax is provided in the relevant statistical supplements for advanced analyses we conducted (IRT and constraint tests within structural equation modeling [SEM] of equality of coefficients). Confirmatory factor analyses were conducted using a graphical (rather than syntax-based) program (AMOS; Arbuckle, 2017, with additional analyses in lavaan; Rosseel, 2012), so only descriptions of the running of these models appear in the relevant statistical supplement. Hypotheses were not preregistered, as doing so was not required at the time the studies were initiated. Sample sizes represented the maximum number of individuals or couples we could recruit within the context of our available time span and funding, while ensuring we could detect medium-sized correlations (i.e., r = .30) with at least 80% power.
Study 1
Methods
Sample
Data came from a larger longitudinal study of dating couples, who were assessed online via Qualtrics surveys three times at 1-month intervals (between March 2014 and August 2019). Recruitment was based at a large state university in the southwestern United States. One previous article has been published from this dataset on an unrelated topic and using different measures (Niehuis et al., 2019). That article provides further details on the sample and methods. Briefly, eligibility requirements for couples were that both partners be at least 18 years old, never married, and childless. No minimum relationship length was specified. Only heterosexual couples’ data were analyzed for statistical practicality (i.e., the very small number of same-sex couples being insufficient for analyzing indistinguishable dyads; Kenny et al., 2006). Sample sizes were N = 121 couples at Time 1 (T1), N = 84 couples at Time 2 (T2), and N = 68 couples at Time 3 (T3). At the study outset, participants’ mean ages were M = 21.8 years old (Mdn = 21) for men and M = 21.4 years old (Mdn = 20) for women; standard deviations (SD) for men and women were 4.9 and 5.0, respectively. Participants were mostly White (68% of men and 70% of women), with the remaining classified as Hispanic (22% of men and 21% of women) and other (10% of men and 9% of women). Analyses were conducted separately in men and in women (i.e., only actor effects and not partner effects).
Procedures
Upon Institutional Review Board (IRB) approval, the project was advertised via e-mail announcements that go to all members of a university community. Interested couples were asked to contact the research team, who provided them with a couple identification number (to be used for each of the three waves of data collection) and the link to the Time 1 survey. One month later, the couple received a link to the Time 2 survey, and so on. Partners who completed all three waves of the study were entered into a drawing for a US$10 Amazon gift card, with several gift cards available in the lottery and the odds of winning being 1 in 25.
Measures
All measures are described in Table 1. For each of our four primary constructs—satisfaction, love, conflict, and commitment—the table includes the wording for the single-item version, the name of the multi-item measure(s) used to assess the same construct (with sample items), the criterion variable for analyses of concurrent and predictive validity (with sample items), and alpha (internal consistency) reliability coefficients for multi-item measures. Means and SD for the single-item measures and their corresponding multi-item measures appear in Statistical Supplement A. Two full-length commitment measures were used: the Investment Model Scale (IMS-Commitment; Rusbult et al., 1998) and the Commitment Inventory’s (Stanley & Markman, 1992) Comparison Level (CL) of Alternatives subscale. Slight modifications to measures are detailed in Table 1. Given Robins et al.’s (2001) suggestion that unidimensional multi-item constructs were the best candidates for shortening into single-item versions, we provide evidence of each full-length measure’s unidimensionality in Supplement B.
Analysis Plan
Pearson’s correlations were first examined to assess various forms of reliability and validity. Most of the analyses were likely to yield large correlations (e.g., test–retest reliability, convergent validity), which are associated with greater statistical power (i.e., larger effect sizes are easier to detect). For example, with r = .40 (which we expected many correlations to exceed), a sample size of 47 yields 80% power at p < .05, two-tailed (Kohn & Senyak, 2020). Hence, our sample size of 121 at the initial wave of Study 1 should provide sufficient power. 2
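The sample-size figure above can be reproduced, approximately, with the standard Fisher z approximation for the power of a test of a Pearson correlation. The sketch below is illustrative only: the function name is hypothetical, and we assume (without confirmation from the source) that this approximation matches the method behind the cited calculator.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate N needed to detect a population correlation r.

    Uses the Fisher z approximation with a two-tailed test:
        N = ((z_{1 - alpha/2} + z_{power}) / atanh(r))^2 + 3
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    fisher_z = math.atanh(r)                       # Fisher z transform of r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


print(n_for_correlation(0.40))  # 47, matching the figure cited in the text
```

Because the required N grows quickly as the expected correlation shrinks, the same function can be used to check how much power later, smaller waves retain for weaker associations.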
We compared the functioning of single- and multi-item measures in the following way. If a single-item version of a measure exhibited a correlation (e.g., test–retest, convergent validity) that was larger (in absolute value), equal to, or only slightly smaller (within .10) than the corresponding multi-item version, we considered the single-item version to function comparably to the multi-item version. Only when a correlation for a multi-item version exceeded that for the corresponding single-item version by at least .10 was the difference tested for significance. Our decision to test only correlations differing by at least .10 for statistical significance of the difference was subjective; however, we believed that even if differences smaller than .10 (e.g., rs of .57 vs. .52) were statistically different, most researchers would not see a practically significant difference. Special SEMs of the analogous correlations were set up (Preacher, 2006) to test whether the multi-item version was significantly larger. 3 For each gender and construct tested via SEM, equality constraints were placed on relevant pairs of correlations (e.g., single-item test–retest from T1 to T2 and multi-item test–retest from T1 to T2). If the constraints significantly harmed model fit relative to a corresponding unconstrained model, this indicated a significant difference between the correlations. To be sensitive to instances of full-length versions outperforming single-item versions (as a quality control matter), we used p < .05 for comparisons. Controlling for multiple comparisons via more stringent levels (e.g., p < .001) would make it harder to detect differences between the single- and multi-item versions, thus likely overstating the value of single-item measures. A detailed summary and outputs of the equality-constraint analyses are available in Supplement C.
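The screening rule described above can be sketched as a small function. This is an illustration of the stated .10 criterion only, not the SEM equality-constraint test itself; the function name is hypothetical, and we assume that a difference of exactly .10 counts as "at least .10" and is therefore flagged for testing.

```python
def needs_significance_test(r_single: float, r_multi: float,
                            margin: float = 0.10) -> bool:
    """Return True when the multi-item correlation exceeds the single-item
    correlation (both in absolute value) by at least `margin`.

    False means the single-item version is treated as functioning
    comparably, so no significance test of the difference is run.
    """
    # Round to avoid floating-point artifacts such as 0.70 - 0.60 = 0.0999...
    difference = round(abs(r_multi) - abs(r_single), 6)
    return difference >= margin


# Example values: .57 vs. .52 comes from the text; the others are hypothetical.
print(needs_significance_test(r_single=0.52, r_multi=0.57))  # False
print(needs_significance_test(r_single=0.60, r_multi=0.70))  # True
print(needs_significance_test(r_single=0.80, r_multi=0.60))  # False
```

Only the pairs flagged True would then proceed to the constrained versus unconstrained model comparison.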
Further evaluations of the single-item measures, relative to their multi-item counterparts, were then conducted with regard to standard errors of measurement and IRT.
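For readers unfamiliar with the standard error of measurement, the classical test theory definition is SD × sqrt(1 − reliability). The sketch below shows that textbook formula only; the function name and the numeric inputs are hypothetical, and the source does not state here which reliability estimate entered the authors' own calculations.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory standard error of measurement.

    sd          -- standard deviation of observed scores
    reliability -- reliability estimate between 0 and 1 (e.g., test-retest r)
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical values: an SD of 1.0 and a reliability of .75.
print(standard_error_of_measurement(1.0, 0.75))  # 0.5
```

A smaller value indicates more precise measurement, so comparing it across single- and multi-item versions requires expressing both on a common score metric.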
Results
Test–Retest Reliability
Table 2 presents 1-month (T1–T2 and T2–T3) and 2-month (T1–T3) test–retest correlations for the single- and multiple-item satisfaction, love, conflict, and commitment measures. 4 Test–retest reliability was generally strong, with all correlations in the range of .50–.89, except for four (between .39 and .47). All test–retest correlations were significant, p < .001. For men’s satisfaction and women’s satisfaction (analyzed separately by gender), the multi-item measures consistently had significantly larger test–retest correlations than did their single-item counterparts (consistent with H1). Only for men, the multi-item measures of conflict exhibited significantly greater reliability across waves than did the single-item ones (women’s difference in T1–T2 test–retest correlations between the multi- and single-item versions, though seemingly sizable, was nonsignificant). Finally, the T1–T2 test–retest correlation for women was larger for each of the multi-item commitment measures (dedication and CL for alternatives) than for the single-item commitment measure. Overall, the love construct (no significant test–retest reliability differences between single- and multi-item versions) and the commitment construct (significant differences only in one subgroup, women between T1 and T2) showed greatest comparability between the performance of single- and multi-item versions. 5
Test-Retest Correlations Across Time Points for Single- and Multi-Item Measures for Men and Women (Study 1).
Note. Each correlation significantly different from zero (p < .001). 95% confidence intervals are in parentheses. Correlations with superscript (a), as a set, differed from those with superscript (b); same for (c) vs. (d), and (e) vs. (f). Women’s T1-T2 SI commitment test-retest correlation (superscript g) differed from that for MI commitment-dedication (h) and, in a separate analysis, from that for MI commitment-CLalt (i). All significant comparisons of different correlations, p < .02 or better. SI = single-item; MI = multi-item; CL = comparison level.
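Confidence intervals such as those reported in Table 2 are commonly obtained through the Fisher z transformation. The sketch below shows that common approach; it is an assumption on our part that the tabled intervals were computed this way, and the function name and the example r and n are hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r: float, n: int, level: float = 0.95) -> tuple:
    """Confidence interval for a Pearson correlation via Fisher's z.

    Transforms r to z = atanh(r), builds a normal interval with
    standard error 1 / sqrt(n - 3), and back-transforms with tanh.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical example: r = .60 observed in 84 respondents.
lower, upper = correlation_ci(0.60, 84)
print(round(lower, 2), round(upper, 2))
```

Because the interval is built on the z scale, it is asymmetric around r for large correlations, and it widens as attrition reduces n across waves.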
Convergent Validity
For each construct, correlations between the respective single- and multi-item measures were tested for convergent validity (Table 3). These correlations were consistently strong across gender and wave, all p < .001 and many in the .70s and .80s in absolute value; though such judgments are inherently subjective, Allen et al. (2022) characterize convergent-validity correlations of r = .80 as “good” and those of r = .70 as “acceptable.” These findings support H2–Study 1. Because the single-item love measure (“To what extent do you love at this stage?”) was highly similar to the item “To what extent do you love your partner at this stage” on the multi-item love measure, alternative analyses were conducted removing the latter item from the multi-item measure. As shown in Table 3, however, the single-item love measure had very similar convergent-validity correlations with the multi-item love measure, whether or not the similar item was included. Hence, the similar item did not appear to inflate the convergent validity for love appreciably, and so it was retained in the multi-item love measure for the remaining analyses. Of the two full-length commitment measures, one was framed in terms of CL for alternatives (CLalt), so that a high score indicates high valuation of alternative partners. The negative correlations obtained between the single-item commitment measure and the CLalt commitment index are thus what would be expected. The correlations between single- and multi-item measures of conflict ranged from .43 to .61, which were not weak by any means, but not as strong as the correlations for other constructs.
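As a rough guide to how precisely a convergent-validity correlation of this size is estimated, a 95% confidence interval can be approximated with the Fisher z transformation. The sketch below is illustrative only: the source does not state how its intervals were computed, the function name `fisher_ci` is hypothetical, and the sample size of 121 is simply the number of couples at the first wave, so the n available for any particular correlation may be smaller.

```python
import math
from statistics import NormalDist


def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z transformation (an assumption; the source does not
    name its method). Requires n > 3 and -1 < r < 1.
    """
    z = math.atanh(r)                      # transform r to the z metric
    se = 1.0 / math.sqrt(n - 3)            # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Illustrative inputs: r = .80 (the "good" benchmark) and n = 121 (assumed).
lower, upper = fisher_ci(0.80, 121)
print(f"r = .80, n = 121: 95% CI approx. ({lower:.2f}, {upper:.2f})")
```

Because the interval narrows with the square root of n, the same correlation at the smaller later waves would carry a somewhat wider interval.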
Convergent Validity Correlations Between Single- and Multi-Item Measures by Time Point and Gender (Study 1).
Note. All correlations significant at p < .001. 95% confidence intervals are in parentheses. Correlations in brackets were computed after removing from the multi-item love scale the item similar in content to the single item (see Table 1). Because the comparison level for alternatives (CLalt) measure of commitment is keyed so that a high score indicates greater favorability toward dating someone other than one’s partner, this measure’s negatively signed correlation with the single-item commitment measure would be expected. CL = Comparison Level.
Concurrent Criterion Validity
All correlations but one (single-item conflict with negative behaviors in men at T2) were significant, mostly at p < .001 (Table 4). Differences in how the corresponding single- and multi-item measures of a construct correlated with their assigned criterion variable were largely confined to relationship satisfaction and its correlations with disillusionment. Multi-item satisfaction exhibited significantly stronger correlations with disillusionment (in absolute value) than did single-item satisfaction in women at T1 and T2, and in men at T3. Also, in women at T1 and T3, the multi-item (low) commitment measure of CL for alternatives (CLalt) exhibited stronger correlations (in absolute value) with the criterion of ambivalence than did single-item commitment. This subset of findings was consistent with H1. However, most findings yielded similar concurrent validity of the single-item measures and their multi-item counterparts. Additional analyses examined specificity, the extent to which correlations between core measures (e.g., love) and their conceptually matched concurrent-validity constructs (i.e., affection) exceeded correlations between core measures and unmatched concurrent-validity constructs (e.g., love with negative behaviors), as shown in Supplement D. These analyses did not appear to yield appreciable differences based on whether the single- or multi-item versions of the core constructs were used.
Concurrent Validity Correlations of Single- and Multi-Item Measures With Criterion Variables by Time Point and Gender (Study 1).
Note. All correlations significantly different from zero at p ≤ .001, unless noted (*p ≤ .05. **p ≤ .01, ns = not significant). 95% confidence intervals are in parentheses. Correlations with superscripts (a) and (b) differ significantly from each other (p < .05), as do those with superscripts (c) and (d) (p < .05). Correlations with superscripts (e) and (f) differ significantly from each other in absolute value, tested by reverse-scoring the Comparison Level for Alternatives (CLalt) measure (p = .001). SI = Single-item; MI = Multi-item.
Partner-reported for focal respondent (e.g., women’s self-reported feelings of love would be correlated with their male partners’ reports of the women’s affectionate behaviors).
Predictive Criterion Validity
The multi-item satisfaction measure exhibited stronger negative longitudinal associations than did its single-item counterpart with the criterion of disillusionment over all three intervals (T1–T2, T2–T3, and T1–T3) in men and during the latter two in women (Table 5). In men, multi-item conflict showed significantly stronger longitudinal correlations than did its single-item counterpart with men’s overt negative behaviors from T2–T3 and T1–T3. Finally, in women from T1–T2 and T1–T3, each of the multi-item commitment measures (dedication and CL for alternatives) showed stronger absolute correlations with the criterion of ambivalence than did the single-item measure of commitment. These findings were consistent with H1.
Predictive Validity Correlations of Single- and Multi-Item Measures With Criterion Variable by Time Point and Gender (Study 1)
Note. 95% confidence intervals are in parentheses. Set of correlations with superscript (a) significantly different from those with superscript (b), same for (c) and (d), and (e) and (f). Women’s SI commitment predictive-validity correlation (superscript g) differed from that for MI commitment-dedication (h) and, in a separate analysis, from that for MI commitment-CLalt (i). All significant comparisons of different correlations, p ≤ .01 or better. Correlations with superscripts (g) and (i) differ significantly from each other in absolute value, tested by reverse-scoring the Comparison Level for Alternatives (CLalt) measure. SI = single-item; MI = multi-item.
Partner-reported for focal respondent (e.g., women’s self-reported feelings of love would be correlated with their male partners’ reports of the women’s affectionate behaviors).
*p < .05. **p < .01. ***p < .001.
Standard Error of Measurement
The standard error of measurement (SEMeas) is a tool that “can be used to provide a range around the observed value within which the theoretical ‘true’ value lies” (Geerinck et al., 2019, p. 4). Tighe et al. (2010) note further that, “Any individual candidate will . . . have a particular true score, and the [SEMeas] describes the likely range of actual scores such a candidate might achieve as a result of the unreliability of the assessment.” SEMeas was defined as the SD of a given T1 variable multiplied by the square root of the quantity (1 minus the test–retest reliability of the variable between T1 and T2). SEMeas is thus minimized when the variable has a small SD and high test–retest reliability. The multi-item measures were defined as the mean of the constituent items, so that single- and multi-item measures were on comparable metrics. Figure 1 plots the SEMeas for each measure of satisfaction, love, conflict, and commitment in men and women. Most of the single-item measures had SEMeas values comparable to those for multi-item measures (roughly .40–.80), signifying that the single-item measures were no more prone to error or unreliability than their multi-item counterparts and could thus stand in for the multi-item versions. However, the single-item conflict measures had the largest SEMeas values (1.11 in men and 1.20 in women), indicating that these single-item measures did not function as well as their multi-item counterparts.
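The SEMeas definition above translates directly into a one-line computation. The following minimal sketch uses hypothetical SD and reliability values (they are not taken from the study's tables) to show how a small SD combined with high test–retest reliability yields a small SEMeas, whereas a larger SD with lower reliability yields a value in the range reported for single-item conflict. The function name is an assumption introduced for illustration.

```python
import math


def standard_error_of_measurement(sd_t1: float, retest_r: float) -> float:
    """SEMeas = SD at T1 * sqrt(1 - T1-T2 test-retest correlation)."""
    if sd_t1 < 0:
        raise ValueError("SD must be non-negative")
    if not -1.0 <= retest_r <= 1.0:
        raise ValueError("test-retest correlation must lie in [-1, 1]")
    return sd_t1 * math.sqrt(1.0 - retest_r)


# Hypothetical inputs, chosen only to illustrate the formula.
sem_stable = standard_error_of_measurement(sd_t1=1.0, retest_r=0.84)
sem_noisy = standard_error_of_measurement(sd_t1=1.5, retest_r=0.36)

print(f"SD = 1.0, r = .84 -> SEMeas = {sem_stable:.2f}")  # about 0.40
print(f"SD = 1.5, r = .36 -> SEMeas = {sem_noisy:.2f}")   # about 1.20
```

Because the multi-item scores are item means, both kinds of measure share the same response metric, which is what makes their SEMeas values directly comparable.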

Standard Errors of Measurement for Men’s and Women’s Satisfaction, Love, Conflict, and Commitment, Grouped by Single- and Multi-Item Measures (Only Men’s and Women’s Conflict Are Labeled, Due to Their High Values).
Item Response Theory
We examined rudimentary graded-response IRT models (Al Nima et al., 2020; Edwards, 2009) using T1 variables. Given the technical nature of IRT and its extensive tables and figures, detailed results appear elsewhere (Supplement E). In the present studies, IRT analyses were conducted separately within each domain (i.e., love, satisfaction, conflict, commitment). Within each domain, the single-item measure was entered into an IRT analysis along with all the items for the corresponding multiple-item instrument (e.g., Study 1’s single-item conflict measure was entered along with all the items of the Relationship Questionnaire’s conflict subscale). Analyzing these items together operationalizes the underlying latent construct (here conflict) and allows one to see how scores on the latent construct map onto responses to the single-item measure (and also responses to each item on the multiple-item instrument). Study 1 involved eight IRT analyses (the four constructs, separately in male and female partners). To exemplify one of the clearer sets of findings, Figure 2 presents the visual results (known as trace lines) for men’s and women’s conflict. The general pattern of these graphs is that the probability of selecting the lowest response option (“not at all”) peaks when respondents have extremely low levels of the underlying latent construct, the probability of selecting the highest option (“extremely”) peaks when respondents have extremely high levels of the underlying construct, and the endorsement probability of each intermediate answer option peaks at a corresponding point on the underlying construct. This pattern gives the appearance of small bell-curves appearing sequentially. Figure 2 shows that most of the conflict items adhered to this pattern, including the single-item measures. Hence, the single-item measures functioned similarly to any of the items assessing conflict as part of a multi-item scale.
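The sequential peaking of trace lines described above can be reproduced with a small sketch of a graded-response model. It assumes the common logistic parameterization with a positive slope a and ordered thresholds b; the parameter values are hypothetical rather than estimates from Supplement E, and `grm_category_probs` is an illustrative name, not a function from any IRT package.

```python
import math


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Category probabilities for one item under a graded-response model.

    thresholds must be in ascending order; with K response options there
    are K - 1 thresholds. Assumes a > 0 (a positively keyed item).
    """
    # Cumulative probabilities of responding ABOVE each threshold,
    # padded with 1 (above "nothing") and 0 (above the top option).
    above = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Probability of each option is the difference of adjacent cumulatives.
    return [above[k] - above[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical five-option item (four thresholds) with a steep slope.
a_demo = 2.5
b_demo = [-2.0, -1.0, 0.0, 1.0]

for theta in (-3.0, -1.5, 3.0):
    probs = grm_category_probs(theta, a_demo, b_demo)
    modal = max(range(len(probs)), key=probs.__getitem__)
    print(f"theta = {theta:+.1f}: most likely option index = {modal}")
```

Plotting each option's probability against theta would give the trace lines: the lowest option dominates at very low theta, the highest at very high theta, and each intermediate option peaks in between.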

Item Response Theory Trace-Line Graphs for Men’s and Women’s Conflict Items in Study 1.
IRT yields numerical values in addition to its graphical depictions. The a slope/discrimination coefficients (which are analogous to factor loadings but can exceed 1) for the single-item measure in the eight analyses ranged in absolute value from .65 (women) and .90 (men) for conflict to 2.51–3.73 for all other single-item measures. According to Anderson and Miller (2020), slope/discrimination coefficients above 1.69 are “very highly discriminating” (p. 609). For conflict, the single-item measures’ coefficients fell in the range of “moderate” discrimination.
In addition, each item has multiple threshold (also known as difficulty or location) parameters (b), with the number of b parameters equal to the number of response options minus 1 (i.e., b1 for the lowest response option, b2 for the second lowest option, etc.). Because b coefficients are on a z-score metric, the first few b coefficients (b1, b2, etc.) should be negative, as respondents should be very low on the underlying construct if they select a low response option (technically, to have a 50% probability of choosing the given option or lower on the item). Conversely, b coefficients for the higher response options should be positive. The trace-line curves reflect this aspect of the b coefficients (see Supplement E for further explanation). Cut-offs for interpreting b parameters do not appear to exist, but general suggestions do. In an example using marital disaffection, Anderson and Miller (2020) wrote that, “If the scale were adequately measuring lower levels of disaffection, we would expect at least some of the items to have difficulty levels lower than –1” (p. 615).
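The 50% interpretation of a threshold can be checked numerically. In the sketch below, which again assumes a logistic graded-response parameterization with a positive slope and uses hypothetical parameter values, the probability of choosing a given option or lower is exactly .50 when a respondent's latent score equals that option's b, higher when the respondent is below it, and lower when above it.

```python
import math


def prob_option_or_lower(theta: float, a: float, b: float) -> float:
    """P(response <= the option whose threshold is b), assuming a > 0."""
    return 1.0 / (1.0 + math.exp(a * (theta - b)))


# Hypothetical item: five response options, hence four thresholds.
n_options = 5
a_demo = 2.0
b_demo = [-2.0, -1.0, 0.5, 1.5]  # b1..b4 on a z-score metric
assert len(b_demo) == n_options - 1

b1 = b_demo[0]
print(prob_option_or_lower(b1, a_demo, b1))        # 0.5 at theta == b1
print(prob_option_or_lower(b1 - 1.0, a_demo, b1))  # above .5 for lower theta
print(prob_option_or_lower(b1 + 1.0, a_demo, b1))  # below .5 for higher theta
```

A strongly negative b1 therefore means that only respondents well below the mean of the latent construct are more likely than not to pick the lowest option.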
The b1 coefficients for single-item measures of satisfaction, love, and commitment ranged from –1.84 to –2.36 (one coefficient came out +2.00 as a technical artifact, explained in Supplement E), indicating that anyone who gave the lowest endorsement to the single-item versions of these positively toned constructs really had to be low on the respective underlying construct. Thresholds for endorsing the highest answer options for satisfaction, love, and commitment (e.g., b5 or b6) were generally near zero for the single-item measures (and for the items comprising the full scales, as well). Hence, respondents did not need to be extremely high on the underlying construct to strongly endorse the single item. This asymmetry between clearly negative b1 values but near-zero (rather than highly positive) b5 and b6 values likely results from positivity biases in studies of romantic relationships. Responses are frequently skewed toward the favorable end of scales, as individuals in unsatisfying relationships may break up quickly or disproportionately decline to participate in relationship studies (Barton et al., 2020; Park et al., 2021). In addition, those who do participate may be influenced by social-desirability norms to rate their relationship as highly satisfying.
Finally, for conflict, b1 values for the single-item measure were –.76 and –1.25 for men and women, respectively, whereas the b6 (threshold for the highest endorsement) values were 3.80 and 1.75. Hence, respondents needed to be low on the underlying construct to report the lowest level of conflict, but high on the construct to report a high level. These findings suggest that, on the whole, the single-item measures (other than for conflict) functioned well and no worse than did the multi-item versions.
Discussion of Study 1
The single-item love measure performed similarly to its multi-item counterpart in all analyses (even when an item similar in content to the single-item measure was removed from the multi-item version). At the other extreme, the multi-item satisfaction measure outperformed the single-item version on various forms of reliability and validity (consistent with H1). The single-item satisfaction version did not function poorly by any means (e.g., test–retest reliability ranging from .50 to .78; longitudinal predictive-validity correlations with disillusionment ranging from –.56 to –.76; five of the six concurrent-validity correlations with disillusionment ranging from –.70 to –.80; and convergent validity correlations with the multi-item satisfaction measure from .72 to .85). It is just that the multi-item satisfaction measure did even better.
The single-item commitment measure generally showed strong psychometric properties, albeit with some limited exceptions (e.g., its predictive-validity correlations with ambivalence from T1 to T2 reaching only –.34 in men and –.30 in women). The one construct for which the single-item measure was generally substandard was conflict. Most of its convergent-validity correlations with multi-item conflict reached only the .40s and .50s, whereas the corresponding correlations for other constructs were mostly in the .70s and .80s. Also, men’s test–retest reliability for single-item conflict between T2–T3 and T1–T3 fell below .50. Finally, compared with the other single-item constructs, single-item conflict had some of the weakest concurrent- and predictive-validity correlations with its respective criterion variable, focal individuals’ overt negative behaviors, as perceived by their respective partners. In the IRT analyses, the single-item conflict measure was the only single-item measure not to be deemed “very highly discriminating.” Regarding the single-item measures’ b (threshold) coefficients, however, conflict showed the best symmetry (negative standing on the latent construct required to give the lowest answer choice and positive standing required to give the highest answer choice). Finally, the single-item conflict measures were also suboptimal in terms of their standard errors of measurement. Hence, to investigate a newly worded conflict item, as well as extend the research to a sample beyond college students, we undertook Study 2.
Study 2
Overview
Study 2 was undertaken to examine psychometric properties of a newly worded conflict measure (used alongside satisfaction, love, and commitment single-item measures highly similar to those in Study 1). In addition, Study 2 sought to extend our findings beyond a college-age dating sample from one university and to test the convergent validity of the single-item measures with a new set of corresponding multi-item measures.
Methods
Sample
A total of 280 individuals participated in Study 2, an internet survey of adults who were mostly older, and from a broader array of relationships, than in Study 1 (conducted in December 2019 and January 2020). Participants represented the following relationship statuses: married, n = 129; divorced but not remarried, n = 9; separated but not divorced, n = 4; in a relationship but not living together, n = 67; and in a relationship and living together/cohabiting, n = 71. Participants from dissolved relationships responded retrospectively, as described below. All other relationship statuses (e.g., divorced but remarried) were excluded to avoid respondent confusion over which relationship (current or former) to report on. Most participants (97%) lived in the U.S., were female (n = 212, 76%), and were heterosexual (n = 254, 91%). Nearly all were White (n = 273, 98%). Average age was 30.3 years old (Mdn = 28, SD = 10, range 18–66). Married participants reported an average of 8.2 years (SD = 9) of wedlock, whereas cohabiters’ mean length of living together was 2.7 years (SD = 3). As many dating relationships (non-cohabiting) were expected to be less than 1 year long, daters responded to a more fine-grained relationship-length item: 1 (less than 1 month), 2 (1–3 months), 3 (3–6 months), 4 (6–9 months), 5 (9–12 months), 6 (1–2 years), and 7 (> 2 years). Daters had a mean score of 5.5 (SD = 1.7), corresponding to 1 year or slightly longer on the ordinal scale. Due to the relatively small percentage of men in the study, analyses were conducted only on the full sample, not by gender. Also, presumably due to the survey’s length, there was considerable drop-off in participation from beginning to end, yielding sample sizes of n = 140 to 152 for variables focal to this study.
Procedures
Upon IRB approval, participants were recruited to complete an online Qualtrics survey via two websites. One was Amazon.com’s Mechanical Turk (MTurk). Though MTurk can reach diverse audiences in terms of geography and age, a concern has been whether MTurk workers have unique characteristics limiting generalizability (e.g., spending considerable time online). Comparisons of MTurk workers with community and undergraduate research participants have shown, however, that many findings replicate between MTurk workers and other groups (for a review, see Goodman et al., 2013). The other source was Facebook. Most participants (90%) entered the study through Facebook rather than MTurk (10%). MTurk and Facebook announcements advertised a “Romantic/Marital Relationship Survey,” which participants could answer with regard to either a current or a past relationship. Participants were required to be at least 18 years old. All participants were directed to a survey hosted online by Qualtrics and paid US$4, corresponding to roughly one half-hour’s work under the U.S. minimum wage (US$7.25).
Measures
Single-Item Measures
Single-item measures representing the same four constructs as in Study 1—satisfaction, love, conflict, and commitment—were assessed. The conflict item was modified substantially (now, “How much conflict is/was there in your relationship?”; 1 = none to 5 = extremely large amount) to improve its psychometric performance. The other three items were modified slightly to reflect Study 2’s inclusion of married, as well as dating, participants, and of participants in dissolved relationships. These items were “In general, how satisfied are/were you with your relationship?,” “How much do/did you love your partner/spouse?,” and “How committed are/were you to your partner/spouse?” (response options from 1 = not at all to 5 = extremely).
Validity Measures
Four full-length measures were used to test the single-item versions’ convergent validity. The Satisfaction and Love subscales of the PRQC Inventory (Fletcher et al., 2000) tested the convergent validity of the single-item satisfaction and love measures, respectively. The three PRQC Satisfaction items share the structure “How _____ are you with your relationship?” and fill the blank with “satisfied,” “content,” and “happy” (alpha reliability = .97). The three PRQC Love items share the structure “How much do you _____ your partner?” and fill in the blank with “love,” “adore,” and “cherish.” Its alpha reliability in the present sample was .95. PRQC response options range from 1 = not at all to 7 = extremely. Conflict intensity, assessed with the Interactional Reactivity subscale of the Romantic Partner Conflict Scale (Zacchilli et al., 2009), was tested for convergent validity with the single-item conflict measure. Interactional Reactivity contains six items, such as “Our conflicts usually last/lasted quite a while” (responses from 1 [strongly disagree] to 5 [strongly agree]; α = .89). Finally, to test convergent validity of the single-item commitment measure, general commitment toward one’s partner was assessed via Klein et al.’s (2014) “target-free” measure of commitment. Each of the four items (e.g., “How committed are/were you to [target]?”) leaves open the object of one’s commitment, which in our case, we filled in with “your spouse/partner” (response options from 1 [not at all] to 5 [extremely]; α = .91). Evidence of the multi-item measures’ unidimensionality appears in Supplement B.
Analysis Plan
Study 2 focused entirely on single-item measures’ convergent validity with their corresponding multi-item versions; hence, correlations were analyzed. The Study 2 sample size (140–152, depending on the item, due to participant drop-off) ensured sufficient power for detecting correlations of .40 and higher (as discussed above).
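The power statement can be illustrated with a normal-approximation calculation based on the Fisher z transformation. This is a sketch under stated assumptions: the source does not say how power was computed, a two-tailed alpha of .05 is assumed, and the function name `power_for_correlation` is hypothetical.

```python
import math
from statistics import NormalDist


def power_for_correlation(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed power to detect a population correlation r.

    Uses the Fisher z approximation: atanh(sample r) is treated as normal
    with standard error 1 / sqrt(n - 3). Requires n > 3 and -1 < r < 1.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = math.atanh(r) * math.sqrt(n - 3)  # expected test statistic
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Smallest and largest analytic sample sizes reported for Study 2.
for n in (140, 152):
    print(f"n = {n}: power to detect r = .40 is about {power_for_correlation(0.40, n):.3f}")
```

Under these assumptions, power at either sample size is well above the conventional .80 benchmark for a correlation of .40.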
Results
Preliminary Analyses
We first examined basic descriptive statistics. Means and SDs for the single-item measures and their corresponding multi-item measures in Study 2 appear in Supplement A.
Convergent Validity
Table 6 displays correlations between single- and multi-item measures of the four focal constructs. Values along the diagonal (bold) represent correlations between the single- and multi-item versions of the same variable (e.g., single-item love with multi-item love). For the most part, the correlations on the diagonal were the largest (in absolute value) in the matrix, supporting the convergent validity (H2–Study 2) and specificity of the single-item measures (i.e., single-item measures’ higher correlations with their corresponding multi-item measure than with other multi-item measures). One exception to specificity was that whereas single-item commitment correlated r = .74 with multi-item commitment, single-item commitment also correlated r = .73 with multi-item love. The newly modified single-item conflict measure correlated .60 with its corresponding multi-item measure, thus demonstrating an improvement in convergent validity compared with Study 1, in which the convergent-validity coefficients for single-item conflict ranged from .43 to .61 across waves and gender.
Convergent Validity Correlations Between Single- and Multi-Item Measures (Study 2).
Note. For all correlations, p < .001; 95% confidence intervals are in parentheses. Correlations for single- and multi-item versions of the same variable appear in bold. PRQC = Perceived Relationship Quality Components.
Correlation when the item with content similar to the single-item version was removed from the PRQC Satisfaction scale.
Item Response Theory
Four IRT analyses, one for each construct, were conducted for Study 2. Because the sample recruited individuals rather than dyads, the full sample was analyzed, as opposed to male and female subgroups. The a slope/discrimination coefficients (factor-loading analogues) for the single-item measures of satisfaction, love, and commitment (2.75, 3.15, and 2.28, respectively) all exceeded the 1.69 cut-off for “very highly” discriminating. The a coefficient for single-item conflict (1.08) was only “moderately” discriminating, however. The satisfaction, love, and commitment single items’ b1 values were –2.05, –2.12, and –1.86, so that respondents indeed had to be very low on the respective latent factor to give the lowest answer option on the corresponding single item. The full sequence of b coefficients for the single-item conflict measure (b1 = –1.94, b2 = .16, b3 = 1.08, b4 = 1.35) showed the same kind of symmetry as in Study 1, namely that to give a low answer, respondents needed to be low on the latent construct, whereas to give a high answer, they needed to be high on the construct. This is a favorable property of the single-item conflict measure.
Discussion of Study 2
Study 2 drew on a very different participant population than Study 1. Yet, Study 2 also supported the convergent validity of single-item measures of satisfaction, love, conflict, and commitment with their respective multi-item measures. Associations between single- and multi-item measures of the same constructs showed substantial specificity, and the single-item conflict measure showed solid convergent validity. Satisfaction, love, and commitment showed strong a slope/discrimination coefficients (analogous to good loadings in factor analysis), whereas the b threshold coefficients for conflict showed good symmetry (i.e., low answers on the item stemming from low standing on the construct and high answers stemming from high standing).
General Discussion
Given that many relationship constructs, including satisfaction, love, conflict, and commitment, represent relatively unidimensional concepts (as documented in the literature and in our own analyses; Supplement B), we tested single-item versions of these four constructs for psychometric soundness. Reliable and valid single-item measures would be of great applied value, given researchers’ long-standing interest in minimizing respondent burden and maximizing survey completion, and newer considerations involved with smartphone administration. Results of two studies (one with college dating couples and another with somewhat older adults with varying relationship statuses) largely supported the psychometric soundness of the single-item measures. The constructs for which single-item measures performed comparably well to their multi-item counterparts were love and commitment on test–retest reliability and concurrent and predictive validity, and conflict on predictive validity. Single-item satisfaction did not fare as well, particularly on test–retest reliability. As is typically the case, the results can likely be improved (e.g., through additional wording refinements), so continued evaluation and testing may be warranted.
Whereas multiple-item measures traditionally have been preferred over single-item alternatives for reasons such as the ability to capture multiple facets of a psychological construct and the cancelation of random measurement error (Epstein, 1983; Robins et al., 2001), it has also been argued that single-item versions may be advantageous relative to multiple-item measures that merely repeat the central essence of the construct (we acknowledge that redundancy is not inevitable in multi-item measures, particularly when the items tap into multiple facets of a given overall construct). In most Study 1 analyses (i.e., test–retest reliability, concurrent validity, and predictive validity), direct comparison of the functioning of single- and multi-item measures of the same construct yielded no significant differences, suggesting that researchers can obtain comparably high-quality data with the more concise versions. In some instances, as we expected (H1), multi-item measures of satisfaction (primarily), conflict, and commitment showed better psychometric performance. Even though single-item satisfaction did not perform as well as the corresponding multi-item version in a relative sense (e.g., women’s T1–T2 and T2–T3 test–retest reliabilities were .70 and .78 for the single item vs. .80 and .89 for the multi-item scales), psychometric results for single-item satisfaction (the .70 and .78 test–retest reliabilities) were still very good in an absolute sense.
Single-item measures were included in our larger Study 2, in part, to refine the wording of the conflict item. The word “conflictual” in the Study 1 version (“How conflictual is your relationship?”) may have seemed unfamiliar or odd to some respondents; hence, we adopted more conventional terminology in Study 2 (“How much conflict is/was there in your relationship?”). The new version performed well in its Study 2 test of convergent validity with a full-length conflict-related measure. As noted, we encourage development of single-item versions of other constructs in the close-relationships field (e.g., trust, understanding), to broaden the scope of constructs that can be assessed concisely.
Based on our research and the literature, we offer recommendations for using brief measures. First, researchers should mention a survey’s brevity during participant recruitment (Burgard et al., 2020). Second, although periodic item refinement is warranted whether an item stands alone or exists within a larger scale, it would appear most imperative in the former case, as there are no other items to cancel out error. As one example, we reworded the core statement of our single-item conflict measure for Study 2, as noted above. As another example (Bowling, 2005), the common single-item measure of self-rated health added a “very good” option to the prior options of excellent, good, fair, and poor to increase sensitivity/discriminativeness. Third, researchers should use various criteria when selecting candidates for single-item measures. When selecting a single item from items on an existing multi-item scale, one might take the “most powerful items from the parent instruments” (Bowling, 2005, p. 344), such as the item(s) with the largest factor loading(s). In addition, as noted, potential single-item measures should tap into a unidimensional phenomenon, and the item should not reduce the entire range of experiences in the domain (e.g., satisfaction) to one simplistic statement.
Strengths of our research included the different samples and full-length convergent-validity measures between the two studies, aiding generalizability; the many measures (single- and multi-item) available; and (in Study 1) the longitudinal follow-ups. As limitations, our samples were relatively small (with participant drop-off in Study 2) and neither was representative of a known population; furthermore, our samples may skew toward higher-educated persons. Common-method bias (i.e., respondents’ self-report evaluations of their relationship) is another limitation. Given trends in contemporary polling, it is likely that surveys of more representative samples will be conducted using predominantly single-item measures, making it difficult to compare single-item and full-length surveys in representative samples.
In conclusion, our studies contribute to applied survey-research needs while simultaneously adhering to psychometric principles and analyses. With societies around the world becoming increasingly fast-paced (Friedman, 2016), survey research, too, must adapt.
Supplemental Material
Supplemental material for Psychometric Evaluation of Single-Item Relationship Satisfaction, Love, Conflict, and Commitment Measures by Sylvia Niehuis, Karsen Davis, Alan Reifman, Kenzi Callaway, Ali Luempert, C. Rebecca Oldham, Jayla Head and Emma Willis-Grossmann in Personality and Social Psychology Bulletin:
sj-pdf-1-psp-110.1177_01461672221133693
sj-pdf-2-psp-110.1177_01461672221133693
sj-pdf-3-psp-110.1177_01461672221133693
sj-pdf-4-psp-110.1177_01461672221133693
sj-pdf-5-psp-110.1177_01461672221133693
Footnotes
Acknowledgements
The authors thank the College of Human Sciences at Texas Tech University for awarding its 2018–2019 Undergraduate Research Experience Grant to Karsen Davis and her faculty mentor, Sylvia Niehuis, which facilitated presentation of Study 1 findings at the Texas Council conference.
Authors’ Note
Data for Study 1 were collected by Dr Niehuis. An earlier version of this manuscript was presented as a poster at the Texas Council on Family Relationships 2019 annual conference in Austin, Texas. Karsen Davis, Kenzi Callaway, and Jayla Head are now graduate students at Kansas State University, University of Minnesota, and Mercer University-Atlanta, respectively. Rebecca Oldham is an Assistant Professor at Middle Tennessee State University and Emma Willis-Grossmann is currently a graduate student at Texas Tech University.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Study 2 was funded via a Scholarship Catalyst Program grant to Alan Reifman from the Office of the Vice President for Research at Texas Tech University.
Supplemental Material
Supplemental material is available online with this article.