Test-Retest Reliability and Mode Effects on Single-Item and Multi-Item Measures in a Survey of Adolescent and Young Adults with Cancer

Abstract

Purpose:

This project examined test-retest reliability and survey mode administration across single-item and multi-item measures among adolescent and young adult (AYA) cancer survivors.

Methods:

Forty-six AYAs randomly assigned to survey mode (phone, online, and paper) completed the survey and were invited to complete the survey again 1 week later.

Results:

Mode effects were found on 6% of single-items and 25% of multi-item scores. Reliability was low for 52% of single-items and 8% of multi-item scores.

Conclusion:

Multi-item measures should generally be used over single-item measures due to better reliability, but single-item measures may be preferable when mode effects are large.

Keywords

psychometrics; survivorship; quality of life

Introduction

A long-standing debate in psychometrics and survey development concerns the use of single- versus multi-item measures. Currently, no consensus exists on whether single-item or multi-item measures are preferred. Single-item measures have a lower burden for both participants and scientists.¹ However, studies have shown that multi-item measures are more reliable and have better content, construct, and criterion validity.² While most patient-reported outcomes are multi-item, several single-item measures have been shown to be valid and reliable (e.g., self-rated health, visual analog scale),^1,3,4 though this can differ by domain.⁵ Previous research on single-item measures has focused on estimating validity and internal consistency for reliability.^6,7 However, prior studies have shown that single-item measures are not always superior to multi-item measures.^8–10

Test-retest reliability and mode effects tend to be less well studied when comparing single- and multi-item measures. Test-retest reliability reflects the reproducibility of the scale scores and how consistent the scores are over a specified period of time.¹¹ Mode refers to how the survey is completed, such as with a research assistant, on paper, or online, and mode effects occur when the mode is associated with statistically significant differences in participants’ answers.¹² Identifying whether single- or multi-item measures maximize test-retest reliability and minimize mode effects could help reduce participant burden while maintaining study integrity.

The issue of whether to use single- or multi-item measures is particularly salient for adolescents and young adults (AYA) with a history of cancer. AYAs are in the age group most likely to own smartphones,¹³ and completing multi-item measures on smartphones is challenging.¹⁴ AYAs may experience several time-intensive developmental milestones (e.g., completing formal education, starting a career, forming relationships, and family planning) during and after cancer treatment.¹⁵ Consequently, long, multi-item survey measures may be challenging for AYAs to complete. Single-item measures are also more feasible for screening and treatment monitoring in clinical care.¹⁶ Given the potential benefits of single-item measures for AYAs, additional evaluation in this population is warranted.

The aim of the present analysis was to evaluate two psychometric properties of single- and multi-item measures among AYA cancer survivors: effects of survey mode (by telephone with an interviewer, online, and mailed paper survey) and test-retest reliability. Most prior studies have used an observational design, comparing participants’ answers across whichever survey mode each participant used, which may compromise internal validity and limit the ability to infer direct effects of survey mode on responses.^17,18 In this study, we used a randomized, between-subjects design with participants randomly assigned to one of three survey modes. We also included repeated measures to assess test-retest reliability rather than attempting to estimate internal consistency reliability for single-item measures. The study was situated within an NIH-funded program project grant focused on clinical care gaps and unmet needs in AYA cancer survivors;¹⁹ thus, the survey covered multiple domains of the cancer survivorship experience, enabling the investigation of survey mode effects and test-retest reliability for single-item and multi-item measures across these domains.

Methods

Study design

The Valuing Opinions and Insights from Cancer Experiences (VOICE) Program is focused on health care needs and utilization in cancer survivors diagnosed at ages 15–39 years. The activities discussed in this article were a part of survey development for the larger VOICE Program. This pilot study recruited a sample of AYA cancer survivors 2–10 years after diagnosis. A randomized, between-subjects experiment was used to examine differences by mode of completing the survey. Participants were assigned to groups using simple randomization. A repeated measures design was used to examine the test-retest reliability of each single-item and multi-item measure. Study activities were approved and overseen by the Kaiser Permanente Northern California Institutional Review Board.

Participants

Participants were recruited from Kaiser Permanente Northern California (KPNC), a large integrated health care system providing health care to over 4.6 million members in Northern California. Eligible individuals were 15–39 years of age at cancer diagnosis and 2–10 years post-cancer diagnosis when recruited into the study. To prevent overlap with the main VOICE Program survey, eligible individuals had cancer but were not diagnosed with one of the cancer types examined in the main study, which are the top 10 most common cancers in AYAs (leukemia, non-Hodgkin lymphoma, Hodgkin lymphoma, melanoma, sarcoma, colorectal, cervical, thyroid, testicular, and breast cancers). Eligible individuals were continuously enrolled at KPNC since their cancer diagnosis and at least 18 years old at recruitment. Participants did not have to be in remission and could have experienced a second primary cancer or recurrence.

Procedures

Potentially eligible participants (n = 302) were identified from electronic health records at KPNC. Potential participants received a series of emails, phone calls, and mailed paper letter invitations to participate in the study, consistent with the Dillman method.²⁰ Invitees were further screened to confirm eligibility, including willingness to be randomized to survey completion mode. Consented participants (n = 84) were then randomized to survey mode (mailed paper, online web-based, or interviewer-delivered by phone). One week after completing the first survey, participants were asked to complete the same survey again in the same mode. The 1-week interval was chosen to minimize recall bias and to reduce the likelihood of actual change in symptoms and quality of life, which complicates reliability testing. Participants who did not complete the surveys after the initial invitation received up to five reminder attempts. Participants received $15 for completing the first survey and $25 for completing the second survey.

Measures

Survey items and measures were drawn from previously evaluated measures²¹ and large national surveys. Most items were drawn from the Medical Expenditure Panel Survey,²² the Patient-Reported Outcomes Measurement Information System (PROMIS),²³ the Functional Assessment of Cancer Therapy (FACT) measures,²⁴ and the Childhood Cancer Survivor Study (CCSS) insurance survey.²⁵ Survey items were grouped into seven domains (Fig. 1): cancer care; work, education, and finances; social support and well-being; health history; fertility and sexual health; demographics; and health behaviors. Except for demographics, which had only single-item measures, each domain had both single- and multi-item measures. A total of 148 items from the survey were evaluated in this study, with 68 of the 148 forming the 12 multi-item measures.

FIG. 1.

Survey domains, number of single-items examined, and scores examined. The number of single-items examined was 148, the total across all seven domains. FACT, Functional Assessment of Cancer Therapy; FSDS, Female Sexual Dysfunction Scale; MEPS, Medical Expenditure Panel Survey; PROMIS, Patient-Reported Outcomes Measurement Information System; WAHS, Worry about Affording Health care Scale.

Statistical analyses

To examine the representativeness of the survey sample, we first compared completers, non-completers, refusers, and non-responders on characteristics available from the electronic health record for all four groups. Partial completers and completers were defined as those who were randomized and completed one and two surveys, respectively. Respondents were defined as completers and partial completers combined. Non-completers were defined as those who contacted the study team, completed the screening, and were randomized but did not complete the first survey. Refusers were people who contacted the study team and actively declined to participate. Non-responders were those who passively declined (did not contact the study team and did not participate). Ineligible persons were those who did not meet the eligibility criteria outlined above.

Analyses to identify mode effects and test-retest reliability were performed for each individual item and for each multi-item score. Mode effects analyses used the first survey and compared the three modes using ANOVAs for continuous items and measures and chi-squared tests for categorical or ordinal items. Mode effects were defined as statistically significant (p < 0.05) differences between modes. Test-retest reliability compared responses and scores on the first and second survey using intraclass correlation coefficients (ICCs) for scores and continuous items²⁶ and kappa statistics for categorical or ordinal items.²⁷ ICCs over 0.70 and kappa statistics over 0.60 were defined as showing good reliability, consistent with interpretation guidelines.^26,28
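As a rough illustration of the reliability statistics described above, the following Python sketch computes an unweighted Cohen's kappa and a one-way random-effects ICC for two administrations of the same item or score. It is not the authors' analysis code. The source does not state which ICC form or kappa weighting was used, so ICC(1,1) and unweighted kappa are assumptions here, and the function names and example responses are hypothetical.

```python
from collections import Counter

# Thresholds for "good" reliability as stated in the text.
GOOD_ICC = 0.70
GOOD_KAPPA = 0.60


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for paired categorical responses."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts1, counts2 = Counter(first), Counter(second)
    # Chance agreement from the marginal distributions of each survey.
    expected = sum(counts1[c] * counts2[c] for c in counts1) / (n * n)
    return (observed - expected) / (1 - expected)


def icc_oneway(first, second):
    """One-way random-effects ICC(1,1) for two administrations per person.

    Assumption: the ICC form is not specified in the source.
    """
    n, k = len(first), 2
    grand = (sum(first) + sum(second)) / (n * k)
    means = [(a + b) / 2 for a, b in zip(first, second)]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for a, b, m in zip(first, second, means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical responses from the first and second survey.
t1_item = ["yes", "yes", "no", "no", "yes", "no"]
t2_item = ["yes", "no", "no", "no", "yes", "no"]
t1_score = [10, 12, 15, 20]
t2_score = [11, 12, 14, 21]

kappa = cohen_kappa(t1_item, t2_item)
icc = icc_oneway(t1_score, t2_score)
print(f"kappa = {kappa:.2f}, good reliability: {kappa > GOOD_KAPPA}")
print(f"ICC   = {icc:.2f}, good reliability: {icc > GOOD_ICC}")
```

For ordinal items, a weighted kappa may be the more natural choice; the unweighted version is used here only to keep the sketch short.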

Results

From July to October 2022, 302 AYA cancer survivors from KPNC were approached to participate. Seven of these people were found to be ineligible because they could not complete the survey in all three modes or indicated that they did not have cancer. Among those remaining and presumed eligible, 158 did not respond and 53 refused. Of the 84 survivors who enrolled in the study and were randomized, 38 did not complete the survey (non-completers). A total of 46 people completed the survey, with 39 completing both surveys and 7 completing only the first survey. As shown in Supplementary Table S1 and Table 1, non-completers, refusers, and non-responders did not differ from completers on age at diagnosis or cancer type. A greater proportion of the non-completers, refusers, and non-responders were Hispanic or Asian compared to completers. More than half of the 46 respondents had completed college (63%) and identified as female (56%) (not shown). Respondents were a mean of 6.3 years post-diagnosis, and most (61% of the entire sample) had undergone surgery for their cancer (not shown).
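The participant flow reported above can be checked with simple arithmetic. The short Python sketch below uses only the counts given in this paragraph; the variable names are illustrative, and the two percentages it derives are computed here from those counts rather than reported in the source.

```python
# Counts as reported in the Results paragraph above.
approached = 302
ineligible = 7
non_responders = 158
refusers = 53
randomized = 84
non_completers = 38
completers = 39           # completed both surveys
partial_completers = 7    # completed only the first survey

presumed_eligible = approached - ineligible
respondents = completers + partial_completers

# The reported groups should account for everyone approached.
assert presumed_eligible == non_responders + refusers + randomized
assert randomized == respondents + non_completers

# Derived figures (not stated in the source).
response_rate = respondents / presumed_eligible
retest_rate = completers / respondents
print(f"first-survey response among presumed eligible: {response_rate:.1%}")
print(f"second-survey completion among respondents: {retest_rate:.1%}")
```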

Table 1.

Characteristics of AYA Cancer Survivor Survey Participants by Survey Mode Randomization Group and Level of Participation^a

	Mode
2 Surveys (Completers)	Web (n = 15)	Phone (n = 16)	Paper (n = 8)	Total (n = 39)
Age	30.73 (7.04)	29.75 (7.55)	31.38 (4.27)	30.46 (6.67)
Hispanic ethnicity	0 (0%)	4 (25%)	2 (25%)	6 (15%)
Race
Asian	2 (13%)	4 (25%)	1 (13%)	7 (18%)
Black	1 (7%)	1 (6%)	0 (0%)	2 (5%)
White	11 (73%)	10 (63%)	6 (75%)	27 (69%)
Other	1 (7%)	3 (19%)	1 (13%)	3 (8%)
Cancer type^b
Hematological malignancy (e.g., multiple myeloma)	6 (40%)	6 (38%)	0 (0%)	12 (31%)
Gastrointestinal (e.g., stomach)	1 (7%)	1 (6%)	1 (13%)	3 (8%)
Head and neck (e.g., larynx)	0 (0%)	0 (0%)	1 (13%)	1 (3%)
Gynecological, urinary, genital (e.g., ovary, bladder)	4 (27%)	6 (38%)	5 (63%)	15 (39%)
Other (e.g., brain, Kaposi sarcoma)	4 (27%)	3 (19%)	1 (13%)	8 (21%)

1 Survey (Partial completers)	Web (n = 1)	Phone (n = 1)	Paper (n = 5)	Total (n = 7)
Age	33.00 (-)	32.00 (-)	32.60 (4.56)	32.57 (3.74)
Hispanic ethnicity	0 (0%)	1 (100%)	0 (0%)	1 (14%)
Race
Asian	0 (0%)	0 (0%)	1 (20%)	1 (14%)
Black	0 (0%)	0 (0%)	2 (40%)	2 (29%)
White	1 (100%)	1 (100%)	2 (40%)	4 (57%)
Other	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Cancer type^b
Hematological malignancy (e.g., multiple myeloma)	0 (0%)	1 (100%)	1 (20%)	2 (29%)
Gastrointestinal (e.g., stomach)	0 (0%)	0 (0%)	1 (20%)	1 (14%)
Head and neck (e.g., larynx)	0 (0%)	0 (0%)	0 (0%)	0 (0%)
Gynecological, urinary, genital (e.g., ovary, bladder)	1 (100%)	0 (0%)	2 (40%)	3 (43%)
Other (e.g., brain, Kaposi sarcoma)	0 (0%)	0 (0%)	1 (20%)	1 (14%)

0 Surveys (Non-completers)	Web (n = 11)	Phone (n = 13)	Paper (n = 14)	Total (n = 38)
Age	33.27 (4.36)	30.08 (6.80)	30.00 (7.02)	30.97 (6.30)
Hispanic ethnicity	4 (36%)	4 (31%)	3 (21%)	11 (29%)
Race
Asian	3 (27%)	3 (23%)	5 (36%)	11 (29%)
Black	1 (9%)	2 (15%)	1 (7%)	4 (11%)
White	5 (46%)	6 (46%)	7 (50%)	18 (47%)
Other	2 (18%)	2 (15%)	1 (7%)	5 (13%)
Cancer type^b
Hematological malignancy (e.g., multiple myeloma)	4 (36%)	4 (31%)	3 (21%)	11 (29%)
Gastrointestinal (e.g., stomach)	1 (9%)	1 (8%)	1 (7%)	3 (8%)
Head and neck (e.g., larynx)	0 (0%)	1 (8%)	4 (29%)	5 (13%)
Gynecological, urinary, genital (e.g., ovary, bladder)	5 (46%)	4 (31%)	5 (36%)	14 (37%)
Other (e.g., brain, Kaposi sarcoma)	1 (9%)	3 (23%)	1 (7%)	5 (13%)

^a Characteristics are from the medical record.

^b This pilot survey did not include people diagnosed with the 10 most common AYA cancers (listed in Methods) to prevent overlap with the main survey.

Mode effects

A small percentage of items showed effects by survey mode (Fig. 2). By domain, the proportion of single-items with mode effects ranged from 0% (fertility and sexual health; demographics; health behaviors) to 17% (cancer care). The overall proportion of items with mode effects was 6%. Mode effects were seen on items about receiving help or care discussions, mental health, and having heart conditions. In general, items with significant mode differences showed that participants completing surveys online or by paper were more likely to report worse health or less help than those completing the survey by phone. Of the 12 multi-item scores, 3 (25%; anxiety, sexual satisfaction, and quality of life) showed significant mode effects, with participants who completed the surveys by phone reporting better health than those completing the surveys online or by paper.
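The mode-effect test defined in the Methods (a chi-squared test across the three modes, with p < 0.05 counted as a mode effect) can be sketched for a single yes/no item as follows. This is a minimal illustration, not the authors' code: the function name and all cell counts are invented, and the closed-form p-value applies only to the 3 × 2 case, where the test has 2 degrees of freedom.

```python
import math


def chi_square_3x2(table):
    """Pearson chi-squared test for a 3-mode x 2-response table of counts.

    Returns (statistic, p_value). With df = (3 - 1) * (2 - 1) = 2, the
    chi-squared survival function is exactly exp(-x / 2).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)


# Hypothetical counts of [yes, no] answers on one item, by survey mode.
counts = [
    [10, 6],   # web
    [4, 13],   # phone
    [8, 5],    # paper
]
stat, p_value = chi_square_3x2(counts)
mode_effect = p_value < 0.05
print(f"chi-squared = {stat:.2f}, p = {p_value:.3f}, mode effect: {mode_effect}")
```

Items with more response categories would need the general chi-squared distribution, and continuous items or scores would use an ANOVA instead, as described in the Methods.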

FIG. 2.

Proportion of scores or items with mode effects and low test-retest reliability by domain.

Test-retest reliability

Many individual items showed low test-retest reliability (Fig. 2). Except for demographics, between 35% (health history) and 78% (social support and well-being) of items within a survey content domain had low reliability. Individual items with low reliability included the financial worry items from the Work, Education, and Finances domain and whether a care provider discussed certain topics at the time of diagnosis from the Cancer Care domain. Conversely, only one multi-item score (benefit finding in the Cancer Care domain; 8%) had low reliability. Cronbach’s alpha values for the multi-item measures are reported in Supplementary Table S2.

Discussion

In this study of survey response characteristics in AYA cancer survivors, we found that mode effects were more common in multi-item measures, but multi-item measures had better test-retest reliability compared to single-item measures. Except for demographics, the results tended to be consistent across the other six content domains of the survey. Our findings support the use of multi-item measures over single-item measures with AYAs due to better reliability. However, single-item measures may be preferable when mode effects are large and when controlling for mode or limiting modes is not possible. For example, study results showed that phone surveys possibly had a social desirability effect compared to paper and electronic surveys. When phone surveys have to be used, single-item measures may be preferable to multi-item measures. Regardless, most studies should consider multi-item measures due to better test-retest reliability.

Consistent with prior studies,^17,18 mode effects showed that participants reported worse health when completing the survey online or on paper compared to by phone. One potential reason for the mode effects was social desirability for the phone survey. Social desirability is more likely to happen with phone or in-person surveys;^29–31 our finding of significant mode effects suggests that self-administered surveys on paper or through electronic platforms may be preferable. While not observed in this study, prior research has shown that there may be non-equivalence between electronic, web-based surveys and paper surveys. These differences could result from electronic surveys more often being completed in the participants’ natural setting, with less control over the format of the survey and potential influences from the environment.^32–34 This study rigorously developed the survey tools to ensure equivalence across modes, and web and paper surveys were both completed by the participants in their natural environment. Both of these factors could explain the equivalence of paper and electronic surveys observed here.

The better reliability of multi-item scores was also consistent with previous studies, although some items would be expected to have lower estimated reliability because they ask about states that can change over time (e.g., feelings right now, current financial worry).¹ The better test-retest reliability of multi-item scores is not surprising given that longer measures tend to have better internal consistency, another form of reliability.^35,36 The choice of multi-item versus single-item measures should consider the specific characteristics of the study, including the number of constructs measured and previously documented mode effects.
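The link between measure length and reliability noted above is often illustrated with the classical Spearman-Brown formula, which predicts the reliability of a measure lengthened by a factor k from the reliability ρ of the original: ρ_k = kρ / (1 + (k − 1)ρ). The Python sketch below is only an illustration under that formula's assumption of parallel items; it was not part of this study's analyses, and the function names and the starting reliability of 0.45 are hypothetical.

```python
import math


def spearman_brown(reliability, length_factor):
    """Predicted reliability after lengthening a measure by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def items_needed(single_item_reliability, target):
    """Smallest number of parallel items predicted to reach the target."""
    ratio = (target * (1 - single_item_reliability)) / (
        single_item_reliability * (1 - target)
    )
    return math.ceil(ratio)


# Hypothetical single item with reliability 0.45, target of 0.70.
single = 0.45
for k in (1, 2, 3):
    print(f"{k} item(s): predicted reliability {spearman_brown(single, k):.2f}")
print(f"items needed for 0.70: {items_needed(single, 0.70)}")
```

Under these assumed values, a few parallel items would be predicted to be enough to cross the 0.70 threshold, which fits the later point that very brief multi-item measures may balance burden and reliability.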

Future studies are needed to examine how many items are necessary to gain the improved reliability of multi-item measures with AYAs. Many of the multi-item measures in this study had two or three items, suggesting that multi-item measures could consist of only a few items to balance the benefits of single- and multi-item measures. The two-item global health measures from the PROMIS have been shown to be reliable and valid, further supporting the potential utility of brief measures.³⁷ This could be particularly useful for studies of AYAs such as in this study of cancer survivors, as many AYAs access surveys via smartphones¹³ and completing complex multi-item measures on smartphones is challenging.¹⁴ Remote patient monitoring using PROs through various platforms (text, web-based, apps, and patient portals) is also becoming common within oncology, and brief PROs could be helpful for reducing patient burden while supporting these efforts.³⁸ AYAs also undergo developmental milestones that might require unique survey measures. Single-item measures may seem to be an appropriate way to assess these important milestones and associated health outcomes. However, using individual items from multi-item measures to assess these experiences could lead to more error in statistical analyses, requiring larger sample sizes.³⁹ When using multi-item measures in surveys of AYAs, the multi-item score should be prioritized in analyses instead of each individual item to improve reliability.

Research studies, particularly those focused on health disparities, are moving toward using ecological momentary assessment (EMA), in which participants complete two or more brief surveys per day.⁴⁰ EMA studies may be particularly important for assessing the quality of life of AYAs as they experience the aforementioned developmental milestones that might complicate single surveys with long recall periods (e.g., 6–12 months). Most multi-item measures would be too long for EMA studies even if the timeframe were adjusted to reflect the past day or few hours. While multi-item measures should be used as developed unless modern psychometric methods such as item response theory were used, additional studies are needed to adapt such measures to ultra-brief (2–3 items) versions that may be more appropriate for use with AYAs and especially for EMA studies. These measures could be useful both for EMA and for quickly assessing outcomes associated with AYAs’ developmental milestones.

The limitations of this study provide important context. The small sample size and number of multi-item measures limited the ability to compare across content domains. The sample size from this study may not have been large enough to reliably estimate effects for items with more than two response categories given the low frequency of some responses. Because this study was part of a larger project, the items and measures were limited to those needed for the parent study. Participants were also restricted to one health care system in one state, which might not reflect responses from AYA cancer survivors more generally. The need to be willing to complete the survey in all three modes could have led to the low response rate and limited generalizability. There were also some differences in completion by randomized mode, and this could have affected responses. Validity was also not assessed beyond examining mode effects. These limitations are balanced out by the major strengths of the study: randomization to mode, repeated measures with a short interval for test-retest reliability, and a longer survey with measures across multiple domains of the cancer survivorship experience.

Our findings suggest that multi-item measures should generally be used over single-item measures with AYAs due to better reliability, but single-item measures may be preferable when mode effects are large. Additional research is warranted to develop very brief multi-item measures and to examine how best to balance reliability with participant burden and mode effects. These very brief multi-item measures could then help promote research to improve quality of life and care experiences of AYAs.

Acknowledgments

The authors thank the patients of Kaiser Permanente Northern California for helping to improve care through the use of information collected through our electronic health record systems and the KPWHRI Survey Research Program, who were instrumental in conducting this research.

Authors’ Contributions

S.M.W.J.: Conceptualization, methodology, formal analysis, writing—original draft, writing—review and editing. T.H.M.K.: Conceptualization, supervision, funding acquisition, writing—review and editing. E.E.H.: Conceptualization, supervision, funding acquisition, writing—review and editing. H.B.N.: Conceptualization, supervision, funding acquisition, writing—review and editing. E.S.O’M.: Conceptualization, data curation, writing—review and editing. M.F.G.: Conceptualization, methodology, writing—review and editing. K.J.W.: Conceptualization, supervision, writing—review and editing. C.A.M.S.: Conceptualization, writing—review and editing. A.C.K.: Conceptualization, writing—review and editing. C.A.L.: Conceptualization, data curation, writing—review and editing. L.H.K.: Conceptualization, supervision, funding acquisition, writing—review and editing. J.C.: Conceptualization, methodology, supervision, funding acquisition, data curation, writing—review and editing.

Footnotes

Author Disclosure Statement

No competing financial interests exist.

Funding Information

Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number P01CA233432. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References

1. de Boer AGEM, van Lanschot JJB, Stalmeier PFM, et al. Is a single-item visual analogue scale as valid, reliable and responsive as multi-item scales in measuring quality of life? Qual Life Res, 2004; 13(2):311–320.

2. Diamantopoulos, Sarstedt, Fuchs, et al. Guidelines for choosing between multi-item and single-item scales for construct measurement: A predictive validity perspective. J of the Acad Mark Sci, 2012; 40(3):434–449.

3. Schnittker, Bacak. The increasing predictive validity of self-rated health. PLoS One, 2014; 9(1):e84933.

4. Reich, Chatzigeorkidis, Zeidler, et al. Tailoring the cut-off values of the visual analogue scale and numeric rating scale in itch assessment. Acta Derm Venereol, 2017; 97(6):759–760.

5. Angulo-Brunet, Viladrich, Pallarès, et al. Can multi-item measures and single-item measures be trusted to assess self-determination theory constructs in the elderly? Psicothema, 2020; 32(4):583–589.

6. Sarstedt, Wilczynski. More for less? A comparison of single-item and multi-item measures. Die Betriebswirtschaft, 2009; 69(2):211.

7. Carrière, Donayre Pimentel, Bou Saba, et al. Recovery expectations can be assessed with single-item measures: Findings of a systematic review and meta-analysis on the role of recovery expectations on return-to-work outcomes after musculoskeletal pain conditions. Pain, 2023; 164(4):e190–e206.

8. Kwon, Trail. The feasibility of single-item measures in sport loyalty research. Sport Management Review, 2005; 8(1):69–89.

9. Loo. A caveat on using single‐item versus multiple‐item scales. Journal of Managerial Psychology, 2002; 17(1):68–75.

10. Gardner, Dunham, Cummings, Pierce. Focus of attention at work: Construct definition and empirical validation. Journal of Occupational Psychology, 1989; 62(1):61–77.

11. Aaronson, Alonso, Burnam, et al. Assessing health status and quality-of-life instruments: Attributes and review criteria. Qual Life Res, 2002; 11(3):193–205.

12. Roberts. Mixing modes of data collection in surveys: A methodological review. NCRM Methods Review Papers Volume NCRM/008. 2007.

13. Gelles-Watnick. Americans’ Use of Mobile Technology and Home Broadband. Pew Research Center; 2024.

14. Antoun, Katz, Argueta, Wang. Design heuristics for effective smartphone questionnaires. Social Science Computer Review, 2018; 36(5):557–574.

15. Smith, Keegan, Hamilton, et al. Understanding care and outcomes in adolescents and young adult with Cancer: A review of the AYA HOPE study. Pediatr Blood Cancer, 2019; 66(1):e27486.

16. Jacobsen, Ransom. Implementation of NCCN distress management guidelines by member institutions. J Natl Compr Canc Netw, 2007; 5(1):99–103.

17. Byrom, Doll, Muehlhausen, et al. Measurement equivalence of patient-reported outcome measure response scale types collected using bring your own device compared to paper and a provisioned device: Results of a randomized equivalence trial. Value Health, 2018; 21(5):581–589.

18. Mauz, Hoffmann, Houben, et al. Mode equivalence of health indicators between data collection modes and mixed-mode survey designs in population-based health interview surveys for children and adolescents: Methodological study. J Med Internet Res, 2018; 20(3):e64.

19. Nichols, Wernli, Chawla, et al. Challenges and opportunities of epidemiological studies to reduce the burden of cancers in young adults. Curr Epidemiol Rep, 2023; 10(3):115–124.

20. Dillman, Phelps, Tortora, et al. Response rate and measurement differences in mixed-mode surveys using mail, telephone, interactive voice response (IVR) and the Internet. Soc Sci Res, 2009; 38(1):1–18.

21. Jones, Du, Panattoni, Henrikson. Development of a questionnaire to assess worry about affording healthcare in an international sample. Qual Life Res, 2019; 28:S54.

22. Agency for Healthcare Research and Quality. Medical Expenditure Panel Survey. Rockville, MD: Agency for Healthcare Research and Quality; 2014. Available from: http://meps.ahrq.gov/mepsweb/

23. Reeve, Hays, Bjorner, et al.; PROMIS Cooperative Group. Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care, 2007; 45(5 Suppl 1):S22–S31.

24. Cella, Tulsky, Gray, et al. The Functional Assessment of Cancer Therapy scale: Development and validation of the general measure. J Clin Oncol, 1993; 11(3):570–579.

25. Park, Kirchhoff, Nipp, et al. Assessing health insurance coverage characteristics and impact on health care cost, worry, and access: A report from the childhood cancer survivor study. JAMA Intern Med, 2017; 177(12):1855–1858.

26. Shrout, Fleiss. Intraclass correlations: Uses in assessing rater reliability. Psychol Bull, 1979; 86(2):420–428.

27. Cohen. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull, 1968; 70(4):213–220.

28. Byrt, Bishop, Carlin. Bias, prevalence and kappa. J Clin Epidemiol, 1993; 46(5):423–429.

29. Tourangeau, Smith. Asking sensitive questions: The impact of data collection mode, question format, and question context. The Public Opinion Quarterly, 1996; 60(2):275–304.

30. Presser, Stinson. Data collection mode and social desirability bias in self-reported religious attendance. Am Sociol Rev, 1998; 63(1):137–145.

31. Lyons, Wareham, Lucas, et al. SF-36 scores vary by method of administration: Implications for study design. J Public Health Med, 1999; 21(1):41–45.

32. Buchanan. Chapter 5 - Potential of the internet for personality research. In: Birnbaum, editor. Psychological Experiments on the Internet. Academic Press: San Diego; 2000. pp. 121–140.

33. Baron, Siepmann. Chapter 10 - Techniques for creating and using web questionnaires in research and teaching. In: Birnbaum, editor. Psychological Experiments on the Internet. Academic Press: San Diego; 2000. pp. 235–265.

34. Cho, Larose. Privacy issues in internet surveys. Social Science Computer Review, 1999; 17(4):421–434.

35. Christmann, Van Aelst. Robust estimation of Cronbach’s alpha. J Multivar Anal, 2006; 97(7):1660–1674.

36. Tang, Cui, Babenko, editors. Internal Consistency: Do We Really Know What It Is and How to Assess It? Journal of Psychology & Behavioral Science; 2013.

37. Hays, Schalet, Spritzer, Cella. Two-item PROMIS(R) global physical and mental health scales. J Patient Rep Outcomes, 2017; 1(1):2.

38. Breen, Ritchie, Schofield, et al. The Patient Remote Intervention and Symptom Management System (PRISMS), a Telehealth-mediated intervention enabling real-time monitoring of chemotherapy side-effects in patients with haematological malignancies: Study protocol for a randomised controlled trial. Trials, 2015; 16:472.

39. Smith, McCarthy, Anderson. On the sins of short-form development. Psychol Assess, 2000; 12(1):102–111.

40. Steptoe, Gibson, Hamer, Wardle. Neuroendocrine and cardiovascular correlates of positive affect measured by ecological momentary assessment and by questionnaire. Psychoneuroendocrinology, 2007; 32(1):56–64.
