Abstract
Smartphones have become very popular globally, and smartphone ownership has overtaken conventional cell phone ownership in many countries in recent years. With this rapid rise in smartphone penetration, researchers are looking at ways to conduct web surveys using smartphones. This is particularly true of student populations where smartphone penetration is very high and web surveys are already the norm. However, researchers are raising concerns about selection biases and measurement differences between PC and smartphone respondents. Questions also remain about comparisons to traditional interviewer-administered approaches. We designed an experimental comparison between a PC web survey, a smartphone web survey and a computer-assisted telephone interviewing (CATI) survey. This study was conducted using an annual survey of students at a large university in South Korea. The CATI (interviewer-administered) survey had a higher response rate, lower margins of error, and better representation of the student population than the two web (self-administered) modes, but at a higher cost. The CATI survey also had lower rates of item nonresponse. More significant differences were found between the modes for sensitive questions than for nonsensitive ones. This suggests that CATI surveys may still have a role to play in surveys of college students, even in a country with high rates of mobile technology adoption.
Keywords
Introduction
Survey researchers are rapidly coming to grips with the changing landscape of mobile communication technologies, exploring both the opportunities and challenges presented by the new mobile devices (see Link et al., 2014). Much of this work has focused on adapting web surveys to mobile devices (particularly smartphones; e.g., Buskirk & Andrus, 2012a, 2012b), but research has also focused on issues of coverage, nonresponse, and measurement error associated with the use of smartphones to complete surveys (for recent summaries, see Couper, Antoun, & Mavletova, 2017; Peterson, Griffin, LaFrance, & Li, 2017). These changes, along with the declining response rates and rising costs of traditional computer-assisted telephone interviewing (CATI), are again raising the question of what is the optimal mode of data collection for specific populations with high rates of smartphone and mobile web penetration. In the same way that cell phones changed the way telephone surveys are conducted, Internet-enabled smartphones are leading to changes in both telephone and web surveys.
Smartphone ownership has already overtaken conventional cell phone ownership in many countries. For example, in 2011, only 21.6% of South Koreans (ages 6 or over) owned smartphones, while 67.6% owned conventional cell phones. But only three years later, in 2014, the proportions had changed dramatically, to 73.4% and 19.0%, respectively (Korea Information Society Development Institute, 2016). In 2015, the smartphone ownership rate of South Korean adults was 88%, the highest in the world, followed by Australia (77%), the United States (72%), Spain (71%), the United Kingdom (68%), and Canada (67%; Jacob, 2016).
How people use their mobile devices is also changing and varies across different subgroups. For example, a study of U.S. adult smartphone owners found that short message service (SMS) text messaging was the most widely used smartphone feature (100% among ages 18–29, 92% among those 50 or older) but voice/video calling remained popular, even among young smartphone owners (93% among ages 18–29, 91% among those aged 30–49, and 94% among those age 50+), and e-mail continues to play an important role across all age groups (91% among ages 18–29, 87% among those 30+; see Smith, 2015). Similarly, a survey of South Korean smartphone owners reported that about half (51.2%) of their time on smartphones was spent on average in making voice/video calls (34.7%) or sending SMS text messages (16.5%), with the other half spent using other functions such as Internet, social networking service, and e-mail (Korea Internet & Security Agency, 2013). A survey on Internet use in South Korea reported that the proportion of smartphone owners who use e-mail on their smartphone has also been increasing (51.4% in 2013, 53.6% in 2014, and 61.4% in 2015; Korea Internet & Security Agency, 2015).
Against this backdrop of changing mobile device ownership and use, survey researchers are exploring ways to increase contact with and response from smartphone users to survey requests. Given this, and considering the popularity of text messaging and voice calling on smartphones, we designed an experimental study to compare a “traditional” PC web survey using e-mail invitations with two alternatives: a smartphone web survey with invitations sent via SMS text messages and a CATI survey using voice calling to smartphones.
This study used a survey of undergraduate students at a large university in South Korea. Smartphone use among this population is near-universal (98.8% in 2014; Korea Information Society Development Institute, 2014). Further, the availability of a list of registered students with contact information permits random selection and assignment to mode. A number of prior studies have used student samples to explore differences between new and traditional modes, including web versus mail (e.g., Kwak & Radler, 2002; McCabe, Boyd, Couper, Crawford, & d’Arcy, 2002), web versus face to face (e.g., Heerwegh & Loosveldt, 2008), web versus telephone (e.g., Parks, Pardi, & Bradizza, 2006; Woo, Kim, & Couper, 2015), and web versus interactive voice response (IVR) versus telephone (e.g., Kreuter, Presser, & Tourangeau, 2008). We are aware of no study that has compared PC web, smartphone web, and telephone (CATI).
In this study, we explore differences in response rates, margins of error, and representativeness of respondents between the three modes (PC web, smartphone web, and CATI) in a student population where coverage is not an issue. We also examine measurement differences between the modes, looking at both sensitive and nonsensitive questions.
Review of Prior Research
We do not attempt an exhaustive review of the survey mode literature, but instead briefly review the experimental comparisons of web versus CATI and the literature on PC web versus smartphone web. As noted above, while there are several studies comparing national web surveys to separate national telephone surveys (see, e.g., Chang & Krosnick, 2009; Yeager et al., 2011), there are relatively few experimental comparisons where the same sample is randomized to different modes.
Parks, Pardi, and Bradizza (2006) randomly assigned first-year women at a U.S. university to web or telephone survey, with those in the latter group first being invited by e-mail to call in before being followed up after 2 weeks. They achieved a 60.0% response rate for the web, compared with 45.7% for the phone. They reported no significant differences in reported use of alcohol, cigarettes, or illicit drugs. They did find slightly higher rates of alcohol-related negative consequences, suggesting that respondents were more forthcoming on the web.
Heerwegh and Loosveldt (2008) randomly assigned freshmen students at a university in Belgium to a web survey or face-to-face interview, obtaining a 52.5% response rate for the web and 90.4% for face to face. They reported higher rates of item missing data and lower levels of differentiation (i.e., higher straightlining) on the web. In addition, Heerwegh (2009) reported more socially desirable responses in the interviewer-administered mode, but only for 15 of 36 items compared.
Woo, Kim, and Couper (2015) randomly assigned students at a South Korean university to a web or telephone survey, obtaining response rates of 21% for web and 81% for telephone. They found higher rates of agreement with records for the web respondents, but lower item nonresponse among telephone respondents. They also found some evidence of more socially desirable responses on the telephone, but this was not always consistent.
Kreuter, Presser, and Tourangeau (2008) conducted an experiment among alumni of a U.S. university. After an initial telephone screener, sample persons were randomly assigned to web, telephone (CATI), or IVR for the main survey. Response rates were 94.7% for CATI, 56.8% for web, and 61.1% for IVR. They found significantly higher rates of reporting socially undesirable characteristics in the self-administered modes (web and IVR), but no differences for socially desirable characteristics. They also found higher rates of item missing data for CATI and IVR than web. Comparing survey responses to administrative data, they found that web respondents were significantly less likely than CATI respondents to misreport in a socially desirable direction.
Finally, Milton, Ellis, Davenport, Burns, and Hickie (2017) randomly assigned 101 respondents age 16–25 from a larger study in Australia to complete both a web survey and a telephone interview. They report higher rates of differences between modes (with higher rates of disclosure on the web) for items rated as highly sensitive by respondents.
Turning briefly to comparisons of PC web and smartphone web, the research was recently summarized in Couper, Antoun, and Mavletova (2017). Smartphone web respondents generally have lower response rates, higher break-off rates, and longer completion times than their PC web counterparts. Few reliable differences have been found with regard to response distributions, especially for sensitive questions (see Mavletova & Couper, 2013; Toninelli & Revilla, 2016). Given this, we expect to see bigger differences between CATI and web than between the two web groups.
Study Design and Method
For this study, we used the Dongguk University Student Life Survey (DUSLS) conducted by the Survey & Health Policy Research Center (SHPRC) since 2005. An earlier study (Woo et al., 2015) used the 2010 survey to compare conventional cell phone and web surveys. The purpose of the DUSLS is to inform university-wide administrative policies and procedures by measuring students’ time use, educational experiences, education satisfaction levels, financial status, and health behavior. The modes of the DUSLS have changed with the rapid spread of conventional cell phones. It was carried out using both paper and pencil self-administered surveys and cell phone interviews until 2008, and using cell phone interviews only since 2009 (Woo et al., 2015).
Sample Design
The sampling frame is a list of registered students (N = 12,730) provided by the university. The vast majority (99.3%) of students provided both a smartphone number and an e-mail address on the list, and we excluded the small number who did not provide both. Colleges or schools were used as strata, and a sample of 2,500 students was selected using stratified simple random sampling. Different sampling fractions were used in each of the 11 strata, using a Neyman allocation (see Groves et al., 2009, pp. 120–121), and design weights are used to reflect this. Sampled students were randomly assigned to each mode in unequal proportions due to different expected survey costs as follows: 1,000 to PC web, 1,000 to smartphone web, and 500 to CATI. A total of 54 sample cases identified as having invalid or dormant e-mail addresses in the prenotification were replaced with cases from a reserve sample selected in the same way as the main sample.
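As an illustration of the allocation described above, the following is a minimal sketch of a Neyman allocation with the corresponding design weights. The stratum sizes and standard deviations are hypothetical placeholders (the frame counts and variances for the 11 colleges are not reported here), and the function names are assumptions introduced only for this example.

```python
def neyman_allocation(stratum_sizes, stratum_sds, total_n):
    """Allocate total_n across strata in proportion to N_h * S_h."""
    products = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    denominator = sum(products.values())
    return {h: total_n * products[h] / denominator for h in stratum_sizes}


def design_weights(stratum_sizes, allocation):
    """Design weight per stratum: inverse of the sampling fraction, N_h / n_h."""
    return {h: stratum_sizes[h] / allocation[h] for h in stratum_sizes}


# Hypothetical strata (NOT the study's colleges): sizes N_h and SDs S_h.
sizes = {"A": 6000, "B": 4000, "C": 2000}
sds = {"A": 1.0, "B": 2.0, "C": 3.0}

allocation = neyman_allocation(sizes, sds, total_n=2500)
weights = design_weights(sizes, allocation)
```

In practice the allocated sizes would be rounded to whole cases; the sketch leaves them unrounded so that the relationship between allocation and weight stays visible.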
Data Collection Protocol
We used repeated efforts described in Table 1 to maximize response rates in each of the three modes. A survey announcement on the university website (www.dongguk.edu) and prenotifications via SMS or e-mail were used for all three modes. The PC web group was sent an invitation and two follow-ups via e-mail containing a unique URL to the survey website as well as two follow-ups via SMS without a URL. The smartphone web group was sent equivalent invitations and reminders via text messaging (SMS) with a URL as well as two follow-ups via e-mail without a URL. Thus, a total of five contacts via e-mail or SMS were attempted for the two web modes, while the CATI survey had at least six call attempts. Incentives were not offered in any mode. The PC web group was asked to answer only using a desktop, laptop, or tablet computer, while the smartphone web group was asked to answer only using a smartphone. This instruction was included in the prenotification, invitation, and reminders.
Design Features for Prenotification, Contact, Invitation, and Follow-Ups.
Note. SMS = short message service (text messaging service)
The survey announcement was kept on the university website during the survey period from November 20 to December 21, 2014. The SMS and e-mail prenotifications were sent from November 20 to 22. The invitations for the two web modes were sent on November 24, and CATI started on the same day. The first and second follow-ups for the two web modes were sent on December 1 and December 8, respectively. The two web modes were open for 4 weeks through December 21, while the CATI survey lasted 11 days, until December 4.
Survey Software
For the PC and smartphone web modes, we used SurveyMonkey (see https://ko.surveymonkey.com), which is a well-known online survey system available in the Korean language. This system allows surveys to be optimized for mobile devices (smartphones and tablets). Given that the surveys may appear differently according to screen size or operating systems, we tested the survey on a variety of devices to check for any problems completing the survey. SurveyMonkey does not provide user agent strings (see Callegaro, 2010; Lugtig & Toepoel, 2015), containing information on which device or browser or operating system was used. We used the Blaise software (see http://blaise.com/products/general-information) for the CATI mode. The CATI survey was conducted by the SHPRC in Dongguk University, using a centralized telephone facility.
Questionnaire
The DUSLS included 64 questions covering a variety of topics including time use, opinions on university life and courses, health status, online access for health information, and demographic information. There were four open-ended questions in the survey. Two of the items are considered key to the researchers at the university (see Table 3). The survey contained some sensitive questions (e.g., ever considered suicide, satisfaction with the university or major, and grade point average [GPA]). For the two web groups, we also asked what kind of device was used, both at the beginning and end of the survey. Most questions had short response lists, and there were no grid or matrix questions. The PC and smartphone web surveys were designed with 1–4 questions per page.
Results
Devices Used for Response in Web Surveys
As mentioned earlier, the two web groups were asked to use the assigned devices. However, some respondents did not follow these instructions, as shown in Table 2. Of the 382 respondents in the PC web group, 337 (88.2%) reported using a PC (desktop, laptop, or tablet) both at the beginning and end of the survey. Ten respondents started on a smartphone but finished on a PC, while 29 started and ended on a smartphone. Our analyses are restricted to the 337 who reported starting and ending on a PC; 165 of these cases (49.0%) were consistent laptop users, 156 (46.3%) were consistent desktop users, 11 (3.3%) were consistent tablet users, and the remaining 5 (1.5%) switched between PC devices. In the smartphone web group, 411 of the 444 respondents (92.6%) reported starting and ending on a smartphone, as requested. One used a PC throughout and 3 switched, while 29 did not answer the device question at the end. Again, we restrict our analyses to the 411 consistent smartphone users.
Distribution of Devices Used for Responding to the Web Surveys.
Response Rates and Margins of Error
We compared response rates (using RR6 from American Association for Public Opinion Research, 2015) and margins of error for key items in the DUSLS between the three modes. As shown in Table 3, 337 cases in the PC web group and 411 cases in the smartphone web group included in our analyses completed (answered more than 80% of the 49 main questions) or partially completed (answered at least 50% of the questions) the survey, whereas in the CATI group, there were more interviews completed (431) despite the sample being half the size of the other groups. We note that the smartphone web survey had a higher completion rate (41%) than the PC web survey (34%), somewhat contrary to recent literature (see Couper et al., 2017; Mavletova & Couper, 2015). There were no early break-offs (answered less than 50%) in any of the conditions.
Survey Dispositions, Response Rates, and Margins of Error for Key Items.
Note. RR6 = response rate 6; CATI = computer-assisted telephone interviewing.
a. American Association for Public Opinion Research.
The CATI survey had a substantially higher response rate (86%) than either web group (34% and 41%). Despite the smaller initial sample size, the CATI survey also had slightly lower margins of error for the 2 key survey items (bottom of Table 3) than the web surveys, largely due to the higher completion rate. If we assume the same initial sample size (n = 1,000) for CATI, the expected margins of error (assuming simple random sampling) would be 3.0% and 2.0%, respectively, for those items.
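The relationship between respondent counts, response rates, and margins of error can be sketched as follows. This is a simplified illustration, not the full RR6 calculation: it treats every sampled case as eligible and divides the analyzed respondents by the assigned sample size. The margin of error uses the usual simple random sampling formula at 95% confidence, ignoring design weights and any finite population correction, and the proportion p = 0.5 is a placeholder because the key-item estimates from Table 3 are not reproduced here.

```python
import math


def response_rate(respondents, sample_size):
    """Simplified response rate: analyzed respondents over assigned sample."""
    return respondents / sample_size


def margin_of_error(p, n, z=1.96):
    """Margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)


# (analyzed respondents, assigned sample size) as reported in the text.
samples = {
    "PC web": (337, 1000),
    "smartphone web": (411, 1000),
    "CATI": (431, 500),
}
rates = {mode: response_rate(r, n) for mode, (r, n) in samples.items()}

# Placeholder proportion (assumption): more respondents, smaller margin.
moe = {mode: margin_of_error(0.5, r) for mode, (r, _) in samples.items()}
```

Under these assumptions the rates round to the 34%, 41%, and 86% reported above, and the larger CATI respondent count yields the smallest margin for any fixed proportion.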
Sample Representativeness
To assess whether respondents in the three modes are representative of the population of students, we compared the differences between sample and population distributions with regard to the three variables available on the frame for all students. Table 4 shows the respondent and population percentages and the signed differences. For colleges or schools, we can see that respondents in the CATI group are more representative than the web groups (the sum of the absolute differences is 6.8% for CATI, 11.0% for smartphone web, and 11.6% for PC web). But for school year, the PC web group is more representative of the population (with the sum of the absolute differences being 2.2%, compared to 5.2% for CATI and 8.1% for smartphone web). However, for gender, the smartphone web group is more representative, followed by the CATI group. Thus, we see a slight advantage for CATI but also variation in representation across the three modes.
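The summary measure used in this comparison, the sum of absolute differences between respondent and population percentages, can be sketched in a few lines. The category percentages below are hypothetical and serve only to show the computation; they are not values from Table 4.

```python
def sum_abs_diff(respondent_pct, population_pct):
    """Sum of |respondent % - population %| over all categories."""
    return sum(
        abs(respondent_pct[category] - population_pct[category])
        for category in population_pct
    )


# Hypothetical two-category example (NOT taken from Table 4).
population = {"male": 55.0, "female": 45.0}
respondents = {"male": 52.0, "female": 48.0}

discrepancy = sum_abs_diff(respondents, population)
```

A smaller value indicates a respondent distribution closer to the population; with two categories, every percentage point of over-representation in one category is mirrored in the other, so the sum is twice the signed difference.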
Comparison Between Respondent and Population Distributions.
Note. LSA = literature, science, and the arts; CATI = computer-assisted telephone interviewing.
b. Some colleges have 5th and 6th year students.
Comparison of Responses to Sensitive Questions Between Modes
Many surveys include sensitive questions. They tend to produce comparatively high item nonresponse rates or larger measurement error (lower accuracy), and are often affected by mode of administration. Questions can be considered sensitive if respondents perceive them as intrusive (e.g., asking about income or religion), if the questions raise fears about the potential repercussions of disclosing the information (e.g., a question about use of marijuana), or if they trigger social desirability concerns (e.g., an inquiry about voting; Kreuter et al., 2008; Tourangeau, Rips, & Rasinski, 2000; Tourangeau & Yan, 2007). Based on the consistent finding in the literature that self-administered modes (mail or web) are less subject to socially desirable responding than interviewer-administered modes, we expect higher reports of undesirable characteristics and lower reports of desirable characteristics in the web modes than in CATI. Similarly, consistent with the literature comparing sensitive questions on PC or smartphone web (Mavletova & Couper, 2013; Toninelli & Revilla, 2016), we expect no differences between the two web groups.
The survey included 24 main close-ended questions (excluding conditional questions) deemed a priori to be highly or moderately sensitive and prone to under- or overreporting due to intrusiveness or fears of disclosure or social desirability. All 24 questions had statistically significant differences in at least one response category between modes, but the patterns were not always consistent.
Table 5 shows the results for eight selected examples of sensitive questions. The first set of columns in Table 5 shows the distributions of design-weighted estimates within mode. Given the small expected differences between the two web modes, we also combine them in the “both web” column. The second set of columns shows tests of significance (p values below .10) between pairs of modes, using Rao and Scott (1987) χ² tests to adjust for design weights. As expected, we see more significant differences between PC web and CATI, and between smartphone web and CATI (and, by extension, between both web groups combined and CATI). Generally, we see higher reports of socially undesirable behaviors on the web: more suicidal ideation (8.5% for web vs. 5.1% for CATI), more reports of almost no books read for leisure (56.2% vs. 35.7%), and more reports of no exercise (48.1% vs. 34.3%). We also see lower reports of socially desirable behaviors: satisfaction with university (8.6% vs. 14.9% very satisfied).
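For readers unfamiliar with design-weighted estimates, the sketch below shows how a weighted proportion is formed from respondent-level indicators and design weights. The data are hypothetical, and the sketch covers only the point estimate; it does not implement the Rao and Scott adjustment used for the significance tests.

```python
def weighted_proportion(indicators, weights):
    """Design-weighted proportion: sum(w_i * y_i) / sum(w_i)."""
    total_weight = sum(weights)
    weighted_hits = sum(w * y for y, w in zip(indicators, weights))
    return weighted_hits / total_weight


# Hypothetical respondents: 1 = reports the behavior, 0 = does not.
answers = [1, 0, 0, 1]
# Hypothetical design weights (inverse sampling fractions by stratum).
design_wts = [8.0, 4.0, 4.0, 2.0]

estimate = weighted_proportion(answers, design_wts)
unweighted = sum(answers) / len(answers)
```

The weighted and unweighted estimates differ whenever the behavior is unevenly distributed across strata sampled at different rates, which is why the within-mode distributions are reported with design weights applied.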
Differences in Responses to Selected Sensitive Questions Between Modes.
Note. Both web = PC web and smartphone web combined. SP = smartphone; CATI = computer-assisted telephone interviewing.
Comparison of Responses to Nonsensitive Questions Between Modes
There are 25 main close-ended questions deemed a priori to be nonsensitive in the DUSLS. Nine selected examples are shown in Table 6. In general (consistent with the literature), we see fewer significant differences by mode, compared to those in Table 5 for sensitive questions. However, CATI had significantly higher reports of consulting with an academic advisor (27.3% vs. 20.8% for web). Accommodation type also seems to differ between the two web modes as well as between web and CATI. The other 16 nonsensitive questions not presented in Table 6 show similar patterns.
Differences in Responses to Selected Nonsensitive Questions Between Modes.
Note. Both web = PC web and smartphone web combined. SP = smartphone.
Item Nonresponse
Item nonresponse was relatively rare in all three modes. Table 7 shows the count of item nonresponse to 49 main close-ended questions in the DUSLS by mode. We see no significant difference between the two web modes, but both have significantly higher proportions of cases with one or more missing responses than CATI. We also examined individual items and did not find any noteworthy differences.
Distribution of Item Nonresponse to Main Questions.
Note. SP = smartphone; CATI = computer-assisted telephone interviewing.
Nonresponse to Open-Ended Questions
The survey included two open-ended numeric questions (age and GPA). Only one respondent in each mode did not answer the age question. But GPA, considered a sensitive question, had different item missing data rates by mode: 5.6% in PC web, 9.2% in smartphone web, and 4.6% in CATI. The difference between PC web and smartphone web is not significant (p = .0587), while that between the smartphone web and CATI reached traditional significance levels (p = .0088).
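A comparison of this kind can be approximated with a pooled two-proportion z test, sketched below. This is an assumption about method: the text does not state which test produced the reported p values, and an unweighted test will not reproduce them exactly. The group sizes are taken to be the analyzed respondent counts (337, 411, and 431).

```python
import math
from statistics import NormalDist


def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled two-sided z test for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# GPA item missing rates from the text; n values are assumed group sizes.
z_pc_sp, p_pc_sp = two_proportion_z_test(0.056, 337, 0.092, 411)
z_sp_cati, p_sp_cati = two_proportion_z_test(0.092, 411, 0.046, 431)
```

Even in this unweighted form, the pattern matches the text: the PC web versus smartphone web difference falls short of the .05 level, while the smartphone web versus CATI difference does not.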
Comparison of Completion Times Between Modes
A consistent finding from recent studies is that respondents who complete the survey on mobile web devices (smartphones) take longer on average than those who use a PC (Couper & Peterson, 2016), with some exceptions (see Couper et al., 2017, for a review). In our study PC web, smartphone web, and CATI respondents took an average of 10.1, 11.0, and 16.8 min (medians of 6.5, 7.2, and 16.5), respectively, to complete the survey, after truncating cases longer than 6 hr in the web modes. CATI took significantly longer (p < .01) than either web mode. Although smartphone web respondents took slightly longer than PC web respondents, the difference was not statistically significant. We attribute this in part to mobile optimization and to the fact that the population we study is highly facile with smartphones.
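The effect of handling very long sessions before summarizing completion times can be sketched as follows. The sketch interprets truncation as excluding cases above the 6 hr cutoff, which is one possible reading of the procedure (capping at the cutoff would be another), and the timing values are hypothetical.

```python
from statistics import mean, median


def summarize_times(minutes, cutoff_minutes=360):
    """Mean and median completion time after dropping cases over the cutoff."""
    kept = [m for m in minutes if m <= cutoff_minutes]
    return mean(kept), median(kept)


# Hypothetical web completion times in minutes; 600.0 mimics a respondent
# who left the survey open and returned much later.
times = [5.0, 6.5, 7.0, 8.5, 600.0]

trimmed_mean, trimmed_median = summarize_times(times)
raw_mean = mean(times)
```

The example shows why the mean is sensitive to a handful of interrupted web sessions while the median is not, which is also why both statistics are worth reporting for self-administered modes.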
Measurement Error
We have three questions where the survey responses can be compared with university records, permitting an evaluation of measurement error between modes. These questions are whether a student is registered for a double major or minor (yes/no), whether they are enrolled in a teacher certification program (TCP; yes/no), and their GPA in the last semester. Table 8 presents the rates of agreement between the survey responses and the university records, among those who answered the questions. For GPA, we examine agreement at three levels of precision.
Measurement Error by Mode.
Note. CATI = computer-assisted telephone interviewing; GPA = grade point average.
Agreement rates for whether the student is taking a double major and whether they are enrolled in a TCP are high and do not differ significantly between modes. However, across all levels of precision, the answers to the GPA question provided in the web survey modes agree with the records at significantly (p < .01) higher rates than in the telephone mode. Further, across all levels of precision, there is a tendency to overreport GPA in CATI relative to web. For example, at the first decimal, 33.6% of CATI respondents report a higher GPA, compared to 21.4% for PC web and 18.1% for smartphone web. This is consistent with a social desirability hypothesis and findings in the literature.
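One way to operationalize agreement at several levels of precision is sketched below. It assumes that both the reported and the recorded GPA are rounded (half up) to the same number of decimal places before comparison; the text does not specify whether rounding or truncation was used, so this rule, the function names, and the example values are all assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def compare_gpa(reported, record, places):
    """Classify a report as 'agree', 'over', or 'under' at a given precision."""
    quantum = Decimal(10) ** -places  # 1, 0.1, or 0.01
    rounded_report = Decimal(reported).quantize(quantum, rounding=ROUND_HALF_UP)
    rounded_record = Decimal(record).quantize(quantum, rounding=ROUND_HALF_UP)
    if rounded_report == rounded_record:
        return "agree"
    return "over" if rounded_report > rounded_record else "under"


def agreement_rate(pairs, places):
    """Share of (reported, record) pairs that agree at the given precision."""
    outcomes = [compare_gpa(rep, rec, places) for rep, rec in pairs]
    return outcomes.count("agree") / len(outcomes)


# Hypothetical (reported, record) pairs, passed as strings to avoid
# binary floating-point rounding surprises.
pairs = [("3.5", "3.52"), ("3.8", "3.52"), ("3.52", "3.52")]
rates_by_precision = {places: agreement_rate(pairs, places) for places in (0, 1, 2)}
```

Agreement necessarily falls as the required precision increases, so differences between modes are most informative when compared at the same precision level.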
Survey Costs
As mentioned earlier, the CATI survey was conducted in the SHPRC’s telephone facility. The telephone interviewers had a total of 200 hr of training (including using a CATI system and interviewing skills) at a rate of 6,500 KRW (about US$5.65 at current rates) per hour, and worked a total of 188 hr at the same rate, resulting in a total cost of 2,522,000 KRW (about US$2,193). Supervisors worked a total of 320 hr at a rate of 16,250 KRW (about US$14.13) per hour, at a total cost of 5,200,000 KRW (US$4,522). This produces a total labor cost of 7,722,000 KRW (US$6,715) for training, interviewing, and supervision. The charge for telephone calls was 530,000 KRW (US$461). Operation and maintenance costs of about 600,000 KRW (US$522) were charged for the survey. Thus, the overall costs were 8,852,000 KRW (US$7,697) for CATI. This amounts to about 20,538 KRW (US$17.86) per completed CATI case.
The two web surveys were also managed at SHPRC. Survey specialists worked a total of 200 hr on each survey at a rate of 18,750 KRW (US$16.30) per hour, which is a total cost of 3,750,000 KRW (US$3,261). The cost for using the software in each survey was 500,000 KRW (US$435). There was an additional cost of about 200,000 KRW (US$173) in each web survey for the use of commercial services to send e-mails and SMS for invitations and follow-up messages to sampled students. The overall cost for each web survey was about 4,450,000 KRW (US$3,870), which is about half that of the telephone survey. The estimated cost per complete was 13,244 KRW (US$11.52) for the PC web survey and 10,854 KRW (US$9.44) for the smartphone web survey, reflecting the higher response rate for the latter.
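The cost totals in this section follow directly from the reported hours and rates, as the short check below shows. All inputs are taken from the text; only the variable names are introduced here, and the per-complete figure for CATI assumes the 431 completed interviews reported earlier.

```python
# CATI: interviewer training (200 hr) and interviewing (188 hr) at 6,500 KRW/hr.
cati_interviewer_cost = (200 + 188) * 6_500
# CATI: supervision, 320 hr at 16,250 KRW/hr.
cati_supervisor_cost = 320 * 16_250
cati_labor_cost = cati_interviewer_cost + cati_supervisor_cost
# Add telephone charges and operation/maintenance costs.
cati_total = cati_labor_cost + 530_000 + 600_000
cati_per_complete = cati_total / 431

# Each web survey: specialist time (200 hr at 18,750 KRW/hr),
# software, and e-mail/SMS delivery services.
web_total = 200 * 18_750 + 500_000 + 200_000

cost_ratio = web_total / cati_total
```

The ratio of roughly one half matches the statement that each web survey cost about half as much as the telephone survey. The per-complete web figures are not recomputed because the exact completed-case counts behind them are not given in this section.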
Discussion
We experimentally compared two self-administered modes (PC web and smartphone web) and one interviewer-administered mode (CATI) with respect to nonresponse error and measurement error in a sample of university students in South Korea. Given the near-universal access to smartphones and the availability of a sampling frame, we ignore coverage error in our study.
The CATI survey achieved a substantially higher response rate than the two web modes, also producing a larger number of respondents despite an initial sample size half that of the web modes. This resulted in lower margins of error for key survey items and better representativeness of respondents than in the web surveys. For questions deemed a priori to be more sensitive, we found larger differences between the three modes, with most of the differences being between the two web modes and telephone. This is consistent with the literature that finds differences between self- and interviewer-administered modes on questions that may be subject to social desirability bias or presence of interviewer effects. For questions deemed to be less sensitive, we find few significant differences between modes.
Item nonresponse rates were significantly lower in CATI, again consistent with the literature. However, the CATI interviews took significantly longer on average than the web surveys. We found no significant differences in completion time between PC web and smartphone web, contrary to the literature (see Couper et al., 2017). We found high rates of agreement with administrative records for two variables, with no differences between modes. However, we found higher rates of agreement with the records for GPA in the two web modes relative to CATI, with a tendency for CATI respondents to overreport their GPA relative to the records (see Kreuter et al., 2008). Finally, the CATI survey cost significantly more to field, but the difference per completed case is not as large, given the higher response rate to the CATI survey.
Comparing the two web modes, the survey targeted at smartphones (using SMS invitations and restricting the analysis to those who reported completing the survey on a smartphone) resulted in a higher response rate at lower cost than the PC web survey, with no notable differences in data quality.
We note that this study was conducted among students in South Korea, a population with very high smartphone penetration. The results may not translate to the United States or other Western countries where telephone surveys have largely been abandoned for student populations, mostly for cost reasons. But the results suggest we should not dismiss CATI too quickly, whether as a standalone mode or as a supplement to the web. Restricting the survey to smartphone users only had no noticeable negative effects on response rates or data quality. This further suggests that focusing attention on smartphone web surveys for populations with high smartphone ownership rates may be fruitful.
Footnotes
Authors’ Note
The authors wish to thank Woohyun Yoo at the Department of Communication Studies, Incheon National University, Christopher Antoun in the Joint Program in Survey Methodology, University of Maryland, and many others for their assistance.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the research program of Dongguk University, 2017.
