Abstract
The objective of this article is to analyze the quality of the information collected by a self-administered survey addressed to citizens of Andalusia, who were offered the possibility of answering using the post or Internet. The study showed the advantages of using web-based self-administered questionnaires. Web surveys showed a low number of unanswered questions, more detailed answers to open questions, and longer answers to questions than those generated from paper questionnaires. The five open questions with text showed longer answers from the web survey, around 63 characters more. In the open questions with numerical answers, the use of drop-box (select list) generated better response than the use of a blank space left in the paper questionnaire.
Introduction
When surveys were first conducted, questionnaires were administered through face-to-face interviews, the leading mode of data collection in surveys from the initial applications of this technique until the 1980s (Fowler & Mangione, 1990; Groves, 2012; Rossi, Wright, & Anderson, 1983; Wright & Mardsen, 2010). Since then, new innovations and social conditions (Couper, 2005, 2011; Dillman, Smyth, & Christian, 2009) have produced a notable decline in the use of face-to-face surveys in favor of quicker and more economical modes, such as telephone interviews, up to the end of the 20th century, and the self-administered web surveys conducted today.
Several researchers suggest that web surveys attract lower item non-response rates (Boyer, Olson, Calatone, & Jackson, 2002; Denscombe, 2006; Kwak & Radler, 2002; Schaefer & Dillman, 1998; Tourangeau, Rips, & Rasinski, 2000), although there is no unanimous position on this issue. Some studies placed higher rates of non-response in web surveys (Lozar Manfreda, Bosnjak, Berzelak, Haas, & Vehovar, 2008), as well as noting that there is a higher number of interviewees who leave the survey incomplete (Brecko & Carstens, 2006). Other experts consider that the quality of response varies according to the type of questions used (Smyth, Dillman, Christian, & Stern, 2005), whether the questions require qualitative or quantitative answers (Lozar Manfreda & Vehovar, 2002; Stangl, 2004), or whether the questionnaire contains open or closed questions (Reja, Lozar Manfreda, Hlebec, & Vehovar, 2003). Concerning this point, a large number of researchers think that too little attention has been paid to the quality of responses of web questionnaires and, more specifically, to “the way that Web questionnaires might affect respondents’ willingness to complete individual items within a questionnaire” (Denscombe, 2009, p. 282).
The objective of this article is to analyze the quality of the information collected by a postal and web survey given to a general population, something not very common in the literature, as the majority of the studies have used specific samples (students, etc.). The population studied comprises citizens of Andalusia (a region in Southern Spain) with the right to vote while resident in a foreign country, who were offered the possibility of answering by post or Internet.
As a starting point, the fact that Internet users do not read the questions in a questionnaire carefully but instead scan the text was considered. This happens because respondents have a number of windows open at the same time (software and websites), and due to the need to read quickly in order to process the large amount of information available on the Internet. This can lead to problems in understanding the meaning of the questions, which results in the incorrect completion of the questionnaire.
The proposition to be investigated is that the greater “pace” (or speed) of the web mode, together with the large number of distracting elements, will produce a poor completion of the questionnaire (De Leeuw, 2005; Heerwegh, 2009; Heerwegh & Loosveldt, 2008; Holbrook, Green, & Krosnick, 2003; Tourangeau et al., 2000). This would produce responses of lower quality than those achieved with the “traditional” postal survey received in paper form, which is completed by interviewees in a calmer manner, with fewer distractions around them. The quality of response to the questionnaire will be defined by the number of non-responses, the amount of information given in response to open questions (in which respondents, in principle, are not limited in their answers), and the choice, in closed questions, of the “easiest” categories.
Theoretical Background and Expectations
One of the most used theoretical reference points concerning the quality of response to questionnaires is the Survey Satisficing theory of Krosnick (1991, 1999), based on the proposals of Tourangeau (1984) on the cognitive process that takes place when answering a questionnaire. From this point of view, answering a survey requires a significant cognitive effort, to the extent that, in each question, the interviewee must pass through four stages: (1) interpreting the meaning of each question, (2) searching and retrieving all the information saved in their memory, (3) integrating the information into an opinion or judgment, and (4) expressing this opinion appropriately (Krosnick, 1991; Tourangeau & Rasinski, 1988). The Survey Satisficing theory postulates that some people who have expressed a wish to participate in a survey change their mind as the interview progresses: “their motivation to work hard has evaporated” (Krosnick, 1991, p. 214), so that, instead of withdrawing their cooperation, they opt to continue answering questions but with a minimum effort. This means that these interviewees, instead of following the cognitive process described above, do not think as deeply about the meaning of the questions, and search their memory for the most “adequate” response for their interviewer.
A respondent lacking the necessary motivation to answer may change their attitude by the use of nonverbal communication techniques on the part of the interviewer in face-to-face interviews, or by the interviewer’s explanation of the advantages of cooperating in the research (Holbrook et al., 2003). On the contrary, a respondent could lose the motivation to participate in a survey when they feel the loss of intensity in a form of communication that lacks nonverbal communication, as it happens in telephone and self-administered surveys. In the specific case of telephone surveys, it must be taken into account that the interview process is very fast due to the need to avoid silences, which ends up generating bad feelings among those being interviewed, as they had agreed to collaborate because they believed that it would be a pleasant conversation (Green, Krosnick, & Holbrook, 2001; Holbrook et al., 2003). This hurried feeling, together with the possibility of doing other things while they are answering the questionnaire (multitasking), prevents them from carrying out the cognitive process necessary to respond to each question, which means that a larger number of questions do not accurately reflect what the respondent thinks.
In the self-administered questionnaire, there is a large social distance between the researcher and the person being interviewed that fosters the respondent’s honesty, thus reducing the number of “socially desirable” answers (Holbrook et al., 2003). The greater social distance characteristic of self-administered surveys varies depending on the medium used to answer: a physical questionnaire (paper) or an intangible questionnaire (electronic-virtual). The postal survey involves a closer relationship between the researcher and the respondents to the extent that the latter receive various materials (from the interviewer), such as a personalized letter of introduction, an envelope, a questionnaire, thank you cards, and so on. We could say that these materials are “borrowed” as at least some will have to be returned to the researcher once completed. Web surveys imply a greater social distance in that the respondent does not receive a “physical object.” This greater social distance reduces social desirability and could encourage a sincere response (Green et al., 2001; Johnson, Fendrich, & Mackesy-Amiti, 2012; Kreuter, Presser, & Tourangeau, 2008), but, at the same time, it reduces the link between the respondent and the researcher. This link is reduced due to extrinsic characteristics of the Internet, such as multitasking (caused by having too many windows open) and the high speed of data processing. The greater social distance and the effects of the channel (the Internet) lead to the hypothesis that less effort is employed by those interviewed on the Internet to respond to a questionnaire, which would translate into a greater choice of first-option responses and a greater choice of affirmative answers (weak satisficing). These respondents would also show a greater number of don’t know answers, lower differentiation in the use of scales, and a greater choice of easy answers (strong satisficing).
In short, the satisficing theory has important implications for the issue at stake in this article, to the extent that the mode in which the questionnaire is administered may result in substantial differences in the quality of the responses.
Sample and Fieldwork
The data used in this article come from a research study (Moscoso et al., 2010) commissioned and funded by the Regional Government of Andalusia, and carried out by the Institute of Advanced Social Studies, a center belonging to the CSIC (Consejo Superior de Investigaciones Científicas—Spanish High Council for Scientific Research), an institution that is well known and highly regarded in Spain. The objective of the research was to understand the situation of people from Andalusia who were resident in other countries. The questionnaire contained 56 numbered questions with up to 116 variables.
More than half of the questionnaire (35 questions) was made up of forced choice questions presented vertically, the majority having five or two response choices, using round radio buttons in the web questionnaire (DeRouvray & Couper, 2002). There were also nine multiple-choice questions with square checkboxes in the web-based questionnaire, five check-all-that-apply questions, and the four remaining had a forced-choice format, with a yes/no choice for each item (Dillman et al., 2009; Smyth et al., 2005). There were no filter questions.
The questionnaire also had 12 open questions. In the open numerical questions, space was left blank in the postal questionnaire that was large enough to write several-digit numbers. In the web-based questionnaire, two of these questions were responded to with drop-boxes or a select list following Couper’s (2008) recommendations. The open questions that required a phrase were designed to collect up to three answers. Each answer had two lines available in the postal questionnaire. In the web-based questionnaire, a one-line text area (Couper, 2008; Dillman et al., 2009) was made available with no indication of space.
All these questions were included in a six-page questionnaire whose design followed the recommendations of experts in self-administered questionnaires (Dillman, 1978, 2008; Dillman et al., 2009; Mangione, 1995) and the unified mode construction principles (Dillman et al., 2009). No question, whether open or closed, offered the options “don’t know” or “no answer,” as in previous research with similar aims (among others Fricker, Galesic, Tourangeau, & Yan, 2005). With the aim of avoiding effects caused by the different ways of viewing the questionnaire, it was decided that all the respondents would “view the same,” regardless of the mode chosen to respond. The web-based questionnaire allowed respondents to move from one question to another without having to answer the previous question, and we did not use the “forced answer” option (the interviewee could skip one or several questions).
The sample was selected randomly from people over the age of 18 with the right to vote in Andalusia who were resident abroad. The official updated reference framework for Spanish citizens living in other countries is the El Censo Electoral de Residentes en el Extranjero (The Electoral Census of Residents in Other Countries). From a universe of 144,007 emigrants from Andalusia who were resident in other countries in March 2008, 15,657 people were selected systematically, with the aim of achieving a sample size of at least 2,400 people. The members of the selected sample were contacted by post, and three channels were made available for them to answer the questionnaire: postal questionnaire (postage-paid), digital questionnaire, and CATI (computer-assisted telephone interview) paid for by the receiver.
The implementation procedures were analogous to those in similar research (Dillman et al., 2009, chap. 7; Ilieva, Baron, & Healey, 2002; Watson, Lissitz, & Rudner, 2006), although no monetary incentive was used nor was the mode of contact changed. Specifically, each person selected received an invitation envelope by post with a personalized letter of introduction, their ID number, a questionnaire, and a postage-paid envelope. The letter of introduction, addressed to a specific person and signed by the Vice-Councillor for Local Administration and Security of the Government of Andalusia (Viceconsejero de Gobernación del Gobierno de Andalucía), explained the objectives of the study, encouraged the addressees to cooperate, and referred to the confidentiality of their answers citing a Spanish law. It went on to explain that the questionnaire could be responded to by post (using the prepaid envelope), by Internet, indicating the questionnaire number (the link was shown in the letter), or by telephone. Having received only 2,198 completed surveys, a second round of 3,698 questionnaires was sent out, finally achieving a total of 2,493 completed questionnaires.
Results
Response Rate
The 15,657 letters sent, of which 2,476 never reached their destination, generated 2,493 responses, which constitutes a cooperation rate (COOP2) of 18.9%. This is reduced to 15.9% when the response rate formula (RR6) is used (American Association for Public Opinion Research, 2011).
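The two rates can be reproduced arithmetically from the figures above. The sketch below is a simplification, assuming that the cooperation rate takes only the delivered letters as its base and that the response rate takes every letter sent; the full AAPOR definitions distinguish more disposition codes than the text reports, and the function name `rate` is illustrative.

```python
def rate(completed: int, base: int) -> float:
    """Return completed / base as a percentage rounded to one decimal."""
    return round(100 * completed / base, 1)


letters_sent = 15_657
undelivered = 2_476
completed = 2_493

# Assumption: cooperation is measured against letters that were delivered.
cooperation = rate(completed, letters_sent - undelivered)
# Assumption: response is measured against all letters sent.
response = rate(completed, letters_sent)

print(cooperation, response)
```

Under these assumptions the output matches the 18.9% and 15.9% reported in the text.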
Having a sampling frame with data on the universe makes it possible to identify the differences between the theoretical sample and the one actually obtained. A comparison of both distributions reveals an underrepresentation of migrants in Europe and an overrepresentation of residents in Latin America, with differences of similar size (about 10 percentage points) but in opposite directions: the 63% of Andalusian migrants who live in Europe returned 53% of the questionnaires, whereas the 30% living in Latin America returned 41%. There were few differences in the rest of the countries. The gender distribution of the respondents matched that of the planned sample, while regarding age, participation was lower among the young and higher among the old.
The analysis of the response obtained by each mode showed that the large majority of the questionnaires, 2,083, were received by post (83.6%), 359 by Internet (14.4%) and 51 people opted to reply by telephone (Diment & Garrett-Jones, 2007).
Contacting people by post, with a paper questionnaire inside the envelope, could explain the high number of people who replied using the paper questionnaire, but the low response to the web questionnaire can also be explained by the questionnaire not being immediately accessible. The interviewee had to turn on the computer, connect to the Internet, type the website address, and enter the ID number to be identified. Many of the interviewees decided to postpone answering, which produced “a break in the response process” (Dillman et al., 1994; Griffin et al., 2001). On many occasions this break results in failure to answer, as some respondents forget or lose interest.
The demographic profile of the interviewees could explain this further (Miller, Kobayashi, Caldwell, Thurston, & Collett, 2002). The mail survey was the main choice of those older than 65 years, people with a lower education level, and those living in European countries. It is important to clarify that almost two out of three had access to the Internet in their home and that half of the sample used it at least once a week. The web survey was mainly used by the under 44 age-group, with a medium-high education level, and resident in Latin American countries (mainly for the reasons expressed above).
An analysis of the response quality revealed that 24 of the respondents had answered less than half of the questionnaire. These questionnaires were eliminated, applying a much more stringent criterion than that used by Teclaw, Osatuke, Yanovsky, Moore, and Dyrenforth (2010), which included questionnaires with only 51% to 80% of responses completed and considered them “incomplete.” Finally, the small number of people who chose the telephone survey prevented any meaningful conclusions, and therefore those questionnaires were eliminated from the analysis, which focused on the 2,419 people who answered the questionnaire by post or Internet, filling in at least half of the questionnaire.
Response Quality
Based on the Survey Satisficing proposals described in the first section, response quality will be operationally defined in terms of three criteria: (1) number of partial item non-responses to the questions in the questionnaire, differentiated between non-responses to open and closed questions; (2) length of answers to open questions; and (3) choice of first response options (primacy effect). It is important to take into account that our measures of quality do not cover quality in the sense of accuracy, honesty, or openness of responses.
We used tests of differences in means and proportions to determine, for each question, whether the postal and Internet modes of response differed significantly.
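The text does not name the specific tests applied. As one plausible instance for proportions, the sketch below implements a pooled two-sample z-test using only the standard library; the function name and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-sample z-test for a difference in proportions.

    x1/n1 and x2/n2 are the event counts and sample sizes of the two groups.
    Returns the z statistic and the two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: respondents who skipped one item, by mode.
z, p = two_proportion_z(200, 2000, 10, 350)
print(f"z = {z:.2f}, significant at .01: {p < 0.01}")
```

A comparison of means (for example, the number of unanswered questions per respondent) would use a t-test or z-test on the two group means instead.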
Item Non-response: It is important to remember that the questions did not allow the respondent to choose a non-response category. This criterion records unanswered questions due to an absence of information, because the respondent did not want to answer or as a result of carelessness. Table 1 shows that those who responded to the postal survey left more than 11 questions unanswered, nearly 5 more than in the web survey. This difference is basically due to the large number of non-responses to closed questions (5.4), which is 4 times lower in the web survey. In addition, better collaboration was observed in the web survey in the open questions with text, in line with what is found in most of the international literature on the subject (among others, Israel & Lamm, 2012; Millar & Dillman, 2012).
Table 1. Number of Non-responses.
**Significant at .01. ***Percentages represent an average of the item non-response rates for all respondents to each mode, for all items in the questionnaire.
Converting the number of non-responses into percentages with respect to the total number of answers to the questionnaire (Messer, Michelle, & Dillman, 2012; Millar & Dillman, 2012) provided more accurate information, and revealed that 9.3% of the postal survey questions were left unanswered, as opposed to 4.9% of those in the web survey. This makes it possible to verify that the percentage of non-response in the postal survey was shared almost equally between the closed and open questions, regardless of how easy it is to answer closed questions. In fact, the low number of non-responses to open questions was surprising, since the majority of studies have indicated that they tend to have poorer response (among others, Israel & Lamm, 2012).
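The conversion into percentages can be sketched as follows, assuming (as the note to Table 1 states) that the figure for each mode is the average of the respondents' individual item non-response rates. The function name and the toy answer lists are hypothetical; `None` marks a skipped item.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_item_nonresponse(respondents: Sequence[Sequence[Optional[int]]]) -> float:
    """Average, over respondents, of the share of items left unanswered (in %)."""
    rates = [
        sum(answer is None for answer in answers) / len(answers)
        for answers in respondents
    ]
    return 100 * mean(rates)


# Hypothetical toy data: each inner list is one respondent's answers.
postal = [[1, None, 3, None], [2, 2, None, 4]]
web = [[1, 2, 3, 4], [5, None, 1, 2]]

print(mean_item_nonresponse(postal), mean_item_nonresponse(web))
```

Averaging per-respondent rates gives every respondent equal weight, which matters when some questions do not apply to everyone and the number of items differs between respondents.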
The majority of the questionnaire was made up of single-response closed questions but, as mentioned earlier, there were also eight multiple-choice questions, four forced-choice format questions, and another four check-all-that-apply questions. Various pieces of research (among others, Dillman, Smyth, Christian, & Stern, 2003; Rasinski, Mingay, & Bradburn, 1994; Smyth, Dillman, Christian, & Mcbride, 2009; Smyth, Dillman, Christian, & O'Neill, 2010; Smyth et al., 2005; Stern, Smyth, & Mendez, 2012; Thomas & Klein, 2006) recommend the use of single-response closed questions, as they obtain the most answers and also prevent most respondents from choosing the first options provided in the answers to check-all-that-apply questions (primacy effect). The first column of Table 2, which shows the average number of non-responses collected for each question, differentiated by the mode used, shows a greater number of non-responses in the forced-choice questions.
Table 2. Average Number of Non-responses to Multiple-Choice Questions.
**Significant at .01. ***Number in brackets shows the maximum number of responses to each question.
Starting with the check-all-that-apply questions, the first question of Table 2 reveals that the question with the most possible answers (G8) was, logically enough, the one that obtained the best results. The non-response rates, as can be seen in the second and third column of Table 2, are always higher in the postal survey than in the web survey, although only two variables present a significant difference.
A study of the distribution of the answers reveals a greater selection of first response categories, except for the questions “with whom did you emigrate” (G8) and “who stayed” (G9); as parents and siblings were placed as first and second choice, these were selected most often. This situation showed some differences with respect to the majority of the international literature on the subject, which highlights that the primacy effect is more pronounced in these questions. In the other two questions, “health cover” (D7) and “help with the payment of medicines” (D8), all the response categories were identified, with the third and fifth being the most chosen.
Contrary to what is indicated by the majority of the literature, in this study it was observed that non-response rates increased considerably in the forced-choice questions, with the poorest response being that to the question about the demand for services provided by institutions, which had seven possible answers. As in the previous case, the web survey obtained more answers than the postal survey, showing significant differences in the four questions.
Regarding open numerical questions, it must be remembered that in two of these, drop-boxes (select lists) were used in the web survey, leaving respondents to write whatever they wanted to in the postal survey. This generated very low rates of non-response (2.0% and 0.6% respectively) in the web survey, as opposed to 8% and 13.8% in the postal survey. The differences between both surveys reduced considerably when a (blank) space was left for the answer in the web survey, as was the case in the rest of the questions. The item non-response in the question concerning income reached 14.6% in the postal survey and 9% in the web survey; and the number of journeys to Andalusia was not answered by 3.5% and 1.7% of those interviewed, respectively. Considering only respondents born in Andalusia, 3.0% of those who used the postal survey did not answer the question concerning the year in which they left or when they started to live in that country, a percentage that went down to 0.6% in the web survey.
These differences between the modes of administering the survey could have been caused not so much by differences in the mode of response of the questionnaire, but rather by sociodemographic differences in the sample. With the aim of dispelling these doubts, a number of regressions were carried out to uncover whether differences in the non-response of both surveys remain even when controlling for the influence of sociodemographic variables (Table 3). After controlling for gender, age, educational level, being born in Andalusia, and country of residence (taking into account the continent), the number of unanswered questions was lower in the web survey (–0.113), with a greater decrease in closed questions. As shown in other studies (among others Messer et al., 2012), older people left more questions unanswered, and those born in Andalusia provided poorer responses to open questions.
Table 3. Multivariate Analysis of Data Quality. Dependent Variable: Number of Unanswered Questions.
**Significant at .01.
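The logic of controlling for sociodemographic variables can be sketched with an ordinary least squares fit in which survey mode enters as a 0/1 indicator next to a covariate. This is a minimal illustration under stated assumptions: the text does not give the exact model specification, the helper `ols` is hypothetical, and the toy data are constructed so that the true coefficients are known; they bear no relation to the –0.113 reported above.

```python
from typing import List, Sequence


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    An intercept column is added automatically.
    Returns [intercept, b1, b2, ...] in the column order of X.
    """
    rows = [[1.0, *map(float, r)] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: columns are (web mode indicator, age).
X = [(0, 30), (0, 70), (1, 25), (1, 60), (0, 50), (1, 40)]
# Constructed as unanswered = 2 - 1.5 * web + 0.1 * age.
y = [5.0, 9.0, 3.0, 6.5, 7.0, 4.5]

intercept, b_web, b_age = ols(X, y)
print(round(intercept, 3), round(b_web, 3), round(b_age, 3))
```

A negative coefficient on the mode indicator means fewer unanswered questions on the web once age is held constant, which is the kind of evidence the paragraph before Table 3 describes.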
A large number of research studies (among others, Couper & Bosnjak, 2010; Davidov & Depner, 2011; Denscombe, 2009; Fricker & Schonlau, 2002; Ilieva et al., 2002; Kwak & Radler, 2002; Reja et al., 2003) have indicated that web-based surveys, in addition to having a greater number of answers to open questions, obtain longer answers, which are also more detailed and more elaborate. To test this statement, the verbatim responses given by each respondent were analyzed and the length of each answer was measured by counting the number of characters. The longest answers were those given to the questions about the respondent’s main problems or needs, followed by their occupation. Table 4 shows that the web survey produced longer responses in four of the five questions considered, although only three showed significant differences.
Table 4. Length of Answers to Open Questions.
*Significant at .05. **Significant at .01.
Table 5. Average Number of Characters per Word.
**Significant at .01.
Table 6. Primacy Effect in Questions With More Than Four Categories.
**Significant at .01.
The length of the answers, measured by considering the number of characters in the words used to answer the questions, can also be used to consider the type of words used by each respondent. The information in Table 5 shows that those interviewed by way of the Internet used longer words, which we attribute to the use of a more efficient language, that is, a lower use of articles and prepositions. This is true in four of the five open questions used.
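Both length measures are simple to compute from verbatim answers. The sketch below assumes that answer length counts every character, including spaces between words, and that words are separated by whitespace; the text does not specify either detail, and the two example answers are hypothetical.

```python
def answer_length(text: str) -> int:
    """Number of characters in an answer, ignoring surrounding whitespace."""
    return len(text.strip())


def mean_word_length(text: str) -> float:
    """Average number of characters per whitespace-separated word."""
    words = text.split()
    return sum(len(word) for word in words) / len(words) if words else 0.0


# Hypothetical answers to an open question about occupation.
postal_answer = "I work as a nurse in a hospital"
web_answer = "registered hospital nurse intensive care department"

for label, answer in (("postal", postal_answer), ("web", web_answer)):
    print(label, answer_length(answer), round(mean_word_length(answer), 2))
```

Short function words such as articles and prepositions pull the average word length down, which is why a higher average is read here as more economical wording.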
Primacy effect, choice of the first response category: To identify the presence of this effect the demographic questions and the four forced-choice dichotomy questions were discarded, leaving only the forced-choice questions and the five check-all-that-apply questions. The count of those choices with a value of 1 was slightly higher in the web survey, and this disappeared when second-category choices were included. When considering both together, the web survey showed a more pronounced primacy effect than the paper survey (see Table 6).
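The count described above can be sketched as the share of answered items that fall in the first category, or in the first two. The function name, the 1-based coding of category positions, and the toy data are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def primacy_share(choices: Sequence[Optional[int]], top: int = 1) -> float:
    """Share (in %) of answered items whose chosen category is among the first `top`.

    `choices` holds 1-based category positions; None marks an unanswered item
    and is excluded from the base.
    """
    answered = [c for c in choices if c is not None]
    if not answered:
        return 0.0
    return 100 * sum(c <= top for c in answered) / len(answered)


# Hypothetical chosen positions for one question, by mode.
postal = [1, 3, 2, 5, None, 4]
web = [1, 1, 4, 2, 3]

print(primacy_share(postal), primacy_share(web))                # first category only
print(primacy_share(postal, top=2), primacy_share(web, top=2))  # first two categories
```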
Conclusion and Discussion
Contrary to the initial proposals to the effect that the respondents interviewed on the web would only scan the text in the questions, the study showed the advantages of using web-based self-administered questionnaires. Although the primacy effect is higher in web surveys, these showed a low number of unanswered questions, more detailed answers to open questions, and longer answers to questions than those generated from paper questionnaires. The 11.4 unanswered questions in the postal survey (9.3% of the questionnaire) were reduced to fewer than 6 in the web-based survey, a decrease that is much greater when comparing the closed questions (5.40 and 1.29, respectively), as opposed to the tendency shown by other researchers (Denscombe, 2009). Another difference with respect to the findings of the majority of research studies is the fact that forced-choice questions showed greater non-response rates than check-all-that-apply questions. The differences in non-response persisted even when the influence of the demographic variables was eliminated. The five open questions with text showed longer answers from the web survey, around 63 characters more, with the question concerning “main needs or problems” showing the greatest difference. The respondents who answered the open questions on the web survey used more efficient language, that is, they reduced the number of articles, focusing their answer on substantive words. In the open questions with numerical answers, the use of a drop-box (select list) generated better response than the use of a blank space left in the paper questionnaire.
Other research comparing self-administered web surveys with interviewer-administered surveys (face-to-face or telephone interviews) has shown a poorer completion rate in the former, attributable to factors such as multitasking (Heerwegh, 2009; Heerwegh & Loosveldt, 2008; Holbrook et al., 2003, p. 83), the rapid reading of questions (scanning effect), and the greater social distance from the researcher, as well as different results from the web survey compared to the postal survey.
We consider that the way of inviting someone to participate in the survey could partly explain the excellent results from the web survey (Bradley, 1999; Dillman et al., 2009; Heerwegh, 2009; Heerwegh & Loosveldt, 2008; Holmberg et al., 2010; Muñoz Leiva et al., 2009). An electronic invitation sent by e-mail or an alert on the computer, received at a time when the user is using multiple applications, involves an interruption of the daily tasks that could result—on many occasions—in texts being scanned, and in inaccurate answers. On the contrary, an invitation to participate in a survey in the form of a letter received by ordinary post “forces” the respondent to turn on the computer (or another device) to fill in the questionnaire, which results in them paying more attention and having better motivation to answer the questionnaire. In this case, the person selected could choose to respond on paper or to do it by using the web. A detailed analysis of the information from the questionnaire shows that the respondents born in Spain responded mainly by mail, while the children of immigrants responded mainly online. Something similar happened with those who stated that they identified with their Andalusian roots. In light of this information, it seems that, at least in this study, those who were most motivated, most involved, chose to answer by mail.
These results confirm the findings of the majority of international research on the subject, although not many studies have been carried out on a general population. The need to count on populations with a high level of Internet access has led many researchers to use specific populations, such as company employees (Couper et al., 1999; Parker, 1992; Jones & Pitt, 2000), professional groups (Cobanoglu et al., 2001), students (Davidov & Depner, 2011; Denscombe, 2009), teachers (Converse et al., 2008), university graduates (Heerwegh, 2009; Heerwegh & Loosveldt, 2008; Holmberg & Lorenc, 2008; Kreuter et al., 2008; Kwak & Radler, 2002; Werner, 2005), and so on, with the result that, for many of them, caution is recommended when attempting to generalize the results to a wider population.
Finally, this study has several limitations. It is important to note that the population who received the letter with the questionnaire were born in Andalusia and, in many cases, have lived abroad for many years. Therefore, results may vary on the basis of different populations. The way of inviting participation, with a letter signed by the vice-councillor of the Government of Andalusia, may have increased the number of responses. The interviewees received only one letter. The response rate could increase significantly if respondents were contacted several times, or if an incentive or a telephone reminder were provided.
Acknowledgment
With special thanks to Julian Thomas.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is a part of a wider research financed by the Ministerio de Economía y Competitividad (Government of Spain), ref. CSO2012-34257. This research note is a revised version of a paper presented at the 2012 Spanish Federation of Sociology (Federación Española de Sociología) meeting, Universidad Complutense, Madrid.
