Abstract
In an experimental split ballot design, I test four different ranking techniques (drag and drop, numbering, arrows, and most–least) to explore potential effects on substantive answers, dropouts, item nonresponse, and response time. As an example, I use six items from Inglehart’s materialism–postmaterialism index. Data come from 1,225 members of an access panel who entered the set of items to be rank ordered. With respect to sex, education, and age, there are no significant differences between the four experimental groups. However, the groups differ considerably in response time, item nonresponse, and the estimated percentages of materialists and postmaterialists. Drag and drop is shown to be the best-suited method for collecting rank data in web surveys.
Introduction
Web surveys are increasingly applied in both the social sciences and marketing. In Germany, only 1% of all surveys were performed online in 1998; the share increased to 5% in 2002, 10% in 2003, 22% in 2005, and 32% in 2009 (http://www.adm-ev.de, as of March 10, 2011). The advantages and disadvantages of online surveys compared to other data collection methods have been comprehensively documented and discussed (e.g., Couper 2001; Couper et al. 2004; Dillman 2007; Fricker and Schonlau 2002); compared to face-to-face, telephone, and mail surveys, online surveys have the advantage of being cheaper, faster, and independent of time and place. The disadvantage at present is that Internet access is biased by sociodemographic characteristics such as age and educational level; therefore, online surveys are not generally applicable. But this limitation will lessen, and eventually disappear, as more households gain Internet access.
In contrast to paper-and-pencil surveys and to other computerized questionnaires, web surveys provide a large variety of response options and input formats such as videos, audio techniques, and interactive questions (Couper et al. 2004; Couper et al. 2007; Dillman 2007). Modern software for online surveys offers numerous possibilities that are not available in surveys conducted by mail, face-to-face, or telephone.
Many web survey studies have investigated methodological effects due to response formats (Couper et al. 2004; Smyth et al. 2006; Smyth et al. 2008; Tourangeau et al. 2004), spacing between response categories (Christian et al. 2009), response time (Christian et al. 2009; Heerwegh 2003), order of response categories, and primacy effects (Galesic et al. 2008; Malhotra 2008). To my knowledge, no one has yet investigated the effects of different ranking formats. Modern software allows the use of various response formats that are not applicable in face-to-face, telephone, or traditional mail surveys.
In an experimental split ballot design, I test four different ranking formats (drag and drop, numbering, arrows, and most–least) to explore possible effects on data quality, dropout, item nonresponse, and response time between the groups. As an example, I use six statements from Inglehart’s materialism–postmaterialism index (Inglehart 1971; Inglehart and Abramson 1999); for the wording of the question and the response options, see Figure 1. The materialism–postmaterialism index has been used in particular in the applied social sciences, often as one or more independent variables in regression models to explain various social phenomena. To simplify the task of ranking items, which is especially necessary in telephone surveys (see below), the items for measuring materialism and postmaterialism are often subdivided into two or three subsets of only four items each. A further simplification asks for the first and second choice only, as, for example, in the World Values Survey (http://www.worldvaluessurvey.org/index_surveys, as of March 10, 2011). However, with web surveys and modern software, there is no need for such a simplification.

Figure 1. Sorting items by arrows
The data were collected via an online access panel in late 2007. In total, 1,225 panel members entered the ranking question, 1,066 of whom filled in the ranking question completely. The study is exploratory, since to date there is no comparable study of web surveys in this area, and most of my expectations are drawn from face-to-face surveys or from experiences with other web surveys.
Literature Review
In survey research, whether face-to-face, via telephone, mail or web, questions or items are instruments to elicit responses. With respect to web surveys, Dillman (2007) and Ganassali (2008), among others, discussed the general structure and length of the questionnaire, the intensity of illustrations, the question wording, the interactivity, and the response format as important factors that may influence the quality of responses.
Even when scholars are interested in ranking information, they sometimes apply rating techniques or simplify the task (e.g., asking for the first and second choices only), since ranking techniques have a number of disadvantages. To summarize the main objections against them, I refer to Ovadia (2004:405). First, ranking questions cannot easily be asked over the telephone. Second, requiring respondents to order a long list of items might take more time than asking for individual ratings. Third, rankings result in data that cannot be analyzed with standard statistical methods because of the interdependence of the ranks. The first objection is not applicable to the given example, since web surveys are not conducted by telephone. The second objection might hold for a long list; however, for short lists of items, say four or six, as in the case of Inglehart’s materialism–postmaterialism scale, the difference in time should not be large enough to affect the number of dropouts. The third objection is valid, but several advanced statistical methods for analyzing ranking data are included in the standard statistical packages, for example, correspondence analysis (examples for analyzing rank data are given by Blasius and Graeff (2009), Greenacre (1993, 2007), and Thiessen and Blasius (2002)).
In one of his early articles, Inglehart (1971) used a very simple ranking procedure. From four items, two with materialistic and two with postmaterialistic goals, respondents had to select the two they preferred most. Respondents who selected the two materialistic items were classified as materialists; those who selected one of each were classified as mixed types; and those who selected the two postmaterialistic items were classified as postmaterialists. Without going into further details about the pros and cons of rating and ranking procedures and how to analyze them, I summarize that ranking procedures are an important instrument in survey research. This leads me to ask how to collect ranking data in web surveys. In the following, I assess the advantages and disadvantages of four different input formats. All four ranking formats were piloted to make sure that they ran under different browser installations.
Experimental Design
A natural way of asking for preferences is to ask respondents to order a list of items from most to least preferred. In face-to-face, mail, and web surveys, this procedure can be applied relatively easily. In mail and web surveys, the entire list of items can be shown to the respondents on a single page. In face-to-face surveys, cards can be used, showing each item on a single card, and the respondents can be asked to order the set of items according to their preferences. Currently, visual aids are not available in telephone surveys; respondents must remember the entire set of items and remember which items have already been ranked and which have not. Therefore, in telephone interviews, this procedure is limited to a very small number of items.
Using the computational possibilities in professional web tools, there are several possibilities to support the sorting process, in addition to showing the entire list of items on a single page. I discuss the four most different ones. However, as Neubarth (2008:59) pointed out, an unlimited number of variations exist in principle within all of these ranking formats. I used a fixed order of items for all methods and all respondents to isolate the effects of the different ranking formats.
The first method is arrow ranking, which allows the respondents to move single items up and down (Figure 1). Starting with the given initial order, respondents are requested to re-sort the set of items using the mouse; each single click changes the position of the selected item by one step. The respondents see the results of the individual steps; once they are satisfied with a given order, they can stop the procedure.
The advantage of arrow ranking is that every step is visualized; respondents can always see the resulting order of items. The time people need to complete this task will depend on their individual ability in handling the mouse and on the number of changes. The latter might lead to suboptimal solutions since every change needs a click and takes additional time. To complete this procedure quickly, respondents might stop when they are minimally satisfied with the solution, without searching for the best one. Since arrow ranking usually starts with an initial order of items, there will be no missing values except for dropouts. When no changes were made, the initial order remained as the target solution, but the respondents were alerted that they had not changed any item. The same warning was given for the other response formats when respondents did not answer or filled out the questionnaire incorrectly or incompletely.
The second possibility of ranking items is drag and drop, in which respondents have to move the items from a source list on the left to a target list on the right. As the items get moved from left to right, they can be reordered (Figure 2). This procedure should be relatively fast since each item requires only one movement. Further, respondents see the actual order of items in the target list at each step. Finally, there should be a relatively small number of incomplete answers since respondents see at each step which items have not been moved from left to right. Arrow ranking and drag and drop are interactive and relatively similar to the use of cards in face-to-face surveys.

Figure 2. Sorting items by drag and drop
The third possibility is to split the task into small steps. In the case of six items to be rank ordered, this version stretches over three screens. In the first step, respondents chose the item they preferred most and the item they preferred least, resulting in rank places one and six. In the next step, respondents were asked once more to select from the remaining items those they preferred most and least (rank places two and five). In the final step, respondents were asked to pick which of the two remaining items they preferred. This strategy is called most–least ranking (Figure 3) and was applied by McCarty and Shrum (1997, 2000), for example. Since there are three screens to open, and respondents have to read items two or three times, this method will consume more time than the others; in consequence, the number of dropouts and questions filled in incompletely might become relatively large. Further, because it is easier to select the best objects than the worst (see Thiessen and Blasius 1998), dropouts and incomplete questions should be associated with the level of education.

Figure 3. Sorting items with most–least procedure
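The three-screen most–least procedure can be illustrated with a minimal sketch. It assumes six items and that each screen returns the chosen item labels; the function name `assemble_most_least` and its arguments are hypothetical and are not taken from the survey software used in the study.

```python
from typing import List, Sequence, Tuple


def assemble_most_least(items: Sequence[str],
                        picks: Sequence[Tuple[str, str]],
                        final_pick: str) -> List[str]:
    """Build a full ranking of six items from most-least answers.

    picks holds the (most, least) pair chosen on screens 1 and 2;
    final_pick is the preferred item of the two left on screen 3.
    """
    remaining = list(items)
    top: List[str] = []      # rank places 1, 2, ... filled from the front
    bottom: List[str] = []   # rank places 6, 5, ... filled from the back
    for most, least in picks:
        if most == least or most not in remaining or least not in remaining:
            raise ValueError("invalid most-least pick")
        remaining.remove(most)
        remaining.remove(least)
        top.append(most)
        bottom.insert(0, least)
    if len(remaining) != 2 or final_pick not in remaining:
        raise ValueError("final pick must be one of the two remaining items")
    remaining.remove(final_pick)
    # remaining now holds the single item that was not picked on screen 3
    return top + [final_pick] + remaining + bottom
```

A response that stops after the first screen supplies only one (most, least) pair, so the function raises an error instead of returning a partial ranking; this corresponds to the incomplete answers that this format can produce.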
The fourth procedure I apply is numbering (Figure 4). Similar to mail surveys, respondents were asked to order the items by numbers: “1” goes to the item respondents prefer most; “2” goes to the item respondents prefer second most; and so on. The procedure should be relatively easy to apply: The mouse is unnecessary and no decision is necessary on which item the respondent prefers least. Therefore, numbering should consume relatively little time. As with most–least ranking, the disadvantage is that the order of items is not visualized; respondents have to imagine the order from the numbers (numbering) or from the selection steps (most–least).

Figure 4. Sorting items by numbers
Comparing the four ranking formats, I expect that most–least ranking will take the longest time for the respondents since the question is subdivided into three parts that are shown on separate pages. As a further consequence, I expect the percentage of ranking questions filled out incompletely or incorrectly to be the highest for this format among the four groups. Using arrows will take more time the more positions have to be changed from the initial configuration of items; with six items, moving an item from position 1 to 6 or from position 6 to 1 takes the maximum of five clicks. As a consequence, respondents might change the given order of items less than would be necessary to reach their best solution; this holds especially for initial positions 1 and 6.
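To make the click cost of arrow ranking concrete, the following sketch counts the fewest one-step moves needed to turn the initial order into a desired order. It assumes that each arrow click swaps the selected item with its direct neighbour; under that assumption the minimum equals the number of item pairs whose relative order differs between the two lists. The function name is illustrative only.

```python
from typing import Sequence


def min_arrow_clicks(initial: Sequence[str], target: Sequence[str]) -> int:
    """Count the fewest one-step moves that turn `initial` into `target`.

    Assumes one click swaps an item with its neighbour, so the minimum
    equals the number of item pairs ordered differently in the two lists
    (the Kendall tau distance).
    """
    if sorted(initial) != sorted(target) or len(set(initial)) != len(initial):
        raise ValueError("both lists must contain the same distinct items")
    position = {item: i for i, item in enumerate(target)}
    ranks = [position[item] for item in initial]
    return sum(1
               for i in range(len(ranks))
               for j in range(i + 1, len(ranks))
               if ranks[i] > ranks[j])
```

With six items, moving only the first item to the last place costs five clicks, whereas a complete reversal of the list costs fifteen, which illustrates why respondents may settle for an order close to the initial one.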
As is well known from the literature (Dillman 2007), item positioning, the graphical design of presenting the items, as well as the input format, can impact the solution. The same might hold true for the manner of asking for rank orders: The way of collecting rank data might affect the substantive solution. If the ranking method has no effect, all experimental groups should, because of randomization, provide similar solutions in terms of the percentage of materialists and postmaterialists.
Data
Data were collected by means of a professional web tool (www.globalpark.com). Respondents were recruited via a professional online access panel, which contains a random sample of the German online population, recruited either online or by a telephone survey. More than 10,000 persons received an e-mail invitation that contained a link to the survey. When entering the questionnaire, respondents were randomly allocated to one of the four experimental groups. Respondents did not have to answer every question to get to the subsequent question (i.e., they could skip questions). Respondents who did not answer the ranking question or filled it out in an incorrect way were given the opportunity to go back and give a formally correct answer. The study was conducted in late 2007, with a response rate of just below 10%, which is not exceptionally low for an access panel (see Couper et al. 2007:627). The survey contained 15 questions in total, most of them regarding current political issues to attract the target population. The ranking question containing the six Inglehart items (three of them measuring postmaterialistic values, the remaining three measuring materialistic ones) was the fifth one. Since I am interested in the effects caused by the different ranking formats, all respondents received the same initial order of items: economic growth, freedom of speech, more say in government, fighting rising prices, maintain order, and a less impersonal society.
In a split ballot design, the survey participants were randomly allocated to one of the four experimental groups; except for the different response format for collecting the materialism–postmaterialism items, there were no differences in the remaining questions and question formats. Of the members of the access panel who received an e-mail invitation, 1,356 entered the first page of the questionnaire and 1,225 of them (90.3%) entered the set of items to be rank ordered. No significant differences between the four experimental groups were found on sex, education, and age. The sample contains 58.7% men and 41.3% women; the average age is 29.4 years. With respect to educational level, 13.2% of respondents have a low educational level (9 years of school), 30.6% have a medium level (10 years of school), and 56.2% have a high level (13 years of school, university entrance level).
Findings
Table 1 shows the dropout and item nonresponse patterns on the ranking question for the four experimental groups. It shows that the dropout rates on the ranking question range between 3.6% and 6.4%. While the values for numbering, most–least, and drag and drop are quite similar, arrow ranking seems to perform somewhat better.
Table 1. Dropouts from the Survey and Item Nonresponse within the Ranking Questions, in Percentages
Note: χ² = 168.1 with df = 6; p < .001; Cramér's V = .26.
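The effect size in the table note can be retraced from the reported χ² value. The sketch below uses the standard definition of Cramér's V; the table shape (four groups by three outcomes, which matches df = 6) and n = 1,225 (all respondents who entered the ranking question) are assumptions, since the table body is not reproduced here.

```python
import math


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V for an n_rows x n_cols contingency table."""
    k = min(n_rows - 1, n_cols - 1)
    if n <= 0 or k <= 0:
        raise ValueError("need n > 0 and at least a 2 x 2 table")
    return math.sqrt(chi2 / (n * k))


# Assumed: 4 experimental groups x 3 outcomes, n = 1,225 respondents.
v_table1 = cramers_v(168.1, 1225, 4, 3)
print(round(v_table1, 2))  # 0.26
```

Under these assumptions the result agrees with the reported V = .26.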
With respect to incomplete responses to the ranking question, there are highly significant differences between the four groups. As mentioned before, the arrow method cannot produce missing values by design. Of the 296 persons who answered this question, only two kept the scale unchanged; I took these two cases as valid responses. With the drag-and-drop format, to provide an incomplete response, one would have to stop the procedure before the task is visibly completed, which happened in two cases. Numbering provides 6.4% incomplete/incorrect responses, and the highest value by far belongs to most–least ranking, with 23.9% (most of them, 68 of 71 respondents, filled in the first page only). Note again that in these cases, the respondents were alerted about the questions that were answered incompletely or incorrectly (to give an example for an incorrectly filled-in question: in numbering, the same rank place was assigned twice).
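The distinction between incomplete and incorrect answers in the numbering format can be sketched as a simple check. The function below is a hypothetical illustration, not the validation used by the survey software; it assumes that blank fields are stored as `None` and that a correct answer uses each rank place from 1 to the number of items exactly once.

```python
from typing import Optional, Sequence


def check_numbering(ranks: Sequence[Optional[int]]) -> str:
    """Classify one numbering response as complete, incomplete, or incorrect.

    ranks[i] is the number typed next to item i, or None if left blank.
    """
    k = len(ranks)
    if any(r is None for r in ranks):
        return "incomplete"
    if sorted(ranks) != list(range(1, k + 1)):
        return "incorrect"   # e.g. the same rank place assigned twice
    return "complete"
```

A response such as `[1, 1, 3, 4, 5, 6]` is classified as incorrect because rank place 1 appears twice, which is the kind of answer that triggered the warning described above.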
There are several reasons for the relatively large amount of incomplete or incorrect responses in the most–least group. It could be that respondents found the task of choosing the “least” parts of the questions too difficult (for a similar finding, see Thiessen and Blasius 1998). It could be just a matter of excessive download time or that respondents felt bored when reading the same items the second (or the third) time. But it could also be that respondents were unable (or unwilling) to differentiate between two or more items and stopped the procedure, ignoring the given warning. In the following, I consider only those cases with complete rank information.
Comparing the average response times, the most–least version took significantly more time than the other three types (see Table 2). This finding was expected since it takes more time to download three pages than one, and it takes additional time to read the items two or three times. Of the remaining three input formats, arrow ranking took more time on average than drag and drop, which might be a simple matter of physics: In drag and drop, every item has to be moved only once (from left to right), while the arrow version requires up to five clicks per item. The average response time for numbering is lowest since there is only one page and no need for using the mouse, as input is performed by keyboard. These findings are in accordance with my expectations.
Table 2. Analysis of Variance (ANOVA) on Answer Time (in Seconds) by Experimental Groupᵃ
Note: F = 75.3; p < .001; η = .42
ᵃ Seven cases were left out since they were classified as outliers; their response times were clearly above five minutes.
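The relation between the reported F value and η can be retraced with a short sketch. It relies on the identity η² = F·df_between / (F·df_between + df_within) for a one-way ANOVA. The sample size is an assumption: 1,066 complete rankings minus the seven outliers gives n = 1,059, with four groups.

```python
import math


def eta_from_f(f_value: float, df_between: int, df_within: int) -> float:
    """Recover eta from a one-way ANOVA F statistic.

    eta^2 = SS_between / SS_total = F * df_b / (F * df_b + df_w).
    """
    if f_value < 0 or df_between <= 0 or df_within <= 0:
        raise ValueError("F must be non-negative and both df positive")
    eta_sq = (f_value * df_between) / (f_value * df_between + df_within)
    return math.sqrt(eta_sq)


# Assumed: n = 1,066 complete rankings - 7 outliers = 1,059; 4 groups.
n, groups = 1059, 4
print(round(eta_from_f(75.3, groups - 1, n - groups), 2))  # 0.42
```

Under the assumed sample size, the result is consistent with the reported η = .42.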
With respect to the mean number of changes from the initial list to the target list, I show the solution for each item and for the entire set of items for the four groups (Table 3). The maximum number of possible changes depends on the position in the initial item list: for economic growth and less impersonal society it is higher (five positions) than for the other four items. I therefore compare the values of the four experimental groups only within single items. With respect to economic growth, in the arrow version, the average number of changes is 2.21 positions, while for numbering, it is 2.82 positions. In other words, in arrow ranking, economic growth lost 2.21 positions on average, whereas in numbering it lost 2.82 positions. The item less impersonal society, initially given in the last position, moved up by 1.95 positions on average in arrow ranking, but by 2.76 positions in most–least ranking.
Table 3. Mean Number of Changed Positions, by Item (Ordered by Given Position)
Comparing the average numbers of changes shows that the values for most–least, numbering, and drag and drop are almost equal for all items. The values for the arrow ranking are similar for the four items initially given on positions 2–5, but they are significantly lower for economic growth and less impersonal society, which is a strong indication that in arrow ranking a substantial number of respondents did not move the first and last item to their “true” position.
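The measure compared here can be sketched as follows. The sketch assumes that the number of changed positions is the absolute difference between an item's initial and final position, averaged over respondents; for the first and last items this coincides with positions lost or gained, but for the middle items it is an assumption. The function name is illustrative.

```python
from statistics import mean
from typing import Dict, List, Sequence


def mean_position_changes(initial: Sequence[str],
                          rankings: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Mean absolute shift between initial and final position, per item."""
    if not rankings:
        raise ValueError("at least one ranking is required")
    start = {item: i for i, item in enumerate(initial)}
    shifts: Dict[str, List[int]] = {item: [] for item in initial}
    for ranking in rankings:
        for final_pos, item in enumerate(ranking):
            shifts[item].append(abs(final_pos - start[item]))
    return {item: mean(values) for item, values in shifts.items()}
```

Applied separately to the rankings of each experimental group, this yields one mean per item and group, so that groups can be compared within a single item as described above.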
In the next step, I investigate differences in the substantive findings. Tables 4 and 5 show how often the single items were placed in the first and in the last position. Both tables reveal large amounts of variation between the four experimental groups. (The same holds for the remaining four positions; except for the third position, all tables are significant at the 0.1% level.) While in arrow ranking 31.8% of the respondents put freedom of speech first, in numbering only about one in ten did so. For the last position, the values are reversed: Only 6.1% of the arrow-ranking group put freedom of speech in last position, while 19.8% in numbering did so. This supports the previous finding that there is some kind of “cognitive ordering” in the arrow-ranking group. For example, it might not be appropriate to keep economic growth in position 1, so a large number of respondents chose the nearest more appropriate neighbor for the top position (i.e., freedom of speech). This solution is also in accordance with the finding of Galesic et al. (2008), who concluded that respondents take more time studying the first items than the later ones.
Table 4. Percentages of First Position in Ranking by Experimental Group
Note: χ² = 97.7 with df = 15; p < .001; Cramér's V = .18.
Table 5. Percentages of Last Position in Ranking by Experimental Group
Note: χ² = 114.3 with df = 15; p < .001; Cramér's V = .19.
Comparing the two interactive methods with respect to the first position, drag and drop is relatively different from arrow ranking. For example, from the drag-and-drop group, 21.9% put “maintain order” in position 1 but from arrow ranking only 6.8% did so. Again, in the arrow version, the distance from position 5 to 1 is four clicks. This is a distance that is relatively large and respondents might not recognize the options on the bottom of the list. In contrast, in the drag-and-drop version, there are no significant differences in the time for bridging the distance when moving the items from left to right; every item will be touched only once (economic growth—arrow ranking: 16.9%; drag and drop: 9.6%).
Pertaining to the last position, most–least ranking differs most from the other three experimental groups. In this group, economic growth was chosen most often as least preferred (33.3%, compared to 13.9% in arrow ranking), while almost nobody chose freedom of speech (2.9%, compared to 19.8% in numbering). This solution also supports the finding that it is easier to select issues one likes than those one dislikes (see Thiessen and Blasius 1998), and it might be easier for some respondents to select an issue such as economic growth as least preferable than an issue such as freedom of speech. With respect to the single positions, numbering and drag and drop seem to produce the fewest ranking format effects.
Finally, I compared the value orientations of the four ranking versions (Table 6). Using the same algorithm as Inglehart (1971), materialists are those respondents who put materialistic items in both of the first two positions, while postmaterialists put postmaterialistic items in both of the first two positions. Further, predominantly postmaterialists chose a postmaterialistic item for the first position and a materialistic one for the second, while predominantly materialists chose a materialistic item for the first position and a postmaterialistic one for the second. Given my sampling procedure, no significant differences should occur between the four experimental groups, but as shown, I detected many differences caused by the ranking format, which affected the solution (Table 6). Specifically, arrow and most–least ranking produced more postmaterialists, while drag and drop and numbering produced more materialists.
Table 6. Materialists and Postmaterialists by Version of Ranking, in Percent
Note: χ² = 54.5 with df = 9; p < .001; Cramér's V = .13.
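The classification underlying Table 6 can be sketched as a small function. It assumes four value types (which matches df = 9 for four groups) determined by the items in the first two rank places. The assignment of the six items to the materialist and postmaterialist sets follows the usual reading of Inglehart's index and is an assumption here, since the text only states that three items are of each kind.

```python
from typing import Sequence

# Assumed coding of the six items (usual reading of the index).
MATERIALIST = {"economic growth", "fighting rising prices", "maintain order"}
POSTMATERIALIST = {"freedom of speech", "more say in government",
                   "less impersonal society"}


def classify(ranking: Sequence[str]) -> str:
    """Assign one of four value types from the items ranked first and second."""
    first, second = ranking[0], ranking[1]
    for item in (first, second):
        if item not in MATERIALIST | POSTMATERIALIST:
            raise ValueError(f"unknown item: {item}")
    first_m = first in MATERIALIST
    second_m = second in MATERIALIST
    if first_m and second_m:
        return "materialist"
    if first_m:
        return "predominantly materialist"
    if second_m:
        return "predominantly postmaterialist"
    return "postmaterialist"
```

Because only the first two rank places enter the classification, any format effect on the top of the list carries over directly into the estimated shares of the four types.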
The literature reports very different estimates of the share of materialists and postmaterialists. Most often these differences are explained by time period (e.g., if prices are stable, fighting against rising prices should not be an important issue), by the list of items used (e.g., fighting unemployment), or by the method applied (e.g., ranking or rating). In my example, I assigned the respondents randomly to the four experimental groups; therefore, the effects cannot be explained by respondent characteristics. Since I used a fixed initial order for all respondents, the effects are caused by the different ranking formats; for example, in arrow ranking, respondents may simplify their task by settling for the second-best answer. Using a different initial order might lead to weaker or even stronger effects. Randomizing the initial order would partly mask the ranking format effects, but they would still be present.
Conclusion
Employing a split ballot design, I randomly allocated survey participants from an access panel to four experimental groups to test different input formats for collecting ranking data: arrow ranking, most–least, numbering, and drag and drop. In total, 1,225 respondents took part in the survey and 1,066 of them filled in the ranking questions completely. As an example, I used six items from the materialist–postmaterialist scale from Inglehart (1971), which have been applied in a large number of studies; three of them measuring postmaterialistic values, the other three measuring materialistic values. I studied four different ranking formats to evaluate their advantages and disadvantages for application in web surveys.
With arrow ranking, there was no possibility to answer the question incompletely. In cases where no changes were carried out, the initial solution was saved as target solution (after giving a warning that no changes had been made); this happened in 2 of the 307 cases. The response time was somewhat longer than for numbering and drag and drop, but clearly shorter than for most–least. One main problem in arrow ranking seems to be the movement of items from the top to the bottom locations and vice versa; the items in both locations were moved to a significantly lower degree than in the other three experimental groups. These solutions are in accordance with the assumption that respondents simplify the task.
Most–least ranking seems to be even more problematic. For this group, I found a relatively large percentage of dropouts and a large percentage of questions filled in incompletely/incorrectly. Both findings might be caused by the repetition of items, by the longer download times (three pages instead of one), or by the fact that respondents are not able (or not willing) to differentiate between all items. Further, respondents may have problems with selecting those items they prefer least; for the item in position 6 of the target ranking, most–least shows very different results than the other three experimental groups. The difficulty in selecting items that are preferred least has already been reported in other surveys (see Thiessen and Blasius 1998). According to these findings, I cannot recommend the most–least technique in its present form. However, it should be discussed whether respondents should be allowed to assign the same rank place twice or even more often.
Neither most–least ranking nor numbering visualizes the items in the selected order; respondents have to imagine it from the numbers or from the selection steps, which increases the task difficulty, which, in turn, causes more problems for respondents with low education (Blasius and Thiessen 2001), who relatively often simplify tasks (Thiessen and Blasius 2008). Finally, drag and drop seldom produces incomplete responses, which might be a positive consequence of the eye-friendly visual design. The response time is shorter than in arrow ranking because the items move only once from left to right; there is no need for several clicks. The target order is visualized and easy to check at every step. To summarize, when only a small number of items are involved, drag and drop is probably the method best suited for collecting ranking data in web surveys.
Footnotes
Acknowledgments
The author would like to thank Victor Thiessen, Dalhousie University, Halifax, for his comments on a previous version of the article. The author also thanks Susanne Rauch for her assistance in conducting the web survey.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship and/or publication of this article: The work reported in this article was supported by the research fund he receives from the University of Bonn.
