Articles/Chapters

Abstract

Full presentations of many of the entries below have already been distributed to BMS subscribers and RC33 members over the BMS-RC33 distribution list.¹

Antonio Arcos, María del Mar Rueda, Manuel Trujillo and David Molina, “Review of Estimation Methods for Landline and Cell Phone Surveys”, Sociological Methods & Research, 2015, 44: 458-85. The rapid proliferation of cell phone use and the accompanying decline in landline service in recent years have resulted in substantial potential for coverage bias in landline random-digit-dial telephone surveys, which has led to the implementation of dual-frame designs that incorporate both landline and cell phone samples. Consequently, researchers have developed methods to allocate samples and combine the data from the two frames. In this article, we review point and interval estimation methods of proportions that can be used to analyze overlapping dual-frame surveys. We use data from the survey of attitudes toward immigrants and immigration (Opinions and Attitudes of the Andalusian Population regarding Immigration survey), a dual-frame telephone survey conducted in Andalusia, Spain, to explore these different statistical adjustments for combining landline and cell phone samples. Our application obtains good results for calibration, fixed weight, pseudo-empirical likelihood, and single-frame procedures. We recommend that one of these internally consistent estimators be used in practice. The results of these methods of estimation show that the negative image toward immigration continues to spread.

Alexandru Cernat, “The Impact of Mixing Modes on Reliability in Longitudinal Studies”, Sociological Methods & Research, 2015, 44: 427-57. Mixed-mode designs are increasingly important in surveys, and large longitudinal studies are progressively moving to or considering such a design. In this context, our knowledge regarding the impact of mixing modes on data quality indicators in longitudinal studies is sparse. This study tries to ameliorate this situation by taking advantage of a quasi-experimental design in a longitudinal survey. Using models that estimate reliability for repeated measures, quasi-simplex models, 33 variables are analyzed by comparing a single-mode CAPI design to a sequential CATI-CAPI design. Results show no differences in reliabilities and stabilities across mixed modes either in the wave when the switch was made or in the subsequent waves. Implications and limitations are discussed.

Beatriz Larraz, “Decomposing the Gini Inequality Index - An Expanded Solution with Survey Data Applied to Analyze Gender Income Inequality”, Sociological Methods & Research, 2015, 44: 508-33. The aim of this article is to propose a new breakdown of the Gini inequality ratio into three components (within-group inequality, between-group inequality, and intensity of transvariation between groups to the total inequality index). The between-group inequality concept computes all the differences in salaries between men and women. The main contribution is to show this relationship when dealing with non-integer frequency data appearing on survey data. This article also proves the relationship between the very intuitive Gini concentration ratio and the very commonly used Gini index in the case of discrete distributions with a finite number of observations. The former refers to the relationship between the cumulative proportion of population and income, while the latter is the absolute mean difference divided by twice the arithmetic mean. This article complements the inequality decomposition literature that has stemmed from Gini’s work. The research is applied to analyze gender income inequality in a Spanish region using Spanish Structure of Earnings Survey data from 2010.
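As an illustration of the definition quoted in the abstract above (the Gini index as the absolute mean difference divided by twice the arithmetic mean), the following is a minimal sketch that allows non-integer survey weights. It is not the three-component decomposition proposed in the article; the function name `weighted_gini` and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def weighted_gini(values: Sequence[float], weights: Sequence[float]) -> float:
    """Gini index as (weighted mean absolute difference) / (2 * weighted mean).

    Illustrative sketch only: weights may be non-integer, as with survey
    expansion factors. This is not the decomposition from the article.
    """
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Weighted arithmetic mean of the values.
    mean = sum(w * x for x, w in zip(values, weights)) / total_weight
    if mean <= 0:
        raise ValueError("the weighted mean must be positive")

    # Weighted mean absolute difference over all ordered pairs (i, j).
    mean_abs_diff = sum(
        wi * wj * abs(xi - xj)
        for xi, wi in zip(values, weights)
        for xj, wj in zip(values, weights)
    ) / total_weight ** 2

    return mean_abs_diff / (2 * mean)


# Hypothetical incomes with fractional survey weights.
incomes = [0.0, 1.0]
survey_weights = [1.5, 0.5]
gini = weighted_gini(incomes, survey_weights)
```

The double loop is quadratic in the number of observations, which keeps the sketch close to the verbal definition; a sorted, cumulative-sum formulation would be preferable for large survey files.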

R. I. M. Dunbar, Valerio Arnaboldi, Marco Conti and Andrea Passarella, “The Structure of Online Social Networks Mirrors those in the Offline World”, Social Networks, 43: 39-47. We use data on frequencies of bi-directional posts to define edges (or relationships) in two Facebook data sets and a Twitter data set and use these to create ego-centric social networks. We explore the internal structure of these networks to determine whether they have the same kind of layered structure as has been found in offline face-to-face networks (which have a distinctively scaled structure with successively inclusive layers at 5, 15, 50 and 150 alters). The two Facebook data sets are best described by a four-layer structure and the Twitter data set by a five-layer structure. The absolute sizes of these layers and the mean frequencies of contact with alters within each layer match very closely the observed values from offline networks. In addition, all three data sets reveal the existence of an innermost network layer at 1.5 alters. Our analyses thus confirm the existence of the layered structure of ego-centric social networks with a very much larger sample (in total, >185,000 egos) than those previously used to describe them, as well as identifying the existence of an additional network layer whose existence was only hypothesized in offline social networks. In addition, our analyses indicate that online communities have very similar structural characteristics to offline face-to-face networks.

Nicole Watson and Roger Wilkins, “Design Matters - The Impact of CAPI on Interview Length”, Field Methods, 2015, 27(3): 244-64. Computer-assisted personal interviewing (CAPI) offers many attractive benefits over paper-and-pencil interviewing. There is, however, mixed evidence on the impact of CAPI on interview length, an important survey outcome in the context of length limits imposed by survey budgets and concerns over respondent burden. In this article, recent experimental and quasi-experimental evidence derived from a large, nationally representative household panel study is used to investigate CAPI’s impact on interview length. We find that effects very much depend on how CAPI is implemented, including the hardware and software adopted, the extent and nature of dependent data, and even interviewer workloads – a finding that helps explain the conflicting results from previous studies. Overall, our study leads us to the conclusion that, absent dependent data, CAPI is likely to increase interview lengths, but the potential reductions from dependent data are very large, such that even modest levels can lead to net reductions in interview lengths.

Tina Glasner, Wander van der Vaart and Wil Dijkstra, “Calendar Instruments in Retrospective Web Surveys”, Field Methods, 2015, 27(3): 265-83. Calendar instruments incorporate aided recall techniques such as temporal landmarks and visual time lines that aim to reduce response error in retrospective surveys. Those calendar instruments have been used extensively in off-line research (computer-aided telephone interviews, computer-assisted personal interviewing, and paper and pen interview), and have been shown to increase the quality of retrospectively collected life course data. The goal of our study was to investigate whether calendar recall aids can also improve data quality in Web surveys. In a methodological field experiment, we evaluated the effects of adding visual feedback and personal landmarks to our questionnaire with regard to response/break off rates, completeness of retrospective reports, interview duration, and respondent evaluations. The study included 1,451 respondents from a probability-based Internet panel who were randomly assigned to one of the four conditions of the experiment. The results indicate that in the Web-based calendar tool, visual feedback properties exerted the most influence on data quality.

Anja Mohorko and Valentina Hlebec, “Effect of a First-time Interviewer on Cognitive Interview Quality”, Quality & Quantity, 49(5): 1897-918. The authors were interested in the learnability of cognitive interview techniques when performed by non-experienced, newly instructed interviewers; moreover, they wanted to understand the most common problems experienced. During a five-year period, 120 methodology students performed 612 cognitive interviews and analyzed 17 different survey questionnaires. The precise documentation of their assignment served as a detailed data base of qualitative and quantitative information on their experiences. The results present the first-time interviewers’ ability to accurately perform and analyze a cognitive interview. The authors show the most common mistakes and issues in all stages of the interviewing process, the influence of different interviewers’ and respondents’ characteristics, and the effect of the technique on the interview’s success.

Xiangju Qin, Pádraig Cunningham and Michael Salter-Townshend, “The Influence of Network Structures of Wikipedia Discussion Pages on the Efficiency of WikiProjects”, Social Networks, 43: 1-15. As a platform for discussion and communication, talk pages play an essential role in Wikipedia to facilitate coordination, sharing of information and knowledge resources among Wikipedians. In this work, we explore the influence of network structures of these pages on the efficiency of WikiProjects. Project efficiency is measured as the amount of work done by project members in a quarter. The study uses the comments on WikiProject talk pages to construct communication networks. The structural properties of these networks are studied using ideas from social network theory. We develop three hypotheses about how network structures influence project effectiveness and examine the hypotheses using a longitudinal data set of 362 WikiProjects. The evaluation suggests that an intermediate level of cohesion with a core of influential users dominating network flow improves effectiveness for a WikiProject, and that greater average membership tenure relates to project efficiency in a positive way. We discuss the implications of this analysis for the future management of WikiProjects.

Bella Struminskaya, Edith de Leeuw and Lars Kaczmirek, “Mode System Effects in an Online Panel Study - Comparing a Probability-based Online Panel with Two Face-to-Face Reference Surveys”, Methods, Data, Analyses, 2015, 9(1): 3-56. One of the methods for evaluating online panels in terms of data quality is comparing the estimates that the panels provide with benchmark sources. For probability-based online panels, high-quality surveys or government statistics can be used as references. If differences among the benchmark and the online panel estimates are found, these can have several causes. First, the question wordings can differ between the sources, which can lead to differences in measurement. Second, the reference and the online panel may not be comparable in terms of sample composition. Finally, since the reference estimates are usually collected face-to-face or by telephone, mode effects might be expected. In this article, we investigate mode system effects, an alternative to mode effects that does not focus solely on measurement differences between the modes, but also incorporates survey design features into the comparison. The data from a probability-based offline-recruited online panel is compared to the data from two face-to-face surveys with almost identical recruitment protocols. In the analysis, the distinction is made between factual and attitudinal questions. We report both effect sizes of the differences and significances. The results show that the online panel differs from face-to-face surveys in both attitudinal and factual measures. However, the reference surveys only differ in attitudinal measures and show no significant differences for factual questions. We attribute this to the instability of attitudes and thus show the importance of triangulation and using two surveys of the same mode for comparison.

Michael R. Elliott, Qi Dong and Trivellore E. Raghunathan, “Combining Information from Multiple Complex Surveys”, Survey Methodology, 2014, http://www5.statcan.gc.ca/olc-cel/olc.action?objId=12-001-X201400214089&objType=47&lang=en&limit=0. This manuscript describes the use of multiple imputation to combine information from multiple surveys of the same underlying population. We use a newly developed method to generate synthetic populations non-parametrically using a finite population Bayesian bootstrap that automatically accounts for complex sample designs. We then analyze each synthetic population with standard complete-data software for simple random samples and obtain valid inference by combining the point and variance estimates using extensions of existing combining rules for synthetic data. We illustrate the approach by combining data from the 2006 National Health Interview Survey (NHIS) and the 2006 Medical Expenditure Panel Survey (MEPS).

K. R. W. Brewer, “Three Controversies in the History of Survey Sampling”, Survey Methodology, http://www5.statcan.gc.ca/olc-cel/olc.action?objId=12-001-X201300211883&objType=47&lang=en&limit=0. The history of survey sampling, dating from the writings of A. N. Kiaer, has been remarkably controversial. First Kiaer himself had to struggle to convince his contemporaries that survey sampling itself was a legitimate procedure. He spent several decades in the attempt, and was an old man before survey sampling became a reputable activity. The first person to provide both a theoretical justification of survey sampling (in 1906) and a practical demonstration of its feasibility (in a survey conducted in Reading which was published in 1912) was A. L. Bowley. In 1925, the ISI meeting in Rome adopted a resolution giving acceptance to the use of both randomization and purposive sampling. Bowley used both. However, the next two decades saw a steady tendency for randomization to become mandatory. In 1934, Jerzy Neyman used the relatively recent failure of a large purposive survey to ensure that subsequent sample surveys would need to employ random sampling only. He found apt pupils in M. H. Hansen, W. N. Hurwitz and W. G. Madow, who together published a definitive sampling textbook in 1953. This went effectively unchallenged for nearly two decades. In the 1970s, however, R. M. Royall and his coauthors did challenge the use of random sampling inference, and advocated that of model-based sampling instead. That in turn gave rise to the third major controversy within little more than a century. The present author, however, with several others, believes that both design-based and model-based inference have a useful part to play.

Piet J. H. Daas, Marco J. Puts, Bart Buelens and Paul A. M. van den Hurk, “Big Data as a Source for Official Statistics”, Journal of Official Statistics, 2015, 31(2): 249-62. More and more data are being produced by an increasing number of electronic devices physically surrounding us and on the Internet. The large amount of data and the high frequency at which they are produced have resulted in the introduction of the term “Big Data”. Because these data reflect many different aspects of our daily lives and because of their abundance and availability, Big Data sources are very interesting from an official statistics point of view. This article discusses the exploration of both opportunities and challenges for official statistics associated with the application of Big Data. Experiences gained with analyses of large amounts of Dutch traffic loop detection records and Dutch social media messages are described to illustrate the topics characteristic of the statistical analysis and use of Big Data.

José Luis Molina, Sören Petermann and Andreas Herz, “Defining and Measuring Transnational Social Structures”, Field Methods, 2015, 27(3): 223-43. Transnational social fields and transnational social spaces are often used interchangeably to describe and analyze emergent structures of cross-border formations. In this article, we suggest measuring two key aspects of these social structures: embeddedness and span of migrants’ personal networks. While clustered graphs allow assessing transnational embeddedness, the standardized diversity index can be used to show variation in the number of countries reported in personal networks. The measures will be exemplified with the data collected in Barcelona from three groups (Chinese, Sikh and Filipino, N = 25 in each group, 30 alters by ego).

Footnotes

1 bms-rc33@services.cnrs.fr