Review of Estimation Methods for Landline and Cell Phone Surveys

Abstract

The rapid proliferation of cell phone use and the accompanying decline in landline service in recent years have created substantial potential for coverage bias in landline random-digit-dial telephone surveys, which has led to the implementation of dual-frame designs that incorporate both landline and cell phone samples. Consequently, researchers have developed methods to allocate samples and combine the data from the two frames. In this article, we review point and interval estimation methods for proportions that can be used to analyze overlapping dual-frame surveys. We use data from the Opinions and Attitudes of the Andalusian Population regarding Immigration survey, a dual-frame telephone survey of attitudes toward immigrants and immigration conducted in Andalusia, Spain, to explore these different statistical adjustments for combining landline and cell phone samples. In our application, the calibration, fixed weight, pseudo-empirical likelihood, and single-frame procedures perform well, and we recommend that one of these internally consistent estimators be used in practice. The estimates obtained with these methods show that the negative image of immigration continues to spread.

Keywords

dual-frame surveys; jackknife; proportion estimation; telephone surveys; variance estimation

Introduction

Traditionally, surveys have been carried out using three main methods of data collection: face-to-face interviews, mail surveys, and telephone interviews. Over the last 20 years, this picture has changed sharply. Telephone surveys have become a popular mode of data collection, especially since the development of computer-assisted telephone interviewing (CATI) systems. Telephone interviews are often considered a less costly alternative to mail and face-to-face interviews, and they achieve acceptable levels of population coverage.

Since 2000, the use of telephone surveys has increased steadily, and they have largely replaced other data collection methods (most of which were face-to-face interviews). Telephone surveys offer numerous advantages over face-to-face surveys. In some subject areas (e.g., electoral studies), face-to-face surveys have been completely ousted by telephone interviewing. Moreover, studies have reported better results from phone surveys than from face-to-face interviews (Abascal, García, and Landaluce 2012; Díaz de Rada 2011).

However, telephone surveys also have drawbacks with regard to coverage, because some households have no telephone and because the widespread use of mobile phones sometimes leads households to give up fixed (landline) service entirely (see Pasadas et al. 2011; Trujillo, Domínguez, and Pasadas 2005; Vicente, Reis, and Santos 2009). The potential for coverage error resulting from the rapid growth of the cell phone-only population has led to the development of dual-frame surveys. In these designs, a traditional sample from the landline frame is supplemented with an independent sample drawn from the banks of numbers assigned to cell phones.

By drawing samples from both cell phones and landline phones instead of from a single frame, it is possible to reduce survey costs, improve the coverage of the overall sample (Brick et al. 2006; Busse and Fuchs 2012; Lu et al. 2013), and even potentially increase response rates, depending on the specific survey being conducted (Opsomer 2011).

Some surveys have used a screening dual-frame survey design, in which people belonging to the landline telephone frame are removed from the cell phone frame before sampling commences, and only people living in cell phone-only households are interviewed (Brick, Edwards, and Lee 2007). No new statistical methods are required to estimate totals in such a survey, since essentially a stratified sample is taken.

The screening approach can introduce bias due to nonsampling errors (Kennedy 2007), and in many cases it may not be possible or practical to remove landline-frame units from the cell phone frame before sampling (it is not known in advance whether a person sampled from one frame also belongs to the other).

Instead, in an overlapping dual-frame survey, independent probability samples are taken from frame A (the landline frame) and frame B (the cell phone frame). Information from the samples must be combined to estimate population quantities, and there are many options for estimators. The estimation of a population total for dual-frame surveys was first investigated by Hartley (1962, 1974). Lund (1968) and Fuller and Burmeister (1972) subsequently improved on Hartley’s results, and Bankier (1986) and Skinner (1991) have proposed alternative estimation techniques. More recently, Skinner and Rao (1996), Lohr and Rao (2006), Mecatti (2007), Rao and Wu (2010), Singh and Mecatti (2011), and Ranalli et al. (2013) have considered new multiple-frame estimators for the population total. These methods are usually formulated under an ideal dual-frame survey setup (that is, the two frames together cover the entire target population).

In the analysis of a social survey, the response variables are often discrete, as in public opinion research, marketing research, and government survey research. In these cases, the proportion is a commonly used statistic for summarizing the data (the proportion of voters in favor of a presidential candidate, the unemployment rate, etc.). The customary sample proportion is calculated as the number of individuals with a specific attribute divided by the total number of individuals in the sample. At the time of data collection, the sizes of the two frames are known. However, the two frames together do not usually cover the entire population, as some people belong to neither of them. If the population size is unknown and must be estimated, the estimation of a proportion is more complex than that of a total, and yet this problem has hardly been discussed in the literature on multiple frames. In this article, we estimate the size of the union of the two frames and the proportion of interest in the population, using the methods described in the third section.

After describing the Opinions and Attitudes of the Andalusian Population regarding Immigration (OPIA) survey in the second section, we consider in the third section the problem of estimating a proportion in our dual-frame telephone survey and examine the effect of various estimation strategies designed to reduce the sampling error. In the fourth section, we present a jackknife variance estimation technique for all the estimators considered. The fifth section presents the results of the different estimation strategies on our survey data set. Finally, in the sixth section, we conclude with some thoughts about methods that could be used in future surveys that sample both landline and cell phone numbers.

Survey of OPIA 2013

The 2013 survey of OPIA is a population-based survey conducted by the Institute of Social Studies of Andalusia (IESA), a public scientific research institute specializing in the social sciences. Its aim is to reflect the opinions of the Andalusian population with regard to various aspects of immigration and refugee policies in Spain and toward immigrants as a group. This survey was conducted in a period characterized by one of the most severe economic crises in the modern history of Andalusia, which has dramatically increased rates of unemployment, a situation that has notably changed attitudes toward immigration in Andalusia. This survey is based on a sample of persons drawn from both landline and cell phone frames.

Population Coverage Through Landlines and Cell Phones in Andalusia

In Andalusia, the proportion of people reachable only by landline has fallen below 10 percent. During the years of economic growth, and with the rising number of Internet connections, the proportion of people reachable only by cell phone also declined. In recent years, however, this proportion has risen to around 20 percent. People who cannot be reached by phone now represent only a residual share of the population (less than 2 percent; Figure 1, Table 1).

Figure 1.

Evolution of landline and cell phone coverage for people over 16 years old.

Table 1.

Coverage in 2013.

Both	69.4%
Cell only	19.7%
Land only	9.6%
No phone	1.3%

Source: INE (National Institute of Statistics), Survey of Information Technologies in Households.

The distributions of landlines and cell phones vary considerably with age. Figure 2 shows that, among people for whom having a landline is their own decision (that is, excluding people who live with their parents), the younger the person, the higher the percentage with only a cell phone. This value exceeds 40 percent for people younger than 33 years.

Figure 2.

Percentage of people with only cell phone, by age.

A worrying issue in this respect, because it is difficult to correct, is the income gap between those with only a cell phone and the rest of the population (Vicente and Reis 2009). Figure 3 shows that, after controlling for age and emancipation status, the percentage of people who have only a cell phone differs greatly by income. For example, among people aged 30 to 44 years who live independently, 60 percent have only a cell phone when their household income is below 900 euros, compared with 10 percent when their income exceeds 2,500 euros.

Figure 3.

Percentage of population with only cell phone, by income, age, and emancipation.

In this survey, the IESA decided to carry out telephone interviews with adults using both landlines and cell phones. Given the time and budget available, 2,402 interviews were conducted by qualified interviewers specially trained in survey techniques. The number of interviews to be conducted via landline and via cell phone was determined by calculating the optimum allocation (in the sense of minimum variance) for each type of telephone, taking into account the costs (Pasadas and Trujillo 2013) and the prevalence of each type of device (following Hartley 1962). As a result, the sample sizes were set at 1,919 for landlines and 483 for cell phones. The interviews were carried out by the Statistics and Surveys sections of IESA from April 22 to May 13, 2013, using CATI.

Descriptions of Frames and Sampling Designs

Following Hartley’s (1962) classical notation, two samples are drawn independently from two frames, A and B. Let $a = A \cap \overline{B}$, $b = \overline{A} \cap B$, and $a b = A \cap B$, where $\overline{(\cdot)}$ denotes the complement of a set. From frame A (landlines), a stratified sample $s_{A}$ of size $n_{A}$ was drawn. In frame B (cell phones), a probability-based random-digit-dial (RDD) sample $s_{B}$ of size $n_{B}$ was drawn by simple random sampling without replacement (SRSWOR).

Sample sizes of land (A) and cell (B) phones are $n_{A} = 1{,}919$ and $n_{B} = 483$. Domain sample sizes are as follows: in the overlap domain, $n_{a b} = 1{,}727$ for the sample $s_{a b} = s_{A} \cap a b$ and $n_{b a} = 237$ for the sample $s_{b a} = s_{B} \cap a b$; $n_{b} = 246$ for the cell-only sample $s_{b} = s_{B} \cap b$; and $n_{a} = 192$ for the landline-only sample $s_{a} = s_{A} \cap a$ (Table 2). The total sample is $s = s_{A} \cup s_{B} = s_{a} \cup s_{a b} \cup s_{b a} \cup s_{b}$, and its size is $n = n_{A} + n_{B} = n_{a} + n_{a b} + n_{b a} + n_{b} = 2{,}402$.

At the time of data collection, the frame sizes of land (A) and cell (B) phones were $N_{A} = 4{,}982{,}920$ and $N_{B} = 5{,}707{,}655$, and the total population size was $N = 6{,}350{,}916$. The domain population sizes were $N_{a b} = 4{,}339{,}659$ for the overlap domain, $N_{a} = 643{,}261$ for the landline-only domain, and $N_{b} = 1{,}367{,}996$ for the cell-only domain. (Source: ICT-H 2012, Survey on the Equipment and Use of Information and Communication Technologies in Households, INE, National Statistical Institute, Spain.)

Table 2.

Sample Sizes for the OPIA Survey.

	Land	Cell	Total
Both	1,727	237	1,964
Cell		246	246
Land	192		192
Total	1,919	483	2,402

Note: OPIA = Opinions and Attitudes of the Andalusian Population regarding Immigration. In the columns, Land and Cell refer to the frame from which the units were selected; in the rows, they refer to the domain to which the units actually belong.

The land-phone sample was also stratified by provinces in the region of Andalusia, as shown in Table 3.

Table 3.

Stratification in Land-phone Sample.

Province	Almería	Cádiz	Córdoba	Granada	Huelva	Jaén	Málaga	Sevilla
$N_{h}^{A}$ *	353,787	767,370	508,258	558,087	308,941	423,548	872,011	1,190,918
$n_{h}^{A}$	262	210	252	256	275	263	207	194

*These figures are available on the National Institute of Statistics (INE) website: http://www.ine.es/.

Cell phone interviews were carried out with no control over their distribution by province, because the location of a cell phone number is difficult to determine. Hence, more interviews were conducted in the most populous provinces than in the less populous ones.

Initial Weighting Adjustments

This section describes the procedures used to create the weights for each sample. The base weights are the ratio of the number of telephone numbers in the frame to the number sampled. The weights were further adjusted to account for people who had multiple chances of being sampled because they had more than one telephone number.

First-order inclusion probabilities were computed from a stratified random design in frame A and modified taking into account the number of fixed lines ( $L_{h k}$ ) and adults in the household ( $A_{h k}$ ) as follows: $π_{h k}^{A} = \frac{n_{h}^{A} L_{h k}}{N_{h}^{A} A_{h k}}$ . The design weights were computed as $d_{h k}^{A} = 1 / π_{h k}^{A}$ for all h and k. A simple random sample without replacement, SRSWOR, was drawn from frame B and first-order inclusion probabilities were computed and modified, given the number of cell phone numbers per individual ( $M_{k}$ ) as $π_{k}^{B} = \frac{n_{B} M_{k}}{N_{B}}$ , for all k. The design weights were computed as $d_{k}^{B} = 1 / π_{k}^{B}$ .
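As a minimal sketch of the two formulas above (not the production code used by IESA), the design weights can be computed as follows. The function names and the two respondents are hypothetical; the values of $n_{h}^{A}$ and $N_{h}^{A}$ (Almería, Table 3), $n_{B}$, and $N_{B}$ are taken from the text.

```python
def frame_a_weight(n_h, N_h, lines, adults):
    """Frame A design weight d^A_hk = 1 / pi^A_hk.

    pi^A_hk = n_h * L_hk / (N_h * A_hk): more landlines raise the chance of
    selection, while more adults sharing them lower each adult's chance.
    """
    pi = n_h * lines / (N_h * adults)
    return 1.0 / pi


def frame_b_weight(n_B, N_B, cell_numbers):
    """Frame B design weight d^B_k = 1 / pi^B_k, with pi^B_k = n_B * M_k / N_B."""
    pi = n_B * cell_numbers / N_B
    return 1.0 / pi


# Hypothetical landline respondent in Almería (n_h = 262, N_h = 353,787)
# living in a household with one fixed line and two adults.
d_a = frame_a_weight(262, 353_787, lines=1, adults=2)

# Hypothetical cell respondent with two personal numbers (n_B = 483,
# N_B = 5,707,655); having two numbers halves the design weight.
d_b = frame_b_weight(483, 5_707_655, cell_numbers=2)

print(f"d_A = {d_a:.2f}, d_B = {d_b:.2f}")
```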

Estimation in Dual-frame Telephone Surveys

We consider the problem of estimating the population proportion $P = N^{- 1} \sum_{k = 1}^{N} y_{k}$ , where $y_{k}$ is an attribute indicator for unit k, that is, $y_{k} = 1$ if unit k has the attribute of interest, and $y_{k} = 0$ otherwise. The number of population units belonging to the group of interest is denoted by $Y = \sum_{k = 1}^{N} y_{k}$ .

If the population size is known, an estimator $\hat{P}$ of the population proportion can easily be obtained from the total estimator $\hat{Y}$ as the ratio $\hat{P} = \hat{Y} / N$. When the population size is unknown, $\hat{P} = \hat{Y} / \hat{N}$ is an estimator of P, where $\hat{N}$ is an estimate of the population size N (this situation can arise in practice when, e.g., the available sampling frames do not cover the entire target population). We now present an overview of the procedures used in this survey to obtain $\hat{Y}$.

Single-frame Approach

Bankier (1986) and Kalton and Anderson (1986) proposed estimators that treated all the observations as if they had been sampled from a single frame, with adjusted weights in the intersection domain relying on the inclusion probabilities for each frame. In those situations, as in our example, in which we know the inclusion probability of the units in the sample under both sampling designs, the weights are defined as follows for all units in frame A and in frame B:

d_{k}^{s f} = \begin{cases} d_{k}^{A} & \text{if } k \in a \\ \left(1 / d_{k}^{A} + 1 / d_{k}^{B}\right)^{-1} & \text{if } k \in a b \\ d_{k}^{B} & \text{if } k \in b \end{cases}

Note that units in the overlap domain, which are selected with probability approximately $π_{k}^{A} + π_{k}^{B}$, receive the same weight whether they were sampled from frame A or from frame B.

Single-frame estimator (SF)

Kalton and Anderson’s (1986) single-frame estimator is given by:

{\hat{Y}}^{S F} = \sum_{k = 1}^{n} d_{k}^{s f} y_{k} .

The single-frame weights are the same for all response variables, so the estimators are internally consistent. For complex surveys, however, single-frame estimators may not be efficient. Skinner (1991) provides a theoretical study of the efficiency of the raking ratio estimator for multiple-frame surveys.
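The following Python sketch computes the single-frame weights and the SF estimator of a total. It assumes, as in OPIA, that both design weights are available for every respondent in the overlap domain; the function names and toy weights are illustrative only.

```python
def sf_weight(domain, d_A=None, d_B=None):
    """Single-frame weight d^sf_k for a unit in domain 'a', 'ab', or 'b'."""
    if domain == "a":        # landline-only: frame A design weight
        return d_A
    if domain == "b":        # cell-only: frame B design weight
        return d_B
    if domain == "ab":       # overlap: 1 / (pi^A + pi^B)
        return 1.0 / (1.0 / d_A + 1.0 / d_B)
    raise ValueError(f"unknown domain: {domain!r}")


def sf_total(units):
    """SF estimator of Y; units are (domain, d_A, d_B, y) tuples pooled from
    both samples. An overlap unit gets the same weight whichever frame it
    was drawn from."""
    return sum(sf_weight(dom, dA, dB) * y for dom, dA, dB, y in units)


units = [
    ("a", 100.0, None, 1),    # landline-only respondent
    ("ab", 100.0, 300.0, 1),  # overlap respondent drawn from frame A
    ("ab", 100.0, 300.0, 0),  # overlap respondent drawn from frame B
    ("b", None, 300.0, 1),    # cell-only respondent
]
print(sf_total(units))
```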

To obtain an unbiased estimator of the variance of the single-frame (SF) estimator, we adopted the approach proposed by Rao and Skinner (1996):

\hat{V} ({\hat{Y}}^{S F}) = \hat{V} ({\tilde{z}}_{k}^{A}) + \hat{V} ({\tilde{z}}_{k}^{B}),

where ${\tilde{z}}_{k}^{A} = δ_{k} (a) y_{k} + (1 - δ_{k} (a)) y_{k} \frac{π_{k}^{A}}{π_{k}^{A} + π_{k}^{B}}$ , ${\tilde{z}}_{k}^{B} = δ_{k} (b) y_{k} + (1 - δ_{k} (b)) y_{k} \frac{π_{k}^{B}}{π_{k}^{A} + π_{k}^{B}}$ and $\hat{V} (\cdot)$ denotes the Horvitz–Thompson variance estimator (see Särndal, Swensson, and Wretman 1992) with $δ_{k} (a) = 1$ if $k \in a$ and 0 otherwise, $δ_{k} (a b) = 1$ if $k \in a b$ and 0 otherwise, $δ_{k} (b a) = 1$ if $k \in b a$ and 0 otherwise and $δ_{k} (b) = 1$ if $k \in b$ and 0 otherwise.

Calibration estimator (CAL)

In the OPIA survey, $N_{A}$ , $N_{B}$ , and $N_{a b}$ are all known. We can define a calibration estimator on $(N_{a}, N_{a b}, N_{b})$ :

{\hat{Y}}^{C A L} = \sum_{k = 1}^{n} w_{k}^{c a l} y_{k},

with weights $w_{k}^{c a l}$ chosen to be as close as possible to the design weights $d_{k}^{s f}$ while reproducing the known totals $(N_{a}, N_{a b}, N_{b})$, that is, ${\hat{N}}_{a}^{C A L} = \sum_{k = 1}^{n} w_{k}^{c a l} δ_{k} (a) = N_{a}$, ${\hat{N}}_{b}^{C A L} = \sum_{k = 1}^{n} w_{k}^{c a l} δ_{k} (b) = N_{b}$, and ${\hat{N}}_{a b}^{C A L} = \sum_{k = 1}^{n} w_{k}^{c a l} δ_{k} (a b) = N_{a b}$. All the distance measures used to define “closeness” yield the same set of calibration weights: because the auxiliary variables are the indicators of three disjoint domains, the minimization problem has an analytic solution irrespective of the distance function employed (see Ranalli et al. 2013, for details).
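Because the auxiliary vector consists of indicators of three disjoint domains, the calibrated weights amount to rescaling the SF weights within each domain by the ratio of the known to the estimated domain size. The sketch below illustrates this under that assumption; the function name and the toy numbers are hypothetical.

```python
def calibrate_to_domains(weights, domains, known_sizes):
    """Rescale weights so that they add up to the known size of each domain."""
    estimated = {}
    for w, dom in zip(weights, domains):
        estimated[dom] = estimated.get(dom, 0.0) + w
    # The g-factor is constant within a domain: known size / estimated size.
    return [w * known_sizes[dom] / estimated[dom] for w, dom in zip(weights, domains)]


d_sf = [10.0, 10.0, 20.0, 5.0]
domains = ["a", "a", "ab", "b"]
known = {"a": 30.0, "ab": 40.0, "b": 10.0}

w_cal = calibrate_to_domains(d_sf, domains, known)
y = [1, 0, 1, 1]
Y_cal = sum(w * yk for w, yk in zip(w_cal, y))
print(w_cal, Y_cal)
```

With the OPIA figures, `known` would be `{"a": 643261, "ab": 4339659, "b": 1367996}`.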

An estimator of the variance of the calibration estimator can be obtained by using the residuals of the regression of y on $x = (δ_{k} (a), δ_{k} (a b), δ_{k} (b))$ as the y variable in equation (24).

Single-frame raking ratio (SFRR)

The SF estimator does not use any auxiliary information about the population totals $N_{A}$ and $N_{B}$, but it can be adjusted to these totals by raking ratio estimation. Skinner (1991) and Rao and Skinner (1996) showed that the raking procedure converges to the explicit estimator

{\hat{Y}}^{S F R R} = \frac{N_{A} - {\hat{N}}_{a b}^{R R}}{{\hat{N}}_{a}^{S F}} {\hat{Y}}_{a}^{S F} + \frac{{\hat{N}}_{a b}^{R R}}{{\hat{N}}_{a b}^{S F}} {\hat{Y}}_{a b}^{S F} + \frac{N_{B} - {\hat{N}}_{a b}^{R R}}{{\hat{N}}_{b}^{S F}} {\hat{Y}}_{b}^{S F},

where ${\hat{N}}_{a b}^{R R}$ is the smallest root of the quadratic equation ${\hat{N}}_{a b}^{S F} x^{2} - [{\hat{N}}_{a b}^{S F} (N_{A} + N_{B}) + {\hat{N}}_{a}^{S F} {\hat{N}}_{b}^{S F}] x + {\hat{N}}_{a b}^{S F} N_{A} N_{B} = 0.$
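A sketch of the explicit SFRR estimator, assuming the SF domain estimates $\hat{N}_{a}^{SF}, \hat{N}_{ab}^{SF}, \hat{N}_{b}^{SF}$ and $\hat{Y}_{a}^{SF}, \hat{Y}_{ab}^{SF}, \hat{Y}_{b}^{SF}$ are already available. Computing the smaller root as $c/q$ rather than with the textbook formula is a standard numerical choice to avoid cancellation, not part of the original method; the function names are hypothetical.

```python
import math


def sfrr_overlap_size(Na_sf, Nab_sf, Nb_sf, N_A, N_B):
    """Smaller root of Nab_sf x^2 - [Nab_sf (N_A + N_B) + Na_sf Nb_sf] x + Nab_sf N_A N_B = 0."""
    a = Nab_sf
    b = -(Nab_sf * (N_A + N_B) + Na_sf * Nb_sf)
    c = Nab_sf * N_A * N_B
    # q = a * (larger root); the smaller root is then c / q (no cancellation).
    q = 0.5 * (-b + math.sqrt(b * b - 4.0 * a * c))
    return c / q


def sfrr_total(Ya_sf, Yab_sf, Yb_sf, Na_sf, Nab_sf, Nb_sf, N_A, N_B):
    """Explicit single-frame raking ratio estimator of Y."""
    x = sfrr_overlap_size(Na_sf, Nab_sf, Nb_sf, N_A, N_B)
    return ((N_A - x) / Na_sf * Ya_sf
            + x / Nab_sf * Yab_sf
            + (N_B - x) / Nb_sf * Yb_sf)


# Hypothetical SF domain estimates that do not match the known frame sizes
print(sfrr_total(90_000.0, 1_300_000.0, 400_000.0,
                 600_000.0, 4_500_000.0, 1_300_000.0,
                 4_982_920.0, 5_707_655.0))
```

When the SF domain estimates already add up to the known frame sizes (for example, the OPIA domain sizes themselves), the root equals $\hat{N}_{ab}^{SF}$ and the estimator reduces to $\hat{Y}_{a}^{SF} + \hat{Y}_{ab}^{SF} + \hat{Y}_{b}^{SF}$.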

If $N_{a b}$ is not known, a calibration estimator can be defined on $(N_{A}, N_{B})$ :

{\tilde{Y}}^{C A L} = \sum_{k = 1}^{n} {\tilde{w}}_{k}^{c a l} y_{k},

with weights ${\tilde{w}}_{k}^{c a l}$ chosen to be as close as possible to the design weights $d_{k}^{s f}$ while reproducing the known totals $(N_{A}, N_{B})$, that is, ${\tilde{N}}_{A}^{C A L} = \sum_{k = 1}^{n} {\tilde{w}}_{k}^{c a l} δ_{k} (A) = N_{A}$ and ${\tilde{N}}_{B}^{C A L} = \sum_{k = 1}^{n} {\tilde{w}}_{k}^{c a l} δ_{k} (B) = N_{B}$. This estimator is the same as SFRR in equation (5) if the “raking” method is used in calibration.
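To illustrate the equivalence with raking, the sketch below alternately rescales the SF domain totals to the two frame margins and then checks that the raked overlap size solves the quadratic equation for $\hat{N}_{ab}^{RR}$. It works on domain totals, which is sufficient because each raking step multiplies all weights in a domain by the same factor. The starting values and the function name are hypothetical.

```python
def rake_dual_frame(Na_sf, Nab_sf, Nb_sf, N_A, N_B, tol=1e-12, max_iter=10_000):
    """Iterative raking of SF domain totals to the frame sizes N_A and N_B.

    Domain a lies only in frame A, b only in frame B, and ab in both, so one
    cycle rescales (a, ab) to N_A and then (ab, b) to N_B.
    Returns the raked domain sizes (N_a, N_ab, N_b).
    """
    na, nab, nb = float(Na_sf), float(Nab_sf), float(Nb_sf)
    for _ in range(max_iter):
        f = N_A / (na + nab)          # match the frame A margin
        na, nab = na * f, nab * f
        g = N_B / (nab + nb)          # match the frame B margin
        nab, nb = nab * g, nb * g
        # The frame B margin is exact here; stop once frame A also matches.
        if abs(na + nab - N_A) <= tol * N_A:
            break
    return na, nab, nb


# Hypothetical SF domain estimates that do not match the known frame sizes
Na_sf, Nab_sf, Nb_sf = 600_000.0, 4_500_000.0, 1_300_000.0
N_A, N_B = 4_982_920.0, 5_707_655.0
na, nab, nb = rake_dual_frame(Na_sf, Nab_sf, Nb_sf, N_A, N_B)

# The raked overlap size should solve the quadratic given for N_ab^RR.
residual = (Nab_sf * nab ** 2
            - (Nab_sf * (N_A + N_B) + Na_sf * Nb_sf) * nab
            + Nab_sf * N_A * N_B)
print(nab, residual / (Nab_sf * N_A * N_B))
```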

The variance for the single-frame calibration estimator is then determined using the residuals of regression of y on $x = (δ_{k} (A), δ_{k} (B))$ as the y variable in equation (24).

Dual-frame Approach

When the inclusion probability of a sampled unit is not known under both sampling designs, dual-frame methods can be considered. For comparison, these methods are also applied in our example.

We can write

Y = Y_{a} + η Y_{a b} + (1 - η) Y_{b a} + Y_{b},

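where $Y_{a} = \sum_{j \in a} y_{j}$, $Y_{a b} = \sum_{j \in a b} y_{j}$, $Y_{b a} = \sum_{j \in b a} y_{j}$, and $Y_{b} = \sum_{j \in b} y_{j}$. Since ab and ba denote the same overlap domain, $Y_{a b} = Y_{b a}$, so this decomposition holds for any constant η.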

Fixed weight adjustment (FWA)

The simplest weight modification to preserve approximate unbiasedness, as described by Hartley (1962), yields

\hat{Y} (θ) = {\hat{Y}}_{a} + θ {\hat{Y}}_{a b} + (1 - θ) {\hat{Y}}_{b a} + {\hat{Y}}_{b} .

Brick et al. (2006) used $θ = 1 / 2$ in their study of a dual-frame survey in which frame A was a landline telephone frame and frame B was a cell phone frame, and this value is frequently recommended for such designs (see, e.g., Mecatti 2007). This estimator is denoted by

{\hat{Y}}^{F W A} = {\hat{Y}}_{a} + (1 / 2) {\hat{Y}}_{a b} + (1 / 2) {\hat{Y}}_{b a} + {\hat{Y}}_{b} .
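A short sketch of $\hat{Y}(θ)$, assuming the four Horvitz–Thompson domain totals have already been computed. Setting θ = 1/2 gives the FWA estimator. Restricting θ to [0, 1] is a choice made in this sketch, and the toy totals are hypothetical.

```python
def hartley_total(Ya, Yab, Yba, Yb, theta=0.5):
    """Hartley-type dual-frame estimator Y(theta); theta = 1/2 gives FWA.

    Ya and Yab come from the frame A sample (domains a and ab);
    Yba and Yb come from the frame B sample (domains ab and b).
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return Ya + theta * Yab + (1.0 - theta) * Yba + Yb


# Hypothetical Horvitz-Thompson domain totals
print(hartley_total(10.0, 40.0, 60.0, 5.0))             # FWA, theta = 1/2
print(hartley_total(10.0, 40.0, 60.0, 5.0, theta=0.8))  # another fixed theta
```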

In order to calculate an estimator of the variance, we have taken into account that samples from frames A and B are drawn independently and that the value for θ is fixed. Thus,

\hat{V} (\hat{Y} (θ)) = \hat{V} ({\hat{Y}}_{a} + θ {\hat{Y}}_{a b}) + \hat{V} ((1 - θ) {\hat{Y}}_{b a} + {\hat{Y}}_{b}),

where equation (24) is used to compute the variance estimations.

Hartley (HAR; 1962, 1974) proposed choosing θ in equation (8) so that the variance of $\hat{Y} (θ)$ would be minimized. The optimizing value of θ is given by:

θ_{o p t} = \frac{V ({\hat{Y}}_{b a}) + c o v ({\hat{Y}}_{b}, {\hat{Y}}_{b a}) - c o v ({\hat{Y}}_{a}, {\hat{Y}}_{a b})}{V ({\hat{Y}}_{a b}) + V ({\hat{Y}}_{b a})},

and the estimator has the form

{\hat{Y}}^{H A R} (θ_{o p t}) = {\hat{Y}}_{a} + θ_{o p t} {\hat{Y}}_{a b} + (1 - θ_{o p t}) {\hat{Y}}_{b a} + {\hat{Y}}_{b} .

However, this optimal θ is a function of the variances and covariances of the estimated domain totals, so the optimal estimates differ across response variables. When the estimate of $θ_{o p t}$ falls outside $[0, 1]$, the approximation

θ_{o p t} ≃ \frac{V ({\hat{Y}}_{b a})}{V ({\hat{Y}}_{a b}) + V ({\hat{Y}}_{b a})},

can be used instead. In our example, using equation (24) to estimate the two variances in this approximation, we can estimate $θ_{o p t}$ without using second-order inclusion probabilities. The variance estimator for the Hartley estimator is obtained by replacing θ in equation (10) with the $θ_{o p t}$ value given in equation (11).
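The choice of θ for the Hartley estimator, including the fallback to the approximation when the estimate falls outside [0, 1], can be sketched as below. The variance and covariance estimates are assumed to be supplied by the caller (for example, from equation (24)); the function name and inputs are hypothetical. Because these estimates change with the response variable, the function would be called once per variable, which is why HAR is not internally consistent.

```python
def hartley_theta(var_ab, var_ba, cov_b_ba=0.0, cov_a_ab=0.0):
    """Estimated optimal theta for Hartley's estimator.

    theta_opt = [V(Yba) + cov(Yb, Yba) - cov(Ya, Yab)] / [V(Yab) + V(Yba)];
    if this falls outside [0, 1], use V(Yba) / [V(Yab) + V(Yba)] instead.
    """
    denom = var_ab + var_ba
    theta = (var_ba + cov_b_ba - cov_a_ab) / denom
    if not 0.0 <= theta <= 1.0:
        theta = var_ba / denom   # approximation, always inside [0, 1]
    return theta


print(hartley_theta(1.0, 3.0))                 # no covariance terms
print(hartley_theta(1.0, 3.0, cov_b_ba=2.0))   # out of range, so fallback
```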

Fuller and Burmeister (1972) proposed modifying Hartley’s estimator by incorporating additional information regarding estimation of the overlap domain. The resulting estimator is as follows:

{\hat{Y}}^{F B} (β) = {\hat{Y}}_{a} + β_{1} {\hat{Y}}_{a b} + (1 - β_{1}) {\hat{Y}}_{b a} + {\hat{Y}}_{b} + β_{2} ({\hat{N}}_{a b} - {\hat{N}}_{b a}),

where $β_{1}$ and $β_{2}$ are chosen to minimize $V ({\hat{Y}}^{F B} (β))$. In this case, as with Hartley’s estimator, a new set of weights must be calculated for each response variable, so the estimator is not internally consistent. The optimal values depend on covariances among the Horvitz–Thompson estimators, and values of $β_{1}$ outside $[0, 1]$ can also be obtained. Moreover, the FB estimator cannot be used to estimate the population size N, because the minimization process requires the inversion of a singular matrix.

Pseudo Maximum Likelihood (PML)

Skinner and Rao (1996) proposed modifying the maximum likelihood estimator for a simple random sample suggested by Fuller and Burmeister (1972) to obtain a PML estimator for a complex design. The PML estimator, unlike the Hartley and Fuller-Burmeister estimators, is linear in y and is of the form

{\hat{Y}}^{P M L} (θ) = \frac{N_{A} - {\hat{N}}_{a b}^{P M L} (θ)}{{\hat{N}}_{a}} {\hat{Y}}_{a} + \frac{{\hat{N}}_{a b}^{P M L} (θ)}{{\hat{N}}_{a b} (θ)} {\hat{Y}}_{a b} (θ) + \frac{N_{B} - {\hat{N}}_{a b}^{P M L} (θ)}{{\hat{N}}_{b}} {\hat{Y}}_{b},

where ${\hat{Y}}_{a b} (θ) = θ {\hat{Y}}_{a b} + (1 - θ) {\hat{Y}}_{b a}, {\hat{N}}_{a b} (θ) = θ {\hat{N}}_{a b} + (1 - θ) {\hat{N}}_{b a}$ and ${\hat{N}}_{a b}^{P M L} (θ)$ is the smallest root of the quadratic equation

$[θ / N_{B} + (1 - θ) / N_{A}] x^{2} - [1 + θ {\hat{N}}_{a b} / N_{B} + (1 - θ) {\hat{N}}_{b a} / N_{A}] x + {\hat{N}}_{a b} (θ) = 0$. Skinner and Rao (1996) suggested choosing θ to minimize the asymptotic variance of ${\hat{N}}_{a b}^{P M L} (θ)$, with

\hat{θ} = \frac{N_{a} N_{B} \hat{V} ({\hat{N}}_{b a})}{N_{a} N_{B} \hat{V} ({\hat{N}}_{b a}) + N_{b} N_{A} \hat{V} ({\hat{N}}_{a b})},

or estimate it as

θ ≃ \frac{V ({\hat{N}}_{b a})}{V ({\hat{N}}_{a b}) + V ({\hat{N}}_{b a})} .

In practice, the variances in equation (16) are unknown and must be estimated from the data. The PML estimator uses the same set of weights for each response variable and thus avoids some of the difficulties associated with the Hartley and Fuller-Burmeister estimators.
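A sketch of the PML estimator under stated assumptions: the Horvitz–Thompson domain estimates are given, $N_{A}$ and $N_{B}$ are known, and the constant term of the quadratic is $\hat{N}_{ab}(θ) = θ\hat{N}_{ab} + (1 - θ)\hat{N}_{ba}$. With this constant, θ = 1 returns $\hat{N}_{ab}$ and θ = 0 returns $\hat{N}_{ba}$ whenever these are below the frame sizes. The function names and the variance inputs in the example are hypothetical.

```python
import math


def pml_theta(N_a, N_b, N_A, N_B, var_Nab, var_Nba):
    """theta-hat minimizing the asymptotic variance of N_ab^PML."""
    num = N_a * N_B * var_Nba
    return num / (num + N_b * N_A * var_Nab)


def pml_overlap_size(theta, Nab_A, Nab_B, N_A, N_B):
    """Smaller root of [theta/N_B + (1-theta)/N_A] x^2
    - [1 + theta*Nab_A/N_B + (1-theta)*Nab_B/N_A] x + Nab(theta) = 0,
    where Nab_A and Nab_B are the frame A and frame B estimates of N_ab."""
    a = theta / N_B + (1.0 - theta) / N_A
    b = -(1.0 + theta * Nab_A / N_B + (1.0 - theta) * Nab_B / N_A)
    c = theta * Nab_A + (1.0 - theta) * Nab_B
    q = 0.5 * (-b + math.sqrt(b * b - 4.0 * a * c))   # a * (larger root)
    return c / q


def pml_total(theta, Ya, Yab, Yba, Yb, Na, Nab_A, Nab_B, Nb, N_A, N_B):
    """PML estimator of Y from Horvitz-Thompson domain estimates."""
    x = pml_overlap_size(theta, Nab_A, Nab_B, N_A, N_B)
    Yab_theta = theta * Yab + (1.0 - theta) * Yba
    Nab_theta = theta * Nab_A + (1.0 - theta) * Nab_B
    return (N_A - x) / Na * Ya + x / Nab_theta * Yab_theta + (N_B - x) / Nb * Yb


# Hypothetical variance estimates; frame and domain sizes from OPIA
theta = pml_theta(643_261, 1_367_996, 4_982_920, 5_707_655,
                  var_Nab=1.0e10, var_Nba=4.0e10)
print(theta, pml_overlap_size(theta, 4.3e6, 4.4e6, 4_982_920.0, 5_707_655.0))
```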

To estimate the variance of the PML estimator, we followed the method proposed by Rao and Skinner (1996), which provides a consistent estimator of variance in the form

\hat{V} ({\hat{Y}}^{P M L}) = \hat{V} ({\tilde{z}}_{k}^{A}) + \hat{V} ({\tilde{z}}_{k}^{B}),

where, in this case, ${\tilde{z}}_{k}^{A} = y_{k} - \frac{{\hat{Y}}_{a}}{{\hat{N}}_{a}}$ if $k \in s_{a}$ and ${\tilde{z}}_{k}^{A} = θ (y_{k} - \frac{{\hat{Y}}_{a b}}{{\hat{N}}_{a b}}) + \hat{λ} \hat{ϕ}$ if $k \in s_{a b}$ , where θ is calculated according to equation (16), $\hat{λ} = \frac{{\hat{Y}}_{a b}}{{\hat{N}}_{a b}} - \frac{{\hat{Y}}_{a}}{{\hat{N}}_{a}} - \frac{{\hat{Y}}_{b}}{{\hat{N}}_{b}}$ and $\hat{ϕ} = \frac{n_{A} {\hat{N}}_{b}}{n_{A} {\hat{N}}_{b} + n_{B} {\hat{N}}_{a}}$ . Similarly, we can define ${\tilde{z}}_{k}^{B} = y_{k} - \frac{{\hat{Y}}_{b}}{{\hat{N}}_{b}}$ if $k \in s_{b}$ and ${\tilde{z}}_{k}^{B} = (1 - θ) (y_{k} - \frac{{\hat{Y}}_{b a}}{{\hat{N}}_{a b}}) + \hat{λ} (1 - \hat{ϕ})$ if $k \in s_{b a}$ .

Pseudo Empirical Likelihood (PEL)

Recently, Rao and Wu (2010) extended the PEL approach proposed by Wu and Rao (2006) from one-frame surveys to dual-frame surveys, following a stratification approach and considering an estimation of the population mean of y,

{\hat{\bar{Y}}}^{P E L} (θ) = (N_{a} / N) {\hat{\bar{Y}}}_{a} + θ (N_{a b} / N) {\hat{\bar{Y}}}_{a b} + (1 - θ) (N_{a b} / N) {\hat{\bar{Y}}}_{b a} + (N_{b} / N) {\hat{\bar{Y}}}_{b},

where $θ \in (0, 1)$ is a fixed constant to be specified, ${\hat{\bar{Y}}}_{a} = \sum_{k \in s_{a}} {\hat{p}}_{a k} y_{k}$, ${\hat{\bar{Y}}}_{b} = \sum_{k \in s_{b}} {\hat{p}}_{b k} y_{k}$, and ${\hat{\bar{Y}}}_{a b} = \sum_{k \in s_{a b}} {\hat{p}}_{a b k} y_{k} = {\hat{\bar{Y}}}_{b a}$. The weights maximize the PEL function subject to $\sum_{k \in s_{a}} p_{a k} = 1$, $\sum_{k \in s_{a b}} p_{a b k} = 1$, $\sum_{k \in s_{b a}} p_{b a k} = 1$, $\sum_{k \in s_{b}} p_{b k} = 1$, and the additional constraint induced by the common domain mean, ${\bar{Y}}_{a b} = {\bar{Y}}_{b a}$. Here, we use the same estimate of θ as the one proposed in equation (17).

Instead of calculating the variance of the estimator explicitly, confidence intervals are obtained using the bisection method described by Wu (2005). This method constructs intervals of the form $\{\bar{Y} : r_{n s} (\bar{Y}) < χ_{1}^{2} (α)\}$, where $χ_{1}^{2} (α)$ is the $1 - α$ quantile of a $χ^{2}$ distribution with one degree of freedom and $r_{n s}$ is the pseudo empirical log-likelihood ratio statistic, which can be obtained as the difference of two PEL functions.
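The bisection idea can be sketched generically. Given a function $r(\cdot)$ that is zero at the point estimate and increases away from it, each endpoint of the interval is the point where $r$ crosses the $χ_{1}^{2}$ cutoff. The quadratic `r_demo` below is a hypothetical stand-in for $r_{ns}$. Computing the actual PEL ratio statistic requires solving the constrained maximization, which is not shown here.

```python
import math

# 0.95 quantile of the chi-square distribution with 1 degree of freedom
CHI2_1_095 = 3.841458820694124


def bisect(f, lo, hi, tol=1e-12, max_iter=200):
    """Root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def el_interval(r, point, lower, upper, cutoff=CHI2_1_095):
    """{mu : r(mu) < cutoff}, assuming r(point) = 0 and r grows away from point."""
    def excess(mu):
        return r(mu) - cutoff
    return bisect(excess, lower, point), bisect(excess, point, upper)


# Hypothetical stand-in for r_ns: a quadratic profile centred at 0.40
def r_demo(mu):
    return ((mu - 0.40) / 0.02) ** 2


lo, hi = el_interval(r_demo, 0.40, 0.0, 1.0)
print(lo, hi)
```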

Jackknife Variance Estimation

We also use the jackknife to estimate the variance of the estimators compared (Wolter 2007). The variance estimators presented in the third section must be computed differently for each estimator. Moreover, in small samples they may estimate the variability of the estimators poorly, because they estimate the asymptotic variance rather than the exact variance. By contrast, the jackknife is a general variance estimation method that can be applied to any estimator, so the variances obtained through this method can be used to compare the efficiencies of the estimators. For the sake of brevity, in this section dual-frame or SF estimators are denoted by ${\hat{Y}}_{c}$.

In the case of a stratified design, as in frame A, let frame A be divided into H strata and let stratum h have $N_{A h}$ observation units of which $n_{A h}$ are sampled. Then, a jackknife variance estimator of ${\hat{Y}}_{c}$ with an approximate finite-population correction is given by:

V_{J}^{A} ({\hat{Y}}_{c}) = \sum_{h = 1}^{H} (1 - \frac{n_{A h}}{N_{A h}}) \frac{n_{A h} - 1}{n_{A h}} \sum_{i \in s_{A h}} {({\hat{Y}}_{c}^{A} (h i) - {\overline{Y}}_{c}^{A h})}^{2},

where ${\hat{Y}}_{c}^{A} (h i)$ is the value taken by estimator ${\hat{Y}}_{c}$ after dropping unit i of stratum h from sample $s_{A h}$, and ${\overline{Y}}_{c}^{A h}$ is the average of these $n_{A h}$ values.

If we consider a nonstratified design, as in frame B, the jackknife estimator for the variance of ${\hat{Y}}_{c}$ with an approximate finite-population correction may be given by

V_{J}^{B} ({\hat{Y}}_{c}) = \frac{n_{B} - 1}{n_{B}} (1 - \frac{n_{B}}{N_{B}}) \sum_{i \in s_{B}} {({\hat{Y}}_{c}^{B} (i) - {\overline{Y}}_{c}^{B})}^{2},

where ${\hat{Y}}_{c}^{B} (i)$ is the value taken by estimator ${\hat{Y}}_{c}$ after dropping unit i from $s_{B}$ and ${\overline{Y}}_{c}^{B}$ is the average of ${\hat{Y}}_{c}^{B} (i)$ values (see Wolter 2007).

For any estimator ${\hat{Y}}_{c}$ in the single- or dual-frame approach, we compute the replicate values ${\hat{Y}}_{c} (i), i = 1, \dots, n$. These replicates are then separated into those obtained by deleting a unit from frame A and those obtained by deleting a unit from frame B, and $V_{J}^{A}$ and $V_{J}^{B}$ are computed. Finally, because the two samples are drawn independently, $V_{J} = V_{J}^{A} + V_{J}^{B}$.
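The delete-one procedure can be sketched in Python for the FWA estimator. Two assumptions go beyond the text. First, when a unit is dropped, the remaining units of its stratum (or of the frame B sample) are reweighted by $n_{g}/(n_{g}-1)$, a common jackknife convention. Second, the toy data (one frame A stratum with $N_{A h} = 40$, and $N_{B} = 30$) are hypothetical. Any other estimator can be plugged in through the `estimate` callback.

```python
from statistics import fmean


def delete_one_replicates(d, groups, estimate):
    """Delete-one jackknife replicates within groups (strata).

    Dropping unit i sets its weight to 0 and rescales the other weights in
    its group by n_g / (n_g - 1); estimate(w) recomputes the estimator.
    Returns {group: [replicate estimates]}.
    """
    members = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    reps = {}
    for g, idx in members.items():
        n_g = len(idx)
        for i in idx:
            w = list(d)
            for j in idx:
                w[j] = 0.0 if j == i else d[j] * n_g / (n_g - 1)
            reps.setdefault(g, []).append(estimate(w))
    return reps


def jackknife_variance(reps, N_g):
    """sum_g (1 - n_g/N_g) (n_g - 1)/n_g sum_i (Y_c(gi) - mean_g)^2."""
    v = 0.0
    for g, r in reps.items():
        n_g, m = len(r), fmean(r)
        v += (1 - n_g / N_g[g]) * (n_g - 1) / n_g * sum((x - m) ** 2 for x in r)
    return v


def fwa_total(yA, domA, wA, yB, domB, wB, theta=0.5):
    """Hartley estimator from unit-level data; domA in {'a','ab'}, domB in {'b','ab'}."""
    Ya = sum(w * y for y, dm, w in zip(yA, domA, wA) if dm == "a")
    Yab = sum(w * y for y, dm, w in zip(yA, domA, wA) if dm == "ab")
    Yba = sum(w * y for y, dm, w in zip(yB, domB, wB) if dm == "ab")
    Yb = sum(w * y for y, dm, w in zip(yB, domB, wB) if dm == "b")
    return Ya + theta * Yab + (1 - theta) * Yba + Yb


# Hypothetical data: one frame A stratum of 4 units, frame B SRSWOR of 3 units.
yA, domA, dA, strataA = [1, 0, 1, 1], ["a", "ab", "ab", "a"], [10.0] * 4, ["h1"] * 4
yB, domB, dB = [1, 1, 0], ["b", "ab", "b"], [10.0] * 3

reps_A = delete_one_replicates(dA, strataA, lambda w: fwa_total(yA, domA, w, yB, domB, dB))
reps_B = delete_one_replicates(dB, ["B"] * 3, lambda w: fwa_total(yA, domA, dA, yB, domB, w))
V_J = jackknife_variance(reps_A, {"h1": 40}) + jackknife_variance(reps_B, {"B": 30})
print(V_J)
```

For a linear estimator such as FWA, this jackknife matches the textbook SRSWOR variance of the corresponding linearized totals. With the toy data, it returns 150 (82.5 from frame A plus 67.5 from frame B).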

Results for the OPIA Survey

To examine the performance of the dual-frame estimation methods in practice, we applied them to the dataset from the OPIA survey.

Three main variables are included in this study, related to “goodness of immigration,” “amount of immigration,” and “confidence in immigration.” The variables are the answers to the following questions:

And in relation to the number of immigrants currently living in Andalusia, do you think there are … ? Too many, A reasonable number, Too few

In general, do you think that for Andalusia, immigration is … ? Very bad, Bad, Neither good nor bad, Good, Very good

In general, how much confidence do you have in immigrants? None at all, Very little, It depends, Quite a lot, Very much.

Each category of each variable is treated separately as an attribute, so that for any attribute of interest I, $y_{k I} = 1$ if the kth individual has attribute I and $y_{k I} = 0$ otherwise. The proportions for all the main variables are computed using ${\hat{P}}_{I} = \frac{{\hat{Y}}_{I}}{\hat{N}}$, where ${\hat{Y}}_{I}$ is the estimated total of units in the population with the attribute of interest I and $\hat{N}$ is an estimate of the population size N. For example, using the single-frame estimator (2), we estimate the population total and the population size as:

{\hat{Y}}_{I}^{S F} = \sum_{k = 1}^{n} d_{k}^{s f} y_{k I} \quad \text{and} \quad {\hat{N}}^{S F} = \sum_{k = 1}^{n} d_{k}^{s f},

respectively, and similarly for the other estimators. The FB estimator is not included, because the matrix that must be inverted to minimize the variance is singular when estimating the population size N.
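With a single set of weights, the estimated proportions of all categories of a variable add up to one, which is one practical consequence of internal consistency. The sketch below assumes SF weights; any estimator with one weight set would behave the same way. The weights and answers are hypothetical.

```python
def category_proportions(weights, responses):
    """P_I = Y_I / N-hat for every category I, using one set of weights."""
    N_hat = sum(weights)
    totals = {}
    for w, r in zip(weights, responses):
        totals[r] = totals.get(r, 0.0) + w   # Y_I adds the weights with y_kI = 1
    return {cat: t / N_hat for cat, t in totals.items()}


# Hypothetical SF weights and answers to the "number of immigrants" question
weights = [100.0, 75.0, 300.0, 75.0]
answers = ["Too many", "Too many", "A reasonable number", "Too few"]
props = category_proportions(weights, answers)
print(props, sum(props.values()))
```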

The weights $w_{k}^{c a l}$ of the calibration estimator (4) satisfy

\begin{aligned} \sum_{k = 1}^{n} w_{k}^{c a l} δ_{k} (a) \end{aligned} = 643261, \sum_{k = 1}^{n} w_{k}^{c a l} δ_{k} (a b) = 1, 367, 996, \sum_{k = 1}^{n} w_{k}^{c a l} δ_{k} (b) = 4, 339, 659.
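When the only calibration constraints are the three domain counts, linear (chi-square distance) calibration has a closed form: because the domain indicators are disjoint, each design weight is simply multiplied by $N_{dom}/{\hat{N}}_{dom}$, which is post-stratification on the domains. The sketch below illustrates only this special case and is not the full calibration estimator (4), which may use other distance functions or auxiliary variables. It assumes each respondent already carries a starting design weight $d_{k}$, and the name `calibrate_to_domains` is hypothetical. The totals follow from the ICT-H 2012 frame sizes: $N_{a} = 4{,}982{,}920 - 4{,}339{,}659 = 643{,}261$ and $N_{b} = 5{,}707{,}655 - 4{,}339{,}659 = 1{,}367{,}996$.

```python
from typing import Dict, List, Sequence

# Domain totals implied by the ICT-H 2012 frame sizes: N_a = N_A - N_ab, N_b = N_B - N_ab
DOMAIN_TOTALS: Dict[str, float] = {"a": 643_261, "ab": 4_339_659, "b": 1_367_996}


def calibrate_to_domains(d: Sequence[float], domain: Sequence[str],
                         totals: Dict[str, float]) -> List[float]:
    """Linear calibration on disjoint domain indicators.

    With x_k = (delta_k(a), delta_k(ab), delta_k(b)), the solution w_k = d_k (1 + x_k' lambda)
    has lambda_dom = N_dom / Nhat_dom - 1, so each weight is scaled by N_dom / Nhat_dom.
    """
    n_hat: Dict[str, float] = {}
    for dk, dom in zip(d, domain):
        n_hat[dom] = n_hat.get(dom, 0.0) + dk
    unknown = set(n_hat) - set(totals)
    if unknown:
        raise ValueError(f"no calibration total for domain(s) {sorted(unknown)}")
    empty = set(totals) - set(n_hat)
    if empty:
        raise ValueError(f"no respondents in domain(s) {sorted(empty)}")
    return [dk * totals[dom] / n_hat[dom] for dk, dom in zip(d, domain)]
```

Since the calibrated weights do not depend on any study variable, the same weights serve every proportion, which is the internal consistency noted by Särndal (2007).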

As Särndal (2007) notes, calibration yields a unique weighting system that is clear, transparent, and applicable to all study variables.

In the dual-frame approach, there is no single ${\hat{θ}}_{opt}$ for the HAR estimator, since it depends on the values of each study variable. For the PEL estimator, the value of $\hat{θ}$ in equation (17), which applies to all study variables, is $\hat{θ} = 0.729684$, whereas for the FWA estimator we use $θ = 1/2$. For the PML estimator, the value of θ in equation (16) is $\hat{θ} = 0.620662$.

All the dual-frame estimators have one feature in common: the two estimates for the overlap domain are combined with weights θ and 1 − θ, where θ is either fixed at 1/2 or estimated using equation (11), (13), or (16). In SF estimators, the weighting is instead determined by the inclusion probabilities under both sampling designs.
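The overlap weighting can be written as a small function of θ. This sketch rests on two assumptions not spelled out in this section: that the dual-frame estimators share the Hartley-type composite form ${\hat{Y}} = {\hat{Y}}_{a} + θ{\hat{Y}}_{ab} + (1 - θ){\hat{Y}}_{ba} + {\hat{Y}}_{b}$ (with FWA fixing θ = 1/2), and that the SF weight of a unit is $1/(π_{k}^{A} + π_{k}^{B})$, in the style of Bankier (1986) and Kalton and Anderson (1986). Both function names are hypothetical.

```python
def composite_total(y_a: float, y_ab: float, y_ba: float, y_b: float, theta: float) -> float:
    """Overlap-weighted dual-frame total: Y_a + theta * Y_ab + (1 - theta) * Y_ba + Y_b.

    y_ab and y_ba are the frame A and frame B estimates of the same overlap domain total.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return y_a + theta * y_ab + (1.0 - theta) * y_ba + y_b


def single_frame_weight(pi_a: float, pi_b: float) -> float:
    """Assumed SF weight: 1 / (pi_A + pi_B), with pi = 0 for a frame the unit is not in."""
    total = pi_a + pi_b
    if total <= 0.0:
        raise ValueError("unit must have a positive inclusion probability in some frame")
    return 1.0 / total
```

For instance, plugging the domain-size estimates of Table 4 into `composite_total` with θ = 1/2 gives an estimated population size of 7,603,063.5.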

All the estimators considered in this article require estimates of the domain sizes $N_{a}$, $N_{b}$, and $N_{ab}$, which are obtained using the Horvitz–Thompson estimator. For domain a, $N_{a}$ is estimated by ${\hat{N}}_{a} = \sum_{k=1}^{n_{A}} d_{k}^{A} δ_{k}(a)$, where $δ_{k}(a) = 1$ if $k \in a$ and 0 otherwise. For domain ab there are two options: (i) the frame A estimate ${\hat{N}}_{ab} = \sum_{k=1}^{n_{A}} d_{k}^{A} δ_{k}(ab)$, where $δ_{k}(ab) = 1$ if $k \in ab$ and 0 otherwise, or (ii) the frame B estimate ${\hat{N}}_{ba} = \sum_{k=1}^{n_{B}} d_{k}^{B} δ_{k}(ab)$. For domain b, $N_{b}$ is estimated by ${\hat{N}}_{b} = \sum_{k=1}^{n_{B}} d_{k}^{B} δ_{k}(b)$, where $δ_{k}(b) = 1$ if $k \in b$ and 0 otherwise.
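The Horvitz–Thompson domain estimates can be sketched as weighted sums within each frame sample. In this illustration (function name and toy data hypothetical), the frame A sample yields ${\hat{N}}_{a}$ and ${\hat{N}}_{ab}$, while the "ab" entry from the frame B sample is ${\hat{N}}_{ba}$; passing a y variable gives the corresponding domain totals instead of sizes.

```python
from typing import Dict, Sequence


def ht_domain_estimates(d: Sequence[float], domain: Sequence[str],
                        y: Sequence[float] = ()) -> Dict[str, float]:
    """Horvitz-Thompson domain estimates sum_k d_k delta_k(dom) y_k for one frame sample.

    Leaving y empty sets y_k = 1, which gives the estimated domain sizes.
    """
    values = list(y) if y else [1.0] * len(d)
    if len(values) != len(d):
        raise ValueError("d and y must have the same length")
    out: Dict[str, float] = {}
    for dk, dom, yk in zip(d, domain, values):
        out[dom] = out.get(dom, 0.0) + dk * yk
    return out
```

With design weights `[10, 10, 20]` and domains `["a", "ab", "ab"]` from frame A, this gives ${\hat{N}}_{a} = 10$ and ${\hat{N}}_{ab} = 30$; the two frames produce separate overlap estimates, which is why the estimators above must combine ${\hat{N}}_{ab}$ and ${\hat{N}}_{ba}$.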

Similarly, the Horvitz–Thompson estimator of the total of a variable y in domain a is denoted ${\hat{Y}}_{a} = \sum_{k=1}^{n_{A}} d_{k}^{A} δ_{k}(a) y_{k}$, and analogously for the other domains. The domain-size estimates obtained in the present survey are shown in Table 4.

The variances in Table 4 are computed using Deville's (1993) method, which avoids second-order inclusion probabilities (although in this case they could easily be computed). For a variable y whose population total Y is estimated by the Horvitz–Thompson estimator $\hat{Y} = \sum_{k \in s} y_{k}/π_{k}$ based on a sample s, this method gives the following variance estimator:

\hat{V} (\hat{Y}) = \frac{1}{1 - \sum_{k \in s} a_{k}^{2}} \sum_{k \in s} (1 - π_{k}) {(\frac{y_{k}}{π_{k}} - \sum_{l \in s} a_{l} \frac{y_{l}}{π_{l}})}^{2},

Table 4.

Estimates of Domain Sizes and Coefficients of Variation.

Domain	Estimate	CV
a	493,776	0.084
ab	4,646,468	0.020
ba	3,117,703	0.049
b	3,227,202	0.047

Note: CV = coefficients of variation.

where $a_{k} = (1 - π_{k}) / \sum_{l \in s} (1 - π_{l})$ .
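Deville's estimator needs only the first-order inclusion probabilities, so it is straightforward to compute. A minimal sketch of the formula above (the function name is hypothetical):

```python
from typing import Sequence


def deville_variance(y: Sequence[float], pi: Sequence[float]) -> float:
    """Deville (1993) variance estimator for the HT total, using first-order pi_k only."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    c = [1.0 - p for p in pi]                          # 1 - pi_k
    c_sum = sum(c)
    if c_sum == 0.0:
        return 0.0                                     # census: every pi_k = 1
    a = [ck / c_sum for ck in c]                       # a_k = (1 - pi_k) / sum_l (1 - pi_l)
    denom = 1.0 - sum(ak * ak for ak in a)
    if denom <= 0.0:
        raise ValueError("need at least two units with pi_k < 1")
    z = [yk / pk for yk, pk in zip(y, pi)]             # expanded values y_k / pi_k
    z_bar = sum(ak * zk for ak, zk in zip(a, z))       # sum_l a_l y_l / pi_l
    return sum(ck * (zk - z_bar) ** 2 for ck, zk in zip(c, z)) / denom
```

As a check, with equal probabilities $π_{k} = n/N$ the expression reduces to the familiar simple random sampling estimator $N^{2}(1 - n/N)s_{y}^{2}/n$; for y = (1, 2, 3, 4) and $π_{k} = 0.5$ it gives 40/3.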

Tables 5–7 show point estimates and 95 percent confidence intervals for the proportions of the main variables. Two sets of confidence intervals are calculated: one based on the jackknife variance estimator described in the fourth section and the other based on the variance estimators described in the third section. Further intervals could be obtained by applying finite population correction factors in the jackknife variance estimation; they are not included here because the results would be very similar.
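The jackknife columns of Tables 5–7 are symmetric about the point estimate (for example, SF "Very bad": 13.72 ± 2.15 gives [11.57, 15.87]), which is consistent with normal-approximation intervals $\hat{P} \pm 1.96\sqrt{\hat{V}}$. The sketch below assumes that form; the name `wald_interval` is hypothetical, and the variance must be in squared percentage points when the estimate is in percent.

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # 0.975 quantile of the standard normal distribution


def wald_interval(estimate: float, variance: float, z: float = Z_975) -> Tuple[float, float, float]:
    """Return (LB, UB, LEN) for estimate +/- z * sqrt(variance); units follow `estimate`."""
    if variance < 0.0:
        raise ValueError("variance must be non-negative")
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width, 2.0 * half_width
```

Reading the tables backwards, a jackknife interval of length 4.31 corresponds to a standard error of about 4.31/(2 × 1.96) ≈ 1.10 percentage points.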

Table 5.

Point and 95 Percent Confidence Level Estimation of Proportions Using Several Methods for Variance Estimation.

In general, do you think that for Andalusia, immigration is … ?
		Jackknife variance			Analytical variance
Estimator	PROP	LB	UB	LEN	LB	UB	LEN
	Very bad
SF	13.72	11.57	15.87	4.31	11.57	15.87	4.30
SFRR	13.90	11.84	15.95	4.11	11.09	16.12	5.03
CAL	13.35	11.69	15.01	3.33	10.72	15.97	5.25
FWA	13.70	11.66	15.74	4.08	11.68	16.30	4.62
HAR	13.44	11.66	15.23	3.58	11.64	15.98	4.34
PML	13.87	11.76	15.97	4.21	11.57	16.87	5.30
PEL	13.62	11.71	15.53	3.82	12.89	15.86	2.97
	Bad
SF	47.24	43.72	50.77	7.05	43.79	50.70	6.91
SFRR	47.39	44.43	50.35	5.92	43.98	51.14	7.16
CAL	46.92	44.52	49.33	4.81	43.18	50.66	7.48
FWA	45.48	42.24	48.72	6.49	43.10	50.16	7.06
HAR	46.16	43.19	49.12	5.93	43.06	49.93	6.87
PML	46.43	43.02	49.84	6.82	42.96	50.95	7.99
PEL	45.95	43.24	48.65	5.41	45.22	50.79	5.57
	Neither good nor bad
SF	4.85	3.54	6.16	2.61	3.54	6.16	2.61
SFRR	4.47	3.42	5.51	2.09	3.08	6.19	3.11
CAL	4.75	3.75	5.74	1.99	3.13	6.37	3.24
FWA	4.20	3.18	5.23	2.05	3.17	5.88	2.71
HAR	4.60	3.59	5.62	2.03	3.09	5.68	2.59
PML	4.34	3.30	5.38	2.08	2.44	5.43	2.99
PEL	4.33	3.30	5.36	2.06	2.81	5.21	2.40
	Good
SF	28.35	25.56	31.14	5.58	25.58	31.13	5.55
SFRR	28.22	25.87	30.57	4.70	25.00	31.33	6.33
CAL	28.98	26.86	31.11	4.25	25.68	32.29	6.61
FWA	30.46	27.74	33.19	5.45	25.98	31.85	5.87
HAR	29.93	27.52	32.34	4.82	25.71	31.32	5.61
PML	29.36	26.92	31.81	4.89	25.19	31.81	6.62
PEL	29.96	27.49	32.43	4.94	25.05	29.96	4.91
	Very good
SF	2.18	1.36	3.00	1.63	1.36	3.00	1.64
SFRR	2.10	1.41	2.79	1.38	1.12	3.08	1.96
CAL	2.16	1.51	2.82	1.31	1.14	3.19	2.05
FWA	2.14	1.35	2.93	1.58	1.25	3.03	1.78
HAR	2.11	1.43	2.78	1.35	1.29	2.94	1.65
PML	2.08	1.36	2.80	1.44	0.98	3.05	2.07
PEL	2.12	1.35	2.88	1.53	1.39	2.54	1.15

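Note: PROP = estimated proportion (percent); LB and UB = lower and upper bounds of the 95 percent confidence interval; LEN = interval length. SF = single-frame estimator; SFRR = single-frame raking ratio; CAL = calibration estimator; FWA = fixed weight adjustment; HAR = Hartley; PML = pseudo maximum likelihood; PEL = pseudo empirical likelihood. Main variable: “Goodness of Immigration.”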

Table 6.

Point and 95 Percent Confidence Level Estimation of Proportions Using Several Methods for Variance Estimation.

In relation to the number of immigrants currently living in Andalusia, do you think there are … ?
		Jackknife variance			Analytical variance
Estimator	PROP	LB	UB	LEN	LB	UB	LEN
	Too many
SF	42.31	38.95	45.66	6.71	38.97	45.64	6.67
SFRR	40.69	37.90	43.48	5.59	37.79	44.90	7.11
CAL	40.97	38.61	43.34	4.74	37.26	44.69	7.43
FWA	40.26	37.28	43.24	5.97	39.10	45.94	6.84
HAR	39.92	37.26	42.59	5.33	38.75	45.42	6.67
PML	40.44	37.29	43.59	6.30	38.12	45.86	7.74
PEL	41.05	38.37	43.73	5.36	39.78	43.93	4.15
	A reasonable number
SF	45.81	42.44	49.19	6.74	42.51	49.12	6.61
SFRR	47.85	44.99	50.72	5.73	43.05	50.15	7.10
CAL	47.03	44.63	49.43	4.79	43.32	50.74	7.42
FWA	47.91	44.59	51.23	6.64	42.03	48.82	6.79
HAR	48.43	45.41	51.45	6.04	42.02	48.63	6.61
PML	47.95	44.88	51.02	6.14	41.82	49.35	7.53
PEL	46.72	44.00	49.43	5.43	43.44	47.96	4.52
	Too few
SF	6.06	4.53	7.59	3.06	4.52	7.59	3.07
SFRR	5.39	4.15	6.63	2.48	3.87	7.49	3.62
CAL	5.62	4.50	6.74	2.25	3.73	7.51	3.78
FWA	5.19	3.99	6.39	2.40	4.38	7.62	3.24
HAR	5.34	4.22	6.46	2.23	4.38	7.47	3.09
PML	5.33	4.09	6.56	2.47	3.74	7.35	3.62
PEL	5.49	4.27	6.72	2.45	4.51	6.63	2.12

Table 7.

Point and 95 Percent Confidence Level Estimation of Proportions Using Several Methods for Variance Estimation.

In general, how much confidence do you have in immigrants?
		Jackknife variance			Analytical variance
Estimator	PROP	LB	UB	LEN	LB	UB	LEN
	None at all
SF	7.15	5.56	8.75	3.18	5.56	8.75	3.19
SFRR	7.66	6.07	9.24	3.18	5.59	9.36	3.77
CAL	7.17	5.91	8.43	2.51	5.20	9.14	3.93
FWA	7.15	5.67	8.64	2.98	5.47	8.88	3.41
HAR	7.01	5.70	8.31	2.61	5.48	8.68	3.20
PML	7.36	5.76	8.97	3.21	5.87	9.80	3.93
PEL	7.14	5.73	8.54	2.81	7.14	9.37	2.23
	Very little
SF	35.67	32.43	38.90	6.47	32.46	38.88	6.42
SFRR	34.61	31.83	37.40	5.56	31.46	38.42	6.96
CAL	34.34	32.04	36.65	4.61	30.71	37.98	7.27
FWA	34.09	31.16	37.02	5.86	32.80	39.46	6.66
HAR	33.65	31.01	36.29	5.28	32.37	38.80	6.43
PML	34.44	31.36	37.52	6.16	32.32	39.85	7.53
PEL	34.71	32.09	37.32	5.22	34.14	38.33	4.19
	Quite a lot
SF	35.02	32.06	37.98	5.92	32.09	37.94	5.85
SFRR	36.45	33.80	39.10	5.31	32.36	38.98	6.62
CAL	36.55	34.27	38.84	4.57	33.10	40.01	6.91
FWA	38.18	35.14	41.23	6.09	32.08	38.21	6.13
HAR	38.12	35.39	40.84	5.45	32.07	37.96	5.89
PML	37.34	34.63	40.05	5.42	31.80	38.72	6.92
PEL	37.06	34.44	39.67	5.22	32.72	37.06	4.34
	Very much
SF	12.24	10.13	14.34	4.20	10.13	14.34	4.21
SFRR	10.94	9.30	12.59	3.28	8.85	13.75	4.90
CAL	11.34	9.82	12.86	3.05	8.78	13.90	5.12
FWA	10.80	9.10	12.50	3.40	9.95	14.40	4.45
HAR	10.90	9.36	12.44	3.08	9.94	14.16	4.22
PML	10.90	9.22	12.57	3.35	8.58	13.56	4.98
PEL	11.18	9.48	12.88	3.40	9.36	12.10	2.74
	It depends
SF	7.29	5.73	8.85	3.13	5.73	8.85	3.12
SFRR	6.87	5.71	8.04	2.33	5.64	9.33	3.69
CAL	7.63	6.38	8.87	2.49	5.70	9.56	3.85
FWA	6.68	5.42	7.94	2.51	5.18	8.43	3.25
HAR	7.25	6.03	8.47	2.44	5.17	8.27	3.10
PML	6.71	5.53	7.90	2.37	4.72	8.33	3.61
PEL	7.05	5.73	8.36	2.63	5.27	8.27	3.00

For the outcomes shown in Tables 5–7, we obtained the following findings:

There are no important differences between the estimates produced with the single- and dual-frame approaches.

Among all the estimation strategies, the calibration method performs best, producing the shortest jackknife confidence intervals in most cases. Calibration estimation can be implemented easily using existing software for single-frame surveys; several R packages, such as the sampling package, provide estimates based on the calibration technique.

The jackknife method often produces shorter intervals than those based on the analytical variance estimators described in the third section (except for the PEL intervals).

At the time of data collection, the frame sizes for landlines (A) and cell phones (B) were $N_{A} = 4{,}982{,}920$ and $N_{B} = 5{,}707{,}655$, and the overlap domain had size $N_{ab} = 4{,}339{,}659$. We also studied how estimation is affected by using frame and overlap domain sizes taken from different sources. For this purpose, we considered the three sets of sizes shown in Table 8, obtained from the Survey on the Equipment and Use of Information and Communication Technologies in Households (conducted by the Spanish National Institute of Statistics, INE) in 2012 and 2013 and from the IESA Households Survey conducted in 2012. Using four of the estimators described in the third section, we computed the three resulting estimates, their average, and their coefficient of variation. The results for the three main variables are shown in Tables 9 and 10.

Table 8.

Frame Sizes.

	ICT-H	ICT-H	IESA-SH
	2012	2013	2012
$N_{A}$	4,982,920	4,507,662	4,880,574
$N_{B}$	5,707,655	6,073,789	6,098,453
$N_{a b}$	4,339,659	3,983,443	4,266,797

Note: ICT-H = Survey on the Equipment and Use of Information and Communication Technologies in Households (INE); IESA-SH = IESA Households Survey.
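Each set of sizes in Table 8 fixes the three disjoint domains used as calibration targets: $N_{a} = N_{A} - N_{ab}$ (landline only) and $N_{b} = N_{B} - N_{ab}$ (cell phone only). The short sketch below performs this arithmetic for the three sources; the dictionary and function names are hypothetical.

```python
from typing import Dict, Tuple

# Table 8: source -> (N_A landline frame, N_B cell phone frame, N_ab overlap)
FRAME_SIZES: Dict[str, Tuple[int, int, int]] = {
    "ICT-H 2012": (4_982_920, 5_707_655, 4_339_659),
    "ICT-H 2013": (4_507_662, 6_073_789, 3_983_443),
    "IESA-SH 2012": (4_880_574, 6_098_453, 4_266_797),
}


def domain_sizes(n_frame_a: int, n_frame_b: int, n_ab: int) -> Dict[str, int]:
    """Split frame sizes into disjoint domains: a (landline only), ab (both), b (cell only)."""
    if not 0 <= n_ab <= min(n_frame_a, n_frame_b):
        raise ValueError("overlap size must not exceed either frame size")
    return {
        "a": n_frame_a - n_ab,
        "ab": n_ab,
        "b": n_frame_b - n_ab,
        "N": n_frame_a + n_frame_b - n_ab,  # total population covered by A or B
    }


for source, sizes in FRAME_SIZES.items():
    print(source, domain_sizes(*sizes))
```

For the ICT-H 2012 sizes this gives $N_{a} = 643{,}261$ and $N_{b} = 1{,}367{,}996$; the ICT-H 2013 and IESA-SH 2012 sizes give $N_{a} = 524{,}219$ and $613{,}777$ and $N_{b} = 2{,}090{,}346$ and $1{,}831{,}656$, respectively.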

Table 9.

Average (AVG) and Coefficient of Variation (CV, in Percent) of the Point Estimates from Four Estimators Across Three Sets of Frame Sizes.

In general, do you think that for Andalusia, immigration is … ?
		ICT-H	ICT-H	IESA-SH
Estimator	AVG	2012	2013	2012	CV
	Very bad
SFRR	13.96	13.90	13.95	14.04	0.51
CAL	13.45	13.35	13.47	13.54	0.71
PML	13.85	13.87	13.82	13.85	0.18
PEL	13.71	13.62	13.72	13.78	0.59
	Bad
SFRR	47.24	47.39	47.19	47.14	0.28
CAL	47.02	46.92	47.05	47.08	0.18
PML	46.28	46.43	46.10	46.32	0.36
PEL	46.10	45.95	46.14	46.22	0.30
	Neither good nor bad
SFRR	4.51	4.47	4.52	4.53	0.71
CAL	4.77	4.75	4.77	4.80	0.53
PML	4.40	4.34	4.46	4.41	1.37
PEL	4.38	4.33	4.38	4.43	1.14
	Good
SFRR	28.26	28.16	28.31	28.30	0.30
CAL	28.80	28.98	28.76	28.66	0.57
PML	28.06	27.77	27.86	28.55	1.52
PEL	29.72	29.96	29.67	29.53	0.74
	Very good
SFRR	2.12	2.10	2.13	2.14	0.98
CAL	2.17	2.16	2.17	2.17	0.27
PML	2.11	2.08	2.13	2.11	1.19
PEL	2.12	2.12	2.12	2.13	0.27

Table 10.

Average (AVG) and Coefficient of Variation (CV, in Percent) of the Point Estimates from Four Estimators Across Three Sets of Frame Sizes.

In relation to the number of immigrants currently living in Andalusia, do you think there are … ?
		ICT-H	ICT-H	IESA-SH
Estimator	AVG	2012	2013	2012	CV
	Too many
SFRR	40.82	40.69	40.81	40.96	0.33
CAL	41.34	40.97	41.38	41.68	0.86
PML	40.46	40.44	40.46	40.53	0.12
PEL	41.42	41.05	41.45	41.75	0.85
	A reasonable number
SFRR	47.86	47.85	47.88	47.86	0.03
CAL	46.69	47.03	46.65	46.39	0.69
PML	48.03	47.95	48.10	48.03	0.16
PEL	46.40	46.72	46.36	46.11	0.66
	Too few
SFRR	5.44	5.39	5.44	5.49	0.92
CAL	5.74	5.62	5.75	5.85	2.01
PML	5.38	5.33	5.39	5.41	0.77
PEL	5.62	5.49	5.63	5.74	2.23

In general, how much confidence do you have in immigrants?
		ICT-H	ICT-H	IESA-SH
Estimator	AVG	2012	2013	2012	CV
	None at all
SFRR	7.58	7.66	7.56	7.53	0.90
CAL	7.17	7.17	7.18	7.16	0.14
PML	7.27	7.36	7.16	7.28	1.39
PEL	7.14	7.14	7.15	7.13	0.14
	Very little
SFRR	34.73	34.61	34.70	34.87	0.38
CAL	34.71	34.34	34.76	35.04	1.01
PML	34.46	34.44	34.34	34.61	0.40
PEL	35.06	34.71	35.10	35.36	0.93
	Quite a lot
SFRR	36.45	36.45	36.49	36.40	0.12
CAL	36.12	36.55	36.06	35.75	1.12
PML	37.51	37.34	37.61	37.58	0.39
PEL	36.59	37.06	36.53	36.19	1.20
	Very much
SFRR	11.14	10.94	11.16	11.32	1.71
CAL	11.59	11.34	11.60	11.82	2.07
PML	11.09	10.90	11.15	11.22	1.52
PEL	11.44	11.18	11.45	11.68	2.19
	It depends
SFRR	6.73	6.87	6.73	6.58	2.16
CAL	7.53	7.63	7.52	7.45	1.20
PML	6.74	6.71	6.68	6.82	1.09
PEL	6.99	7.05	6.98	6.93	0.86

For each method, the estimates obtained with frame sizes from the three sources are generally similar: all the CVs in Tables 9 and 10 are below 2.3 percent. We conclude that the estimators are only slightly influenced by the source used for the landline and cell phone population sizes.
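The AVG and CV columns of Tables 9 and 10 appear to be the mean and the sample coefficient of variation (in percent, with an n − 1 denominator) of the three estimates in each row; under that reading, the SFRR "Very bad" row (13.90, 13.95, 14.04) reproduces AVG = 13.96 and CV = 0.51. A minimal sketch (the function name is hypothetical):

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def avg_and_cv(estimates: Sequence[float]) -> Tuple[float, float]:
    """Mean of the estimates and their CV in percent (sample SD, n - 1 denominator)."""
    if len(estimates) < 2:
        raise ValueError("need at least two estimates")
    avg = mean(estimates)
    return avg, 100.0 * stdev(estimates) / avg


# SFRR, "Very bad": estimates with ICT-H 2012, ICT-H 2013 and IESA-SH 2012 frame sizes
avg, cv = avg_and_cv([13.90, 13.95, 14.04])
print(f"AVG = {avg:.2f}, CV = {cv:.2f}")
```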

Conclusion

This article addresses some of the issues involved in using dual-frame methods for landline and cell phone surveys. Multiple-frame surveys are very useful when it is not possible to guarantee complete coverage of the target population and may result in considerable cost savings in comparison with an SF design with comparable precision. However, this technique is not often applied by national statistical agencies or by private survey agencies due to its complexity and the difficulties inherent in analyzing multiple-frame surveys with standard survey software.

Several estimators have been proposed, and the first question to consider is how to choose the most suitable one for this application.

Calibration, fixed-weight, PML, and SF estimators are all internally consistent, since the same set of adjusted weights is used for all variables. In our application, good results were obtained with these procedures, and we recommend that an internally consistent estimator be used. With repeated surveys, the simplicity and transparency of a fixed-weight estimator may be preferred: fixed-weight adjustments may make year-to-year comparisons easier in an annual survey in which the domain proportions are relatively constant over time. Fixed-weight estimators are also more amenable to weight adjustments for nonresponse and domain misclassification, and standard survey software can then be used to estimate population proportions and totals with the modified weights.
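Because θ is fixed in advance, the FWA estimator can be stored as a single column of adjusted weights in a pooled data file, which is what makes it usable with standard survey software. A minimal sketch, assuming θ = 1/2 and a hypothetical record layout and function names:

```python
from typing import List, Tuple

# Hypothetical unit record: (y_k, design weight d_k, domain label "a", "ab" or "b")
Unit = Tuple[float, float, str]


def fixed_weight_file(sample_a: List[Unit], sample_b: List[Unit],
                      theta: float = 0.5) -> List[Tuple[float, float]]:
    """Pool both samples into (y_k, w_k) pairs with fixed overlap adjustments.

    Overlap units get w_k = theta * d_k^A (frame A) or (1 - theta) * d_k^B (frame B);
    all other units keep their design weight.
    """
    pooled = [(y, d * (theta if dom == "ab" else 1.0)) for y, d, dom in sample_a]
    pooled += [(y, d * ((1.0 - theta) if dom == "ab" else 1.0)) for y, d, dom in sample_b]
    return pooled


def weighted_total(pooled: List[Tuple[float, float]]) -> float:
    """Ordinary weighted total, as any single-frame survey software would compute it."""
    return sum(y * w for y, w in pooled)
```

Proportions then follow as ratios of weighted totals exactly as in a single-frame survey, and further adjustments (for example, for nonresponse) can be applied directly to the pooled weights.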

On the other hand, variance estimation is more complicated when dual-frame estimators are used. Resampling methods such as the jackknife may then be used to estimate variances. Jackknife intervals are easy to compute and performed well in our application.

The dual-frame estimates obtained for the variables considered in this study suggest that using frame and overlap domain sizes taken from different sources had no substantial impact on the estimates obtained.

In this study, auxiliary variables were not used in estimating the study variables. Using demographic variables such as age, income, or emancipation (whether the respondent has left the parental home) in the calibration and PEL methods could improve the estimates, because these variables can have a considerable impact on the distribution of landlines and cell phones.

We also highlight the need to implement these methods in both commercial and noncommercial software for survey estimation. To this end, we are currently developing an R package for point and interval estimation with dual-frame estimators.

Finally, the results obtained by applying these methods to the OPIA survey indicate that negative views toward immigration continue to spread: currently, 59 to 61 percent of those surveyed in Andalusia state that immigration is bad or very bad for the region (in the previous edition of the study, in 2011, the corresponding figure was 58 percent, and in the first such survey, in 2005, it was only 51 percent). Perceptions regarding the number of immigrants, however, have changed in the opposite direction: the percentage of people who say there are too many immigrants has fallen from 51 percent in 2011 to 40–42 percent now, while the shares of the other responses have risen slightly.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was partially supported by Ministerio de Educación y Ciencia (grant MTM2012-35650, Spain) and by Consejería de Economía, Innovación, Ciencia y Empleo (grants SEJ2954 and HUM1413 Junta de Andalucía).

References

Abascal, Díaz de Rada, V., García, and Landaluce. 2012. “Face to Face and Telephone Surveys in Terms of Sampling Representativeness: A Multidimensional Analysis.” Quality and Quantity 46:303–13.

Bankier, M. D. 1986. “Estimators Based on Several Stratified Samples With Applications to Multiple Frame Surveys.” Journal of the American Statistical Association 81:1074–79.

Brick, J. M., Dipko, Presser, Tucker, and Yuan. 2006. “Nonresponse Bias in a Dual Frame Sample of Cell and Landline Numbers.” Public Opinion Quarterly 70:780–93.

Brick, J. M., Edwards, W. S., and Lee. 2007. “Sampling Telephone Numbers and Adults, Interview Length, and Weighting in the California Health Interview Survey Cell Phone Pilot Study.” Public Opinion Quarterly 71:793–813.

Busse and Fuchs. 2012. “The Components of Landline Telephone Survey Coverage Bias. The Relative Importance of No-phone and Mobile-only Populations.” Quality and Quantity 46:1209–25.

Deville, J. C. 1993. Estimation de la variance pour les enquêtes en deux phases [Variance Estimation for Two-Phase Surveys]. Manuscript. Paris, France: INSEE.

Díaz de Rada. 2011. “Face-to-face Versus Telephone Surveys on Political Attitudes: A Comparative Analysis.” Quality and Quantity 45:817–27.

Fuller, W. A., and Burmeister, L. F. 1972. “Estimation for Samples Selected From Two Overlapping Frames.” In Proceedings of the American Statistical Association, Social Statistics Section, pp. 245–49. Montreal, Canada: American Statistical Association.

Hartley, H. O. 1962. “Multiple Frame Surveys.” In Proceedings of the American Statistical Association, Social Statistics Section, pp. 203–6. Montreal, Canada: American Statistical Association.

Hartley, H. O. 1974. “Multiple Frame Methodology and Selected Applications.” Sankhya 36:99–118.

Kalton and Anderson, D. W. 1986. “Sampling Rare Populations.” Journal of the Royal Statistical Society, Ser. A 149:65–82.

Kennedy. 2007. “Evaluating the Effects of Screening for Telephone Service in Dual Frame RDD Surveys.” Public Opinion Quarterly 71:750–71.

Lohr and Rao, J. N. K. 2006. “Estimation in Multiple-Frame Surveys.” Journal of the American Statistical Association 101:1019–30.

Sahr, Iachan, Denker, Duffy, and Weston. 2013. “Design and Analysis of Dual-Frame Telephone Surveys for Health Policy Research.” World Medical & Health Policy 5:217–32.

Lund, R. E. 1968. “Estimators in Multiple Frame.” In Proceedings of the American Statistical Association, Social Statistics Section, pp. 282–88. Pittsburgh, USA: American Statistical Association.

Mecatti. 2007. “A Single Frame Multiplicity Estimator for Multiple Frame Surveys.” Survey Methodology 33:151–58.

Opsomer. 2011. “Innovations in Survey Sampling Design: Discussion of Three Contributions Presented at the U.S. Census Bureau.” Survey Methodology 37:227–31.

Pasadas and Trujillo. 2013. “Afijación óptima basada en costes para muestras telefónicas recogidas en marcos duales” [Optimal Cost-Based Allocation for Telephone Samples Collected from Dual Frames]. 1st Southern European Conference on Survey Methodology (SESM) and VI Congreso de Metodología de Encuestas, Barcelona, Spain, December 12–14.

Pasadas, Trujillo, Sánchez, and Reche. 2011. “La incorporación de las líneas móviles al marco muestral de las encuestas telefónicas: Pertinencia, métodos y resultados” [Incorporating Mobile Lines into the Sampling Frame of Telephone Surveys: Relevance, Methods, and Results]. Metodología de Encuestas 13:33–54.

Ranalli, M. G., Arcos, Rueda, and Teodoro. 2013. “Calibration Estimators in Dual Frame Surveys.” arXiv e-print 1312.0761 [stat.ME].

Rao, J. N. K., and Skinner. 1996. “Estimation in Dual Frame Surveys with Complex Designs.” Proceedings of the Survey Method Section, Statistical Society of Canada, Toronto, Ontario, 63–68.

Rao, J. N. K. 2010. “Pseudo Empirical Likelihood Inference for Multiple Frame Surveys.” Journal of the American Statistical Association 105:1494–503.

Särndal, C. E. 2007. “The Calibration Approach in Survey Theory and Practice.” Survey Methodology 33:99–119.

Särndal, C. E., Swensson, and Wretman. 1992. Model Assisted Survey Sampling. New York: Springer-Verlag.

Singh, A. C., and Mecatti. 2011. “Generalized Multiplicity-Adjusted Horvitz-Thompson Estimation as a Unified Approach to Multiple Frame Surveys.” Journal of Official Statistics 27:633–50.

Skinner. 1991. “On the Efficiency of Raking Ratio Estimation for Multiple Frame Surveys.” Journal of the American Statistical Association 86:779–84.

Skinner and Rao, J. N. K. 1996. “Estimation in Dual Frame Surveys with Complex Designs.” Journal of the American Statistical Association 91:349–56.

Trujillo, Domínguez, J. A., and Pasadas. 2005. “Mobile Phones and their Impacts on Survey Data.” European Association for Survey Research Conference, Barcelona, Spain, July 18–22.

Vicente and Reis. 2009. “The Mobile-only Population in Portugal and its Impact in a Dual Frame Telephone Survey.” Survey Research Methods 3:105–11.

Vicente, Reis, and Santos. 2009. “Using Mobile Phones for Survey Research: A Comparison with Fixed Phones.” International Journal of Market Research 51:613–33.

Wolter, K. M. 2007. Introduction to Variance Estimation. New York: Springer.

2005. “Algorithms and R Codes for the Pseudo Empirical Likelihood Method in Survey Sampling.” Survey Methodology 31:239–43.

Rao, J. N. K. 2006. “Pseudo Empirical Likelihood Ratio Confidence Intervals for Complex Surveys.” The Canadian Journal of Statistics 34:359–75.