On model-based nowcasting for highly disaggregated levels

Abstract

Nowadays, national and international organizations experience an increasing demand for timely and disaggregated socio-economic indicators. More recently, this demand has extended to nowcasting such indicators. Small Area Estimation has a long tradition in indicator prediction for high levels of disaggregation; but when speaking of ‘prediction’, this term refers to the fact that the centre of interest is a random parameter. Prediction of future values, or similarly, nowcasting has hardly been studied so far. Yet, mixed model based Small Area Estimation is designed for imputing (missing) values, and these models can easily account for temporal correlation. Therefore, model assisted nowcasting would be a natural extension. This article reviews existing methods from this perspective to highlight the necessary ingredients, and then proposes nowcasting procedures for highly disaggregated indicators that could already be used with today’s available software.

Keywords

Disaggregated forecasting, nowcasting in small areas, time dependent mixed models

1. Introduction

Statistical offices and statistics units of different organizations experience an increasing demand for both more timely and further disaggregated socio-economic indicators. These are required to provide empirical evidence for policy makers to monitor developments and evaluate political interventions. The demand is fuelled by the increased awareness of the potential that (new) data sources and statistical techniques can have. It is reflected in the recent initiatives of the Committee of the Chief Statisticians of the United Nations to build and foster competences in indicator nowcasting for their institutions.

Perhaps the most common definition of nowcasting is given in the Handbook of economic forecasting [1]: to forecast the present, the near future or the recent past. Although the Handbook on rapid estimators [2] of Eurostat works with a somewhat narrow definition, it provides an interesting review of existing methods for this kind of forecasting problem. The problem is repeatedly described as ‘presenting timely estimates based on incomplete sampling’. It is remarkable that the handbook authors put all emphasis on the difficulties arising from the demand for more timely indicators at higher frequencies. Without downplaying this problem, we would like to add the difficulties arising from the demand for timely indicators at higher levels of disaggregation. We face such demand, for instance, for several of the socio-economic SDG (sustainable development goal) indicators.1 We do not refer to the much discussed temporal disaggregation, but to the disaggregation of indicators along clusters like small areas, age or gender. Therefore, this article is intended to open the discussion on small area nowcasting (in which ‘areas’ is used synonymously for all kinds of clusters).

We believe that this is interesting in several respects, not only for offering more timely small area estimates. Thinking of nowcasting, it is clear that not all disaggregated contemporaneous data are available at the moment when this information is required to construct a relevant aggregate. This causes the so-called missing data problem, and consequently falls into the category of problems that small area estimation (SAE henceforth) tries to attack; this strong relation is worked out and outlined in detail in [3]. Furthermore, while the handbook [2] mentions model-based approaches as being the cheapest alternative for producing timely estimates, it then concentrates, almost exclusively, on forecast methods based on time series. Certainly, model-based SAE can also be combined with time series modelling (and we do so in this article), but it borrows its strength from the combination of smart modelling with good predictors, not from long time series. This was first shown and discussed in [4], then further developed and applied by [5]. This can have various advantages in practice. Another natural link with SAE is the discussion inside the forecasting community about whether, for indicators that are aggregates, one should directly forecast the aggregate. This refers to so-called area level models, see for example [6] for predicting the number of wild fires by this approach. Alternatively, one could first predict the so-called disaggregates and aggregate them afterwards. Another alternative is to mix both: Similar to MIDAS for temporal disaggregation (see [7] for details on this method) there exist alternative mixed approaches, in which an aggregate nowcast is produced by conditioning on both aggregates and disaggregates; see [8] also for further discussion. This latter point is also related to another issue. In the handbook [2], one chapter briefly mentions the potential and problems of large data sets which may provide more predictors (say $p$) than time points (say $T$), especially when the time series are short. Then disaggregation into $D$ small areas becomes a boon rather than a bane because exploiting the data structure can remedy the dimensionality problem: The model based SAE approach allows for small samples with many predictors as long as $D$ is at least moderately large.

More specifically, models for SAE can be classified along the level of aggregation of the target variable. The unit-level models describe the behaviour of a response variable in the population of the units; their predicted values (using this model) are subsequently taken to construct estimators of area parameters. In contrast, area-level models explain the behaviour of aggregate values (of the variable of interest) such that their predictions can directly be used as estimators of the wanted population parameter. This can substantially reduce the variance of the final estimate, though at the cost of an increase in bias. The basic SAE unit-level model is the nested error regression model (NERM). [9] applied this to survey and satellite data for predicting county crop areas in the United States. An area-level linear mixed model with random area effects was first proposed by [10] to estimate average per-capita income in small places inside the US. Both models have been adapted to temporal data, and applied to the estimation of socio-economic small area parameters. For example, [11, 5] applied AR(1) extensions of the NERM and the Fay-Herriot model, respectively. They used these extensions to estimate average annual incomes and poverty indicators in small areas.

The aim of this paper is to initiate the closure of the gap between the standard time series based nowcasting on the one side, and the prediction ideas of SAE on the other side. Mixed model based SAE goes back to the seventies and eighties. It was the increasing demand for reliable statistics regarding socio-demographic groups or geographical regions that contributed to its development. This topic is experiencing a revival with the above mentioned increased data awareness. This is also reflected in the frequency of reviews and books. For example, article [12] gives a review on new developments and open problems; the book [13] gives a broad review without many technical details; in contrast, the book [14] gives all technical derivations but concentrates on some of the most popular methods; the reading article [15] describes the entire SAE procedure in practice; and [16] is a book of many SAE applications for studying poverty. Nowadays, SAE is, among others, widely applied to assess the need for implementing health or educational programs, for environmental planning as well as for the allocation of subsidies in less developed regions.

The methodology of SAE borrows a lot from mixed effects modelling, see [17], where the extra between-area variation of the data is captured by some area-specific random effects. In linear mixed models (LMM), empirical best linear unbiased predictors (EBLUP) and empirical Bayes estimators are widely recognized methods for obtaining small area predictions. To evaluate the accuracy of a prediction, it is crucial to measure its variability. Traditionally, one provides Mean Squared Error (MSE) estimates. This has been widely discussed in the literature, see for example [18, 19, 20] for different approaches of MSE estimation in the SAE-LMM context. As practitioners may find prediction intervals more informative than the MSE, several authors worked on their construction, see e.g. [21, 22, 23] for bootstrap versions, and [24] for an analytical approximation (specifically for area level models).

In Section 2 we introduce the general model together with parameter and MSE estimators. Section 3 shows how this model is extended to include a time dimension. In Section 4 we consider its potential use for nowcasting. This is followed by a section of intensive simulation studies with more detail about algorithms and implementation. Section 6 concludes with a brief discussion of generalizations of the outlined ideas towards more flexible models.

2. Model-based small area estimation revisited

We first revisit some basic approaches in model based small area estimation using the most common notation. We concentrate on the estimation of a population parameter, but already having in mind its use for, and extensions to, the problem of nowcasting.

2.1 Modelling and estimation framework

Let $Y$ be the variable containing our indicators of interest, and $X$ the potential predictors including the constant 1. For ease of notation let $Y$ be one-dimensional and $X$ of dimension $p+1$ (the constant plus $p$ predictors). Variable $Z$ is typically a subset of $X$, in practice often just the constant. Suppose we are provided with observations $\{y_{dj},x_{dj}\}_{d=1,j=1}^{D,n_{d}}$. Stacking all together, one can write down a linear mixed model (henceforth LMM) in matrix form:

$\displaystyle\bm{y}=\bm{X\beta}+\bm{Zu}+\bm{e},$ (1)

with $\bm{X}\in\mathbb{R}^{n\times(p+1)}$, $\bm{Z}\in\mathbb{R}^{n\times q}$ being full column rank matrices for the fixed and the random part, respectively. That is, $\bm{\beta}\in\mathbb{R}^{p+1}$ is a vector of fixed effects, and $\bm{u}\in\mathbb{R}^{q}$ a vector of random effects. Finally, vector $\bm{e}\in\mathbb{R}^{n}$ contains the individual deviations from the mean model. For convenience, $\bm{u}$ and $\bm{e}$ are assumed to be independent with $\bm{u}\overset{\textit{ind}}{\sim}_{q}{(\bm{0},\bm{G})}$ and $\bm{e}\overset{\textit{ind}}{\sim}_{n}{(\bm{0},\bm{R})}$, i.e. so far without specifying the distributions. The covariance matrices $\bm{G}$, $\bm{R}$ are assumed to have known structures although they may contain some unknown (co-)variance parameters stacked in $\bm{\theta}$, see below.

The small area mixed effects model.

This structure is related to the next modelling step, which is to account for the wanted disaggregation along the small areas (or clusters) $d=1,\ldots,D$ . One specifies the model such that for each such area one can write

$\displaystyle\bm{y}_{d}=\bm{X}_{d}\bm{\beta}+\bm{Z}_{d}\bm{u}_{d}+\bm{e}_{d},\quad d=1,\dots,D,$ (2)

where $n_{d}$ is the number of units in area $d$, $\bm{y}_{d}\in\mathbb{R}^{n_{d}}$, $\bm{X}_{d}\in\mathbb{R}^{n_{d}\times(p+1)}$ and $\bm{Z}_{d}\in\mathbb{R}^{n_{d}\times q_{d}}$. For ease of presentation we start with a case in which $\bm{G}$ and $\bm{R}$ are block diagonal covariance matrices (say, model LMMb). In Eq. (2), $D$ is the total number of domains, $\bm{\beta}\in\mathbb{R}^{p+1}$ an unknown common vector of regression coefficients, $\bm{u}_{d}\overset{\textit{ind}}{\sim}_{q_{d}}{(\bm{0},\bm{G}_{d})}$ and $\bm{e}_{d}\overset{\textit{ind}}{\sim}_{n_{d}}{(\bm{0},\bm{R}_{d})}$. We have that $\bm{G}_{d}=\bm{G}_{d}(\bm{\theta})$ is of dimension $q_{d}\times q_{d}$ and $\bm{R}_{d}=\bm{R}_{d}(\bm{\theta})$ of dimension $n_{d}\times n_{d}$, both depending on some (co-)variance parameters $\bm{\theta}=(\theta_{1},\ldots,\theta_{h})^{t}$. The LMM Eq. (1) is easily retrieved by setting $\bm{y}=\mathrm{\textit{col}}_{1\leqslant d\leqslant D}\bm{y}_{d}=(\bm{y}_{1}^{t},\dots,\bm{y}_{D}^{t})^{t}$, $\bm{X}=\mathrm{\textit{col}}_{1\leqslant d\leqslant D}(\bm{X}_{d})$, $\bm{Z}=\mathrm{\textit{diag}}_{1\leqslant d\leqslant D}(\bm{Z}_{d})$, $\bm{u}=\mathrm{\textit{col}}_{1\leqslant d\leqslant D}\bm{u}_{d}$, $\bm{e}=\mathrm{\textit{col}}_{1\leqslant d\leqslant D}\bm{e}_{d}$, $\bm{G}=\mathrm{\textit{diag}}_{1\leqslant d\leqslant D}(\bm{G}_{d})$, $\bm{R}=\mathrm{\textit{diag}}_{1\leqslant d\leqslant D}(\bm{R}_{d})$, $n=\sum_{d=1}^{D}n_{d}$ and $q=\sum_{d=1}^{D}q_{d}$. Under this setup one supposes that the variance-covariance matrix $\bm{V}$ is nonsingular for all $\theta_{i}$, $i=1,\dots,h$, and

$\displaystyle\mathbb{E}(\bm{y})=\bm{X\beta}\quad\text{and}\quad\mathbb{V}\mathrm{ar}(\bm{y})=\bm{R}+\bm{ZGZ}^{t}=:\bm{V}(\bm{\theta})=\bm{V}.$ (3)

Two important examples of Eq. (2) are extensively used in SAE: First, the Nested Error Regression Model (NERM) [9] given by the unit-level model

$\displaystyle y_{dj}=\bm{x}^{t}_{dj}\bm{\beta}+u_{d}+e_{dj},\quad d=1,\dots,D,\ j=1,\dots,n_{d},$ (4)

where $y_{dj}$ is the quantity for the $j^{\text{th}}$ population unit in the $d^{\text{th}}$ area, $\bm{x}_{dj}=(1,x_{1,dj},\dots,x_{p,dj})^{t}$, $u_{d}\overset{\textit{iid}}{\sim}(0,\sigma^{2}_{u})$ and $e_{dj}\overset{\textit{iid}}{\sim}(0,\sigma^{2}_{e})$ for $j=1,2,\dots,n_{d}$. We have $\bm{y}_{d}=(y_{d1},\dots,y_{dn_{d}})^{t}$, $\bm{X}_{d}=\mathrm{\textit{col}}_{1\leqslant j\leqslant n_{d}}\bm{x}^{t}_{dj}$, $q_{d}=1$, $\bm{Z}_{d}=\bm{1}_{n_{d}}$ with $\bm{1}_{n_{d}}$ the $n_{d}$ vector of ones, $\bm{e}_{d}=(e_{d1},\dots,e_{dn_{d}})^{t}$, $\bm{\theta}=(\sigma_{e}^{2},\sigma_{u}^{2})^{t}$, $\bm{R}_{d}(\bm{\theta})=\sigma_{e}^{2}\bm{I}_{n_{d}}$ with $\bm{I}_{n_{d}}$ the $n_{d}\times n_{d}$ identity matrix, and $\bm{G}_{d}(\bm{\theta})=\sigma_{u}^{2}$.
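To make the block structure concrete, the following minimal sketch assembles $\bm{Z}$, $\bm{G}$, $\bm{R}$ and $\bm{V}$ of Eq. (3) for a NERM. The function name `nerm_covariance` and the toy sample sizes and variances are illustrative assumptions, not part of any referenced software.

```python
import numpy as np

def nerm_covariance(n_d, sigma2_u, sigma2_e):
    """Assemble Z, G, R and V = R + Z G Z^t for a NERM with area sample sizes n_d."""
    n_d = np.asarray(n_d, dtype=int)
    D, n = n_d.size, int(n_d.sum())
    # Z is block diagonal with blocks Z_d = 1_{n_d}: each unit loads only on its own u_d
    Z = np.zeros((n, D))
    Z[np.arange(n), np.repeat(np.arange(D), n_d)] = 1.0
    G = sigma2_u * np.eye(D)   # G = diag(G_d) with G_d = sigma_u^2
    R = sigma2_e * np.eye(n)   # R = diag(R_d) with R_d = sigma_e^2 I_{n_d}
    V = R + Z @ G @ Z.T        # Eq. (3)
    return Z, G, R, V

# Toy example (hypothetical values): two areas with 2 and 3 sampled units
Z, G, R, V = nerm_covariance([2, 3], sigma2_u=1.5, sigma2_e=0.5)
print(V)
```

The resulting $\bm{V}$ is block diagonal: units within an area share the covariance $\sigma^{2}_{u}$, while units from different areas are uncorrelated.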

The second example is the area-level Fay-Herriot Model (FHM) [10], given by

$\displaystyle y_{d}=\bm{x}^{t}_{d}\bm{\beta}+u_{d}+e_{d},d=1,\dots,D,$ (5)

where $\bm{x}_{d}=(1,x_{1,d},\dots,x_{p,d})^{t}$, $u_{d}\overset{\textit{iid}}{\sim}N(0,\sigma^{2}_{u})$ and $e_{d}\overset{\textit{iid}}{\sim}N(0,\sigma^{2}_{e_{d}})$ with $\sigma^{2}_{e_{d}}$ ($d=1,2,\dots,D$) being known. In this case, $n_{d}=q_{d}=1$, $\bm{Z}_{d}=1$, $\bm{\theta}=\sigma^{2}_{u}$, $\bm{G}_{d}(\bm{\theta})=\sigma^{2}_{u}$ and $\bm{R}_{d}=\sigma^{2}_{e_{d}}$.

The small area mixed effects estimator.

To be more specific, assume that a finite population $P$ of size $N$ is partitioned into $D$ sub-populations $P_{1},P_{2},\dots,P_{D}$ of sizes $N_{1},N_{2},\dots,N_{D}$. With $Y$ being our random variable of interest and $y_{dj}$ a realization of $Y$ in the $d^{\text{th}}$ area, where $j=1,\dots,N_{d}$ and $d=1,\dots,D$, let our target parameter be (for simplicity) the population mean of area $d$. Clearly, this is defined as $\bar{Y}_{d}=N_{d}^{-1}\sum_{j=1}^{N_{d}}y_{dj}$. Any extension to more complex linear combinations or proportions is almost straightforward. Extensions to other population parameters, including quantiles, are not straightforward but well studied. This holds in particular for the analysis of poverty and inequality [16]. Under the above introduced LMMb we can approximate the population mean of area $d$ by

$\displaystyle\mu_{d}=\bar{\bm{X}}^{t}_{d}\bm{\beta}+\bar{\bm{Z}}^{t}_{d}\bm{u}_{d}$ (6)

where $\bar{\bm{X}}_{d}$ is a vector of known population means of the covariates for the $d^{\text{th}}$ area, and similarly $\bar{\bm{Z}}_{d}\in\mathbb{R}^{q_{d}}$, $d=1,\dots,D$. Since $\bar{\bm{X}}_{d}$ and $\bar{\bm{Z}}_{d}$ can be replaced by any vector, Eq. (6) is an example of a general linear combination of fixed and random effects. It can be considered as our parameter of interest, no matter if under NERM or FHM. Under the former model, one draws a sample of size $n_{d}$ from the $N_{d}$ elements in each area, and observes values $\{y_{dj},\bm{x}_{dj}\}$ for $d=1,2,\dots,D$ and $j=1,2,\dots,n_{d}$ with $n=\sum_{d=1}^{D}n_{d}$ being the total number of units in the sample. Assume that there is no selection bias, and that model Eq. (4) is valid for the population values. This holds for sampling designs which do not depend on the values of $\bm{y}$, but only on $\bm{x}$. It clearly includes the case of simple random sampling. If we do not have access to the units for the whole population, but obtain a direct estimator $\bar{y}_{d}$ of $\bar{Y}_{d}$, then we can still use the second model, the FHM. That is, we directly model the area means as in Eq. (5) with $y_{d}\equiv\bar{y}_{d}$, $\bm{x}_{d}\equiv\bar{\bm{X}}_{d}$ and $u_{d}\equiv\bar{\bm{Z}}_{d}^{t}\bm{u}_{d}$ with $\bar{\bm{Z}}_{d}=1$. Often one explicitly supposes that the sampling fraction $f_{d}=n_{d}/N_{d}$ is negligible in the study.

[25] developed the best linear unbiased predictor (BLUP) of a linear combination of random effects $\bm{u}$ and fixed effects $\bm{\beta}$ for some known covariance matrix $\bm{V}$ . The BLUP estimator for our area means Eq. (6) is

$\displaystyle\tilde{\mu}_{d}=\bar{\bm{X}}^{t}_{d}\tilde{\bm{\beta}}+\bar{\bm{Z}}^{t}_{d}\tilde{\bm{u}}_{d},$ (7)

where $\tilde{\bm{\beta}}=\tilde{\bm{\beta}}(\bm{\theta})=(\bm{X}^{t}\bm{V}^{-1}\bm{X})^{-1}\bm{X}^{t}\bm{V}^{-1}\bm{y}$, $\tilde{\bm{u}}_{d}=\tilde{\bm{u}}_{d}(\bm{\theta})=\bm{G}_{d}\bm{Z}_{d}^{t}\bm{V}_{d}^{-1}(\bm{y}_{d}-\bm{X}_{d}\tilde{\bm{\beta}})$ and $\bm{\theta}=(\theta_{1},\dots,\theta_{h})^{t}$. In practice $\bm{\theta}$ is unknown, so one has to use some estimate $\hat{\bm{\theta}}=\hat{\bm{\theta}}(\bm{y})$, giving the empirical BLUP (henceforth EBLUP)

$\displaystyle\hat{\mu}_{d}:=\hat{\mu}_{d}(\hat{\bm{\theta}})=\bar{\bm{X}}^{t}_{d}\hat{\bm{\beta}}+\bar{\bm{Z}}^{t}_{d}\hat{\bm{u}}_{d},\quad\mbox{with }\hat{\bm{\theta}}=(\hat{\theta}_{1},\dots,\hat{\theta}_{h})^{t},$ (8)

where $\hat{\bm{\beta}}=\hat{\bm{\beta}}(\hat{\bm{\theta}})$, $\hat{\bm{u}}=\hat{\bm{u}}(\hat{\bm{\theta}})$.2 Under some conditions on the distributions of the random effects and errors, as well as on the variance components $\bm{\theta}$, [26] proved that the two-stage procedure provides an (unbiased) estimator of $\mu_{d}$. We put ‘unbiased’ in parentheses because this estimator is only unbiased when not conditioning on the area. In fact, given the area-level random effect, the $\hat{\mu}_{d}$ are biased for $d=1,\ldots,D$, but the biases sum up to zero. Intuitively one could say that the bias is a consequence of reducing the $D$ unknown $\bm{u}_{d}$ to a random variable, and thereby reducing the estimation problem from $D$ unknown parameters to only $\sigma_{u}$. For an LMM like ours, this actually gives a mixture of approximation and smoothing biases.
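For the FHM of Eq. (5) all blocks are scalars, so the predictor in Eqs (7) and (8) takes a simple shrinkage form. The sketch below is a minimal illustration under the assumption that a value for $\sigma^{2}_{u}$ is supplied (the true value gives the BLUP, an estimate gives the EBLUP); the function name `fh_blup`, the symbol `psi` for the known $\sigma^{2}_{e_{d}}$, and the simulated data are hypothetical.

```python
import numpy as np

def fh_blup(y, X, psi, sigma2_u):
    """(E)BLUP of mu_d = x_d^t beta + u_d under the Fay-Herriot model, Eq. (5)."""
    y, X, psi = np.asarray(y, float), np.asarray(X, float), np.asarray(psi, float)
    v = sigma2_u + psi                       # V_d = sigma_u^2 + sigma_{e_d}^2 (a scalar per area)
    XtVinvX = X.T @ (X / v[:, None])
    beta = np.linalg.solve(XtVinvX, X.T @ (y / v))   # generalised least squares
    gamma = sigma2_u / v                     # G_d Z_d^t V_d^{-1} reduces to a shrinkage weight
    u = gamma * (y - X @ beta)               # predicted random effects
    mu = X @ beta + u                        # Eq. (7) with X-bar_d = x_d and Z-bar_d = 1
    return beta, u, mu

# Simulated toy data (assumed values, for illustration only)
rng = np.random.default_rng(0)
D = 8
X = np.column_stack([np.ones(D), rng.normal(size=D)])
psi = rng.uniform(0.5, 1.5, size=D)
y = X @ np.array([1.0, 2.0]) + rng.normal(size=D)
beta, u, mu = fh_blup(y, X, psi, sigma2_u=1.0)
```

The predictor is a convex combination of the direct estimate $y_{d}$ and the regression-synthetic part $\bm{x}_{d}^{t}\tilde{\bm{\beta}}$, with more weight on the direct estimate where its sampling variance is small.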

Remark 1. SAE techniques are suitable to account for non-response in $y_{dj}$ which is a very common issue in survey data. Suppose that the LMM (1) and its modelling assumptions hold, and that we are provided with $X$ for all units while some $Y$ values are missing (at random). Under this setting, using NERM, a small area mean is typically approximated by

$\displaystyle\mu^{m}_{d}=N^{-1}_{d}\left(\sum_{j=1+l_{d}}^{N_{d}}y_{dj}+\sum_{j=1}^{l_{d}}\{\bm{x}^{t}_{dj}\bm{\beta}+u_{d}\}\right),$ (9)

where $l_{d}$ is the number of units in area $d$ with missing $y_{dj}$, indexed by $j=1,\ldots,l_{d}$. If the variance parameters $\bm{\theta}$ are known, we can estimate $\mu^{m}_{d}$ applying BLUP. If $\bm{\theta}$ is unknown, we can proceed with the two-stage estimation procedure and replace random and fixed effects in Eq. (9) by their EBLUP. In such a case of facing a finite population area mean, the expression in Eq. (9) contains an additional error term [27].
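The imputation logic of Remark 1 can be sketched as follows: observed responses are kept, and each missing response is replaced by its model prediction. This is only an illustration; the function name `area_mean_with_imputation`, the encoding of missing values as `NaN`, and the numbers are assumptions, and `beta` and `u_d` stand for whatever (E)BLUP values one has available.

```python
import numpy as np

def area_mean_with_imputation(y_d, X_d, beta, u_d):
    """Area mean as in Eq. (9): observed y_dj where available, x_dj^t beta + u_d otherwise."""
    y_d = np.asarray(y_d, float)
    missing = np.isnan(y_d)                    # units without an observed response
    fitted = np.asarray(X_d, float) @ np.asarray(beta, float) + u_d
    completed = np.where(missing, fitted, y_d)
    return completed.mean()

# Hypothetical area with three units, the second one missing
y_d = [1.0, np.nan, 3.0]
X_d = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
mu_m = area_mean_with_imputation(y_d, X_d, beta=[0.5, 1.0], u_d=0.5)
print(mu_m)  # the missing unit is imputed by 0.5 + 1.0 + 0.5 = 2.0, so the mean is 2.0
```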

Remark 2. While it is often assumed to have a fixed design, that is, complete information on $\bm{X}$ but possibly missing cases in $\bm{y}$, in practice one deals with at least three different frameworks, namely (a) the one discussed in Remark 1, which is probably the most common in the SAE literature, (b) almost as before, but even the supposedly complete information (on $X$) is actually a sample, though official data,3 (c) one has a sample, but with data that are (almost) complete also in the response variable $Y$. In the last situation Eq. (6) is replaced by

$\displaystyle\mu^{s}_{d}=\bar{\bm{x}}^{t}_{d}\bm{\beta}+\bar{\bm{z}}^{t}_{d}\bm{u}_{d}$ (10)

with $\bar{\bm{x}}_{d}=(1,\bar{\bm{x}}_{d1},\dots,\bar{\bm{x}}_{dp})^{t}$, $\bar{\bm{x}}_{d}=\frac{1}{n_{d}}\sum_{j=1}^{n_{d}}\bm{x}_{dj}$ and $\bar{\bm{z}}_{d}\equiv\bar{\bm{Z}}_{d}$. In case of (b), the estimator in Eq. (9) is replaced by

$\displaystyle\mu^{sm}_{d}=n^{-1}_{d}\left(\sum_{j=1+l_{d}}^{n_{d}}y_{dj}+\sum_{j=1}^{l_{d}}\{\bm{x}^{t}_{dj}\bm{\beta}+\bm{z}^{t}_{dj}\bm{u}_{d}\}\right).$ (11)

Obviously, the usage of Eqs (6) and (9) or Eqs (10) and (11) is strongly related to data availability. When we have access to the covariate information for the entire population, then Eqs (6) and (9) are preferable since they are more stable and less biased. Nevertheless, the requirement to have all population units is often very restrictive, and practitioners then have to rely on survey data only. For notation and presentation it is probably easiest to think of Eq. (10) in the following. Otherwise one would have to introduce more notation.

2.2 Notes on the MSE

An estimate or prediction should be accompanied by a measure of uncertainty. To assess its variability, the mean squared error $\mathrm{MSE}[\hat{\mu}_{d}]=\mathbb{E}[\hat{\mu}_{d}-\mu_{d}]^{2}$ has become the most common measure of uncertainty in mixed models. Here, $\mathbb{E}$ denotes the expectation with respect to the model Eq. (2). It is not hard to see that we can decompose the MSE into

$\displaystyle\mathrm{MSE}[\hat{\mu}_{d}]=\mathrm{MSE}[\tilde{\mu}_{d}]+\mathbb{E}[\hat{\mu}_{d}-\tilde{\mu}_{d}]^{2}+2\mathbb{E}[(\tilde{\mu}_{d}-\mu_{d})(\hat{\mu}_{d}-\tilde{\mu}_{d})].$ (12)
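
For completeness, the decomposition in Eq. (12) follows from adding and subtracting the BLUP $\tilde{\mu}_{d}$ inside the prediction error and expanding the square:

$\displaystyle\mathbb{E}[(\hat{\mu}_{d}-\mu_{d})^{2}]=\mathbb{E}[\{(\tilde{\mu}_{d}-\mu_{d})+(\hat{\mu}_{d}-\tilde{\mu}_{d})\}^{2}]=\mathbb{E}[(\tilde{\mu}_{d}-\mu_{d})^{2}]+\mathbb{E}[(\hat{\mu}_{d}-\tilde{\mu}_{d})^{2}]+2\mathbb{E}[(\tilde{\mu}_{d}-\mu_{d})(\hat{\mu}_{d}-\tilde{\mu}_{d})].$

The first term is the MSE of the BLUP, the second measures the additional error from estimating $\bm{\theta}$, and the third is the cross term between the two.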

$\mathrm{MSE}[\tilde{\mu}_{d}]$ accounts for the variability of $\tilde{\mu}_{d}$ when the variance components $\bm{\theta}$ are known. For our situation, defining $\bm{m}_{d}^{t}=\bar{\bm{X}}^{t}_{d}-\bm{a}_{d}^{t}\bm{X}_{d}$ with $\bm{a}_{d}^{t}=\bar{\bm{Z}}^{t}_{d}\bm{G}_{d}\bm{Z}^{t}_{d}\bm{V}^{-1}_{d}$, it is given by

$\displaystyle\bar{\bm{Z}}^{t}_{d}(\bm{G}_{d}-\bm{G}_{d}\bm{Z}^{t}_{d}\bm{V}_{d}^{-1}\bm{Z}_{d}\bm{G}_{d})\bar{\bm{Z}}_{d}+\bm{m}_{d}^{t}\left(\sum_{d=1}^{D}\bm{X}^{t}_{d}\bm{V}_{d}^{-1}\bm{X}_{d}\right)^{-1}\bm{m}_{d}=:g_{1d}(\bm{\theta})+g_{2d}(\bm{\theta}).$ (13)

Under normality, mixed models can be fitted by maximum likelihood (ML) or residual maximum likelihood (REML) methods. Then, the last term in Eq. (12) is zero and is therefore not considered further.

A closed-form expression of the MSE is not available, because the empirical predictors are not linear statistics due to the estimation of the variance components $\bm{\theta}$. For this reason, the last two terms in Eq. (12) are intractable and have to be approximated. Using some additional technical assumptions, one relies on linearisation and large sample techniques to approximate all unknown quantities. [28] provided a proposal, [18] tried to improve on their results studying second-order accuracy of models with block diagonal matrices in Eq. (2). [19] derived approximations for the models with estimated variance components, and [20] developed further expansions for general LMM. The second-order unbiased $\mathrm{MSE}[\hat{\mu}_{d}]$ estimator obtained by applying the fitting-of-constants and REML methods is, cf. Eq. (13),

$\displaystyle\mathrm{mse}_{L}[\hat{\mu}_{d}]\approx g_{1d}(\hat{\bm{\theta}})+g_{2d}(\hat{\bm{\theta}})+2g_{3d}(\hat{\bm{\theta}}),\quad g_{3d}(\bm{\theta})=\text{tr}[(\partial\bm{a}_{d}^{t}/\partial\bm{\theta})\bm{V}_{d}(\partial\bm{a}_{d}^{t}/\partial\bm{\theta})^{t}\,\bm{V}_{A}]$ (14)

with asymptotic covariance matrix $\bm{V}_{A}=\mathbb{E}[(\hat{\bm{\theta}}-\bm{\theta})(\hat{\bm{\theta}}-\bm{\theta})^{t}]$, such that $\mathbb{E}\{\mathrm{mse}_{L}[\hat{\mu}_{d}]\}=\mathrm{MSE}[\hat{\mu}_{d}]+o(D^{-1})$. [18] provided simplified expressions that account for this uncertainty in the NERM Eq. (4) and the FHM Eq. (5).
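For the FHM, the three terms of Eq. (14) collapse to scalars per area, which the following sketch evaluates. It is a hedged illustration only: the function name `fh_mse_linearised` and the symbol `psi` for the known $\sigma^{2}_{e_{d}}$ are hypothetical, and $\bm{V}_{A}$ is replaced by the usual large-sample variance $2/\sum_{d}(\sigma^{2}_{u}+\sigma^{2}_{e_{d}})^{-2}$ of a likelihood-based estimator of $\sigma^{2}_{u}$, which is an assumption made here rather than something stated in the text.

```python
import numpy as np

def fh_mse_linearised(X, psi, sigma2_u):
    """Evaluate g1 + g2 + 2*g3 of Eq. (14) for the Fay-Herriot model at a given sigma2_u."""
    X, psi = np.asarray(X, float), np.asarray(psi, float)
    v = sigma2_u + psi                         # V_d
    gamma = sigma2_u / v                       # a_d^t = G_d Z_d^t V_d^{-1}
    g1 = gamma * psi                           # G_d - G_d V_d^{-1} G_d
    XtVinvX = X.T @ (X / v[:, None])
    m = (1.0 - gamma)[:, None] * X             # m_d^t = x_d^t - a_d^t x_d^t
    g2 = np.einsum('di,ij,dj->d', m, np.linalg.inv(XtVinvX), m)
    var_sigma2_u = 2.0 / np.sum(v ** -2.0)     # assumed asymptotic variance (stands in for V_A)
    g3 = psi ** 2 / v ** 3 * var_sigma2_u      # (d a_d / d sigma_u^2)^2 * V_d * V_A
    return g1 + g2 + 2.0 * g3, (g1, g2, g3)

# Toy check: four exchangeable areas, intercept only (hypothetical values)
X = np.ones((4, 1))
psi = np.ones(4)
mse, (g1, g2, g3) = fh_mse_linearised(X, psi, sigma2_u=1.0)
```

In applications one would plug in an estimate of $\sigma^{2}_{u}$; the leading term `g1` is of order one, while `g2` and `g3` vanish as $D$ grows.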

Similar analytical approximations have been obtained in case of more general non-linear mixed and linear multivariate models by [29, 30]. Linearisation based techniques are theoretically sound. Certainly, they are fully model and estimator dependent, i.e. for each modification a new approximation might be necessary. More importantly, they are restricted to linear parameters and their corresponding EBLUP. Therefore, different bootstrapping schemes have been proposed as alternatives. Some depend on normality assumptions. For parametric bootstrap approaches see [31, 21]. [32] suggested resampling with replacement from the variance-inflated errors and random effects. In contrast, [33] advised to use a wild bootstrap, see further below.

3. The time dimension

When it comes to fore- or nowcasting for small clusters, one has to think about the inclusion of the time dimension in the SAE model. Before discussing this, we should note that there already exists a huge literature on mixed models for longitudinal data in biometrics and related fields [34, 35]. However, this literature focuses on estimation and prediction problems that are quite different from ours. This does not mean that we could not learn much from that literature, but that would clearly be beyond the scope of our article. There is also some literature on SAE with longitudinal data, but it looks at direct and indirect estimators which are very different from the approaches introduced here.

In order to continue in the same line as the previous sections, we consider extensions of LMM, but now also modelling their dependence structure. It is convenient to keep in mind that we are usually thinking of large $D$ and small $T$ . The presently existing area-level models with time dimensions follow mainly [4], with extensions for different applications to predict, for instance, unemployment [36] or poverty rates [37, 5]. Regarding unit-level models, we refer to [11], who deal with temporal extensions of the NERM. So far, the developments concentrate on the area-level FHM, mainly for the sake of data-availability. This holds also true for extensions that we will not discuss in this article, like the LMM with time-space dependencies [38]. Without loss of generality, we therefore concentrate here on area-level models too. We first introduce models with independent time effects, then models with time dependence in the random effects. Repeating such development for unit-level models is above all a notational challenge.

3.1 Accounting for time effects

Recall our FHM from Eq. (5), and let us extend it in two ways: First, we allow all terms except the fixed effects $\bm{\beta}$ to vary over time; and second, we then re-introduce a time-independent area effect, say $v_{d}\overset{\textit{ind}}{\sim}N(0,\sigma^{2}_{v})$ for all areas $d$. This results in

$\displaystyle y_{dt}=\bm{x}^{t}_{dt}\bm{\beta}+v_{d}+u_{dt}+e_{dt},\quad d=1,\dots,D,\ t=1,\ldots,T,$ (15)

in which $\bm{x}_{dt}=(1,x_{dt1},\dots,x_{dtp})^{t}$. As before, $u_{dt}\overset{\textit{iid}}{\sim}N(0,\sigma^{2}_{u})$, $e_{dt}\overset{\textit{iid}}{\sim}N(0,\sigma^{2}_{e_{dt}})$ with $\sigma^{2}_{e_{dt}}$ ($d=1,2,\dots,D$, $t=1,\dots,T$) being known, and all three random effects independent from each other as well as independent from $\bm{x}_{dt}$. Alternatively, you may think of Eq. (15) as the original FHM, repeatedly observed and adding $u_{dt}$. In any case, the idea is to allow for more flexibility when modelling the pure area effects, being now $v_{d}+u_{dt}$, $d=1,\ldots,D$.

If we stack the variables such that we first group them over time, then over areas, i.e. $\bm{y}:=(\bm{y}^{t}_{1},\ldots,\bm{y}^{t}_{D})^{t}$ with $\bm{y}_{d}:=({y}_{d1},\ldots,{y}_{dT})^{t}$, and all other variables accordingly, then $\bm{G}=\sigma_{u}^{2}\bm{I}_{DT}$, $\bm{R}=\textit{diag}\{\bm{R}_{d}\}_{d=1}^{D}$ with $\bm{R}_{d}=\textit{diag}\{\sigma^{2}_{e_{dt}}\}_{t=1}^{T}$. Defining the $TD\times D$ matrix $\bm{Z}_{v}:=\textit{diag}\{\bm{1}_{T}\}_{d=1}^{D}$ we can write the variance structure as $\bm{V}:=\bm{V}(\bm{\theta})=\mathbb{V}\mathrm{ar}(\bm{y})=\sigma^{2}_{v}\bm{Z}_{v}\bm{Z}_{v}^{t}+\bm{G}+\bm{R}$. Let us define $\bm{V}_{d}:=\mathbb{V}\mathrm{ar}(\bm{y}_{d})$, and $\bm{Z}=[\bm{Z}_{v},\bm{I}_{DT}]$ accordingly. The unknown parameter of this variance matrix is $\bm{\theta}=(\sigma^{2}_{v},\sigma^{2}_{u})^{t}$ which, again, can be estimated by residual maximum likelihood (REML). It should be noted that in practice the knowledge of the $\sigma^{2}_{e_{dt}}$ is either based on information like sampling weights and the number of units per area and time, or it is pre-estimated, for example by means of direct estimators and/or historical data.
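The stacking just described can be checked numerically. The sketch below builds $\bm{Z}_{v}$ and $\bm{V}$ for model Eq. (15) with a Kronecker product; the function name `time_model_covariance` and the toy variances are assumptions chosen for illustration.

```python
import numpy as np

def time_model_covariance(D, T, sigma2_v, sigma2_u, sigma2_e):
    """Z_v and V = sigma_v^2 Z_v Z_v^t + G + R for Eq. (15); sigma2_e has shape (D, T)."""
    Z_v = np.kron(np.eye(D), np.ones((T, 1)))            # (DT x D): one column of ones per area
    G = sigma2_u * np.eye(D * T)                         # independent area-time effects u_dt
    R = np.diag(np.asarray(sigma2_e, float).reshape(D * T))  # known sampling variances, time within area
    V = sigma2_v * Z_v @ Z_v.T + G + R
    return Z_v, V

# Toy example (hypothetical values): D = 2 areas observed at T = 3 time points
Z_v, V = time_model_covariance(2, 3, sigma2_v=2.0, sigma2_u=1.0,
                               sigma2_e=0.5 * np.ones((2, 3)))
```

Observations of the same area at different times are correlated only through the common $v_{d}$, so $\bm{V}$ is block diagonal with $T\times T$ blocks $\bm{V}_{d}$.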

Not surprisingly, the BLUPs of $\bm{\beta}$ and $\bm{\nu}=(\bm{v}^{t},\bm{u}^{t})^{t}$ look like they did before, namely4

$\displaystyle\tilde{\bm{\beta}}=(\bm{X}^{t}\bm{V}^{-1}\bm{X})^{-1}\bm{X}^{t}\bm{V}^{-1}\bm{y}=\left(\sum_{d=1}^{D}\bm{X}^{t}_{d}\bm{V}^{-1}_{d}\bm{X}_{d}\right)^{-1}\sum_{d=1}^{D}\bm{X}^{t}_{d}\bm{V}_{d}^{-1}\bm{y}_{d},$ (16)

$\displaystyle\tilde{\bm{\nu}}=\bm{V}_{\nu}\bm{Z}^{t}\bm{V}^{-1}(\bm{y}-\bm{X}\tilde{\bm{\beta}})\quad\mbox{with }\bm{V}_{\nu}=\textit{diag}(\sigma_{v}^{2}\bm{I}_{D},\sigma_{u}^{2}\bm{I}_{DT}).$ (17)

For obtaining feasible EBLUPs, one has to substitute estimates $(\hat{\sigma}_{v}^{2},\hat{\sigma}_{u}^{2})$ for the unknown variances. As said, these can be obtained by REML. Under the commonly applied normality assumption one has to maximize

$\displaystyle-(DT-p)\log(2\pi)+\log|\bm{X}^{t}\bm{X}|-\log|\bm{V}|-\log|\bm{X}^{t}\bm{V}^{-1}\bm{X}|-\bm{y}^{t}\bm{P}\bm{y},\quad\mbox{where }\bm{P}:=\bm{V}^{-1}-\bm{V}^{-1}\bm{X}(\bm{X}^{t}\bm{V}^{-1}\bm{X})^{-1}\bm{X}^{t}\bm{V}^{-1}.$ (18)

For more information on estimation algorithms, see [5] and the corresponding implementations in the R package saery.
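To illustrate what is being maximized, the following sketch evaluates the REML criterion of Eq. (18) for model Eq. (15) and locates its maximum on a coarse grid. It is not the algorithm of [5] or of the saery package: the function name `reml_criterion`, the simulated data and the grid search are assumptions for illustration, and `p` is taken here as the number of columns of $\bm{X}$.

```python
import numpy as np

def reml_criterion(theta, y, X, Z_v, R):
    """REML criterion of Eq. (18) at theta = (sigma2_v, sigma2_u), with R known."""
    sigma2_v, sigma2_u = theta
    n, p = X.shape                                   # n = DT; p = number of columns of X (assumption)
    V = sigma2_v * Z_v @ Z_v.T + sigma2_u * np.eye(n) + R
    Vinv = np.linalg.inv(V)
    XtVinvX = X.T @ Vinv @ X
    P = Vinv - Vinv @ X @ np.linalg.solve(XtVinvX, X.T @ Vinv)
    logdet = lambda A: np.linalg.slogdet(A)[1]       # log|A| for positive definite A
    return (-(n - p) * np.log(2 * np.pi) + logdet(X.T @ X)
            - logdet(V) - logdet(XtVinvX) - y @ P @ y)

# Simulated toy data (assumed values): D = 6 areas, T = 4 periods
rng = np.random.default_rng(1)
D, T = 6, 4
Z_v = np.kron(np.eye(D), np.ones((T, 1)))
X = np.column_stack([np.ones(D * T), rng.normal(size=D * T)])
R = np.diag(rng.uniform(0.2, 0.6, size=D * T))
y = (X @ np.array([1.0, -0.5]) + Z_v @ rng.normal(scale=1.0, size=D)
     + rng.normal(scale=0.7, size=D * T) + rng.normal(size=D * T) * np.sqrt(np.diag(R)))

# Coarse grid search as a stand-in for a proper optimiser such as Fisher scoring
grid = np.linspace(0.05, 3.0, 15)
values = np.array([[reml_criterion((sv, su), y, X, Z_v, R) for su in grid] for sv in grid])
i, j = np.unravel_index(np.argmax(values), values.shape)
theta_hat = (grid[i], grid[j])
```

Because $\bm{P}\bm{X}=\bm{0}$, the criterion does not depend on $\bm{\beta}$, which is what distinguishes REML from ML.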

The parameter prediction of interest is obtained from the EBLUPs by

$\displaystyle\hat{\mu}_{dt}=1_{dt}^{t}(\bm{X}\hat{\bm{\beta}}+\bm{Z}\hat{\bm{\nu}})=\bm{x}_{dt}^{t}\hat{\bm{\beta}}+\hat{v}_{d}+\hat{u}_{dt}$ (19)

with $1_{dt}$ a vector of length $DT$ with 1 at position $t+(d-1)T$ but zeros elsewhere. Again, the particular difficulty in its use is the estimation of a measure of variability. As before, the most common measure is the MSE, which can be decomposed as in Eqs (12) to (14), and estimated by plug-in or bootstrap methods. More specifically, recall the decomposition

$\displaystyle\textit{MSE}(\hat{\mu}_{dt})=g_{1}(\bm{\theta})+g_{2}(\bm{\theta})+g_{3}(\bm{\theta})$ (20)

with $g_{1}(\bm{\theta})=1_{dt}^{t}\bm{ZPZ}^{t}1_{dt}$ for $\bm{P}=\bm{V}_{\nu}-\bm{V}_{\nu}\bm{Z}^{t}\bm{V}^{-1}\bm{ZV}_{\nu}$ (not to be confused with the matrix $\bm{P}$ of the REML criterion), and set $\bm{a}^{t}=1_{dt}^{t}\bm{ZV}_{\nu}\bm{Z}^{t}\bm{V}^{-1}$ to get

$\displaystyle g_{2}(\bm{\theta})=(1_{dt}^{t}\bm{X}-1_{dt}^{t}\bm{Z}\bm{P}\bm{Z}^{t}\bm{R}^{-1}\bm{X})(\bm{X}^{t}\bm{V}^{-1}\bm{X})^{-1}(\ldots)^{t},$

$\displaystyle g_{3}(\bm{\theta})=\text{tr}\{(\partial\bm{a}^{t}/\partial\bm{\theta})\bm{V}(\partial\bm{a}^{t}/\partial\bm{\theta})^{t}\,\mathbb{E}[(\hat{\bm{\theta}}-\bm{\theta})(\hat{\bm{\theta}}-\bm{\theta})^{t}]\}$

with $(\ldots)^{t}$ referring to the first term in parentheses (but transposed). As for the LMMb, one can derive an estimator for the MSE Eq. (20) as in Eq. (14). For detailed derivations we again refer to [14] as these go over several pages and are clearly beyond the scope of this review. The bootstrap estimation of this MSE works exactly as outlined in the previous section. This will change in the case when allowing for time dependence.
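The predictor of Eq. (19) and the leading MSE term $g_{1}$ of Eq. (20) can be computed directly from the matrix expressions in Eqs (16) and (17). The sketch below does so for given variance parameters; the function name `blup_time_model`, the simulated data, and the use of $\bm{P}=\bm{V}_{\nu}-\bm{V}_{\nu}\bm{Z}^{t}\bm{V}^{-1}\bm{ZV}_{\nu}$ for $g_{1}$ are assumptions of this illustration, and plugging in estimated variances would give the EBLUP.

```python
import numpy as np

def blup_time_model(y, X, D, T, sigma2_v, sigma2_u, sigma2_e):
    """BLUP of mu_dt (Eq. 19) and g1 for all (d, t) under model Eq. (15)."""
    Z_v = np.kron(np.eye(D), np.ones((T, 1)))
    Z = np.hstack([Z_v, np.eye(D * T)])                       # Z = [Z_v, I_DT]
    V_nu = np.diag(np.r_[np.full(D, sigma2_v), np.full(D * T, sigma2_u)])
    R = np.diag(np.asarray(sigma2_e, float).reshape(D * T))
    V = Z @ V_nu @ Z.T + R
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)    # Eq. (16)
    nu = V_nu @ Z.T @ Vinv @ (y - X @ beta)                   # Eq. (17): nu = (v^t, u^t)^t
    mu = X @ beta + Z @ nu                                    # Eq. (19) for all (d, t) at once
    P = V_nu - V_nu @ Z.T @ Vinv @ Z @ V_nu
    g1 = np.einsum('ij,jk,ik->i', Z, P, Z)                    # diagonal of Z P Z^t
    return beta, nu, mu, g1

# Simulated toy data (assumed values): D = 3 areas, T = 4 periods
rng = np.random.default_rng(2)
D, T = 3, 4
X = np.column_stack([np.ones(D * T), rng.normal(size=D * T)])
sigma2_e = rng.uniform(0.2, 0.8, size=(D, T))
y = X @ np.array([2.0, 1.0]) + rng.normal(size=D * T)
beta, nu, mu, g1 = blup_time_model(y, X, D, T, 1.0, 0.5, sigma2_e)

d, t = 2, 3                       # 1-based area and time index
k = t + (d - 1) * T - 1           # position t + (d-1)T, shifted to 0-based indexing
mu_dt = mu[k]                     # equals x_dt^t beta + v_d + u_dt
```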
3.2 Extension to time dependence

Recall that we are interested in predicting $\mu_{dt}$ . Consequently, researchers are more inclined to increase the modelling effort on the area random effects than on the $e_{dt}$ . In this spirit, it is natural to consider an appropriate time series modelling of $u_{dt}$ . This is illustrated along the popular approach of supposing an autocorrelation of lag one, i.e. an AR(1) structure:

$\displaystyle u_{dt}=\rho u_{d(t-1)}+\varepsilon_{dt}\mbox{ for all }d=1,\ldots,D,\ t=1,\ldots,T,\mbox{ with }|\rho|<1,$ (21)

where $\varepsilon_{dt}\overset{\textit{ind}}{\sim}N(0,\sigma^{2}_{\varepsilon})$ such that $\sigma^{2}_{u}=\sigma_{\varepsilon}^{2}/(1-\rho^{2})$ and $\bm{G}=\mathbb{V}ar(\bm{u})=\sigma_{\varepsilon}^{2}\Omega$ with

$\displaystyle\Omega=\textit{diag}\{\Omega_{d}\}_{d=1}^{D},\quad\Omega_{d}=\frac{1}{1-\rho^{2}}\left(\begin{array}{ccccc}1&\rho&\cdots&\rho^{T-2}&\rho^{T-1}\\ \rho&1&\ddots&&\rho^{T-2}\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ \rho^{T-2}&&\ddots&1&\rho\\ \rho^{T-1}&\rho^{T-2}&\cdots&\rho&1\end{array}\right).$ (22)

This is denoted by $(u_{d1},\ldots,u_{dT})\sim AR1(0,\hat{\sigma}^{2}_{\varepsilon},\hat{\rho})$ . It implicitly also defines a new $\bm{V}_{\nu}=\textit{diag}\{\sigma_{v}^{2}\bm{I}_{D},\bm{G}\}$ , so that our variance parameter $\bm{\theta}$ is enlarged by one more parameter, i.e. $\bm{\theta}=(\sigma_{v}^{2},\sigma_{\varepsilon}^{2},\rho)^{t}$ . However, essentially all other formulas introduced in the last subsection remain unchanged, namely those for the EBLUPs $\hat{\bm{\beta}}$ , $\hat{\bm{\nu}}$ , the MSE decomposition Eq. (20) with its terms, and even the REML expression Eq. (18). Certainly, when it comes to the explicit estimation, the algorithms become quite complex; for details we refer to [5].
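The covariance structure of Eq. (22) can be made concrete with a few lines of code. The following is a minimal sketch in Python with NumPy, not the implementation used in [5] or in saery; the function names `ar1_omega` and `area_covariance` are hypothetical, and the per-area matrix $\bm{V}_{d}=\sigma_{v}^{2}\bm{1}\bm{1}^{t}+\sigma_{\varepsilon}^{2}\Omega_{d}+\textit{diag}(\sigma_{dt}^{2})$ is written under the assumption that the sampling errors $e_{dt}$ are independent with known variances.

```python
import numpy as np


def ar1_omega(T, rho):
    """Return the T x T matrix Omega_d of Eq. (22): rho^|s-t| / (1 - rho^2)."""
    if not abs(rho) < 1:
        raise ValueError("stationarity requires |rho| < 1")
    idx = np.arange(T)
    lags = np.abs(idx[:, None] - idx[None, :])  # |s - t| for all pairs (s, t)
    return float(rho) ** lags / (1.0 - rho**2)


def area_covariance(sigma2_v, sigma2_eps, rho, sigma2_e):
    """Covariance of (y_d1, ..., y_dT) for one area (hypothetical helper).

    Assumes independent sampling errors with known variances sigma2_e.
    """
    sigma2_e = np.asarray(sigma2_e, dtype=float)
    T = sigma2_e.size
    return (sigma2_v * np.ones((T, T))          # shared area effect v_d
            + sigma2_eps * ar1_omega(T, rho)    # AR(1) effects u_dt
            + np.diag(sigma2_e))                # sampling errors e_dt


omega = ar1_omega(T=4, rho=0.5)
V_d = area_covariance(sigma2_v=0.25, sigma2_eps=0.25, rho=0.5, sigma2_e=[0.25] * 4)
```

The diagonal of `sigma2_eps * omega` reproduces the stationary variance $\sigma_{u}^{2}=\sigma_{\varepsilon}^{2}/(1-\rho^{2})$ stated above.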

As said, bootstrap methods offer an attractive alternative for estimating the MSE or prediction intervals. While the general principle is always the same, one now has to produce bootstrap time-varying area effects $\bm{u}_{d}^{*}$ that are $N(\bm{0},\hat{\bm{G}})$ . They therefore have to follow the assumed autocorrelation structure, an AR(1) structure in our case. With $\bm{e}^{*}\sim N(\bm{0},\bm{R})$ (recall that $\bm{R}$ is assumed to be known) and $\bm{v}^{*}\sim N(\bm{0},\hat{\sigma}_{v}^{2}\bm{I}_{D})$ , we can generate $B$ bootstrap samples

$\displaystyle y_{dt}^{b}=\bm{x}^{t}_{dt}\bm{\beta}+v_{d}^{b}+u_{dt}^{b}+e_{dt}^{b},\quad d=1,\dots,D,\ t=1,\ldots,T,\ b=1,\ldots,B.$ (23)

From these, one obtains $B$ predictions $\hat{\mu}_{dt}^{b}$ for all $D$ areas and $T$ periods. These bootstrap predictions allow us to construct empirical prediction intervals for our $\hat{\mu}_{dt}$ . The validity of this procedure has been established theoretically, and was checked by simulations in the literature cited above.

We conclude with the remark that extensions to other time dependence structures like AR(q) with $q>1$ , moving average (MA) or their combination, i.e. an ARMA structure, are conceivable. The three main points to be (re-)considered then are: first, the adaptation of $\bm{G}=\mathbb{V}ar(\bm{u})$ in Eq. (22); second, the adjustment of the bootstrap procedure; and third, the decomposition of the random area effect(s) $v_{d}+u_{dt}$ .
4. Nowcasting approaches

We do not know to what extent the methods outlined above have already been used for nowcasting in small areas. To the best of our knowledge, this possibility has so far not been studied in the academic literature. We discuss here the most obvious extensions of the presented methods to this kind of problem. As said in the last section, in the context considered here it is more natural to think of data with large $D$ but small $T$ than of few but long time series. We are not saying that the former case is generally more frequent than the latter, but in the latter case one would tend to consider fixed area/cluster effects, maybe combined with random time effects. Nor are we saying that the above methodology could not be adapted to such situations, but outlining this would clearly go beyond the scope of this article. In order to be consistent with the above notation, let us suppose that $\{\bm{y}_{t},\bm{x}_{t}\}_{t=1}^{T}$ are observed, and that we need to nowcast our parameter of interest, denoted by $\mu_{d(T+1)}$ .

For nowcasting $\mu_{d(T+1)}$ we first need to nowcast the $y_{d(T+1)}$ , for which in turn one needs to predict the $u_{d(T+1)}$ . The natural procedure is to use Eq. (21) with $\varepsilon_{d(T+1)}=0$ for all $d$ . ‘Most obvious extensions’ within an area-level model (i.e. $y_{d(T+1)}$ completely unknown) are for example those strategies that essentially take Eq. (19) but write $y_{d(t+1)}$ on the left-, and $(u_{d(t+1)},e_{d(t+1)})$ on the right-hand side. As the EBLUP and MSE estimation otherwise do not change, for area-level models one can concentrate in the following on the discussion of designing (or selecting) $\bm{x}_{dt}$ . For example, without information about $\bm{x}_{d(T+1)}$ , nowcasting would become a model-based one-step forecasting problem.

It might be worth developing an improved nowcasting method for the situation in which information $y_{d(T+1)}$ is already observed in some of the $D$ areas. It is clear that this could improve in particular the prediction of the $\hat{u}_{d(T+1)}$ of those areas; its usefulness therefore depends mainly on how realistic such a situation is. This is different for unit-level models. There it is more realistic to assume that for some units (in some areas) such information is available in a timely manner. The next step is then to extend the existing methods presented in Section 3 to the unit-level model. This is tedious but technically not that hard.

Before discussing the design of the $\bm{x}_{dt}$ , let us mention that SAE data are often low frequency data; in case one has quarterly data or even higher frequencies, indicators for capturing seasonality are to be included. Moreover, in case of higher frequencies, data may refer to different, often rotating cohorts. This is to be modelled too, see [36].
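To illustrate what including seasonality indicators can look like for quarterly data, the following minimal Python sketch builds 0/1 quarter dummies that could be appended to $\bm{x}_{dt}$ . The function name `quarter_dummies` is hypothetical, and the sketch assumes that period $t=1$ falls in the first quarter and that one quarter is dropped as the reference category to avoid collinearity with an intercept.

```python
import numpy as np


def quarter_dummies(periods, drop_first=True):
    """Return a 0/1 matrix of quarter indicators for period indices t = 1, 2, ...

    Assumption: t = 1 is a first quarter. With drop_first=True the first
    quarter is the reference category, so three columns are returned.
    """
    quarter = (np.asarray(periods, dtype=int) - 1) % 4          # 0, 1, 2, 3
    dummies = (quarter[:, None] == np.arange(4)[None, :]).astype(float)
    return dummies[:, 1:] if drop_first else dummies


S = quarter_dummies([1, 2, 3, 4, 5])  # rows: Q1, Q2, Q3, Q4, Q1
```

The columns of `S` would be stacked next to the other predictors for each area; rotating-cohort effects, as treated in [36], are not covered by this sketch.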

Variable selection:

When one talks about the right design for $\bm{x}_{dt}$ , then in SAE this is (almost) always about finding the right set of variables and/or transformations to get the best predictors for $y_{dt}$ . In the context of SAE with LMM, the discussion in [12] was one of the first that considered this kind of model selection. Based on maximum likelihood approaches, he essentially followed the ideas of [39], proposing the conditional Akaike criterion (cAIC) as objective function. [40] followed a different approach for the FHM; they derived bootstrap-based bias corrections to the Akaike information criterion as well as to the Kullback symmetric divergence criterion. Other approaches were worked out and further developed by [41] and applied in [42] for poverty estimation in small areas. It is clear that their ideas carry over to now- or forecasting almost immediately. This holds true as long as one's own prediction procedure relies on a (quasi) maximum likelihood procedure for which an appropriate generalised degrees of freedom (gDF) measure exists. We do not present details because the final criterion depends on the likelihood and gDF measure chosen. Notice further that research on model selection in this context is rather recent and still needs to be studied in more detail. Instead, we discuss interesting sets of predictors.

Finally, different (sub-)sets of predictors that potentially enter the nowcasting model are (un-)available in different periods, i.e. they are available or missing in a non-systematic manner. Consequently, a consistent set of predictors is rarely available. This induces a changing database, which requires updating the design for nowcasting on a regular basis. This is why a well-functioning, easy-to-handle variable selection procedure is so important.

Nowcasting with time series of $\bm{y}_{t}$ :

An evident choice of variables to be included in $\bm{x}_{dt}$ are the past observations of $y_{d}$ , namely $y_{dk}$ , $k\leqslant t$ . The difficulty is to find the right trade-off, because enlarging the number of lags reduces the number of equations one can include ( $T$ gets reduced). On the one hand, this problem is standard in forecasting with time series, so that one could borrow ideas from that literature. On the other hand, it is natural to assume the same order of lags for modelling the stochastic process of the area effects $\bm{u}_{d}$ , which in turn complicates the variance structure, recall Eq. (22), and thereby our EBLUP and MSE estimation. This suggests keeping the order of lags small. When using only one lag, the above formulas can be applied directly.
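The trade-off between the number of lags and the number of usable periods can be seen directly when building the lagged design. The following Python sketch is only an illustration under assumed data layouts ( $y$ stored as a $D\times T$ array, the predictors as a $D\times T\times p$ array); the function name `add_response_lags` is hypothetical.

```python
import numpy as np


def add_response_lags(y, X, n_lags):
    """Append y_{d(t-1)}, ..., y_{d(t-n_lags)} to the predictors.

    y: (D, T) responses, X: (D, T, p) predictors (assumed layout).
    Returns the trimmed responses (D, T - n_lags) and the augmented design
    (D, T - n_lags, p + n_lags): each lag costs one period per area.
    """
    D, T = y.shape
    if not 0 < n_lags < T:
        raise ValueError("need 0 < n_lags < T")
    # lag k of the periods n_lags+1, ..., T is the slice shifted back by k
    lagged = [y[:, n_lags - k:T - k] for k in range(1, n_lags + 1)]
    X_aug = np.concatenate([X[:, n_lags:, :], np.stack(lagged, axis=-1)], axis=-1)
    return y[:, n_lags:], X_aug


y_demo = np.arange(10, dtype=float).reshape(2, 5)   # D = 2 areas, T = 5 periods
X_demo = np.ones((2, 5, 1))                         # intercept only
y_trim, X_lag = add_response_lags(y_demo, X_demo, n_lags=2)
```

With $T=5$ and two lags only three equations per area remain, which is the reduction of $T$ mentioned above.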

Nowcasting for known $\bm{x}_{t+1}$ :

In theory, nowcasting really refers to ‘now’ and therefore to a time point for which most likely no data are available. In practice, however, nowcasting often refers to ‘contemporary’ or just ‘timely’ in the sense of ‘without much delay’. In those cases, it is quite plausible to assume that at $T+1$ , when imputing $y_{d(T+1)}$ , information on several potential predictors, say $X_{j,d}$ , $j=1,\ldots,p_{1}<p$ , is already available. This would suggest returning to our original notation, i.e. that of Eq. (19). Now, $x_{j,dt}$ refers to period $t$ for $j=1,\ldots,p_{1}$ , but to period $(t-1)$ for $j=p_{1}+1,\ldots,p$ , as for these predictors more recent information is not yet available. Certainly, the latter should include $y_{d(t-1)}$ . One may also want to include some further time lags of some predictors, but this could again reduce the number of equations one can include in the estimation. Then, once we have calculated all EBLUPs and parameter estimates from sample $\{y_{dt},\bm{x}_{dt}\}_{t=1,d=1}^{T,D}$ , the $y_{d(T+1)}$ are imputed by Eq. (19) with $\hat{\bm{\beta}}$ , $\hat{v}_{d}$ , and $\hat{u}_{d(T+1)}=\hat{\rho}\cdot\hat{u}_{dT}$ for all $d=1,\ldots,D$ . Prediction intervals can be approximated by the bootstrap procedure Eq. (23) from Section 3.2. It is clear that in practice this idea is applied more flexibly by including predictors with a time lag that is feasible and has maximal prediction power. The key problem is then the variable selection, which for this reason we put at the front of our discussion.
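The imputation rule just described, Eq. (19) evaluated at $T+1$ with $\hat{u}_{d(T+1)}=\hat{\rho}\cdot\hat{u}_{dT}$ , amounts to one line of linear algebra once the model has been fitted. The Python sketch below assumes that the fitted quantities are already available as arrays; the function name `nowcast_next_period` and the numerical values are purely illustrative.

```python
import numpy as np


def nowcast_next_period(x_next, beta_hat, v_hat, u_hat_last, rho_hat):
    """Impute mu_{d(T+1)} = x_{d(T+1)}' beta_hat + v_hat_d + rho_hat * u_hat_{dT}.

    x_next: (D, p) predictors for T+1, beta_hat: (p,), v_hat and
    u_hat_last: (D,) predicted random effects, rho_hat: scalar.
    """
    return x_next @ beta_hat + v_hat + rho_hat * u_hat_last


# Illustrative (made-up) fitted values for D = 2 areas and p = 2 predictors
x_next = np.array([[1.0, 2.0], [1.0, 3.0]])
mu_next = nowcast_next_period(
    x_next,
    beta_hat=np.array([0.0, 1.0]),
    v_hat=np.array([0.1, -0.2]),
    u_hat_last=np.array([0.4, 0.2]),
    rho_hat=0.5,
)
```

Columns of `x_next` may mix contemporaneous predictors and lagged ones, exactly as described for $x_{j,dt}$ above; the function itself does not distinguish between them.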

4.1 An algorithm for SAE nowcasting

The EBLUP Eq. (19) and the MSE Eq. (20) are used to predict mixed parameters and to measure the quality of the predictions in the target domains. In addition, they give information about how well the selected temporal FHM fits the data $(y_{dt},\bm{x}_{dt}^{t})$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ , of the investigated period. The EBLUP can also be adapted to predict the values of the target mixed parameters $\mu_{d(T+1)}$ , $d=1,\ldots,D$ , under the FHM with the assumed AR(1) correlation structure (AR(1)-FHM) defined in Eqs (21) and (22). We introduce a prediction algorithm by assuming that the set of auxiliary variables $\bm{x}_{d(T+1)}$ , $d=1,\ldots,D$ , is already observed or can be constructed by time series procedures assuming some prediction scenarios. For the sake of brevity, we do not consider all the prediction setups considered in Section 4. Nevertheless, the basic ideas can be adapted to them. Algorithm A consists of the following two steps.

Fit the AR(1)-FHM to the data $(y_{dt},\bm{x}_{dt}^{t})$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ . Calculate $\hat{\bm{\phi}}=(\hat{\bm{\beta}}^{\prime},\hat{\sigma}^{2}_{v},\hat{\sigma}^{2}_{\epsilon},\hat{\rho})$ and $\hat{v}_{d}$ , $d=1,\ldots,D$ . Obtain the preliminary predictions $\tilde{\mu}_{d(T+1)}=\bm{x}_{d(T+1)}^{t}\hat{\bm{\beta}}+\hat{v}_{d}$ , $d=1,\ldots,D$ .

Fit the AR(1)-FHM to the data $(y_{dt},\bm{x}_{dt}^{t})$ , $(\tilde{\mu}_{d(T+1)},\bm{x}_{d(T+1)}^{t})$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ . Calculate the new estimators of model parameters, and the new predictors of the random effects. Apply Eq. (19) to obtain predictors $\hat{\mu}_{d(T+1)}$ , $d=1,\ldots,D$ .
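The two steps of Algorithm A can be sketched in Python as follows. This is a deliberately simplified illustration and not the saery implementation: the variance parameters $(\sigma_{v}^{2},\sigma_{\varepsilon}^{2},\rho)$ are treated as known, so that each "fit" is a BLUP computed by generalised least squares rather than a REML fit, and a sampling variance `s2_e_next` for the pseudo-observation $\tilde{\mu}_{d(T+1)}$ has to be supplied by the user, which is an assumption of this sketch. All function names are hypothetical.

```python
import numpy as np


def ar1_omega(T, rho):
    idx = np.arange(T)
    return float(rho) ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho**2)


def blup_fit(y, X, s2_e, s2_v, s2_eps, rho):
    """BLUP under the AR(1)-FHM with KNOWN variance parameters (no REML).

    y: (D, T), X: (D, T, p), s2_e: (D, T) sampling variances.
    Returns beta_hat (p,), v_hat (D,), u_hat (D, T).
    """
    D, T, _ = X.shape
    ones = np.ones(T)
    G_d = s2_eps * ar1_omega(T, rho)                      # Var(u_d)
    V_inv = [np.linalg.inv(s2_v * np.outer(ones, ones) + G_d + np.diag(s2_e[d]))
             for d in range(D)]
    A = sum(X[d].T @ V_inv[d] @ X[d] for d in range(D))
    b = sum(X[d].T @ V_inv[d] @ y[d] for d in range(D))
    beta_hat = np.linalg.solve(A, b)                      # GLS estimator
    w = [V_inv[d] @ (y[d] - X[d] @ beta_hat) for d in range(D)]
    v_hat = np.array([s2_v * (ones @ w_d) for w_d in w])  # sigma_v^2 1' V^-1 r
    u_hat = np.array([G_d @ w_d for w_d in w])            # G_d V^-1 r
    return beta_hat, v_hat, u_hat


def algorithm_a(y, X, x_next, s2_e, s2_e_next, s2_v, s2_eps, rho):
    # Step 1: fit on t = 1..T and form preliminary predictions for T+1
    beta1, v1, _ = blup_fit(y, X, s2_e, s2_v, s2_eps, rho)
    mu_tilde = x_next @ beta1 + v1
    # Step 2: refit on the data augmented by (mu_tilde, x_{d(T+1)})
    y_aug = np.column_stack([y, mu_tilde])
    X_aug = np.concatenate([X, x_next[:, None, :]], axis=1)
    s2_aug = np.column_stack([s2_e, s2_e_next])           # assumed variance at T+1
    beta2, v2, u2 = blup_fit(y_aug, X_aug, s2_aug, s2_v, s2_eps, rho)
    return x_next @ beta2 + v2 + u2[:, -1]                # Eq. (19) at T+1
```

In an actual application the calls to `blup_fit` would be replaced by a routine that also re-estimates the variance parameters in each step, as Algorithm A prescribes.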

Algorithm A can be programmed by using the function fit.saery from the R package saery, see [43]. For estimating the MSEs of the out-of-sample predictors, we propose the following parametric bootstrap Algorithm B.

Fit the model to the data $(y_{dt},\bm{x}_{dt}^{t})$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ . Calculate $\hat{\bm{\phi}}=(\hat{\bm{\beta}}^{\prime},\hat{\sigma}^{2}_{v},\hat{\sigma}^{% 2}_{\epsilon},\hat{\rho})$ .

Repeat $B$ times ( $b=1,\ldots,B$ ):

For $d=1,\ldots,D$ , generate $v_{d}^{*(b)}\overset{\textit{ind}}{\sim}N(0,\hat{\sigma}^{2}_{v})$ , $(u_{d1}^{*(b)},\ldots,u_{d(T+1)}^{*(b)})\overset{\textit{ind}}{\sim}AR1(0,\hat% {\sigma}^{2}_{\epsilon},\hat{\rho})$ and $e_{dt}^{*(b)}\overset{\textit{ind}}{\sim}N(0,\sigma^{2}_{dt})$ , $t=1,\ldots,T+1$ .

For $d=1,\ldots,D$ , calculate the theoretical bootstrap means $\mu_{dt}^{*(b)}=\bm{x}_{dt}^{t}\hat{\bm{\beta}}+v_{d}^{*(b)}+u_{dt}^{*(b)}$ , $t=1,\ldots,T+1$ , and target values $y_{dt}^{*(b)}=\mu_{dt}^{*(b)}+e_{dt}^{*(b)}$ , $t=1,\ldots,T$ .

Apply Algorithm A to the data $(y_{dt}^{*(b)},\bm{x}_{dt}^{t})$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ , to obtain predictors $\hat{\mu}_{d(T+1)}^{*(b)}$ , $d=1,\ldots,D$ .

The output we obtain is:

$\displaystyle\textit{mse}^{*}(\hat{\mu}_{d(T+1)})=\frac{1}{B}\sum_{b=1}^{B}\left(\hat{\mu}_{d(T+1)}^{*(b)}-\mu_{d(T+1)}^{*(b)}\right)^{2},\quad d=1,\ldots,D.$ (24)
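Given the $B$ bootstrap replicates produced by Algorithm B, the output Eq. (24) is a simple average of squared bootstrap prediction errors. The Python sketch below assumes that the replicates are stored as $B\times D$ arrays; the function names are hypothetical. The second function shows one common way to turn the same errors into percentile-type prediction intervals; this particular construction is an assumption of the sketch and is not prescribed by the text.

```python
import numpy as np


def bootstrap_mse(mu_hat_boot, mu_true_boot):
    """Eq. (24): mean over b of (mu_hat*(b) - mu*(b))^2 for each area d.

    Both inputs are (B, D) arrays for the period T+1.
    """
    err = np.asarray(mu_hat_boot, dtype=float) - np.asarray(mu_true_boot, dtype=float)
    return np.mean(err**2, axis=0)


def bootstrap_interval(mu_hat, mu_hat_boot, mu_true_boot, level=0.95):
    """Percentile-type interval from bootstrap prediction errors (an assumption).

    Uses the quantiles of e*(b) = mu_hat*(b) - mu*(b) to bracket mu_hat - e.
    """
    err = np.asarray(mu_hat_boot, dtype=float) - np.asarray(mu_true_boot, dtype=float)
    alpha = 1.0 - level
    q_lo, q_hi = np.quantile(err, [alpha / 2, 1 - alpha / 2], axis=0)
    return mu_hat - q_hi, mu_hat - q_lo


# Tiny illustration with B = 2 replicates and D = 2 areas (made-up numbers)
hat_b = np.array([[1.0, 2.0], [3.0, 2.0]])
true_b = np.array([[0.0, 2.0], [1.0, 2.0]])
mse_star = bootstrap_mse(hat_b, true_b)
lower, upper = bootstrap_interval(np.array([5.0, 5.0]), hat_b, true_b)
```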

It is obvious that for constructing prediction intervals for any of these nowcasts, the bootstrap procedure sketched above is probably the most attractive method. We conclude this section by recalling that, depending on the frequency of the considered data, the prediction methods proposed above can be complemented with standard procedures (known from time series, panel data or longitudinal data analysis) that account for time trends and seasonality in the deterministic parts of the equations.
5. Simulation studies

This section presents two simulation experiments for investigating the nowcasting Algorithm A when the model parameters are estimated by REML. It also investigates the parametric bootstrap algorithm for estimating the MSEs of the out-of-sample predictors. For $d=1,\ldots,D$ , $t=1,\ldots,T+1$ , the explanatory and dependent variables are always set to

$\displaystyle x_{dt}=\frac{1}{5}(b_{dt}-a_{dt})U_{dt}+a_{dt},\quad U_{dt}=\frac{t}{T+1},\quad a_{dt}=1,\quad b_{dt}=1+\frac{1}{D}(T(d-1)+t),$

$\displaystyle y_{dt}=\beta_{1}+\beta_{2}x_{dt}+v_{d}+u_{dt}+e_{dt},\quad\beta_{1}=0,\quad\beta_{2}=1,$

where $v_{d}\sim N(0,\sigma_{1}^{2})$ with $\sigma_{1}^{2}=0.25$ , and $e_{dt}\sim N(0,\sigma_{dt}^{2})$ with $\sigma_{dt}^{2}=0.25$ . For $d=1,\ldots,D$ , generate the AR(1)-correlated random effects

$\displaystyle u_{d1}=(1-\phi^{2})^{-1/2}\varepsilon_{d1},\quad u_{dt}=\phi u_{d(t-1)}+\varepsilon_{dt},\quad t=2,\ldots,T+1,$

where $\varepsilon_{dt}\sim N(0,\sigma_{2}^{2})$ , $t=1,\ldots,T+1$ , with $\sigma_{2}^{2}=0.25$ and $\phi=0.5$ . Note that we consider the complex situation of extrapolating for doing nowcasting. Certainly, many other scenarios can be imagined and would be interesting to be studied.
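The data-generating process just described can be written down compactly. The following Python sketch reproduces the stated design with NumPy; the function name `simulate_ar1_fhm` and the choice of random number generator are assumptions of this illustration, not part of the original study.

```python
import numpy as np


def simulate_ar1_fhm(D, T, rng, beta=(0.0, 1.0), s2_v=0.25, s2_eps=0.25,
                     s2_e=0.25, phi=0.5):
    """Draw one sample of size D(T+1) from the simulation design of Section 5.

    Returns a dict with x, mu and y, each of shape (D, T+1).
    """
    d = np.arange(1, D + 1)[:, None]           # area index, column vector
    t = np.arange(1, T + 2)[None, :]           # periods 1, ..., T+1
    U = t / (T + 1)
    a = 1.0
    b = 1.0 + (T * (d - 1) + t) / D
    x = (b - a) * U / 5.0 + a

    v = rng.normal(0.0, np.sqrt(s2_v), size=(D, 1))
    eps = rng.normal(0.0, np.sqrt(s2_eps), size=(D, T + 1))
    u = np.empty((D, T + 1))
    u[:, 0] = eps[:, 0] / np.sqrt(1.0 - phi**2)  # stationary start
    for j in range(1, T + 1):
        u[:, j] = phi * u[:, j - 1] + eps[:, j]  # AR(1) recursion
    e = rng.normal(0.0, np.sqrt(s2_e), size=(D, T + 1))

    mu = beta[0] + beta[1] * x + v + u           # target parameters
    y = mu + e                                   # direct estimates
    return {"x": x, "mu": mu, "y": y}


sample = simulate_ar1_fhm(D=3, T=4, rng=np.random.default_rng(0))
```

In the simulations only the first $T$ columns of `y` would be passed to the fitting step, while column $T+1$ of `mu` serves as the target that the nowcast is compared with.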

5.1 Simulation 1: Performance of the nowcast

The aim of Simulation 1 was to study the numerical performance of the nowcasted parameter itself. In order to evaluate the numerical performance we calculated the average prediction biases and the prediction mean squared errors, see steps 2 and 3. The entire study consisted of the following steps.

1.
Repeat $I=10^{4}$ times ( $i=1,\ldots,I$ )

1.1.
Generate a sample of size $D(T+1)$ and calculate $\mu_{dt}^{(i)}=\beta_{1}+\beta_{2}x_{dt}+v_{d}^{(i)}+u_{dt}^{(i)}$ , $d=1,\ldots,D$ , $t=1,\ldots,T+1$ .
1.2.
Fit the model to the data $(y_{dt}^{(i)},\bm{x}_{dt}^{\prime})$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ . Calculate $\tilde{\tau}^{(i)}\in\{\tilde{\beta}_{1}^{(i)},\tilde{\beta}_{2}^{(i)},\tilde{\sigma}_{1}^{2(i)},\tilde{\sigma}_{2}^{2(i)},\tilde{\phi}^{(i)}\}$ by REML. Calculate the predictors of the random effects $\tilde{v}_{d}^{(i)}$ and $\tilde{u}_{dt}^{(i)}$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ . For the current and future time instants, $T$ and $T+1$ respectively, calculate

the EBLUPs $\tilde{\mu}_{dT}^{(i)}=\bm{x}_{dT}^{\prime(i)}\tilde{\bm{\beta}}^{(i)}+\tilde{% v}_{d}^{(i)}+\tilde{u}_{dT}^{(i)}$ , $d=1,\ldots,D$ , and

the preliminary predictions $\tilde{\mu}_{d(T+1)}^{(i)}=\bm{x}_{d(T+1)}^{\prime(i)}\tilde{\bm{\beta}}^{(i)}% +\tilde{v}_{d}^{(i)}$ , $d=1,\ldots,D$ .

1.3.
Fit the AR(1)-FHM to the data $(y_{dt}^{(i)},\bm{x}_{dt}^{\prime})$ , $(\tilde{\mu}_{d(T+1)}^{(i)},\bm{x}_{d(T+1)}^{\prime})$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ . By applying REML, calculate the new estimators of model parameters $\hat{\tau}^{(i)}\in\{\hat{\beta}_{1}^{(i)},\hat{\beta}_{2}^{(i)},\hat{\sigma}_% {1}^{2(i)},\hat{\sigma}_{2}^{2(i)},\hat{\phi}^{(i)}\}$ . Calculate the new predictors of random effects $\hat{v}_{d}^{(i)}$ , $\hat{u}_{dt}^{(i)}$ , $d=1,\ldots,D$ , $t=1,\ldots,T+1$ . For the future parameters in $T+1$ calculate

the EBLUPs $\hat{\mu}_{d(T+1)}^{(i)}=\bm{x}_{d(T+1)}^{\prime(i)}\hat{\bm{\beta}}^{(i)}+% \hat{v}_{d}^{(i)}+\hat{u}_{d(T+1)}^{(i)}$ , $d=1,\ldots,D$ .

2.
For the predictor $\hat{\mu}_{d}\in\{\hat{\mu}_{d(T+1)},\tilde{\mu}_{dT}\}$ of $\mu_{d}\in\{\mu_{d(T+1)},\mu_{dT}\}$ , $d=1,\ldots,D$ , calculate the absolute performance measures

$\displaystyle\textit{BIAS}_{d}=\frac{1}{I}\sum_{i=1}^{I}(\hat{\mu}_{d}^{(i)}-\mu_{d}^{(i)}),\quad\textit{RMSE}_{d}=\left(\frac{1}{I}\sum_{i=1}^{I}(\hat{\mu}_{d}^{(i)}-\mu_{d}^{(i)})^{2}\right)^{1/2},\quad\mu_{d}=\frac{1}{I}\sum_{i=1}^{I}\mu_{d}^{(i)},$

$\displaystyle\textit{ABIAS}=\frac{1}{D}\sum_{d=1}^{D}|\textit{BIAS}_{d}|,\quad\textit{RMSE}=\frac{1}{D}\sum_{d=1}^{D}\textit{RMSE}_{d}.$

3.
For the predictor $\hat{\mu}_{d}\in\{\hat{\mu}_{d(T+1)},\tilde{\mu}_{dT}\}$ of $\mu_{d}\in\{\mu_{d(T+1)},\mu_{dT}\}$ , calculate the relative performance measures (in %)

$\displaystyle\textit{RBIAS}_{d}=100\frac{\textit{BIAS}_{d}}{|\mu_{d}|},\quad\textit{RRMSE}_{d}=100\frac{\textit{RMSE}_{d}}{|\mu_{d}|},\quad d=1,\ldots,D,$

$\displaystyle\textit{ARBIAS}=\frac{1}{D}\sum_{d=1}^{D}|\textit{RBIAS}_{d}|,\quad\textit{RRMSE}=\frac{1}{D}\sum_{d=1}^{D}\textit{RRMSE}_{d}.$
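The absolute and relative performance measures of steps 2 and 3 can be computed from the Monte Carlo output in a few lines. The sketch below assumes that the predictions and the true parameters of one period are stored as $I\times D$ arrays; the function name `performance_measures` is hypothetical.

```python
import numpy as np


def performance_measures(mu_hat, mu_true):
    """Monte Carlo performance measures for one predictor.

    mu_hat, mu_true: (I, D) arrays of predictions and true parameters.
    Returns per-area and aggregated measures as a dict.
    """
    mu_hat = np.asarray(mu_hat, dtype=float)
    mu_true = np.asarray(mu_true, dtype=float)
    err = mu_hat - mu_true
    bias_d = err.mean(axis=0)                    # BIAS_d
    rmse_d = np.sqrt((err**2).mean(axis=0))      # RMSE_d
    mu_d = mu_true.mean(axis=0)                  # Monte Carlo mean of mu_d
    rbias_d = 100.0 * bias_d / np.abs(mu_d)      # RBIAS_d in %
    rrmse_d = 100.0 * rmse_d / np.abs(mu_d)      # RRMSE_d in %
    return {
        "BIAS_d": bias_d, "RMSE_d": rmse_d,
        "RBIAS_d": rbias_d, "RRMSE_d": rrmse_d,
        "ABIAS": np.abs(bias_d).mean(), "RMSE": rmse_d.mean(),
        "ARBIAS": np.abs(rbias_d).mean(), "RRMSE": rrmse_d.mean(),
    }


# Made-up illustration with I = 2 iterations and D = 2 areas
res = performance_measures([[3.0, 4.0], [1.0, 4.0]], [[2.0, 4.0], [2.0, 4.0]])
```

The per-area entries are the quantities shown in boxplots, whereas the aggregated entries correspond to the rows of the summary tables.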

Tables 1 and 2 show the aggregated performance measures of Simulation 1. Figure 1 presents the boxplots of the disaggregated relative performance measures. For $d=1,\ldots,D$ , the EBLUPs of $\mu_{dT}$ are basically unbiased. In contrast, but not surprisingly, the predictors of $\mu_{d(T+1)}$ exhibit small biases that can be positive or negative. However, their relative biases are smaller than 7% in all cases. Accordingly, the root-MSEs are smaller for the EBLUPs of $\mu_{dT}$ than for the predictors of $\mu_{d(T+1)}$ . The relative root-MSEs take values around 15% and 25% for the first and second predictor respectively, which is a noticeable difference.

Table 1
Absolute performance measures for $T=20$

	$D=$	50	100	200	300
$T$	ABIAS	0.0033	0.0026	0.0029	0.0029
	RMSE	0.3719	0.3709	0.3706	0.3707
$T+1$	ABIAS	0.1088	0.1590	0.1676	0.1711
	RMSE	0.6214	0.6386	0.6416	0.6431

Table 2
Relative performance measures for $T=20$

	$D=$	50	100	200	300
$T$	ARBIAS	0.1286	0.1038	0.1197	0.1191
	RRMSE	15.0467	15.1342	15.1978	15.2136
$T+1$	ARBIAS	3.4212	5.9412	6.4489	6.6122
	RRMSE	24.2923	25.2625	25.5702	25.6595

Figure 1.
Boxplots of $\textit{RBIAS}_{d}$ (left) and $\textit{RRMSE}_{d}$ (right), $d=1,\ldots,D$ , $D=100$ .

5.2 Simulation 2: Performance of the MSE estimate

Simulation 2 studies the performance of the proposed bootstrap-based MSE estimator of our nowcast predictor $\hat{\mu}_{d(T+1)}$ . More specifically, the bootstrap estimator, $\textit{mse}_{d(T+1)}^{*}$ , of $\textit{MSE}_{d(T+1)}=(\textit{RMSE}_{d(T+1)})^{2}$ is investigated. The bootstrap algorithm (Algorithm B) was introduced in Section 4.1. To evaluate its performance, the proposed MSE estimator is compared with the Monte Carlo MSE of $\hat{\mu}_{d(T+1)}$ obtained in Simulation 1. The simulation procedure is as follows.

1.
For $D=100$ , take the values $\textit{MSE}_{d(T+1)}$ from the output of Simulation 1.

2.
Repeat $I=500$ times ( $i=1,\ldots,I$ )

2.1.
Generate a sample of size $D(T+1)$ .
2.2.
Fit the model to the data $(y_{dt}^{(i)},\bm{x}_{dt}^{\prime})$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ . Calculate the REML estimators $\tilde{\beta}_{1}^{(i)}$ , $\tilde{\beta}_{2}^{(i)}$ , $\tilde{\sigma}_{1}^{2(i)}$ , $\tilde{\sigma}_{2}^{2(i)}$ , $\tilde{\phi}^{(i)}$ .
2.3.
Repeat $B=500$ times $(b=1,\ldots,B)$

2.3.1.
Generate $v_{d}^{(ib)}$ , $u_{dt}^{(ib)}$ , $e_{dt}^{(ib)}$ , $d=1,\ldots,D$ , $t=1,\ldots,T+1$ , by using $\tilde{\sigma}_{1}^{2(i)}$ , $\tilde{\sigma}_{2}^{2(i)}$ , $\tilde{\phi}^{(i)}$ instead of $\sigma_{1}^{2}$ , $\sigma_{2}^{2}$ , $\phi$ .
2.3.2.
Generate the bootstrap target variables

$\displaystyle y_{dt}^{(ib)}=\tilde{\beta}_{1}^{(i)}+\tilde{\beta}_{2}^{(i)}x_{dt}+v_{d}^{(ib)}+u_{dt}^{(ib)}+e_{dt}^{(ib)},\quad d=1,\ldots,D,\ t=1,\ldots,T.$
2.3.3.
Calculate $\mu_{dt}^{(ib)}=\tilde{\beta}_{1}^{(i)}+\tilde{\beta}_{2}^{(i)}x_{dt}+v_{d}^{% (ib)}+u_{dt}^{(ib)}$ , $d=1,\ldots,D$ , $t=1,\ldots,T+1$ .
2.3.4.
Fit the model to the data $(y_{dt}^{(ib)},\bm{x}_{dt}^{\prime})$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ .

Calculate the REML estimators $\tilde{\beta}_{1}^{(ib)}$ , $\tilde{\beta}_{2}^{(ib)}$ , $\tilde{\sigma}_{1}^{2(ib)}$ , $\tilde{\sigma}_{2}^{2(ib)}$ , $\tilde{\phi}^{(ib)}$ .

Calculate the predictors of random effects $\tilde{v}_{d}^{(ib)}$ and $\tilde{u}_{dt}^{(ib)}$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ .

For the future time point $T+1$ , calculate the preliminary predictions $\tilde{\mu}_{d(T+1)}^{(ib)}=\bm{x}_{d(T+1)}^{\prime(i)}\tilde{\bm{\beta}}^{(% ib)}+\tilde{v}_{d}^{(ib)}$ , $d=1,\ldots,D$ .

2.3.5.
Fit the AR(1)-FHM to the data $(y_{dt}^{(ib)},\bm{x}_{dt}^{\prime})$ , $(\tilde{\mu}_{d(T+1)}^{(ib)},\bm{x}_{d(T+1)}^{\prime})$ , $d=1,\ldots,D$ , $t=1,\ldots,T$ .

Calculate the new REML estimators $\hat{\beta}_{1}^{(ib)}$ , $\hat{\beta}_{2}^{(ib)}$ , $\hat{\sigma}_{1}^{2(ib)}$ , $\hat{\sigma}_{2}^{2(ib)}$ , $\hat{\phi}^{(ib)}$ .

Calculate the new predictors $\hat{v}_{d}^{(ib)}$ , $\hat{u}_{dt}^{(ib)}$ , $d=1,\ldots,D$ , $t=1,\ldots,T+1$ .

For the future time point $T+1$ , calculate the EBLUPs

$\displaystyle\hat{\mu}_{d(T+1)}^{(ib)}=\bm{x}_{d(T+1)}^{\prime(i)}\hat{\bm{\beta}}^{(ib)}+\hat{v}_{d}^{(ib)}+\hat{u}_{d(T+1)}^{(ib)},\quad d=1,\ldots,D.$

2.4.
For $d=1,\ldots,D$ , calculate

$\displaystyle\textit{mse}_{d(T+1)}^{*(i)}=\frac{1}{B}\sum_{b=1}^{B}\left(\hat{\mu}_{d(T+1)}^{(ib)}-\mu_{d(T+1)}^{(ib)}\right)^{2}.$

3.
For $d=1,\ldots,D$ , calculate the absolute performance measures

$\displaystyle B_{d(T+1)}=\frac{1}{I}\sum_{i=1}^{I}\left(\textit{mse}_{d(T+1)}^{*(i)}-\textit{MSE}_{d(T+1)}\right),\quad AB=\frac{1}{D}\sum_{d=1}^{D}|B_{d(T+1)}|,$

$\displaystyle RE_{d(T+1)}=\left(\frac{1}{I}\sum_{i=1}^{I}\left(\textit{mse}_{d(T+1)}^{*(i)}-\textit{MSE}_{d(T+1)}\right)^{2}\right)^{1/2},\quad RE=\frac{1}{D}\sum_{d=1}^{D}RE_{d(T+1)}.$

Figure 2.
Boxplots of $\textit{RB}_{d}$ (left) and $\textit{RRE}_{d}$ (right), $D=100$ , $B=50,100,200,300$ .

4.
For $d=1,\ldots,D$ , calculate the relative performance measures (in %)

$\displaystyle RB_{d(T+1)}=100\frac{B_{d(T+1)}}{\textit{MSE}_{d(T+1)}},\quad\textit{ARB}=\frac{1}{D}\sum_{d=1}^{D}|RB_{d(T+1)}|,$

$\displaystyle\textit{RRE}_{d(T+1)}=100\frac{RE_{d(T+1)}}{\textit{MSE}_{d(T+1)}},\quad\textit{RRE}=\frac{1}{D}\sum_{d=1}^{D}\textit{RRE}_{d(T+1)}.$

Table 3 summarizes the results of Simulation 2. It shows that the absolute bias, $AB$ , remains essentially constant as the number of bootstrap iterations $B$ increases from $B=50$ to $B=300$ . However, the root mean squared error, $RE$ , slightly decreases as $B$ increases. Further, we see that the bias seems to be the main contributor to the root-MSE. Note that, to limit computational time, the number of iterations was $I=500$ for $B=50,100,200,300$ but just $I=100$ for $B=400,500$ .

Table 3
Results of Simulation 2 for $T=20$ , $D=100$

$B=$	50	100	200	300	400	500
AB	0.2262	0.2255	0.2254	0.2277	0.2299	0.2289
RE	0.2419	0.2337	0.2301	0.2310	0.2327	0.2314
ARB	35.5893	35.4850	35.4615	35.8263	36.1565	36.0182
RRE	38.0313	36.7575	36.1794	36.3405	36.5915	36.4064

Figure 2 presents the boxplots of the disaggregated relative performance measures. For $d=1,\ldots,D$ , the bootstrap MSE estimators present negative relative biases around 23%. The relative root-MSEs take values around 24% on average. As Table 3 shows that no noticeable improvement is achieved for $B$ greater than $B=300$ , we recommend running the bootstrap algorithm with about $B=250$ replicates.
6. Conclusions and extensions

Today, SAE is standard in many Statistical Offices for predicting indicators on highly disaggregated levels. The mixed-model-assisted methods are extremely popular, especially when it is easier to gather auxiliary information $\bm{x}$ than direct information. It remains a lively research area in official and applied statistics. It has many overlaps with biometrics, in particular for the analysis of longitudinal or clustered data. However, while models that account for time dependence have existed since the late nineties, now- or forecasting has been much less studied so far.

Having in mind the problem of nowcasting for highly disaggregated levels, we have revisited some ideas of SAE for prediction and time series modelling. Based on these ideas we have proposed straightforward extensions for nowcasting. The advantage of this strategy is that it provides nowcasting procedures for which well-studied methods and algorithms already exist. One may argue that none of these proposals contains an entirely new idea. Yet, the intention was not to develop (completely) new ideas, but to combine proven methods for tackling the nowcasting problem when a high level of disaggregation is wanted.

We have only discussed simple indicators and LMMs. For extensions to more complex indicators we refer to the book [16], which gives different examples of SAE for analysing poverty and inequality. For extensions to more complex models see [44, 45] for semiparametric extensions, or [18, 46] for details on random regression coefficient models. Extensions of SAE with time effects to generalized linear mixed models were introduced recently by [47, 6].

Footnotes

https://unstats.un.org/sdgs/indicators/indicators-list/.

From now on $\tilde{\cdot}$ refers to known $\bm{\theta}$ , and $\hat{\cdot}$ to $\hat{\bm{\theta}}$ .

While it is still common to speak of a census in official statistics, even those censi are typically imputations of particularly carefully conducted surveys.

In abuse of notation, $\bm{P}$ is used as replacement character with changing definitions.

Acknowledgments

We thank the organizers and participants of the CCS-UN Workshop on Nowcasting in International Organizations (February 2020 in Geneva) for their input and discussion. We are also thankful for helpful discussions with Kartarzyna Reluga. Financial support is acknowledged from projects 200021-192345 of the Swiss National Science Foundation, PGC2018-096840-B-I00 and PID2019-105986GB-C22 of the Spanish Ministry for Science and Investigation.

References

Banbura

Giannone

Reichlin

. Nowcasting. In: Clements

Hendry

, eds. In the Oxford Handbook on Economic Forecasting. Oxford University Press; 2011. pp. 63–90.

Mazzi

Ladiray

Rieser

, eds. Handbook on rapid estimates. United Nations and Eurostat, Publications Office of the European Union; 2017.

Longford

. Missing Data and Small-Area Estimation. Springer; 2005.

Rao

. Small-area estimation by combining time-series and cross-sectional data. The Canadian Journal of Statistics. 1994; 22(4): 511–528.

Esteban

Morales

Pérez

Santamaria

. Small area estimation of poverty proportions under area-level time models. Computational Statistics and Data Analysis. 2012; 56(10): 2840–2855.

Boubeta

Lombardía

Marey-Pérez

Morales

. Poisson mixed models for predicting number of fires. International Journal of Wildland Fire. 2019; 28(3): 237–253.

Ghysels

Sinko

Valkanov

. MIDAS regressions: further results and new directions. Econometric Reviews. 2007; 26(1): 53–90.

Hendry

Hubrich

. Combining disaggregate forecasts or combining disaggregate information to forecast an aggregate. Journal of Business and Economic Statistics. 2011; 29: 216–227.

Battese

Harter

Fuller

. An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association. 1988; 83(401): 28–36.

10.

Fay

Herriot

. Estimates of income for small places: an application of james-stein procedures to census data. Journal of the American Statistical Association. 1979; 74(366): 269–277.

11.

Morales

Santamaría

. Small area estimation under unit-level temporal linear mixed models. Journal of Statistical Computation and Simulation. 2019; 89(9): 1592–1620.

12.

Pfeffermann

. New important developments in small area estimation. Statistical Science. 2013; 28(1): 40–68.

13.

Rao

JNK

Molina

. Small area estimation. John Wiley & Sons; 2015.

14.

Morales

Esteban

Perez

Hozba

. A course on small area estimation and mixed models. Springer Series in Statistics. Springer New York; 2020.

15.

Tzavidis

Zhang

L-C

Luna

Schmid

Roajas-Perilla

. From start to finish: a framework for the production of small area official statistics. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2018; 181(4): 927–979.

16.

Pratesi

. Analysis of poverty data by small area estimation. Wiley Series in Survey Methodology. John Wiley; 2016.

17.

Jiang

. Linear and Generalized Linear Mixed Models and Their Applications. Springer; 2007.

18.

Prasad

NGN

Rao

JNK

. The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association. 1990; 85(409): 163–171.

19.

Datta

Lahiri

. A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems. Statistica Sinica. 2000; 10(2): 613–627.

20.

Das

Jiang

Rao

JNK

. Mean squared error of empirical predictor. The Annals of Statistics. 2004; 32(2): 818–840.

21.

Hall

Maiti

. On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2006; 68(2): 221–238.

22.

Chatterjee

Lahiri

. Parametric bootstrap approximation to the distribution of EBLUP and related prediction intervals in linear mixed models. The Annals of Statistics. 2008; 36(3): 1221–1245.

23.

Flores Agreda

. On the inference of random effects in Generalized Linear Mixed Models. University of Geneva; 2017.

24.

Yoshimori

Lahiri

. A second-order efficient empirical bayes confidence interval. The Annals of Statistics. 2014; 42(4): 1233–1261.

25.

Henderson

. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975; 31(2): 423–447.

26.

Kackar

Harville

. Unbiasedness of two-stage estimation and prediction procedures for mixed linear models. Communications in Statistics – Theory and Methods. 1981; 10(13): 1249–1261.

27. González-Manteiga, Lombardía, Molina, Morales, Santamaría. Bootstrap mean squared error of a small-area EBLUP. Journal of Statistical Computation and Simulation. 2008; 78(5): 443–462.

28. Kackar, Harville. Approximations for standard errors of estimators of fixed and random effect in mixed linear models. Journal of the American Statistical Association. 1984; 79(388): 853–862.

29. González-Manteiga, Lombardía, Molina, Morales, Santamaría. Estimation of the mean squared error of predictors of small area linear parameters under a logistic mixed model. Computational Statistics & Data Analysis. 2007; 51(5): 2720–2733.

30. González-Manteiga, Lombardía, Molina, Morales, Santamaría. Analytic and bootstrap approximations of prediction errors under a multivariate Fay-Herriot model. Computational Statistics & Data Analysis. 2008; 52(12): 5242–5252.

31. Butar, Lahiri. On measures of uncertainty of empirical Bayes small-area estimators. Journal of Statistical Planning and Inference. 2003; 112(1): 63–76.

32. Carpenter, Goldstein, Rasbash. A novel bootstrap procedure for assessing the relationship between class size and achievement. Journal of the Royal Statistical Society: Series C. 2003; 52(4): 431–443.

33. Hall, Maiti. Nonparametric estimation of mean-squared prediction error in nested-error regression models. The Annals of Statistics. 2006; 34(4): 1733–1750.

34. Laird, Ware. Random-effects models for longitudinal data. Biometrics. 1982; 38(4): 963–974.

35. Verbeke, Molenberghs. Linear Mixed Models for Longitudinal Data. Springer; 2000.

36. Datta, Lahiri, Maiti. Hierarchical Bayes estimation of unemployment rates for the states of the U.S. Journal of the American Statistical Association. 1999; 94(448): 1074–1082.

37. Esteban, Morales, Pérez, Santamaría. Two area-level time models for estimating small area poverty indicators. Journal of the Indian Society of Agricultural Statistics. 2012; 66(1): 75–89.

38. Marhuenda, Molina, Morales. Small area estimation with spatio-temporal Fay-Herriot models. Computational Statistics and Data Analysis. 2013; 58(1): 308–325.

39. Vaida, Blanchard. Conditional Akaike information for mixed-effects models. Biometrika. 2005; 92: 351–370.

40. Marhuenda, Morales, Pardo. Information criteria for Fay-Herriot model selection. Computational Statistics and Data Analysis. 2014; 70: 268–280.

41. Lombardía, López-Vizcaíno, Rueda. Mixed generalized Akaike information criterion for small area models. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2017; 180(4): 1229–1252.

42. Lombardía, López-Vizcaíno, Rueda. Selection of small area estimators. Statistics and Applications. 2018; 16(1): 269–288.

43. Esteban MDPA M D. R package saery: Small Area Estimation for Rao and Yu Model. R CRAN; 2015. Available from: https://cran.r-project.org/web/packages/saery/index.html.

44. Lombardía, Sperlich. Semiparametric inference in generalized mixed effects models. Journal of the Royal Statistical Society, Series B. 2008; 70(5): 913–930.

45. González-Manteiga, Martínez-Miranda, Lombardía-Cortiña, Sperlich. Kernel smoothers and bootstrapping for semiparametric mixed effects models. Journal of Multivariate Analysis. 2013; 114: 288–302.

46. Hobza, Morales. Small area estimation under random regression coefficient models. Journal of Statistical Computation and Simulation. 2013; 83(11): 2160–2177.

47. Hobza, Morales, Santamaría. Small area estimation of poverty proportions under unit-level temporal binomial-logit mixed models. Test. 2018; 27: 270–294.
