Abstract
Informed regional policy needs good regional data. As regional data series for key economic variables are generally absent whereas national-level time series data for the same variables are ubiquitous, we suggest an approach that leverages this advantage. We hypothesize the existence of a pervasive “common factor” represented by the national time series that affects regions differentially. We provide an empirical illustration in which national FDI is used in place of panel data for FDI, which are absent. The proposed methodology is tested empirically with respect to the determinants of regional demand for housing. We use a quasi-experimental approach to compare the results of a “common correlated effects” (CCE) estimator with a benchmark case when absent regional data are omitted. Using three common factors relating to national population, income and housing stock, we find mixed support for the common correlated effects hypothesis. We conclude by discussing how our experimental design may serve as a methodological prototype for further tests of CCE as a solution to the absent spatial data problem.
Introduction
Despite the celebrated data avalanche of recent years, regional economists are faced with the ironic situation that key regional economic data are invariably unobtainable: regional GDP (GRP), regional FDI, regional CPI, regional capital stocks and regional returns to capital are all important indicators of economic performance for which data are often absent. This situation has not escaped the attention of policy makers who often issue laborious guidelines in order to generate consistent regional cross sectional data, such as proportionally-allocated regional accounts, from national data such as national accounts (see for example EUROSTAT 2013). This paper proposes an alternative approach based on the use of panel data. We suggest using the common correlated effects estimator (CCE) to address the absent data situation whereby unavailable panel data are related to a common, typically national, factor that affects regions differentially.
In what follows, we refer to panel data that are available, but where individual observations happen to be missing, as “missing” data, and to panel data that are entirely unavailable as “absent” data. The latter problem is existential and therefore more serious. Numerous solutions to missing data have been proposed. If the panel data happen to be spatial, as they are here, various geo-statistical interpolation methods are available, such as spline functions or kriging (de Smith, Goodchild, and Longley 2007) or “small area” methods (Pfefferman 2002). When regional spatial panel data such as GRP and FDI are absent, researchers proxy them in various ways. For example, FDI has been proxied by regional employment (Haskel, Pereira, and Slaughter 2008), CPI by regional house prices and incomes (Beenstock, Ben Zeev, and Felsenstein 2011; Borooah et al. 1996), GRP by regional earnings (Tsionas 2000), and capital by GRP (Garofalo and Yamarik 2002; Gleed and Reese 1979). The alternative to proxying is to ignore the absent data altogether. Since the quality of the proxies is unknown, it is not clear which is the lesser evil.
In this paper, we explore a possible solution to the absent spatial panel data problem, which is based on the common correlated effects estimator (CCE, Pesaran 2006). This solution hypothesizes that the absent panel data are related to a common factor, which is represented by their aggregate time series, and regional factor loadings that are heterogeneous. For example, if panel data for gross regional products (GRP) are absent, but GRPs are heterogeneously related to GDP, the parameters of the model may be estimated by CCE using GDP as a common factor. We show that under weak assumptions, the parameter estimates are consistent and even unbiased. However, they would be more efficient had the panel data not been absent in the first place.
In summary, absent panel data are hypothesized to be related heterogeneously to their national or macro time series counterparts, for which data are usually available. This issue of data frequency is part of a more general discussion in spatio-temporal analysis with respect to synchronizing spatial and temporal frequencies in regional research (Park and Hewings 2012; Chung and Hewings 2019). Apart from the example of GRP and GDP, national time series data are available for numerous variables whose panel data are absent, such as FDI, capital and CPI. Hence, CCE, as a methodological solution to the problem of absent spatial panel data, may be widely applicable, because national time series data are naturally more available than panel data. If, however, the common factor hypothesis is false, CCE ceases to be a solution to the absent data problem. Since the common factor hypothesis is testable, its empirical corroboration provides indirect evidence that the absent panel data are related to the hypothesized common factor.
We begin in section 2 by describing the methodology. We distinguish between spatial data that are weakly or strongly dependent, as well as spatial panel data that are stationary or nonstationary. In section 3 we provide an empirical example of CCE in which absent spatial panel data for foreign direct investment are hypothesized to be related to national time series data for foreign direct investment. In section 4 we carry out experiments of the methodology in which a “true” model is estimated with spatial panel data, but subsequently CCE is used in an attempt to replicate the parameters of the true model by pretending that some panel data are absent. Section 5 concludes.
Methodology
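Consider the panel data model for N panel units and T time periods:

$$y_{it} = \alpha_i + \beta x_{it} + \gamma z_{it} + u_{it} \qquad (1)$$

where αi denotes the specific effects, the disturbance (u) is iid, and x and z are exogenous and stationary. Suppose panel data for z are absent. However, “macro” data for z, denoted by zt, are available.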
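Suppose the factor model for zit is:

$$z_{it} = \lambda_i z_t + \varepsilon_{it} \qquad (2)$$

where the regional factor loadings are denoted by λi and ε is iid and independent of u. Substituting equation (2) into equation (1) gives:

$$y_{it} = \alpha_i + \beta x_{it} + \gamma \lambda_i z_t + v_{it}, \qquad v_{it} = u_{it} + \gamma \varepsilon_{it} \qquad (3)$$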
Equation (1) cannot be estimated because data for zit are absent. If equation (1) is estimated without z, the estimate of β would be biased if z is correlated with x. We refer to this regression as OAD (omit absent data). What happens if equation (3) is estimated instead? We refer to this regression as CCE since equation (3) specifies common correlated effects. If N is sufficiently large, zt, which includes εit, is independent of εit, and hence of vit, by the law of large numbers. For example, GRP for region i is independent of GDP despite the fact that GRPi is a component of GDP, because the share of GRP in region i in GDP tends to zero as N becomes increasingly large. Equation (2) implies that the expected value or probability limit of zit equals λizt, in which case estimates of γλi are consistent and even unbiased. This claim results from the fact that the covariance between zt and vit tends to zero as N becomes increasingly large.
Although equation (1) does not include spatial lagged variables, equation (3) may include spatial lagged dependent variables as well as spatial Durbin lags. The spatial Durbin lag for z in equation (1) would be
An alternative to CCE is to perform a principal components analysis of the residuals generated by equation (1) as in Bai and Ng (2004). The principal components constitute common factors that may be used instead of zt in equation (3). The two approaches are not identical. However, CCE has the practical advantage of being easier to implement, and in many instances it is justified by economic theory, as discussed below.
In summary, equation (3), which is based on CCE, enables the estimation of the structural parameters, β and γ, as well as the specific effects. These estimates are less efficient than their hypothetical counterparts from equation (1) because the variance of v is obviously larger than the variance of u. However, estimation of equation (3) would only be justified if estimation of equation (1) without the absent panel data for z (γ = 0) induced strong cross section dependence in its residuals (u), as predicted by equation (3). Alternatively, if the factor loadings (λ) are zero, our CCE proposal would not be feasible.
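The contrast between OAD and CCE can be illustrated with a small simulation. The sketch below is not taken from the paper: the data generating process, the parameter values and the function name `simulate_oad_vs_cce` are all assumptions made for illustration. In particular, x is assumed to be correlated with z only through the national series zt, so that omitting z biases OAD while the CCE disturbance remains uncorrelated with x.

```python
import numpy as np

def simulate_oad_vs_cce(N=9, T=400, beta=1.0, gamma=0.5, seed=0):
    """Simulate equations (1)-(2) and estimate beta by OAD and by CCE (hypothetical DGP)."""
    rng = np.random.default_rng(seed)
    z_t = rng.normal(size=T)                        # national series (common factor)
    lam = rng.uniform(0.5, 1.5, size=N)             # heterogeneous loadings lambda_i
    delta = rng.uniform(0.5, 1.5, size=N)           # assumed link between x_it and z_t
    alpha = rng.normal(size=N)                      # regional specific effects
    z_it = lam[:, None] * z_t + rng.normal(size=(N, T))    # equation (2)
    x_it = delta[:, None] * z_t + rng.normal(size=(N, T))
    y = alpha[:, None] + beta * x_it + gamma * z_it + rng.normal(size=(N, T))  # equation (1)

    # Stack region by region: observation index is i*T + t.
    yv = y.reshape(-1)
    xv = x_it.reshape(-1, 1)
    D = np.kron(np.eye(N), np.ones((T, 1)))         # regional dummies (specific effects)
    F = D * np.tile(z_t, N)[:, None]                # regional dummies times z_t

    # OAD: omit z altogether. CCE: region-specific slopes on z_t, as in equation (3).
    b_oad = np.linalg.lstsq(np.hstack([xv, D]), yv, rcond=None)[0]
    b_cce = np.linalg.lstsq(np.hstack([xv, D, F]), yv, rcond=None)[0]
    return {
        "beta_oad": b_oad[0],
        "beta_cce": b_cce[0],
        "loadings_cce": b_cce[1 + N:],              # estimates of gamma * lambda_i
        "loadings_true": gamma * lam,
    }

if __name__ == "__main__":
    res = simulate_oad_vs_cce()
    print("OAD estimate of beta:", res["beta_oad"])
    print("CCE estimate of beta:", res["beta_cce"])
```

Under these assumptions the OAD estimate of β absorbs part of the omitted effect of z, whereas the CCE regression recovers β and the products γλi. If x were also correlated with the idiosyncratic component of z, neither estimator would be free of bias in this setup.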
Weak and Strong Cross-section Dependence
Spatial analysts have typically associated cross section dependence with spatial spillovers, spatial externalities, or “filtering” between different spatial units (Anselin 1988; White 1991). For example, a shock in one region will causally affect other regions with decreasing force as distance from the initial point of disturbance grows. This “weak” cross section dependence (Chudik, Pesaran, and Tosetti 2011) is often associated with contagion and constitutes a source of endogeneity. In his anatomy of the Reflection Problem in econometrics, Manski (1995) refers to this type of cross-section dependence as “endogenous effects.” Modeling the interactions between spatial cross sections is often a primary goal of spatial econometrics, especially within the context of spillovers or network effects (Debarsy and Ertur 2010).
Chudik et al. refer to CCE as “strong” cross-section dependence, because unlike its “weak” or spatial counterpart, cross-section dependence does not vary inversely with the distance between spatial units. Manski refers to this form of cross-section dependence as “contextual effects.” Unlike weak spatial dependence, contagion is not the issue here. Rather, cross section dependence arises because spatial units are affected to varying degrees by common causes. For example, house prices may be influenced by neighboring house prices (weak cross section dependence), or they may be influenced by weather conditions or unemployment rates (strong cross section dependence) that affect regions differentially. These common factors can be considered as nuisance variables introduced to capture information efficiently. In practice, panel data may combine both strong and weak cross sectional dependence and distinguishing between the two becomes a challenge (Bailey, Holly, and Pesaran 2016).
Under strong cross-section dependence,
Testing and Characterizing Cross Section Dependence
To test for cross-section dependence, we may use the Breusch and Pagan (1979) statistic:

$$BP = T \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} r_{ij}^{2} \qquad (4)$$
where rij denotes the ½N(N – 1) correlations between ui and uj in equation (1). If BP is less than its critical value, the null hypothesis of cross-section independence cannot be rejected. In this case CCE cannot solve the absent data problem because it assumes that cross-section dependence is present. If BP exceeds its critical value, the null hypothesis of cross-section independence may be rejected. Suppose this to be the case. Pesaran (2015) has proposed the following test to determine whether this cross-section dependence is weak or strong:

$$CD = \sqrt{\frac{2T}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} r_{ij} = \sqrt{\frac{TN(N-1)}{2}}\, \bar{r} \qquad (5)$$

where

$$\bar{r} = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} r_{ij} \qquad (6)$$

is the average pairwise correlation between the residuals.
The intuition behind the CD test is straightforward. If distance between spatial units does not matter for cross-section dependence, increasing N adds more remote spatial units to the sample. Since cross-section dependence does not weaken with distance when it is strong, the average correlation should not tend to zero with N. By contrast, if distance matters because cross-section dependence is weak, average correlations should tend to zero as increasingly remote spatial units are included in the sample. Epidemics eventually run their course so that cross-section correlations tend to zero. However, these correlations do not tend to zero if they are induced by common factors.
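The two diagnostics can be computed directly from a matrix of residuals. The following is a minimal sketch that assumes the standard textbook forms of the statistics (BP as T times the sum of squared pairwise correlations, CD as a scaled sum of the pairwise correlations); the function name `cross_section_dependence` and the simulated inputs are hypothetical.

```python
import numpy as np

def cross_section_dependence(resid):
    """Return (BP, CD, average correlation) for an (N, T) array of residuals."""
    N, T = resid.shape
    R = np.corrcoef(resid)                  # N x N matrix of pairwise correlations
    r = R[np.triu_indices(N, k=1)]          # the N(N-1)/2 distinct correlations r_ij
    bp = T * np.sum(r ** 2)                 # compared with chi-square, N(N-1)/2 d.o.f.
    cd = np.sqrt(2.0 * T / (N * (N - 1))) * np.sum(r)
    return bp, cd, r.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, T = 9, 200
    independent = rng.normal(size=(N, T))
    common = rng.normal(size=T)             # a common factor hitting every unit
    dependent = common[None, :] + rng.normal(size=(N, T))
    print("independent residuals:", cross_section_dependence(independent))
    print("common-factor residuals:", cross_section_dependence(dependent))
```

With nine regions there are 36 distinct correlations, so BP is compared with a chi-square distribution with 36 degrees of freedom. When a common factor drives the residuals, the average correlation stays away from zero and CD grows with T.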
Integrated Panel Data
Thus far we have assumed that the panel data in equation (1) are covariance stationary, i.e. their sample moments, such as means, variances and covariances, do not depend on T, N or both T and N. When panel data are nonstationary, estimates of the parameters in equation (1) may be spurious (Phillips and Moon 1999; Baltagi 2013, chapter 12). Beenstock and Felsenstein (2019, chapter 5) discuss the case of spatial nonstationarity, which arises when T = 1 and N becomes increasingly large. Panel cointegration tests for spurious regression have been proposed by Pedroni (1999) and others, under the assumption that the panel units are independent (cross section independence). The null hypothesis in these tests is that the residuals (u) in equation (1) are nonstationary. If the panel data are difference stationary and the residuals are also difference stationary, the parameter estimates are spurious. If, however, the null hypothesis is rejected, the parameter estimates are genuine (not spurious).
Kapetanios, Pesaran, and Yamagata (2011) extended CCE to nonstationary panel data. Critical values for rejecting the null hypothesis of spurious regression have been calculated by Banerjee and Carrion-I-Silvestre (2017); these are the strongly dependent counterparts to Pedroni’s critical values for independent panel data. For example, when there are 4 variables the critical value for panel cointegration (GADF) due to Pedroni is −2.75, and when there is one common factor the critical value due to Banerjee and Carrion-I-Silvestre is −2.25.
When the data used to estimate equation (3) are nonstationary, the parameter estimates are “super-consistent” if N is fixed (Beenstock and Felsenstein 2019, chapter 7). Since N typically comprises all the regions in a country rather than a sample of regions, N is naturally fixed, but T is not. Hence, the asymptotics depend on T alone. Instead of being √T-consistent, as when the data are stationary, the parameter estimates are T-consistent if the data generating processes (DGPs) are random walks, and they are T^(3/2)-consistent if the DGPs are random walks with drift (as they typically are). Super-consistency has important implications for equation (3). If the variables in equation (3) are panel cointegrated according to the critical values of Banerjee and Carrion-I-Silvestre, the covariance between zt and vit tends to zero rapidly because cointegration implies that vit is stationary whereas zt is nonstationary. Since zt and vit both depend on εit, the potential bias induced is rapidly mitigated as T increases.
In summary, CCE is expected to solve the absent spatial panel data problem more effectively, due to superconsistency, if the panel data happen to be nonstationary. Since spatial panel data tend to be nonstationary, this enhances the interest in CCE.
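The difference between √T-consistency and T-consistency can be seen in a stylized Monte Carlo experiment. The sketch below is an illustration under simplifying assumptions (a single time series rather than a panel, a regression through the origin, standard normal innovations); the function names and settings are hypothetical and do not come from the paper.

```python
import numpy as np

def median_abs_error(T, random_walk, reps=2000, b=1.0, seed=0):
    """Median absolute error of the OLS slope in y_t = b * x_t + u_t over many replications."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=(reps, T))
    x = np.cumsum(e, axis=1) if random_walk else e   # random walk or stationary regressor
    u = rng.normal(size=(reps, T))                   # stationary disturbance
    y = b * x + u
    b_hat = np.sum(x * y, axis=1) / np.sum(x * x, axis=1)
    return np.median(np.abs(b_hat - b))

def error_ratio(random_walk, T_small=100, T_large=400):
    """How much the typical estimation error shrinks when T is quadrupled."""
    return (median_abs_error(T_small, random_walk, seed=1)
            / median_abs_error(T_large, random_walk, seed=2))

if __name__ == "__main__":
    print("stationary regressor, error ratio:", error_ratio(False))
    print("random-walk regressor, error ratio:", error_ratio(True))
```

Quadrupling T should roughly halve the typical error when the regressor is stationary (rate √T) and roughly quarter it when the regressor is a random walk cointegrated with y (rate T).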
Absent Foreign Direct Investment
There are numerous empirical applications of CCE, which are not reviewed here. These applications assume that the cross section dependence is strong (induced by common factors) rather than weak (induced by spatial dependence). There are even more spatial econometric applications in which it is assumed that cross section dependence is weak rather than strong (Beenstock and Felsenstein 2019, chapter 10). In this section we focus on applications of CCE to solve the absent data problem. This is followed by a test of CCE in which we compare results estimated by CCE when the spatial panel data are not absent, i.e. parameter estimates of equation (1) are compared with their CCE counterparts in equation (3) when zit is artificially assumed to be absent.
There is widespread interest in the effect of FDI on regional economic development. However, the absence of data on regional FDI stocks has impeded the empirical analysis of regional polarization due to FDI. Indeed, we are aware of only three studies, all of which proxy FDI stocks in various ways. Haskel et al. (2008) proxy it with data relating to the share in regional employment of foreign-owned plants in the UK. This implicitly assumes that FDI stocks are proportionate to employment. They also implicitly assume that foreign-owned businesses are financed through FDI when they might have been financed in the domestic capital market. They show that productivity is higher in foreign-owned firms, suggesting a positive effect of FDI on labor productivity. Ascani and Gagliardi (2015) use regional FDI (not FDI stocks) provided by the Bank of Italy for 103 provinces, to show that regional R&D varies directly with the regional distribution of FDI. Finally, Casi and Resmini (2010, 2014) construct a dedicated regional FDI database for all EU NUTS2 regions (FDIregio) from micro (establishment-based) data obtained from a commercial source. These data, however, only count the number of foreign firms in a region regardless of their size, are discontinuous (1997–9, 2001–3 and 2005–7), and make no distinction between plants and firms.
Since FDI is a component of the balance of payments, its provenance by origin is recorded, but its location within the destination country is unrecorded. It is for this reason that data on the regional distribution of FDI are generally absent. The statistical authorities do not obtain data on where FDI was disbursed after being recorded in the balance of payments. Investment undertaken by foreign-owned businesses is part of FDI, but sources on foreign-ownership do not reveal how much investment was undertaken. If foreign-owned firms operate more than one plant, it is impossible to detect from their balance sheets how much FDI was invested in each plant. In short, the problem of generating data on regional FDI stocks seems to be insurmountable. Furthermore, the regional distribution of capital is unknown in all countries including the US, the UK, Japan and leading industrialized countries (Beenstock, Ben Zeev, and Felsenstein 2011; EU 2011). Therefore, it is hardly surprising that the regional distribution of FDI is unknown.
We illustrate the absent spatial panel data problem by drawing on Beenstock, Felsenstein and Rubin (2017), henceforth BFR, who use CCE to solve this issue for FDI by assuming that national FDI stocks are a common factor for regional FDI stocks in Israel. Their basic model is:

$$k_{it} = \alpha_i + \beta Z_{it} + \xi_i KFDI_t + X_{it}'\delta + u_{it} \qquad (7)$$

where k denotes log capital-labor ratios, αi denotes cross-section fixed effects, Z denotes the log stock of regional investment grants, KFDI denotes the national stock of FDI in logs, and X is a vector of demographic controls with coefficient vector δ. The loadings (ξ) are assumed to be heterogeneous as in equation (3). KFDI is treated as a common factor for absent panel data for KFDIit. Positive loadings imply that FDI induces capital deepening. If regional investment grants encourage capital deepening, β is expected to be positive. The annual data refer to nine regions in Israel (Figure 1) during 1987–2012. BFR show that these data are nonstationary. See BFR for motivation and data details.

Figure 1. Regional map of Israel.
BFR begin by estimating equation (7) by ignoring FDI altogether (Model 1, Table 1) as in an OAD regression. The X variables include real wages, schooling, age and the share of Jews in the regional working age populations. Model 1 in Table 1 suggests that the capital-labor ratio varies directly with wages and human capital, as measured by schooling and experience (proxied by age), but it varies inversely with the relative size of the Jewish population. Finally, the estimate of β is 0.204, i.e. the elasticity of k with respect to cumulative investment grants is about 0.2. Since this greatly exceeds the share of investment grants in the capital stock, the estimate of β indicates that regional investment grants induce capital-deepening. We do not report t-statistics for these parameter estimates because they have non-standard distributions. Indeed, despite the fact that they are large, the GADF statistic (−1.36) shows that model 1 is not panel cointegrated because it greatly exceeds its critical value of −3.32. Therefore, model 1 is a spurious regression. Nevertheless, when Z is dropped from model 1 (not shown) GADF increases to −0.83, suggesting that investment grants might nevertheless have a role in determining capital-labor ratios. The BP statistic for the residuals of model 1 is clearly statistically significant, so we may confidently reject the hypothesis of cross-section independence between the residuals. Since the CD statistic easily exceeds its critical value, we may reject the hypothesis that the cross-section dependence is weak.
Model 2 is the same as model 1, except that it specifies two-way fixed effects, which is why GADF* becomes more negative. However, model 2 is not panel cointegrated either. The estimate of β continues to be positive, but when Z is dropped from model 2 (not shown) GADF remains almost unchanged. Moreover, if model 2 is estimated with one-way fixed effects the estimate of β is negative (not shown) and GADF = −1.77. In model 2 cross-section dependence continues to be significant and strong. However, the residuals are negatively correlated (r-bar = −0.119) instead of positively correlated.
Table 1. Panel Regressions for Equation (7).
Note: Regressand is lnkit. Estimation by EGLS-SUR. GADF* denotes the critical value for GADF at p = 0.05 from Pedroni (1999) for models 1–2 and Banerjee and Carrion-I-Silvestre (2017) for models 3–4.
Model 3 estimates equation (7) by specifying the log of KFDI as a common factor. It induces a discrete reduction in GADF from −1.67 to −2.89, which suggests that KFDI might have a role in determining capital-deepening. Model 3 is panel cointegrated because GADF (−2.892) is less than its critical value (−2.31). Model 3 does not specify time fixed effects, because the common factor is a time series, which largely substitutes for time fixed effects. Since the BP statistics in models 2 and 3 are almost identical, the specification of the common factor in model 3 does not reduce cross section dependence. Moreover, the evidence of strong cross section dependence is greater in model 3 than in model 2 because the absolute value of CD is larger. Its sign change implies that the average cross section correlation of the residuals increases from −0.119 to 0.522. Despite the specification of a common factor in model 3, the CD statistic more than doubles because the average correlation increases from 0.248 (model 1) to 0.522.
Model 4 estimates equation (7) with the addition of a second common factor (k), which induces further reductions in GADF to −3.18. Its critical value is naturally more severe (−2.54) than for model 3. The additional common factor reduces cross section dependence (BP falls from 276 to 97) as well as evidence of cross section dependence (CD decreases from 15.39 to −0.094). Since the critical value of chi square for BP is approximately 50 (p = 0.95), and the critical value for CD is 1.64, the outstanding cross section dependence is significant and weak.
Estimates of factor loadings for model 3 in Table 1 are reported in Table 2 where we do not report standard errors and t-statistics because these loadings have non-standard distributions. For example, the estimated loading for Center is 0.213, which implies that the elasticity of the capital-labor ratio in Center with respect to the national stock of FDI is 0.213. The most sensitive region is North where the elasticity is 0.359, and the least sensitive is Krayot where the elasticity is 0.03. See BFR for discussion.
Table 2. Factor Loadings for National FDI Stock.
In summary, the OAD regressions (models 1 and 2) show that there is strong cross section dependence in the residuals, which suggests that there might be a role for applying CCE as a solution to the absent data problem. The OAD regressions turn out to be spurious because their residuals are nonstationary. Matters are different for CCE since the variables in models 3 and 4 are cointegrated. These results support the idea of using national time series for FDI as a common factor for absent regional panel data for FDI. On the other hand, the specification of this common factor failed to mitigate cross section dependence and induced strong cross section dependence. When the national capital-labor ratio is added as a common factor, the cross section dependence is greatly reduced and the evidence of strong cross section dependence is eliminated. We therefore think that using CCE to overcome the absent panel data problem is reasonable, and is certainly better than the default of ignoring the problem. On the other hand, there is no way of knowing whether the parameter estimates obtained by CCE would have been the same as, or similar to, those obtained had the data not been absent.
Experiments to Test CCE
The properties of estimators are typically investigated by Monte Carlo methods. For example, an experiment could be designed in which the factor loadings are zero in equation (2). The focus of the exercise would be to calculate the probability of estimating a false factor model. Alternatively, we could presume the factor model in equation (2) to be true, in which case the objective would be to calculate the probability of falsely rejecting CCE. Since we have no priors about CCE we do not go down this road. Instead, we use results “taken from life” in which the spatial panel data are not absent, and use CCE under the artificial pretense that they are absent. We carry out experiments by presuming spatial panel data are absent when in fact these data are available and have been used to estimate equation (1). We compare results obtained by estimating equation (3) with the “true” results obtained from equation (1). The question is, do the CCE results from equation (3) replicate those obtained from equation (1)? Also, is CCE the lesser evil, or would it have been better to ignore the absent data problem altogether as in OAD?
To answer these questions we use results from a “true” model estimated with spatial panel data and compare them to their OAD and CCE counterparts. For these purposes we use an example relating to the regional demand for housing. Equation (1) is based on the equation for house prices taken from Beenstock, Felsenstein, and Xieer (2018), henceforth BFX, which was estimated by maximum likelihood instead of EGLS. This equation features as model 1 in Table 3, which is an inverted demand equation for housing space, and implies that regional housing demand varies directly with regional population and income, and spatial population and house prices. The motivation and data for model 1 are fully described in BFX. The experiments appear as models 2–4 in Table 3. While the “true” model is estimated with spatial panel data, the added value of the experiments lies in their attempted replication of the parameters of model 1 using CCE as a substitute when panel data are artificially withheld from the estimation process.
Table 3. House Price Equations.
Dependent variable: ln(house prices per square meter). Logarithms of variables. Estimation by EGLS-SUR. Observations 1987–2015 for nine regions in Figure 1. Standard errors in parentheses. Asterisks denote spatial lagged variables.
In Table 3 GADF, BP and CD refer to the residuals of the reported model. GADF-OAD, BP-OAD and CD-OAD refer to the residuals of Model 1 when the absent panel data are omitted as in OAD. For example, in Model 2 spatial panel data for population are assumed to be absent, which is why there are no reported parameter estimates for population or its spatial lag. The national population is hypothesized as a common factor for panel data for regional populations, which for experimental purposes are assumed to be absent. The estimated factor loadings for Model 2 may be found in Table 4. The GADF statistic of Model 2 is −2.69, for which the critical value from Banerjee and Carrion-I-Silvestre (2017) is −2.31 (p = 0.05). Hence, the variables in model 2 are cointegrated. The OAD statistics serve as the relevant benchmarks for the CCE results in Table 3 because the default is to estimate equation (1) without population data. BP-OAD for Model 2 (211) greatly exceeds its chi square critical value (approximately 50). Therefore, the residuals are clearly cross section dependent in the OAD regression. CD-OAD = −3.17, which clearly rejects the null hypothesis of weak cross section dependence, suggesting that on average the residuals are negatively correlated (−0.144) according to equation (6), and a common factor may be present.
Table 4. Experimental Factor Loadings.
Note: The models refer to Table 3. Standard errors in parentheses.
When the national population is specified as a common factor GADF decreases from −2.42 to −2.69, BP decreases from 211 to 204, but CD becomes more negative at −3.24 instead of −3.17. Therefore, specifying the common factor enhances the stationarity of the residuals and weakens their cross section dependence, on the one hand, but makes it more likely that the cross section dependence is strong, on the other hand. The estimated factor loadings of model 2 are reported in Table 4, four of which are positive, and the rest negative. Recall that positive loadings imply that the absent panel data are positively correlated with their common factor if γ is positive, and negatively correlated if γ is negative. According to model 2, γ is positive. The largest loading is for Tel Aviv (0.372) and the smallest is for Center (−0.261).
In summary, this experiment may be regarded as a failure because it does not reduce strong cross section dependence as expected. Also, the parameter estimates in model 2 are clearly different to their “true” counterparts in model 1. Furthermore, the parameter estimates in the OAD regression (0.233 for income, −0.043 for housing stock, and 0.957 for spatial house prices) are closer to the “truth” than their CCE counterparts.
In the next experiment we pretend that regional panel data for income are absent (column 3 in Table 3). This experiment is more successful because the parameter estimates are closer to the “truth,” BP = 144 is considerably smaller than BP-OAD = 215, and CD is slightly larger. However, the OAD parameters (0.225 for population, 0.449 for spatial population, −0.355 for housing, and 0.881 for spatial house prices) are closer to the “truth” than their CCE counterparts.
In the final experiment, we pretend that regional panel data for housing stocks are absent. This experiment is the least successful because although GADF = −2.21 is much smaller than its OAD counterpart (−1.20), it exceeds its critical value. Therefore, model 4 is close to being spurious. Also, the common factor increases cross section dependence because BP and CD increase, and the OAD parameters are closer to the “truth.” The CD statistic implies that the average correlation between the residuals is 0.79 in model 4, whereas in models 1–3 the average correlation is about −0.14. The estimated factor loadings for model 4 have the same sign (negative because house prices vary inversely with housing stocks), but the other loadings have mixed signs, and are, on average, negative.
These experiments may be criticized on the grounds that there is an implicit pre-test bias in the assumption that equation (1) is indeed the correct model. We cannot be sure that equation (3) is not the correct model, and that equation (1) is correct. The hypothesis in equation (1) is that panel data for z have a homogeneous effect on the dependent variable (y). The hypothesis in equation (3) is that the common factor for z has a heterogeneous effect on y. These are non-nested hypotheses since neither hypothesis is a special case of its rival. The same would apply if in equation (1) γ is heterogeneous rather than homogeneous. This means that models 2–4 in Table 3 might be preferable to model 1.
To investigate this possibility, we carry out non-nested J and JA tests (Davidson and MacKinnon 2009, chapter 15) for model 1 against its CCE rivals. For example, the J test specifies the predicted value of the dependent variable according to CCE in model 1, and vice-versa. The JA test is a refinement of the J test and has superior finite sample properties. Non-nested tests have four possible outcomes:
i) Model 1 explains house price behavior that CCE cannot explain, but CCE cannot explain house price behavior that model 1 cannot explain.
ii) CCE explains house price behavior that model 1 cannot explain, but model 1 cannot explain house price behavior that CCE cannot explain.
iii) Model 1 explains house price behavior that CCE cannot explain, and CCE explains house price behavior that model 1 cannot explain.
iv) Model 1 cannot explain house price behavior that CCE cannot explain, and CCE cannot explain house price behavior that model 1 cannot explain.
Model 1 encompasses CCE in outcome i) and is therefore preferable. CCE encompasses model 1 in outcome ii) and is therefore preferable. In outcomes iii) and iv) neither model encompasses its rival, so neither model is preferable.
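The mechanics of the J test and the mapping from its two t-statistics to outcomes i) to iv) can be sketched as follows. This is a minimal illustration using plain OLS on simulated cross-section data, not the panel cointegration setting of our experiments, and it does not implement the JA refinement. The function names, the simulated variables and the critical value of 1.96 are assumptions made for the example.

```python
import numpy as np


def ols(y, X):
    """Return OLS coefficients, t-statistics and fitted values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    dof = X.shape[0] - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov)), fitted


def j_stat(y, X_own, X_rival):
    """t-statistic on the rival's fitted values when added to the own model."""
    rival_fit = ols(y, X_rival)[2]
    tstats = ols(y, np.column_stack([X_own, rival_fit]))[1]
    return tstats[-1]


def outcome(t_cce_in_m1, t_m1_in_cce, crit=1.96):
    """Map the two J statistics to outcomes i) to iv)."""
    cce_adds = abs(t_cce_in_m1) > crit   # CCE explains what model 1 cannot
    m1_adds = abs(t_m1_in_cce) > crit    # model 1 explains what CCE cannot
    if m1_adds and not cce_adds:
        return "i"      # model 1 encompasses CCE
    if cce_adds and not m1_adds:
        return "ii"     # CCE encompasses model 1
    if cce_adds and m1_adds:
        return "iii"    # each explains something the other cannot
    return "iv"         # neither adds anything to the other


# Simulated example (hypothetical): y depends on both x1 and x2, but each
# rival model includes only one of them.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)
const = np.ones(n)
X_m1 = np.column_stack([const, x1])     # stands in for model 1
X_cce = np.column_stack([const, x2])    # stands in for the CCE rival

result = outcome(j_stat(y, X_m1, X_cce), j_stat(y, X_cce, X_m1))
```

Because each simulated model omits a regressor that the other contains, both J statistics are large and the example falls under outcome iii).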
Table 5. Non-Nested Tests of Model 1 and CCE.
Note: The J and JA statistics have t-distributions. GADF refers to the JA test.
Results are reported in Table 5. Outcome iii) applies in all cases, so model 1 is not preferable to its CCE rival, but nor is CCE preferable to model 1. On the other hand, the GADF statistics for CCE are smaller (more negative) than their counterparts for model 1, suggesting that the CCE models have cointegration properties superior to those of model 1. This means that CCE can account for nonstationary elements in the residuals for which model 1 cannot account, but model 1 cannot account for nonstationary elements in the residuals for which CCE cannot account. For example, from Table 3 GADF = −2.2 for model 1 and −2.69 for model 2 (CCE for population). When CCE is the rival to model 1, GADF decreases from −2.2 to −2.64. When model 1 is the rival to CCE, GADF hardly decreases, from −2.69 to −2.74. The same applies to the experiment with wage data (model 3 in Table 3). However, in the case of model 4, the experiment with housing stocks, CCE reduces GADF for model 1, but model 1 also reduces GADF for model 4 from −2.21 to −2.81.
Discussion
Informed regional policy needs good regional data. As regional data series for key economic variables are generally absent whereas national-level time series data for the same variables are ubiquitous, we suggest an approach that leverages this advantage. We hypothesize the existence of a pervasive common factor represented by the national time series that affects regions differentially. We provide an empirical illustration in which national FDI is used in place of panel data for FDI, which are absent. The proposed methodology is tested empirically with respect to the determinants of regional demand for housing. We use a quasi-experimental approach to compare the results of CCE estimation with a benchmark case in which absent regional data are omitted (OAD). Using three common factors relating to national population, income and housing stock, we find mixed support for the CCE hypothesis in the first and third instances but some support for the hypothesis in the second.
Before passing judgment on CCE as a solution to the absent panel data problem, many more experiments are required involving many different dependent variables, rather than just one, as here. Indeed, our experimental design may serve as a methodological prototype for further tests of CCE as a solution to the absent panel data problem. In the meantime, we think that our application of CCE to solve the absent data problem for foreign direct investment is of methodological interest, even if there is no way of establishing that it is correct. At the very least, we see CCE as an alternative to proxying absent data by other variables, which may be incorrect too. As Frost (1979) has noted, while researchers use proxies believing that they are highly correlated with the absent variable, under certain conditions proxying can in fact increase both the squared bias and the variance of the estimated coefficients. The default of ignoring the absent data problem altogether is no better a solution. Therefore, we suggest conservatively that CCE might be used as a robustness check to complement the widespread use of data proxies. More radically, non-nested tests suggest that CCE might be preferable to estimates that would have been obtained had the panel data not been absent.
Finally, if spatial panel data happen to be available at the annual frequency but not at the sub-annual frequency, a mixed frequency model (Koop, McIntyre, and Mitchell 2020) may use the available spatial panel data at the annual frequency, while CCE is used in place of the absent sub-annual spatial panel data. Suppose, for example, in the house price experiment that annual spatial panel data are available for all variables, but sub-annual data are available for all variables apart from housing stocks. In a mixed frequency model the annual data would be used to estimate the annual model, while CCE would be used in the sub-annual model in place of absent sub-annual data for housing stocks.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
