Abstract
Policymakers and healthcare service managers demand reliable, accurate and disaggregated information about child deaths at the sub-national level to plan and monitor healthcare service delivery and health outcomes. In support of this demand, this research aimed at providing reliable local municipality estimates of the under-5 mortality rate (U5MR) in South Africa. The paper used a small area estimation approach to improve the precision of local municipality estimates of U5MR by linking data from the 2016 Community Survey (CS) and the 2011 Population Census (PC). The diagnostic measures and validation of the reliability of the results showed that the local municipality estimates of U5MR produced by small area estimation are more efficient and precise than direct estimates of U5MR based only on the CS data. Further, accurate and cost-effective local municipality estimates of U5MR were produced without the need for more resources through combining the available data sources. This was achievable since the research did not require a separate survey for this purpose. The results can be used to monitor U5MR at the local level in South Africa since they link directly with the Sustainable Development Goals (SDGs).
Introduction
Mortality indicators, especially the child mortality rate, are representative demographic indices and have been used as main indicators of quality of life for a particular country or region [1]. Global reductions in the under-5 mortality rate (U5MR), the probability that a child will die before reaching the age of five, have long served as an important proxy measure for overall improvements in population health and progress towards achievement of development goals [2].
There has been sustained global progress in reducing child mortality since 1990 [2]. The global number of under-5 deaths declined from 19.6 million in 1950 to 5.4 million in 2017 [2]. Similar to the rest of the world, national official data indicate that South Africa has made substantial progress in reducing U5MR [3]. The number of child deaths has fallen substantially during the past decade, and U5MR in South Africa stood at 37–40 per 1 000 live births in 2015 [3]. Reducing U5MR remains one of South Africa's main goals: SDG target 3.2 aims to reduce U5MR to 25 or fewer per 1 000 live births by 2030 [3].
Nevertheless, the progress in reducing U5MR at the sub-national level (e.g., district, municipality, ward) remains unclear. In other words, there has been very limited effort in obtaining sub-national U5MR estimates in South Africa. This is unfortunate, as these estimates provide more localized spatial distribution [4]. For example, regarding economic history, one can learn new information from the spatial distribution of mortality rates in the sub-national administrative levels [4]. Sub-national estimates could be used as a basis for identifying marginalized households and communities to ensure the equitable and inclusive provision of assistance to the ones most in need. Furthermore, the U5MR in local municipalities is also an essential tool for monitoring progress towards the SDGs to ensure that certain districts or municipalities are not altogether left behind.
The aim of this paper is twofold. First, the primary aim is to account for inequality in U5MR between local municipalities in South Africa. In other words, we report the estimated U5MR in South Africa using a small area estimation method, underpinned by an approach that does not receive much attention in the mortality literature. Second, this paper analyzes geographical (spatial) variations of U5MR in local municipalities using spatial mapping. It is vital for decision-makers at all levels to better understand the spatial heterogeneity in child survival.
From the above aims, it is possible to summarize the following questions: (i) How does the inequality in U5MR vary across local municipalities? (ii) How is precision improved when model-based U5MR is used instead of direct survey estimates? (iii) What policy implications can be proposed based on this study? To explore these questions, this research relied on the 2016 Community Survey (CS) and the 2011 Population Census (PC) of South Africa.
In a nutshell, this study contributes to the limited literature on U5MR in South Africa, where no significant studies are yet available that examine mortality inequality between local municipalities using the above two South African data sets. In order to obtain local municipality estimates of U5MR, this research applies the Fay-Herriot (FH) and spatial FH models [5]. The advantage of using these models is that they require only summary data (area-level aggregate data), not unit-level data that might not be available to the researcher due to confidentiality concerns [6].
The remainder of the paper is structured as follows. Section 2 describes the data sources and model specifications. In Section 3, the results are provided, while Section 4 discusses the findings. The conclusions are then presented in Section 5.
Materials and methods
Data sources
South Africa has nine provinces, which in turn are divided into district municipalities and then local municipalities. We applied the FH and spatial FH models to derive local municipality estimates of U5MR using the available birth history data from the 2016 CS and the 2011 PC. The following variables were required to perform this analysis.
The target variable was drawn from the 2016 CS conducted by Statistics South Africa (Stats SA). The main goal of this survey was to provide indicators such as population count, household count, fertility, mortality, migration, employment, unemployment and the extent of poverty in households. The CS data were accessed from the Stats SA website.
The auxiliary variables were categorical (i.e., proportions of individuals in each category). Around 40 covariates were selected from the 2011 PC for consideration in the modelling. These were gender (male, female); race (Black African, Coloured, Indian or Asian, and White); age (0–14, 15–24, 25–34, 35–44, 45–54, 55–64 and 65–120); employment status (employed, unemployed, not economically active, head employed, head unemployed and head not economically active); employment sector (formal sector, informal sector and private household); marital status (married, living together like married partners, never married, widower/widow, separated, divorced); education level (no schooling, some primary, completed secondary, some secondary, grade 12, tertiary and other); urban area; farm area; and income in South African Rand (no income, R 1–R 76 800, R 76 801–R 614 400, and R 614 401 or more). Reference categories were removed from the analysis since the proportions of individuals across the categories of each auxiliary variable sum to one [7, 8]. The 2011 South African PC data were accessed from the DataFirst website.
Out of these datasets, suitable covariates were chosen using correlation analysis followed by step-wise regression analysis [9]. First, correlation analysis was run to determine, for example, whether one of the covariates (e.g., the unemployed category) has a reasonably good correlation with U5MR. This assessment was repeated for all pairs of target and auxiliary variables.
Finally, the following variables were identified for further analysis: Black African, no schooling, some primary attained, grade 12, married, divorced, farm area, head unemployed, formal sector, no income and R 614 401 or more.
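The correlation screening step described above can be illustrated with a minimal Python sketch. The function name, the covariate labels and the toy numbers are hypothetical and are not taken from the 2016 CS or the 2011 PC; the step-wise regression stage is not shown.

```python
import numpy as np

def rank_covariates_by_correlation(y, X, names):
    """Return (name, Pearson r) pairs sorted by |r|, strongest first."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Pearson correlation between the target and each covariate column.
    corrs = [float(np.corrcoef(y, X[:, j])[0, 1]) for j in range(X.shape[1])]
    order = np.argsort(-np.abs(np.array(corrs)))
    return [(names[j], corrs[j]) for j in order]

# Toy area-level data (illustrative only).
u5mr = [10.0, 12.0, 15.0, 20.0, 25.0]
covariates = np.array([
    [0.10, 0.30],
    [0.12, 0.10],
    [0.15, 0.25],
    [0.20, 0.20],
    [0.25, 0.15],
])
ranking = rank_covariates_by_correlation(
    u5mr, covariates, ["head_unemployed", "grade_12"]
)
```

Covariates near the top of such a ranking would then be candidates for the step-wise regression stage.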
Methodology
The FH model has been widely used in small area estimation for a variety of reasons: (i) it uses area level summary data and not unit-level data that might be unavailable to the analyst, because of confidentiality concerns [6]; (ii) its simplicity [10]; and (iii) its ability to provide design-consistent estimators [10].
Consider a finite population of size
The FH area-level model
The widely-used FH model [5] consists of two levels as follows:
Sampling Model: $\hat{\theta}_i = \theta_i + e_i$
Linking Model: $\theta_i = \mathbf{x}_i^{T}\boldsymbol{\beta} + u_i$
where
where
When the unknown parameters in Model Eq. (1) are replaced by their estimators, then the Empirical Best Linear Unbiased Predictor (EBLUP FH here after) for each small area
where
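As a hedged illustration of how an EBLUP under the basic FH model combines the direct estimate with a regression-synthetic estimate, the sketch below uses the standard shrinkage form: a weight gamma_i = sigma2_u / (sigma2_u + psi_i) on the direct estimate and 1 - gamma_i on the synthetic part. The function name, argument names and toy numbers are assumptions made for this example, and the random-effect variance is taken as already estimated (e.g., by REML) rather than estimated here.

```python
import numpy as np

def fh_eblup(direct, psi, X, sigma2_u):
    """Shrinkage predictor under the basic area-level FH model.

    direct   : direct survey estimates, one per area
    psi      : sampling variances of the direct estimates (treated as known)
    X        : area-level covariates (first column of ones for an intercept)
    sigma2_u : random-effect variance, assumed already estimated
    """
    direct = np.asarray(direct, dtype=float)
    psi = np.asarray(psi, dtype=float)
    X = np.asarray(X, dtype=float)
    w = 1.0 / (sigma2_u + psi)              # inverse total variance per area
    # Weighted least squares estimate of the regression coefficients.
    beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * direct))
    gamma = sigma2_u * w                    # shrinkage factor in [0, 1]
    synthetic = X @ beta                    # regression-synthetic estimate
    eblup = gamma * direct + (1.0 - gamma) * synthetic
    return eblup, gamma, beta

# Toy inputs (illustrative only).
direct_toy = np.array([8.0, 12.0, 20.0, 30.0])
psi_toy = np.array([4.0, 1.0, 9.0, 16.0])
X_toy = np.column_stack([np.ones(4), [0.1, 0.2, 0.3, 0.4]])
eblup_toy, gamma_toy, beta_toy = fh_eblup(direct_toy, psi_toy, X_toy, sigma2_u=4.0)
```

Areas with a large sampling variance receive a small gamma and are pulled towards the synthetic estimate, which is how the model borrows strength from the auxiliary data.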
The FH model given in Eq. (1) assumes that the neighbouring small area estimates are spatially uncorrelated. In many applications, however, the random effects between the neighbouring small areas are correlated [6]. The proposed extensions of the FH model by [12] allow spatial association instead of independence between neighbouring small areas following a Simultaneous Autoregressive (SAR) process. This process is defined as [13]:
where
The spatial EBLUP (EBLUP SFH here after) estimator of
where
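For a SAR process of the form v = rho W v + u, with independent errors u of variance sigma2_u, the implied covariance of the area effects is sigma2_u [(I - rho W)^T (I - rho W)]^{-1}. The sketch below builds that matrix for a toy neighbourhood structure. The row-standardized proximity matrix, the function names and the three-area layout are assumptions for illustration, not details recovered from the source.

```python
import numpy as np

def row_standardize(adjacency):
    """Scale each row of a 0/1 adjacency matrix so that it sums to one."""
    adjacency = np.asarray(adjacency, dtype=float)
    return adjacency / adjacency.sum(axis=1, keepdims=True)

def sar_covariance(W, rho, sigma2_u):
    """Covariance of SAR random effects: sigma2_u * [(I - rho W)^T (I - rho W)]^{-1}."""
    W = np.asarray(W, dtype=float)
    A = np.eye(W.shape[0]) - rho * W
    return sigma2_u * np.linalg.inv(A.T @ A)

# Three areas in a row: area 1 borders area 2, area 2 borders area 3.
adjacency_toy = [[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]]
W_toy = row_standardize(adjacency_toy)
G_independent = sar_covariance(W_toy, rho=0.0, sigma2_u=2.0)
G_spatial = sar_covariance(W_toy, rho=0.5, sigma2_u=2.0)
```

With rho = 0 the covariance reduces to sigma2_u times the identity, which corresponds to the non-spatial FH model; a positive rho induces positive covariance between neighbouring areas.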
The most common practical problem in small area estimation is measuring the variability associated with the EBLUP FH and EBLUP SFH. The variability under these estimators was measured using mean squared error (MSE). For example, for the REML estimator, an unbiased analytical estimator of the MSE is:
Details about the specification of
Since the relationship between U5MR and the auxiliary variables is non-linear on the original scale, the log-transformed FH and spatial FH models are considered in this application. Following [18, 19], the direct estimator and its variance can be transformed as follows:
where
The EBLUP FH estimator in the transformed scale can be obtained as follows [18, 19]:
where
where
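The log transformation and a back-transformation can be sketched as follows. The forward step uses the first-order delta method (the variance of log(theta_hat) is approximately var(theta_hat) / theta_hat^2). The back-transformation shown is the log-normal mean correction exp(m + v / 2) with v = sigma2_u (1 - gamma); this is one common choice and is an assumption here, since the exact expressions from [18, 19] are not reproduced in this text. All names are hypothetical.

```python
import numpy as np

def to_log_scale(direct, var_direct):
    """Delta-method transformation of direct estimates and their variances."""
    direct = np.asarray(direct, dtype=float)
    var_direct = np.asarray(var_direct, dtype=float)
    return np.log(direct), var_direct / direct**2

def back_transform(eblup_log, sigma2_u, gamma):
    """Log-normal bias-corrected back-transformation (assumed form)."""
    eblup_log = np.asarray(eblup_log, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    return np.exp(eblup_log + 0.5 * sigma2_u * (1.0 - gamma))

# Toy values (illustrative only).
z_toy, var_z_toy = to_log_scale([10.0, 20.0], [4.0, 16.0])
theta_toy = back_transform(z_toy, sigma2_u=0.0, gamma=[0.5, 0.5])
```

Simply exponentiating the log-scale predictor would underestimate the mean on the original scale, which is why a correction term of this kind is typically added.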
In a similar manner to the FH model, the spatial FH model was fitted by using
In addition to the point estimates, the prediction intervals were computed using the following equations:
where
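As a simple illustration of an interval built from a point estimate and its estimated MSE, the sketch below uses the normal-theory form estimate ± z × sqrt(MSE). Whether the source computes its intervals on the log scale before back-transforming is not recoverable from this text, so the form and the function name are assumptions.

```python
import math

def normal_interval(estimate, mse, z=1.96):
    """Normal-theory interval: estimate +/- z * sqrt(mse)."""
    half_width = z * math.sqrt(mse)
    return estimate - half_width, estimate + half_width

# Toy values (illustrative only): estimate 10 with MSE 4.
lower_toy, upper_toy = normal_interval(10.0, 4.0)
```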
The goodness-of-fit statistics such as log-likelihood, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were used to choose the best models (see [6]).
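For reference, the two information criteria are computed from the maximized log-likelihood as AIC = -2 logL + 2k and BIC = -2 logL + k log(n), where k is the number of estimated parameters and n the number of areas; smaller values indicate a better trade-off between fit and complexity. A minimal sketch with made-up inputs:

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian Information Criterion."""
    return -2.0 * loglik + k * math.log(n)

# Toy values (illustrative only, not the values in Table 1).
aic_toy = aic(loglik=-100.0, k=5)
bic_toy = bic(loglik=-100.0, k=5, n=100)
```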
In most cases, the sampling variance,
where
the factor
Model comparison based on the log-likelihood, AIC and BIC values
The dispersion plots for GVF fit.
The response variable is the direct survey estimate of U5MR. The FH and spatial FH models were fitted using U5MR as the response variable and the covariates chosen above as auxiliary variables. Table 1 presents the log-likelihood, AIC and BIC values of the log-transformed FH and spatial FH models. The goodness-of-fit statistics favoured the log-transformed FH model over the spatial FH model.
The regression parameters and their corresponding p-values are presented in Table 2. The
Estimates of the log-transformed FH model parameters
The direct, EBLUP FH and EBLUP SFH estimates of U5MR among the South African local municipalities are summarized in Table 3 using percentiles and the mean. The median value was 10.12; therefore, 50% of the local municipalities (about 107) showed an estimated U5MR greater than or equal to 10.12. Furthermore, the local municipality estimates of U5MR generated by the EBLUP FH approach range from 2.53 to 33.28, with an average of 11.66. The range of the EBLUP SFH estimates appeared to be slightly wider (2.23 to 33.34) than that of the EBLUP FH estimates (2.53 to 33.28). This is most likely due to the additional spatial information used in the spatial FH model [24].
Summary of direct and model based estimates of U5MR using the FH and spatial FH models
Bias diagnostics plots with 
The research considered bias diagnostics, the goodness of fit diagnostics, coefficients of variation (CV) and root MSE to validate the reliability of the small area estimates.
Bias diagnostics
Bias diagnostics were used to examine if the model-based estimates were less extreme when compared to the direct survey estimates [25, 6]. If the model-based estimate is unbiased, then a plot of this estimate against the direct survey estimate would show the scatter plot evenly centered around the unit line
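One simple way to quantify this diagnostic is to regress the direct estimates on the model-based estimates and check that the fitted intercept is close to zero and the slope close to one. The sketch below does this with ordinary least squares; the function name and toy numbers are hypothetical.

```python
import numpy as np

def bias_regression(model_based, direct):
    """OLS fit of direct = intercept + slope * model_based."""
    # np.polyfit returns coefficients from the highest degree down.
    slope, intercept = np.polyfit(model_based, direct, deg=1)
    return float(intercept), float(slope)

# Toy values (illustrative only): direct estimates scatter around the model-based ones.
model_toy = np.array([5.0, 10.0, 15.0, 20.0])
direct_toy = np.array([6.0, 9.0, 16.0, 19.0])
intercept_toy, slope_toy = bias_regression(model_toy, direct_toy)
```

A slope well below one would suggest that the model-based estimates are more spread out than the direct estimates, which is the opposite of the shrinkage one expects from a well-behaved model.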
Goodness of fit diagnostic
The goodness-of-fit diagnostic proposed by [26] tests whether the direct survey estimates and model-based estimates are statistically different. The hypotheses of interest can be stated as follows:
The Wald statistic for the goodness-of-fit diagnostic for the log-transformed FH model is given as follows
where
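A commonly used form of such a Wald statistic sums, over areas, the squared difference between the direct and model-based estimates divided by the sum of the direct variance and the model-based MSE; under the null hypothesis it is compared with a chi-square distribution whose degrees of freedom equal the number of areas. The sketch below assumes that form, which may differ in detail from the expression in the source, and uses hypothetical names and toy numbers.

```python
import numpy as np

def wald_statistic(direct, model_based, var_direct, mse_model):
    """Sum of squared differences scaled by var(direct) + MSE(model-based)."""
    diff = np.asarray(direct, dtype=float) - np.asarray(model_based, dtype=float)
    scale = np.asarray(var_direct, dtype=float) + np.asarray(mse_model, dtype=float)
    return float(np.sum(diff**2 / scale))

# Toy values (illustrative only): two areas.
w_toy = wald_statistic(
    direct=[10.0, 12.0],
    model_based=[9.0, 14.0],
    var_direct=[1.0, 3.0],
    mse_model=[1.0, 1.0],
)
```

A value of the statistic below the chosen chi-square critical value indicates no evidence that the model-based estimates differ systematically from the direct estimates.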
Summary of the CVs of direct and model-based estimates along with the PIs
The coefficient of variation of direct, EBLUP FH and EBLUP SFH estimates for each local municipality in South Africa.
To compare the precision of the EBLUP estimates with that of the direct survey estimates, the percent CV was computed. The CV expresses the sampling variability as a percentage of the estimate [25]. The CVs of the direct survey estimator were given as
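In general terms, a percent CV is the square root of the estimated variance (or MSE, for a model-based estimator) divided by the point estimate and multiplied by 100. A minimal sketch with hypothetical names and toy numbers:

```python
import math

def percent_cv(estimate, variance_or_mse):
    """Coefficient of variation as a percentage of the estimate."""
    return 100.0 * math.sqrt(variance_or_mse) / estimate

# Toy values (illustrative only): same point estimate, smaller MSE for the model-based estimator.
cv_direct_toy = percent_cv(10.0, 4.0)
cv_model_toy = percent_cv(10.0, 1.0)
```

A lower CV for the model-based estimator at a similar point estimate is what indicates a gain in precision.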
Root MSE
The MSE of the model-based estimates and the MSE of the direct survey estimates should be comparable if the fitted model fits the data reasonably well [29, 6]. For this reason, this research examined the two MSEs (i.e., the MSE of the model based estimator and the MSE of the direct estimator). Figure 4 plots the estimated model-based root MSEs of the EBLUP FH and EBLUP SFH and the root MSE of the direct estimators of U5MR. It can be easily observed that the root MSE of model-based estimators tended to be smaller than the root MSE of the corresponding direct survey estimators. This is the case since minimizing the model-based MSE is one of the properties of the class of the model-based unbiased linear predictors [29, 6].
The root MSEs of direct and model based estimates for each local municipality in South Africa.
The plots of residuals versus estimates for the log-transformed FH model (top) and the log-transformed spatial FH model (bottom).
Checking model assumptions is crucial since the random effects in the FH model are assumed to be independently and identically distributed normal with a mean of zero [6]. In addition, the optimality properties of the model-based estimates depend on the extent to which the distributional assumptions hold [28]. For this research, standard diagnostics such as plots of residuals versus estimates (both direct and model-based) were used for this purpose. Figure 5 presents the standardized residuals versus model-based estimates (left) and standardized residuals versus direct estimates (right). The main takeaway from these plots is that both graphs show smooth patterns without anomalous observations [8].
Figure 6 presents the distribution of standardized residuals (left), histograms of standardized residuals (centre) and standard normal
Distribution of the standardized residuals (left), histogram of the standardized residuals along with its normal fit (centre) and q-q plots for the standardized residuals (right) for the log-transformed FH model (top plots) and spatial FH model (bottom plots).
Finally, Fig. 7 shows the spatial mapping of local municipality-wise U5MR produced using the EBLUP FH (top left) and EBLUP SFH (top right) estimates along with their corresponding CVs under the log-transformed FH (bottom left) and spatial FH (bottom right) models.
Map of U5MR in South Africa for the log-transformed FH model (left) and the log-transformed spatial FH model (right).
This article provides the local municipality estimates of U5MR in South Africa using the 2016 CS and the 2011 PC. In particular, since U5MR depends on the binary outcome coded as zero for dead and one for alive, neither linear mixed models nor unit-level model-based estimation approaches can be used [8]. “Instead, area-level models provide an easy-to-apply solution” [8]. Therefore, for this research, log-transformed FH and spatial FH models were applied to obtain local municipality estimates of U5MR in South Africa. These models have not received much attention in the mortality literature.
The regression analysis results indicated that auxiliary variables such as education level (some primary education, some secondary education and attainment of Grade 12), employment status (head unemployed), income level (no income and R 614 401 or more) and employment sector (formal) were strong predictors of U5MR. The results of this study are discussed here in light of related literature.
In the South African context, and indeed in other parts of the world, U5MR is linked to socio-economic factors such as poverty and/or lack of income, food insecurity, unemployment, education and global food price increases. In this study, lack of income was associated with higher U5MR, while high income (R 614 401 or more) was associated with lower U5MR. Several studies have demonstrated the relationship between mortality and poverty/lack of income. For example, [30, 31, 32] have argued that there is a positive relationship between poverty and mortality rates (particularly child mortality). Furthermore, [33] emphasized that the survival conditions of poor households are worse than those of non-poor households. In another study, [34] argued that persons living in poverty have higher mortality rates than those out of poverty. In other words, child mortality rates are lower among high income earners than among low income earners [35]. Education level (some primary education, some secondary education and attainment of grade 12) and employment status (head of household unemployed) were also strong predictors of U5MR. Mothers with relatively low levels of education are less likely to be employed, which prevents them from investing in better health outcomes for their children [36]. In addition, lack of income and/or poverty is the principal driver of South Africa’s food insecurity [37], and food insecurity is an important determinant of child mortality [38]. Food price increases due to economic crises can result in deaths in famine situations, which in turn lead to an increase in childhood mortality [39].
Which local municipality is most at risk?
The local municipality-wise U5MR produced by the EBLUP FH approach in South Africa ranges from a minimum value of 2.53 observed for the Drakenstein local municipality in the Western Cape province to a maximum value of 33.28 observed for the Khai-Ma local municipality in the Northern Cape province.
The lowest (
Direct, EBLUP FH and EBLUP SFH estimates of U5MR (per 1000 live births), CVs of direct, EBLUP FH and EBLUP SFH estimators (× 100) and 95% confidence intervals of EBLUP FH and EBLUP SFH estimates for the South African local municipalities with the lowest U5MR (≤ 4 deaths per 1000 live births) and highest U5MR (≥ 30 deaths per 1000 live births). For example, the lower and upper limits of the confidence interval for
based on the EBLUP FH estimate is obtained as:
, where
and
The reliability of the model-based estimates of U5MR was examined using bias diagnostics, goodness-of-fit diagnostics, CVs and root MSEs. These measures indicated that the model-based estimates are superior to the direct survey estimates, since the model-based estimates borrow strength from auxiliary variables from the 2011 PC [29]. Moreover, standard diagnostics such as plots of residuals versus estimates (both direct survey- and model-based) were used to assess the model assumptions. The empirical results from the model diagnostics have shown that the assumptions are satisfactorily met.
Figure 7 (top) maps the EBLUP FH (left) and EBLUP SFH (right) estimates of U5MR. These maps give a visual illustration of estimated local municipality-wise U5MR in South Africa. They provide an important source of information on the distribution of local municipality estimates of U5MR, with yellow corresponding to a better situation (a lower estimate of U5MR) and red to a higher estimate of U5MR. Figure 7 (bottom) maps the corresponding CVs of the EBLUP FH (left) and EBLUP SFH (right) estimates, with yellow indicating local municipalities with lower CVs. For example, in the northern part of South Africa (North West and Northern Cape provinces) there are many local municipalities with a high U5MR (shown in red), and the CVs are low (shown in yellow) in this part of the country. The Western Cape and Gauteng, in contrast, have many local municipalities with low U5MR (shown in yellow) and high corresponding CVs (shown in red). These maps are currently used to identify the policy priorities for reducing U5MR, for instance to allocate funds to local jurisdictions.
Conclusion and policy implication
Estimation of mortality-related variables at the sub-national scale is a challenging statistical problem because sample sizes are too small and simple parametric models may only poorly capture the relationships between these variables [42]. At the same time, there is a growing demand for accurate, reliable and cost-effective U5MR among policy- and decision-makers, as well as private and public sector administrators using data from different sources. As a consequence, many area- and unit-level small area models have been proposed in the literature [25]. In this context, the FH and spatial FH models have the potential to make a real impact by providing smoothed and stabilized estimates [4].
Having good small area estimates is important because most policies will generally be based on this type of information. The results indicate that the root MSEs and CVs of the direct survey estimates are generally larger than those of the model-based estimates. This is the case because the direct estimates of U5MR in local municipalities with a sparse sample in the 2016 CS do not provide adequate precision [42, 6]. From this, we can conclude that a model-based estimate of U5MR is more reliable than a direct survey estimate.
Both the local municipality estimates and the spatial map of U5MR have great potential in assessing mortality issues, as they give useful information for identifying municipalities with higher child mortality rates for targeted interventions. Most importantly, the local municipality level U5MR statistics are useful for (i) relevant Departments and Ministries of South Africa to implement policy interventions for targeted local municipalities with high U5MR, as well as for ensuring effective administrative and financial decisions (e.g., budget allocation), and (ii) international organizations for their strategic planning and policy research.
Furthermore, accurate and cost-effective U5MR statistics at the local municipality level were produced without the need for more resources/surveys through combining the available data sources (i.e., the 2016 CS and the 2011 PC).
Acknowledgments
The author would like to acknowledge Statistics South Africa for the 2016 CS data and DataFirst for the 2011 Census data. In addition, the author thanks the editor and two anonymous referees for their valuable comments and suggestions; these led to a considerable improvement in the article. All errors are mine.
