Abstract
Data on the number of health insurance participants at the subdistrict level is crucial since it is strongly correlated with the availability of health service centers in the areas. This study’s primary purpose is to predict the proportion of health and social security participants of a state-owned company named Badan Penyelenggara Jaminan Sosial Kesehatan (BPJS) in eleven subdistricts in Padang, Indonesia. The direct, ordinary least square, and hierarchical Bayesian for small area estimation (HB-SAE) methods were employed in obtaining the best estimator for the BPJS participants in these small areas. This study found that the HB-SAE method resulted in better estimation than two other methods since it has the smallest standard deviation value. The auxiliary variable age (percentage of individuals more than 50 years old) and the percentage of health complaints have a significant effect on the proportion of the number of BPJS participants based on the HB-SAE method.
Keywords
Introduction
One indicator of Indonesia’s medium-term goals is the proportion of the individuals registered in health and social security system namely BPJS Kesehatan (Badan Penyelenggara Jaminan Sosial Kesehatan, Health Social Security Agency), a state-owned company of aimed at providing universal and affordable health care to its citizens. Nationally, the number of the national health insurance participants in Indonesia has reached 221,580,743 people or 83.5% of all Indonesia’s citizen as of May 2019. In Padang, the capital city of West Sumatra province, BPJS recorded that until December 1, 2019, the number of people registered in the National Health Insurance for the Indonesia Health Card (JKN-KIS) program had reached 1,463,097 people or 80.3% of all population in Padang. These information regarding the number of BPJS registrants/participants, provided by BPS (Badan Pusat Statistik, Central Bureau of Statistics), are only presented at the city/district level. There is no detail information available regarding the number of BPJS participants at the subdistrict level in Padang and this has become an obstacle for local governments in policy-making for regional autonomy implementation.
Information regarding the number of BPJS participants at the subdistrict level is vital because it corelates strongly to the availability of health service centers in the areas. Thus, the reliability of small-area estimation is critical in making proper decision or policies. In the estimation of a characteristic of such a small group, a direct estimate based solely on data from the small group is likely to be unreliable, because only a small number of observations are available from the small group (Sugasawa & Kubokawa, 2020). The problem of small area estimation is how to generate a reliable estimate for a small group’s characteristic, and small area estimation has been actively studied from both theoretical and practical aspects due to an increasing demand for reliable small area estimates from public and private sectors. A study by Yoshimori and Lahiri (2014) proposed a new adjustment factor that rectifies the problems associated with the existing adjusted likelihood methods. Sugasawa et al. (2018) developed an empirical Bayesian approach in which the Monte Carlo expectation-maximization algorithm for computing the maximum likelihood estimator. Diallo and Rao (2018) suggested relaxing the normality assumption of Molina et al. (2014) and derive the Empirical Bayesian estimator, assuming that the random errors in the nested-error model follow skew-normal distributions. Tsujino and Kubokawa (2019) suggested the nested error regression model with skew-normal distributions error terms. Sugasawa et al. (2019) proposed the unmatched sampling and linking models with an unknown link function modeled by a P-spline. They provided a hierarchical representation of the proposed model. Yanuar et al. (2019) employed empirical Bayesian estimator for small area estimation which use Poisson Gamma as prior distribution. Sugasawa and Kubokawa (2020) reviewed small area estimation techniques using mixed models, covering from fundamental to recently proposed advanced ones. Few types of research have been done in the application or employment of this method, such as an application on poverty indicators (Molina et al., 2014) and forestry estimation (Ver Planck et al., 2018).
In this paper, hierarchical Bayesian models in small area estimation are employed for counts data. The SAE method using Bayesian specification are based on the Fay-Harriot model (Fay & Herriot, 1979), which considers a generalized linear Poisson model. You and Rao (2002) proposed a Normal-lognormal model within the class of the unmatched models. Recently, Nazir et al. (2016) has proposed a new improvement before hyperparameters of variance components and then consider a lognormal model. In this paper, a modification of the latter, a Binomial-log Normal model, is considered. Under appropriate conditions, any model could have some merits. Still, its appropriateness depends on various circumstances like the size of the areas, availability of suitable explanatory variables at the area level, the accuracy of sampling variance estimates, etc. The practical use of Bayesian hierarchical models has been boosted by software availability that implements MCMC simulations so that estimating the model can be straightforward and relatively easy.
This study aims to predict the proportion of the BPJS participants in the subdistrict level in Padang, Indonesia. The direct method, ordinary least square (OLS), and SAE-HB are employed to achieve the purpose. The three methods are compared to obtain the best estimator. The best approach is then implemented to predict the proportion in a subdistrict where the number of BPJS participants is unknown. The proportion of BPJS participants using the direct method cannot be predicted in the corresponding areas since not all subdistricts in Padang have complete information regarding the number of BPJS participants. Meanwhile, the OLS method is used for large data set and if the error term is not properly interpreted, the regression results are sensitive to functional form, which can lead to widely disparate conclusions depending on how the regression is initially set up. The SAE method with the HB approach is employed then. This method involves the auxiliary variables to forecast the proportion of the BPJS participants and its confidence interval. The confidence interval is essential for the BPJS party in allocating the benefit funds paid to customers in the future.
Materials and methods
Materials
This study uses data of 2018 BPJS participants in eleven subdistricts obtained from the Padang’s BPJS branch. Data from ten subdistricts are analyzed to predict the model as presented in Table 1. One subdistrict that is Bungus Teluk Kabung, has no information regarding the number of BPJS participants. The number of individuals registered in BPJS for this subdistrict (Bungus Teluk Kabung), will then be estimated by the proposed model obtained later.
The number of BPJS participants in each subdistrict in Padang (in thousand)
The number of BPJS participants in each subdistrict in Padang (in thousand)
As can be seen in Table 1, the highest number of BPJS participants is in Koto Tangah subdistrict with 174,000 participants, with the number of individuals living in this subdistrict is 193,000. Meanwhile, the lowest of BPJS participants is in Padang Barat, with only 40,000 participants.
Many studies have been published regarding factors affecting individuals enrolling in health insurance. Mahumud et al. (2017) explored the socioeconomic, demographic, and behavioral characteristics of an individual that impacted health insurance expenditures in Bangladesh. Duku (2018) identified age, sex, educational level, marital status, health status and, travel time to the nearest health facility as determinants of enrolment in health insurance. Salari et al. (2019) considered the variables that affect the individual to register in health insurance: wealth, marital status and, age.
This study considered age (percentage of individual more than 50 years old), sex (percentage of female), marital status (percentage of married people), educational level (percentage of an individual had at least senior high school), and percentage of health complaints, to be auxiliary variables. These auxiliary variables are assumed could improve the prediction of parameters to be estimated in this model, that is the proportion of the number of BPJS participants in each subdistrict in Padang. In a preliminary analysis, it was obtained that only age (percentage of individuals more than 50 years old) and percentage of health complaints, have significant correlation on response. Therefore, only two auxiliary variables are then considered in the hypothesis model.
In this research, the problem is how to estimate model parameters whose response has Binomial distribution with parameter
Direct estimation for binary response
Direct estimator for parameter
The likelihood function then can be constructed based on this pdf
By maximizing Eq. (2) then we obtain the parameter estimated for
In this study, the estimated proportion here refers to the proportion of BPJS participants in subdistrict
Ordinary least square
Ordinary least squares (OLS) is a linear least squares method used to estimate unknown parameters in a linear regression model. The principle of least squares is used by OLS to select the parameters of a linear function of a set of explanatory variables: minimizing the sum of squares of the differences between the observed dependent variable (values of the variable being observed) in the given dataset and those predicted by the linear function of the independent variable (Higgins & Thomas, 2019).
Hierarchical Bayesian in SAE for binary response
In this present study, hierarchical Bayesian is constructed using the Binomial Logit-Normal model to linearized the correlation between response and its auxiliary variables. It is assumed that response
The random effect
Meanwhile, the likelihood function for
The joint prior distribution for
In this present study, the parameter
The marginal posterior distribution for
Rao and Molina (2015) suggested employing Gibbs sampler to solve Eq. (10) and follow these steps:
where
and
The estimated value for each parameter is achieved by constructed marginal posterior distribution for the corresponding parameter, such as following:
The posterior distribution for the parameter
The posterior distribution for regression coefficients
The posterior distribution for a random effect
Since mean posterior and variance posterior for each parameter above could not be achieved analytically, a numerical approach using MCMC (Markov Chain Monte Carlo) is applied by generating random samples. Convergency tests of parameter estimated are based on trace plot, autocorrelation plot and Monte Carlo error (Yanuar, 2015).
For model selection and model comparison in Bayesian model, the most popular criteria is the Deviance Information Criteria (DIC) (Chan & Grant, 2016; Spiegelhalter, 2002). Assume that a model for observed data
with
Model construction
In this section, we fit the data by employing direct, ordinary least square and HB-SAE estimation methods to construct the estimator for a proportion of the number of BPJS participants at the subdistrict level in Padang. The estimation results of the three methods are then compared to identify the suitable estimator to predict the proportion of binary response cases, i.e., as a participant or not, denoted by
Parameter estimated for
The estimated values for
We then implement OLS to construct the proposed model first, which is used to predict the proportion of BPJS’s participants in Bungus Telung Kabung then. To do this analysis, we plot the trend between
We then reran several candidate models using the OLS approach. We hypothesized several combination models, including explanatory variables (which have a significant correlation with the response), i.e., age (percentage of individuals more than 50 years old,
Candidate models based on OLS method
The pooled trend between 
As shown in Table 2,
The proportion for Bungus Teluk Kabung could be predicted by substituting
We then employ the indirect estimation method or hierarchical Bayesian SAE for a binary response using the Binomial Logit-Normal link function. In this study, three hypothesis models are constructed to model the proportion of BPJS participants and several auxiliary variables. In SAE, the explanatory variables are known as auxiliary variables. Two auxiliary variables considered here are age (percentage of individuals more than 50 years old,
Model A: logit
The best model is chosen based on the smallest DIC value (Chan & Grant, 2016). Table 3 provides the estimated results for the regression coefficient (
Candidate models based on HB-SAE
Based on model C, the proportion of BPJS’s participants at each subdistrict based on HB-SAE method could be formulated as follows:
or
The proposed model in Eq. (14) then is used to estimate the mean of proportion, standard deviation of proportion, and 95% confidence interval for estimated mean of proportion for each district. The results of estimation are provided in Table 4.
Estimated proportion based on HB-SAE method
The subsequent analysis is the convergency test for each parameter estimated. The indicator to convergency test is based on trace plot, density plot and ACF (autocorrelation plot). Figure 2 provides the results of the convergency test for estimated parameters in the proposed model.
Figure 2 provides the trace plot in the right side, density plot in the middle and ACF plot in the left side for each parameter estimated. Based on the trace plot for all four parameters, it can be seen that the parameter estimation algorithm has converged because there is no specific pattern in the plot. From the density plot, it can be seen that the distribution of the sample has approached the normal distribution. This indicates that the convergence of the algorithm has been achieved. Meanwhile, from the autocorrelation plot, it can be seen that the autocorrelation values in the first lag approach one and then the autocorrelation values in the next lag continue to decrease to 0. This indicates that there is a weak correlation in the chain. This weak correlation suggests that the errors are uncorrelated and the algorithm is already in the target distribution area.
In this section, the results based on all estimation methods are compared to identify the best approach in estimating the proportion of BPJS’s participants in each subdistrict in Padang. The best one should have the lowest value of standard deviation. Table 5 provides the comparison result of estimated proportions based on all three methods for each subdistrict. The estimation is based on the direct method using Eq. (12), while Eq. (13) is used to construct the estimated proportion based on OLS model, and the estimated proportion based on HB-SAE is by using Eq. (14).
Estimated proportion and standard deviation based on direct method, OLS and HB-SAE
Estimated proportion and standard deviation based on direct method, OLS and HB-SAE
Trace plot, density plot, ACF plot for (a) 
Table 5 informs us that the standard deviation obtained from the SAE-HB method results in smaller values than other methods, indicated in bold. It could be concluded that the proportion of BPJS’s participants obtained based on SAE-HB using model C results in a better model than other models. Thus, the final model for the proportion of BPJS’s participants in Padang is formulated as:
After obtaining the best method with its proposed model to estimate the proportion of BPJS participants in each subdistrict in Padang. It has been informed that one subdistrict, that is Bungus Teluk Kabung has no complete information. The available data from this subdistrict is age (percentage of individual more than 50 years old,
This study identifies that the proportion of BPJS participants at Pauh and Bungus Teluk Kabung are less than 80%, i.e., the proportion at Pauh is 0.7432, and at Bungus Teluk Kabung is 0.7722. These results inform us that need to promote counseling on the importance of joining the BPJS program or health service quality should be improved in these subdistricts so that more individuals enroll in health insurance. This activity can be arranged based on the conditions of each sub-district then.
Conclusions
We have demonstrated the comparison result between classics method (direct and OLS method) and HB-SAE in estimating the proportion of BPJS participant in eleven districts in Padang, Indonesia. In this study, we assume that response is binomial distributed. We have proved that if we applied the direct and OLS method here, both methods resulted unsatisfactory estimated values. The direct method tends to result higher values of standard deviation and this method cannot result the prediction model. While OLS cannot estimate all values for proportion and its standard deviation.
This study found that the hierarchical Bayesian in small area estimation method yielded better estimation values than other methods. It is concluded that the small area estimation proposed in this study is suitable to be implemented in the case of binary data. The proposed model informed that percentage of individuals more than 50 years old and percentage of health complaints have a significant effect of improving the precision of mean response. Pauh and Bungus Teluk Kabung have lower proportion of BPJS participants than other districts, i.e., less than 80%.
Footnotes
Acknowledgments
This research was support by grant from DRPM, the Deputy for Strengthening Research and Development of the Ministry of Research and Technology/National Research and Innovation Agency of Indonesia, in accordance with Contract Number 163/SP2H/AMD/LT/DRPM/2020.
Conflict of interest
The authors declare no conflict of interest.
