Abstract
This study analyzes the effects of socio-economic factors on the real wage rates for male workers in India over the period 1983–2010. In particular, we examine the role of human capital by estimating the Mincerian wage equation. We construct a regional level pseudo-panel data set for our analysis. Our findings show that while the return to primary education is remarkably high, the returns to other, higher, levels of education are remarkably low for all of India taken together, becoming progressively lower as the level of education increases. These findings are in contradistinction to those of the other studies on returns to education in India, all of which, however, have relied on cross-sectional data for their analyses. We also find relatively little effect of caste, tribe and religion on real wage rates in India, suggesting that these factors may not be as important as is sometimes believed.
Introduction
The purpose of this article is to study the effects of socio-economic factors on real wage rates for male workers in India over the period 1983–2010. Caste, tribe and religion are thought to play important roles in Indian society. It is, therefore, of some importance to examine the role of these factors in real wage rate determination in India. Do they indeed outweigh the impact of education, which, among other things, presumably enhances worker skills and productivity? And if education is important, what precise level of education is critical or most important? Previous studies on returns to education in India have often reached contradictory conclusions in this regard. Dutta (2006), for example, found a U-shaped pattern of return for regular wage workers, with relatively lower returns for the primary level of education when compared to the secondary and graduate levels but higher than those at the middle level of education. Aggarwal (2012), by contrast, found that returns to education in his sample increased with increases in the level of education (see also, among others, Azam, 2012; Chamarbagwala, 2006; Duraisamy, 2002; Mehta and Hasan, 2012; Vatta et al., 2016). In this context, one may also note that Drèze and Sen (2002), in particular, have emphasized the vital importance of primary education for economic and human development.
Regarding the influence of caste and tribe status, Kijima (2006) found that the scheduled caste (SC) households had lower returns to education compared to the non-SC households. Madheswaran and Attewell (2007) reached very similar conclusions. Ito (2009) found job discrimination, but not wage discrimination, against members of the lower castes.
Most of the previous studies on returns to education in India have used only the cross-sectional information for the reference years chosen. However, the returns to education, as is well known, are likely to be closely correlated with unobservable factors such as individual abilities and/or motivations. There is an endogeneity problem here (Warunsiri and McNown, 2010), and to deal with it effectively, one needs, at the very least, a panel data set. For our analysis, we construct a pseudo-panel dataset based on the unit-level data available from India’s National Sample Survey Organisation (NSSO). The data set and the methodology we use in our analysis are explained in due course.
The plan of the rest of the article is as follows. The second section outlines the model we estimate. The third section describes the data and the variables used. The fourth section presents the results. The fifth section concludes.
The model
Following Heckman et al. (2006), we assume for simplicity that (i) the wage income (w) is determined by schooling years (s), (ii) an individual worker lives forever and (iii) there is no educational cost during the schooling years. The worker is interested in maximizing the sum of the discounted stream of wage income:
\[ \max_{s} \int_{s}^{\infty} w(s)\, e^{-\gamma t}\, dt \]
where γ is the interest rate. The first-order condition of this maximization problem is given by
\[ \frac{w'(s)}{w(s)} = \gamma \]
We obtain the following equation by integrating this condition with respect to schooling years (s):
\[ \ln w(s) = \gamma s + c \]
where c is a constant of integration. Thus, the semi-log equation is generated naturally by the dynamic optimization of a worker.
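The intermediate steps behind this result can be sketched as follows. The sketch assumes that the wage w(s) is earned at a constant rate from the date s at which schooling ends; this reading is consistent with assumptions (i) to (iii) but is an illustration, not the derivation as printed in the source.

```latex
% Sketch only: assumes the wage w(s) is earned at a constant rate from date s onward.
\begin{align*}
V(s) &= \int_{s}^{\infty} w(s)\,e^{-\gamma t}\,dt
      = \frac{w(s)}{\gamma}\,e^{-\gamma s},\\
\frac{dV}{ds} &= \frac{e^{-\gamma s}}{\gamma}\,\bigl[\,w'(s) - \gamma\,w(s)\,\bigr] = 0
      \;\Longrightarrow\; \frac{w'(s)}{w(s)} = \gamma,\\
\int \frac{w'(s)}{w(s)}\,ds &= \int \gamma\,ds
      \;\Longrightarrow\; \ln w(s) = \gamma s + c .
\end{align*}
```

Read this way, γ is the proportional wage gain from one additional year of schooling, since ln w(s + 1) − ln w(s) = γ.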
As is well known, the basic Mincerian wage equation is given by:
\[ \ln w_i = \alpha + \sum_{k} \beta_k D_{ki} + \varepsilon_i \tag{1} \]
where $\ln w_i$ is the natural logarithm of the wage for a given worker i, $D_{ki}$ is the dummy variable for the kth level of education and $\varepsilon_i$ is the error term. The rate of return to the kth level of education is computed as
\[ r_k = \frac{\beta_k - \beta_{k-1}}{n_k - n_{k-1}} \tag{2} \]
where $\beta_k$ is the coefficient on the kth level of education, $\beta_{k-1}$ is the coefficient on the previous level of education, $n_k$ is the number of years of schooling for the kth level and $n_{k-1}$ is the number of years of schooling for the previous level. While different education level dummies are used as explanatory variables, other variables such as work experience, caste and tribe status and religious groupings can also be included as control variables.
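The per-year return described above, (βk − βk−1)/(nk − nk−1), can be illustrated with a minimal sketch. The cumulative years of schooling (5, 8, 12 and 15) are the values assumed later in this article; the coefficient values, the function name `returns_per_year` and the choice of zero years for the reference category are hypothetical and are used only for illustration, not taken from the estimates.

```python
# Cumulative years of schooling needed to complete each level
# (values assumed in the article; the reference category, below
# primary, is treated here as 0 years, which is an assumption).
YEARS = {"primary": 5, "middle": 8, "secondary": 12, "graduate": 15}
ORDER = ["primary", "middle", "secondary", "graduate"]


def returns_per_year(coefs):
    """Convert education-dummy coefficients into per-year rates of return.

    coefs maps each level to its dummy coefficient, measured relative
    to the reference category (whose coefficient is 0 by construction).
    """
    rates = {}
    prev_beta, prev_years = 0.0, 0
    for level in ORDER:
        beta, years = coefs[level], YEARS[level]
        # (beta_k - beta_{k-1}) / (n_k - n_{k-1})
        rates[level] = (beta - prev_beta) / (years - prev_years)
        prev_beta, prev_years = beta, years
    return rates


# Hypothetical coefficients, NOT the estimates reported in this article.
example_coefs = {"primary": 0.50, "middle": 0.62, "secondary": 0.90, "graduate": 1.05}
rates = returns_per_year(example_coefs)
for level in ORDER:
    print(f"{level}: {rates[level]:.3f} per year")
```

Dividing by the interval length rather than by cumulative years is what makes the rates comparable across levels of different duration.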
One of the limitations of the above model, of course, is that of the possible correlation between unobservable factors and education which will lead to biased estimates. To address the endogeneity issue and to identify the determinants of wage rates which are specific to a particular geographical unit over time, we apply the pseudo-panel data approach in which we aggregate the unit-level household data provided by the NSSO by regions that remain common across cross-sectional data sets in different years.
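Once we take account of the regional wage, Equation (1) becomes the model developed by Deaton (1985). We apply the pseudo-panel approach to units r defined by the regional classifications:
\[ \overline{\ln w}_{rt} = \alpha + \sum_{k} \beta_k \bar{D}_{krt} + \bar{X}_{rt}'\delta + \bar{\mu}_{rt} + \bar{\varepsilon}_{rt} \tag{3} \]
where r denotes the regional unit and t stands for the survey years of the six rounds of the NSS: 1983, 1987–88, 1993–94, 1999–2000, 2004–05 and 2009–10. The upper bar means that the average of each variable is taken for each unit, r, for each round, t; $\bar{X}_{rt}$ is the vector of averaged control variables, $\bar{\mu}_{rt}$ is the average of the unobserved individual effects and $\bar{\varepsilon}_{rt}$ is the averaged error term.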
The following Equation (4) can be estimated by the standard panel model, such as fixed effects or random effects model.
The issue is whether Equation (3) is a good approximation of the underlying household panel model in Equation (4). This is not straightforward to check, as we do not have ‘real’ panel data. However, as shown by Verbeek and Nijman (1992) and Verbeek (1996), if the number of observations in cohort r tends to infinity, the cohort average of the unobserved individual effects converges to a time-invariant cohort effect, and the estimator based on the cohort means is consistent.
As mentioned in the introduction, we have constructed a pseudo-panel data set to address the endogeneity problem that arises from the returns to education being correlated with unobservable factors such as individual abilities and/or motivation. However, a further problem in our case is that high-quality graduates may find it easier to move to high-wage regions, and we need to address this endogeneity issue too. We do this by taking lagged human capital as an explanatory variable. We therefore consider the model with lagged human capital to be the more robust one.
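The two steps just described, lagging human capital within each regional unit and then estimating a fixed-effects model, can be sketched as follows. This is a minimal single-regressor illustration on made-up data: the names `unit`, `t`, `share` and `log_wage` are hypothetical, the lag is taken over the round index (the actual rounds are not evenly spaced), and the within estimator shown here omits the controls and the cluster-robust standard errors used in the reported regressions.

```python
from collections import defaultdict


def add_lag(panel, var):
    """Attach the previous round's value of `var` within each unit.

    Rows without a previous round (the first round of each unit) are dropped.
    """
    by_key = {(row["unit"], row["t"]): row for row in panel}
    lagged = []
    for row in panel:
        prev = by_key.get((row["unit"], row["t"] - 1))
        if prev is not None:
            lagged.append({**row, var + "_lag": prev[var]})
    return lagged


def within_slope(panel, y, x):
    """Fixed-effects (within) slope for one regressor.

    Both variables are demeaned by unit, which removes any
    time-invariant unit effect, and OLS is run on the deviations.
    """
    groups = defaultdict(list)
    for row in panel:
        groups[row["unit"]].append(row)
    sxy = sxx = 0.0
    for rows in groups.values():
        mean_y = sum(r[y] for r in rows) / len(rows)
        mean_x = sum(r[x] for r in rows) / len(rows)
        for r in rows:
            sxy += (r[x] - mean_x) * (r[y] - mean_y)
            sxx += (r[x] - mean_x) ** 2
    return sxy / sxx


# Made-up data: log wage = unit effect + 2.0 * lagged education share.
effects = {"A": 1.0, "B": 5.0}
shares = {"A": [0.1, 0.2, 0.4, 0.5], "B": [0.3, 0.3, 0.6, 0.8]}
panel = []
for unit, xs in shares.items():
    for t, x in enumerate(xs):
        y = (effects[unit] + 2.0 * xs[t - 1]) if t > 0 else 0.0
        panel.append({"unit": unit, "t": t, "share": x, "log_wage": y})

lagged_panel = add_lag(panel, "share")
slope = within_slope(lagged_panel, "log_wage", "share_lag")
print(round(slope, 6))
```

Because the unit effects differ sharply between the two made-up units, a pooled regression would be distorted by them, whereas the within transformation recovers the slope used to generate the data.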
Data and variables
The study uses the data on employment and unemployment in India from six rounds of the NSSO surveys, conducted during 1983 (38th round), 1987–88 (43rd round), 1993–94 (50th round), 1999–2000 (55th round), 2004–05 (61st round) and 2009–10 (66th round), respectively. Each round collected information on about 120,000 households and more than half a million individuals, selected from rural and urban areas. The national level estimates of the labour and work force participation, industrial distribution of the workers and status of their employment and wages were prepared on the basis of the data collected from these surveys. The sample selection used a two-stage stratified random sampling procedure in which the first-stage units are the census villages and urban blocks and the second-stage units are the households in these villages and blocks. Apart from the information on employment and unemployment, the surveys also recorded information about household size, age and education of the household members, social group of the household, religion and land owned.
For our analysis, we have constructed a regional level pseudo-panel data set based on the unit-level data of the NSSO. The NSSO defines a region as a ‘grouping of contiguous districts having similar geographical features, rural population densities and crop-pattern. Generally, the regions were not found to be cutting across districts boundaries in any state except Gujarat’ (NSSO, 2010, par. 2.1.4.). To obtain a consistent time series over 1983–2010, we reclassified the NSS regions. The total number of NSS regions in the study is 65, as shown in Figure 1. We constructed regional units by aggregating individual variables into NSS regional level variables, separately for rural and urban areas. Thus, the total number of region-by-sector units is 130. The approach adopted here has the merit of taking account of the geographical diversity of India as well as of the improvements in the level of education over time.
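The aggregation step can be sketched as a grouping of unit-level records by region, sector and survey round, followed by averaging within each cell. The sketch below is illustrative only: the field names (`region`, `sector`, `round`, `log_wage`, `primary`) and the records are hypothetical, and simple unweighted means are used, whereas an analysis of survey data might apply sampling weights.

```python
from collections import defaultdict


def cell_means(records, keys=("region", "sector", "round"),
               variables=("log_wage", "primary")):
    """Average unit-level variables within each (region, sector, round) cell."""
    sums = defaultdict(lambda: [0.0] * len(variables))
    counts = defaultdict(int)
    for rec in records:
        cell = tuple(rec[k] for k in keys)
        for j, var in enumerate(variables):
            sums[cell][j] += rec[var]
        counts[cell] += 1
    means = {}
    for cell, totals in sums.items():
        n = counts[cell]
        # Keep the cell size: large cells are what justify the approximation.
        means[cell] = {"n": n}
        for j, var in enumerate(variables):
            means[cell][var] = totals[j] / n
    return means


# Hypothetical unit-level records (not NSSO data).
records = [
    {"region": 1, "sector": "rural", "round": 1983, "log_wage": 2.0, "primary": 1},
    {"region": 1, "sector": "rural", "round": 1983, "log_wage": 3.0, "primary": 0},
    {"region": 1, "sector": "urban", "round": 1983, "log_wage": 4.0, "primary": 1},
]
panel = cell_means(records)
for cell, values in sorted(panel.items()):
    print(cell, values)
```

Averaging a 0/1 education dummy within a cell yields the share of workers at that level, which is how the education variables enter a region-level regression.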

In India, the constituent states can be divided into two categories, namely, the main states and special category states. The special category states consist of smaller states and union territories, and they generally receive more financial transfers (in relation to their revenues) from the central government. We present results for all states combined as well as for the main states separately.
In calculating the returns to education by estimating Mincerian wage equations, the times taken to complete the primary, middle, secondary and graduate levels of education are assumed to be 5, 8, 12 and 15 years, respectively. Accordingly, the time interval for each educational level dummy category is taken as 5, 3, 4 and 3 years, respectively. As state-wise consumer price indices are not easily available in India, the real wage (w) is computed by using the implicit deflator of the net state domestic product (NSDP). $D_{krt}$ is the kth education level in region r at time t (with below primary being the reference). The X vector consists of SCs, scheduled tribes (STs), Muslims, work experience in years and industry share in state s at time t (with the agricultural sector being the reference). Descriptions, means and standard deviations of the main variables used in our analysis are presented in Table 1.
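The deflation step can be illustrated with a short sketch. The article states only that the NSDP implicit deflator is used; the usual definition of an implicit deflator as the ratio of nominal to real output, the scaling to 100 and the numbers below are assumptions made for illustration.

```python
def implicit_deflator(nominal_nsdp, real_nsdp):
    """Implicit deflator: nominal output relative to real output, base = 100."""
    return 100.0 * nominal_nsdp / real_nsdp


def real_wage(nominal_wage, deflator):
    """Express a nominal wage in base-year prices."""
    return nominal_wage / (deflator / 100.0)


# Hypothetical figures for one state and year (not actual NSDP data).
deflator = implicit_deflator(nominal_nsdp=250.0, real_nsdp=200.0)
wage = real_wage(nominal_wage=100.0, deflator=deflator)
print(deflator, wage)
```

Using a state-level deflator means that all regions within a state share the same price adjustment in a given year.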
Table 1. Descriptive statistics
Results
We present two sets of results: one without and one with lagged human capital. For reasons already stated, we regard the model with lagged human capital as the preferred one. However, for purposes of comparison, we also present the results of the model without lagged human capital (which we call the basic model).
The basic model (Table 2) shows the returns to all levels of education to be positive and statistically significant across all specifications. However, the results with the lagged human capital (Table 3) show that while the returns to primary and secondary levels of education are positive and significant, the returns to other, higher, levels of education are statistically insignificant across all regressions. Figure 2 presents estimated returns by levels of education from the basic model for both the main and all states combined. Figure 3 does the same for the model with lagged human capital.
Both Figures 2 and 3 clearly bring out the very high return to primary school education. This is in sharp contrast to most other studies, which find the return to the primary level of education to be very low. In our case, the return to primary education is seen to be as high as 30 per cent. The return to graduate-level education, by contrast, is seen to be insignificant in the model with the lagged human capital (the robust model). These results would seem to suggest that the motivation hypothesis may have some strength, in that people with high motivation may succeed without necessarily having to acquire higher education. There is, of course, the other (in some ways complementary) hypothesis that the quality of most higher levels of education in India may be so low that the value added of these levels of education is not particularly significant.
Of the social and religious group variables, the Muslim variable is statistically insignificant across all specifications in both the basic model and the regressions with lagged human capital. The ST variable is statistically insignificant across all specifications in the regressions with lagged human capital. The ST variable is also statistically insignificant for all states in the basic regression (though it is positive and statistically significant in three of the other four specifications in the basic regression). The SC variable is statistically insignificant across all specifications in the basic regression model. It is also statistically insignificant for all states in the regressions with lagged human capital. Taken together, these results would seem to suggest that caste-, tribe- and religion-based factors may not be as important in real wage rate determination in India as is sometimes believed.
Table 2. Basic regression results
Notes: Cluster-robust t statistics in parentheses (cluster = state).
* significant at 10%; ** significant at 5%; *** significant at 1%.
As expected, work experience in years has a positive and statistically significant effect across all regressions.
Table 3. Regression results with lagged human capital
Notes: Cluster-robust t statistics in parentheses (cluster = state).
* significant at 10%; ** significant at 5%; *** significant at 1%.
Conclusion
Our study has underlined the importance of primary education. This adds one more reason for making the provision of universal primary education a top priority for policy makers. We have, however, found the returns to higher levels of education to be remarkably low, in some cases statistically insignificant. Clearly, this needs an explanation. Could it be that the jobs that are being created mostly do not require the skills that are thought to be enhanced by higher levels of education? Or, equally, could it be that the quality of education itself at these higher levels leaves much to be desired? Both are plausible, and we plan to explore these issues in future research. So far as the effects of caste and religion are concerned, we found these to be statistically insignificant in most cases, suggesting that these factors may not be as important in wage determination as is sometimes believed.
Acknowledgements
We are grateful for the financial support of JSPS ‘Topic-Setting Program to Advance Cutting-edge Humanities and Social Sciences Research: Global Initiatives’ and MEXT/JSPS KAKENHI Grant Number 17K03658. This article was written when the first author was visiting RIEB.


