Abstract
The formulation of evidence-based policy necessitates rigorous, objective evaluation of policy initiatives and, consequently, there has been a significant growth in evaluation of social policy over the last ten years. Alongside this, there is a recognition that the application of new policy initiatives needs to be flexible in order to be relevant to local populations. As a result, pilots and pathfinders are encouraged to undertake local evaluations in addition to national evaluations commissioned by central government. These dual evaluations are seen as a vehicle to provide evidence on effectiveness whilst accommodating heterogeneity of needs and provision. We suggest that without clear delineation of roles, dual evaluations are inefficient, likely to put additional pressure on busy practitioners (and the recipients of new services) to comply with varying data demands, and present policy makers with confusing messages. In this article we focus on the potential for local and national evaluations to reach different conclusions by demonstrating how a simplistic application of quantitative techniques at local level can lead to inappropriate conclusions which contradict national findings. We make a number of recommendations that might facilitate better coordination of local and national evaluations.
The last decade under New Labour saw a recommitment to funding the welfare state and public services from a perspective of ‘what matters is what works’ (Powell, 1999). In practice, this led to an increasing emphasis on evidence-based policy with performance management, quality frameworks and evaluation as core components (Wilkinson, 2005). However, the appropriate methodology for policy evaluation is contested. The traditional approach, based on a scientific rational model, seeks to quantify an improvement in outcomes that can be directly or indirectly attributed to the policy, sometimes termed the additionality (HM Treasury, 1988). Inference relies on the construction of a credible and appropriate counter-factual (Holland, 1986) and this favours the selection of experimental designs for policy evaluation. Alternative approaches such as realist evaluation (Pawson and Tilley, 1997) stress the importance of context and seek to understand how this interacts with the mechanism of action of the policy to achieve the desired outcome. From this perspective the restrictive intervention protocols imposed by an experimental evaluation design limit their external validity. Mixed methods approaches have been promulgated by Chen and others (Chen, 2006; Brannen, 2005) as a pragmatic attempt to combine the advantages of both approaches.
Alongside an emphasis on evaluation, New Labour moved to devolve decision making in health and social care to local levels (Health and Social Care Act 2001, Children Act 2004). The resulting demand for policy evaluation at both local and national levels spurred the development of the dual-level evaluation framework (Bauld and Judge, 2002). Ideally the national evaluation focuses on the overall implementation and success of the programme, and local evaluation focuses on practice and roll-out issues. In reality dual-level evaluations can create problems at local levels (Allen and Black, 2006; Biott and Cook, 2000). Potential problems can also arise for the national evaluation when local evaluators focus on impact evaluation and select inappropriate evaluation methodologies. When the national evaluation suggests that conclusions about the effectiveness of an initiative should be more cautious, this can create dissatisfaction at the local level. In this article we highlight just such a conflict in the evaluation of a new initiative designed to improve outcomes for children and young people. We present the quantitative evaluation we undertook as the national evaluators and explore the reasons why the local evaluation came to different conclusions.
The budget-holding lead professional pilots
In 2006, 16 pilots were established in a variety of locations in England to allow lead professionals (LPs) working with children and young people with additional needs to hold budgets and commission services tailored to each child’s needs. Pilots received start-up funding from the Department for Children, Schools and Families (now the Department for Education) to pump-prime the pooling of core budgets, develop appropriate infrastructures, train practitioners to be budget-holding lead professionals (BHLPs) and provide them with additional administrative support to access more responsive services. Budgets and interventions were to be tailored to each child’s needs. The BHLPs were expected to co-ordinate multi-service provision for each child or young person with additional needs, having conducted a thorough needs assessment via the common assessment framework and brought relevant professionals together in a team around the child. For LPs to be able to take on a radically new role as budget holders, joint commissioning and budget-pooling arrangements needed to be in place. The overarching aim was to empower the LPs and young people and their families to have greater control over the allocation of a budget so as to improve the outcomes specified in the Every Child Matters agenda (DfES, 2004). Moreover, the new BHLPs were expected to promote the development and delivery of targeted support services in the context of the wider reform of children’s services and the Respect agenda, which sought to tackle anti-social behaviour by facilitating closer working between communities and government agencies (Blair, 2006). As LPs, practitioners were expected to act as a single point of contact for children and young people, enabling them to make choices and receive appropriate multi-agency interventions, hence reducing overlap and inconsistency in service provision (CWDC, 2007).
In autumn 2006, a team from Newcastle University was commissioned to evaluate the cost effectiveness of devolving budgets to LPs (Walker et al., 2009). The pilots were encouraged by the Department to be innovative and they were not required to adopt any particular model of practice. As a result, pilots developed diverse approaches to BHLP practice and included a wide range of children and young people who were deemed to have additional needs. The pilots were expected to allocate a budget to designated BHLPs who would then use that, in consultation with each young person and their family, to prioritise the services needed and to purchase them directly from providers in the statutory, voluntary and private sectors. In fact, most pilots did not give BHLPs budgets but gave them access to a fund on which they could draw. In all the pilots BHLP practice had to be integrated into existing complex and changing structures and ways of working and many pilots targeted young people living in areas of high social and economic deprivation, with a clear focus on alleviating poverty. The children and young people targeted ranged in age from birth to nineteen years and they were living in families with very diverse needs. Few pilots defined the desired outcomes from BHLP practice in any detailed way, referring in broad terms to the Every Child Matters framework. One of the pilots, however, adopted a rather different approach and laid the groundwork for a more robust evaluation of outcomes. We use this pilot as a case study in this article.
Hertfordshire BHLP pilot
While the majority of pilots were established in local authority children’s services, the Hertfordshire pilot was located in the Connexions service and targeted primarily at young people aged 16 to 18 who were not in education, employment or training (NEET). Connexions personal advisers worked with all young people aged 13-18 years and provided one-to-one support for young people with additional needs, generally those currently NEET, or considered at risk of becoming NEET. Connexions personal advisers in Hertfordshire were designated as BHLPs and were able to access a fund to purchase goods or services to enhance current provision for their clients. This fund was created primarily from the pump-prime money from the Department, but it was supplemented with additional local funding. The advisers assessed whether a young person was eligible for BHLP funding. To grade eligibility for BHLP funding, a Needs Assessment Matrix was developed to quantify the extent of the young person's needs and determine access according to a banding system. The fund was used to purchase a variety of interventions aimed at getting young people into training, employment and further education. A number of vocational, counselling and self-esteem courses were purchased and transport was sometimes provided as well as support for young people living independently.
The BHLP pilot in Hertfordshire started in October 2006 and by March 2008 over 700 young people aged 16 to 18 years had received interventions purchased from the BHLP budget. The pilot was initially rolled out in four of the ten districts in Hertfordshire but was soon made available throughout the county. Hertfordshire viewed the pilot as a success and claimed that it had been instrumental in reducing the proportion of young people who were NEET. Consequently, the Department was keen for the national evaluators to examine the impact of the BHLP pilot in Hertfordshire as part of the overall evaluation of the effectiveness of BHLP practice across the 16 pilots.
Evaluation of BHLP in Hertfordshire
The NEET status of all eligible 16 to 18-year-olds is collected and reported by each local authority in England on a monthly basis, providing an ideal source of data to examine the impact of BHLPs on overall NEET proportions. This data constituted a time series and we therefore analysed it as an interrupted time series in order to examine the impact of the introduction of BHLP in Hertfordshire (McDowall et al., 1980). In essence, this involves analysis of the pattern of measurements of a variable over many time points to detect if there has been a significant change at a particular point in time, for example, when BHLP practice started. The inclusion of data over the same time period from comparable local authorities who did not implement BHLP practice greatly strengthened the analysis.
Data sources
LA-level data
The Connexions Client Information System collects data on the NEET status of each young person from September of the year in which they complete compulsory schooling to the month in which they turn nineteen. These data are compiled and reported for the whole of England, subdivided into 149 shire/unitary authorities which vary in size from the large shire authorities such as Kent to several smaller unitary authorities representing towns and metropolitan areas. These shire/unitary authorities will henceforth be referred to as the local authorities (LAs). We obtained access to the monthly NEET data for all LAs in England from the Planning and Performance Manager, Hertfordshire Connexions. Each LA is required to report the total population of school leavers up to their nineteenth birthday, the number who are NEET and the number whose status is unknown. The reported NEET figure includes an adjustment which ascribes a fraction of ‘unknowns’ to NEET young people according to their last known NEET status (for young people whose last known status was EET, 92% are assumed to still be EET, and for those whose last known status was NEET, 58% are assumed to still be NEET).
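The adjustment for unknowns can be illustrated with a short sketch. This is one reading of the rule described above, not the official Connexions calculation: it assumes that unknowns last known to be EET contribute 8% (100% minus 92%) of their number to the NEET count and that unknowns last known to be NEET contribute 58%. The function name and the example counts are hypothetical.

```python
# Shares taken from the adjustment described in the text.
P_STILL_EET = 0.92   # unknowns last known as EET who are assumed still EET
P_STILL_NEET = 0.58  # unknowns last known as NEET who are assumed still NEET


def adjusted_neet(known_neet, unknown_last_eet, unknown_last_neet):
    """Return the NEET count after ascribing a share of unknowns to NEET.

    Assumption: the remaining 8% of unknowns last known as EET are
    counted as NEET, as are 58% of unknowns last known as NEET.
    """
    from_eet = (1 - P_STILL_EET) * unknown_last_eet
    from_neet = P_STILL_NEET * unknown_last_neet
    return known_neet + from_eet + from_neet


# Hypothetical LA-month: 1,500 known NEET, 1,000 unknowns last known
# as EET and 200 unknowns last known as NEET.
example = adjusted_neet(1500, 1000, 200)  # 1500 + 80 + 116, about 1696
```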
Data on the total number of 16- to 19-year-olds were missing and therefore imputed from data from the nearest month with non-missing data for 32 (0.6%) of the monthly LA records. Data on the number of NEET young people were missing for 297 (5.6%) of the monthly LA records. For 225 of the missing entries, %NEET figures were available from quarterly summary reports which allowed estimation of the number of young people NEET. The remaining 72 (1.4%) missing entries were distributed over 27 LAs. Extreme data for Herefordshire for May 2005 were assumed to be an error and the reported number of young people who were NEET was therefore changed to ‘missing’. Our primary analysis assumed that BHLP working started in Hertfordshire in November 2006, when each of the ten districts had assessed at least one young person who subsequently received an intervention, but we also conducted sensitivity analyses assuming both earlier and later start dates.
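Imputation from the nearest month with non-missing data might look like the following minimal sketch. The function name is hypothetical, and the tie-breaking rule (preferring the earlier month when two months are equally near) is an assumption, since the text does not say how ties were handled.

```python
def impute_nearest(values):
    """Fill None entries with the value from the nearest non-missing month.

    `values` is a list of monthly totals for one LA, in date order.
    Ties are resolved in favour of the earlier month (an assumption).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no non-missing months to impute from")
    filled = []
    for i, v in enumerate(values):
        if v is None:
            # Nearest known month by distance; earlier month wins a tie.
            nearest = min(known, key=lambda k: (abs(k - i), k))
            v = values[nearest]
        filled.append(v)
    return filled


# Hypothetical monthly totals with gaps.
monthly_totals = impute_nearest([None, 100, None, None, 130, None])
```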
District-level data
Although analysis of LA-level data had the advantage that it allowed direct comparison with other LAs in England, it had the disadvantage that it did not allow for variations within Hertfordshire, either in the start date of BHLP practice or in subsequent outcomes. Aggregation of data to smaller areas gives more information and is more likely to detect any effect of BHLP practice. We were fortunate that data on NEET status were also available for each of the ten districts within Hertfordshire, although comparable data were not available for all of England.
These district-level data comprised monthly returns from June 2005 to March 2008 on the total number of 16- to 19-year-olds; the raw, unadjusted number who were NEET; the number whose NEET status was unknown; and the number who had been assessed for and subsequently received an intervention purchased from the BHLP budget. Hertfordshire reported that they had used BHLP funds to support 762 16- to 19-year-olds and 120 12- to 15-year-olds between the start of the BHLP pilot and the end of March 2008 (Simmons, 2008). We received data on 618 young people who had received an intervention from the BHLP fund (just over 70% of the total reported beneficiaries). The data included the district in which the young person had accessed the intervention and the assessment date, but not the date the intervention was actually delivered; hence we did not know when interventions started in each district.
Since data for the ten Hertfordshire districts came from a different source than the data for LAs and were not adjusted for the number of young people whose NEET status was unknown, we first investigated the effects of different assumptions about the missing data: that those with unknown NEET status were (i) EET, or (ii) NEET, or (iii) had NEET status representative of those whose NEET status was known. The first assumption (that all with unknown NEET status were EET) yielded %NEET that was very close to the %NEET calculated from the adjusted data reported for each month for Hertfordshire LA, the only substantial difference being in September 2007 when this assumption under-estimated the %NEET compared to the LA-level data, whereas the other assumptions resulted in greater disparities. We therefore assumed that young people in Hertfordshire districts who had unknown NEET status were EET.
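The three assumptions about young people with unknown status translate into three ways of computing %NEET from the same counts. The sketch below is illustrative only; the function name, the assumption labels and the example counts are hypothetical.

```python
def pct_neet(total, neet, unknown, assumption):
    """Return %NEET under one of three assumptions about unknowns.

    "unknown_eet":    all unknowns are EET (assumption i)
    "unknown_neet":   all unknowns are NEET (assumption ii)
    "representative": unknowns have the same NEET rate as those
                      whose status is known (assumption iii)
    """
    if assumption == "unknown_eet":
        return 100 * neet / total
    if assumption == "unknown_neet":
        return 100 * (neet + unknown) / total
    if assumption == "representative":
        return 100 * neet / (total - unknown)
    raise ValueError(f"unknown assumption: {assumption}")


# Hypothetical district-month: 1,000 young people, 50 known NEET,
# 100 with unknown status.
rates = {a: pct_neet(1000, 50, 100, a)
         for a in ("unknown_eet", "unknown_neet", "representative")}
```

With these made-up counts the three assumptions give 5.0, 15.0 and roughly 5.6 per cent respectively, which shows how strongly the choice matters in months when many statuses are unknown.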
Statistical methods
LA-level analysis
We used linear regression to model the variation over time in %NEET in all LAs excluding Hertfordshire. 1 The data were panel data with %NEET as the dependent variable and LAs as units observed at multiple time periods. To ensure that LAs with larger populations of 16- to 19-year-olds had more influence on the model, each LA was weighted by the number of young people in the LA, averaged over all months of the study. The model allowed for a secular trend over time, seasonal variation and correlation between %NEET in the same LA in successive months (i.e. auto-correlation). It was stratified by LA, using dummy variables to represent each LA.
Analysis was first performed including all LAs 1 . The LAs were then grouped into quartiles on the basis of their %NEET in the first three months of the study (baseline) and the regressions repeated within each quartile. As the trend of %NEET over time was less marked for LAs within the lowest quartile of %NEET at baseline – as Hertfordshire was – we restricted further analysis to LAs in this lowest quartile.
We then compared the %NEET in Hertfordshire LA before and after initiation of BHLP practice, adjusting for the pattern of trend over time, seasonal variation and auto-correlation found in analysis of the other LAs in the lowest quartile. We did this by adding to the regression model a dummy variable to indicate the start of BHLP in Hertfordshire.
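One way to write down a model of this kind is sketched below. The exact specification is an assumption, since the text does not state the functional form: a linear secular trend, monthly dummies for seasonality and first-order auto-correlated errors are used here purely for illustration, and all symbols are introduced for this sketch.

```latex
\[
  y_{it} = \alpha_i + \beta\, t
         + \sum_{m=2}^{12} \gamma_m\, \mathbf{1}[\mathrm{month}(t) = m]
         + \delta\, D_{it} + \varepsilon_{it},
  \qquad
  \varepsilon_{it} = \rho\, \varepsilon_{i,t-1} + u_{it}
\]
```

Here $y_{it}$ is %NEET in LA $i$ in month $t$, $\alpha_i$ is the LA-specific intercept (the LA dummy variables), $\beta$ is the secular trend, $\gamma_m$ are the seasonal effects, $\rho$ captures auto-correlation between successive months, and $D_{it}$ equals 1 for Hertfordshire from the assumed start of BHLP practice onwards and 0 otherwise. Under this sketch, $\delta$ is the change in %NEET associated with the start of BHLP practice after allowing for trend and seasonality.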
Two sensitivity analyses were performed: the first assuming different start dates for BHLP practice, and the second excluding from the comparator group the five LAs in the lowest quartile of %NEET that were BHLP pilot areas, even though most of the BHLPs in those areas had worked predominantly with under-sixteens.
District-level analysis
First we performed an unadjusted regression analysis, relating the average %NEET in Hertfordshire districts to BHLP activity in the previous month and the levels in each of the two months before that. In the absence of data on the commencement of BHLP interventions or their cost we quantified BHLP activity as the number of young people assessed in that month who went on to receive an intervention from the BHLP budget. We treated the data as panel data, using similar methods to those used for the LA-level analysis, except that the units were now Hertfordshire districts.
We then performed an adjusted analysis, modelling %NEET in Hertfordshire districts in relation to BHLP activity while allowing for the national pattern of %NEET as shown in the other LAs in the lowest quartile of %NEET. We could not do this in a single regression model, because the LAs had no data about BHLP activity. Instead, we generated the %NEET expected in Hertfordshire districts in each month on the basis of the national pattern (trend over time, seasonal variation and auto-correlation) in the lowest quartile, using the model described above for the LA-level analysis; then we calculated residuals by subtracting observed from expected %NEET and analysed these residuals using linear regression. We first compared these residuals before and after initiation of BHLP practice which, as before, was modelled using a dummy variable. Then we included in turn, as possible explanatory variables, the levels of BHLP activity in each district in the previous month, and the levels in each of the two months before that.
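The two-stage idea (compute residuals from the nationally expected %NEET, then relate them to lagged BHLP activity) can be sketched as follows. This is a simplified illustration for a single district, not the analysis reported here: it uses an unweighted simple regression and ignores the panel structure and auto-correlation. The sign convention (observed minus expected, so that negative residuals mean a lower %NEET than the national pattern predicts) is an assumption made for this sketch, and all names and numbers are hypothetical.

```python
def residuals(observed, expected):
    """Observed minus expected %NEET for each month (assumed sign convention)."""
    return [o - e for o, e in zip(observed, expected)]


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def lagged_slope(resid, activity, lag=1):
    """Regress the residual in month t on BHLP activity in month t - lag."""
    if lag < 1:
        raise ValueError("lag must be at least 1 month")
    return ols_slope(activity[:-lag], resid[lag:])


# Hypothetical district: four months of observed and expected %NEET,
# and the number of young people assessed in each month.
observed = [5.0, 4.8, 4.6, 4.9]
expected = [5.0, 5.0, 5.0, 5.0]
activity = [2, 4, 1, 0]
slope = lagged_slope(residuals(observed, expected), activity, lag=1)
```

In this made-up example each additional assessment is followed by a residual that is 0.1 percentage points lower in the next month, so the slope is about -0.1.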
Two sensitivity analyses were performed: the first excluding data from September and October, when NEET status was unknown for a large proportion of young people in Hertfordshire, and the second excluding from the comparator group the five LAs in the lowest quartile of %NEET that were BHLP pilot areas.
Stata 10 was used for all statistical analyses (Stata, 2007).
Results
The pattern of data at LA-level
The number of young people aged 16 to 19 recorded by Connexions agencies peaked in September every year as new school leavers became eligible for inclusion, and decreased steadily from September until August of the following year as young people who reached their 19th birthday were excluded. The total number of young people recorded each September in all LAs in England was almost 1.8 million in all three years (2005, 2006 and 2007) of the study. Between April 2006 and March 2008, Hertfordshire had the fifth largest population of 16- to 19-year-olds (over 39,000) among these LAs.
The number of 16- to 19-year-olds who were NEET increased from August to September every year as new school leavers were registered, and decreased from September to November as they found college training placements or started work. Every November, when the number recorded as NEET had stabilised, Hertfordshire recorded between 1,400 and 1,700 16- to 19-year-olds as NEET, the 10th largest number among English LAs. The percentage of 16- to 19-year-olds who were NEET (%NEET) reflected the seasonal variation in both the total number of 16- to 19-year-olds recorded by Connexions and the number who were recorded as NEET. Therefore %NEET showed monthly variation, with gradual increases from December to August and more marked decreases from August to November in most LAs; overall, %NEET tended to decrease over time (see Figure 1). Hertfordshire had a low %NEET: it ranked 133 out of the 148 LAs. The LAs that started off with a high %NEET tended to have larger decreases in %NEET than other LAs. When LAs were grouped into quartiles on the basis of their average %NEET in the first three months of the study, the downwards trend over time of %NEET in Hertfordshire was very close to that for the average of all LAs in the lowest quartile.
%NEET by month for each local authority in England.
The percentage of 16- to 19-year-olds whose NEET status was unknown showed a very marked peak in September every year, but fell to around 5 per cent by November. Hertfordshire had an exceptionally high percentage (40% to 50%) with unknown NEET status in September, ranking 9th among all LAs in England. However, taking the average over all months, the percentage of 16- to 19-year-olds with unknown NEET status in Hertfordshire was low, ranking 72nd in England. Furthermore, reporting of NEET status appeared to have improved from the end of 2007 onwards: during 2005, 2006 and the first half of 2007, Hertfordshire was usually in the second lowest quartile with respect to the percentage with unknown NEET status, whereas from November 2007 onwards it was in the lowest quartile.
LA-level analysis
Trend over time in %NEET in local authorities (LAs)
Indicates change in %NEET in one month; negative values of b indicate a reduction in %NEET.
Regression analysis by quartiles of baseline %NEET confirmed the auto-correlation and showed that LAs that had a lower %NEET initially had less marked decreases over time in %NEET. We therefore restricted further analysis to the 37 LAs that were in the lowest quartile of %NEET, as Hertfordshire was. The total numbers of young people registered each September in LAs in this quartile were 567,128 in 2005; 580,637 in 2006; and 577,581 in 2007. In Hertfordshire LA, the average %NEET was significantly (p = .006) lower after initiation of BHLP practice in November 2006: it was 5.3 per cent before and 4.8 per cent after. However, after applying the pattern of trend over time, seasonality and auto-correlation found in LAs in the lowest quartile of %NEET to Hertfordshire LA, the start of BHLP practice was associated with a much smaller reduction in %NEET (0.09, 95%CI: -0.18 to 0.36), which was not statistically significant (p = .49). Thus the reduction in %NEET in Hertfordshire following the introduction of BHLP practice could be totally explained by the trend of decreasing %NEET in all LAs in this quartile.
Sensitivity analysis, assuming different start dates for BHLP practice, confirmed this finding. After allowing for the national trends in %NEET, as shown in the other 36 LAs in the lowest quartile, we found no significant difference between the %NEET in Hertfordshire before and after the start of BHLP for any possible starting month between June 2006 and March 2008.
Further sensitivity analysis, excluding from the comparator group the five LAs that were BHLP pilot areas, also yielded a similar result to the primary analysis: a reduction in %NEET after the introduction of BHLP of 0.08 (95%CI: −0.20 to 0.36).
District-level analysis
The monthly variation in %NEET in Hertfordshire districts was similar to that in LAs in the lowest quartile of %NEET (see Figure 2). Figure 3 shows the difference between the %NEET actually recorded in each district in Hertfordshire and the %NEET predicted from national trends (allowing for secular trend, auto-correlation, seasonality) in other LAs in the lowest quartile of %NEET and the level of BHLP activity by month. If %NEET in Hertfordshire were unaffected by any local factors, we would expect the difference between the predicted and actual %NEET to fluctuate at random around zero. If BHLPs were effective in reducing %NEET, we would expect the difference between the predicted and actual %NEET to fall some time after BHLP practice began – and to fall by more than the random fluctuations. If such a fall occurred we would expect it to be more marked if there was a higher level of BHLP activity in the district.
%NEET by month for local authorities with low %NEET and for each district in Hertfordshire.
BHLP activity and difference between actual and predicted %NEET in Hertfordshire districts. Note: BHLP = budget-holding lead professionals.
Number of job vacancies by quarter for local authorities with low %NEET.
Association between %NEET and BHLP activity in Hertfordshire districts
a Adjusted for national trend in %NEET in LAs that were in the lowest quartile of %NEET in the first three months of the study.
b Indicates change in %NEET before/after introduction of BHLP working in November 2006; negative values of b indicate a reduction in %NEET after introduction.
c Indicates change in %NEET with change of one unit in level of BHLP activity; negative values of b indicate a reduction in %NEET with increasing BHLP.
Sensitivity analyses, excluding September and October when Hertfordshire had a high proportion of young people with unknown NEET status, yielded results similar to the primary adjusted analysis, except that the association between %NEET in Hertfordshire districts and BHLP activity in the previous month was statistically significant (p = .01), although the estimate of the magnitude of the effect was little changed (see Table 2). This corresponds to a fall in %NEET of one percentage point if one additional young person in every 2.5 receives an intervention co-ordinated by a BHLP (with 95% confidence intervals from one young person in four to all young people). This association was largely due to the relationship between a higher level of BHLP activity and lower %NEET in East Hertfordshire, Hertsmere and Welwyn Hatfield. Nevertheless, the overall association explained less than 2 per cent of the variation between the monthly records of %NEET in the ten districts.
Further sensitivity analysis, excluding the five LAs that were BHLP pilot areas, yielded similar results to the primary analysis, with very little difference in the %NEET before and after introduction of BHLP and no statistically significant association between %NEET and BHLP activity in previous months (see Table 2).
Discussion
Throughout England, the percentage of young people aged 16 to 19 years who were not in education, employment or training (NEET) varied seasonally and showed a secular trend, decreasing between April 2005 and April 2008. After allowing for this seasonal variation and the secular trend, we found only weak and inconclusive evidence of any association between BHLP practice and %NEET in Hertfordshire. We compared the variation in %NEET in Hertfordshire with the variation in 36 other LAs that had similar levels of %NEET at the start of the study period: although %NEET in Hertfordshire decreased after the introduction of BHLP, all of this decrease could be accounted for by national trends.
When we related %NEET in each of the ten districts in Hertfordshire to a crude measure of the level of BHLP activity in the districts, we likewise found that the reductions in %NEET could be explained by national trends. However, sensitivity analysis, excluding September and October, when most LAs had a high percentage of young people with unknown NEET status, yielded a statistically significant association between a higher level of BHLP activity and lower %NEET in the following month. It is possible that BHLP activity had a small, temporary impact on %NEET in the following month, but this is unlikely to have been sustained as neither the primary analysis nor the sensitivity analysis showed any significant difference in %NEET before and after the start of BHLP working. At most, BHLP activity explained only 2 per cent of the variation in NEET rates in Hertfordshire’s districts.
Comparison of national and local evaluations
Our findings are in sharp contrast to those of the local evaluation of the BHLP pilot, which concluded that the BHLP pilot had been very effective in reducing %NEET (Connexions Hertfordshire, 2008):
The BHLP initiative in Hertfordshire has had a strong impact on young people, dramatically reducing the number of young people not in education, employment and/or training.
The local evaluation had three parts. First, it followed 60 randomly chosen young people who were allocated to a BHLP between January and October 2007. Of these, 22 were of compulsory school age at assessment and 36 of the remaining 38 were NEET. The status of this group in December 2007 was reported as comprising 46 young people who were EET, 9 who were NEET and 5 of unknown status. Second, the evaluation also reported on the status of a cohort of 51 young people who had been the subject of an earlier evaluation by Hertfordshire (Connexions Hertfordshire, 2007). This cohort had all been NEET and received support from BHLPs between September and December 2006. The previous evaluation found that by March 2007, 83 per cent of these young people were EET. In March 2008 49 per cent of them were EET, 31 per cent of them were NEET and the status of 20 per cent was unknown. Third, the local evaluation evaluated the %NEET in Hertfordshire, making a series of comparisons for dates before and after the introduction of the BHLP pilot. In each of the comparisons, %NEET had fallen after the introduction of the pilot. The data were used to conclude that BHLPs had had a highly beneficial impact on NEET proportions in Hertfordshire.
It is instructive to set the local evaluation's assessment of trends within Hertfordshire in the context of the national picture at those time points. The local evaluators made a number of statements that do not stand up to scrutiny (Connexions Hertfordshire, 2008):
When one looks at the average NEET rate between November and January the fall is even more dramatic. The NEET rate for 2006-07 was 4.6%, but this had fallen over 12 months by 13% to 4.0%.
In fact, the average NEET rate for the three-month period November to January fell between 2006-07 and 2007-08 by over 13 per cent – i.e. by more than it did in Hertfordshire – in 64 out of the 148 LAs (43%); only 5 of these 64 LAs piloted BHLP practice.
The local evaluation also stated (2008),
Subsequent to this the data for February show that the NEET rate has fallen from 5.3% in 2007 to 4.3% in 2008.
In fact, the NEET rate fell between February 2007 and February 2008 by over 19 per cent – i.e. by more than it did in Hertfordshire – in 32 out of the 148 LAs (22%); only one of these 32 LAs had piloted BHLP practice.
The local evaluation went on to state (2008),
Between February 2007 and February 2008, there was a reduction in the NEET rate in all District Council areas.
However, the NEET rate fell between February 2007 and February 2008 in 129/148 (87%) of LAs.
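The percentages quoted in these comparisons follow from simple arithmetic on the figures given above. The sketch below reproduces them from the rounded rates reported in the text; the function names are hypothetical, and because the inputs are rounded the results only approximate the figures derived from the underlying data.

```python
def relative_fall(before, after):
    """Percentage fall in a rate, relative to its starting value."""
    return 100 * (before - after) / before


def share(count, total):
    """Count expressed as a percentage of the total."""
    return 100 * count / total


# Hertfordshire NEET rates quoted by the local evaluation.
nov_jan_fall = relative_fall(4.6, 4.0)   # about 13 per cent
feb_fall = relative_fall(5.3, 4.3)       # about 19 per cent

# LAs whose NEET rate fell by more than Hertfordshire's, out of 148.
share_nov_jan = share(64, 148)           # about 43 per cent
share_feb = share(32, 148)               # about 22 per cent
share_any_fall = share(129, 148)         # about 87 per cent
```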
Both the local and national evaluations examined the same data on %NEET in Hertfordshire. The local evaluation, however, failed to compare them with national data and failed to recognise that the decreasing %NEET in Hertfordshire reflected national trends. The %NEET did indeed fall in Hertfordshire after introduction of the BHLP pilot, but there is no evidence that it fell because of the BHLP activity. The conclusion of the local evaluation – that the fall in the NEET rate in Hertfordshire was a consequence of BHLP activity – is based on the logical fallacy post hoc ergo propter hoc.
Strengths and weaknesses of the local and national evaluations
The major strength of the national evaluation is the comparison of Hertfordshire with 36 other LAs that were similar in terms of %NEET. Only 5 of these 36 comparison LAs were BHLP pilot areas and most of the BHLPs in those areas had worked predominantly with under-sixteens; we estimate that there were 40 over-sixteens in total who were allocated to a BHLP in these 5 LAs compared to over 700 in Hertfordshire. Furthermore, exclusion of these 5 LAs from the comparator group made little difference to the results. The local evaluation did not consider %NEET in other LAs and was unable to account for the national trend in %NEET when evaluating the same local %NEET data. The inclusion of this comparison data is critical to any inferences made from the local data as it fully explains the downward trend observed.
It is of fundamental importance to any quantitative evaluation of an intervention that those receiving it should be compared with a similar group that did not receive the intervention. The local evaluation presented evidence on the outcomes for a cohort of NEET young people indicating that most were EET at review. No comparison with young people who had not had a BHLP was presented. The presentation of data on the intervention cohort in isolation gives the impression that positive changes in status following the intervention can be ascribed to the intervention. It is entirely possible that some or all of the cohort would have become EET without additional support from a BHLP, hence we are unable to infer anything about the impact of the BHLP pilot from these data in isolation. It is surprising that this evidence was considered of sufficient quality to merit inclusion in a Demos report on the impact of individual budgets (Leadbetter et al., 2008: 39). This suggests that misunderstanding of the interpretation of evidence of the impact of interventions is widespread.
The national evaluation was limited in its selection of outcome measures to %NEET by the availability of good-quality data. Accurate data are of little value unless they provide a measurement of the fundamental benefits of a programme or intervention. In this case, NEET status is directly relevant to at least one of the five Every Child Matters outcomes; long periods of inactivity after leaving school have a significantly detrimental effect on lifetime earnings (Godfrey et al., 2002). Nevertheless, the local evaluation was able to collect other relevant outcome measures such as educational attainment. Unfortunately, the small size of the cohort followed and the lack of a suitable comparison limited any inference from the data collected.
The national evaluation was subject to limitations from the availability of intervention data. The actual date of receipt of interventions coordinated by a BHLP and the cost of the package of interventions were not recorded on an individual basis. Estimations of the total expenditure on BHLP interventions by district in Hertfordshire were unavailable. Hence, in the district-level analysis we had to use the number of assessments by district in each month as a measure of BHLP activity. This measure may not have been wholly adequate and we may not have ascribed it to the most appropriate time. Our assumption that the start date for BHLP in Hertfordshire was November 2006 may not have been the most appropriate; hence we conducted a sensitivity analysis, varying the month that BHLP practice started. We found no significant difference between the %NEET in Hertfordshire before and after the start of BHLP for any possible starting month between June 2006 and March 2008.
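A sensitivity analysis of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the national evaluation: the monthly series is hypothetical, the function names are ours, and Welch's t statistic is used only as a crude indicator, since monthly rates are likely to be serially correlated.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(before, after):
    """Welch's t statistic for the difference in means of two samples."""
    standard_error = sqrt(variance(before) / len(before) + variance(after) / len(after))
    return (mean(after) - mean(before)) / standard_error


def start_month_sensitivity(gap, candidate_starts, min_obs=3):
    """For each candidate start index, compare the gap between the pilot
    area and the comparison group before and after that month.

    gap: monthly %NEET in the pilot area minus the comparison-group mean.
    Returns {start_index: (shift_in_mean_gap, welch_t_statistic)}.
    """
    results = {}
    for start in candidate_starts:
        before, after = gap[:start], gap[start:]
        if len(before) < min_obs or len(after) < min_obs:
            continue  # too few months on one side for a meaningful comparison
        results[start] = (mean(after) - mean(before), welch_t(before, after))
    return results


# HYPOTHETICAL monthly gap in percentage points (not the study's data)
gap = [0.10, -0.05, 0.02, 0.08, -0.03, 0.04, -0.06, 0.05, 0.01, -0.02, 0.07, -0.04]

results = start_month_sensitivity(gap, range(len(gap) + 1))
for start, (shift, t_stat) in results.items():
    print(f"start month index {start}: shift = {shift:+.3f}, t = {t_stat:+.2f}")
```

If the conclusion is robust, the shift in the gap should remain small relative to its month-to-month variation whichever start month is assumed.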
The quasi-experimental design used in the national evaluation does not eliminate the possibility of bias arising from confounding variables. It is possible that Hertfordshire differed in some aspect from the comparison group of LAs, leading to a different %NEET trajectory in response to changing economic factors. However, the very close correlation between %NEET in Hertfordshire and the comparison LAs over the 18 months prior to the introduction of BHLP would suggest that any confounding effects are small. A further potential weakness of this design is the possibility of historical bias, that is, the attribution to an intervention of an impact that was actually caused by an external event occurring at the same time. In this case we should consider the possibility that local events occurring around the time of the introduction of BHLP had a greater impact on NEET levels in Hertfordshire than in the comparison group. Whilst our comparison group captured the impact of political and economic factors at a national level, it is possible that an economic downturn was localised to Hertfordshire around the time of the BHLP pilot. To check for this we examined the number of job vacancies in Hertfordshire and the 36 other LAs in the comparison group and found no evidence of either a relative or an absolute economic decline in Hertfordshire around the time of implementation of BHLP (Figure 4).
Finally, the implementation of the BHLP initiative diverged significantly from the policy intent. Pilots allow experimentation and offer a very important opportunity to identify the elements which do and do not appear to work well. The BHLP pilots were implemented variously across England and relatively few grasped the policy intent and implemented a radically new approach to supporting children and young people with additional needs. The Hertfordshire pilot, like many others, implemented BHLP practice as a top-up model: Connexions personal advisers were designated as BHLPs and given access to an additional pot of money to use with the young people they were engaged in supporting. Although BHLPs were expected to co-ordinate multi-agency interventions in their new role as budget holders, we found little evidence that the Hertfordshire BHLPs did this. In essence, therefore, the BHLP approach they adopted was some distance from the original policy intent, although young people may well have benefited from the additional expenditure BHLPs could muster. Overall, therefore, the national evaluation was limited in its ability to evaluate BHLP as the government had intended it to be implemented.
Implementing a robust quantitative evaluation
Appropriate inference from quantitative analysis requires a robust evaluation design. Randomised controlled trials (RCTs), where feasible, are the strongest research design for generating unbiased measures of effect (Rawlings, 2005). There has been considerable reluctance to apply RCTs to social interventions in the UK, partly because of a perception that they are ‘unfair’ and partly because of a belief that contexts in social initiatives are simply too heterogeneous and dynamic to allow inference from an RCT (Granger, 1998; Macdonald, 1997). The challenges of conducting trials of complex social programmes are not insignificant (Wolff, 2000), but there are a number of successfully implemented examples (Shemilt et al., 2004; Oakley et al., 2003). The complexity of the setting in which social interventions are evaluated increases the need to ensure that intervention and comparison groups are equivalent to limit bias in evaluating outcomes (Oakley et al., 2003). Where RCTs are not feasible, robust quasi-experimental designs should be considered (Cook and Campbell, 1979). The Realistic Evaluation school eschews controlled trials in favour of refining and testing the theory of the mechanism of action of an intervention (Pawson and Tilley, 1997). Whilst an understanding of the mechanism of action of social interventions is clearly advantageous, appropriate quantitative analysis is necessary to determine whether an intervention improved outcomes within the context studied (Cook, 2000; Mann and Schorr, 1998).
The Connexions Client Information System provided access to a large amount of high quality data; data on the total number of young people and their NEET status were available for 94 per cent of the monthly LA returns collated by Connexions across England. This situation is rare, and analysis of most outcome measures requires data collection at individual level as part of an appropriate research design. In this study an individual-level analysis would have required restricting implementation of the BHLP programme in some districts. Ideally the districts implementing BHLPs would have been chosen at random (Boruch, 2005), and outcomes compared for young people receiving BHLP support with a matched cohort of young people accessing Connexions support in non-BHLP pilot districts. These designs need careful planning prior to the commencement of the pilot and close cooperation between pilots and evaluators.
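Random selection of districts is simple to carry out and to document in advance. The following is a minimal sketch only: the district labels, the number of pilot districts and the seed are hypothetical, and the function name is ours. Fixing and recording the seed before allocation allows the selection to be reproduced and audited.

```python
import random


def allocate_districts(districts, n_pilot, seed):
    """Randomly choose n_pilot districts to implement the intervention;
    the remainder act as comparison districts."""
    rng = random.Random(seed)  # seeded so that the allocation is reproducible
    pilot = set(rng.sample(districts, n_pilot))
    return {d: ("pilot" if d in pilot else "comparison") for d in districts}


# HYPOTHETICAL district labels (placeholders, not real district names)
districts = [f"District {letter}" for letter in "ABCDEFGHIJ"]

allocation = allocate_districts(districts, n_pilot=5, seed=2006)
for district, arm in allocation.items():
    print(f"{district}: {arm}")
```

Allocation by chance rather than by local preference guards against pilot districts differing systematically from comparison districts in ways that also affect outcomes.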
Facilitating better evaluations
Successful formulation of evidence-based policy depends on the collection of good-quality evidence of effectiveness. We would argue that there is a role for robust quantitative and qualitative analysis in assessing evidence of effectiveness. There is much that local authorities can do to aid national evaluations in mounting robust evaluations and local authorities stand to gain from this if they seek to maximise the effectiveness of their own services. National evaluators need to be able to work closely with pilots from the inception of any new programme. That means selecting the evaluators at the same time as the pilots. More emphasis needs to be placed on the importance of evaluation in pilot tenders.
Implementing an experimental design inevitably creates tension between the desire of local implementers to adapt and shape programmes to their local environment and the desire of national evaluators to maintain a coherent study design across all the pilots. But many of the potential difficulties can be anticipated. A dialogue needs to occur before a pilot begins so that local authorities are fully aware of the needs of the national evaluation and the national evaluation design is appropriate for local implementation plans.
Collaboration on quantitative evaluation between pilots and national evaluators should be encouraged. This can facilitate stronger research designs; implementation of an individual-level analysis of BHLPs in Hertfordshire, as outlined in this article, would have facilitated assessment of other outcome measures in addition to %NEET. Pilots ought to exploit the expertise of the national evaluation team in quantitative and qualitative evaluation when utilising their own resources set aside for evaluation.
Conclusions
Conflicting conclusions about the effectiveness of the BHLP pilot in Hertfordshire were driven by inadequate and inappropriate research design in the local evaluation. Valid inference requires good study designs which focus on collecting the best available evidence, not only of what happened in the presence of the intervention, but also of what would have happened in the absence of the intervention. The case study discussed in this article highlights the risks of not applying rigorous evaluation methodology, namely that inappropriate conclusions may be drawn about the effectiveness of social programmes.
Political pressures can create an environment in which those implementing pilot initiatives are keen to show they are a success. While this is to some degree inevitable, it might be mitigated by strong encouragement at the national level for robust evaluation. Emphasis needs to be placed on the role of pilots as experiments to determine the effectiveness of an initiative, and the importance of objective analysis. The societal cost of basing policy on weak evaluations can be considerable (Berk et al., 1985). If local decision makers wish to allocate resources in a manner which maximises societal welfare then they must seek good estimates of the effectiveness of social programmes.
Footnotes
Acknowledgements
We acknowledge the kind help of Simon Gentry, Planning and Performance Manager for Hertfordshire Connexions, in providing the data on which this analysis is based. The analysis and interpretation of the data are purely the authors’ own.
