Abstract
The International Health Regulations (2005) (IHR) Monitoring and Evaluation Framework is designed to assist States Parties in assessing progress toward compliance and sustainable capacities under the IHR. The States Parties Self-Assessment Annual Report (SPAR) is the only mandatory tool in the 4-component framework. The current SPAR is the third version of the tool since its inception in 2010. The revisions, while reflecting evolving requirements for health security capacity under the IHR, hinder the ability to compare capacity scores between versions and prevent analysis of historical data. In this article, we describe a methodology that aligns capacities across the 3 versions of the tool by creating umbrella terms for common themes that can be adapted or applied to any future SPAR changes, providing a sustainable framework for ongoing assessment and analysis. Our methodology enables States Parties, policymakers, and other stakeholders to view and assess country capacity across the history of self-assessment. Mapping by common themes allows for a historical understanding of national, regional, and global efforts to strengthen health security capacity.
Introduction
To
When SPAR 2021 was first announced and published, little information was provided as to the scope of the changes compared to SPAR 2018, and this gap continues. Our initial research aims were to conduct a critical review of the changes between SPAR 2018 and SPAR 2021 and to determine how these revisions impact a country’s ability to assess IHR compliance. Our research team found that the 2018 and 2021 editions had systemic differences that impact both “how” and “what” countries are required to assess and report to WHO. There are relevant changes in how capacities are assessed and what is defined as an indicator of compliance. Through this process, our team determined that the revisions make it difficult to assess health security capacities, particularly across editions (ie, between the questionnaire and SPAR 2018, and between SPAR 2018 and SPAR 2021) as they are currently collected, reported, and visualized on the e-SPAR website. 10
Why does this matter? If revisions to SPAR improve the evaluation of IHR compliance, why does it matter that scores cannot be compared to previous versions? A vital use of the SPAR is its ability to serve international organizations, governments, researchers, policymakers, and funders as an annual assessment of country and regional health security capacity. An inability to compare scores across the 3 versions negatively impacts this incremental assessment. Revisions to global health monitoring and evaluation frameworks can strengthen the ability of national, regional and/or global stakeholders to measure capacity more effectively and better prepare for emerging global health security threats. However, continued revisions over time, even when beneficial, can impact accurate monitoring and evaluation processes to assess compliance and health security capacity building. Imagine taking a test in which, halfway through, the mode by which you are assessed changes. Your scores, even after years of potential hard work, may drop dramatically and/or fail to reflect acquired knowledge because new questions are asked. This is what happened with SPAR 2021; the test by which countries are assessed changed with little guidance, leaving those responsible not only to understand the new examination but also to relate it to old scores. Accountability for the success of capacity building lands on IHR national focal points and other key national stakeholders who must explain to leaders and policymakers potentially large swings in scores due to SPAR revisions, and thus assumed compliance, even when on-the-ground capacity has been maintained or improved. Incorrect analyses (ie, comparing scores between versions of SPAR data when methods of assessment have changed) can lead to misunderstandings about country, regional, or global capacities, resulting in incorrect assumptions and inappropriate allocations of resources or recommendations for action.
Returns on investment and the ability to track those investments become obfuscated by new measures for assessment, which, by the guidance document’s own admission, cannot be easily compared to previous data.
In this article, we examine the structural differences between SPAR 2018 and SPAR 2021 and highlight the major revisions that make the comparison of underlying scores between the versions difficult. We then describe our actions and methodology to correct misaligned capacities across all 3 versions of the SPAR by topic areas using what we designate “common themes.” This method provides a platform for States Parties, policymakers, and other stakeholders to view and assess country capacity across common themes that can be adapted or applied to future SPAR changes, providing a sustainable framework for ongoing assessment and analysis.
Methods
To determine the revisions between SPAR 2018 and SPAR 2021 and develop a means to assess country scores across all 3 versions, our research fell into 2 separate tasks: (1) we reviewed the questionnaire, SPAR 2018, and SPAR 2021 to identify substantive differences between the versions; and (2) we developed a means to align all 3 versions of the SPAR across common themes.
We first compared WHO guidance documents for the 3 SPAR versions, including identified supplementary materials and footnote materials for SPAR 2018 and SPAR 2021. Two research teams, composed of 3 members each, independently assessed the technical capacities and their associated indicators to identify changes in title, order, and content. Findings from the team assessments were consolidated and compared. We also aligned our SPAR 2018 and SPAR 2021 comparison to the SPAR 2021 Annex 1, which aims to detail the changes between versions but does not provide a methodology for making the revisions.
We then set out to align the questionnaire, SPAR 2018, and SPAR 2021 capacities to our common themes. We first downloaded data for all 196 individual States Parties from the e-SPAR website and created a single unified dataset of country and WHO regional data reported from 2010 to 2022. For all 3 SPAR versions, we selected “All WHO regions” and “All countries” for each year from 2010 to 2022 and downloaded the Microsoft Excel files. We then consolidated the data into a single file using the R programming language, combining the downloaded files on key variables (country, year, capacity, indicator, title, score) to create a unified dataset for analysis. It is important to note that we revised one aspect of the data downloaded from the e-SPAR website: if a country-specific score was recorded as “No data,” we entered it as NA rather than as “zero,” which is how it appears in the data provided on the e-SPAR website. “Zero” is not an available scoring option in any of the SPAR revisions. However, the WHO data repository currently designates “no capacity” numerically as zero, an important finding we noticed by downloading directly from the WHO e-SPAR website. This practice has the unintended consequence of artificially lowering a country’s calculated average scores.
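The consolidation and recoding step can be illustrated with a minimal sketch. It is written in Python rather than the R used for the study, and it is not the authors' script: the inline CSV text stands in for the downloaded Excel files, the rows are invented, and the helper names (`parse_score`, `load_version`, `mean_reported`) are hypothetical. The one rule taken from the text is that a “No data” entry, or the zero that encodes it, is stored as missing rather than as a score.

```python
import csv
import io

# Hypothetical extracts standing in for per-version e-SPAR downloads.
# Column names follow the key variables named in the text; the rows are invented.
QUESTIONNAIRE_CSV = """country,year,capacity,indicator,title,score
Country A,2015,C3,,Surveillance,80
Country B,2015,C3,,Surveillance,0
"""
SPAR_2018_CSV = """country,year,capacity,indicator,title,score
Country A,2019,C6,,Surveillance,70
Country B,2019,C6,,Surveillance,No data
"""


def parse_score(raw):
    """Return a float score, or None (the stand-in for NA) when nothing was reported."""
    text = raw.strip()
    if text == "" or text.lower() == "no data":
        return None
    value = float(text)
    # Assumption from the text: zero is not an available score, so it means "no data".
    return None if value == 0 else value


def load_version(csv_text, version):
    """Parse one version's extract and tag every row with the SPAR version."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "version": version,
            "country": row["country"],
            "year": int(row["year"]),
            "capacity": row["capacity"],
            "indicator": row["indicator"],
            "title": row["title"],
            "score": parse_score(row["score"]),
        })
    return rows


def mean_reported(rows):
    """Average only the scores that were actually reported."""
    scores = [r["score"] for r in rows if r["score"] is not None]
    return sum(scores) / len(scores) if scores else None


# One unified dataset across versions, keyed by the same variables.
unified = (load_version(QUESTIONNAIRE_CSV, "questionnaire")
           + load_version(SPAR_2018_CSV, "SPAR 2018"))

rows_2015 = [r for r in unified if r["year"] == 2015]
print(len(unified), mean_reported(rows_2015))
```

In this toy data, leaving Country B's zero in place would halve the 2015 average; storing it as missing leaves the average equal to the one reported score.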
Following our review of the SPAR versions, we identified 15 common themes that are present and assessed across all 3 versions of the SPAR. Using the common themes as aggregating terms, we coded and reformatted the e-SPAR country data for analysis and introduced a new parameter we call “capacity themes” into our dataset. This step was necessary because, while the different versions of the SPAR contain different questions and cannot be easily compared, the SPAR broadly covers the same material across all versions. By creating an aggregating term that measures the overall SPAR themes present across revisions, we were able to evaluate historical data from all SPAR versions, assessing them not by any single capacity, but by retrieving scores for related capacities, categorizing them under a single theme parameter, and then assessing the theme, rather than individual capacity scores. Our team then created a matrix to compare the capacity themes from the beginning of country self-assessment under the questionnaire in 2010 until 2022 and the global average scores for all countries that have submitted for those years (on a scale of 0 to 100). This matrix was visualized using Tableau and assessed by looking at each part of the developed theme matrix, which corresponds to each version of the SPAR, and understanding the trends present for the theme. We did not conduct this comparison at the indicator level due to the variety in number and content of indicators across SPAR editions.
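A minimal sketch of the theme-coding step follows, again in Python and under stated assumptions. The lookup table contains only pairings named in this article (surveillance as questionnaire C3, SPAR 2018 C6, and SPAR 2021 C5; financing as SPAR 2021 C3); the full 15-theme mapping is not reproduced here. The scores are invented, and the function name `theme_averages` is hypothetical.

```python
from collections import defaultdict

# (SPAR version, capacity number) -> common theme.
# Only pairings stated in the article are listed; this is not the full mapping.
THEME_MAP = {
    ("questionnaire", "C3"): "Surveillance",
    ("SPAR 2018", "C6"): "Surveillance",
    ("SPAR 2021", "C5"): "Surveillance",
    ("SPAR 2021", "C3"): "Financing",
}

# Invented capacity scores (0 to 100); None marks "No data".
records = [
    {"version": "questionnaire", "year": 2016, "country": "A", "capacity": "C3", "score": 80.0},
    {"version": "questionnaire", "year": 2016, "country": "B", "capacity": "C3", "score": 60.0},
    {"version": "SPAR 2018", "year": 2019, "country": "A", "capacity": "C6", "score": 70.0},
    {"version": "SPAR 2018", "year": 2019, "country": "B", "capacity": "C6", "score": None},
    {"version": "SPAR 2021", "year": 2021, "country": "A", "capacity": "C5", "score": 60.0},
    {"version": "SPAR 2021", "year": 2021, "country": "A", "capacity": "C3", "score": 40.0},
]


def theme_averages(rows, theme_map):
    """Average reported scores per (theme, version, year), skipping missing values."""
    buckets = defaultdict(list)
    for row in rows:
        theme = theme_map.get((row["version"], row["capacity"]))
        if theme is None or row["score"] is None:
            continue  # unmapped capacity or no reported score
        buckets[(theme, row["version"], row["year"])].append(row["score"])
    return {key: sum(values) / len(values) for key, values in buckets.items()}


averages = theme_averages(records, THEME_MAP)
for key in sorted(averages):
    print(key, averages[key])
```

Keying the lookup on the pair (version, capacity number) rather than on the capacity number alone is what prevents C3 from one version being averaged with C3 from another.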
Results
Importance of Content and Assessment Revision
SPAR 2021 expands upon SPAR 2018 content with the introduction of 2 new capacities and 11 new indicators, for a total of 15 capacities and 35 indicators. The introduction of a standalone capacity for financing (C3), previously grouped with legislation and policy (C1), underscores the importance of developing national funding mechanisms for public health emergencies as a priority for strong and sustainable health systems, 11 and infection prevention and control (C9) underpins the role of healthcare settings in limiting transmission of disease to the public (Tables 1 and 2). 12
Table 1. Ordering of SPAR Capacities Across Versions by Number (C1-C15)
An ordinal view of SPAR capacities across versions demonstrates that the capacity number is a poor means to assess capacities because the numbering system has shifted over time.
Abbreviations: IHR, International Health Regulations (2005); SPAR, States Parties Self-Assessment Annual Reporting.
Table 2. Reordering SPAR Capacities by Mapped Common Themes
Abbreviations: IHR, International Health Regulations (2005); SPAR, States Parties Self-Assessment Annual Reporting.
The addition of 11 new indicators indicates recognition and movement toward gender equality and equity in public health emergencies. 13 Revisions made to capture incremental compliance are reflected in the revised indicator language. Unlike SPAR 2018, in which several indicators followed a hierarchical structure, indicators in SPAR 2021 follow an incremental structure. The difference between the 2 versions of the SPAR is most easily depicted in C2.2, which focuses on multisectoral coordination in both versions. In SPAR 2018, C2.2 measures 5 distinct areas that could require an effective multisectoral response, including infectious diseases, zoonoses, food safety, and chemical and radiological emergencies. The introduction of an incremental scoring scheme for SPAR 2021 indicators provides several benefits for capacity building, capturing progressive development and quantifying compliance. However, these revisions complicate data alignment and longitudinal assessments of country, regional, or global health security capacities across the 3 versions. This impact should be considered and measured in future revisions to determine whether an order change for an existing capacity is necessary.
In SPAR 2021, countries are measured on whether multisectoral coordination mechanisms are in development, ad hoc, or established at the national, intermediate, and local levels. This revision is intended to allow progression in health security capacities to be assessed more easily over time. Figure 1 parts A and B depict this difference in scoring scheme between SPAR 2018 and SPAR 2021 using C2.2 as an example.

Figure 1. Indicator progression scheme differences between the SPAR 2018 and SPAR 2021. (A) A nonlinear progression is depicted for SPAR 2018 indicator C2.2. (B) A linear progression is depicted for SPAR 2021 indicator C2.2, with baseline capabilities listed that evolve toward a fully developed capacity that is exercised, reviewed, and evaluated on a regular basis. (C) New mandatory sections include “status of implementation” and “areas involved.” Abbreviations: IHR, International Health Regulations (2005); SOP, standard operating procedure; SPAR, State Party Self-Assessment Annual Report.
What is still absent in SPAR 2021 is the ability to score an indicator “not applicable” or “no data” without impacting a capacity’s overall average. Indicators, which make up capacities, are scored on a scale of 1 to 5. In turn, capacity scores are calculated from 0% to 100% using a mathematical formula that accounts for component indicator scores. WHO currently translates “no data” to a numerical score of zero, lowering the overall compliance score of a country. Two new mandatory sections, “status of implementation” and “areas involved,” demonstrated in Figure 1C, require States Parties to classify and identify the stage of development for each core capacity and highlight the stakeholders involved in continued development. While useful for annual national monitoring and evaluation in preparation for SPAR, it is not clear how the data are assessed by WHO (Figure 1).
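The effect of coding “no data” as zero can be shown with a short worked sketch. The article does not state the scoring formula. The code assumes that each indicator level (1 to 5) is converted to a percentage as level/5 × 100 and that a capacity score is the mean of its indicator percentages. Treat that formula, the indicator levels, and the function names as illustrative assumptions, not as WHO's implementation.

```python
def indicator_percent(level):
    """Convert an indicator level (1 to 5) to a percentage; None means no data."""
    if level is None:
        return None
    return level * 100 / 5  # assumed conversion, see the lead-in


def capacity_score(levels, no_data_as_zero):
    """Mean of indicator percentages, with 'no data' either zero-filled or excluded."""
    percents = [indicator_percent(level) for level in levels]
    if no_data_as_zero:
        values = [0.0 if p is None else p for p in percents]
    else:
        values = [p for p in percents if p is not None]
    return sum(values) / len(values) if values else None


# Hypothetical capacity with 3 indicators: levels 4 and 3 reported, one with no data.
levels = [4, 3, None]

zero_filled = capacity_score(levels, no_data_as_zero=True)   # (80 + 60 + 0) / 3
excluded = capacity_score(levels, no_data_as_zero=False)     # (80 + 60) / 2

print(round(zero_filled, 1), excluded)
```

Under these assumptions, the same two reported indicators yield a capacity score of roughly 46.7 when the missing indicator is zero-filled and 70.0 when it is excluded.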
Incidentally, our analysis found that 42.9% of the website links provided in SPAR 2021 do not directly connect the reader to the document referenced because the links are either broken or associated with the wrong documents. In addition, some links refer to outdated manuals and guidance documents, such as the third edition of the Laboratory Biosafety Manual, 14 even though the fourth edition 15 was published in 2020, prior to the release of SPAR 2021. The inaccurate links and outdated reference documents detract from revisions made to improve and strengthen the self-assessment processes, placing responsibility on the user to find proper guidance documents. As part of this research endeavor, we created a repository of all relevant SPAR 2021 documents, which is publicly available online at https://www.ergriffinprogram.org/tools-resources.
Alignment Through Common Themes
The e-SPAR data across the 3 SPAR versions were initially difficult for our research team to collect because the online repository is highly siloed, without adequate explanation to identify revisions in capacity names or order of numbers between versions. This problem of alignment results in the unintentional mismatching of capacities and incorrect assessment of data. Unless users are aware of the order change in capacity numbers between the SPAR versions, they could compare up to 3 different technical capacity scores, depending on the capacity number they review, if using an ordinal (C1 to C15) basis for their analysis. For example, C3 covers surveillance in the questionnaire, but in SPAR 2018 it represents zoonotic events and the human–animal interface, and in SPAR 2021 it represents financing. Anyone comparing the C3 scores across the versions without recognizing the name change would be comparing 3 different focus areas (Figure 2).
Figure 2. Example of an incorrect approach to assessing historical SPAR data. The x-axis represents time in years, from 2010 to 2022, the entire recording period of SPAR assessments. The y-axis assesses global average SPAR scores from 0 to 100. The green line plots the global average capacity score for the questionnaire and extends if scores were used from questionnaire C3 surveillance for the entire recording period, the brown line plots capacity scores for C3 zoonotic events for SPAR 2018, and the orange line plots financing capacity scores for SPAR 2021. Data for each version are siloed from each other on the e-SPAR website, without adequate comments to identify revisions in capacity name or order between versions. This alignment (green line) results in the unintentional mismatching of capacities and incorrect assessment of these data. Abbreviation: SPAR, State Party Self-Assessment Annual Report.
Despite changes in the indicators, which make it difficult to compare individual capacity scores across the SPAR versions, there are 15 common themes for health security capacity building (Table 3). We added the parameter of “common theme” to our unified dataset of global SPAR scores from 2010 to 2022. This step allows for visualization of the 15 common themes, bridging some of the incompatibility between versions of the SPAR. The result is that key topics for health security capacity building are common across the SPAR versions, despite changes in indicators that make individual comparison of capacity scores between the different versions difficult.
Table 3. Common Themes for Mapped SPAR Capacities
The identification of common themes that were consistent across all versions of the SPAR allows for analysis of identified overarching themes for assessment across the entire historical period of monitoring and evaluation framework self-reporting.
Abbreviations: IHR, International Health Regulations (2005); SPAR, States Parties Self-Assessment Annual Reporting.
Our analysis does not plot capacities directly against each other because inconsistent naming conventions across the versions, as well as underlying congruency issues, make this a challenge for data analytics. Instead, we aligned capacity scores from all SPAR versions under the common themes. In this way, themes can be assessed as a piecewise function with 2 break points in 2018 and 2021 when revisions to SPAR occurred. By mapping capacities based on topic area and creating an umbrella term of common themes, it is possible to conduct an assessment of global average scores. For example, surveillance scores from the questionnaire (C3), SPAR 2018 (C6), and SPAR 2021 (C5) can now be aligned and assessed. As noted earlier, this is not a perfect solution because the indicators that make up the capacity scores have changed; however, this process allows for the general assessment of a capacity by common theme. By denoting whether global average scores increase, decrease, or remain neutral using the curves of each reporting period, it is possible to assess progress by common themes from 2010 to 2022 using a tabular format and draw conclusions regarding health security capacity building efforts.
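The piecewise reading of a theme can be sketched as follows. The reporting periods (2010 to 2017, 2018 to 2020, and 2021 to 2022) are the ones used in this article. The scores are invented, and comparing the first and last reported year within each period against a tolerance is a simplifying assumption made for illustration; it is not the visual assessment of curves described in the text. The names `period_trend` and `tolerance` are hypothetical.

```python
# Reporting periods for the 3 versions, with break points at 2018 and 2021.
PERIODS = {
    "questionnaire": (2010, 2017),
    "SPAR 2018": (2018, 2020),
    "SPAR 2021": (2021, 2022),
}


def period_trend(series, start, end, tolerance=1.0):
    """Classify the change between the first and last reported year in a period.

    `series` maps year -> global average score (0 to 100). Changes within
    `tolerance` points are treated as neutral (an illustrative threshold).
    """
    years = sorted(year for year in series if start <= year <= end)
    if len(years) < 2:
        return "insufficient data"
    change = series[years[-1]] - series[years[0]]
    if change > tolerance:
        return "increase"
    if change < -tolerance:
        return "decrease"
    return "neutral"


# Invented global averages for one common theme.
surveillance = {
    2010: 60.0, 2014: 75.0, 2017: 85.0,   # questionnaire
    2018: 70.0, 2019: 72.0, 2020: 78.0,   # SPAR 2018
    2021: 74.0, 2022: 74.5,               # SPAR 2021
}

# One row of a theme-by-period table: each segment is judged on its own.
trend_row = {
    version: period_trend(surveillance, start, end)
    for version, (start, end) in PERIODS.items()
}
print(trend_row)
```

Because each segment is classified separately, the drop from 85.0 in 2017 to 70.0 in 2018 in this toy series is never read as a loss of capacity; it falls on a break point between versions.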
It should be noted that 2 common themes, related to financing and infection prevention and control, are new capacities in SPAR 2021 and therefore do not align with previous versions. Measures related to financing and infection prevention were not absent in previous versions but were discussed at the indicator level, and therefore it is difficult to compare efforts related to these topics across all 3 versions of the SPAR. Because these topics were assessed at the indicator level, and indicators change with each version of the SPAR, we are unable to provide common themes prior to SPAR 2021 for financing and infection prevention and control. Going forward, however, insights can be garnered for these common themed capacities (Figure 3).

Figure 3. Common theme approach to assessing historical SPAR data. The x-axis records time, in years, from 2010 to 2022, the entire recording period of SPAR assessments. The y-axis assesses global average SPAR scores from 0 to 100. Within each matrix, there are 3 separate curves. The red line plots the global average capacity score for the questionnaire, the blue line plots capacity scores for SPAR 2018, and the orange line plots capacity scores for SPAR 2021. Themes can be assessed as a piecewise function with 2 break points in 2018 and 2021 when revisions to SPAR occurred. By performing capacity mapping based on topic and creating umbrella terms of common themes, assessment of global average scores can be completed. Abbreviations: IHR, International Health Regulations (2005); SPAR, State Party Self-Assessment Annual Report.
Discussion
This work stemmed from an interest in navigating confounding changes when evaluation frameworks are updated without consideration for longitudinal analysis. Providing tools for various stakeholders to assess capacity under common themes is not a perfect solution but it prevents historical investment and reporting data from being disregarded. Our methodology of assessing common themes offers a systematic approach for States Parties, policymakers, national focal points, and other key partners to compare and track capacity scores over the lifetime of the SPAR. Using the outputs from this methodology, any user can take a historical view of the investments made to build national, regional, and global health security capacities between all versions of the SPAR.
Performing capacity mapping through the creation of common themes also allows us to present global average scores. In doing so, users can visualize the overall increase or decrease in a capacity by common theme during a SPAR’s period of compliance. For example, one can assess increases or decreases in compliance under the common themes of “surveillance” or “risk communication” between 2010 and 2017, 2018 and 2020, or 2021 to 2022, captured in Figure 3. Although we acknowledge subjectivity in this finding, such an assessment demonstrates that investments to increase global health security capacity have been successful under the SPAR with overall increases achieved from 2010 to 2022. Our process is adaptable and can be applied to future SPAR versions, providing a sustainable framework for ongoing assessment and analysis. As new topics are identified, new themes are also identified, as seen with financing and infection prevention and control. This process allows for the changing of SPAR indicators, or even the addition of new capacities, as global priorities for disease response shift; novel capacities can always be mapped to novel themes.
We acknowledge that the method proposed, as with any method that attempts to align versions of an assessment tool, has limitations. While we were able to align technical capacities by grouping them under common themes, a similar analysis at the indicator level would prove impossible, as indicators are different between each of the SPAR versions. In addition, mechanisms for scoring have changed from hierarchical to incremental logic, which does not allow for thematic grouping at the indicator level. In SPAR 2021, the step-by-step process highlights the efforts necessary to achieve a singular capacity goal, whereas the older SPAR versions approach capacity building as the presence or absence of broader capabilities. This understanding means we fully acknowledge that the proposed process of grouping mapped capacities between the SPAR versions with common themes is a way of looking back, and not forward. The framework proposed here is not intended to instruct policymakers and others interested in building national capacity for health security on the best way to increase scores for any indicator or capacity under SPAR 2021. Instead, we recommend working collaboratively with key stakeholders to assess country capacity and directly assess the specific needs of SPAR indicators to achieve higher scores under the most current version of the SPAR tool. An assessment by common theme provides a way to review country, regional, or global efforts to achieve robust health security capacity historically, which is useful for demonstrating to national public officials or international funding sources the importance and results of their investments over time. Without an ability to compare scores directly, however, it becomes more difficult to advocate for increased or sustained budgets and supportive policies within ministries of health and across national governments, or among international bodies or external entities such as global health security initiatives.
Conclusion
Revisions to frameworks and methods of evaluation should be encouraged to respond to evolving capacities and indicators for global health security. As threats and diseases advance, so should our health systems for preparedness, response and recovery. Countries have access to a variety of assessment tools as they strengthen capacity, and many of these frameworks assess overlapping priorities. We do not see this overlap as a challenge, but rather a benefit, as a majority of assessments are voluntary, and countries select these tools based on ease of application or recommendations under international frameworks. The SPAR, however, as the only mandatory reporting tool under the IHR, plays a unique role for States Parties. The challenge lies in capturing met capacities under the SPAR with other relevant assessment tools such as the Joint External Evaluation, 4 HEPR (health emergency prevention, preparedness, response and resilience), 16 and PVS (Performance of Veterinary Services) Pathway, 17 and vice versa. As a next step, we recommend an alignment of the technical areas in these tools to identify “universal” health security capacities, which can streamline national processes for capacity building by identifying major areas for continued assessment.
The SPAR is a vital component of the IHR MEF, which tracks country compliance toward building sustainable capacity to respond to health security threats. The revisions made to SPAR 2021 strengthen the MEF and support States Parties’ self-assessment and reporting of health security capacities by providing more means to track internal progress over time, building upon a new incremental scoring scheme, and adding new capacities and indicators that align with lessons learned from COVID-19. Nevertheless, the strengths of the updated SPAR 2021 are encumbered by oversights including inaccurate links to reference documents and a lack of guidance for countries on how to best complete the evaluation, which detract from the finished product and hamper robust analysis of the data collected from previous years. Rather than simply noting these oversights, our research team aimed to provide a solution to the problem of understanding scores between the SPAR versions with different means of assessment. As a result, it is possible to assess historical SPAR data using common themes, which provides a way for health security experts and practitioners to demonstrate the importance of their work to policymakers and interested global partners. Frameworks that are updated regularly without guidance not only complicate understanding progress toward health security targets, but they also require investments from policymakers and national leaders who must work toward understanding new compliance structures rather than building health systems. We recommend a clear and transparent process for revisions to existing frameworks. Revisions to health security assessment tools are required to adapt to evolving systems, needs, and threats; however, the method of revision and the impact to past assessments and capacity building need to be clearly defined for their users.
Any new framework developed should outline, in its first edition, plans and approaches for revisions to prevent misalignment or impacts to past investments in capacity building. It is our hope that the findings presented in this article highlight the importance of evaluation frameworks, clarify why consistency in the methods of assessment is vital for understanding capacity building, show that understanding why revisions were made is just as important as the revisions themselves, and demonstrate that the work accomplished toward building health security capacity, while a challenge, has been an overall success.
Data Sharing Considerations
Data used for this study were extracted from publicly available sources from the e-SPAR website. To access the unified dataset, please contact Dr. Erin Sorrell at
Acknowledgments
The research team would like to extend a sincere thank you to Elizabeth R. Griffin student researchers Dana Krauss, MS and Alaa Mohammed, MS for their efforts and assistance during the initial steps of this research. This work was supported by the Elizabeth R. Griffin Program.
