Abstract
Background:
Several ultrasound (US)-based risk stratification systems are increasingly used for the optimal management of thyroid nodules. However, there are considerable discrepancies across these systems. This study aimed to summarize and compare the category-based diagnostic performance, for detecting thyroid cancer, of US-based risk stratification systems from four societies: the American College of Radiology-Thyroid Imaging Reporting and Data System (ACR-TIRADS), the American Thyroid Association (ATA), the Korean Thyroid Association/Korean Society of Thyroid Radiology (KTA/KSThR; K-TIRADS), and the European Thyroid Association (EU-TIRADS).
Methods:
MEDLINE/PubMed and EMBASE databases were searched to identify original articles investigating the category-based diagnostic performance according to at least one of the following guidelines: ACR-TIRADS, ATA, K-TIRADS, and EU-TIRADS. Pooled sensitivity and specificity were calculated using a bivariate random-effects model. A subgroup analysis on nodules of 1 cm or larger and a meta-regression analysis to identify factors associated with the diagnostic performance were performed.
Results:
A total of 29 articles including 33,748 thyroid nodules met the eligibility criteria and were included in the analysis. For ACR-TIRADS, the pooled sensitivity and specificity were, respectively, 66% and 91% for category 5 and 95% and 55% for category 4 or 5. For ATA, the pooled sensitivity and specificity were, respectively, 74% and 88% for category 5 and 91% and 64% for category 4 or 5. For K-TIRADS, the pooled sensitivity and specificity were, respectively, 55% and 95% for category 5 and 89% and 64% for category 4 or 5. For EU-TIRADS, the pooled sensitivity and specificity were, respectively, 82% and 90% for category 5 and 96% and 52% for category 4 or 5. Study location, proportion of female patients and malignant nodules, and study design were associated with study heterogeneity.
Conclusions:
The overall diagnostic performance of the four US-based risk stratification systems was comparable.
Introduction
As ultrasound (US) is a highly sensitive diagnostic modality for the characterization of thyroid nodules (1), neck US is recommended as the primary imaging workup test for the diagnosis of thyroid cancer (2–4). This has resulted in increasing use of US examinations, fine-needle aspiration biopsy (FNAB), and core needle biopsy (CNB) and has consequently contributed to the increase in the recorded incidence of thyroid cancer (5,6). However, given that thyroid cancer frequently shows a less aggressive nature, not all lesions require invasive diagnostic and therapeutic procedures. In this context, several representative US-based risk stratification systems have been increasingly used to risk stratify thyroid nodules and minimize unnecessary biopsies (7–10) and also to assess the requirement for ablative treatment of benign nodules (11–13).
However, there are considerable discrepancies across these risk stratification systems with respect to the imaging features used for the risk categories, expected malignancy risk, diagnostic performance, and size cutoffs for biopsy. There have been many attempts to externally validate and compare the systems (14–18), but comprehensive interpretation is still difficult because of heterogeneity in study designs and populations. Castellana et al. (19) conducted a meta-analysis on the selection of thyroid nodules for FNAB according to five well-known risk stratification systems; however, category-based diagnostic performance, subgroup analyses, and meta-regression analyses were not reported.
This study aimed to summarize and compare the diagnostic performance, for detecting thyroid cancer, of the US-based risk stratification systems from four societies: the American College of Radiology-Thyroid Imaging Reporting and Data System (ACR-TIRADS) (7), the American Thyroid Association (ATA) (8), the Korean Thyroid Association/Korean Society of Thyroid Radiology (KTA/KSThR; K-TIRADS) (9), and the European Thyroid Association (EU-TIRADS) (10). In addition, we performed subgroup and meta-regression analyses to identify any factors associated with the diagnostic performance.
Materials and Methods
This study was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (20).
Search strategy and eligibility criteria
A literature search of the MEDLINE/PubMed and EMBASE databases was conducted using pertinent MeSH or EMTREE terms with common keywords for relevant articles up to August 5, 2019. The search terms were as follows: ((thyroid)) AND ((thyroid imaging reporting and data system) OR (TIRADS) OR (TI-RADS) OR (guideline)) AND ((American Thyroid Association) OR (ATA) OR (American College of Radiology) OR (ACR) OR (Europe*) OR (EU-TIRADS) OR (Korea*) OR (K-TIRADS)). The search was limited to English-language publications but was not restricted to human studies or by publication date.
After eliminating duplicate publications, articles were screened according to their title and abstract. Full-text articles were then thoroughly assessed according to the following eligibility criteria: (a) population: patients who underwent US examinations for thyroid nodules; (b) index test: US-based risk stratification systems according to at least one of the following guidelines: ACR-TIRADS (7), ATA (8), K-TIRADS (9), and EU-TIRADS (10); (c) reference standard: pathological diagnosis or imaging follow-up; (d) outcomes: sensitivity and specificity of the US-based risk stratification systems for diagnosing malignant thyroid nodules; and (e) study design: not limited. Studies were excluded if any of the following criteria were met: (a) review articles; (b) case reports or case series including fewer than 10 patients; (c) conference abstracts; (d) letters, editorials, and commentaries; (e) animal studies; (f) studies with a partially overlapping patient cohort (for studies with an overlapping study population, the publication with the largest population was selected); (g) studies conducted with a pediatric population; (h) studies using a cytopathology reporting system other than the Bethesda classification system (21); or (i) studies not providing sufficient details to extract 2 × 2 tables.
The literature search and application of the criteria were conducted independently by two authors (P.H.K. and C.H.S.; with 3 and 8 years of experience in performing thyroid US and interventional procedures, respectively), and any discrepancies were resolved through discussion and consensus with a third author (J.H.B.; 21 years of experience in performing thyroid US and interventional procedures).
Data extraction and quality assessment
A standardized database form was used to obtain the following information from the selected studies: (a) study characteristics: institution, study period, study design (prospective vs. retrospective; single-center vs. multicenter), consecutive or nonconsecutive enrollment, reference standard, and blinding to the reference standard; (b) demographic and clinical characteristics: total number of patients, total number of nodules and malignant nodules, mean age (range), and sex; (c) characteristics of the US examinations: vendor, model, transducer frequency, and number and experience of participating readers; and (d) diagnostic performance of the US-based FNAB criteria for diagnosing malignant thyroid nodules in the form of a 2 × 2 table. The quality of the selected studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool (22).
Data synthesis and analysis
Two-by-two tables were constructed for each study; when diagnostic performance was evaluated separately for different radiologists, the results with the highest performance were chosen. The criteria for a positive test result were set as (a) category 5 or (b) category 4 or 5 of each risk stratification system. For example, with category 5 as the cutoff, true-positive nodules were those classified as category 5 on US that proved malignant, false-positive nodules were category 5 nodules that proved benign, false-negative nodules were category 1–4 nodules that proved malignant, and true-negative nodules were category 1–4 nodules that proved benign. Similarly, with category 4 or 5 as the cutoff, true-positive nodules were those classified as category 4 or 5 on US that proved malignant. Diagnostic performance with category 3, 4, or 5 as positive was additionally evaluated, as provided in the Supplementary Data. We followed the reference standard set in each study. Because the ATA system (which classifies sonographic patterns as benign, very low, low, intermediate, or high suspicion) does not use TIRADS categories, we treated the benign through high suspicion patterns of the ATA system as categories 1 through 5, respectively.
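The classification rule above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the nodule records are hypothetical, and the `ATA_TO_CATEGORY` mapping simply encodes the convention stated in the text (benign through high suspicion treated as categories 1 through 5).

```python
from dataclasses import dataclass

# Convention from the text: ATA patterns are treated as categories 1-5.
ATA_TO_CATEGORY = {
    "benign": 1,
    "very low suspicion": 2,
    "low suspicion": 3,
    "intermediate suspicion": 4,
    "high suspicion": 5,
}


@dataclass
class TwoByTwo:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def two_by_two(nodules, min_positive_category):
    """Build a 2x2 table, calling a nodule test-positive when its category is
    >= min_positive_category (5 for 'category 5', 4 for 'category 4 or 5')."""
    tp = fp = fn = tn = 0
    for category, malignant in nodules:
        positive = category >= min_positive_category
        if positive and malignant:
            tp += 1
        elif positive:
            fp += 1
        elif malignant:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical nodules: (category, malignant according to the reference standard)
nodules = [(5, True), (5, False), (4, True), (4, False), (3, False), (2, False)]
print(two_by_two(nodules, 5))  # TwoByTwo(tp=1, fp=1, fn=1, tn=3)
print(two_by_two(nodules, 4))  # TwoByTwo(tp=2, fp=2, fn=0, tn=2)
```

Lowering the threshold from category 5 to category 4 or 5 moves the category 4 nodules from the test-negative to the test-positive column, which is why sensitivity rises and specificity falls.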
The pooled sensitivity and specificity and their 95% confidence intervals (CIs) were calculated using a bivariate random-effects model, and a coupled forest plot was constructed (23–27). A hierarchical summary receiver operating characteristics (HSROC) curve with 95% confidence and prediction regions was also plotted. Heterogeneity was assessed using the Higgins inconsistency index (I²), with a value >50% indicating the presence of heterogeneity, and the coupled forest plot was used to graphically assess the presence of a threshold effect (a positive correlation between sensitivity and false-positive rate among the selected studies) (23). Deeks' funnel plot was constructed to assess publication bias, with statistical significance assessed using Deeks' asymmetry test. Where possible, indirect comparisons (using the pooled estimates from separate studies) and direct comparisons (meta-analytic pooling of head-to-head comparison studies) of sensitivity and specificity between the guidelines were performed. Finally, we investigated the unnecessary biopsy rate (defined as the proportion of benign nodules among nodules meeting FNAB criteria) for each system.
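The I² statistic is derived from Cochran's Q. The sketch below computes both for a single outcome (for example, sensitivity) on the logit scale. It is a simplification for illustration only: the study used a bivariate random-effects model (via Stata and R), whereas this code handles one proportion at a time with fixed-effect inverse-variance weights, and the 0.5 continuity correction applied to every study is an assumption.

```python
import math


def logit_and_variance(events, total):
    """Logit of events/total and its approximate (delta-method) variance.
    A 0.5 continuity correction is added to both cells (an assumption)."""
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def cochran_q_and_i2(studies):
    """Return (Q, I^2 in %) for (events, total) pairs, e.g.
    (true positives, malignant nodules) per study for sensitivity."""
    estimates = [logit_and_variance(e, n) for e, n in studies]
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    df = len(studies) - 1
    # I^2 = (Q - df) / Q, truncated at zero
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical per-study counts: (true positives, malignant nodules)
q, i2 = cochran_q_and_i2([(30, 50), (80, 100), (45, 90)])
print(f"Q = {q:.1f}, I^2 = {i2:.0f}%:", "heterogeneous" if i2 > 50 else "not heterogeneous")
```

I² expresses the share of total variability attributable to between-study differences rather than chance, which is why a single threshold (>50% here) can be applied across analyses with different numbers of studies.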
A subgroup analysis restricted to nodules 1 cm or larger and a meta-regression were performed to explore the sources of heterogeneity. The following variables were considered for the bivariate meta-regression model: study design (prospective vs. retrospective), single-center vs. multicenter, study location (East Asia vs. other countries), proportion of female patients (cutoff at 79%, the mean of the proportions reported by the included studies), proportion of malignant nodules (cutoff at 28%, the mean of the proportions reported by the included studies), and inclusion of follow-up as a reference standard.
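The dichotomized covariates listed above can be coded as 0/1 indicators before fitting a meta-regression model. This is an illustrative sketch only: the `Study` fields and example values are hypothetical, whether a proportion exactly equal to the cutoff counts as "high" is an assumption, and the code only prepares covariates; it does not fit the bivariate meta-regression itself.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Study:
    """Study-level characteristics (hypothetical field names)."""
    prospective: bool
    multicenter: bool
    east_asia: bool
    female_proportion: float     # fraction of female patients, 0-1
    malignant_proportion: float  # fraction of malignant nodules, 0-1
    follow_up_reference: bool    # follow-up used as a reference standard


def mean_cutoff(proportions):
    """Cutoff = mean of the proportions reported by the included studies."""
    return mean(proportions)


def covariates(study, female_cutoff=0.79, malignant_cutoff=0.28):
    """Binary covariates; values at or above a cutoff are coded 1 (assumption)."""
    return {
        "prospective": int(study.prospective),
        "multicenter": int(study.multicenter),
        "east_asia": int(study.east_asia),
        "female_high": int(study.female_proportion >= female_cutoff),
        "malignant_high": int(study.malignant_proportion >= malignant_cutoff),
        "follow_up_reference": int(study.follow_up_reference),
    }


example = Study(prospective=False, multicenter=True, east_asia=True,
                female_proportion=0.82, malignant_proportion=0.15,
                follow_up_reference=True)
print(covariates(example))
```

Dichotomizing at the mean keeps the model small, which matters when only a handful of studies report a given system; the cost is lost information about where within each group a study lies.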
Statistical analyses were conducted by one of the authors (C.H.S., with six years of experience in performing systematic reviews and meta-analyses) using the “metandi” and “midas” modules in Stata 15.0 (StataCorp, College Station, TX), the “mada” package in R version 3.4.1, and MedCalc version 18.11 (MedCalc Software, Ostend, Belgium). A value of p < 0.05 was taken to indicate statistical significance.
Results
Literature search
A flowchart summarizing the publication selection process is presented in Figure 1. A total of 411 nonduplicate studies were identified. Of these, 307 articles were excluded on the basis of their titles and abstracts for the following reasons: (a) not in the field of interest (n = 232); (b) guidelines (n = 63); (c) reviews (n = 8); (d) case reports (n = 2); (e) erratum (n = 1); and (f) animal study (n = 1). Subsequently, 104 potentially eligible full-text articles were assessed against the eligibility criteria, and a further 75 were excluded for the following reasons: (a) inclusion of nonconsecutive nodules (n = 29); (b) none of the four risk stratification systems of interest used (ACR-TIRADS, ATA, K-TIRADS, or EU-TIRADS; n = 11); (c) data included in subsequent articles (n = 10); (d) not in the field of interest (n = 9); (e) inseparable adult and pediatric patients (n = 6); (f) each guideline's category not used as the standard for positive test results (n = 4); (g) insufficient details to derive a 2 × 2 table (n = 4); (h) a cytopathologic reporting system other than the Bethesda system (n = 1); and (i) histopathology not included as a reference standard (n = 1). Consequently, a total of 29 articles including 33,748 thyroid nodules met the eligibility criteria and were included in the analysis (14–18,28–51).

Figure 1. Flowchart of the publication selection process.
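As a quick arithmetic check of the selection flow described above, the exclusion counts at each stage can be summed and subtracted from the totals. The dictionary keys are shortened labels for the reasons given in the text.

```python
identified = 411  # nonduplicate records

excluded_on_title_abstract = {
    "not in the field of interest": 232,
    "guidelines": 63,
    "reviews": 8,
    "case reports": 2,
    "erratum": 1,
    "animal study": 1,
}

excluded_on_full_text = {
    "nonconsecutive nodules": 29,
    "none of the four systems used": 11,
    "data included in subsequent articles": 10,
    "not in the field of interest": 9,
    "inseparable adult and pediatric patients": 6,
    "category not used as positivity standard": 4,
    "insufficient details for a 2 x 2 table": 4,
    "non-Bethesda cytopathology reporting": 1,
    "no histopathology reference standard": 1,
}

full_text_assessed = identified - sum(excluded_on_title_abstract.values())
included = full_text_assessed - sum(excluded_on_full_text.values())
print(full_text_assessed, included)  # 104 29
```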
Characteristics of the studies included
The detailed study characteristics are summarized in Table 1. Seven of the 29 studies had a prospective design (28,29,35,36,39,42,44), and 6 were multicenter studies (15,16,29,33,40,45). The number of included patients ranged from 52 to 3190, and the mean patient age ranged from 43 to 59 years. The proportion of female patients in each study ranged from 61.2% to 94.9%. The proportion of malignant nodules in each study ranged from 3.9% to 66.1%. The diagnostic performances of ACR-TIRADS, ATA, K-TIRADS, and EU-TIRADS were reported in 21 (72.4%) (14–18,29,30,32–38,40,42,43,46,47,49,51), 13 (44.8%) (14,15,17,20,22,23,30,36,38,39,41,47,48), 8 (27.6%) (14–16,28,33,34,49,50), and 5 studies (17.2%) (16,17,44,45,49), respectively, and 30,280, 15,504, 12,659, and 7549 nodules, respectively, were analyzed to evaluate the diagnostic performances of these systems. In 10 studies, follow-up imaging was considered as the reference standard in parallel with pathology (14–16,28,30,33,34,41,42,47), while in the other 19 studies, only pathology from FNAB, CNB, or surgery was considered as the reference standard (17,18,29,31,32,35–40,43–46,48–51), with 7 of these 19 studies considering only postsurgical histopathology as the reference standard (17,18,31,32,39,44,45).
Table 1. Characteristics of Studies Included
In all studies using follow-up as a reference standard, thyroid nodules with initial benign results of biopsy and decreased or stable size at follow-up US more than 12 months later were considered as benign.
FNAB and CNB were used to obtain the specimen. In other included studies, only FNAB was used.
ACR, 2017 American College of Radiology; ATA, 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer; CNB, core needle biopsy; EU, 2017 European Thyroid Association; FNAB, fine-needle aspiration biopsy; K, 2016 Korean Thyroid Association/Korean Society of Thyroid Radiology; US, ultrasound.
Quality assessment
The results of the quality assessment based on the QUADAS-2 criteria are shown in Supplementary Figure S1. Two (17,29) of the 29 studies had a high risk, and 9 studies (18,31,32,34,35,40,45,48,51) had an unclear risk of bias in patient selection because of nonconsecutive enrollment. One study (31) had a high risk, and 12 studies had an unclear risk (28,29,33–36,42,44,48–51) of bias in the index test domain because of no or unclear blinding to the reference standard during the US examinations. One study (29) had a high risk, and 28 studies (14–18,28,30–51) had an unclear risk of bias in the reference standard domain because of no or unclear blinding to the index test during pathologic evaluation. Additionally, five studies (15,31,33,34,49) had a high risk, and one study (35) had an unclear risk of bias in the flow and timing domain because the reference standard for diagnosing benign nodules was inconsistent, or its consistency was unclear, across the study population. Six studies (16,35,37,39,48,49) had a high concern, and three studies (18,40,42) had an unclear concern regarding the applicability of the index test because of a single reader or an unreported number of readers for the US images. One study (35) had an unclear concern regarding the applicability of the reference standard because of no information on how the tissue specimens were examined. There were no concerns regarding the applicability of patient selection.
Diagnostic performance of different US risk stratification systems
The pooled diagnostic performances of each risk stratification system for diagnosing malignant nodules are summarized in Table 2, Supplementary Figure S2 (category 5 as positive), and Supplementary Figure S3 (category 4 or 5 as positive). For ACR-TIRADS, the pooled sensitivity and specificity were, respectively, 66% [CI 56–75%] and 91% [CI 87–94%] for category 5 and 95% [CI 92–97%] and 55% [CI 45–64%] for category 4 or 5. For ATA, the pooled sensitivity and specificity were, respectively, 74% [CI 62–84%] and 88% [CI 82–93%] for category 5 and 91% [CI 84–95%] and 64% [CI 54–74%] for category 4 or 5. For K-TIRADS, the pooled sensitivity and specificity were, respectively, 55% [CI 38–70%] and 95% [CI 90–98%] for category 5 and 89% [CI 83–93%] and 64% [CI 60–69%] for category 4 or 5. For EU-TIRADS, the pooled sensitivity and specificity were, respectively, 82% [CI 71–89%] and 90% [CI 77–96%] for category 5 and 96% [CI 92–98%] and 52% [CI 37–66%] for category 4 or 5. When considering category 3, 4, or 5 as positive test results, the pooled sensitivity reached almost 100% and the pooled specificity decreased to 3–23% for each system (Supplementary Table S1). HSROC curves are presented in Supplementary Figure S4 (category 5 as positive) and Supplementary Figure S5 (category 4 or 5 as positive).
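To illustrate what the two thresholds mean in practice, the pooled ACR-TIRADS estimates can be applied to a hypothetical cohort of 1000 nodules with a 28% malignancy rate (the mean proportion of malignant nodules across the included studies, as noted in the Methods). Given the substantial heterogeneity observed, such projections are only rough approximations for any particular clinical setting.

```python
def expected_counts(sensitivity, specificity, n_nodules, malignant_fraction):
    """Expected 2x2 cell counts for a hypothetical cohort."""
    malignant = n_nodules * malignant_fraction
    benign = n_nodules - malignant
    tp = sensitivity * malignant
    tn = specificity * benign
    return {"TP": tp, "FP": benign - tn, "FN": malignant - tp, "TN": tn}


# Pooled ACR-TIRADS estimates reported above
for label, sens, spec in [("category 5", 0.66, 0.91),
                          ("category 4 or 5", 0.95, 0.55)]:
    counts = expected_counts(sens, spec, 1000, 0.28)
    print(label, {cell: round(value) for cell, value in counts.items()})
```

In this hypothetical cohort, moving the positivity threshold from category 4 or 5 to category 5 would detect about 81 fewer cancers (185 vs. 266) while producing about 259 fewer false-positive results (65 vs. 324).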
Table 2. Pooled Sensitivity and Specificity for Malignant Thyroid Nodules for Each Risk Stratification System
Significant heterogeneity (I² > 50%) was noted for all meta-analytic calculations described in this table.
Neither indirect (comparison using pooled estimates from separate studies) nor direct comparisons (meta-analytic pooling of head-to-head comparison studies) identified any statistical differences in pooled diagnostic performance between the four guidelines.
CI, 95% confidence interval; HSROC, hierarchical summary receiver operating characteristics.
Deeks' funnel plot and asymmetry test did not suggest significant publication bias, except for the diagnostic performance of K-TIRADS for category 5 (p < 0.01). Indirect comparisons did not identify any statistically significant differences in the pooled diagnostic performance between any of the guidelines. Direct comparisons between ACR-TIRADS and ATA were available in nine studies for category 5 (14,15,17,18,32–34,43,47) and eight studies for category 4 or 5 (14,15,17,18,32–34,43,47), but these comparisons did not identify any statistically significant differences between the guidelines. We also investigated unnecessary biopsy rates across the systems. Among the included studies, unnecessary biopsy rates were available in eight studies for ACR-TIRADS (14–16,33,35,43,47,49), five for ATA (14,15,33,43,47), five for K-TIRADS (14–16,33,49), and two for EU-TIRADS (16,49). The reported unnecessary biopsy rates ranged from 17% to 40% (median, 25.5%) for ACR-TIRADS, 35% to 61% (median, 52%) for ATA, 32% to 66% (median, 59%) for K-TIRADS, and 25% to 53% (median, 39%) for EU-TIRADS.
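The unnecessary biopsy rate, as defined in the Methods, is a per-study proportion that is then summarized across studies by its range and median. The sketch below uses hypothetical counts; note that each study's denominator depends on which nodules that system, with its own size cutoffs, flags for FNAB.

```python
from statistics import median


def unnecessary_biopsy_rate(benign_indicated, total_indicated):
    """Percentage of FNAB-indicated nodules that proved benign."""
    if total_indicated == 0:
        raise ValueError("no nodules met the FNAB criteria")
    return 100.0 * benign_indicated / total_indicated


# Hypothetical per-study counts: (benign FNAB-indicated, all FNAB-indicated)
studies = [(30, 120), (45, 100), (60, 150), (20, 80)]
rates = [unnecessary_biopsy_rate(b, n) for b, n in studies]
print(rates, "range:", min(rates), "to", max(rates), "median:", median(rates))
```

With an even number of studies, `statistics.median` averages the two middle values, which is how a median such as 25.5% can arise from whole-number rates.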
Subgroup analysis
A subgroup analysis was performed on nodules of 1 cm or larger, and the results are summarized in Table 3. For ACR-TIRADS, the pooled sensitivity and specificity were, respectively, 66% [CI 52–77%] and 93% [CI 88–96%] for category 5 and 95% [CI 88–98%] and 60% [CI 41–77%] for category 4 or 5. For ATA, the pooled sensitivity and specificity were, respectively, 76% [CI 52–90%] and 89% [CI 74–95%] for category 5 and 87% [CI 76–93%] and 64% [CI 48–77%] for category 4 or 5. For K-TIRADS, the pooled sensitivity and specificity were, respectively, 41% [CI 24–60%] and 98% [CI 94–99%] for category 5 and 84% [CI 80–88%] and 72% [CI 67–76%] for category 4 or 5.
Table 3. Pooled Sensitivity and Specificity for Malignant Thyroid Nodules ≥1 cm for Each Guideline
A meta-analysis for EU-TIRADS was not possible because only two studies reported sensitivity and specificity using EU-TIRADS.
Significant heterogeneity (I² > 50%) was noted for all meta-analytic calculations described in this table.
When ACR-TIRADS and K-TIRADS were indirectly compared with category 5 as a positive test result, the specificity of K-TIRADS was higher with borderline significance (98% [CI 94–99%] vs. 93% [CI 88–96%]; p = 0.05).
When ACR-TIRADS and K-TIRADS were indirectly compared with category 4 or 5 as a positive test result, the sensitivity of ACR-TIRADS was higher with borderline significance (95% [CI 88–98%] vs. 84% [CI 80–88%]; p = 0.05). Otherwise, indirect comparisons did not identify any statistically significant differences in pooled diagnostic performance between the four guidelines. A meta-analysis on the direct comparisons was not possible.
TIRADS, Thyroid Imaging Reporting and Data System.
When K-TIRADS and ACR-TIRADS were indirectly compared (comparison between the pooled estimates from separate studies), the specificity of K-TIRADS for category 5 was higher than that of ACR-TIRADS, with a statistical trend (98% [CI 94–99%] vs. 93% [CI 88–96%]; p = 0.05). Conversely, the sensitivity of K-TIRADS for category 4 or 5 was lower than that of ACR-TIRADS, again with a statistical trend (84% [CI 80–88%] vs. 95% [CI 88–98%]; p = 0.05). Otherwise, the indirect comparisons did not identify any statistically significant differences in the pooled diagnostic performance between the four systems. It was not possible to perform a meta-analysis on the direct comparisons.
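An indirect comparison contrasts pooled estimates from separate sets of studies. One common approximation, sketched below, recovers a standard error on the logit scale from each reported 95% CI and applies a two-sided z-test. This is an assumed, simplified approach, not the method the authors used: it treats the two estimates as independent and normally distributed on the logit scale, ignores the correlation between sensitivity and specificity, and works from rounded percentages, so it need not reproduce the reported p = 0.05.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def logit(p):
    return math.log(p / (1.0 - p))


def logit_se_from_ci(lower, upper):
    """Approximate standard error on the logit scale from a 95% CI."""
    return (logit(upper) - logit(lower)) / (2.0 * Z95)


def indirect_comparison(est_a, ci_a, est_b, ci_b):
    """Two-sided z-test for the difference between two independent pooled
    proportions, compared on the logit scale."""
    diff = logit(est_a) - logit(est_b)
    se = math.hypot(logit_se_from_ci(*ci_a), logit_se_from_ci(*ci_b))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Category 5 specificity for nodules >= 1 cm: K-TIRADS vs. ACR-TIRADS (Table 3)
z, p = indirect_comparison(0.98, (0.94, 0.99), 0.93, (0.88, 0.96))
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Working on the logit scale keeps the approximate standard errors sensible for proportions close to 100%, such as the specificities compared here.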
Meta-regression
The results of the meta-regression analyses are summarized in Supplementary Tables S2–S5. For ACR-TIRADS with both category 5 and category 4 or 5 as positive and for ATA with category 5 as positive, study location (East Asia vs. other countries; p = 0.01 for ACR-TIRADS for both category 5 and category 4 or 5; p = 0.04 for ATA category 5), the proportion of female patients (cutoff, 79%; p < 0.01 for ACR-TIRADS for both category 5 and category 4 or 5; p = 0.01 for ATA category 5), and the proportion of malignant nodules (cutoff, 28%; p < 0.01 for ACR-TIRADS for both category 5 and category 4 or 5; p < 0.01 for ATA category 5) were significantly associated with study heterogeneity. For ATA with category 4 or 5 as positive, only the proportion of female patients was associated with study heterogeneity (p < 0.01). For K-TIRADS (both category 5 and category 4 or 5 as positive), study design (prospective vs. retrospective; p < 0.01 for category 5; p = 0.02 for category 4 or 5) and the proportion of female patients (p < 0.01 for both category 5 and category 4 or 5) were associated with study heterogeneity. For EU-TIRADS (both category 5 and category 4 or 5 as positive), study location (p < 0.01 for both category 5 and category 4 or 5) and the proportion of malignant nodules (p = 0.04 for category 5; p < 0.01 for category 4 or 5) were significantly associated with study heterogeneity.
Discussion
The present meta-analysis investigated the diagnostic performance of four risk stratification systems using 29 studies including 33,748 thyroid nodules. Diagnostic performance was evaluated with category 5 and with category 4 or 5 as the threshold for a positive test result. The current meta-analysis demonstrated that diagnostic performance was comparable between the four risk stratification systems at both thresholds. In the subgroup analysis of nodules of 1 cm or larger, ACR-TIRADS showed lower specificity for category 5 (93% vs. 98%; p = 0.05) but higher sensitivity for category 4 or 5 (95% vs. 84%; p = 0.05) compared with K-TIRADS. In the meta-regression, study location (East Asia vs. other countries), the proportion of female patients, and the proportion of malignant nodules were common sources of study heterogeneity. To the best of our knowledge, this is the first systematic review and meta-analysis to include the four representative US-based risk stratification systems (ACR-TIRADS, ATA, K-TIRADS, and EU-TIRADS) and to perform subgroup and meta-regression analyses evaluating the diagnostic performance of each system across varied clinical settings. We believe that our results can not only guide clinical practice and future research but also provide information that is important when developing multinational guidelines.
This meta-analysis showed that the category-based diagnostic accuracies of the four guidelines were closely comparable. It also showed that the sensitivity and specificity of each system varied depending on which category was set as the threshold for a positive result (category 5 vs. 4 or 5) and on the population in which the examinations were performed. Considering these results, US practitioners may flexibly adapt each risk stratification system to the clinical setting in which they practice, taking into account the proportion of malignant thyroid nodules, the proportion of female patients, and geographic characteristics. Specifically, in a setting that tends toward higher sensitivity but lower specificity (e.g., a lower proportion of female patients and a higher proportion of malignant nodules when using ACR-TIRADS), clinicians may select a noninvasive strategy such as active surveillance for nodules of a given category, whereas in the opposite situation FNAB may be preferred over active surveillance. Regarding the proportion of malignant nodules, although sensitivity and specificity should in theory be independent of disease prevalence, in real settings they often vary with it (52). For a future international TIRADS, a balanced worldwide collection of data with both high and low cancer proportions will be necessary to create a system that is clinically applicable in both primary care and referral center settings.
Recently, emphasis has shifted from diagnostic performance toward the unnecessary biopsy rate in thyroid nodule management. Because most thyroid cancers are well known to have a less aggressive natural history, it is important to minimize not only false-negative results, which delay diagnosis, but also unnecessary biopsies, which increase the health care burden, patient anxiety, and unnecessary interventions. In this context, the concept of "active surveillance" for low-risk thyroid cancer has been increasingly recognized in the medical field (53), and a recent meta-analysis reported a pooled proportion of tumor growth (increase in maximum diameter by ≥3 mm) of only 4.4% in low-risk papillary thyroid cancer (T1a/b, N0, M0) (54). Therefore, US practitioners, especially those in primary care settings, need to understand this concept well. The reported unnecessary biopsy rates tended to be lower with ACR-TIRADS, but they should be interpreted with caution because the minimum nodule size for inclusion varied across the studies. In addition, national or institutional biopsy policies might act as confounders.
This meta-analysis has several limitations. First, the majority of the included studies (75.9%; 22 of 29) were retrospective, implying a risk of categorization error arising from insufficient and unstandardized image acquisition during the examination. Second, all the included studies were performed at referral centers, limiting the application of our results to the primary care setting. Further studies conducted in primary care settings are required. Third, although category-based comparisons of diagnostic performance are intuitive to interpret, they are inherently limited because the malignancy risks of the categories suggested in the guidelines vary. Fourth, substantial between-study heterogeneity was consistently observed in the meta-analytic calculations; although we performed subgroup and meta-regression analyses, this limitation could not be fully resolved because individual patient data were not available and the number of covariates and estimates was large relative to the number of included studies. Finally, a meta-analysis by nodule size, which is an important factor when triaging nodules for biopsy, was not possible because most of the included studies emphasized only US features and not nodule size. This likely accounts for differences between the pooled data presented in this meta-analysis and the results of well-conducted studies that considered nodule size. Further investigation is necessary to address this issue.
In conclusion, the overall diagnostic performance of the four risk stratification systems of the representative society guidelines was comparable.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
Supplementary Material
Supplementary Figure S1
Supplementary Figure S2
Supplementary Figure S3
Supplementary Figure S4
Supplementary Figure S5
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
Supplementary Table S4
Supplementary Table S5
