Rasch analysis of the firefighters’ critical incident inventory questionnaire

Abstract

BACKGROUND:

The Critical Incident Inventory (CII) was developed to assess stressful exposures in firefighters and emergency service workers. The CII includes six subscales: trauma to self, victims known to fire-emergency worker, multiple casualties, incidents involving children, unusual or problematic tactical operations, and exposure to severe medical trauma.

OBJECTIVES:

To examine the construct validity of all subscales of the Critical Incident Inventory (CII) by assessing the unidimensionality of the scales, and the interval properties of CII subscales by examining fit to the Rasch model and ordering of item thresholds.

METHODS:

This was a secondary data analysis based on survey data collected from a sample of 390 firefighters.

RESULTS:

Items 4 and 20 were removed after their unacceptable fit residuals were confirmed. The resulting revised CII showed satisfactory fit to the Rasch model, with a non-significant chi-square test and acceptable item fit. All items of the original CII were rescored as dichotomous: 0 represents the original "no exposure" response, and 1 combines exposure one, two, or three or more times.

CONCLUSION:

Re-appraisal of the revised CII indicated a satisfactory level of fit to the Rasch model.

Keywords

Construct validity; item thresholds; psychometric measurement

1 Introduction

Firefighters may be routinely exposed to stressful events as part of their occupation. The Critical Incident Inventory (CII) was developed to assess stressful exposures in firefighters and emergency service workers and has been used to understand the potentially negative impacts of such exposures in this population [1]. The firefighter-specific CII includes six subscales: trauma to self, victims known to fire-emergency worker, multiple casualties, incidents involving children, unusual or problematic tactical operations, and exposure to severe medical trauma [1]. Each subscale contains 2–6 items describing potential critical incident exposures, for a total of 24 items. The CII records both whether an exposure occurred (no/yes) and how often it occurred (none, one time, two times, three or more times) [1]. For scoring, these frequency responses are assigned values of 0, 1, 2, and 3, respectively, and the CII total score is the sum of the item scores, ranging from 0 to 72 [1]. The trauma to self subscale (5 items) includes serious line of duty injury to self, threat of serious line of duty injury or threat of death to self, incidents necessitating search or rescue involving serious risk to oneself, and direct exposure to extremely hazardous materials or to blood and body fluids. The victims known to fire-emergency worker subscale (5 items) comprises line of duty death of a fellow emergency worker, serious line of duty injury to a fellow emergency worker, threat of serious line of duty injury or threat of death to a fellow emergency worker, suicide or attempted suicide by a fellow emergency worker, and incidents in which the victim(s) were known to the respondent [1]. The multiple casualties subscale (3 items) covers responses to incidents involving three or more deaths, one or two deaths, or multiple serious injuries.
The incidents involving children subscale (2 items) consists of exposures involving serious injury or death to children or severe threat to children. The unusual or problematic tactical operations subscale (6 items) includes incidents requiring police protection while on duty, verbal or physical threat by the public while on duty, failed mission after extensive effort, critical (negative) media interest, use of deadly force by police at an incident, and critical equipment failure or lack of equipment in any of the above situations. The exposure to severe medical trauma subscale (3 items) comprises close contact with a burned or mutilated victim, removing a dead body or bodies, and prolonged extrication of a trapped victim with life-threatening injuries.
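The original scoring rule (24 items coded 0–3, summed to a 0–72 total) can be sketched as follows. This is a minimal illustration, not the authors' code: the item-to-subscale mapping is taken from Table 2, responses are assumed to be coded 0 = none, 1 = one time, 2 = two times, 3 = three or more times, and the names `SUBSCALES` and `score_cii` are hypothetical.

```python
from typing import Dict, List

# Hypothetical item-to-subscale mapping for the original 24-item CII (per Table 2).
SUBSCALES: Dict[str, List[int]] = {
    "trauma_to_self": [1, 2, 19, 22, 23],
    "victims_known": [3, 4, 5, 6, 14],
    "multiple_casualties": [7, 8, 9],
    "children": [12, 13],
    "tactical_operations": [10, 11, 15, 16, 21, 24],
    "severe_medical_trauma": [17, 18, 20],
}


def score_cii(responses: Dict[int, int]) -> Dict[str, int]:
    """Return the six subscale scores and the 0-72 total for one respondent.

    `responses` maps item number (1-24) to a frequency code from 0 to 3.
    """
    if sorted(responses) != list(range(1, 25)):
        raise ValueError("expected responses for items 1-24")
    if any(code not in (0, 1, 2, 3) for code in responses.values()):
        raise ValueError("response codes must be 0, 1, 2 or 3")
    # Subscale scores are plain sums of their items' codes.
    scores = {name: sum(responses[i] for i in items) for name, items in SUBSCALES.items()}
    # The total is the sum over all 24 items, so it ranges from 0 to 72.
    scores["total"] = sum(responses.values())
    return scores
```

A respondent who answered "three or more times" to every item would receive a total of 72, and the children subscale (two items) would then score 6.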

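The Rasch model posits that the probability of a participant affirming a given item is a logistic function of the difference between the participant's ability (θ; for instance, upper limb function) and the item's difficulty, that is, the level of work the item requires (b) [2–4]:

$P_{ni} = \frac{e^{(\theta_n - b_i)}}{1 + e^{(\theta_n - b_i)}}$

where $P_{ni}$ is the probability that person $n$ affirms item $i$, $\theta_n$ is the location of person $n$, and $b_i$ is the difficulty of item $i$, both expressed in logits.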
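As a quick numerical check of the dichotomous Rasch equation, the sketch below evaluates $P_{ni}$ for a given person location θ and item difficulty b. The function name is hypothetical; the two branches are algebraically equivalent forms of the same logistic function, chosen so that `math.exp` never receives a large positive argument.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(X_ni = 1) under the dichotomous Rasch model.

    theta: person location in logits; b: item difficulty in logits.
    """
    x = theta - b
    if x >= 0:
        # 1 / (1 + e^-x) equals e^x / (1 + e^x) and cannot overflow here.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^x is small, so the original form is safe.
    z = math.exp(x)
    return z / (1.0 + z)
```

When the person location equals the item difficulty the probability is exactly 0.5; each additional logit of ability above the item raises the odds of affirming it by a factor of e.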

Subsequent work extended the Rasch model from its dichotomous form to polytomous items through the rating scale model and the partial credit model. The rating scale model assumes that the distances between thresholds are the same across all items, whereas the partial credit model does not impose this constraint and estimates the thresholds of each item separately.
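To make the difference concrete, the following minimal sketch computes the category probabilities of a single item under the partial credit model, where $P(X = x) \propto \exp\sum_{k=1}^{x}(\theta - \delta_k)$ and each item has its own thresholds $\delta_k$. Under the rating scale model, the thresholds would instead be constrained to $b_i + \tau_k$, with the same $\tau_k$ for every item. The function name and the threshold values in the usage note are invented for illustration.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under the partial credit model.

    thresholds[k-1] is the threshold between categories k-1 and k (logits),
    so an item with m thresholds has m + 1 categories (0..m).
    """
    # Cumulative sums of (theta - delta_k); category 0 has an empty sum of 0.
    numerators = [0.0]
    for delta in thresholds:
        numerators.append(numerators[-1] + (theta - delta))
    # Subtract the largest exponent before exponentiating for numerical stability.
    top = max(numerators)
    weights = [math.exp(n - top) for n in numerators]
    total = sum(weights)
    return [w / total for w in weights]
```

For a four-category item such as an original CII item, `pcm_probabilities(0.0, [-1.0, 0.0, 1.0])` returns roughly `[0.13, 0.37, 0.37, 0.13]`. At θ equal to a threshold, the two categories on either side of it are equally probable, which is the definition of a threshold used later in this paper.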

Rasch analysis is grounded in item response theory (IRT) [5] and provides a valuable methodology for examining the validity of patient-reported outcome measures (PROMs). It assumes that a respondent's score on a given PROM item is determined by the item's difficulty and the respondent's ability [5]. For example, participants with advanced skills and abilities will perform better (obtain higher scores) than those with lower skills and capabilities. Rasch analysis allows researchers to assess the assumption of unidimensionality, that is, whether the items on a scale or subscale assess a single construct. It also evaluates the structure of rating scales and can propose recalibrations that convert ordinal scores into interval-level measurement, supporting the valid summation of individual items into a total score [4]. The original Rasch model was developed for dichotomous items but has been extended to polytomous items, as in the rating scale model. The core assumption of the rating scale model is that the distance between adjacent scoring options is approximately 0.5 probability point [4]; this assumption does not apply to the partial credit model. Rasch analysis using the partial credit model can therefore be applied to PROMs with ordinal response options. For example, in the CII, respondents (e.g., firefighters) report whether a particular critical (traumatic) event has occurred during their careers by indicating "No" or "Yes," and whether it has occurred "one time," "two times," or "three or more times" [1]. Although the scoring assigns a numerical value to each of these descriptors, Rasch analysis can test whether this scoring structure discriminates between persons at low or high risk of poor psychological outcomes after traumatic stressors.

1.1 Study purpose

We aimed to apply Rasch analysis 1) to test the construct validity of all CII subscales by assessing their unidimensionality, 2) to examine the interval properties of the six CII subscales by examining fit to the Rasch model and the ordering of item thresholds, and 3) to examine potential bias in CII scores by age, sex, and years of service, and to explore ways of minimizing any bias by modifying the scale.

2 Methods

2.1 Sample

This study was a secondary analysis of survey data collected from firefighters at 160 locations across Canada. Three hundred ninety firefighters (272 males and 118 females) completed the survey (response rate 100%) as part of a prospective study from which the data used here were obtained. Institutional Review Board (IRB) approval was obtained from the Hamilton Integrated Research Ethics Board in 2019, and participants provided informed written consent for their data to be used in research. All data were fully anonymized before we accessed them. Previous research indicates that a minimum sample of 250 participants is needed for Rasch analysis to yield stable item estimates [5].

In RUMM2030, the partial credit model was selected on the basis of a significant likelihood ratio test [6]. The Rasch analysis involved testing unidimensionality, fit residuals, ordering of item thresholds, the person separation index, differential item functioning, and local dependency [5]. All analyses were conducted in the RUMM2030 Professional software. The significance level for ANOVA and chi-square tests was set at 0.05, and a Bonferroni correction was applied for multiple comparisons. The number of class intervals was initially left at the default setting and then adjusted over iterations of the analysis. The specific procedures and their rationale are described below.
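The Bonferroni correction divides the family-wise significance level by the number of tests. A minimal sketch follows, with hypothetical function names. With 24 item-level tests, 0.05/24 ≈ 0.0021, which appears consistent with the 0.002 cutoff reported in Table 4, although the exact number of comparisons the authors corrected for is an assumption here.

```python
from typing import List, Sequence


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_after_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag which p-values remain significant at alpha / len(p_values)."""
    cutoff = bonferroni_alpha(alpha, len(p_values))
    return [p < cutoff for p in p_values]
```

For example, with three tests the per-test cutoff is about 0.0167, so p-values of 0.001 and 0.01 remain significant while 0.5 does not.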

2.2 Test of fit

The test of fit assesses the degree to which the items of a PROM fit the expectations of the Rasch model. Fit statistics can be examined at the overall level and at the individual item level. For overall fit, the p-value of the chi-square test for the item–trait interaction must be non-significant after Bonferroni correction [5]; a significant p-value indicates that the item hierarchy is not invariant across the trait. The item–person interaction statistics, in turn, are transformed into standardized fit residuals (approximate z scores) that follow a standard normal distribution. Under good fit, the fit residuals of both items and persons are therefore expected to have a mean close to 0 and a standard deviation close to 1. At the individual level, a fit residual within ±2.5 indicates adequate fit to the model. A chi-square statistic is also reported for each item to test whether the observed responses across class intervals differ from those predicted by the model. The item characteristic curve (ICC) plots observed respondent scores against the expected model curve, allowing visual inspection of model fit [5]. Under good fit, nearly all observed points follow the expected curve; observed points that rise more steeply than the curve suggest that the item over-discriminates, and flatter points suggest under-discrimination. The ICC also facilitates the identification of outliers. Extreme outliers can distort the observed scores, causing deviation from the expected curve and misfit; identifying them allows misfitting individuals to be removed from the analysis. From a practical and clinical point of view, respondents with low literacy, cognitive deficits, or other co-morbid conditions may misunderstand specific questions and report extreme scores on them.
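The building blocks of these fit statistics can be sketched for a dichotomous item. The code below is a simplified illustration under stated assumptions, not RUMM2030's implementation: it computes the standardized residual $z = (x - P)/\sqrt{P(1-P)}$ for one response, and an item–trait chi-square that sorts persons by location, splits them into class intervals of roughly equal size, and sums $(O - E)^2/\mathrm{Var}$ over intervals (typically referred to a chi-square distribution with one fewer degree of freedom than the number of intervals). RUMM2030's item fit residual applies a further transformation to the squared residuals that is not reproduced here, and all function names are hypothetical.

```python
import math
from typing import Sequence


def rasch_p(theta: float, b: float) -> float:
    """Dichotomous Rasch probability, computed in a numerically stable way."""
    x = theta - b
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def standardized_residual(x: int, theta: float, b: float) -> float:
    """z = (observed - expected) / sqrt(model variance) for one 0/1 response."""
    p = rasch_p(theta, b)
    return (x - p) / math.sqrt(p * (1.0 - p))


def class_interval_chi_square(xs: Sequence[int], thetas: Sequence[float],
                              b: float, n_intervals: int) -> float:
    """Item-trait chi-square for one dichotomous item (simplified sketch).

    Persons are sorted by location and split into `n_intervals` groups of
    roughly equal size; each group adds (O - E)^2 / Var to the total.
    """
    n = len(thetas)
    if not 1 <= n_intervals <= n:
        raise ValueError("need between 1 and len(thetas) class intervals")
    order = sorted(range(n), key=lambda i: thetas[i])
    # Integer boundaries give non-empty groups whose sizes differ by at most one.
    bounds = [k * n // n_intervals for k in range(n_intervals + 1)]
    chi2 = 0.0
    for start, stop in zip(bounds, bounds[1:]):
        group = order[start:stop]
        probs = [rasch_p(thetas[i], b) for i in group]
        observed = sum(xs[i] for i in group)
        expected = sum(probs)
        variance = sum(p * (1.0 - p) for p in probs)
        chi2 += (observed - expected) ** 2 / variance
    return chi2
```

An item whose responses run against the trait (low-ability persons affirming it, high-ability persons not) produces a far larger chi-square than one whose responses follow the trait, which is the pattern the item–trait test is designed to detect.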

2.3 Threshold

A threshold is the point between two adjacent response categories at which either response is equally probable [5]. Disordered thresholds indicate that respondents are likely failing to discriminate between the response options. Threshold maps illustrate the relative distances between thresholds, and category response plots allow visual inspection of a disordered progression of person ability across the response options. Disordered thresholds can be addressed by collapsing adjacent categories.
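Checking threshold order and collapsing categories are both simple operations once threshold estimates are available (estimating the thresholds themselves requires Rasch software and is not shown). The sketch below is hypothetical and uses invented threshold values in its test; the `DICHOTOMOUS` mapping mirrors the 0/1 rescoring reported for the revised CII.

```python
from typing import Dict, List, Sequence


def thresholds_ordered(thresholds: Sequence[float]) -> bool:
    """True if each threshold (in logits) is strictly higher than the one before."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


def collapse_categories(responses: Sequence[int], mapping: Dict[int, int]) -> List[int]:
    """Recode responses by merging adjacent categories according to `mapping`."""
    return [mapping[r] for r in responses]


# Dichotomous rescoring of the kind reported for the revised CII:
# none -> 0; one, two, or three or more times -> 1.
DICHOTOMOUS = {0: 0, 1: 1, 2: 1, 3: 1}
```

After collapsing, the item has a single threshold, which is trivially ordered; in practice the rescored data are refitted and the threshold map inspected again.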

2.4 Targeting

Targeting, also referred to as scale-to-sample targeting, assesses the degree to which the items can quantify the range of abilities present in the sample [5]. The person–item threshold distribution displays relative item difficulty (item locations) and relative person ability (person locations) on the same logit scale. Precise person measurement depends on how well the two ranges match: the better the match, the higher the precision. Poor targeting often leads to floor or ceiling effects.
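A rough numerical summary of targeting can complement the person–item threshold distribution. This hypothetical helper assumes that person locations have already been estimated and that item locations are centred at 0 logits, as is conventional in Rasch software, so the person mean can be read as the offset of the sample from the items. It also reports the share of respondents at the minimum and maximum raw scores as floor and ceiling indicators.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def targeting_summary(person_locations: Sequence[float],
                      raw_scores: Sequence[int], max_score: int) -> Dict[str, float]:
    """Summarize scale-to-sample targeting.

    Assumes item locations are centred at 0 logits, so a positive person
    mean means the sample sits above the average item difficulty.
    """
    n = len(raw_scores)
    return {
        "person_mean": mean(person_locations),
        "person_sd": pstdev(person_locations),
        # Percentage of respondents at the lowest and highest possible raw scores.
        "floor_pct": 100.0 * sum(s == 0 for s in raw_scores) / n,
        "ceiling_pct": 100.0 * sum(s == max_score for s in raw_scores) / n,
    }
```

A person mean close to 0 with small floor and ceiling percentages suggests good targeting; a large offset or many extreme scores suggests that items are too easy or too hard for the sample.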

2.5 Differential item functioning

Examination of differential item functioning (DIF) is used in Rasch analysis to verify whether PROM items function consistently across subgroups of the sample [5]. For example, men and women at the same level of the underlying trait should have the same probability of affirming an item. Uniform DIF occurs when the difference between groups is consistent across all levels of the trait (e.g., when an item is consistently harder to affirm for women than for men at the same trait level). Uniform DIF can be addressed by "splitting" the item, creating a separate item for each group so that its difficulty is estimated separately (in our example, for men and for women). This does not mean that the groups differ in the amount of the trait of interest, but rather that they may use different appraisal systems when rating it. Non-uniform DIF, in which the difference between groups varies across levels of the trait, cannot be resolved by splitting, and item removal is generally the only solution [4, 7]. ANOVA statistics and visual inspection of the ICCs were assessed and cross-referenced during the analysis.
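Item splitting can be illustrated as a data-preparation step: the DIF-affected item is replaced by one column per group, each holding only that group's responses and treated as missing for everyone else, so that the item's difficulty is estimated separately per group. The function below is a hypothetical sketch of that reshaping, not RUMM2030's procedure.

```python
from typing import Dict, List, Optional, Sequence


def split_item(responses: Sequence[int],
               groups: Sequence[str]) -> Dict[str, List[Optional[int]]]:
    """Split one item into group-specific items to address uniform DIF.

    Each new item keeps the responses of its own group and is missing (None)
    for all other respondents.
    """
    if len(responses) != len(groups):
        raise ValueError("responses and groups must have the same length")
    split: Dict[str, List[Optional[int]]] = {}
    for label in sorted(set(groups)):
        split[label] = [r if g == label else None for r, g in zip(responses, groups)]
    return split
```

For example, splitting an item by sex yields a "female" and a "male" version of the item, each answered by only one group.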

2.6 Dimensionality

Unidimensionality, the basic assumption of the Rasch model, was assessed with a principal component analysis (PCA) of the Rasch residuals [5]. This verifies whether all items within a subscale, and all subscales within the CII, measure the same latent construct. Items were divided according to their positive and negative loadings on the first principal component, and the person estimates derived from the two item subsets were compared with independent t-tests. A proportion of significant t-tests larger than 5% of all comparisons was taken to indicate multidimensionality. After item reduction and rescoring of the response options, the PCA was repeated to confirm unidimensionality [8].
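The t-test procedure can be sketched as follows, assuming each person already has two location estimates with standard errors, one from the positively loading items and one from the negatively loading items (producing these requires Rasch software). The per-person statistic $t = (\theta_A - \theta_B)/\sqrt{SE_A^2 + SE_B^2}$ and the critical value of 1.96 are common choices but are assumptions here, as is the function name.

```python
import math
from typing import Sequence


def proportion_significant_t(theta_a: Sequence[float], se_a: Sequence[float],
                             theta_b: Sequence[float], se_b: Sequence[float],
                             critical: float = 1.96) -> float:
    """Share of persons whose two subset estimates differ significantly.

    theta_a/se_a come from items loading positively on the first residual
    component; theta_b/se_b come from items loading negatively.
    """
    if not len(theta_a) == len(se_a) == len(theta_b) == len(se_b):
        raise ValueError("all inputs must have the same length")
    significant = 0
    for ta, sa, tb, sb in zip(theta_a, se_a, theta_b, se_b):
        t = (ta - tb) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            significant += 1
    return significant / len(theta_a)
```

Applying the 5% criterion to the counts in Table 3: 48 of 374 significant tests (12.83%) exceeds 5% for the original CII, whereas 3 of 374 (0.80%) does not for the revised version.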

2.7 Local independence

As an additional analysis, RUMM2030 displays the residual correlations obtained after the PCA [5]. Local dependency (LD) may indicate response dependency or multidimensionality. A residual correlation greater than 0.3 between two items was taken to indicate LD. Item deletion has been suggested as the preferred solution to LD [3]; however, combining two or more locally dependent items into one "super item" is another effective strategy [7].
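Flagging locally dependent pairs amounts to computing pairwise correlations of the item residuals and comparing them with the 0.3 cutoff. The sketch below is hypothetical: it assumes a mapping from item label to that item's residuals across persons (as exported from Rasch software) and uses a hand-written Pearson correlation to stay within the standard library.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_local_dependency(residuals: Dict[str, Sequence[float]],
                          cutoff: float = 0.3) -> List[Tuple[str, str, float]]:
    """Return item pairs whose residual correlation exceeds `cutoff`."""
    flagged = []
    for a, b in combinations(sorted(residuals), 2):
        r = pearson(residuals[a], residuals[b])
        if r > cutoff:
            flagged.append((a, b, r))
    return flagged
```

Each flagged pair can then be judged on content grounds, as was done for Items 8 and 9 in the results.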

2.8 Reliability

The person separation index (PSI) reflects the precision of the person estimates and is used as the Rasch-based statistic for internal consistency [5]. A PSI of 0.7 was set as the acceptable threshold for group-level use, indicating that the scale is sufficiently reliable to distinguish between at least two groups [4, 9]. In addition, 0.8 was used as the satisfactory threshold for the traditional Cronbach's alpha (α), also calculated by RUMM2030 [10, 11].
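Both reliability statistics have short closed forms. The sketch below computes the PSI as the proportion of observed variance in person locations not attributable to measurement error, $(\sigma^2_\theta - \overline{SE^2})/\sigma^2_\theta$, and Cronbach's alpha from item and total-score variances. RUMM2030's exact variance conventions are an assumption here, and the function names are hypothetical.

```python
from statistics import mean, pvariance, variance
from typing import Sequence


def person_separation_index(thetas: Sequence[float], ses: Sequence[float]) -> float:
    """PSI = (observed variance of person locations - mean error variance) / observed variance."""
    observed = pvariance(thetas)
    error = mean(se ** 2 for se in ses)
    return (observed - error) / observed


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; `items` holds one score list per item, same persons in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))
```

Because alpha uses the same variance convention for items and totals, the choice of sample versus population variance cancels out in its ratio.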

3 Results

3.1 Study participants

Rasch analysis requires a complete data set without missing values. Data from 390 full-time firefighters (272 males and 118 females) were available for analysis (Table 1). Of these, 376 (96.4%) reported exposure to some type of critical incident. The mean CII score for critical incidents experienced over the firefighters' entire careers was 30 (range 0–72). The number of firefighters exposed to each CII item, and the frequency of exposure, are reported in Table 2. Of the 390 respondents, 351 (90%) endorsed "responded to incident involving one or two deaths", 314 (81%) "responded to incident involving multiple serious injuries", 312 (80%) "direct exposure to blood and body fluids", and 300 (77%) "removing dead body or bodies". The least frequent exposures were "use of deadly force by police at an incident" (7%) and "serious line of duty injury to self" (16%).

Table 1
Demographic characteristics

	Men	Women
Sample	272	118
Age (years)	41.0±9.5 (n = 269)	34.3±8.50
Height (m)	1.80±0.30 (n = 269)	1.67±0.07
Weight (kg)	90.4±14.0 (n = 269)	70.2±11.2
Service (years)	14.3±9.0 (n = 269)	6.80±6.0
Prevalence of critical incidents (%)	97%	93%
Ranks	(n = 388; 2 missing)
1st Rank Firefighter	59
Captain/Acting Captain	113
Career Firefighter	159
Volunteer Firefighter	34
Recruit	6
Other	17

Table 2

Number of firefighters exposed to each critical incident item and frequency of exposure (Canada)

Critical Incident Inventory item	All Firefighters
	Yes n (%)	No n (%)	One Time	Two Times	Three or More times
Trauma to Self
1. Serious line of duty injury to self.	64 (16%)	326 (84%)	42	16	6
2. Threat of serious line of duty injury or threat of death to self (that did not result in actual serious injury).	196 (50.3%)	194 (49.7%)	60	42	94
19. Incident necessitating search or rescue involving serious risk to yourself.	232 (59%)	158 (41%)	38	54	140
22. Direct exposure to extremely hazardous materials.	204 (52%)	186 (48%)	49	35	120
23. Direct exposure to blood and body fluids.	312 (80%)	78 (20%)	29	34	249
Victims Known to Fire-Emergency Worker
3. Line of duty death of a fellow emergency worker.	134 (34%)	256 (66%)	64	32	38
4. Serious line of duty injury to fellow emergency worker (that did not result in death).	185 (47%)	205 (53%)	82	45	58
5. Threat of serious line of duty injury or threat of death to fellow emergency worker (that did not result in actual serious injury or death).	222 (57%)	168 (43%)	52	43	127
6. Suicide or attempted suicide by fellow emergency worker.	100 (26%)	290 (74%)	63	22	15
14. Victim(s) known to you.	211 (54%)	179 (46%)	70	53	88
Multiple Casualties
7. Responded to incident involving three or more deaths.	154 (39%)	236 (61%)	69	36	49
8. Responded to incident involving one or two Deaths.	351 (90%)	39 (10%)	31	28	292
9. Responded to incident involving multiple serious injuries (three or more victims sustained serious injuries).	314 (81%)	76 (19%)	35	41	238
Incidents Involving Children
12. Incident involving serious injury or death to children.	278 (71%)	112 (29%)	76	76	126
13. Incident involving severe threat to children (that did not result in actual serious injury or death to children).	244 (63%)	146 (37%)	59	50	135
Unusual or Problematic Tactical Operations
10. Incident requiring police protection while on duty.	262 (67%)	128 (33%)	56	76	130
11. Verbal or physical threat by public while on duty (that did not result in police protection).	230 (59%)	160 (41%)	68	64	98
15. Failed mission after extensive effort.	235 (60%)	155 (40%)	44	47	144
16. Critical (negative) media interest.	174 (45%)	216 (55%)	44	53	77
21. Use of deadly force by police at an incident.	26 (7%)	364 (93%)	20	2	4
24. Critical equipment failure or lack of equipment in any of the above situations.	132 (34%)	258 (66%)	57	38	37
Exposure to Severe Medical Trauma
17. Close contact with burned or mutilated victim.	267 (68%)	123 (32%)	63	67	137
18. Removing dead body or bodies.	300 (77%)	90 (23%)	37	46	217
20. Prolonged extrication of trapped victim with life-threatening injuries.	251 (64%)	139 (36%)	53	68	130

3.2 Test of fit

The partial credit model was selected for the current analysis since the likelihood ratio test was significant at overall questionnaire and subscales level (p < 0.001).

The initial inspection of the overall 24-item questionnaire revealed a lack of invariance across the trait, indicated by a significant chi-square test for the item–trait interaction (χ² = 249.04, df = 144, p < 0.001). On initial inspection, only one subscale, "Unusual or Problematic Tactical Operations", fitted the Rasch model (χ² = 38.85, df = 36, p = 0.34; Table 3). The overall item fit residuals showed unacceptable fit to the Rasch model, with moderate deviation in the standardized item fit residual statistic (mean = 0.02, SD = 1.59). Similar misfit was indicated by out-of-range mean and SD values in all six subscales (Table 3).

Table 3
Overall fit statistic

Subscale	N (with ext)	N (w/o ext)	Item fit residual mean	Item fit residual SD	Person fit residual mean	Person fit residual SD	Item–trait χ²	df	p	PSI (with ext)	PSI (w/o ext)	α (with ext)	α (w/o ext)	Significant t-tests (n)	Total t-tests	Significant t-tests (%)
Total	390	374	0.02^§	1.59	–0.13^§	0.84^§	249.04	144	< 0.001^*	0.88^¶	0.89^¶	0.91^¶¶	0.90^¶¶	48	374	12.83%
Trauma to self	390	343	–0.57	0.29	–0.28^§	0.59	59.2	25	< 0.001^*	0.59	0.52	0.7	0.59	5	343	1.46%
Victims	390	306	–0.05^§	1.67	–0.2^§	0.71^§	57.47	20	< 0.001^*	0.41	0.34	0.7	0.54	2	306	0.65%
Children	390	216	1.12	0.54	–0.4	1.12^§	24.16	6	< 0.001^*	0.11	–1.18	0.72	–0.37	0	216	0.00%
Operation	390	339	–0.13^§	0.5	–0.19^§	0.5	38.85	36	0.34	0.55	0.48	0.68	0.56	16	339	4.72%
Exposure	390	269	0.27^§	0.47	–0.19^§	0.85^§	29.37	12	0.003^*	0.35	–0.11	0.74	0.3	0	269	0.00%
Casualties	390	312	–0.87	0.36	–0.32	0.41	75.76	9	< 0.001^*	0.49	0.09	0.64	0.29	0	312	0.00%
Revised total	390	374	–0.21^§	1.05^§	–0.2^§	0.72^§	127.01	110	0.13	0.86^¶	0.83^¶	0.88^¶¶	0.85^¶¶	3	374	0.80%

Abbreviations: PSI, person separation index; SD, standard deviation; with ext / w/o ext, with / without extreme scores. ^§Criteria for an acceptable distribution of fit residuals: mean = 0, SD = 1. ^*Criterion for a significant p-value of the chi-square test for item–trait interaction: p < 0.05. ^¶Criterion for an acceptable PSI: PSI > 0.7. ^¶¶Criterion for a satisfactory Cronbach's alpha: α > 0.8.

To identify the items responsible for the misfit, we next examined the individual item fit statistics. At the level of the overall CII, Item 4 (Serious line of duty injury to fellow emergency worker) and Item 20 (Prolonged extrication of trapped victim with life-threatening injuries) were removed after their unacceptable fit residuals were confirmed. At the subscale level, the fit residual of Item 14 (Victim(s) known to you) was 2.59, suggesting mild misfit within the "Victims Known to Fire-Emergency Worker" subscale; however, this was not supported by the chi-square test, which was non-significant (p = 0.182). No other item misfit was observed at the subscale level. The revised CII showed satisfactory fit to the Rasch model, with a non-significant chi-square test (χ² = 127.01, df = 110, p = 0.13), acceptable item fit (mean = –0.21, SD = 1.05), and acceptable person fit (mean = –0.2, SD = 0.72). All resulting individual item fit statistics met Rasch expectations (Table 4).

Table 4

Individual item fit statistic in original CII with item reduction and rescoring strategy

Item	Location	SE	FitResid	DF (FitResid)	ChiSq	DF (ChiSq)	Prob	Item modification	Subscale
Item 1	1.5	0.09	–0.23	272	3.25	5	0.662	Rescore to 0,1 Merge subscales	Trauma to Self
Item 2	0	0.05	–0.95	272	14.72	5	0.012
Item 19	–0.34	0.05	–0.71	272	11.06	5	0.05
Item 22	–0.13	0.05	–0.61	272	12.69	5	0.026
Item 23	–1.03	0.06	–0.35	272	17.48	5	0.004
Item 3	0.27	0.06	–0.07	242	4.84	4	0.304		Victims Known to Fire-Emergency Worker
Item 4	–0.05	0.06	–1.81	242	27.91	4	< 0.002^*	Item removal
Item 5	–0.58	0.05	–1.2	242	15.32	4	0.004	Rescore to 0,1 Merge subscales
Item 6	0.69	0.07	0.25	242	3.15	4	0.533
Item 14	–0.33	0.06	2.59^§	242	6.25	4	0.182
Item 7	1.39	0.07	–0.45	205	22.73	3	< 0.002^*		Multiple Casualties
Item 8	–1.01	0.07	–1.04	205	14.2	3	0.003
Item 9	–0.38	0.06	–1.12	205	38.83	3	< 0.002^*
Item 12	–0.09	0.07	1.5	106	6.3	3	0.098		Incidents Involving Children
Item 13	0.09	0.07	0.74	106	17.86	3	< 0.002^*
Item 10	–0.67	0.05	0.52	280	6.43	5	0.267		Unusual or Problematic Tactical Operations
Item 11	–0.42	0.05	–0.08	280	7.74	5	0.171
Item 15	–0.63	0.05	–0.49	280	3.73	5	0.589
Item 16	–0.15	0.05	0.34	280	5.82	5	0.324
Item 21	1.59	0.14	–0.29	280	2.74	5	0.74
Item 24	0.28	0.06	–0.79	280	8.25	5	0.143
Item 17	0.14	0.06	0.69	177	5.63	4	0.229		Exposure to Severe Medical Trauma
Item 18	–0.39	0.06	–0.23	177	14.33	4	0.006
Item 20	0.26	0.06	0.34	177	9.42	4	0.051	Item removal

^§Fit residual outside ±2.5. ^*Significant p-value after Bonferroni correction (p < 0.002).

3.3 Threshold

Initially, no item in the original CII or its subscales showed ordered thresholds on the threshold map, necessitating rescoring. Ultimately, rescoring supported treating all items as dichotomous, where 0 represents the original "no exposure" response and 1 combines exposure one, two, or three or more times. The new threshold map is presented in Fig. 1.

Fig. 1

Threshold map for revised CII.

3.4 Targeting

Figure 2 shows the targeting of the revised CII. The mean person location is 0.13 logits, slightly above the mean item location (0 logits), indicating that the level of exposure in this sample is slightly higher than the average difficulty of the scale items (Fig. 2).

Fig. 2

Person-item threshold distribution.

3.5 Differential item functioning

For the DIF analysis, the continuous variables age and years of service were converted into categorical variables based on their 25th, 50th, and 75th percentiles. The person factors were therefore four groups of years of service (0–4, 5–11, 12–18, and 19–34 years) and four age groups (20–31, 32–39, 40–47, and 48–60 years). Sex was recorded in two groups (male and female). DIF was examined using both statistical critical values (after Bonferroni correction) and visual inspection, by plotting the item characteristic curve against the person trait for each person factor (see Fig. 3). In the revised CII, uniform DIF by sex was found for item 2 (Threat of serious injury or death to self), item 19 (Incident involving serious risk to yourself), item 6 (Suicide or attempted suicide by fellow emergency worker), and item 17 (Close contact with burned or mutilated victim). Uniform DIF was also detected for item 12 (Incident involving serious injury or death to children) across groups of years of service. Splitting the items by sex and by service-year group resolved all uniform DIF (Table 5).
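The percentile grouping can be sketched with Python's `statistics.quantiles`. This is an illustrative assumption rather than the authors' procedure: `quantiles` uses its default "exclusive" interpolation, and values falling exactly on a cut point are assigned to the upper group, so the boundaries it produces may differ slightly from those reported above. The function name is hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence


def quartile_groups(values: Sequence[float]) -> List[int]:
    """Assign each value to group 0-3 using the 25th, 50th and 75th percentiles."""
    cuts = quantiles(values, n=4)  # three cut points
    # bisect_right counts how many cut points are <= the value.
    return [bisect_right(cuts, v) for v in values]
```

Applied to years of service or age, the four resulting groups serve as the person factors for the DIF analysis.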

Fig. 3

Uniform DIF presented in Item 2: Threat of serious line of duty injury or threat of death to self (that did not result in actual serious injury).

Table 5

DIF summary based on the revised CII

Item	Description	DIF	Person factor	Solution
Item 2	Threat of serious injury/death to self	Uniform DIF	Sex (male and female)	Item split
Item 19	Incident involving serious risk to yourself
Item 6	Suicide or attempted suicide by fellow
Item 17	Close contact with burned or mutilated victim
Item 12	Incident involving serious injury or death to children	Uniform DIF	Groups of service years	Item split

3.6 Dimensionality

The full CII failed to meet the unidimensionality criterion, as 12.83% of the independent t-tests were significant at the 5% level. At the subscale level, the percentage of significant t-tests was below 5% for every subscale, indicating that each CII subscale is unidimensional. After item reduction and response rescoring, the overall CII demonstrated unidimensionality (0.80% significant t-tests; Table 3).

3.7 Local independence

The assumption of local independence was mildly violated between Item 8 (Responded to incident involving one or two deaths) and Item 9 (Responded to incident involving multiple serious injuries) in the revised CII (r = 0.35; Table 6). Because the two items represent distinct concepts, we elected neither to delete one of them nor to combine them into a "super item". Moreover, as both items belong to the same subscale, creating a summary score for that subscale has the same statistical effect as a "super item".

Table 6
Residual correlation based on the revised CII.

Item	1	2	19	22	23	3	5	6	14	7	8	9	12	13	10	11	15	16	21	24	17	18
1	1.00
2	–0.07	1.00
19	–0.04	0.08	1.00
22	0.05	–0.01	0.06	1.00
23	–0.09	–0.01	0.01	0.16	1.00
3	–0.05	–0.09	–0.09	–0.09	0.00	1.00
5	–0.06	0.24	0.03	–0.04	–0.13	–0.05	1.00
6	–0.06	–0.20	–0.13	–0.01	0.02	0.09	–0.06	1.00
14	–0.14	–0.10	–0.12	–0.10	–0.01	0.00	–0.03	0.02	1.00
7	0.03	–0.07	–0.07	–0.13	–0.11	0.01	–0.10	–0.01	–0.01	1.00
8	–0.01	0.01	–0.01	0.00	–0.06	–0.11	–0.12	–0.10	0.03	0.02	1.00
9	–0.01	–0.05	–0.04	–0.07	–0.13	–0.03	–0.07	–0.15	0.11	0.09	0.35^*	1.00
12	–0.06	–0.03	–0.08	–0.13	–0.08	–0.02	–0.08	–0.12	–0.06	0.04	0.03	–0.05	1.00
13	–0.01	–0.06	–0.14	–0.09	–0.12	–0.09	0.07	–0.09	0.05	–0.05	–0.03	0.01	0.19	1.00
10	–0.01	–0.11	0.04	–0.02	–0.09	–0.05	–0.16	–0.12	–0.12	–0.13	–0.05	–0.01	–0.05	–0.04	1.00
11	0.00	–0.12	–0.12	–0.07	–0.02	–0.07	–0.10	–0.02	–0.06	–0.05	0.00	–0.02	–0.09	–0.05	0.11	1.00
15	0.02	–0.07	0.01	–0.03	–0.09	–0.11	–0.08	–0.08	–0.08	–0.13	0.06	–0.05	–0.02	–0.11	0.00	–0.05	1.00
16	–0.15	–0.05	–0.12	–0.09	–0.07	–0.03	–0.01	0.09	–0.04	–0.15	0.03	–0.07	–0.13	–0.12	–0.05	0.05	0.03	1.00
21	–0.06	–0.06	–0.02	–0.07	–0.09	–0.09	–0.02	–0.05	–0.05	–0.01	–0.10	–0.13	0.06	0.00	0.04	–0.04	–0.02	–0.03	1.00
24	–0.10	0.02	0.03	0.03	0.06	–0.13	–0.11	–0.10	–0.08	–0.06	–0.03	–0.11	–0.12	–0.05	0.01	–0.02	0.02	0.03	0.00	1.00
17	–0.07	–0.04	0.01	–0.08	–0.09	–0.05	–0.05	–0.09	–0.07	–0.10	–0.03	0.07	0.08	0.03	–0.09	–0.03	–0.08	–0.09	0.03	–0.15	1.00
18	0.05	–0.07	–0.03	–0.10	0.01	–0.06	–0.10	–0.06	–0.04	0.04	–0.06	–0.01	–0.02	–0.08	–0.09	–0.10	–0.09	–0.09	–0.02	0.04	0.07	1.00

^*Residual correlation > 0.3, indicating local dependency.

3.8 Reliability

Both reliability statistics reached satisfactory levels for the original (PSI = 0.88, α = 0.91) and revised (PSI = 0.86, α = 0.88) versions of the CII. According to previous studies, a PSI above 0.8 indicates the ability to discriminate between at least three groups [9]. The PSI was not applicable to the individual subscales because of their limited number of items.
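One widely used way to relate reliability to the number of distinguishable groups is the separation-strata formula $H = (4G + 1)/3$, where $G = \sqrt{R/(1-R)}$ is the separation ratio for reliability $R$. Whether reference [9] derives its cut-offs this way is an assumption; the sketch, with a hypothetical function name, simply shows that $R = 0.8$ gives $G = 2$ and $H = 3$, consistent with the three-group interpretation above.

```python
import math


def separation_strata(reliability: float) -> float:
    """Number of statistically distinct strata, H = (4G + 1) / 3,
    where G = sqrt(R / (1 - R)) is the separation ratio for reliability R."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    g = math.sqrt(reliability / (1.0 - reliability))
    return (4.0 * g + 1.0) / 3.0
```

Under this formula, a reliability of 0.7 gives about 2.4 strata (at least two groups), and the revised CII's PSI of 0.86 gives about 3.6.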

4 Discussion

To our knowledge, no previous study has used this analytic strategy on the Critical Incident Inventory to examine 1) construct validity, including unidimensionality and internal consistency, 2) the thresholds of the response options, and 3) the impact of personal factors such as age, sex, and years of service. The findings of the current study add to the psychometric validation literature on the CII and support its application in both research and clinical settings.

In its original format, with a 4-point rating scale for all 24 items summed into a total score, the CII failed to meet the expectations of scale measurement and showed significant misfit to the Rasch model. All six subscales individually achieved acceptable values on the unidimensionality test; however, the combination of all 24 items did not support this assumption. Since no previous study has analyzed the CII within item response theory to examine whether the scale represents a single underlying factor, our results warrant confirmation by exploratory or confirmatory factor analysis.

Considering both the Rasch statistics (threshold map) and the occupational context of firefighting, we merged adjacent response options of the original CII. Firefighters may find it difficult to recall the number of critical incidents experienced under the extreme working conditions they are repeatedly exposed to over the course of their careers, and such long recall intervals may be particularly prone to response error or bias [12]. Furthermore, their primary focus during an incident is the task at hand: rescuing potential victims, extinguishing the fire, maintaining ventilation, hauling equipment, and forcible entry operations [1]. These physically and cognitively demanding tasks are performed in evolving, high-risk environments. Other Rasch statistics, including overall fit, individual item fit, and unidimensionality, all met expectations. Therefore, the proposed revised CII, with 22 items and dichotomous response options, fits the Rasch model.

Following previous studies, the mean and SD of the item-person interaction statistics (fit residuals) should be close to 0 and 1, respectively, reflecting normally distributed residuals [13]. Throughout our analyses, we monitored these values after every iteration of item reduction and response rescoring. As a result, the distribution of residuals improved for the revised CII, with acceptable mean and SD values.
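The quantity behind these summaries can be sketched for the dichotomous Rasch model, where the probability of endorsing an item is P = exp(θ − b) / (1 + exp(θ − b)) for person location θ and item difficulty b, and each person-item cell has a standardized residual z = (x − P) / √(P(1 − P)). RUMM2030 aggregates such residuals into its own item and person fit residual statistics, so this is only an illustration of the underlying residuals, pooled over all cells and assuming person and item estimates have already been obtained; under adequate fit their mean should be near 0 and their SD near 1.

```python
import math
import statistics


def rasch_probability(theta: float, b: float) -> float:
    """P(X = 1) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standardized_residuals(data, thetas, difficulties):
    """z = (x - P) / sqrt(P(1 - P)) for every person-item cell.

    `data` holds one list of 0/1 responses per person; `thetas` and
    `difficulties` are assumed to be already-estimated parameters.
    """
    residuals = []
    for row, theta in zip(data, thetas):
        for x, b in zip(row, difficulties):
            p = rasch_probability(theta, b)
            residuals.append((x - p) / math.sqrt(p * (1.0 - p)))
    return residuals


def residual_summary(residuals):
    """Mean and sample SD of the pooled residuals."""
    return statistics.mean(residuals), statistics.stdev(residuals)
```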

In RUMM2030, different class interval configurations yield different overall fit statistics, and the literature offers no unified rule for choosing the class interval setting. According to the operation manual [5], the default setting is 10 class intervals, and each interval should contain close to 50 respondents. However, we could not retain the default setting throughout our analyses because respondents were not evenly distributed across the class intervals. We therefore adjusted the setting according to the person separation index (PSI = 0.86), which indicated that the revised CII can discriminate between three groups of respondents.
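Two quantities behind these choices can be sketched under stated assumptions. The first helper splits a sample sorted by person location into near-equal class intervals, ignoring ties in location, which in practice can make the intervals uneven; with 390 respondents and the default 10 intervals, each would hold 39 persons. The second converts a reliability-type index R such as the PSI into a separation ratio G = √(R / (1 − R)) and Wright's number of statistical strata H = (4G + 1) / 3. The strata formula is a common Rasch convention used here as an assumption; the text does not state how the three-group interpretation was derived.

```python
import math


def class_interval_sizes(n_persons: int, n_intervals: int) -> list[int]:
    """Sizes of near-equal class intervals for persons sorted by location.

    Ties in person location are ignored; in practice they can make the
    intervals uneven.
    """
    if n_intervals < 1 or n_persons < n_intervals:
        raise ValueError("need at least one person per interval")
    base, extra = divmod(n_persons, n_intervals)
    # The first `extra` intervals receive one additional person.
    return [base + 1 if i < extra else base for i in range(n_intervals)]


def separation_ratio(psi: float) -> float:
    """Separation G = sqrt(R / (1 - R)) for a reliability-type index R."""
    if not 0.0 <= psi < 1.0:
        raise ValueError("PSI must lie in [0, 1)")
    return math.sqrt(psi / (1.0 - psi))


def statistical_strata(psi: float) -> float:
    """Wright's number of statistical strata, H = (4G + 1) / 3."""
    return (4.0 * separation_ratio(psi) + 1.0) / 3.0


print(class_interval_sizes(390, 10))       # ten intervals of 39 persons
print(round(statistical_strata(0.86), 2))  # 3.64
```

For PSI = 0.86 the strata value of about 3.6 is consistent with at least three distinguishable groups of respondents.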

4.1 Clinical implications

Items 8 and 9 should not be considered in isolation; testing alternate forms in which item 9 does not always follow item 8 could clarify the nature and persistence of their dependency. Evidence indicates that the subjective experience of intense fear and helplessness in response to critical incidents has been identified as a factor contributing to posttraumatic stress disorder (PTSD) [14]. Our Rasch analysis, and the revised CII that follows from it, may further improve the predictive validity of the CII for PTSD and other mental health disorders by increasing the precision of its scores.
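Local dependency between items such as 8 and 9 is usually screened through residual correlations. The sketch below flags item pairs whose residual correlation exceeds a fixed cutoff, using a handful of values copied from the 22-item residual correlation matrix reported with this study. The 0.3 cutoff is an assumption for illustration only; criteria relative to the average residual correlation are also used in the literature, and the study's own criterion is not stated in this section.

```python
# A few residual correlations copied from the study's residual
# correlation matrix (lower triangle), keyed by (item, item).
RESIDUAL_CORRELATIONS = {
    (2, 5): 0.24,
    (7, 8): 0.02,
    (7, 9): 0.09,
    (8, 9): 0.35,
    (10, 11): 0.11,
}


def flag_dependent_pairs(correlations, cutoff=0.3):
    """Return item pairs whose residual correlation exceeds `cutoff`.

    The 0.3 default is an illustrative assumption, not the study's rule.
    """
    return sorted(pair for pair, r in correlations.items() if r > cutoff)


print(flag_dependent_pairs(RESIDUAL_CORRELATIONS))  # [(8, 9)]
```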

4.2 Research implications

Informed by these results, future studies could field-test the revised CII to establish its measurement properties (e.g., reliability) under classical test theory (CTT) and examine order effects arising from the sequence in which individual items are answered [15]. Cognitive debriefing of each item is also needed to examine the content validity of the CII, as such evidence is currently lacking [5]. This work would also help identify potential threats to reliability under the new scoring scheme.
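For the CTT field testing proposed above, internal consistency of dichotomous items is often summarized with Cronbach's alpha, which for 0/1 items is equivalent to the Kuder–Richardson formula 20 (KR-20): α = k / (k − 1) · (1 − Σσᵢ² / σₓ²), where k is the number of items, σᵢ² the variance of item i, and σₓ² the variance of the total score. The sketch assumes complete data (no missing responses) stored as one list of 0/1 scores per respondent; it is illustrative and not tied to any particular software.

```python
import statistics


def kr20(scores: list[list[int]]) -> float:
    """Cronbach's alpha for 0/1 items (equivalent to KR-20).

    `scores` holds one list of item scores per respondent, with no
    missing values.
    """
    if not scores:
        raise ValueError("no respondents")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent needs a score for every item")
    # Variance of each item across respondents (population variance).
    item_vars = [statistics.pvariance(col) for col in zip(*scores)]
    # Variance of the summed total score, using the same denominator.
    total_var = statistics.pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```

Because item and total variances share the same denominator, using population or sample variance gives the same coefficient.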

5 Conclusions

Our Rasch analysis indicated that the original 24-item CII failed to meet the expectations of the Rasch model, showing multidimensionality, misfit statistics, and disordered thresholds. After removing items 4 and 20 and rescoring the original response options into a dichotomous format, the revised CII showed a satisfactory level of fit to the Rasch model. To address differential item functioning (DIF), several items need to be split by sex (items 2, 19, 6, and 17) and by years of service (item 12). Further field testing is needed to establish the psychometric properties of the revised CII under classical test theory.
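Splitting an item for DIF means replacing it with one group-specific copy per group, so that each copy is calibrated separately and responses are treated as missing for respondents outside that group. The sketch below shows this restructuring of the data only; the group labels and the naming scheme for split items (for example "2_female") are hypothetical.

```python
def split_item_by_group(data: list[dict], groups: list[str], item: int) -> list[dict]:
    """Replace `item` with one copy per group label.

    Each respondent keeps their response in the copy for their own group;
    copies for other groups are set to None (treated as missing).
    """
    if len(data) != len(groups):
        raise ValueError("need exactly one group label per respondent")
    labels = sorted(set(groups))
    split_data = []
    for responses, group in zip(data, groups):
        # Keep every other item unchanged.
        new = {k: v for k, v in responses.items() if k != item}
        for label in labels:
            new[f"{item}_{label}"] = responses.get(item) if label == group else None
        split_data.append(new)
    return split_data


data = [{2: 1, 3: 0}, {2: 0, 3: 1}]
print(split_item_by_group(data, ["female", "male"], 2))
```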

Conflict of interest

None to report.

Funding

This work was funded by the Ontario Ministry of Labour (FRN #13-R-027).

References

1. Monnier, Cameron, Hobfoll, Gribble. The impact of resource loss and critical incidents on psychological functioning in fire-emergency workers: A pilot study. International Journal of Stress Management. 2002;9(1):11–29.

2. Pallant J, Tennant A. An Introduction to the Rasch Measurement Model: An Example Using the Hospital Anxiety and Depression Scale (HADS). British Journal of Clinical Psychology. 2007;46(1):1–18. https://dx-doi-org.web.bisu.edu.cn/10.1348/014466506X96931

3. Andrich D, Hagquist C. Real and Artificial Differential Item Functioning in Polytomous Items. Educational and Psychological Measurement. 2015;75(2):185–207. https://dx-doi-org.web.bisu.edu.cn/10.1177/0013164414534258

4. Cano, Barrett, Zajicek, Hobart. Beyond the Reach of Traditional Analyses: Using Rasch to Evaluate the DASH in People with Multiple Sclerosis. Multiple Sclerosis Journal. 2011;17(2):214–22. https://dx-doi-org.web.bisu.edu.cn/10.1177/1352458510385269

5. Tennant, Horton, Pallant. Introductory Rasch analysis: a workbook. Leeds, UK: Department of Rehabilitation Medicine, University of Leeds; 2011.

6. Lundgren Nilsson Å, Tennant A. Past and Present Issues in Rasch Analysis: The Functional Independence Measure (FIM™) Revisited. Journal of Rehabilitation Medicine. 2011;43(10):884–91. https://dx-doi-org.web.bisu.edu.cn/10.2340/16501977-0871

7. Jerosch-Herold C, Chester R, Shepstone L, Vincent J, MacDermid J. An Evaluation of the Structural Validity of the Shoulder Pain and Disability Index (SPADI) Using the Rasch Model. Quality of Life Research. 2018;27(2):389–400. https://dx-doi-org.web.bisu.edu.cn/10.1007/s11136-017-1746-7

8. Covic T, Pallant J, Conaghan P, Tennant A. A Longitudinal Evaluation of the Center for Epidemiologic Studies-Depression Scale (CES-D) in a Rheumatoid Arthritis Population Using Rasch Analysis. Health and Quality of Life Outcomes. 2007;5:1–8. https://dx-doi-org.web.bisu.edu.cn/10.1186/1477-7525-5-41

9. Gothwal V, Wright T, Lamoureux E, Pesudovs K. Psychometric Properties of Visual Functioning Index Using Rasch Analysis. Acta Ophthalmologica. 2010;88(7):797–803. https://dx-doi-org.web.bisu.edu.cn/10.1111/j.1755-3768.2009.01562.x

10. Packham T, MacDermid J. Measurement Properties of the Patient-Rated Wrist and Hand Evaluation: Rasch Analysis of Responses from a Traumatic Hand Injury Population. Journal of Hand Therapy. 2013;26(3):216–24. https://dx-doi-org.web.bisu.edu.cn/10.1016/j.jht.2012.12.006

11. Gliem J, Gliem R. Calculating, Interpreting, and Reporting Cronbach's Alpha Reliability Coefficient for Likert-Type Scales. 2003 Midwest Research to Practice Conference in Adult, Continuing, and Community Education. 2003:82–88. https://doi.org/10.1016/B978-0-444-88933-1.50023-4

12. Streiner, Norman. Health Measurement Scales: A Practical Guide to Their Development and Use. 4th ed. Oxford, UK: Oxford University Press; 2008.

13. Lambert S, Pallant J, Boyes A, King M, Britton B, Girgis A. A Rasch Analysis of the Hospital Anxiety and Depression Scale (HADS) among Cancer Survivors. Psychological Assessment. 2013;25(2):379–90. https://dx-doi-org.web.bisu.edu.cn/10.1037/a0031154

14. Nazari, MacDermid, Sinden, D'Amico, Brazil, Carleton, Cramm. Prevalence of Exposure to Critical Incidents in Firefighters Across Canada. WORK. 2019.

15. Schwarz. Self-reports: How the questions shape the answers. American Psychologist. 1999;54(2):93–105. https://dx-doi-org.web.bisu.edu.cn/10.1037/0003-066X.54.2.93

Item	1	2	19	22	23	3	5	6	14	7	8	9	12	13	10	11	15	16	21	24	17	18
1	1.00
2	–0.07	1.00
19	–0.04	0.08	1.00
22	0.05	–0.01	0.06	1.00
23	–0.09	–0.01	0.01	0.16	1.00
3	–0.05	–0.09	–0.09	–0.09	0.00	1.00
5	–0.06	0.24	0.03	–0.04	–0.13	–0.05	1.00
6	–0.06	–0.20	–0.13	–0.01	0.02	0.09	–0.06	1.00
14	–0.14	–0.10	–0.12	–0.10	–0.01	0.00	–0.03	0.02	1.00
7	0.03	–0.07	–0.07	–0.13	–0.11	0.01	–0.10	–0.01	–0.01	1.00
8	–0.01	0.01	–0.01	0.00	–0.06	–0.11	–0.12	–0.10	0.03	0.02	1.00
9	–0.01	–0.05	–0.04	–0.07	–0.13	–0.03	–0.07	–0.15	0.11	0.09	0.35*	1.00
12	–0.06	–0.03	–0.08	–0.13	–0.08	–0.02	–0.08	–0.12	–0.06	0.04	0.03	–0.05	1.00
13	–0.01	–0.06	–0.14	–0.09	–0.12	–0.09	0.07	–0.09	0.05	–0.05	–0.03	0.01	0.19	1.00
10	–0.01	–0.11	0.04	–0.02	–0.09	–0.05	–0.16	–0.12	–0.12	–0.13	–0.05	–0.01	–0.05	–0.04	1.00
11	0.00	–0.12	–0.12	–0.07	–0.02	–0.07	–0.10	–0.02	–0.06	–0.05	0.00	–0.02	–0.09	–0.05	0.11	1.00
15	0.02	–0.07	0.01	–0.03	–0.09	–0.11	–0.08	–0.08	–0.08	–0.13	0.06	–0.05	–0.02	–0.11	0.00	–0.05	1.00
16	–0.15	–0.05	–0.12	–0.09	–0.07	–0.03	–0.01	0.09	–0.04	–0.15	0.03	–0.07	–0.13	–0.12	–0.05	0.05	0.03	1.00
21	–0.06	–0.06	–0.02	–0.07	–0.09	–0.09	–0.02	–0.05	–0.05	–0.01	–0.10	–0.13	0.06	0.00	0.04	–0.04	–0.02	–0.03	1.00
24	–0.10	0.02	0.03	0.03	0.06	–0.13	–0.11	–0.10	–0.08	–0.06	–0.03	–0.11	–0.12	–0.05	0.01	–0.02	0.02	0.03	0.00	1.00
17	–0.07	–0.04	0.01	–0.08	–0.09	–0.05	–0.05	–0.09	–0.07	–0.10	–0.03	0.07	0.08	0.03	–0.09	–0.03	–0.08	–0.09	0.03	–0.15	1.00
18	0.05	–0.07	–0.03	–0.10	0.01	–0.06	–0.10	–0.06	–0.04	0.04	–0.06	–0.01	–0.02	–0.08	–0.09	–0.10	–0.09	–0.09	–0.02	0.04	0.07	1.00