Abstract
There are no validated measures of psychiatric disability for traumatized refugees in Western psychiatric care. This is a serious shortcoming as it precludes monitoring of global treatment outcomes in this group, as well as appropriate matching of treatment needs to the disability levels. Using Rasch analysis, we evaluated the psychometrics of the Health of Nation Outcome Scales (HoNOS) in pretreatment data of consecutive refugee patients (N = 448) from a Danish psychiatric clinic. Then, we carried out a cross-validation of the pretreatment HoNOS model on posttreatment data from the same group. A revised 10-item HoNOS fit the Rasch model at pretreatment and also showed excellent fit within the cross-validation data. Culture, gender, and need for translation did not exert serious bias on the measure’s performance. The results establish good monitoring properties of the 10-item HoNOS as the first validated measure of psychiatric disability for traumatized refugees in Western psychiatric care.
Traumatized refugees in Western countries often have diverse psychiatric problems, the most common being posttraumatic stress disorder (PTSD), anxiety, and depression (Fazel, Wheeler, & Danesh, 2005). In addition, traumatized refugees often have other complaints such as chronic pain, social isolation, cognitive impairment, psychosis-like symptoms, as well as problems regarding family and occupational roles (Carlsson, Olsen, Mortensen, & Kastrup, 2006; Hondius, van Willigen, Kleijn, & van der Ploeg, 2000; Norredam, Jensen, & Ekstrom, 2011; Olsen, Montgomery, Bojholm, & Foldspang, 2007). The complex problems that span several areas of functioning call for assessment of psychiatric disability in traumatized refugees (defined as the sum of impairments in biological, psychological, and social functioning; World Health Organization [WHO], 2001). However, reliable psychological assessment of traumatized refugees in Western treatment settings presents a great challenge. Most standard measures are translated into multiple refugee languages in the clinics. Given the large differences in culture, values, education, social norms, and so on between the host countries and the refugees’ countries of origin, the validity and reliability of these translated measures are often questionable. Meanwhile, very few currently applied translations of measures for traumatized refugees have undergone any psychometric validation (Hollifield et al., 2002). Also, to the best of our knowledge, there are no validated measures (self-report or rated measures) of any global psychiatric outcome for this group.
Assessment of psychiatric disability is generally inherent in Western psychiatric care, because such disability is often related to different risk and protective factors than those related to the development of specific psychiatric symptoms. Recovery from psychiatric disability is therefore usually also known to lag behind that of symptoms of specific mental disorders (Narrow & Kuhl, 2011). Hence, assessment of psychiatric disability in addition to the symptoms of specific mental disorders is important in guiding individual, as well as political decisions about treatment needs and prognosis in different groups of psychiatric patients. The lack of appropriate measures of psychiatric disability creates extensive problems in the monitoring of treatment progress and identification of appropriate treatment goals in traumatized refugees in the West. Lack of such measures also leads to the ineffective utilization of psychiatric and social services as the provision of these services cannot be matched to the need for help (based on the degree of disability) in a consistent manner. Want of these measures comes with a high price for society, and an even higher price for the affected individuals.
A valid measure of psychiatric disability in traumatized refugees will have to address several complex challenges. Some of these are as follows: (1) the many diverse symptoms and social problems of traumatized refugees should optimally be captured by just one measure; (2) the measure should ideally be short and easy to apply, because the need for translation often doubles the assessment time in clinical refugee settings; (3) the measure should be cross-culturally validated, as the use of the same measure for patients with many different cultural backgrounds often leaves clinicians uncertain about its appropriateness; and (4) to facilitate its use in everyday clinical practice, a routine monitoring measure should be easy to score, and the scores should be easily interpretable by clinicians.
The Health of Nation Outcome Scales (HoNOS) is an instrument used for routine treatment monitoring in psychiatry. It has good concurrent, content, and predictive validity as well as adequate interrater reliability and sensitivity to change (Pirkis et al., 2005). It is widely used in Western countries, and it has been validated in Danish (Bech et al., 2003). However, regarding the internal structure of the HoNOS, a number of competing factor structures have been suggested. These include a four-factor structure that corresponds to HoNOS’ original subscales (Pirkis et al., 2005), as well as a five-factor structure in which the organic and social subscales retain their conventional form, but the self-harm item from the behavior subscale becomes a part of the depression/psychological subscale, and the hallucinations/delusions item comprises a “subscale” of its own (Eagar, Trauer, & Mellsop, 2005). In addition, Rasch analyses of the HoNOS indicate a lack of unidimensionality of a higher order construct of psychiatric disability in Italian psychiatric populations (Lovaglio & Monzani, 2011, 2012). This would imply that the HoNOS in general does not measure a single underlying construct of psychiatric disability across psychiatric populations with different diagnoses.
On the other hand, the HoNOS has obvious appeal for use with traumatized refugees in Western countries. First, with regard to content validity as a broad psychiatric measure, it is able to encompass the many problems in treatment-seeking refugees. Second, being an observer-rated measure, it does not have to be translated into the many refugee languages. Third, as a routine measure, it is short and is easy to apply.
The goal of this study is to use routinely collected monitoring data to assess the quality of the HoNOS as a monitoring tool for the assessment of psychiatric disability in traumatized, treatment-seeking refugees in Western psychiatric care. Furthermore, this study aims to shed light on the general issues of unidimensionality that have been raised in the literature concerning the HoNOS. Rasch analysis will be used for these purposes because it facilitates the analysis of features that are uniquely associated with measurement challenges in clinical refugee settings. The influence of cultural diversity and of the need for interpretation on the HoNOS’s performance will also be evaluated.
Method
Participants and Procedures
A total of 448 patients from three departments (Aarhus, Horsens, Randers) in a Clinic for PTSD and Transcultural Psychiatry (CPTP), Aarhus University Hospital (Denmark), were included in the study. CPTP is a specialized service within the psychiatric services of the Danish mental health system. It covers the catchment area of the second largest city in Denmark, two or three smaller towns, and the surrounding rural areas. Individuals are referred to CPTP on the suspicion of refugee trauma-related psychiatric problems of diverse kinds. Furthermore, all individuals must fulfill the diagnostic criteria for one or more of the following International Statistical Classification of Diseases and Related Health Problems (10th rev.; WHO, 1994) diagnoses: “Depressive disorders” (F32-34); “Neurotic, stress-related and somatoform disorders” (F40-49; most often PTSD, F43.1); and “Enduring personality change after catastrophic experience” (F62.0). Individuals who have a primary diagnosis of a psychotic disorder or severe substance abuse are not a part of the CPTP’s target group and are referred to other treatments.
Data collection began in May, 2009, and continued until April, 2012. All patients who started and finished treatment during this 3-year period were eligible as participants for this study. The HoNOS was used to assess patients at two time points: pretreatment and posttreatment. The measure was administered by 11 different psychologists. The same psychologist rated the measures at both time points. Demographic information was collected during the initial interview. The study was approved according to the Aarhus University Hospital’s ethical rules for analysis of data collected as a part of the routine practice.
Measures
The HoNOS is a 12-item, observer-rated scale designed to reflect psychological symptoms as well as behavioral, organic, and social problems in psychiatric patients (Wing, Curtis, & Beevor, 1996). It is scored on a scale from 0 to 4, where higher values indicate higher impairment. In theory, the total score reflects the level of psychiatric disability. A validated Danish version of the HoNOS (Bech et al., 2003) was applied in this study. Ordinarily, the HoNOS should be rated soon after the first patient contact; however, the specific point of first assessment and subsequent monitoring frequency are dependent on the treatment setting. In the present study, the psychologists were allowed up to four sessions (including the initial assessment session) with the patient before rating the pretreatment HoNOS. This was deemed necessary in the refugee treatment setting as the need for translation reduces the amount of information that can be conveyed during a single session. Furthermore, the patient and the psychologist have to spend some time together so that the impact of cultural differences between the rater and the patient can be appraised.
Data Analysis
The Rasch Model
The Rasch model (Rasch, 1960) is part of the framework of item response theory (IRT; Hambleton, Swaminathan, & Rogers, 1991). It describes the association between a person’s level of an underlying trait (e.g., general psychiatric impairment) and the probability of a specific item response on a measure of that trait. This association places the individual level of the underlying trait and the item difficulty of a specific measure on the same metric. Observed data are tested against the assumptions of the Rasch model. If the assumptions are met, the raw score of a scale can be said to accurately reflect the severity of that underlying trait. In the practice of psychiatry, this permits the direct interpretation of test scores as ratings of severity, as well as more precise monitoring of patients than is possible using measures built on the assumptions of classical test theory (Hays, Morales, & Reise, 2000). An extension of the Rasch model to items with more than two response options (polytomous items), the partial credit model (PCM; Masters, 1982), was applied in the study.
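To make the link between trait level and response probability concrete, the following is a minimal sketch of how category probabilities are obtained under the partial credit model for a single item. The function name `pcm_probabilities` and the threshold values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Return P(X = 0..m) for one item under the partial credit model.

    theta: person location in logits.
    thresholds: list of m threshold locations (logits) for an item
    with m + 1 ordered response categories.
    """
    # Cumulative sums of (theta - tau_k); category 0 has an empty sum (0.0).
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical thresholds for a 0-4 item; the values are illustrative only.
probs = pcm_probabilities(theta=0.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
expected_score = sum(k * p for k, p in enumerate(probs))
```

With a single threshold the function reduces to the dichotomous Rasch model, in which a person located exactly at the item difficulty has a probability of .5 of the higher response.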
There were two main motives for selecting the PCM over other IRT models. The first and most important is that fit to the PCM means that the obtained raw scores represent a sufficient statistic (e.g., Rasch, 1960). That is, the person’s total score contains all relevant information within the specific context about that individual, and the item total score contains all relevant information about the item. This is important in the given setting because raw total scores are used by clinicians in making clinical decisions. Other IRT models require a weighting of items, which entails using a scoring algorithm. The second reason for choosing the PCM is that it provides a more comprehensive evaluation of the validity of the scale: Fit to the PCM requires that the data fulfill a number of rigorous criteria. These evaluation criteria, which are discussed in detail below, include unidimensionality, item fit, and item invariance. General fit statistics are often reported in addition to these.
General Model Fit Statistics
The most common general fit statistic reported in Rasch analyses is the chi-square statistic, which reflects the property of invariance across the trait. The property of invariance requires that all items differentiate equally well in relation to different levels of the underlying trait. This is a necessary condition for the summation of item scores to an interval-level total score. A significant chi-square violates the requirements of the Rasch model because it indicates that the hierarchical ordering of the items varies across the trait. Item–person interaction statistics are often also reported. These statistics are transformed into z scores, which approximate a standardized normal distribution. Thus, when items and persons fit the Rasch model, a mean of approximately zero and a standard deviation of one are expected. A significance level of .05, with a Bonferroni adjustment to account for the number of hypotheses tested, was used in this study.
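The Bonferroni adjustment mentioned above divides the nominal significance level by the number of tests. The sketch below assumes the conventional .05 nominal level and, purely for illustration, one fit test per item of the 12-item scale; the function names and the number of tests are assumptions, not details reported by the study.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni adjustment."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def is_significant(p_value, alpha, n_tests):
    """True when p_value falls below the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_alpha(alpha, n_tests)

# Assumption: one fit test per item of the 12-item HoNOS.
adjusted = bonferroni_alpha(0.05, 12)
```

Under this assumption the per-test threshold is roughly .004, so a p value that would be significant at the unadjusted .05 level can still be nonsignificant after adjustment.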
Unidimensionality
A fundamental assumption of the Rasch model is unidimensionality, which implies that the items of a scale measure a single underlying construct. One way of testing the unidimensionality assumption is by testing for the local independence of the items (Wright, 1996). Item dependency occurs when items are redundant or linked in some way such that the response on one item determines the response on another. Thus, local independence implies that there should be no further associations between items other than random associations once the Rasch factor (i.e., the underlying trait) has been taken into consideration. Local dependence is assessed using the residual correlation matrix. Items with residual correlations over 0.2 are typically labeled as being locally dependent. A formal test of unidimensionality (Smith, 2002) was also used in the study. This test uses the first residual factor in a principal components analysis (of residuals) to determine two groups of items: those with positive and those with negative loadings. Each set of items is then used to make an estimate of psychiatric disability for each person in the sample. Given that the items form a unidimensional scale, it is expected that there should not be much difference between the person estimates from the two item subsets. An independent samples t test is used to determine whether there is a significant difference between these two estimates. This is repeated for each person, with the expectation that the percentage of tests lying outside the range of −1.96 to 1.96 should not exceed 5%. A confidence interval for the binomial test of proportions for the observed number of significant tests was also applied. If the lower bound of this interval does not exceed the 5% expected value, the scale is considered unidimensional.
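The final step of this test, the proportion of significant t tests and its confidence interval, can be sketched as follows. The normal-approximation interval is an assumption, since the interval method is not specified here, and the function names are hypothetical. The counts (27 significant tests out of 448) are those reported in the Results for the revised 10-item scale at pretreatment, and the decision rule follows the reasoning given there.

```python
import math

def proportion_with_ci(n_significant, n_tests, z=1.96):
    """Proportion of significant t tests with a normal-approximation CI."""
    p = n_significant / n_tests
    half_width = z * math.sqrt(p * (1.0 - p) / n_tests)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

def unidimensionality_supported(n_significant, n_tests, expected=0.05):
    """True when the lower CI bound does not exceed the expected 5%."""
    _, lower, _ = proportion_with_ci(n_significant, n_tests)
    return lower <= expected

# Counts reported for the revised 10-item scale at pretreatment.
p, lower, upper = proportion_with_ci(27, 448)
```

Under this approximation the interval rounds to 0.04 to 0.08, which agrees with the interval reported in the Results.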
Item fit
Item fit is concerned with whether or not items fit the unidimensional Rasch model. A commonly used method for assessing item fit is the chi-square statistic. The chi-square statistic compares the difference between observed values and expected values for groups representing different severity levels (class intervals) across the trait being measured (psychiatric disability). Residuals in the range of ±2.5 indicate a good fit, whereas significant chi-square statistics indicate a misfit.
Item invariance
Item invariance requires that item estimation be independent of the subgroups of individuals completing the measure. In other words, item parameters have to be invariant across populations (Bond & Fox, 2001). Items that do not demonstrate invariance are commonly referred to as exhibiting differential item functioning (DIF). DIF occurs when different subgroups of individuals within a sample (e.g., persons with different cultural backgrounds) have different scores on specific items despite there being equal levels of impairment on the underlying trait. The presence of DIF is detected through the analysis of variance of item scores across each level of the person factor (e.g., cultural background), and it is indicated by a significant main effect of the person factor.
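The analysis-of-variance logic described above can be illustrated with a deliberately simplified sketch. It computes a one-way F statistic for the main effect of a person factor on one item's residuals. The function name `one_way_f` and the residual values are hypothetical, and the sketch leaves out refinements that a full DIF analysis may include (such as crossing the person factor with class intervals), so it should not be read as the procedure implemented in the software used in the study.

```python
def one_way_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups: list of lists, one list of values per level of the person factor.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical item residuals for two subgroups; not study data.
f_stat, df1, df2 = one_way_f([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
```

A large F statistic relative to its reference distribution would suggest that the subgroups differ on the item beyond what the trait level explains.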
In addition to these three criteria, inappropriate category ordering (i.e., disordering of thresholds) was assessed. Optimally, every response option for each item should correspond to a distinct portion of the impairment continuum. Disordering of thresholds occurs when certain response options are not endorsed well, or when response options are difficult to differentiate from one another. Disordering of thresholds can be solved by collapsing or recoding redundant categories. It is important, particularly in clinical practice, that measures are appropriately targeted to assessment populations. Poorly targeted measures often result in floor or ceiling effects. Well-targeted measures have a mean person location score close to zero, which represents the mean location of the items. Negative mean person locations indicate that the group has a lower level of psychiatric disability than the mean of the scale, whereas positive mean person locations indicate a higher level of psychiatric disability. Reliability is assessed using a Person Separation Index (PSI) as well as Cronbach’s alpha (with a similar interpretation for the two measures).
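As a brief illustration of the internal consistency index mentioned above, Cronbach's alpha can be computed from a persons-by-items matrix of ratings. This is a minimal sketch using only the Python standard library; the function name and the ratings are hypothetical and are not study data.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-person item-score lists."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("at least two items are required")
    # Variance of each item across persons.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the raw total score across persons.
    total_var = pvariance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_vars) / total_var)

# Hypothetical ratings (rows = patients, columns = items); not study data.
ratings = [
    [0, 1, 1],
    [1, 2, 2],
    [2, 2, 3],
    [3, 4, 4],
]
alpha = cronbach_alpha(ratings)
```

Population variances are used for both the items and the total score; using sample variances throughout would give the same value, because the ratio of the two is unchanged.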
Rasch Analysis of the HoNOS Data
The highest percentage of missing data at the variable level was 16%. Little’s MCAR test indicated that the data were missing completely at random. Missing data were imputed in SPSS 20.0 using the expectation–maximization algorithm (Bunting, Adamson, & Mulhall, 2002). Data were then fitted to the Rasch model using RUMM2030 software (Andrich, Sheridan, & Luo, 2010). For the pretreatment data, items with disordered thresholds were recoded into appropriate categories, and the general fit of the data was then assessed. Revisions were made to the pretreatment data in order to obtain a scale that fitted the Rasch model. The posttreatment data were used to cross-validate the fit of the revised HoNOS in order to determine whether it had the same psychometric properties at posttreatment as at pretreatment. Stable psychometric properties at pretreatment and posttreatment indicate good treatment monitoring properties. The impact of cultural diversity on the measure’s performance was assessed by dividing the patients into two large cultural subgroups, the “Middle East” (65.2% = Iraq, Iran, Lebanon, Jordan, Syria, Kuwait, Yemen, Afghanistan) and the Balkans (25.8% = Bosnia and Herzegovina, Kosovo, Serbia, Montenegro, Macedonia, Croatia), as well as an additional, smaller heterogeneous group (8.9% = Somalia, Sri Lanka, Indonesia, Congo, Vietnam, Ethiopia, Georgia, Colombia).
Results
Approximately 93% of all eligible patients were included in the study. Of these, 45.5% were female, 38.6% had at least 10 years of formal education, 12.1% had 1 to 5 years of education, and 8.5% had no education. With regard to exposure to severe war trauma, 31.3% reported having been tortured and 35.7% reported imprisonment (not mutually exclusive). The mean length of resettlement time in Denmark at the start of treatment was 12.5 years (SD = 6.4, range = 0-29 years), and 63.4% used an interpreter. The mean HoNOS score at pretreatment was 15.7 (SD = 5.6), indicating very high levels of psychiatric disability in this sample. Items 1, 3, 6, 8, and 11 had disordered thresholds. The disordering of thresholds was mainly caused by overlap in certain response categories. That is, Response Category 1 (minor problem requiring no action) was redundant in Items 3, 6, and 8, and Response Category 3 (problem of moderate severity) was redundant in Items 1 and 11. Accordingly, the thresholds were collapsed before the initial Rasch analysis (see the appendix for details).
Initial Fit of the HoNOS to the Rasch Model
The initial analysis of the fit of the pretreatment data to the Rasch model showed a significant item–trait interaction. This suggests some misfit between the data and the model. The residual mean value for the items was −0.10 (SD = 1.9). The residual mean value for persons was 0.22 (SD = 0.84). This indicates no serious misfit between the participants in this sample and the model. Lack of invariance of item difficulty across the scale was indicated by a significant chi-square value, χ2 = 155.04, degrees of freedom (df) = 60, p < .00001. With regard to reliability, the PSI statistic and Cronbach’s alpha were both .78. This indicates acceptable person separation and internal consistency reliability of the HoNOS. The fit of the individual items revealed that Item 5 (physical illness) and Item 10 (problems with activities of daily living) deviated significantly from the partial credit Rasch model (see column 3 in Table 1). The positive fit residual value for Item 5 suggested low levels of discrimination. The content of Item 5 (physical illness) is not necessarily central to the theoretical construct of psychiatric disability and was, therefore, eliminated from the scale in order to assess whether a measure that fitted the Rasch model could be obtained. The negative fit residual value for Item 10 (problems with activities of daily living) suggested that this item had higher levels of discrimination than the scale’s remaining items. A possible source of deviation could be local dependence or multidimensionality. An analysis of the residuals between Item 10 and the remaining items did not indicate any local dependence. However, the content of Item 10 suggests that this item is quite broad, which could mean that clinicians use it as a general summary item. To fit the data to the model, Item 10 was removed from the scale (the implications of this are addressed in the Discussion section). Furthermore, Item 2 (nonaccidental self-injury) had a significant chi-square value. 
However, the residual value of the item did not indicate any problems with over- or underdiscrimination. Due to repeated hypothesis testing, a significant chi-square for Item 2 may have been caused by chance. Therefore, Item 2 was retained in the revised version of the HoNOS. The results from follow-up analyses indicated that the item fit the Rasch model (see columns 5-8 in Table 1).
Table 1. Overview of Item Fit in the Three Rasch Analyses.
Note. HoNOS = Health of Nation Outcome Scales. Values in boldface represent significant deviations from the model expectations.
Fit of the 10-Item HoNOS to the Rasch Model in the Pretreatment Data
The revised 10-item HoNOS fit the Rasch model in the pretest sample. The residual mean value was −0.29 (SD = 0.88) and −0.24 (SD = 0.74) for items and persons, respectively. There was also a nonsignificant chi-square interaction of χ2 = 73.39, df = 50, p = .0173, which indicates invariance of item difficulty across the scale. In addition, all 10 items of the revised HoNOS had appropriate fit with nonsignificant residuals (columns 4 and 5 in Table 1). The average mean person location value of −1.25 suggests that patients had a slightly lower level of general psychiatric impairment than the average of the scale items. Significant positive local dependence in the form of a residual correlation of 0.24 was found only between Item 11 (problems with living conditions) and Item 12 (problems with occupation and activities), which indicates that these items have a higher correlation than what is expected by the Rasch model.
The unidimensionality test (Smith, 2002) of the revised 10-item HoNOS resulted in 27 of 448 significant tests, with a confidence interval between 0.04 and 0.08. As the lower bound of this interval does not exceed the 5% level, the test failed to reject the unidimensionality assumption. The revision of the HoNOS resulted in a PSI statistic of .74 and Cronbach’s alpha of .73, indicating that it had retained acceptable person separation and internal consistency reliability. The 10-item HoNOS had DIF by gender on Item 1 (aggressive behavior), F(1) = 16.60, p < .0005, favoring males, and on Item 7 (problems with depressed mood), F(1) = 11.57, p < .0005, favoring females. That is, males were more likely to endorse Item 1 (aggressive behavior) than females, despite having the same level of the latent trait of psychiatric disability. The opposite was true for Item 7 (problems with depressed mood), where females were more likely to endorse the item compared to males with the same level of the latent trait of psychiatric disability. Item 12 (problems with occupation) had DIF by culture, F(1) = 14.69, p < .0005, favoring the refugees from the Balkan countries compared to the Middle Eastern countries. None of the items had significant DIF by translator. This revised version of the HoNOS, with a maximum total score of 32, was used in the remaining analyses and is referred to hereafter as the Refugee HoNOS.
Cross-Validation: Assessing the Fit of the Refugee HoNOS Within the Posttreatment Data
It was important to carry out a cross-validation of the Refugee HoNOS in order to ensure that the suggested changes did not just represent empirical adjustments and that the measure had retained its psychometric properties when retested in the same population after treatment. The cross-validation results provided excellent support for the validity of the Refugee HoNOS. The residual mean value for items and persons was −0.46 (SD = 1.10) and −0.23 (SD = 0.66), respectively. There was also a nonsignificant chi-square interaction of χ2 = 82.50, df = 90, p = .70, indicating invariance of item difficulty across the scale. All items apart from Item 9 had appropriate fit at posttreatment. Item 9 had a slightly lower residual but not a significant chi-square value (see Table 1). The Refugee HoNOS resulted in a PSI statistic and a Cronbach’s alpha of .80. There was no local dependence with residual values over 0.2, indicating that the high local dependence found between Items 11 and 12 at pretest may be due to chance. The unidimensionality test resulted in 4.69% significant t tests, which is below the 5% threshold. Significant DIF for cultural origin was found for Item 7 (problems with depressed mood), F(1) = 11.84, p < .001, favoring the refugee sample from the Middle East, and for Item 12 (problems with occupation), F(1) = 8.41, p < .001, favoring the sample from the Balkan countries. Significant DIF for translator was found on Item 2 (nonaccidental self-injury), F(1) = 13.39, p < .0005, favoring the group that did not have a translator, and on Item 4 (cognitive problems), F(1) = 13.67, p < .0005, favoring the group that did have a translator. No DIF was found across gender groups.
Discussion
Overall, the results of the present study provide support for the application of the Refugee HoNOS as a measure of psychiatric disability in treatment-seeking, traumatized refugee populations in Denmark. They also demonstrate the utility of Rasch analysis in providing detailed and valuable information about the function of every item, the measure’s overall targeting for the group, and its performance in complex clinical conditions requiring cross-cultural use and interpretation. The Refugee HoNOS was found to be stable across different test points, which supports the construct validity and reliability of this measure. Furthermore, given that DIF was only found consistently across the measurement points on Item 12 (problems with occupation and daily activities), the DIF on the remaining items could be coincidental. One possible explanation for why Item 12 is dependent on the amount of psychiatric disability as well as on cultural origin is that educational level and labor market participation before resettlement may play a role in clinicians’ evaluations of occupational problems in refugees. Hence, individuals originating from countries that have a higher level of education and where women are more integrated into the labor market (e.g., the Balkans) are more likely to be judged by clinicians as having fewer occupational problems. We suggest that Item 12 be retained in the Refugee HoNOS as the evaluation of occupational problems is an important clinical aspect of psychiatric disability. Moreover, the item does not appear to threaten the overall validity of the measure. Clinicians should be aware of the possibility that this item can introduce some rater bias if there are large ethnic differences in the educational and work norms among patients.
Raters can also be trained to assess this item more closely with regard to the underlying dimension (i.e., focus on rating occupational problems only as a function of other problems on the HoNOS, and disregard knowledge of education and prior work experience in their decision making).
Alongside the established construct validity and reliability for this measure, the results indicate that the Refugee HoNOS has very good psychometric properties and that it can be used for treatment monitoring in traumatized refugees. Although our validation of this measure was carried out on treatment-seeking refugee patients in Denmark, the validation was estimated on a well-represented sample of patients from standard treatment facilities in the country. Therefore, it is likely that the psychometric properties of the Refugee HoNOS can be generalized to refugee patients in the other parts of Denmark and possibly to other European countries given similar demographic compositions.
Considerations for Clinical Application of the Refugee HoNOS
With regard to Item 10 (problems with activities of daily living), from a clinical perspective the exclusion of an item with higher levels of discrimination seems counterintuitive; however, it makes good sense in terms of the parsimony of measurement. That is to say, if one were to assess disability due to psychiatric illness using only one item, then Item 10 would probably provide a good general picture of the construct. This is the reason why Item 10 functions well on its own but not in relation to the other items. The question that naturally arises now is “Why not use this item on its own to assess general psychiatric impairment?” The reason is that measures preceding the HoNOS (e.g., the Global Assessment of Functioning, which represents a complex, global disability concept by using a single numerical indicator; American Psychiatric Association, 2000) have problems in the reliability of scoring and have limited clinical utility (Aas, 2010). Given that the remaining 10 items of the HoNOS all measure aspects of the same global construct, “problems with activities of daily living (due to psychiatric illness),” they actually create a more clinically meaningful profile of problematic disability areas. Thus, the evaluation of Item 10 becomes clinically redundant, and in terms of the Rasch model, it threatens the fit of the scale and hinders the summation of items to an interval-level total severity score.
Although from a psychometric perspective Item 5 (physical illness or disability problems) does not appear to fit the same dimension of psychiatric disability as the remaining 10 items of the HoNOS, assessment of physical problems is clinically relevant for many traumatized refugee patients. The most plausible explanation for why this conceptually important item does not appear to fit psychometrically well with the other items is that physical problems in this patient group require complex evaluation, which cannot be accomplished through the use of a single item. The causes of physical problems in traumatized refugee patients can be attributed to a number of different underlying mechanisms such as chronic pain from torture, ill health as a result of having had a strenuous life, and psychosomatic reactions, as well as different combinations of these. Depending on the type of underlying mechanism, the relationship between Item 5 and the remaining items of the HoNOS is likely to change resulting in unstable estimates of the underlying dimension of psychiatric disability. Therefore, the evaluation of the Refugee HoNOS can be supplemented by other more specific measures of physical impairment for patients with very salient physical problems.
The recoding of thresholds for some of the HoNOS items meant that infrequently endorsed score options were collapsed into larger categories. The same pattern of collapsing was necessary at both pre- and posttreatment, which indicates that some parts of the original score continuum of the HoNOS have rarely been applied by clinicians in the present refugee population. Although it is possible that the scoring continuum may have been incorrectly applied by clinicians, this is not the most likely explanation, as this study had 11 different raters who almost never applied certain categories. Therefore, the recoding of thresholds in the Refugee HoNOS appears to have had the primary effect of accomplishing a better targeted measure of the level of psychiatric disability in this population. Furthermore, the recoding of thresholds in the Refugee HoNOS improves the fit of the measure to the Rasch model and results in desirable psychometric properties.
HoNOS training practices are built around clinical differentiation of problems that are important to distinguish in psychiatric patients in general. The current results indicate that not all of these clinical distinctions are important in the group of traumatized refugees. However, due to training implications, we recommend that the Refugee HoNOS be scored on the original 0 to 4 scale in everyday clinical practice. That is to say, the raters are currently trained to do general psychiatric ratings on the HoNOS; therefore, rating the Refugee HoNOS according to the collapsed thresholds would require new training practices. It would also ultimately narrow the competence of the raters to traumatized refugee patients only. This would be costly to implement, and at the same time, the narrowing of competencies is, in general, not desirable. The most obvious solution is therefore to maintain the current clinical HoNOS rating practices and rescore the Refugee HoNOS numerically for specific purposes. That is, for purposes of obtaining a more precise measurement and monitoring of traumatized refugees in psychiatric care, the 10-item HoNOS should be numerically rescored, as shown in the appendix.
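The numerical rescoring step described above can be illustrated with a minimal sketch. It assumes that clinicians keep rating all 12 items on the original 0 to 4 scale, that Items 5 and 10 are dropped, and that each remaining item has a lookup list mapping an original rating to a collapsed category. The collapse maps in `EXAMPLE_COLLAPSE` are hypothetical placeholders chosen for illustration only; the actual recoding is the one given in the appendix, and the function names `rescore` and `total_score` are likewise our own.

```python
from typing import Dict, List

# Items excluded from the 12-item HoNOS in this study (Item 5 and Item 10).
EXCLUDED_ITEMS = {5, 10}

# Default: an item keeps its original 0-4 rating.
IDENTITY: List[int] = [0, 1, 2, 3, 4]

# HYPOTHETICAL collapse maps, for illustration only. The real recoding is
# given in the article's appendix and is NOT reproduced here.
# Each list is indexed by the original rating (0-4) and gives the new category.
EXAMPLE_COLLAPSE: Dict[int, List[int]] = {
    1: [0, 1, 1, 2, 2],  # placeholder: ratings 1-2 merged, ratings 3-4 merged
    3: [0, 0, 1, 1, 2],  # placeholder: ratings 0-1 merged, ratings 2-3 merged
}


def rescore(ratings: Dict[int, int],
            collapse: Dict[int, List[int]] = EXAMPLE_COLLAPSE) -> Dict[int, int]:
    """Map original 0-4 ratings (keyed by item number) to collapsed categories."""
    rescored: Dict[int, int] = {}
    for item, raw in ratings.items():
        if item in EXCLUDED_ITEMS:
            continue  # Items 5 and 10 do not contribute to the 10-item score
        if not 0 <= raw <= 4:
            raise ValueError(f"Item {item}: rating {raw} is outside 0-4")
        rescored[item] = collapse.get(item, IDENTITY)[raw]
    return rescored


def total_score(ratings: Dict[int, int],
                collapse: Dict[int, List[int]] = EXAMPLE_COLLAPSE) -> int:
    """Sum the rescored items into a single total severity score."""
    return sum(rescore(ratings, collapse).values())
```

Calling `total_score` on a dictionary of the 12 original ratings returns the 10-item total under whichever collapse maps are supplied. Keeping the recoding in a lookup table, separate from the clinical rating itself, mirrors the recommendation above: raters continue to use the general HoNOS scale, and the population-specific rescoring is applied afterward.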
Implications for the Unidimensionality of the HoNOS
The 10-item Refugee HoNOS (which consists of 3 items for behavioral, 1 item for physical, 3 items for psychological, and 3 items for social problems) was found to be unidimensional in the present sample. However, previous studies concerning the dimensionality of the HoNOS show that the HoNOS is not unidimensional in Italian psychiatric patients (Lovaglio & Monzani, 2011, 2012). One possible explanation for the dissimilarity between the present findings and those of previous studies is that the unidimensionality found in the current sample is present because of refugee patients’ overall high-problem profile on the HoNOS. Another study of ours involving this same refugee sample indicates that the average ratings on the HoNOS are comparable to those of Danish inpatients, specifically addiction and dementia patients, who have the highest HoNOS ratings (Palic, Kappel, Nielsen, Carlsson, & Bech, 2014). Thus, the total ratings of the traumatized refugees on the HoNOS (and most of the HoNOS’ 12 impairment domains) are significantly higher than the ratings of large groups of Danish inpatients with affective, anxiety, and personality disorders, as well as schizophrenia. The structure of the HoNOS in the Italian studies may, therefore, be a reflection of the patient composition in the Italian psychiatric system. Patients with schizophrenia were overrepresented as compared to patients with all other psychiatric diagnoses in both Italian samples (Lovaglio & Monzani, 2011, 2012). Hence, the HoNOS was psychometrically reduced to a measure of four social problems and problems with cognition and hallucinations (Lovaglio & Monzani, 2012). Indeed, this is an item constellation that reflects the general representation of impairment regarding schizophrenia. The Italian studies also applied factor analytic approaches to test the dimensionality of the HoNOS; however, factor analysis is based on principles of correlation between items. Therefore, the clustering of items may reflect similar patterns of endorsement difficulty in the items of a scale, rather than the underlying dimensionality (Nunnally & Bernstein, 1994), in which case the “dimensionality” of the measure again becomes a reflection of the dominant problem areas in the population of interest.
The dimensionality of the 10-item Refugee HoNOS should be tested in other psychiatric populations with overall high-problem profiles. An example could be patients with double diagnoses, whose addiction and psychiatric comorbidity create extensive behavioral and social problems. Another example could be patients with psychiatric problems due to severe childhood maltreatment. These patients usually also have overall high-problem profiles that span across the areas of behavioral inhibition, psychiatric comorbidity, and impaired social functioning due to compromised socioemotional and physiological development in relation to severe early trauma. Furthermore, early childhood maltreatment and trauma are known risk factors for many psychiatric diagnoses and are rather prevalent in psychiatric patients (MacMillan et al., 2001).
Finally, the collapsing of infrequently used score options could be used as a way of establishing a broader application of the 10-item HoNOS as a measure of psychiatric disability in different psychiatric populations. The original 12-item HoNOS is conceptually intended to evaluate the typical areas of concern for psychiatric patients. Although the items are all clinically relevant in relation to psychiatric disability, the levels of impairment on specific items will probably vary as a consequence of the patients’ diagnoses. Therefore, in measures of complex global constructs, the recoding of items into appropriate categories allows for the better targeting of measures to specific diagnostic populations while retaining sound measurement of global problem areas, which are important outcomes in psychiatry. If the 10-item HoNOS can be applied to other psychiatric populations with overall high-problem profiles, this may lead to the development of other versions of the 10-item HoNOS that have different rescoring algorithms for different primary diagnoses.
Limitations
As this study was of naturalistic design, data were collected as a part of the standard evaluation of the CPTP’s services and interrater agreement was, therefore, not established. However, the psychometric stability of the measure that is demonstrated in this study could represent an indirect indicator of relatively consistent ratings. It would have been hard to establish psychometric stability had the 11 different raters been making very different evaluations. Other validity criteria, including the clinical utility of the HoNOS and the differentiation of HoNOS profiles from those of other psychiatric patients for the same sample, are presented in a different study (Palic et al., 2014). The small size of the heterogeneous group in the DIF analysis of cultural origin probably means that it is difficult to detect a possible cultural bias of this group on the Refugee HoNOS. Claims of the measure’s cultural stability are, therefore, best substantiated in the two larger cultural groupings—the Middle East and the Balkans groups. Future studies should also examine the overall clinical utility of the Refugee HoNOS in relation to its ability to predict service utilization and long-term outcomes.
Conclusions
This study presents the Refugee HoNOS as the first validated measure for monitoring psychiatric disability in traumatized refugees in Western psychiatric care. As indicated in this study, the majority of the participants had been resettled in Denmark for over a decade. Therefore, it is more correct to characterize the participants in this sample as former refugees (they were interacting with different social and health professionals in Denmark on the same level as other Danish patients). Thus, the sound assessment of psychiatric disability appears to be important for appropriate service utilization and the overall protection of this highly vulnerable psychiatric population. Rasch analysis lends itself well to the validation of measures in highly complex clinical settings, such as clinics for traumatized refugees in Western countries. There are also obvious benefits of applying DIF analysis in Rasch models for testing the translation equivalence of self-report measures in refugee settings. Better validated measures, known to have robust performance in refugee settings, are a prerequisite for better clinical studies and informed decisions on behalf of this patient group.
Appendix
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
