Abstract
Self-report screening is an important element of transcultural research. Problems concerning illiteracy, cultural sensitivity, and possible misunderstandings have been handled differently in different settings. The aim of this study was to evaluate the validity of two well-known instruments: the Hopkins Symptoms Check List (HSCL-25), and the Harvard Trauma Questionnaire (HTQ, Part IV), with a sample of 160 unaccompanied asylum-seeking adolescents from Afghanistan and Somalia. Assessments were performed 4 months after arrival in Norway, and the screening instruments were presented to the informants on computers with touch-screen function, using the program MultiCASI. Sound-files in the native languages of the informants appeared simultaneously with the written items and could be repeated by touch. We found that the screening procedures were well received and understood by the informants regardless of reading and writing abilities. Agreement between diagnoses (CIDI) and screening results was similar to that found in other studies. Computer-based assessment in this setting was practical and cost-effective, and can be recommended.
Introduction
Studies from different countries throughout the world have gradually expanded our knowledge about the complex life situations of asylum seekers and refugees. We know that they have a greater risk for developing mental health problems compared to the host populations and to other migrant populations (Bean, Derluyn, Eurelings-Bontekoe, Broekaert, & Spinhoven, 2007a). We also know that adverse life events, together with postmigration stressors, are important factors in the totality of burdens they have to endure (Silove, Sinnerbrink, Field, Manicavasagar, & Steel, 1997). Transcultural psychiatric research has sought to explore the universal cross-cultural use of psychiatric diagnostic categories while also making room for alternative explanations of misfortune, illness, and symptom expression (Wintrob, 2013). Diagnostic instruments, with cut-off values, have often been used for identification of persons who are at risk for developing a certain disease (Mollica, McDonald, Massagli, & Silove, 2004; Vervliet et al., 2014). The consistency and validity of these cut-off values depend, among other things, on how well scores reflect the underlying traits that are being measured.
A main topic in the field of cross-cultural research has been the prevalence of psychiatric disorders in relation to migration issues (Laban & van Dijk, 2013), and the quality of research has been improving over the years. Initiatives like the International Test Commission (International Test Commission, 2010) and the Translation Monitoring Form (van Ommeren et al., 1999) have been helpful in order to minimize the impact of cultural differences and establish cross-cultural equivalence on all levels of assessment. As a result of this, researchers can make meaningful comparisons of the prevalence, severity, and trajectories of universally known conditions between ethnic groups and over time (Achenbach, 2010). In spite of all these efforts, there are still several challenges connected to the cross-cultural assessment of psychiatric diagnoses, ranging from methodological issues to criterion validity.
Self-report instruments may be easy to administer, and less costly than clinical interviews, but the method is limited by the subject’s ability to read and understand the items (van Ommeren, 2003). Self-report instruments are sometimes filled in by helpers or teachers in addition to the subjects themselves. This help often includes detailed explanations or reading and writing aid in cases of illiteracy (Bean et al., 2007a). In some of these cases one can hardly call the result “self-report.” Researchers sometimes solve this problem by excluding informants who cannot read and write on their own (Oppedal, Seglem, & Jensen, 2009). Others decide to treat all self-report instruments as if they are structured interviews and read them out loud to all informants in order to avoid differences in assessment procedures (Verduin, Scholte, Rutayisire, Busschers, & Stronks, 2014).
In an often cited article, Flaherty et al. (1988) proposed a “stepwise validation for cross-cultural equivalence” (p. 258). One of the five steps, called “technical equivalence,” refers to describing the exact method of assessment that is used and whether or not it produces the same kind of data in different settings. Filling in a self-report instrument by oneself, with pencil and paper, may be different from receiving reading and writing aid from an interpreter and still different from doing it in a group setting. Literacy is a skill that research subjects master differently and it can be challenging to know where to draw the line between illiterate and literate in a concrete situation. How researchers deal with this is often not discussed in detail in publications and this makes it difficult to determine how the technical quality of the studies has been secured.
When describing the evaluation of technical equivalence, van Ommeren et al. (1999) emphasize the possible presence of unacceptable or offensive items in the psychometric instruments. They refer specifically to items about sensitive matters, such as sexual behaviour or the use of illegal drugs, and recommend rephrasing the items or sometimes deleting them. Even so, after local health workers and lay people have determined that the content and meaning of each remaining item is relevant to a new cultural setting, it may still be unacceptable to say some of these words out loud in the test situation. This can become a concern in studies that seek to determine the prevalence of psychiatric diseases, related to the use of cut-off values (Jakobsen, Johansen, & Thoresen, 2011). In a recent study with adult asylum seekers, our research group found that the screening instruments performed very differently in two subgroups with different language backgrounds. One likely explanation was that one of the groups had a high rate of illiteracy and had difficulties understanding the items, even when interpreters were present and rephrased the items. Another possible explanation was that informants underreported their symptoms when asked directly by the interpreters, because it was socially unacceptable for them to respond otherwise (Jakobsen et al., 2011).
To facilitate standardized data collection, mental health researchers have introduced computer-administered clinical rating scales (Kobak, Greist, Jefferson, & Katzelnick, 1996), and have gathered positive responses from participants who often prefer the computer for reporting about sensitive or illegal behaviours and symptoms (C. F. Turner et al., 1998). The Berlin Center for Torture Victims has taken this method further and developed a computer-based tool (Multilingual Computer Assisted Self-Interview [MultiCASI]) that can be used to present the same self-report instruments to a group of informants with several different languages (Knaevelsrud & Müller, 2008). Each item appears on the screen together with a sound-file that can be activated by touch. This reduces the problems connected to limited reading or writing skills (Hahn, Choi, Griffith, Yost, & Baker, 2011; Knaevelsrud, Wagner, Karl, & Müller, 2007) and makes it possible for informants to answer sensitive questions in private.
In the ongoing process of validation, the reliability of the instrument is important, but not sufficient to make inferences about prevalence rates (van Ommeren, 2003). If the instruments are to be used to screen subjects for specific psychiatric illnesses, criterion validity must be established and the psychometric properties need to be evaluated against a more reliable diagnostic procedure (Switzer, Wisniewski, Belle, Dew, & Schultz, 1999). Comparing the screening results with diagnoses made by independent health workers who are trained in using a diagnostic interview can determine how well the instruments fit independent criteria for the same phenomena.
Several studies have examined the rates of psychological distress in different samples of refugee youth using self-report instruments with estimates of diagnostic “caseness” based on cut-off values from percentile-based estimates or from earlier studies (Bronstein, Montgomery, & Dobrowolski, 2012; Derluyn, Broekaert, Schuyten, & Temmerman, 2004; Heptinstall, Sethna, & Taylor, 2004; Hodes, Jagdev, Chandra, & Cunniff, 2008; Vervliet et al., 2014). Studies that both conduct diagnostic interviews and actually measure the possible agreement between screening and clinical diagnosis are sparse. Some report that the screening instruments used to identify probable psychiatric illness tend to overestimate the prevalence compared to the diagnostic procedures and give a higher estimate of caseness (Sandanger et al., 1998; S. W. Turner, Bowie, Dunn, Shapo, & Yule, 2003).
The aim of our study was to explore the criterion validity of some widely used self-screening instruments, the Hopkins Symptoms Check List (HSCL-25; Mollica et al., 1996) and the Harvard Trauma Questionnaire (HTQ, Part IV; Mollica et al., 1992), compared with psychiatric diagnoses on the basis of structured clinical interviews administered by trained clinicians. However, in an attempt to include informants regardless of reading and writing skills, we wanted to approach the testing procedures differently than in earlier refugee studies and administer the instruments by the use of MultiCASI. We also wanted to evaluate the feasibility of computer-based screening with a group of young asylum seekers with limited school background. To our knowledge, this is the first study of the HSCL-25 and the HTQ among refugee youth to be validated by clinical interviews.
Materials and methodology
Participants
Age and education of unaccompanied refugee minors in three language groups arriving in the host country (N = 160).
The Regional Medical Ethics Committee, South-East Norway approved this study. Informed consent was obtained from all participants.
Measures
Screening instruments included the Hopkins Symptom Checklist-25 (HSCL-25; Mollica et al., 1996), which is a self-administered questionnaire designed to measure anxiety and depression. The HSCL-25 has been validated in various clinical and community samples (Hollifield, Warner, & Lian, 2002; Silove et al., 1997). A version has also been applied in a number of refugee studies with minors (Bean, Derluyn, Eurelings-Bontekoe, Broekaert, & Spinhoven, 2007b; Bean, Eurelings-Bontekoe, Derluyn, & Spinhoven, 2004; Bronstein, Montgomery, & Ott, 2013). The translated HSCL versions used in our study were developed by Centrum 45 (www.centrum45.nl). These are the same versions that were developed and validated in the study by Bean and coauthors (2004) referred to throughout this paper. They state that no written back-translations were done, but instead an oral item-by-item analysis, with trained interpreters, took place. The internal consistency (Cronbach's alpha) varied between .86 and .94 for the internalizing symptoms of the HSCL-25 in different language versions (Bean et al., 2007b).
A mean score greater than 1.75 on a range from 1 (not bothered) to 4 (extremely bothered) is thought to indicate “clinically significant distress.” However, different cut-off values have been used in different samples. Studies involving refugee adolescents have found that cut-off values of ≥ 2.0 indicated the possibility of a clinically meaningful condition (Bean et al., 2007a; Vervliet, 2013). The basis of this cut-off value was a division into percentiles in a population where no clinical diagnosis or standardized diagnostic interview was available (Bean et al., 2004).
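The scoring rule described above can be illustrated with a minimal Python sketch. It assumes that each item is answered on the 1 to 4 scale, that the scale score is the mean of the answered items, and that a case is flagged when the mean is at or above the cut-off (as with the ≥ 2.0 value). The function names and the handling of skipped items (ignored here) are hypothetical choices made for illustration; the source does not describe how unanswered items were scored.

```python
from statistics import mean

VALID_RESPONSES = (1, 2, 3, 4)  # 1 = not bothered ... 4 = extremely bothered


def mean_item_score(responses):
    """Mean of the answered items; None marks a skipped item (assumption)."""
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no answered items")
    for r in answered:
        if r not in VALID_RESPONSES:
            raise ValueError(f"response out of range: {r}")
    return mean(answered)


def is_screen_positive(responses, cutoff=2.0):
    """True when the mean item score is at or above the cut-off."""
    return mean_item_score(responses) >= cutoff


# Hypothetical respondent, not data from the study.
example = [2, 3, 1, 2, None, 3, 2, 1, 2, 3]
print(mean_item_score(example), is_screen_positive(example))
```

Changing the `cutoff` argument shows how the same respondent can be classified differently under the different cut-off values used in the literature.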
The Harvard Trauma Questionnaire (HTQ; Mollica et al., 1992) is a comprehensive screening instrument that was developed to assess potentially traumatic experiences and posttraumatic symptoms in various cultural contexts. Its psychometric properties were first established in a highly traumatized clinical population, but the instrument has also been used in larger community samples and with asylum-seeking adolescents (Hodes et al., 2008; Jones & Kafetsios, 2005).
Different cut-off values have been used in different studies (Jakobsen et al., 2011), but a cut-off value of ≥ 2.0 has been recommended to signal possible PTSD in a large nonclinical study (Silove et al., 2007).
The HTQ Part IV comprises 30 symptom items, of which the first 16 items constitute the Posttraumatic Symptom Scale (PTSS), a measure of symptoms of PTSD according to the DSM-IV (American Psychiatric Association [APA], 1994). Each symptom was related to the previous week’s experiences and rated using a 4-point Likert scale ranging from 1 (not at all) to 4 (extremely). The HTQ versions were obtained from earlier studies (Jakobsen et al., 2011; Kleijn, Hovens, & Rodenburg, 2001). All instruments had been translated by certified translators and reviewed by other professional interpreters. Cronbach’s alpha for the PTSS in these earlier samples varied from under .80 to .92. The criterion-validity results in those studies were mixed, as the Somali group showed a very weak concordance between screening cases and diagnosed PTSD, probably because of weak reading skills.
The chosen screening instruments were combined into a single questionnaire using the program MultiCASI (Knaevelsrud & Müller, 2008). The items appeared one after the other on the screen along with alternative responses. The participants received the questionnaires in their native languages by using laptops with touch-screen function. All text had a sound-file connected to it that started as soon as the item appeared on the screen and the sound of each item could be activated by touch as many times as necessary. The test could be completed without any previous reading experience. Items could be skipped and left unanswered, but would then be repeated once more towards the end of the questionnaire. The first introduction of the computer-based self-screening was done shortly after arrival with one language group at a time. An interpreter was present with a maximum of three participants as they were instructed in how to use the touch screen. They were encouraged to ask clarifying questions as they answered the items, all in the same room, with earphones on, in order not to disturb each other. The results were transferred digitally to SPSS files.
The Composite International Diagnostic Interview (CIDI) is a structured diagnostic interview that was developed by the World Health Organization in collaboration with the U.S. Mental Health Administration Task Force. Previous research has documented the reliability and validity of the interview (Wittchen et al., 1991). In this study, each person was interviewed using the modules for depression, anxiety, and PTSD in a fixed sequence. The category depression comprised the DSM-IV diagnoses: major depressive disorder, dysthymic disorder, and mood disorder with depressive features due to general medical condition. The category of anxiety comprised the DSM-IV diagnoses: agoraphobia, generalized anxiety disorder, social anxiety disorder, or panic disorder. The numbers and percentages are published elsewhere (Jakobsen, Meyer DeMott, & Heir, 2014).
The CIDI was administered 4 months after arrival in Norway, when all participants had prior experience of the screening procedure and could manage the computers without the aid of an interpreter. The CIDI interviews were performed by health professionals who were trained and certified in the use of the CIDI, and interpreters were present, either in person or by telephone, during the whole interview. In cases of doubt, the professionals discussed individual cases until consensus was reached.
Results
All participants completed the touch-screen assessment. Fewer than 1% of the items were left unanswered. The population consisted of 97 (60.6%) participants who were illiterate and 59 (36.9%) participants with sufficient reading abilities according to self-evaluation (missing/literacy not answered: 4 [2.5%]). Calculations were done separately for these groups in order to investigate possible differences in internal consistency of the scales between the participants who only had an auditory understanding of the items and the participants who could both read and listen during the assessment.
Internal reliability assessed by Cronbach’s alpha for the HSCL-25 was .94 and for the PTSS-16 was .89, in the whole study population. Data from the literate group (n = 59) yielded a Cronbach’s alpha of .96 for the HSCL-25 and .91 for the PTSS-16. Comparative data from the illiterate group (n = 97) were: HSCL-25, .93 and PTSS-16, .88.
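For readers who wish to reproduce this kind of reliability estimate, the following is a minimal Python sketch of the standard formula for Cronbach's alpha, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the small data set are hypothetical and are not taken from the study, which ran its analyses on SPSS files; the sketch also assumes complete responses for every participant.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances (n - 1 denominator) for items and total scores.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in rows):
        raise ValueError("all respondents must answer the same items")

    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from four participants on two items.
scores = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(round(cronbach_alpha(scores), 2))
```

Running the same function separately on the rows of a literate and an illiterate subgroup would mirror the subgroup comparison reported above.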
Validation
Self-report measurements, using a cut-off value of ≥ 2.0 for all three screening instruments, identified anxiety in 80 (51.0%) cases and depression in 80 (51.0%) cases according to the respective parts of HSCL-25. Posttraumatic stress was estimated in 97 (60.2%) of the cases, according to PTSS-16. The CIDI interviews, however, found a considerably lower rate of psychiatric morbidity (Jakobsen et al., 2014). The most prevalent diagnosis was posttraumatic stress disorder (PTSD), which was present in 48 (30.6%) of the cases. Depression was diagnosed in 26 (16.3%) and anxiety disorder in 13 (8.1%) of the cases.
Agreement between screening instruments and CIDI diagnosis of depression, anxiety, and PTSD in unaccompanied refugee minors (N = 160).
Note. CIDI: Composite International Diagnostic Interview; HSCL 15: Hopkins Symptom Checklist (only the 15 items representing depressive symptoms); HSCL 10: Hopkins Symptom Checklist (only the 10 items representing anxiety symptoms); PTSS 16: Posttraumatic Stress Scale, identical with the first 16 items from Harvard Trauma Questionnaire, Part IV; PPV: positive predictive value is the proportion of patients with positive test results who are correctly diagnosed; NPV: negative predictive value is the proportion of patients with negative test results who are correctly diagnosed.
Agreement between screening instruments and CIDI in subsamples of unaccompanied refugee minors.
Note. CIDI: Composite International Diagnostic Interview; HSCL 15: Hopkins Symptom Checklist (only the 15 items representing depressive symptoms); HSCL 10: Hopkins Symptom Checklist (only the 10 items representing anxiety symptoms); PTSS 16: Posttraumatic Stress Scale, identical with the first 16 items from Harvard Trauma Questionnaire, Part IV; PPV: positive predictive value is the proportion of patients with positive test results who are correctly diagnosed; NPV: negative predictive value is the proportion of patients with negative test results who are correctly diagnosed.
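The agreement measures named in the table notes can be computed from paired screening and interview outcomes. The Python sketch below is illustrative only: the function name and the example data are hypothetical, and the definitions used are the conventional ones (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), PPV = TP / (TP + FP), NPV = TN / (TN + FN)).

```python
def screening_metrics(screen_positive, diagnosed):
    """Sensitivity, specificity, PPV and NPV from paired boolean outcomes."""
    if len(screen_positive) != len(diagnosed):
        raise ValueError("inputs must have the same length")

    tp = fp = fn = tn = 0
    for screened, has_diagnosis in zip(screen_positive, diagnosed):
        if screened and has_diagnosis:
            tp += 1  # screen positive, diagnosis confirmed
        elif screened:
            fp += 1  # screen positive, no diagnosis
        elif has_diagnosis:
            fn += 1  # screen negative, diagnosis present
        else:
            tn += 1  # screen negative, no diagnosis

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical outcomes for eight participants, not data from the study.
screen = [True, True, True, False, False, False, True, False]
interview = [True, True, False, False, False, True, False, False]
print(screening_metrics(screen, interview))
```

Because PPV and NPV depend on how common the diagnosis is in the sample, the same instrument can show different predictive values in clinical and nonclinical populations.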
The prevalence of depressive disorder according to the CIDI interviews was 16.3%. According to the ROC curve, the best fit using the 15 items for depression was also achieved with a cut-off value of 2.17. This gave a sensitivity of .71 and a specificity of .66. ROC area for depression was .75. Overall diagnostic efficiency for depression was .57.
The prevalence of PTSD according to the CIDI interviews was 30.6%. For the PTSS-16 scale, a best fit was achieved with a cut-off value of 2.23. This score gave a sensitivity of .80 and a specificity of .64. The ROC area for PTSD was .75. Overall diagnostic efficiency for PTSD was .59.
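A cut-off search of this kind can be sketched in Python as follows. The text does not state which criterion defined the "best fit"; the sketch assumes Youden's index (sensitivity + specificity − 1), which is one common choice and may differ from the procedure actually used. The ROC area is computed as the proportion of case and non-case pairs in which the case has the higher score (ties count one half). All names and data are hypothetical.

```python
def sensitivity_specificity(scores, diagnosed, cutoff):
    """Sensitivity and specificity when scores >= cutoff count as positive."""
    tp = fn = tn = fp = 0
    for score, has_diagnosis in zip(scores, diagnosed):
        positive = score >= cutoff
        if has_diagnosis and positive:
            tp += 1
        elif has_diagnosis:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, diagnosed):
    """Observed score that maximizes Youden's index (assumed criterion).

    Returns (cutoff, youden_index, sensitivity, specificity); on ties the
    lowest cut-off is kept.
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, diagnosed, cutoff)
        youden = sens + spec - 1
        if best is None or youden > best[1]:
            best = (cutoff, youden, sens, spec)
    return best


def roc_area(scores, diagnosed):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [s for s, d in zip(scores, diagnosed) if d]
    controls = [s for s, d in zip(scores, diagnosed) if not d]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical mean scale scores and interview diagnoses.
scale_scores = [1.2, 1.5, 1.8, 2.0, 2.1, 2.3, 2.6, 3.0]
has_ptsd = [False, False, False, True, False, True, True, True]
print(best_cutoff(scale_scores, has_ptsd), roc_area(scale_scores, has_ptsd))
```

Other criteria, such as minimizing the distance to the top-left corner of the ROC plot or weighting sensitivity more heavily for screening purposes, can select a different cut-off from the same data.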
Discussion
In this study, we found that young asylum-seeking participants were able to complete the computer-based assessment by themselves, regardless of how they rated their own reading and writing abilities or how many years of formal school they had. There were almost no missing items and few complaints about the questionnaire being too long or too difficult to understand.
Overall, our experience with the HSCL-25 and PTSS, two widely known screening instruments, administered with the use of MultiCASI was encouraging insofar as the approach was easy to administer and cost-effective. Exporting results directly to the statistics program was easy and saved time. Cronbach’s alpha for the HSCL-25 was .94 for the whole group, which is similar to or better than that reported in earlier studies, and strengthens our impression that this testing method was well received by our population. The reliability tests focusing on results from the subjects with low literacy gave very acceptable results and give further support to the usefulness of the MultiCASI technology.
The need for interpreters was minimal. We used interpreters at the introduction of the testing in order to make sure that the instructions were understood, but answering items at the time of the validation study was done privately because informants had been through the assessment procedure at least once and, regardless of reading abilities, were able to answer the questions without the aid of an interpreter. One implication of this is that comparison between tests at different points in time would be quite reliable, since the wording of each item would remain identical.
Inconsistent results in all areas of research may result from variations in sampling or procedures. Misunderstandings and other technical difficulties can also introduce systematic error. Since so many studies rely on “written” self-rating, there is reason to believe that the subjects who participate in self-report studies are either among those confident in their own reading and writing abilities or have been given a lot of guidance in the process. While this can be viewed as a reasonable strategy for minimizing errors in assessment, it can also introduce new inconsistencies. The validity of ad hoc translations during testing is uncertain, since the wording may vary from one setting to another, while the researcher, who usually does not know the language, has no ability to identify and correct the variation. It is difficult to know what impact this problem may have had on the results of earlier studies. In the worst case scenario, misunderstandings may lead to unreliable results.
Limitations
The study sample was recruited from adolescent asylum seekers who arrived in Norway over a period of 2 years. Refugee areas of origin vary over time and thus, our results may not be representative for populations of refugees from other parts of the world. Also, the specific cut-off values were derived from the best fit for this nonclinical sample and may not be appropriate in another situation such as a clinical setting.
The strength of this study is the use of structured diagnostic interviews with trained health professionals and the unlimited use of time and interpreters. The recruitment procedures provided a representative sample of refugee minors from the chosen countries. We employed widely used translated instruments from different sources, for which the recommended procedures for cross-cultural adaptation had not necessarily been documented. This makes it easier to compare our results with earlier studies, but may also explain some of the weak agreement between screening instruments and diagnoses.
The self-report instruments were completed by the informants themselves, regardless of literacy, with the MultiCASI method. There was no need to exclude subjects with low literacy or treat them differently than other participants. We believe that this improved the validity of the testing procedure.
The life situation of the participants together with the distance, both physically and culturally, from their countries of origin, gave us limited access to objective data concerning their backgrounds. The informants themselves were the only source of information, and the clinical assessments may have been influenced by inaccurate data. However, the health professionals performing the CIDI interviews were all experienced clinicians who had worked extensively with refugees using interpreters and this likely improved the validity of the diagnostic procedures.
Conclusion
Refugee adolescents are known to encounter numerous risk factors that can cause psychological distress, including exposure to violence and multiple losses. In this study we compared the results from computer-based psychiatric symptom screening with structured diagnostic interviews in a nonclinical sample of adolescents. The diagnostic precision of the instruments, using a cut-off value of 2.0, was comparable to other studies based on paper-and-pencil screening. Results from the Somali group were even more precise. We also found that raising the cut-off value improved the diagnostic precision in this sample. In this respect, the participants resembled a clinical sample, more than a nonclinical sample (Mollica et al., 1992).
Using the same screening instruments for research across populations and societies can facilitate comparison. At the same time, if clinicians and researchers around the world are to assess their populations and share the results, they need to know that the available instruments are evidence-based, and validated. A precise understanding of the language used for assessment is a basic condition for high-quality research. Validation of research measures is an ongoing process, and our experience suggests that computer-based testing, with touch screen and sound-files, can be an important methodological step forward.
Footnotes
Acknowledgements
A special thanks to Liv Berit Løken for her care and assistance with all aspects of the data collection, Lars Erik Eide Johansen for his assistance with the diagnostic interviews, to our very skilled interpreters, and to all the young participants.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was partially funded by the Norwegian Directorate of Immigration.
