Abstract
Objective:
The objectives of this study were to examine how often clinicians judged youths or caregivers to not be credible informants, to identify the associated features of youth or caregiver credibility, and to examine credibility's impact on the validity of mood and behavior checklists.
Background:
Clinicians often have the experience of talking to a parent or a youth and judging that the credibility of the information offered is unusually poor. Little is known about the correlates of poor credibility or about the extent to which credibility changes the validity of commonly used checklists.
Methods:
Interviewers rated the credibility of 646 youths aged 5–18 and their primary caregivers after completing a Kiddie Schedule for Affective Disorders and Schizophrenia. Ratings and diagnoses were blind to the behavior checklists completed by caregivers, youths, and teachers. A subset of youths also had intelligence quotient tests and behavioral observations available.
Results:
Caregivers were perceived as more credible on average than youths, though this difference narrowed sharply for adolescents. Caregiver credibility was higher for better functioning families, more credible youths, younger youths, and more educated caregivers; it was unrelated to caregiver mood symptoms or being the mother. Youth credibility was strongly connected to age, cognitive ability, caregiver credibility, and independent observations of youth behavior. Credibility ratings markedly altered the validity of checklists compared with interview ratings, diagnoses, or cross-informant criteria.
Conclusion:
Clinicians' judgments about informant credibility are associated with different characteristics for youths versus caregivers, though youth age is important to both. Credibility affects the validity of information from checklists measured against several different independent criteria.
Introduction
To the clinician, perceptions of credibility are not so much about the “average” parent or the “typical” youth; they are observations about an individual person. These judgments sometimes are based on a single, decisive piece of information, such as when a person appears intoxicated during the interview, and other times, the perception about credibility is based on a constellation of factors. To date, little or no work on cross-informant agreement has examined the extent to which clinicians' judgments about informant credibility might be associated with the validity of information derived from mood and behavior checklists. Two dominant approaches have been used to conceptualize differences in reported problems across informants: situational specificity and picking the best average informant (De Los Reyes and Kazdin 2005; De Los Reyes 2011). Both are described briefly, along with reasons why neither is fully satisfactory in clinical practice.
The “situational specificity” hypothesis has argued that each person is providing accurate information about behaviors that are specific to different situations (Achenbach 1995; Kraemer et al. 2003; De Los Reyes and Kazdin 2005). For decades, it has been well established that caregivers, youths, and teachers agree with each other at only modest levels when describing youth mood and behavior (Achenbach et al. 1987; Achenbach and Rescorla 2001; Reynolds and Kamphaus 2004). Information provided by each person typically meets high standards for internal consistency reliability, retest stability, and various aspects of validity (Achenbach and Rescorla 2001)—yet agreement remains modest. Teachers may provide accurate descriptions of behavior in the classroom, which could be different from parents' accurate description of behaviors at home, for example (Kraemer et al. 2003; Hudziak et al. 2005). Situational specificity is a widely accepted model (Sherman et al. 2010). Recent observational work in both laboratory and clinic settings supports it (De Los Reyes et al. 2009, Hartley et al. 2011), and the situational specificity model supports the common recommendation for clinicians to gather information from multiple sources (Mash and Hunsley 2005). However, this model creates challenges for clinicians when the information from different informants appears to disagree—especially when the options being considered for intervention are more global and cannot be adjusted for particular situations. A clinician choosing whether or not to prescribe a medication cannot specify that the atypical antipsychotic or mood stabilizer only affects the person at school, for example. The decision whether or not to medicate or initiate other forms of treatment becomes complicated when one person appears to deny a problem that another person reports is present (see Carlson and Blader, 2011).
The second approach to resolving disagreement has relied on identifying which is the most valid source of information for a particular diagnosis or problem, on average. When one source of information is clearly more valid than another, then the clinician can choose to ignore the less valid piece of information or even streamline the assessment process by not gathering information with lower validity. In fact, it is possible to dilute the validity of the information gathered by adding new results or scores with lower validity, yielding more expensive yet less accurate clinical decisions (Kraemer 1992). The “pick the best” strategy has led to discounting self-report by youths as a source of information about attention problems (Jensen et al. 1999) or giving greater credence to self-report about internalizing problems (Loeber et al. 1989). Conversely, the weight of evidence suggests the opposite about teacher report: teacher checklist ratings tend to be valid about attention problems (Barkley 1998; DuPaul et al. 1998) and much less so about internalizing disorders (Epkins 1995; Hazell et al. 1999; Youngstrom et al. 2004b; Youngstrom et al. 2008).
Problems with the “pick the best” approach include the fact that problems only reported by either the parent or the youth are still often associated with considerable impairment (Bird et al. 1992; Jensen et al. 1999; Youngstrom et al. 2003), as well as the possibility that there might be incremental value in combining different perspectives on youth functioning (e.g., Carlson and Youngstrom 2003). Differences between informants may also be more nuanced, playing out at the level of differing interpretations of specific items or behaviors rather than entire scales or domains (e.g., Freeman et al., 2011). A pragmatic concern is that the “best” informant based on the literature may not be available in practice. Much more research has focused on the validity of mothers as informants about child behavior as opposed to fathers or other caregivers (cf. Phares 1992, 1997). How should a clinician proceed when the youth in question is young and there is no mother involved, or the parent's perspective is clearly compromised by psychological factors or extrinsic considerations such as legal or custody proceedings?
A fundamental limitation of both the “situational specificity” and the “pick the best” algorithms is that they focus on group data, evaluating typical or average validity. These summary statistics are important starting points, but there may be huge individual differences hidden within the general trends. For example, a particularly insightful youth might have a clear and informative perspective about their own attention problems, whereas a caregiver who is dealing with serious impairments, or a foster parent who has only known the youth for a few months, will have much less basis for their perceptions—and correspondingly lower validity than usual.
Is there a way to capitalize on clinical judgments about the credibility of individual informants that supports diagnostic formulation and treatment planning? Can we do better than current strategies of saying that disagreement is due to situationally specific changes in behavior or picking a single best informant for each target issue? How valid would global judgments about credibility be? Would accounting for credibility lead to appreciable changes in the reliability or validity of information collected from each person?
The goals of the present study include (a) examining the relative frequency with which clinicians judged the information they received from youths and caregivers during an interview to be credible, (b) identifying factors associated with credibility of youth or caregiver report, and (c) examining whether judgments of credibility were associated with significant changes in the diagnostic or cross-informant validity of information from mood and behavior checklists.
Methods
Procedure
All procedures were reviewed and approved by the institutional review boards of University Hospitals of Cleveland, Applewood Centers, and the University of North Carolina at Chapel Hill. Participants were recruited from a consecutive case series seeking outpatient evaluation at either the largest community mental health center providing services to children and families in the state of Ohio or a neighboring academic medical center. A total of 646 youths ranging from age 5 to 18 years presented for an outpatient evaluation with their primary caregiver and completed a semistructured diagnostic interview with highly trained raters. The interviewer rated the credibility of information received from the youth and caregiver as “poor,” “fair,” or “good” after talking with each informant. Caregivers also completed rating scales about their youth's mood symptoms. Youths aged 11 and older completed the same rating scales about themselves, and teachers completed checklists about a subset of participants. Caregivers also reported their own current mood symptoms on two questionnaires. Diagnoses and credibility ratings were made blind to the scores on any mood and behavior checklists, which were gathered by a second research assistant. The caregiver completed the interview first when youths were younger than age 11. When the youth was older, the family was given their preference about interview order; 90% of families elected to have the caregiver complete the interview first.
Measures
Youth diagnoses and mood severity
A consensus meeting assigned youth diagnoses by reviewing the results of a Kiddie Schedule for Affective Disorders and Schizophrenia (KSADS) interview conducted by a highly trained rater (κ>0.85 at the item level for each of five training and five certification interviews) using the KSADS-Present and Lifetime version (Kaufman et al. 1997) with the Washington University mood disorders modules (Geller et al. 2001). The same interview provided the basis for scoring the severity of the youth's manic and depressive symptoms (Axelson et al. 2003). The interviewer met with the caregiver and the youth sequentially and reinterviewed each as necessary to use clinical judgment to resolve reporting discrepancies. A licensed clinical psychologist reviewed the KSADS findings in person with the interviewer and synthesized them with additional information about developmental history, treatment history, and family psychiatric history (Spitzer 1983). Diagnoses followed strict Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Text Revision (DSM-IV-TR), criteria (American Psychiatric Association 2000).
Credibility ratings
At the end of the interview day, after the KSADS interviews were complete, the interviewer also rated the credibility of the caregiver and the youth. The instructions were “Reliability of information: 2=good, 1=fair, 0=poor.” Credibility scores were a global, subjective rating based on the clinical judgment of the interviewer.
Family characteristics
Age, race, and gender were primary demographic characteristics. The caregiver's relationship to the youth, their education level, their self-reported income, and their occupational status, along with the number of children in the home comprised the family-level demographic variables of interest for this study. Two additional sources of information rated the overall functioning of the family: the KSADS interviewer completed the Global Family Environment Scale (GFES; Rey et al. 1997), scoring each family on a scale from 1 to 90, with higher scores indicating better functioning. The primary caregiver completed the Family Assessment Device (FAD) (Byles et al. 1988) as part of the questionnaire packet. FAD total score provided a measure of overall discord in the family, combining information about poor communication, problem solving, and general functioning, with alpha=0.91 in this sample.
Youth behavior problems
Caregivers completed the 2001 version of the Child Behavior Checklist (CBCL; Achenbach and Rescorla 2001). Youths aged 11 years and older (n=349) also completed the self-report version of the same scale, and a subset of teachers (n=249) also completed the Teacher Report Form. Teacher data were not gathered during the summer or first 6 weeks of the new school year, even though families continued to enroll in the project, because teachers were unavailable when schools were closed and less familiar with the youth's behavior at the beginning of the academic year. The present analyses focused on the Externalizing and Internalizing Problems Broad-Band T-scores and the Attention Problems Clinical Scale T-score, as these are the three domains that have received the most study in the cross-informant research literature.
Youth mood ratings
Primary caregivers also completed the Parent General Behavior Inventory (PGBI; Youngstrom et al. 2001) as a rating of the youth's depressed, hypomanic, and mixed mood symptoms. Youths 11 years and older also completed the self-report version (adolescent GBI), which has shown good psychometric properties within this age group (Danielson et al. 2003). Present analyses concentrated on the depression and hypomanic/biphasic scores, each with alpha >0.94 in this sample.
Caregiver mood ratings
Primary caregivers also completed a Beck Depression Inventory (Beck and Steer 1987) and a Mood Disorder Questionnaire (Hirschfeld et al. 2000) as measures of their own depressive and hypomanic or manic symptoms.
Youth cognitive ability
A subset of youths (n=127) completed cognitive ability testing as part of their clinical evaluation and had a global score available as part of the review of the treatment history. Present analyses used the global score, scaled with M=100 and SD=15. For 79% of cases, scores were based on the Wechsler Abbreviated Scale of Intelligence (The Psychological Corporation 1999), 12% based on the Peabody Picture Vocabulary Test–Third Edition (Dunn and Dunn 1997), and 9% based on the Kaufman Brief Intelligence Test–2nd Edition (Kaufman and Kaufman 2004).
Observational ratings of youth behavior
During the final year of data collection, an IRB-approved modification added observational ratings of the youth behavior, using the Guide to the Assessment of Test Session Behavior (GATSB) (Glutting and Oakland 1993). The GATSB is an age-normed observational system where the rater scores 29 behaviors on a 0- to 2-point scale. The GATSB generates three scores: Inattention, Uncooperative Mood, and Avoidant Behavior, as well as a Total Problems score. At the end of the interview day, the KSADS interviewer completed the GATSB to describe the youth's behavior. In addition, the second research assistant who supervised the youth while the caregiver was completing the KSADS interview also filled out a second, independent GATSB.
Analytic plan
Descriptive statistics examined rates of caregiver and youth credibility, and kappa quantified whether youth credibility was associated with caregiver credibility. Correlations measured the extent to which credibility was associated with demographic variables, youth diagnoses, or parent mood symptoms. Multiple regressions using credibility as the dependent variable examined what factors made unique contributions to credibility. Three sets of analyses tested the effect of credibility on the reliability or validity of information from the caregiver or youth by stratifying on credibility: (a) Feldt's (1969) procedure tested whether the internal consistency reliability changed between different levels of credibility; (b) z-tests of independent r values (Cohen and Cohen 1983) tested potential differences in validity coefficients comparing mood and behavior checklist scores to diagnoses and interview-based mood ratings; and (c) Hanley and McNeil's (1983) procedure for comparing Receiver Operating Characteristic (ROC) analyses tested whether the diagnostic discriminative validity changed significantly between good, fair, and poor credibility informants. All analyses report uncorrected p values; p values denoted by two or more asterisks signify p<0.005 and would survive even Bonferroni correction for the “study-wise” rate of all significance tests run in the course of this investigation.
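As an illustration of step (b), the comparison of two correlations from independent groups can be sketched with Fisher's r-to-z transformation. This is a minimal sketch under stated assumptions, not the authors' code: the function name `compare_independent_r` is hypothetical, and the group sizes in the usage example are rough guesses derived from the reported percentages of poor and good credibility caregivers, not sample sizes reported for any specific analysis.

```python
import math


def compare_independent_r(r1, n1, r2, n2):
    """z-test for the difference between two correlations from independent samples.

    Returns the z statistic and a two-tailed p value (normal approximation).
    """
    z1 = math.atanh(r1)  # Fisher r-to-z transformation
    z2 = math.atanh(r2)
    # Standard error of the difference between two independent Fisher z values
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical usage: validity coefficients of 0.27 versus 0.50, with assumed
# group sizes of 39 and 407 (illustrative approximations only).
z, p = compare_independent_r(0.27, 39, 0.50, 407)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```

Because the standard error depends on 1/(n−3) in each group, a small stratum dominates the uncertainty, so even sizable differences between coefficients can fail to reach significance when few informants fall in one credibility level.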
Results
Description of participants
Table 1 provides information about demographics, youth diagnoses, and study scale descriptives.
Intelligence Quotient score=full scale score, n=116.
Guide to the Assessment of Test Session Behavior, n=109.
Teacher Report Form, n=249.
* p<0.05, ** p<0.005, *** p<0.0005, **** p<0.00005, two tailed.
M = mean; SD = standard deviation; FAD = family assessment device; PGBI = Parent General Behavior Inventory; AGBI = Adolescent General Behavior Inventory; KSADS = Kiddie Schedule for Affective Disorders and Schizophrenia; ADHD = attention-deficit/hyperactivity disorder.
Perceived credibility ratings
At the end of the KSADS, interviewers rated 63% of caregivers “good” credibility, 31% “fair,” and 6% “poor” versus 24% of youths “good,” 47% “fair,” and 30% “poor.” Credible youths tended to have credible caregivers: chi-squared (4 degrees of freedom)=31.69, p<0.00005; but there were still frequent mismatches: kappa=0.11. Interviewers perceived caregivers as much more credible on average for the young children (ages 5 to 10 years): Cohen's d=1.29, p<0.00005; in contrast, caregivers were only slightly more credible on average for the older youths: d=0.28, p<0.0005.
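The combination of a highly significant chi-squared test and a low kappa can be made concrete with a small sketch. The code below computes both statistics from a 3×3 cross-classification of caregiver by youth credibility. The cell counts are invented for illustration (only the marginal percentages loosely follow those reported above), and the function names are hypothetical, so the printed values are not expected to reproduce the reported statistics.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df


def cohen_kappa(table):
    """Cohen's kappa for a square table (same categories on rows and columns)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    p_observed = sum(table[i][i] for i in range(k)) / n
    p_expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: rows = caregiver poor/fair/good, columns = youth poor/fair/good.
counts = [
    [20, 15, 5],
    [70, 95, 35],
    [105, 195, 106],
]
stat, df = chi_squared(counts)
print(f"chi-squared({df}) = {stat:.2f}, kappa = {cohen_kappa(counts):.2f}")
```

Chi-squared is sensitive to any departure from independence, whereas kappa only credits exact matches beyond chance, which is why an association can be reliable yet leave frequent mismatches between youth and caregiver ratings.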
Correlates of caregiver credibility
Table 1 provides correlations between caregiver credibility and family, youth, and caregiver characteristics. All potential correlates showed distributions with acceptable skew and kurtosis values that fell within the range where Pearson correlation and multiple regression values tend to be robust, and there were no substantive outliers based on standard regression diagnostics (Tabachnick and Fidell 2007). Significant correlates of caregiver credibility included younger youth age, better family functioning (measured as either the interview-rated GFES or the caregiver-rated FAD Total), better youth functioning (based on the interview global assessment of functioning), higher caregiver income or education, fewer children, and lower concerns about youth depression or manic symptoms. Regression analyses indicated that a combination of factors could explain 17% of the variance in caregiver credibility (p<0.00005), with family functioning (r part=0.26), youth age (r part=−0.21), credibility of the youth (r part=0.18), and caregiver education (r part=0.09) making significant unique contributions. Caregiver credibility was unrelated to caregiver mood symptoms, youth cognitive ability, or independent observations of youth behavior problems.
Correlates of youth credibility
Table 1 also provides correlations for youth credibility ratings with family, youth, and caregiver characteristics, along with tests of whether the correlations significantly differ between youth versus caregiver credibility. Significant correlates of youth credibility were mostly different from the predictors of caregiver credibility. Older youth age, female youth, not having a diagnosis of ADHD or bipolar disorder, lower CBCL Externalizing or Attention Problems, higher caregiver education, having a male primary caregiver, lower caregiver report of manic symptoms, and higher self-report of manic or depressive symptoms all were significantly associated with greater levels of youth credibility. The correlation with self-reported manic symptoms was small and suggests that the subset of youths with insight into their behavior were perceived as slightly more credible. A subset of cases also completed a brief intelligence test and had observational ratings of behavior available. Greater youth credibility was strongly associated with higher cognitive ability (r=0.40) and fewer behavior problems during the KSADS interview (r=−0.25) or when watched by a different person while the caregiver was completing the KSADS (r=−0.38). Regression analyses indicated that a combination of factors could account for 22% of the variance in youth credibility (p<0.0005), with age being the strongest predictor. Controlling for youth age eliminated all other correlates except for caregiver credibility (r part=0.18), with age remaining a powerful predictor (r part=0.39). For the subset with cognitive ability and behavioral observations available, the regression explained a similar amount of variance, with age, cognitive ability, and observational ratings of behavior each making unique contributions, but caregiver credibility was no longer significant.
Comparing the correlates of youth credibility to those of caregiver credibility found that youth credibility was significantly more linked to youth cognitive ability, youth behavior during the interview, and youth diagnoses of ADHD or mood problems, whereas caregiver credibility was significantly more associated with family functioning or socioeconomic status.
Effect of credibility on reliability
Table 2 presents the internal consistency estimates for the GBI scales reported by the parent and youth. Cronbach's alpha was significantly higher (p<0.05 based on Feldt's test) for the poor credibility caregivers on the depression scale compared with both the fair and good credibility caregivers, and there was a similar trend for poor versus good credibility on the hypomanic/biphasic scale. This suggests that poor credibility informants may have answered with a response set rather than reflecting on the content of each item. Consistent with this possibility, caregiver report on the FAD showed a significant pattern in the opposite direction, with lower internal consistency for the poor credibility caregivers (p<0.05). The FAD includes 10 items that are reverse keyed, so selecting the same response option for all items would lower the reliability estimate for the FAD, whereas it would raise reliability on scales such as the GBI that do not use any reverse keying. Similarly, there was a tendency for the interview-based ratings of the severity of mood symptoms to be more internally consistent when interviewing good credibility rather than poor credibility informants (p=0.0903 on the KSADS Mania Rating Scale [KMRS] and 0.1401 on the KSADS Depression Rating Scale [KDRS]). There were no trends for the association between youth credibility and internal consistency of youth report on the GBI or between youth credibility and interview ratings of mood (all p>0.25).
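The reverse-keying argument above can be illustrated with a toy calculation of Cronbach's alpha. The sketch below is not drawn from the study data: it assumes six hypothetical respondents who each choose the same response option for every item of a hypothetical six-item scale scored 0 to 3, and it compares a scale with no reverse-keyed items to one in which two items are reverse keyed. The function names and the scale layout are assumptions made for illustration.

```python
def sample_variance(values):
    """Unbiased sample variance (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(rows):
    """Cronbach's alpha; rows are respondents, columns are scored items."""
    k = len(rows[0])
    item_vars = [sample_variance([row[j] for row in rows]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


MAX_SCORE = 3                      # hypothetical 0-3 response format
chosen_option = [0, 1, 2, 3, 1, 2]  # each respondent marks one option throughout

# Scale with no reverse keying: every item score equals the chosen option.
plain = [[c] * 6 for c in chosen_option]

# Scale with two reverse-keyed items: those items are rescored as MAX_SCORE - c.
reverse_keyed = [[c] * 4 + [MAX_SCORE - c] * 2 for c in chosen_option]

alpha_plain = cronbach_alpha(plain)
alpha_reversed = cronbach_alpha(reverse_keyed)
print(f"alpha without reverse keying: {alpha_plain:.2f}")
print(f"alpha with reverse keying:    {alpha_reversed:.2f}")
```

Under this extreme response set, alpha reaches its ceiling on the scale without reverse keying and falls sharply once some items are rescored in the opposite direction, which mirrors the opposite patterns described for the GBI and the FAD. Real response sets are rarely this pure, so the actual shifts would be smaller.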
Effect of credibility on criterion and discriminative validity
Ratings of caregiver credibility were related to the validity of caregiver report on mood and behavior checklists. Criterion validity coefficients for caregiver-reported manic symptoms changed from 0.27 for poor credibility to 0.50 for good credibility when comparing PGBI to KMRS ratings and from 0.52 to 0.64 for PGBI compared with KDRS ratings; however, these did not achieve statistical significance because of the small number of poor credibility caregivers. Caregiver–youth and caregiver–teacher correlations all significantly increased when comparing good credibility to poor credibility caregivers, and this pattern was found across ratings of externalizing, internalizing, and attention problems as well as for manic and depressive symptoms (all p<0.05) (cf. Achenbach et al. 1987). With regard to discriminative validity, areas under the curve in ROC analyses changed from 0.63 (nonsignificant) to 0.81 (p<0.0005) for poor versus good credibility when comparing PGBI scores to bipolar diagnoses.
Similar patterns were observed with youth credibility, with the criterion correlations for the GBI depression score rising from 0.27 to 0.49, and the hypomanic/biphasic score from 0.39 to 0.43 when comparing poor versus good credibility youths, with the ROCs against bipolar diagnoses being significant for the good but not poor credibility youths. These patterns were not due to changes in the internal consistency of checklist scores, as internal consistency either stayed the same or increased for the poor credibility informants.
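A comparison of areas under the curve between credibility strata can be sketched as follows. This is a simplified illustration rather than the authors' procedure: it uses a widely used closed-form approximation for the standard error of an AUC (commonly attributed to Hanley and McNeil) and then treats the poor and good credibility groups as independent samples. The numbers of cases with and without a bipolar diagnosis in each stratum are not given in the text, so the counts below are invented, and the function names are hypothetical.

```python
import math


def auc_standard_error(auc, n_pos, n_neg):
    """Approximate standard error of an AUC from the AUC and the group sizes."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def compare_independent_aucs(auc1, n_pos1, n_neg1, auc2, n_pos2, n_neg2):
    """z-test for the difference between AUCs estimated in two independent groups."""
    se1 = auc_standard_error(auc1, n_pos1, n_neg1)
    se2 = auc_standard_error(auc2, n_pos2, n_neg2)
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed, normal approximation
    return z, p


# Hypothetical counts: 8 bipolar and 31 non-bipolar cases among poor credibility
# caregivers; 80 bipolar and 327 non-bipolar cases among good credibility caregivers.
z, p = compare_independent_aucs(0.63, 8, 31, 0.81, 80, 327)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```

As with the correlation comparisons, the standard error for the smaller stratum is large, so the precision of any AUC comparison is limited mainly by how few informants are rated as having poor credibility.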
Discussion
At the end of a day-long interview, clinical interviewers judged caregivers to be credible informants more often than youths. The correlates of caregiver credibility and youth credibility each had face validity, but there was little overlap in the predictors. Caregivers were perceived as most credible when families were functioning better (consistent with Hawley and Weisz 2003), when they were reporting about younger children, when the youth was also perceived as credible, and when the caregiver was more educated. Caregiver mood symptoms, being the biological mother, and various other plausible correlates showed no significant association with caregiver credibility. The lack of relations between credibility and caregiver mood stands in contrast to prior findings that the validity of caregiver report might be attenuated by caregiver stress, although these effects have tended to be small (Richters 1992; Youngstrom et al. 2000). Perceived youth credibility markedly increased with youth age, with cognitive ability and independent observations of youth behavior problems explaining additional variance. If the youth met criteria for ADHD or a bipolar diagnosis, then their perceived credibility tended to be significantly lower.
Regression models using demographic and clinical features explained moderate amounts of variance with high degrees of statistical significance for both caregiver and youth credibility. These plausible and often substantial associations corroborate the validity of clinical judgments about credibility. However, the predictions fell far short of what would be needed to classify informants based on these variables instead of using clinical judgment to rate credibility directly. Similarly, the changes in validity of caregiver or youth-reported ratings, although often statistically significant, were never so large as to justify substantial changes in the interpretation of information, such as discounting or ignoring an informant entirely. Informants with poor credibility still usually provide ratings with some validity, albeit moderately less valid than corresponding reports from informants with good credibility. Ratings of manic symptoms in the youth varied from moderate to high validity, for example.
On the other hand, clinical interviewers appear to be able to integrate multiple pieces of information gathered during a semistructured interview to arrive at a valid decision about whether informants have good, fair, or poor credibility. Informants judged to have “good” credibility showed validity coefficients equal to or higher than the benchmarks reported in the literature for diagnostic efficiency and cross-informant agreement. Conversely, informants judged to have “poor” credibility demonstrated validity coefficients that were sometimes significantly lower than the credible informants in the same setting, as well as below published benchmarks. These relationships were significant despite safeguards such as blinded ratings by independent raters, and they provide strong evidence for the validity of clinical judgment about the credibility of informants.
Decreased reliability is a frequent cause of decreased validity in assessment, but in this case the reports from informants with poor credibility actually had the same or higher internal consistency. This suggests that informants were forming a response set and describing behavior problems in a global, uniform way, thus yielding highly consistent but less valid reports.
Limitations and future directions
Investigation of credibility was based on a secondary analysis of data originally gathered for other purposes. Although sample size remained considerable, a variety of measures that were helpful in examining correlates of credibility were only available on subsets of cases. There also are many factors that may have a large effect on rater credibility, but these were not directly measured in this study. Candidates include constructs such as social desirability, denial, malingering, and other factors that can systematically influence scores (Guion 1998).
The actual rating of credibility was also simplistic, asking the interviewer to make a global evaluation with only a few options. Many raters opted to use decimals to convey additional gradations of credibility, indicating that more nuanced perceptions could be quantified. It is also likely that more elaborated rating systems could focus attention on different aspects of credibility, providing more detail about facets such as demoralization, malingering, impression management, or lack of insight. Moving from global, unstructured impressions to more objectified, semistructured ratings often achieves enhanced reliability and validity (Anastasi 1988). This would be a promising area for future development of rating scales, as even simple global ratings demonstrated moderate criterion correlations and changes in the validity of information provided by caregivers and youths.
More work is needed to refine assessment approaches to consider the credibility of the informant, rather than always basing algorithms on the “average” caregiver or “average” youth. It remains to be determined whether clinical decision making would be better enhanced by determining when to discard information, because a particular informant seems to have poor credibility, versus keeping the information but adjusting the weight or interpretation. Future work should also investigate whether the effects of credibility are equally powerful across different diagnoses, different domains of functioning, and different rating scales. Present results suggest that the effects of caregiver versus youth credibility may not be the same for externalizing versus internalizing or attention problems, and similar issues may apply to conditions such as anxiety disorders (where self-report may often be highly accurate) (Frick et al. 1994) versus conduct disorder (with a high risk of denial in self-report) or psychosis (where insight may rapidly be compromised) (Pini et al. 2001; Youngstrom et al. 2004a). Similarly, some rating scales may be more susceptible to the effects of changing informant credibility, because of issues such as rater burden and reading level as well as perceived social desirability (Garb 1998). Differences in credibility are also likely to be associated with patterns of agreement between informants about the youths' functioning (De Los Reyes et al., 2011). The majority of participants came from low-income families and impoverished school systems, as reflected in the distribution of cognitive ability scores. It would be helpful to investigate correlates of credibility in samples with different demographics to understand the extent to which SES might moderate credibility and its associated features.
Conclusion
Overall, present findings indicate that not all caregivers, and not all youths, are seen as equally credible by clinicians, nor should they be. Clinical judgments about credibility showed plausible relationships with youth, caregiver, and family characteristics as well as measurable changes to the reliability and validity of information they provided on standard checklists. Clinicians appear to be able to gauge when information from a particular source may be suspect. The next wave of research should refine how to quantify judgments of credibility and develop evidence-based approaches for integrating these data into the assessment and decision-making process.
Clinical significance
No prior research has investigated whether clinical judgments about the credibility of adults or youths are related to the reliability or validity of the information they provide. Findings add to knowledge about factors associated with credibility and document the extent to which poor credibility is linked with reduced criterion validity. Clinicians will encounter caregivers or youths with compromised credibility because of various circumstances, and global clinical judgments about credibility are linked with changes in response set, shifts in degree of cross-informant agreement, and changes in the diagnostic validity of information received. However, even poor credibility did not totally invalidate the youth or caregiver report on any instrument. Clinicians should keep this in mind when integrating information from various informants. Completely discounting or ignoring information from a person judged to have poor credibility will overcorrect and often result in less accurate decisions. Until more precise and generalizable algorithms are developed, a reasonable clinical strategy would be to pay attention to credibility, but to make more fine-grained adjustments in interpretation rather than dropping a set of scores entirely when credibility is assessed as being poor. When credibility is only “fair,” reports may need to be taken with some degree of circumspection, and even “poor” credibility informants provided information that remained statistically valid.
Footnotes
Acknowledgments
The authors thank the families who participated in this research. This work was supported in part by NIH R01 MH066647 (PI: E. Youngstrom).
Disclosures
E. Youngstrom has received travel support from Bristol-Myers Squibb. Dr. Findling receives or has received research support, acted as a consultant and/or served on a speaker's bureau for Abbott, Addrenex, AstraZeneca, Biovail, Bristol-Myers Squibb, Forest, GlaxoSmithKline, Johnson & Johnson, KemPharm, Lilly, Lundbeck, Neuropharm, Novartis, Noven, Organon, Otsuka, Pfizer, Rhodes Pharmaceuticals, Sanofi-Aventis, Schering-Plough, Seaside Therapeutics, Sepracor, Shire, Solvay, Sunovion, Supernus Pharmaceuticals, Validus, and Wyeth. The other authors have no financial interests to disclose.
