Abstract
Discrepancies between our university training program’s report-writing guidelines and common practice in Manitoba could not be resolved by reference to the literature. To inform the discussion, we collected a sample of local, real-world school psychology reports and undertook a modified content analysis to operationally define and measure relevant variables. In this article we present our qualitative and quantitative findings on organization, readability, length, and the nature of recommendations, along with their implications for improving the extent to which school psychology reports incorporate an evidence base and contribute to real, beneficial, and demonstrable change in circumstances for children and families.
This mixed-methods study grew out of our school psychology program’s efforts to develop a set of criteria and standards (commonly called a rubric) for report writing based on the best available evidence in the literature, feedback from our students, and input from our school-based instructional partners. In keeping with this focus on evidence, we also wanted to understand which aspects of best practice rest on empirical study and which on expert consensus, and to develop a process for extending the evidence base to help resolve discrepancies between theory and practice. In developing the rubric, we surveyed the literature, published handbooks, and other training programs to determine what the field considers best practice.
Detailed reviews of the relevant literature are presented in the related articles in this series. In summary, the literature identifies several important variables. These include length (Donders, 2001; Groth-Marnat & Horvath, 2006; Horvath, Logan, Walker, & Juhasz, 2000; Sattler, 2008), reading level (Brenner, 2003; Groth-Marnat, 2009; Groth-Marnat & Horvath, 2006; Harvey, 1997, 2006; Sattler, 2008), organizational style (Ackerman, 2006; Bagnato, 1980; Pelco et al., 2009; Wiener, 1985, 1987; Wiener & Kohler, 1986), usefulness of recommendations (Bagnato, 1980; Harvey, 2006; Ownby, 1990; Wiener, 1987), degree of individualization, or consumer focus (Brenner, 2003), relevance to the referral question (Brown-Chidsey & Steege, 2005; Groth-Marnat, 2009; Ownby, 1990; Sattler, 2008; Schwean et al., 2006; Wiener, 1987), and the balance of strengths and weaknesses (Duckworth, Steen, & Seligman, 2005; Jimerson, Sharkey, Nyborg, & Furlong, 2004; Rashid & Ostermann, 2009; Rhee, Furlong, Turner, & Harari, 2001; Snyder, Ritschel, Rand, & Berg, 2006).
Our program’s guidelines were based on this body of knowledge and on the principles of evidence-based clinical practice (EBCP). EBCP has been defined as “the integration of best research evidence with clinical expertise and patient values” (Sackett, Straus, Richardson, Rosenberg, & Haynes, 2000, p. 1). In our model this translates to reports based on the basic scientific problem-solving model: observation, hypothesis generation, data collection, analysis, recommendation generation, intervention implementation, and further data collection (The Canadian Cochrane Network/Centre [CCN/C], 2003). The model also encourages an overall positive psychology/resilience orientation (Jimerson, Sharkey, Nyborg, & Furlong, 2004), cultural appropriateness (Canadian Psychological Association, 2007), and an individual education plan (IEP)-friendly format (D’Amato & Dean, 1987), particularly in the recommendations section. In developing the rubric from this model, we attempted to support every requirement with reference to rigorous science; however, in defining these criteria we found that in many cases we could not. For an evidence-based discipline such as school psychology (NASP, 2006), it seems important to address the concern that only some aspects of a literature-derived report-writing model rest on empirical examination, while many rest on expert consensus (Groth-Marnat & Horvath, 2006; Harvey, 1997, 2006; Pelco et al., 2009).
In our case, this lack of scientific support made it harder for both mentors and students to accept the model and rubric. Indeed, students and mentors were reluctant to implement even some empirically supported criteria that they deemed inappropriate or impractical. Of particular concern was the perception that the program’s guidelines were inconsistent with the expectations of the school psychologist mentors who provided the practicum experiences. Specifically, practitioners expressed concern that:
Referral sources do not provide information sufficient to meet the rubric criteria;
Reports based on the rubric criteria would be too long to be acceptable and useful;
Teachers would see the IEP-friendly format of the recommendations as prescriptive and objectionable;
The inclusion of strengths would compromise funding applications.
This type of “research to practice divergence” is well documented in the literature in general (Kratochwill & Steele Shernoff, 2004) and is longstanding with regard to the organization and substance of psychology reports (Appelbaum, 1970; Forer, 1959; Foster, 1951; Holzberg, Alessi, & Wexler, 1951; Lodge, 1953; Sargent, 1951; Tallent, 1958). Harvey (2006) examined the persistence of report-writing concerns and concluded that one major focus needs to be the training practices of graduate programs. Her argument is persuasive; training programs need to take it seriously and implement strategies to foster appropriate changes in practice. To that end, this study applies a collaborative action research approach to the issues identified by our students and training partners by undertaking a modified content analysis of current reports from our area.
Overview of the Study
Psychological research has focused primarily on quantitative methods over the past decade (e.g., Camic, Rhodes, & Yardley, 2003; Hayes, 1997). However, qualitative research is productive when researchers are exploring new areas of study (Fitzpatrick & Boulton, 1994) and seeking to understand and describe a phenomenon (Britten, Jones, Murphy, & Stacy, 1995). We based our methodology on a modification of the descriptive-qualitative approach described by Gilgun (2005) and on grounded theory (Charmaz, 2006). The qualitative approach taken in this study was designed to expand on and explicate the content and structure of typical reports written by local school psychologists in the Winnipeg, Manitoba, area.
We sent local school divisions that employ certified school psychologists an outline of the proposed research, together with a request for existing reports with identifying information removed. This began a discussion that resulted in an agreed process, the salient points of which are:
Reports were seen as the intellectual property of the authoring school psychologists and, consequently, reports would be included only with their agreement;
Reports were to be made anonymous with respect to all persons and agencies referenced;
No demographic information about the participating school psychologists was to be included in the study;
Results would be reported as a descriptive, formative review and would avoid any comparative, evaluative, or summative orientation.
The resulting conditions were not ideal from an experimental design perspective but were acceptable for an initial qualitative exploration within an action research frame of reference. The process of negotiating participation was itself instructive and contributed considerably to our understanding of some of the issues involved in studying and changing school psychology practice, which we discuss below.
Method
We received 90 reports and checked them for anonymity. The content-analysis process, based on the work of Pope, Ziebland, and Mays (2000), then proceeded, with findings shaping ongoing data collection. Qualitative data analysis involves looking first at smaller units, then identifying broad categories or themes, and finally interpreting those categories and themes. Our analysis of reports was guided by procedures outlined by Tutty, Rothery, and Grinnell (1996), with additional techniques taken from Strauss and Corbin (1998).
In these comprehensive classic models, the first step involves a quick read through a selection of the reports to gain a sense of the information, followed by breaking the text into meaningful units or chunks of information (Tutty et al., 1996). In the second step, categories are created (Strauss & Corbin, 1990). The third step involves comparing and contrasting categories to determine the relationships between them. A final review confirms that the categories reflect what is in the reports, verifying the definition of each category and demonstrating consistent application and meaningfulness.
Our goal was more modest than the complete deconstruction and content analysis of school psychology reports that full implementation of the classic approach would require. We had some guidelines from the empirical literature and our own rubric criteria on which to base our categorization. As noted, these include organizational style, length, reading level, usefulness of recommendations, relevance to the referral question, and the balance of strengths and weaknesses. However, because very little of the extant literature is specific to school psychology, we also wanted to ensure that our categorization scheme was relevant to and reflective of the realities of school psychology reports currently in use. Accordingly, we adapted the classic approach so that raters could define, by mutual agreement, any additional themes as they became apparent. We then determined strategies for measurement. We view the emergent categories as meaningful in the context of our rubric development but certainly not as exhausting the intricacies of the school psychology report.
Each of three raters read the same initial 10 reports. All three raters were familiar with the literature and the goals of the study. Raters independently developed categories for describing structure and content, then discussed these together and agreed on categorical labels and subcategories. A scoring rubric was developed to define the categories and subcategories. A subsequent group of three raters coded the same 10 reports using the rubric. Differences were resolved through discussion, and the rubric was refined accordingly. Each rater then coded an additional five reports, demonstrating an acceptable level of agreement (r = .92).
Findings
We defined the following relevant variables, for which we created a scoring rubric: organization; length; readability; nature of recommendations; presentation of test scores; collaboration; balance of strengths and difficulties; child-centeredness; individualization; and relationship to the referral question. In this article we present our findings on organization, readability, length, and the nature of recommendations in detail, and we recommend continued study of the other variables to improve the extent to which school psychology reports contribute to real, beneficial changes in circumstances for children and families.
Organization
We found that all of the reports shared a similar organizational structure. In all likelihood this reflects the unique history of clinical services in our area, which derive from a common agency with a mandated report format. Since the demise of that single agency, the successor services have developed their own guidelines, but the general structure appears to have prevailed. Every report reviewed followed the general format: reason for referral; sources of information; history; observations; results; recommendations.
Some reports used different labels or a different mix of information in these sections, some had additional subsections and headings, and in some the order of presentation differed. These section headings formed the basis of our examination of readability. Within this overall similar structure, we identified three styles of significance to the literature and to the questions we were asking about effective reports. We labeled these test-by-test (n = 70), thematic sequential (n = 15), and thematic integrated (n = 5). Pelco et al. (2009) made a similar distinction among the report styles they studied, which they defined as test-by-test technical, theme-based technical, and theme-based.
As in Ackerman (2006) and Pelco et al. (2009), the test-by-test style presented results in the chronological sequence of the assessment, instrument by instrument, with little or no integration of findings across tests. The thematic sequential style was similar to that described by Beutler and Groth-Marnat (2003) and Pelco et al. (2009): it presented results as integrated functional areas across assessment tools, but primarily in a discrete section, as in the test-by-test format. The third format we identified was similar to that described in Pelco et al. (2009) in that aspects of the results appear earlier in the report. However, the thematic integrated style we defined differed from previously described styles in that it did not restrict recommendations to a final report section. This style presented results as integrated functional areas across assessment tools but also presented recommendations (sometimes labeled implications) earlier in the report, integrated with the findings, and then summarized them in a formal recommendations section at the end of the report.
Readability: Length
Typically, studies of readability in psychology reports rely on the built-in readability statistics of the Microsoft Word grammar check to calculate the number of words, sentences, and paragraphs and the Flesch Reading Ease and/or Flesch-Kincaid Grade Level indices. Following this practice, we explored the overall length and relative section length of each report style and calculated the Flesch-Kincaid grade level. Because we wanted both overall and section-by-section measures, we used a macro in Microsoft Word (Wyatt, 2010) to calculate readability statistics without having to run the grammar check each time. Descriptive statistics (means, standard deviations, and percentages) for length by report section and style are presented in Table 1.
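Section-by-section length percentages of the kind summarized in Table 1 can be sketched as follows. This is not the Word macro (Wyatt, 2010) used in the study. It is a minimal Python illustration that assumes each section’s text has already been extracted under its heading; the function name and sample text are hypothetical, and whitespace splitting only approximates Word’s word counts.

```python
def section_length_profile(sections: dict[str, str]) -> dict[str, tuple[int, float]]:
    """Word count and percentage of total report words for each section.

    Words are counted by splitting on whitespace, which only approximates
    the counts a word processor reports.
    """
    counts = {name: len(text.split()) for name, text in sections.items()}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("report contains no words")
    return {name: (n, 100 * n / total) for name, n in counts.items()}


# Hypothetical, heavily abbreviated report using the common section headings.
report = {
    "reason for referral": "Concerns about reading progress were raised by the teacher.",
    "sources of information": "Teacher interview, parent interview, file review.",
    "history": "No prior assessments are on file.",
    "observations": "The student worked steadily but asked for repeated instructions.",
    "results": "Word reading was below age expectations; listening comprehension was average.",
    "recommendations": "Provide daily structured phonics instruction.",
}
for section, (words, pct) in section_length_profile(report).items():
    print(f"{section:<24}{words:>4} words {pct:5.1f}%")
```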
Table 1. Descriptive Statistics for Length
Report length varied from 3,493 to 6,574 words. The 70 test-by-test reports ranged from 3,493 to 6,190 words, the 15 thematic sequential reports from 5,453 to 6,386 words, and the five thematic integrated reports from 5,499 to 6,574 words. In all report styles, the two procedural sections, observations and results, made up more than 50% of the report. In every case, the recommendations section made up 15% or less of the report. The thematic integrated format differed significantly from the other two formats in having a longer observations section. Further examination indicated that much of this additional length came from presenting material related to the recommendations alongside the related results rather than later in a separate recommendations section. Given the small number of reports in the two thematic styles, we can say only that report style may be related to differences in overall length and in the relative lengths of the observations and results sections. Independent-samples t-tests were conducted to compare length for each pair of styles. There was a significant difference between the test-by-test and thematic sequential styles (t(83) = 4.6, p < .001) and between the test-by-test and thematic integrated styles (t(73) = 3.4, p < .001). There was no significant difference between the two thematic styles.
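The pairwise comparisons above can be illustrated with a minimal sketch of Student’s (pooled-variance) independent-samples t statistic, assuming equal variances as in the classic test. The degrees of freedom, n1 + n2 − 2, match those reported: 70 + 15 − 2 = 83 and 70 + 5 − 2 = 73. The function name and word counts are hypothetical; scipy.stats.ttest_ind computes the same statistic (with its default equal_var=True) and also returns a p value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's independent-samples t (pooled variance); returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical word counts, not the study's data.
test_by_test = [4100, 3900, 4500, 4300, 3700, 4800]
thematic = [5600, 6000, 5800, 6200]
t, df = pooled_t(test_by_test, thematic)
print(f"t({df}) = {t:.2f}")
```

With groups as unequal in size as 70 and 5, Welch’s test (equal_var=False in SciPy) is often the safer choice when variances may differ.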
Readability: Grade Level
Although the data do not support a sophisticated statistical analysis, and again given the limited number of reports in the differentiated styles, they suggest that report style may be related to reading level, with the thematic formats having higher overall reading levels than the test-by-test format. Our results also indicated that readability varies section by section, with the reason for referral and recommendations sections having higher difficulty levels than the other sections. In keeping with the exploratory nature of the study, we conducted a one-way analysis of variance across styles for the overall reading level and for the recommendations section. This analysis showed significant differences among the three styles overall (F(2, 87) = 745.4, p < .001, η² = .945) and for the recommendations section (F(2, 87) = 229.7, p < .001, η² = .733). In both sets of post hoc analyses, using a Bonferroni correction and p < .05, the test-by-test style showed a lower reading level than the thematic sequential style, and both were lower than the thematic integrated style (see Table 2).
Table 2. Descriptive Statistics for Reading Grade Level
Note. ᵃStyles 1 and 2 are significantly different; ᵇStyles 1 and 3 are significantly different; ᶜStyles 2 and 3 are significantly different.
*Significant with Bonferroni correction at p < .05. **Significant at p < .01.
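For readers checking effect sizes, the following sketch computes a one-way ANOVA F ratio and eta squared (SS_between / SS_total) from raw scores. It also recovers eta squared from a reported F and its degrees of freedom via η² = F·df_b / (F·df_b + df_w). With 90 reports in three styles the degrees of freedom are (2, 87), as reported above. The function names are hypothetical, and this is an illustrative sketch rather than the study’s analysis script. scipy.stats.f_oneway returns F and p, but not η².

```python
from statistics import mean


def one_way_anova(*groups: list[float]) -> tuple[float, int, int, float]:
    """Return (F, df_between, df_within, eta_squared) for independent groups."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)  # SS_between / SS_total
    return f, df_between, df_within, eta_sq


def eta_sq_from_f(f: float, df_between: int, df_within: int) -> float:
    """Recover eta squared from a reported F ratio and its degrees of freedom."""
    return (f * df_between) / (f * df_between + df_within)


print(round(eta_sq_from_f(745.4, 2, 87), 3))  # overall reading level; prints 0.945
```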
Of particular interest was our shared observation that reports with higher grade-level readability scores did not always “feel” more difficult. This observation is consistent with both the early findings of reading researchers and the contemporary findings of cognitive scientists. DuBay (2004) points out that early researchers did not rely only on readability formulae. According to DuBay (2004, p. 17), Gray and Leary (1935) found that content, with a slight margin over style, was the most important factor; third in importance was format, and almost equal to it were “features of organization,” referring to the chapters, sections, headings, and paragraphs that show the organization of ideas. Indeed, because of these known limitations, readability researchers have long recommended that formulae be used together with other methods of grading and writing texts. Ojemann (1934) warned that the formulae are not to be applied mechanically, a caution echoed by other investigators (Dolch, 1939; Horn, 1937).
More recently, cognitive theorists and linguists have viewed reading as an act of thinking, pointing out the importance of inferences and interpretations, prior knowledge of the topic, text structure or genre, and learning strategies (DuBay, 2004). These researchers also point out that different levels of difficulty relate to different uses of a text (Bormuth, 1969; Vygotsky, 1978). Reading protocols and usability testing have been put forward as alternatives to the formulae (Redish, 2000; Schriver, 2000).
Beyond these conceptual concerns, we identified a methodological source of the disparity: formatting interacts with report style and with the way software calculates readability.
Logical and Methodological Considerations in the Use of Readability Statistics
The formula for the Flesch-Kincaid grade level score is 0.39 × (average sentence length in words) + 11.8 × (average syllables per word) − 15.59. An initial investigation showed that, depending on the specific software used, several problems were evident. Some words are misclassified as monosyllabic or multisyllabic. Because sentences are identified by a terminating period, the software does not count list items or table fields that lack a terminating period as sentences; their text is instead run together with adjacent text into a single long sentence, which inflates the average sentence length. In our sample, the use of terminating periods in lists and tables was variable, and this affected readability scores. Depending on punctuation, lists and tables can yield scores indicating greater reading difficulty than the same information presented as a unified paragraph. Yet tables, diagrams, and numbered lists have been promoted as ways to make a document easier to understand (Bradley-Johnson & Johnson, 2006).
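The punctuation effect can be illustrated with a deliberately simplified Flesch-Kincaid calculator that, like the software described above, ends sentences only at terminal punctuation. The vowel-group syllable rule is a crude assumption and does not reproduce Word’s own counting; the function names are hypothetical. The same three list items score 0.39 × (7 − 7/3) ≈ 1.8 grade levels higher when their terminating periods are dropped. This happens purely because their seven words are then counted as one sentence instead of three.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level; only . ! ? are treated as sentence ends."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    avg_sentence_length = len(words) / len(sentences)
    avg_syllables_per_word = syllables / len(words)
    return 0.39 * avg_sentence_length + 11.8 * avg_syllables_per_word - 15.59


# The same hypothetical list, with and without terminating periods.
with_periods = "Reads slowly.\nAvoids books.\nGuesses at words."
without_periods = "Reads slowly\nAvoids books\nGuesses at words"
print(round(fk_grade(with_periods), 1), round(fk_grade(without_periods), 1))
```

Only the sentence-length term changes between the two versions, because the words and syllables are identical. This is why punctuation alone can move the score.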
Pelco, Ward, Coleman, and Young (2009) found that teachers prefer reports in which results are organized by theme rather than test by test when reading levels are held constant. Although our non-test-by-test sample was very small, we noted that both thematic report styles in our study used more lists than the test-by-test format did, and they produced higher grade-level reading scores. Accordingly, we wondered whether perceived ease of understanding differs from measured reading level. We conducted a preliminary exploration of this question. Because we had only five examples of the thematic integrated style, we randomly selected five reports from the other two styles and asked five teachers and five parents to rate all of them on a simple 7-point subjective ease-of-understanding scale ranging from very easy to very difficult. We then compared those ratings with the Flesch-Kincaid scores. The data are not sufficient for statistical reporting, but they strongly suggest that reading level alone may not be the best estimate of understandability, particularly when tables and figures are used, and that further examination of this question is warranted.
Nature of Recommendations
To assess the structure and content of the recommendations, we took into account that the local context for these evaluations promotes the IEP as the guiding document for interventions. Regarding student-specific outcomes (SSOs), the Manitoba Education, Citizenship and Youth (MECY) IEP-writing guideline states: “Writing appropriate student-specific outcomes (SSOs) is a fundamental component of the student-specific planning process. Effective SSOs should be SMART” (MECY, 2006).
We redefined the MECY acronym to reflect a slightly more general view of the SMART format and our review of the local reports. Our basis for measurement was as follows:
We repeated our reliability check with this modification, using 10 reports, and achieved a correlation of r = .89, indicating acceptable reliability of the judgments. In this analysis we rated every recommendation regardless of where it appeared in the report, so as not to distinguish between report styles. We defined a recommendation as an action statement occurring anywhere in the report or as an assertion appearing in the recommendations section.
An example that scored zero on every SMART element, because it links no person to a responsibility and does not clearly state the required action, is:
Results of this assessment were interpreted to child’s social worker, the present school guidance counselor and special needs resource teacher on June 8, 2009. CHILD will require a modified program throughout high school and will also need support as he transitions to adulthood.
An example with one time-bound¹, one relevant², and one specific³ element (underlined and superscripted) is as follows:
It would be beneficial for child to have an updated reading evaluation
On average, the recommendations section made up 15% of the length of the report. The average number of recommendations per report was six, with a maximum of 15 and a minimum of zero. The average number of SMART elements per recommendation was two, with a maximum of four and a minimum of zero. As a percentage of all recommendations reviewed, the specific element was apparent in 45%, measurable in 2%, achievable in 15%, relevant in 40%, and time-bound in 11%.
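Once each recommendation has been coded, the summary figures above (recommendations per report, SMART elements per recommendation, and the percentage of recommendations showing each element) are simple tallies. The following is a minimal sketch, assuming each recommendation is stored as the set of element names a rater judged present. The rubric’s own element definitions are not encoded here, and the function name and coded data are hypothetical.

```python
from statistics import mean

SMART_ELEMENTS = ("specific", "measurable", "achievable", "relevant", "time-bound")


def smart_summary(reports: list[list[set[str]]]) -> dict[str, float]:
    """Summarize SMART coding across reports.

    Each report is a list of recommendations; each recommendation is the set
    of SMART elements a rater judged to be present (possibly empty).
    """
    recs = [rec for report in reports for rec in report]
    if not recs:
        raise ValueError("no recommendations to summarize")
    summary = {
        # Reports with zero recommendations still count toward this mean.
        "mean_recs_per_report": mean(len(report) for report in reports),
        "mean_elements_per_rec": mean(len(rec) for rec in recs),
    }
    for element in SMART_ELEMENTS:
        present = sum(element in rec for rec in recs)
        summary[f"pct_{element}"] = 100 * present / len(recs)
    return summary


# Two hypothetical coded reports (not data from the study).
coded = [
    [{"specific", "relevant"}, set()],
    [{"specific", "relevant", "achievable", "time-bound"}],
]
for key, value in smart_summary(coded).items():
    print(f"{key}: {value:.1f}")
```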
Conclusions
Within the noted constraints, the study provides a good start toward establishing partnerships that allow us to address our major goals: understanding the areas of discrepancy between present report-writing practices and our program’s training rubric, providing a baseline against which to measure changes in general practice as our graduates enter the profession, and exploring strategies for quantifying aspects of the report to support further detailed investigations.
Our view is that a good report describes how the information given to the clinician by other parties has led to an understanding of the relevant variables used in formulating the clinical treatment plan. As such, it educates the audience about the range of probable causes and the extent of the difficulties and strengths observed, in a way that links these logically to the intervention and outcome measurement strategies presented. It also clearly outlines the chronology of events and the actions or responsibilities undertaken or planned by the various parties involved. Our findings indicate that most of the content of these reports describes history or procedures rather than focusing on integration and synthesis in the conclusions and recommendations (Ackerman, 2006; Grimes, 1983; Kvaal, Choca, & Groth-Marnat, 2003; Sattler & Hoge, 2006). In addition, although the general content of the recommendations provides some degree of specificity and relevance, the measurable, achievable, and time-bound elements appear at relatively low rates.
One possible explanation for this observation is that measurable, achievable, and time-bound elements allow for accountability and feel to the clinician like an assessment of competency. This view is consistent with our experience negotiating participation and with Lichtenberger, Mather, Kaufman, and Kaufman (2004), who wrote about the challenges of convincing skeptics of the value of comprehensive competence assessments across the professional life span. Because the profession holds continuing professional development to be a fundamental competence, training programs would do well to address this concern early in students’ professional development and to encourage evaluation of all aspects of professional practice. Reports that feature clear, time-bound outcome measures would be one vehicle to this end, and they would also increase the likelihood that the assessment process results in tangible benefits to consumers.
Other aspects of report-writing training need careful examination and refinement. In the present study we demonstrated the need to treat readability as more than a formulaic calculation. There appear to be consistent differences in both length and readability measures between report sections and across styles that need careful consideration. We tentatively observed that reading level was negatively correlated with ease of understanding. This observation requires further investigation because researchers have used reading level in isolation as a criterion for evaluating reports, a strategy that may be misleading given the difference in perception observed here. The need to consider factors beyond calculated reading statistics is consistent with the broader definition of readability put forward by Dale and Chall (1949): “The sum total (including all the interactions) of all those elements within a given piece of printed material that affect the success a group of readers have with it. The success is the extent to which they understand it, read it at an optimal speed, and find it interesting.” The interaction among organizational style, formatting, and the peculiarities of the specific software used to calculate reading scores requires further investigation. Consumers of reports may well judge thematic reports written at various reading levels differently than test-by-test reports.
If we are to respect the research supporting the adoption of a more thematic format, we will have to adjust our understanding of the numbers generated by simplistic readability formulae and evaluate readability section by section, adjusting for the peculiarities of the scoring algorithms. Reading theorists appear to agree; as DuBay (2004) states, “One cannot emphasize enough the importance of testing and of frequent contacts with members of the targeted audience before, during, and after the process of producing documents as urged by Hackos and Redish (1998). Assessing both the reading ability of the audience and the readability of the text will greatly facilitate this process” (p. 19). Fortunately, there are examples of studies attempting this approach, such as Pelco et al. (2009), who asked teachers to generate their own recommendations after reading different report styles. More are needed.
This study has significant technical limitations that severely restrict generalizability. The restrictions resulting from the negotiations with the participating school divisions mean that we do not have demographic information to describe the school psychologists who contributed reports. As well, the agreement allowed only descriptive data, so the judgments of quality that would have supplemented our understanding were unavailable. The reports sampled come from a very restricted geographic area with strongly shared practice traditions and a common educational political structure that influences the structure and content of reports.
Despite these limitations, the study suggests that our understanding of real-world reporting would benefit from applying a content-analysis approach to several other aspects of existing reports. Developing common metrics, addressing empirically unresolved issues, and critically examining many of our “common sense” practices will benefit the field. Suggestions for profitable topics include how best to present test scores (if at all); the use of an appendix; the advantages and disadvantages of an executive summary; the effects of documenting collaboration and preagreement on efficacy; the effects of child-centered wording on the acceptance and implementation of report recommendations; how strength-based approaches affect funding applications; and how best to express an individualized approach to strengths and challenges. The process of negotiating participation sensitized us to the high level of concern that individual psychologists and their employers have about the examination of their work product. We believe, as do many trainers of school psychologists, that training programs urgently need to develop strong, trusting, ongoing relationships with working school psychologists to the benefit of all concerned. Collaborative action research is one vehicle for doing so.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received financial support for the research, authorship, and/or publication of this article from the University of Manitoba Research Grant Program.
