Abstract
The small amount of work on workplace writing assessment has focused almost entirely on student readiness for professional writing or included case studies of employer expectations for new writers. While these studies provide insight into current pedagogies for technical writing and writing instruction in general, the main conclusion to be drawn from them is the unsatisfactory number of recent graduates who display workplace readiness. In this article, we explore writing assessment research in both the academy and the workplace and attempt to identify ways in which the academy’s assessment practices lead, lag behind, or simply differ from writing assessment in the workplace. This comparison will serve to identify not only where the academy might improve pedagogy in its curriculum for technical communication in order to best prepare students for workplace writing but also where the workplace might learn from the academy to improve its own hiring and training procedures for technical writers. In this case study, we used Neff’s approach to grounded theory to categorize rater feedback according to a ranking system and then used statistical analysis to compare writer performance. We found that the direct test method yields the most predictive results when raters combine tacit knowledge with a clearly defined rubric. We hope that the methods used in this study can be replicated in future studies to yield further results when exploring workplace genres and what they might teach us about our own pedagogical practice.
Introduction
The little published work on workplace writing assessment has focused almost entirely on student readiness for professional writing (Boettger, 2014; CCCC Committee on Assessment, 1995; Hart & Conklin, 2006) or included case studies of employer expectations for new technical writers (Sageev & Romanowski, 2001; Stevens, 2005; Sullivan & Martin, 2003; Whiteside, 2003). Although these studies provide insight into current pedagogies for technical writing and writing instruction in general, the main conclusion to be drawn from them is that too few recent graduates display workplace readiness. In this article, we explore writing assessment research in both the technical writing classroom and the workplace and attempt to identify ways in which the academy’s assessment practices lead, lag behind, or simply differ from writing assessment in the workplace. This comparison will serve to identify not only where the academy might improve pedagogy in its curriculum for technical communication in order to best prepare students for workplace writing but also where the workplace might learn from the academy to improve its own hiring and training procedures for technical and professional writers.
The writing tests given to new hires serve a parallel purpose to academic placement exams, in that they are a high-stress, high-risk situation that aims to evaluate writer ability rather than the quality of the completed task (Haswell, 1998; Moss, 1994; Perney, 1994). However, while academic assessment measures ability with the aim to improve the students’ learning, workplace assessment is driven by market forces and is seen in terms of return on investment. While a division can be identified between workplace assessment and rubrics used in the classroom, Savage and Seible (2010) argue that assessment criteria in the technical communication classroom “is enhanced if it can be demonstrated to conform to widely accepted professional standards” (p. 53). The authors argue for studies that pinpoint “skills, core competencies, essential knowledge, and literacies essential” drawn from professional practice that help standardize curriculum based on an established body of knowledge and professional practice (p. 53).
This case study used qualitative and quantitative measurements to explore the implication of market-driven assessment practice at a local search engine optimization company in the United States to understand how different stakeholders might impact technical writing assessment in the workplace. This examination was followed by analyzing a random sampling of subsequent writing tasks of employees of the company to determine whether the assessment methods being used by the company accurately predicted the technical writing ability of these employees. We use this case study to demonstrate how writing assessment in the workplace is being implemented and how those standards could inform instruction in the technical writing classroom.
Writing samples and additional data were collected from 10 recently hired employees of High Hits (pseudonym). We also interviewed evaluators and hiring managers to find out which method, rubric standardization or tacit knowledge, was used during the hiring process. The guiding questions for this study included:
How do hiring managers and evaluators at High Hits assess candidates’ writing abilities?
What are the challenges and opportunities of different workplace evaluation methods?
The small amount of research on assessing writing in the workplace provides an ideal opportunity for expanding the dialogue on assessment practices both inside and outside the academic classroom. In the following sections, we begin with an overview of research on writing assessment. Then we use the case study of employees at High Hits to explore the affordances and limitations of different assessment practices. The company is an ideal case to study because it follows standard hiring practices: administering the direct writing test explained below, conducting interviews, and reviewing job materials. Sometimes hiring managers use their own knowledge and experience to evaluate the writing test, and sometimes they elect to use a standardized rubric.
Interestingly enough, our case analysis reveals that experienced hiring managers who used their own knowledge and experience were more successful at predicting a writer’s on-the-job performance than the company’s market-driven rubric. Although a more tacit knowledge approach was more effective than a numeric scoring system, we did find that the rubric involved in this project was still largely successful at predicting future writing performance on the job. We also identified places the rubric could be improved to relieve stress and maintain agreement for the company stakeholders. The process we outline later in this article could be used to assess any rubric designed for writing assessment, in the workplace or in the classroom, by comparing the numeric scores that the rubric yields with specific workplace skills and literacies. This method, based in grounded theory, has yielded promising results in this study and could yield further results when exploring other workplace genres and what they might teach us about our own pedagogical practice.
Writing Assessment Research
Research in technical writing assessment has defined assessment as the application of marking criteria to evaluate written work. Many hiring managers in technical fields use similar strategies to evaluate job candidates’ writing skills, but in some cases with little to no writing evidence beyond traditional résumés and cover letters. Still other companies, which recognize the importance of writing skills, have chosen to include timed writing tasks as a metric for evaluating potential hires’ competency in writing. This direct test method, which asks applicants to solve a writing problem in a relatively short period of time, can provide hiring managers a more complete picture of the candidates’ writing abilities and may show how a candidate writes under pressure, how they meet deadlines that require a less than perfect draft, and how they develop a piece of writing for specific stakeholders. This timed writing environment can also reveal technical issues like syntactical immaturity or issues with word choice. Employers in this latter group may have a vested interest in knowing whether candidates can meet the writing requirements of the marketplace.
But the process of evaluating the output of these timed writing environments varies from employer to employer. For some, the use of an agreed upon evaluation metric, or evaluation rubric, helps standardize the process across evaluators. Others, who also use the direct test method, rely on a tacit knowledge approach where the assessment is based on experience both writing and evaluating written prose in the workplace. As we review both of these writing assessment strategies from the research, we also provide insights into the opportunities and challenges both afford.
Although little work has been done to study workplace writing assessment (Beaman, 1990; Boettger, 2014; Dias, Freedman, Medway, & Pare, 1999), assessment in academia has been studied extensively. Decades of research have established a pluralistic approach to writing assessment (Ackerman, 1991, p. 143; MacDonald, 1987, p. 321; Petraglia, 1995, p. xii) determining that writing encompasses a diverse number of activities and genres that cannot be unified under a single standard (Downs & Wardle, 2007, p. 556).
Instead, research suggests that writing standards must be negotiated on a case-by-case basis, and such standards often reflect the values of a particular teacher or group rather than any objective quality (Elbow, 2006, p. 83). Writing proficiency becomes not a question of generalized quality but of a writer’s ability to conform to standards established by the group.
In the classroom, teachers become the de facto audience for many writing assignments and use writing assessment to measure progress, assign placement, give feedback, and certify proficiency, among other purposes (CCCC Committee on Assessment, 1995, p. 431). Assessment in this context plays an integral part in the academic world by educating “teachers and students alike as to the kinds of work, methods, criteria, and standards which are and ought to be valued” (Wiggins, 1994, p. 130). In contrast, workplace writing assessment may help with placement, as a factor in either promotions or hiring, and it may also be used to enforce market-driven standards (Dias et al., 1999, p. 114). Writers in the workplace environment must resolve the inherent difficulty of identifying values of “good writing” that may not mirror earlier classroom experiences but are essential for success in an overcrowded and often impatient marketplace.
Rubrics Versus Tacit Knowledge
There is some debate about how effective rubric scoring can be in evaluating writing ability in general. Peter Elbow (1996) argues against a holistic rubric that results in a numeric score. Because writing contains innumerable variables that determine quality, Elbow believes it is impossible to reduce any written product to a single metric (p. 122). Pamela A. Moss (1994) counters that standardization and quantification in writing assessment are essential for practical purposes. Less standardized forms of evaluation produce results that are difficult to compare and to use in generalizing performance (p. 110). The rubrics used during scoring may be general or task-specific (the rubrics in this case study are task-specific). General rubrics are intended to be used to score any writing activity, while task-specific rubrics are altered to more precisely evaluate a particular writing activity (Bean, 2011, p. 270). Both rubrics have particular strengths and weaknesses. Task-specific rubrics are seen as more accurate and descriptive of specific writing activities. However, a score derived from a specific writing activity may not reveal anything useful about the writer’s performance in a different type of writing (White, 2005, p. 585). General rubrics, on the other hand, cover a wider variety of writing tasks and so can serve as a more generalized measure of writing performance. However, they may be inappropriately applied to all writing activities, whether or not the criteria identified in the rubrics are actually valuable measuring points in the writing activity (p. 585).
The Educational Testing Service first developed holistic scoring in the mid-20th century to lower costs of assessment and to increase agreement between raters (White, 2005, p. 584). Senior researcher Paul Diederich (1974) demonstrated that readers could use the same set of criteria as a guideline to increase consistency between raters and their scores (p. 32). This guideline was eventually formalized into a rubric. Diederich also defined a set of procedures for training, grade norming, and calculating interrater reliability that are standard practices for holistic scoring today. Because holistic scoring is driven by an emphasis on comparability and agreement, it only functions under the assumption that agreement (between members of a discourse community) and validity (the idea that the test accurately measures what it is intended to measure) are synonymous in practice. This assumption is derived from classical test theory, which states that the score observed by a rater is “an estimate of the true proficiency, or true score, that is exhibited in the sample” (Penny, Johnson, & Gordon, 2000, p. 145). The score can only be an estimate because raters imperfectly interpret the rubric guidelines. Rubric training is performed with the aim of improving agreement between raters, called interrater reliability (p. 145). However, while agreement in holistic scoring provides practical benefits, scholars have challenged the assumption that agreement always reflects proficiency in those whose writing is being assessed. These challenges reveal possible weaknesses in holistic scoring.
Reliability, Validity, and Holistic Scoring
One weakness may be the tradeoffs between reliability and validity. Reliability refers to agreement between raters, consistent scores assigned by the same rater, and consistent scores over time by different writers. Validity refers to the test’s ability to properly measure what it claims to measure. Elbow argues that there is “an inherently inverse relationship between reliability and validity. As reliability improves, validity degrades—and vice versa” (Elbow, 2006, p. 88). Inherent to his criticism is the question of whether the reliability produced by agreement is a valid expression of value or simply a display of agreement. Interrater reliability is calculated by standard deviations from the mean, so even if multiple raters have comparable mean scores, they could still be misinterpreting the rubric guidelines. Elbow argues that the significance of agreement is even more tenuous. He claims that any agreement about “value” in writing actually “produce agreement only about conventions” (p. 89). In that case, even a perfect application of rubric guidelines reveals only how well the text adheres to the rubric. Of course, under some circumstances, there is value in assessing how well a text follows a rubric’s guidelines, but that assessment does not always correlate with writing proficiency, or a writer’s ability to create texts that meet all the standards defined by a discourse community that may not be included in the rubric.
While holistic scoring assumes that the score represents the true value of a text, rubrics and texts do not always perfectly align. This problem is especially evident in rubrics that have multiple categories. Richard Haswell (1998) explains this conundrum well:
My experience has always been that it is nearly impossible to find anchor essays that are true to the scale of quality pictured by the rubric. What is pictured is quality rising ladder-like and unilaterally across all subskills. A “4” essay is better than a “3” essay in every subskill … Finding anchor essays that show it, however, is a different matter … the problem re-emerges when readers do not know how to score essays that perform at one level in some subskills and at another level in others. (p. 242)
Supposed agreement in holistic scoring can mask disagreement between readers and nuance within texts. The holistic score serves as an “average” of all the positive and negative qualities within the text, hiding most variance and complexity. Raters may be responding to very different parts of the text as they assign a holistic score, but there is no way of knowing what they notice without more information (Lee, Gentile, & Kantor, 2009, p. 394). In addition, if enough raters score a particular text, eventually the text will receive an average score that appears to show agreement. However, the minimum and maximum scores may vary significantly from the average, which raises the question of which scores should be considered to be the “true score” (Haswell, 1998, p. 242). Disagreement may occur because a text does not conform to the expectations of a rubric, causing raters to waffle about the scores they ought to give. These factors can potentially complicate the interpretation of holistic scores.
Holistic scoring assumes that a score derived from a rubric measures a writer’s ability, when there is actually more evidence that it measures performance at a single task. In high-stakes assessment, such as the writing tests given to employees by potential employers, the goal is not just to assess the text, but to place the writer herself into a category (Haswell, 1998, p. 244). Although classical test theory argues that the text reflects the writer’s abilities, Elbow (2006) argues that an assessor must look “through the text to try to see the writing abilities of the writer behind it” (p. 83). However, it is far easier to measure a text than to measure the person who wrote the text. Requiring a writing test to measure the writer as well as the text adds a level of complexity (and therefore uncertainty) that may influence the reliability and validity of a writing test.
Some researchers argue that rubrics cause raters to read unnaturally, looking for “adherence” rather than “value.” In this argument, readers who encounter a text without intending to assign a score have a distinctly different reaction to a text than does a trained rater who approaches it using a rubric. Bean (2011) writes that raters are “trained to read in an unnatural way in order to apply negotiated criteria that do not, in any holistic or meaningful sense, belong to the actual reading practices of real readers” (p. 277). Instead of “creating a reading,” the rater is forced to “find a meaning that is expected in the schema encoded in the guide” (Elbow & Yancey, 1994, p. 97).
Some researchers argue that holistic rubrics are intended to help raters focus on strengths rather than just deficiencies (Lee et al., 2009, p. 394). However, the opposite may also be true: To raters reading with the intent to assign a score, errors are generally more visible than positive elements, so raters are more inclined to notice a text’s problems than its strengths (Haswell, 1998, p. 241). Because rubrics generally list both positive and negative descriptions, they can prevent raters from weighting negative aspects of performance too highly. In this case, a natural reading (that is, one unaided by a rubric) may actually be an unbalanced reading because many readers may focus more heavily on elements that are naturally more visible.
Haswell’s “Gut Reaction” Assessment Model
Richard Haswell argues that a rater’s reading depends more on the rater’s individual experience than on the rubric being used. He proposes three models for assessment: classical, prototype, and exemplar categorization. Classical categorization follows classical test theory; it “assumes that people categorize by grasping the non-accidental properties of a new instance and matching them with the unique set of properties that define the correct category” (Haswell, 1998, p. 245). This model assumes that categories have clear boundaries and that raters are perfectly capable of recognizing those boundaries from a generalized description in a rubric and applying them to specific texts. However, in practice, classical categorization is actually the most difficult to train raters to understand, as it requires raters to continually conceptualize generalizations and apply them to specific texts.
Prototype categorization “assumes that people categorize by judging how similar the yet-to-be-categorized instance is to abstract schemas they have of the best example or most representative member (prototype) of possible categories” (Haswell, 1998, p. 246). Grade norming and anchor essays often provide the basis for prototypes. The prototype model differs from the classical model because the raters can rely on a more concrete example to create the boundaries of categories. Raters who have some experience often rely on the prototype model to create readings and assign scores to texts.
Exemplar categorization requires raters to have extensive experience with the kind of texts being evaluated. Raters refer back to “exemplars,” or ideal examples they have read before, and use a “gestalt-like pattern recognition,” depending on “a flock of contextual contingencies, including the categorizer’s previous encounter, subsequent experience with it, and current motivations” to categorize a text (p. 247). However, the exemplar model is the most problematic because “the rules governing exemplar membership decisions are definitely not deterministic or probabilistic but rather heuristic, norm-based, or interpretive” (p. 247). If experienced raters use only an exemplar model in evaluating texts, it may be difficult to use practices like grade norming and training to standardize rating practices and achieve acceptable interrater reliability.
However, if raters use some combination of classical, prototype, and exemplar models in their scoring, it may be possible to achieve a more approximate (albeit not a perfect) measure of writer ability. Haswell’s exemplar evaluation can also be called a “gut reaction,” or an unconscious reading of a text (Elbow, 2006, p. 85). In a study performed by Brian Huot (1993), raters who had a background in writing, or an educated “gut,” and who were provided an hour of rubric training were much more successful in achieving interrater agreement than a group of raters who were given the hour of rubric training but who had no background in writing (p. 206). The training (grade norming and prototype model training) provided a “grounding” to the raters; it allowed them to focus their reading according to the values outlined by the rubric. Elbow calls this grounded reading an “empirical reading,” or a reading that takes into account the raters’ past experience and the requirements outlined by the rubric (Elbow & Yancey, 1994, p. 99). Huot reports that this strategy, training raters with expertise, resulted in more consistent scores than training or expertise alone.
Classical test theory assumes that a holistic score can reveal actual proficiency, with some allowance for rater and rubric error. A natural extension of this assumption is that a writer’s proficiency should be consistent from one writing sample to the next. However, the model requires the assumption not only that raters can identify a “true score” for a text but also that the “true score” represents a writer’s ability, rather than just the proficiency located within the particular text. We test these assumptions by rating the high-stakes assessment (i.e., the writing tests given to job applicants) in exactly the same way as the writing samples the employees create after being hired, using a company-standardized scoring rubric. Because researchers offer no advice on how exactly to “look through” a text to see the writer’s abilities, we attempted to evaluate the text itself (both the writing tests used during hiring and the writing samples created on the job) as a proxy for ability and see how well that evaluation predicts ability over time. Much of the management at High Hits believes in the importance of good writing, but which of the approaches used by their management staff, rubric scoring or intuitive approaches, yields acceptable, consistent, or reliable results?
Methods
In close consultation with High Hits, we recreated the hiring process using assessment officers and writing samples from actual employees. The assessment officers were chosen because of their previous experience with writing and editing and their training with the High Hits rubrics and writing standards.
The rubrics were developed in-house at High Hits to score all blogs and copy written by copywriters on a day-to-day basis. The purpose of examining both the writing tests and the content written after the copywriters were hired is to determine which assessment methods being used by the company to assess the writing tests best predicted the writing ability of the new employees. We also tested an alternative assessment practice using the company rubrics to score the writing tests and to see if the rubrics offer a better predictor of copywriter scores after hiring. This project was approved by the Internal Review Board on August 30, 2016.
Participants
Table: Copywriter Demographic Information. Note: Some participants included multiple responses.
Raters
Because High Hits employees only reviewed each writing sample once and did not create scores for any of the writing tests, we employed raters to do two reads of each writing sample. We chose three raters who had previous experience in copyediting, had satisfactory writing samples, and passed a grammar test. We trained raters using company training materials that were previously used at High Hits to train new employees. Raters assigned numeric scores based on the workplace rubric requirements and also recorded a short commentary based on their tacit impressions of the writing. Using open, axial, and selective coding methods based in grounded theory, we were able to transform the raters’ tacit impressions into numeric quantities that could be compared with the rubric scores. We then used statistical tests to measure possible correlations between the rubric scores and tacit knowledge scores.
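The comparison described above can be illustrated with a rank correlation. The sketch below is only one plausible way to do it: the passage does not name the statistical test used, so Spearman’s rank correlation is an assumption, chosen because tier scores are ordinal. The function names are hypothetical, and no study data are reproduced.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(rubric_scores: Sequence[float], tacit_scores: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(rubric_scores) != len(tacit_scores) or len(rubric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    return pearson(average_ranks(rubric_scores), average_ranks(tacit_scores))
```

A value near 1 would suggest that texts ranked highly by the rubric also tend to rank highly on the coded tacit impressions; a value near 0 would suggest that the two readings capture different things.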
Raters were asked to transcribe qualitative comments to explain the rubric scores they assigned and to provide data for their tacit knowledge reading of the texts. Raters were asked to focus on specific details that reflected the rubric score but to also include other feedback based on their own experience with editing. In order to be as consistent as possible, raters included as much detail as possible in their comments. Most comments were between two and three sentences long. These comments were meant to supplement the rubric scores and provide more insight into why the raters assigned particular scores to particular texts. The comments were also meant to reflect Elbow and Haswell’s theory that raters’ past experience, or gut intuition, often affects how raters interpret texts and apply rubrics. Because the comments were created in the raters’ own words about elements of the text they noticed organically, these comments ought to be more reflective of their experience and intuition than the rubric scores they assigned to the writing samples. The numerical scores and qualitative comments provided by the raters served to help determine whether rubric scoring or a tacit knowledge approach would be more predictive of a writer’s aptitude at High Hits.
Writing test
Like academic placement tests, the writing test at High Hits seems to be guided by the classical test theory assumption that one successful task completion predicts a writer’s ability to consistently produce comparable texts. The writing test was designed to predict consistent writing proficiency. One measure of the effectiveness of a writing test is the authenticity of the task (Wiggins, 1994, p. 130). This test is as similar as possible to the day-to-day writing on the job. It contains two prompts: one for a blog and another for copy. The blog prompt asks the candidate to write a 250- to 350-word article for one of three clients. It gives instructions that are remarkably similar to the requirements in the rubric: keep the audience in mind, use formatting to make the text easier to scan, and write a compelling title. The copy prompt also mimics a common task on the job: rewriting a poorly written web page for a client. The prompt provides minimal information from the client as well as instructions that reflect rubric requirements: be persuasive, speak in the company’s voice, use the keywords, and include a call to action. The applicants were not given a copy of the rubric during the application process (see Online Appendix 1 for the full text of the company writing test).
Writing samples
The company began keeping all original versions of the copywriters’ writing samples after August 2015. We selected a random sampling of copywriters’ copy and blogs written between August 2015 and August 2016. Some employees did not work for the full year in which samples were collected, resulting in an uneven number of samples from each copywriter. One copy text and one blog were collected per month per copywriter. Each text was evaluated once by each of two different raters. In total, 221 writing samples were gathered, yielding 442 readings of the texts. All identifying information was removed from the writing tests and writing samples and replaced with numerical markers before writing samples reached the raters.
Rubrics
High Hits used two holistic, task-specific rubrics to evaluate the two different writing products: blogs and copy (see Online Appendices 2 and 3 for the complete rubrics). The rubrics used by High Hits are holistic rubrics, but instead of describing multiple subskills on a sliding scale (e.g., “Is the organization excellent, acceptable, or substandard?”), the rubrics use ranked tiers. Each tier describes a set of requirements that must be met in order to move on to the subsequent tier. For example, Tier 1 describes the qualities that are absolutely essential for copywriting: originality (no plagiarism), correct audience, accurate information, and a positive attitude. These terms occur only in Tier 1; there are no “degrees” of proficiency in higher tiers. If a copywriter addresses an incorrect audience, the text automatically receives a Tier 1 score and cannot receive any other score. Requirement terms are not repeated between tiers. If a text lacks any of the required qualities in Tiers 1 through 3, it automatically receives a failing score and is returned to the copywriter for revision. It does not matter if the text contains any of the qualities from higher tiers; the writer must meet the most basic requirements and make changes before an editor will look at it again. If a writer meets all the requirements of Tiers 1 through 3, they can earn a Tier 4 score, which is considered passing. Tier 4 contains qualities that are no longer requirements but that indicate proficiency according to the company standards.
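The gating logic of the ranked tiers can be sketched in a few lines. This is an illustration under stated assumptions, not the company’s rubric: the Tier 1 requirement names follow the description above, while the single Tier 2 and Tier 3 entries are placeholders inferred from examples elsewhere in this article, and the function names are hypothetical. The sketch stops at Tier 4, the passing score described above.

```python
from typing import Dict, Set

# Tier 1 names follow the rubric description in the text. The Tier 2 and
# Tier 3 entries are assumed placeholders, not the full company rubric.
REQUIREMENTS: Dict[int, Set[str]] = {
    1: {"originality", "correct_audience", "accurate_information", "positive_attitude"},
    2: {"keyword_practices"},  # assumed placeholder
    3: {"readability"},        # assumed placeholder
}
PASSING_TIER = 4


def assign_tier(met: Set[str]) -> int:
    """Return the lowest tier with an unmet requirement, else the passing tier."""
    for tier in sorted(REQUIREMENTS):
        # A text stalls at the first tier whose requirements are not all met,
        # regardless of any higher-tier qualities it may show.
        if not REQUIREMENTS[tier] <= met:
            return tier
    return PASSING_TIER


def is_passing(tier: int) -> bool:
    """Scores below the passing tier send the text back for revision."""
    return tier >= PASSING_TIER
```

Because the loop returns at the first unmet tier, a text written for the wrong audience scores Tier 1 even if it satisfies every other requirement, which mirrors the behavior the rubric description specifies.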
Although these workplace rubrics do share many purposes with academic rubrics, such as to give feedback and aid in writer improvement, the main purpose of this workplace rubric seems to be to sift between acceptable and unacceptable texts (CCCC Committee on Assessment, 1995, p. 431). Therefore, the rubric is much more product-oriented than classroom rubrics. In their role as professional copywriters, the writers are expected to consistently produce texts that climb the tiers of the rubric to Tier 4. Revisions and rewrites translate to lost profits for their employer. Although professional development is important to the company, consistent proficiency (passing scores) is expected.
Interrater reliability
Exact and Adjacent Interrater Reliability.
To discover any possible reasons for these discrepancies, we examined the comments provided for the 52 readings that were assigned both pass and fail scores. Of the 52 readings, 20 readings, or 38% of the total, were what one would consider to be rater mistakes, or misreadings, of the rubric. Thirteen of those readings (25% of the total) resulted from confusion about the company’s keyword practices (a Tier 2 issue). Over the year of data collection, High Hits changed their keyword practices, so copywriters formatted keywords differently in about half of their writing samples. Raters did not always recognize which keyword practices were being used in each writing sample and occasionally assigned a score using the incorrect keyword practice. The remaining seven divergent scores were caused by a student rater’s overly strict interpretation of the rubric. For example, Rater 3 assigned a piece of copy a Tier 3 score for poor readability because the first two paragraphs of the copy did not “get to the point” quickly enough (per the guidelines in the rubric). However, Rater 2 noted in her comments on the same sample that, as the text went significantly over word count, a company editor could easily delete the two paragraphs and fix that issue. Rater 2 assigned the copy sample a Tier 4 passing score.
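For readers unfamiliar with the two agreement measures named in this section, the following sketch shows how exact and adjacent agreement are commonly computed for paired tier scores, and how readings that split across the pass/fail line can be counted. It rests on assumptions: adjacent agreement is taken here to mean scores within one tier of each other (so it includes exact matches, although some reports count only scores that differ by exactly one), Tier 4 is treated as the passing threshold, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def agreement_rates(first: Sequence[int], second: Sequence[int]) -> Tuple[float, float]:
    """Return (exact, adjacent) agreement as proportions of paired readings."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty, equal-length score lists")
    pairs = list(zip(first, second))
    exact = sum(a == b for a, b in pairs)
    # Assumption: "adjacent" means within one tier, so exact matches count too.
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs)
    return exact / len(pairs), adjacent / len(pairs)


def pass_fail_splits(first: Sequence[int], second: Sequence[int], passing_tier: int = 4) -> int:
    """Count texts that one rater passed and the other failed."""
    return sum((a >= passing_tier) != (b >= passing_tier) for a, b in zip(first, second))
```

The split count matters separately from the agreement rates because two adjacent scores, such as Tier 3 and Tier 4, still fall on opposite sides of the pass/fail line.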
Data analysis
Open Coding Process.
Keyword Rankings According to Rubric Tier.
Positive keywords received a ranked value that corresponds to tier level. Tier 1 keywords have a value of 1, because receiving a positive comment about fulfilling a basic requirement is not a very impressive indicator of a writer’s proficiency. Tier 5 keywords have a value of 5, because receiving a positive comment about achieving excellence is more impressive. To contrast, negative keywords have an inverted ranking with negative values. A negative comment with a Tier 1 keyword is a much more problematic indicator of writer performance than a negative comment using a Tier 5 keyword.
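The ranking scheme can be sketched in a few lines of Python. This is only one possible reading of the description above. In particular, the exact inversion used for negative keywords (a negative Tier 1 keyword scored as -5 and a negative Tier 5 keyword as -1) and the idea of summing keyword values per comment are assumptions, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

MAX_TIER = 5  # the rubric described in the text runs from Tier 1 to Tier 5


def keyword_value(tier: int, positive: bool) -> int:
    """Map a coded keyword to a ranked value.

    Positive keywords take the value of their tier (Tier 1 -> 1, Tier 5 -> 5).
    Negative keywords are inverted and negated, so that a basic (Tier 1)
    failure weighs most heavily. The exact inversion is an assumption:
    Tier 1 -> -5, Tier 2 -> -4, ..., Tier 5 -> -1.
    """
    if not 1 <= tier <= MAX_TIER:
        raise ValueError(f"tier must be between 1 and {MAX_TIER}")
    if positive:
        return tier
    return -(MAX_TIER + 1 - tier)


def comment_score(keywords: Iterable[Tuple[int, bool]]) -> int:
    """Sum the ranked values of (tier, is_positive) keywords in one comment.

    Summing is an illustrative aggregation choice, not a method stated
    in the text.
    """
    return sum(keyword_value(tier, positive) for tier, positive in keywords)


# Hypothetical comment: praise at Tier 4 plus a problem at Tier 2.
example = comment_score([(4, True), (2, False)])
```

Under these assumptions, a comment that praises a Tier 4 quality but flags a Tier 2 problem nets out to zero, which reflects the idea that lower-tier problems weigh as heavily as higher-tier strengths.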
Results and Discussion
Most companies already recognize that hiring is a high-stakes situation and that they must gather as much information about candidates as possible before making a hiring decision. Current hiring practices in many workplaces do not rely solely on writing tests to make hiring decisions. Employers gather résumés and writing samples and conduct interviews in addition to administering writing tests. High Hits was no different: Hiring managers invested significant time and energy into evaluating potential employees. For this study, we evaluated the writing tests outside of the context of the other information hiring managers gathered to evaluate copywriter candidates.
We found that High Hits uses assessment tools like rubrics to enforce industry best practices and maintain a consistent standard of quality for their clients. The company uses two distinct rubrics (Online Appendices 2 and 3) to score the writing, with genre-specific requirements outlined in each rubric. However, to our surprise, some of the employees who participated in this study were not evaluated by the company rubrics during the hiring process. Instead, some hiring managers relied on an exemplar, or empirical, reading of the writing tests, using their past experience with copywriting to identify candidates they thought would succeed in the company.
Only 25% of the copywriters in this study received passing rubric scores on their writing tests, but 85% of the writing samples they produced on the job received passing rubric scores. An experience-based reading of writing tests seems to predict whether copywriters can reach a threshold of performance better than it predicts their actual rate of consistent performance.
However, we acknowledge with these findings that stakeholders in the workplace can feel nervous about relying on unstandardized and unquantifiable hiring practices, which is one of the reasons we sought to test an assessment method that would be easily standardized and quantifiable. In this case at least, relying on a strict rubric reading of the writing tests would sacrifice validity for agreement, as Elbow theorized. Although stakeholders might like to be able to point to a rubric that ensured that all applicants were measured by the same standard, that standard does not appear to adequately measure the writer behind the text, nor does it assess the writer’s abilities beyond her performance on that particular test.
However, although experience-based writing test assessment proved to be fairly predictive in this case, we recognize that abandoning the holistic value of agreement is not feasible in some workplace settings, especially after the hiring process has ended and writers must perform on a daily basis. Agreement is essential to stakeholders who have less experience with writing as a discipline of study. Managers require consistent standards to hold employees accountable for their performance. These standards ensure quality control for clients and other stakeholders. It should also be noted that when both parties, managers and employees, agree to a standard by which they are judged, there is far less conflict when a manager sends a writing task back to an employee for revision. Vague institutional standards could result in inconsistent feedback during job performance. Agreement and consistency are therefore values that must be considered when employers adopt a tacit knowledge approach or a market-driven rubric approach.
Based on the findings of this study, future workplace assessment might benefit from clear instructional writing prompts. It is possible that this extra information may function the same way grade norming works for raters. If a candidate has a background in writing, the rubric may serve to focus the writer on demonstrating certain aspects of writing important to the potential employer rather than simply relying on past writing experiences. In fact, it is possible that the lower scores observed for writing tests occurred largely because of the applicants’ ignorance of the institutional standards. The language currently used in the rubric at High Hits is intended for use by employees with company training in online marketing. With this change, a future study may find more useful conclusions about using a rubric reading as a method of evaluating writing tests as predictive measures of writing on the job.
We hope that the methods described in this study can be replicated to yield further results comparing tacit knowledge readings to strict rubric readings. To our knowledge, no other researchers have applied grounded theory to examine practical application of Elbow’s empirical reading theory in writing assessment or Haswell’s methods of classification. The methods used in this study could be repeated to see if the insights gained by this project apply to other writing situations or in other contexts, professional and academic. As so little work is being done in workplace writing assessment, there is ample opportunity for future researchers to explore predictability and applicability of these practices as we prepare students for on-the-job writing and communicating.
Supplemental Material
Supplemental material for Testing the Test: Expanding the Dialogue on Technical Writing Assessment in the Academy and Workplace by Lindsay Tanner and Jon Balzotti in Journal of Technical Writing and Communication
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Authors received the Brigham Young University Graduate Research Fellowship to fund the research.
Author Biographies
References