Assessing Assessment

Abstract

This article introduces an objective grammar and math assessment and evaluates the assessment’s outcome and reliability when fielded among eighty-one students in media writing courses. In addition, the article proposes a rubric for grading straight news leads and compares the rubric’s reliability with the reliability of rating straight news leads on an “A to F” scale without the guidance of a rubric. The study found evidence suggesting all three assessments may be useful tools in evaluating student learning in basic media writing courses.

Keywords

assessment, media writing, grammar, math, student evaluation, rubrics

Introduction

The push for accountability regarding student learning, at both the programmatic and the institutional level, has driven accredited and non-accredited journalism programs to pay greater attention to appropriate assessment measures. In an attempt to achieve reliable measures, programs and faculty members have moved toward using rubrics, determining appropriate goal-focused assessment methods, and trying to account for student learning.

Research has shown that evaluation methods have evolved over time, moving from standardized tests in the early years of assessment to portfolios.¹ While English departments across the United States have been discussing the best method for assessing student writing at institutional and program levels for decades,² journalism programs appear to be somewhat late in joining the discussion. This is surprising, considering that many journalism programs are accredited by the Accrediting Council on Education in Journalism and Mass Communications (ACEJMC), which requires assessment as part of the accreditation process, and that journalistic writing differs significantly from composition writing.

Only a handful of studies could be found that discussed assessment concerns specifically in media writing courses. Using data collected across multiple sections of an introductory media writing course at a mid-level U.S. university’s accredited school of journalism, the current study proposes and evaluates an objective instrument for assessing improvement in students’ grammar and math skills as well as a rubric for reliably grading straight news leads. With the impetus to hold faculty accountable for student learning, having a clear idea of the questions, concerns, successes, and failures in assessing media writing is essential for mass communication faculty.

Literature Review

Assessment Standards

ACEJMC Standard 9 calls for accredited journalism and mass communication (JMC) programs to assess learning outcomes in relation to suggested core values and competencies on a regular basis. As part of this standard, the results of the assessment must be collected, reported, and used to guide changes in curriculum. Suggested assessment measures include exit exams, interviews, and professional projects or portfolios. The introduction of this standard in 2003 prompted programs to ensure their assessment measures were reliable and valid. While most JMC programs are trying to develop better assessment methods and tools, Lingwall points out that the majority of these are created “without clear indication of whether they have succeeded or failed.”³

Ideally, assessment serves as a way to monitor student learning and evaluate whether learning objectives have been reached in courses and programs.⁴ In a 2010 study, Weir examined the results of a pretest/posttest administered to entry-level and senior-level students in an accredited journalism program.⁵ The test was based on the core learning objectives recommended by the accrediting body. As expected, senior-level students performed better on the exam than entry-level students; however, the students’ overall results were lower than expected. Weir ultimately concluded that the assessment tool was reliable because it produced “similar scores” over time. Notably, this assessment tool, an objective test of grammar knowledge, did not include direct assessment of student writing.

Math Assessment

ACEJMC core values and competencies for assessment go beyond writing. In the section on Professional Values and Competencies, for example, the Council states that journalism graduates are expected to be able to “apply basic numerical and statistical concepts” learned as part of their education.⁶ Despite this charge, only a handful of articles could be found that addressed assessing a math component as part of journalism education.⁷ In their survey of department chairs, Cusatis and Martin-Kratzer found that most programs addressed math skills through general education courses and by incorporating math into specific core courses in the major.⁸ Although most programs appear to keep math education on the periphery of their curricula, few would argue against the notion that journalists need strong math skills to cover a variety of topics.

Writing Assessment

Writing assessment is not a new topic for some disciplines. Because assessment plans are rarely static,⁹ it is logical that the practice of writing assessment is often discussed in terms of three waves based on testing practices.¹⁰ The first wave (1950–1970) focused on objective testing, using tests with clearly correct and incorrect answers.¹¹ While these tests were most often used to place students in courses, many professors felt disconnected from writing assessment procedures because testing specialists were guiding the practices.¹² As Huot points out, these early exams did not quite fit the experience of students and teachers in the classroom.¹³ This mismatch prompted a call for assessment that moved away from an emphasis on reliability, expressed as intercoder reliability, and toward measures that reflected validity.¹⁴ This movement allowed raters to focus on elements such as context and local standards set by stakeholders and teachers.¹⁵

In the second wave (1970–1986), the evaluation of student writing moved to the forefront in the form of essay assessment.¹⁶ This method was more closely connected to the classroom: students wrote essays, which raters then evaluated.¹⁷ This method, however, does not solve all the problems encountered with objective testing, because it raises concerns about intercoder reliability. In addition, some critics have argued that students may seek feedback from others before submitting such essays for evaluation, so that the end product does not reflect the student’s true abilities.¹⁸

In an attempt to get a more accurate picture of students’ writing abilities over time, the third wave (1986 to present) moved assessment toward evaluating collections of student writing samples in portfolios.¹⁹ This mode of assessment most closely aligns with the writing process, in which students are encouraged to write, edit, and revise, and it acknowledges that a writer’s voice and style can change over time. It also helps correct for factors that can distort a single writing sample, such as a student having an off day or not being interested in the testing topic.²⁰ Thus, the portfolio gives students the chance to provide a representative sample of their work. As with the second wave, however, this approach suffers from potentially low intercoder reliability and a high cost of rating.²¹

Even with the evolution of writing assessment described by Huot²² and Yancey,²³ there are still concerns regarding which method is best and most representative of student skills. Perhaps most of the issues people have with assessment testing stem from the fact that, as Huot²⁴ points out, the majority of assessment is controlled by testing theory, not writing theory. Thus, there are consistent calls for refining and rearticulating how writing assessment should be conducted and exactly what the measurement scale should emphasize. The end result should consider not only reliability and validity but also other stakeholder issues, such as writing theory, technical skills, and readability, to produce the most effective assessment.

Assessment measures of incoming students in JMC programs have varied over the years, including setting minimum grade point average (GPA) standards, evaluating through objective entrance exams, and submission of writing samples,²⁵ with no single measure being consistently preferred.

Writing Assessment in Practice

Students and instructors bring a variety of emotions to writing assessment. While student apprehension toward writing has been well documented,²⁶ instructor emotions toward assessment have been less examined.²⁷ The emotions instructors bring to the task of assessing student work are part of the process, even though instructors strive for fair and consistent assessments.²⁸ Thus, as Caswell argues, having both students and instructors acknowledge their emotions regarding writing and writing assessment is key to developing strong assessment tools.

For students, grades and assessment criteria may seem disconnected because the instructor’s subjective judgment may enter into the process when assessing writing.²⁹ This could, in turn, add to student apprehension toward writing and assessment. In an attempt to clarify expectations, many faculty members have turned to rubrics to help provide a clear rationale and roadmap for student success. Rubrics help facilitate a positive connection by providing insight into the grading process, alerting students to the items the instructor deems most important in assessing the work.³⁰ When the assessment process is open, instructors are less likely to find themselves having to justify grades to students.³¹

Rubrics, however, may be problematic when it comes to evaluating writing in that they can turn the writing process into a checklist of criteria.³² Thus, one of the essential problems with developing a rubric for writing is creating categories that assess writing without encouraging mechanistic writing aimed more at fulfilling specified criteria than at communicating effectively and with style.³³ Even when rubrics are provided to students at the beginning of an assignment, some of the requirements may still appear subjective from the student perspective.

Yet there is hope. Nelson argues that a clear set of criteria and objective grading measures may have a positive impact on journalism students’ writing practices.³⁴ Nelson’s study revealed that when given clear explanations of the assessment criteria, students were more likely to try to earn an A and to attend class more frequently.³⁵ Thus, Nelson concluded that explicit assessment criteria allowed students to take control over their learning by setting goals to achieve the grade they desired.

Faculty in many journalism programs believe that freshman students begin the program without adequate writing skills to succeed.³⁶ Despite these presumed student inadequacies, journalistic writing may present an interesting case that lends itself to assessment more easily than writing produced in English composition, because journalistic writing has specific, agreed-upon norms beyond style. For example, the inverted pyramid and specific requirements for writing leads should appear consistently throughout media writing courses. However, evaluating these elements is more structural than content focused. As Finn³⁷ pointed out, perhaps one of the main challenges for media writing students is a change in role and audience. In contrast to composition papers, which usually require students to argue a side to convince a professor, student journalists are expected to present an unbiased view of an issue to a mass audience.³⁸ As such, their writing is closer to professional than academic.

While not many articles could be found dealing with journalism writing assessment, it is still a concern for programs, universities, and accrediting bodies. ACEJMC accredits 114 professional mass communication programs.³⁹ As part of the accrediting process, schools are to comply with nine standards, one of which, Standard 9, focuses on assessment of learning outcomes. As part of the evidence of compliance, schools are asked to provide a statement of competencies, an assessment plan, records of collected assessment data over time, and proof that the assessment data are facilitating improved teaching methods.⁴⁰ Yet, in spite of having this standard in place, many accredited programs struggle with assessment.

Although most mass communication faculty would agree that student writing needs improvement, there may be some disagreement over whether to focus on the process or the end product as they teach writing.⁴¹ Finn’s⁴² study of media writing called for two levels of assessment to produce the most effective results. For evaluating student work, Finn argues for more analytic scoring methods. These methods would serve to give students significant feedback while connecting assignments to clearly defined assessment values. However, Finn also recommends holistic evaluation procedures for institutional and programmatic assessment.

A Proposed Rubric for Evaluating Straight News Leads

Media writing courses vary widely in their scope and content, but most begin with, or address early on, the crafting of a “straight news lead” summarizing the who, what, where, when, why, and how of a given set of facts. A search of the literature produced no well-tested rubrics designed specifically to evaluate straight news leads written by college-level journalism students. Accordingly, the current study devised its own, consisting of ten “yes/no” questions to be answered about each lead:

Is the lead a single sentence?

Does the lead consist of thirty or fewer words?

Does the lead’s first verb express the most newsworthy “what” element of the story?

Does the lead’s first verb appear within the first half of the lead’s words?

Is the lead’s first verb active voice?

Does the lead include, but not begin with, a “where” element?

Does the lead include, but not begin with, a “when” element?

Does the lead either omit attribution or place the attribution at the end of the lead?

Does the lead use correct grammar, punctuation, and spelling?

Does the lead use correct Associated Press (AP) style?
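To make the rubric’s structure concrete, the sketch below represents the ten questions as data and scores a lead by counting “yes” answers. The item labels, the counting rule, and the two helper heuristics are our assumptions for illustration only; the study’s raters answered every question by hand, and the verb-related items in particular call for human judgment rather than a simple check.

```python
import re

# Hypothetical labels for the ten yes/no rubric questions, in order.
RUBRIC_ITEMS = (
    "single_sentence",
    "thirty_or_fewer_words",
    "first_verb_expresses_key_what",
    "first_verb_in_first_half",
    "first_verb_active_voice",
    "where_included_not_first",
    "when_included_not_first",
    "attribution_omitted_or_last",
    "correct_grammar_punctuation_spelling",
    "correct_ap_style",
)


def word_count(lead: str) -> int:
    """Rough word count: whitespace-separated tokens."""
    return len(lead.split())


def looks_single_sentence(lead: str) -> bool:
    """Heuristic: no sentence-ending mark followed by more text.

    Abbreviations such as "Dr." or "U.S." fool this check, so a human
    rater should still make the final call.
    """
    return re.search(r"[.!?][\"']?\s+\S", lead.strip()) is None


def rubric_score(answers: dict) -> int:
    """Count "yes" answers (an assumed aggregation); all ten must be answered."""
    missing = [item for item in RUBRIC_ITEMS if item not in answers]
    if missing:
        raise ValueError(f"unanswered rubric items: {missing}")
    return sum(bool(answers[item]) for item in RUBRIC_ITEMS)


# A hypothetical lead written for illustration, not a student submission.
lead = ("A small fire forced the evacuation of the journalism building "
        "Tuesday morning, fire officials said.")
print(word_count(lead), looks_single_sentence(lead))
```

Only the first two criteria lend themselves to mechanical checks, and even those heuristics would at most flag leads for a rater’s attention.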

The proposed rubric’s ten criteria were based on a “Six Rules for Writing Straight News Leads” handout used in class by one of the authors (see the appendix). The rules prioritize a brisk, clear presentation of the story’s key “what,” “where,” and “when” elements organized according to a rigid structure that centers on the placement and voice of the lead’s first verb. We do not contend that every possible good straight news lead—or even the best one—for a news story about a given set of facts will conform to all of the rules. Rather, we contend that simultaneously observing each of the rules when writing a straight news lead will consistently produce a good lead and indicate appreciable writing aptitude. Furthermore, the rules empower students to take control over their learning by setting goals aligned with professor expectations.⁴³

To further examine the assessment of media writing, this article introduces an objective, thirty-item grammar and math assessment and evaluates the assessment’s outcome and reliability when fielded among eighty-one students in multiple sections of a beginning media writing course during the Fall 2011 semester. In addition, the article summarizes the evaluations produced by two seasoned media writing instructors who applied the proposed rubric to end-of-semester samples of straight news leads the students had written. Finally, the article uses Krippendorff’s alpha to compare the reliability of those evaluations with the reliability of evaluations the two professors produced without a rubric.

Research Questions

To evaluate the results and reliabilities of the proposed assessment and rubric, the study explored the following research questions:

RQ1: Which areas of the assessment’s grammar and basic math portions appeared most problematic for students at the beginning of the media writing course?

RQ2: Did average scores on the assessment’s grammar and basic math portions improve by the end of the media writing course?

RQ3: Did professors’ ratings of student writing quality show more reliability when based on the proposed rubric than when based on their individual A to F grading criteria?

RQ4: Which, if either, rating of student writing quality correlated better with the study’s measure of student grammar ability: the one based on the proposed rubric or the one based on individual A to F grading criteria?

Method

Data

Students in all sections of media writing at a mid-major university in the United States were asked to voluntarily complete a grammar and math assessment and a writing assessment at the start of the Fall 2011 semester. At the end of the semester, the students were asked to complete the same exercises, again voluntarily. The grammar and math assessment consisted of thirty multiple-choice questions. The first twenty-five questions pertained to grammar and punctuation. The final five pertained to basic math. The writing assessment involved presenting students with a poorly organized, first-person description of a small fire in the campus building that houses the university’s school of journalism. The description was presented as notes and quotes taken down by a student journalist who was in the building when the fire started and who observed various subsequent events, including the building’s evacuation, the arrival of firefighters, treatment of a janitor who had been injured trying to extinguish the fire, and so forth. Students were asked to “compose a brief (three- to five-paragraph) article, suitable for publication in a newspaper, summarizing the event described by the facts provided.” The assessments were administered in class by each course’s instructor, who directed students to a Web page containing links to the assessments. The assessments were presented as online forms hosted by Google Docs at http://docs.google.com.

Submissions from each form automatically accumulated in an associated Google Docs spreadsheet. The researchers downloaded the spreadsheets for compilation and analysis once the data collection had concluded. Pre- and post-semester submissions by each student were matched using Microsoft Access, and all statistical analyses were performed using SPSS for Windows. In all, eighty-one students in thirteen separate sections of media writing completed both of the pre-semester assessments, both of the post-semester assessments, and a form consenting to the study’s use of their submissions.
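The study matched pre- and post-semester submissions in Microsoft Access. For readers reproducing that step in code, a minimal sketch of the same logic follows, assuming each submission carries a hypothetical student_id field and that consent is tracked as a set of IDs; it keeps only students with both submissions and a consent form.

```python
def match_pre_post(pre_rows, post_rows, consented, key="student_id"):
    """Pair each student's pre- and post-semester submissions.

    pre_rows/post_rows: lists of dicts sharing a hypothetical identifier
    field. Only students who appear in both lists and in the `consented`
    set are kept. If a student submitted more than once post-semester,
    the last post-semester row wins.
    """
    post_by_id = {row[key]: row for row in post_rows}
    pairs = []
    for pre_row in pre_rows:
        sid = pre_row[key]
        if sid in consented and sid in post_by_id:
            pairs.append((pre_row, post_by_id[sid]))
    return pairs


# Hypothetical submissions: only s2 has both a pre- and a post-semester row.
pre = [{"student_id": "s1", "grammar": 17}, {"student_id": "s2", "grammar": 20}]
post = [{"student_id": "s2", "grammar": 21}, {"student_id": "s3", "grammar": 19}]
print(len(match_pre_post(pre, post, consented={"s1", "s2", "s3"})))
```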

Materials designed to help students learn the grammar and math skills covered by the assessment had been made available to each media writing instructor before the beginning of the semester. Instructors were free to use all, some, or none of the materials. These materials consisted of a set of weekly, ten-question, do-at-home quizzes in both print and pre-programmed Desire2Learn formats as well as a study guide for each quiz. The study guide directed students to Web-based grammar resources, particular entries in the Associated Press Stylebook, and chapters in a book explaining essential math skills for journalists.⁴⁴ Both the AP Stylebook and the math book were required texts for the course.

Variables

Student performance on the assessment’s grammar portion

The study measured student performance on the grammar portion of the assessment by awarding 1 point for each correct answer to the first twenty-five questions in the assessment. This portion of the assessment (see the appendix) covered subject–verb agreement (Questions 1–5); distinguishing between “its” and “it’s” (Question 6) and “their,” “there,” and “they’re” (Question 7); pronoun case (Questions 8–9); possessives (Question 10); misplaced modifiers (Questions 11–15); elliptical usage (Question 16); parallel structure (Question 17); run-on sentences, comma splices, and sentence fragments (Questions 18–21); and commas and punctuating quotations (Questions 22–25).

Student performance on the assessment’s math portion

Similarly, the study measured student performance on the math portion of the assessment by awarding 1 point for each correct answer to Questions 26 to 30 on the assessment. This portion of the assessment covered calculation of a percentage (Question 26), a percentage increase (Question 27), an average (Question 28), distance traveled in a given time at a given rate (Question 29), and the area of a circle (Question 30).
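The two scoring rules amount to simple counts over fixed question ranges. The sketch assumes responses and the answer key are stored as dictionaries keyed by question number; the key shown is hypothetical, not the assessment’s actual answers.

```python
GRAMMAR_ITEMS = range(1, 26)  # Questions 1-25: 1 point each
MATH_ITEMS = range(26, 31)    # Questions 26-30: 1 point each


def score_assessment(responses: dict, key: dict) -> dict:
    """Score one student's answers against an answer key.

    Both dicts map question number -> option letter. A missing or wrong
    answer earns 0 points; a correct one earns 1.
    """
    def points(items):
        return sum(1 for q in items if responses.get(q) == key[q])

    grammar = points(GRAMMAR_ITEMS)
    math_score = points(MATH_ITEMS)
    return {"grammar": grammar, "math": math_score,
            "overall": grammar + math_score}


answer_key = {q: "a" for q in range(1, 31)}  # hypothetical key
student = {q: "a" for q in range(1, 31)}
student[5] = "b"   # one wrong grammar answer
del student[30]    # one skipped math question
print(score_assessment(student, answer_key))
```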

Professor ratings of student-written leads

The brief news story submitted by each participating student at the end of the semester was compiled into a single Microsoft Word file in random order. A randomly assigned ID number replaced the name of each story’s author to facilitate a blind review. Two experienced media writing instructors, both among the authors of this article, then independently assigned a quality rating of A, B, C, D, or F to the lead of each of the first forty-one randomly ordered stories in the file. In assigning these ratings, each instructor used whatever criteria he or she typically uses when grading leads. Both instructors then rated the leads of the remaining forty stories using the proposed rubric, with no prior discussion of the rubric or how to apply it. Although evaluators are usually trained before applying a rubric, the researchers elected in this case to forgo such a priori training, both to evaluate the rubric’s intuitiveness and to more accurately reflect the classroom experience. At the university sampled, the media writing course is taught by several full-time and adjunct faculty members who do not meet before grading to agree on a rubric. Ultimately, the goal was to see how the rubric would work in a classroom setting.
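The blinding and splitting procedure can be sketched as follows. The ID range, the seed parameter, and the (author, text) input format are hypothetical choices; the article says only that the stories were randomly ordered, given randomly assigned ID numbers, and divided into the first forty-one and the remaining forty.

```python
import random


def blind_and_split(stories, first_block=41, seed=None):
    """Shuffle stories, swap author names for random IDs, and split them.

    stories: list of (author_name, text) pairs. Returns
    (first_block_stories, remaining_stories, id_to_author), where the two
    story lists hold (blind_id, text) pairs in random order.
    """
    rng = random.Random(seed)
    shuffled = list(stories)
    rng.shuffle(shuffled)
    ids = rng.sample(range(1000, 10000), len(shuffled))  # unique 4-digit IDs
    blinded = [(bid, text) for bid, (_name, text) in zip(ids, shuffled)]
    id_to_author = {bid: name for bid, (name, _text) in zip(ids, shuffled)}
    return blinded[:first_block], blinded[first_block:], id_to_author


# Eighty-one placeholder stories, matching the study's sample size.
stories = [(f"student{i}", f"story text {i}") for i in range(81)]
letter_graded, rubric_graded, id_key = blind_and_split(stories, seed=2011)
print(len(letter_graded), len(rubric_graded))
```

Keeping the ID-to-author key separate from the rated file is what preserves the blind review.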

Procedures

To examine which sections of the grammar and math assessments appeared initially most problematic for students (RQ1), the analysis computed chi-squared tests comparing the percentage of correct answers on each assessment item with an arbitrarily chosen standard of 66 percent, or two-thirds, correct. The study interpreted percentages significantly lower than this standard as evidence of a “problem area” for the students who took the assessment. Meanwhile, the analysis relied on paired-sample t tests and a Spearman’s rho rank-order correlation to determine whether average scores on the assessment’s grammar and math portions had improved by the end of the media writing course (RQ2).
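Each item’s test compares its observed correct/incorrect split with the split expected if two-thirds of respondents answered correctly. A minimal standard-library sketch of that one-degree-of-freedom goodness-of-fit test follows; the counts in the example are hypothetical. Because the article states the standard both as 66 percent and as two-thirds, the default p0 = 2/3 is an assumption, and items near the threshold could be classified differently with p0 = 0.66.

```python
import math


def chi_square_vs_standard(correct, n, p0=2 / 3):
    """Goodness-of-fit chi-square for one item's correct/incorrect split.

    Compares observed counts with the counts expected if a proportion p0
    of the n respondents answered correctly (1 degree of freedom).
    """
    observed = (correct, n - correct)
    expected = (n * p0, n * (1 - p0))
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = chi_square_vs_standard(correct=50, n=100)  # hypothetical item
print(round(chi2, 2), p < 0.05)
```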

Krippendorff’s alpha reliability coefficients, computed using Hayes’ “KALPHA” macro for SPSS,⁴⁵ were interpreted to determine whether professors’ ratings of student writing quality showed more reliability when based on the proposed rubric than when based on their individual A to F grading criteria (RQ3). Simple Pearson correlations were calculated to learn which approach to rating student writing quality, if either, correlated better with the study’s measure of student grammar ability: the proposed rubric or the individual A to F criteria (RQ4). Where appropriate, descriptive statistics were computed and examined as well.
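Hayes’ KALPHA macro handles several metrics, missing data, and bootstrapped confidence intervals. The sketch below covers only the simplest case relevant here, two coders assigning nominal (for example, yes/no) values with no missing data, using Krippendorff’s coincidence-matrix formulation. It is an illustration, not a replacement for the macro, and the article does not state which metric was applied to the A to F ratings, which could arguably be treated as ordinal.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal values, no missing data.

    Each unit adds both ordered value pairs to a coincidence matrix with
    weight 1 / (m_u - 1), which is 1 when every unit has m_u = 2 values.
    alpha = 1 - (n - 1) * (sum of off-diagonal coincidences)
                      / (sum over c != k of n_c * n_k)
    """
    coincidences = Counter()
    for a, b in zip(coder_a, coder_b):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1

    n_c = Counter()                      # marginal total for each value
    for (c, _k), count in coincidences.items():
        n_c[c] += count
    n = sum(n_c.values())                # total pairable values (2 per unit)

    observed = sum(v for (c, k), v in coincidences.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c, k in permutations(n_c, 2))
    if expected == 0:
        # Only one value was ever used (e.g., every lead rated "yes").
        raise ValueError("alpha is undefined when all values are identical")
    return 1 - (n - 1) * observed / expected


# Hypothetical ratings of four leads on one yes/no rubric item.
print(krippendorff_alpha_nominal(["yes", "yes", "no", "no"],
                                 ["yes", "no", "no", "no"]))
```

The ValueError branch corresponds to the situation noted in Table 5 for the attribution item: when both raters answer “yes” for every lead, expected disagreement is zero and alpha cannot be computed.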

Results

Table 1 compares the pre- and post-semester means and standard deviations for the number of correct answers students gave on the entire assessment as well as on the grammar and math portions in particular. Cronbach’s alpha for the math items at the start of the semester, .494, was unacceptably low but rose to an acceptable, if questionable, .545 by the end of the semester.⁴⁶ The twenty-five grammar items, by contrast, resulted in acceptable alphas of .645 at the start of the semester and .661 at the end of the semester.
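For reference, Cronbach’s alpha can be computed directly from a students-by-items matrix of 0/1 scores; for dichotomous items like these, it is equivalent to the Kuder-Richardson 20 (KR-20) coefficient. The sketch below uses hypothetical scores, not the study’s data.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    Population variances are used; using sample variances consistently
    gives the same ratio and therefore the same alpha.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*item_scores))             # one column per item
    totals = [sum(row) for row in item_scores]  # one total per respondent
    item_variance_sum = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))


# Hypothetical 0/1 scores: four students by five math items.
scores = [[1, 1, 1, 0, 0],
          [1, 1, 1, 1, 0],
          [1, 0, 1, 0, 0],
          [1, 1, 1, 1, 1]]
print(round(cronbach_alpha(scores), 3))
```

With only five items, as on the math portion, a single weak item can pull alpha down noticeably, which is one plausible reason the math alphas trail the grammar alphas.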

Table 1.

Descriptive Statistics.

Measure	N	M	SD
Pre-semester overall	81	21.35	3.782
Post-semester overall	81	22.80	3.770
Pre-semester grammar	81	18.06	3.367
Post-semester grammar	81	19.20	3.307
Pre-semester math	81	3.28	1.154
Post-semester math	81	3.60	1.211
Valid N (listwise)	81

Table 2 shows the results of comparing the percentage of correct answers on each pre- and post-semester assessment item with an arbitrary standard of 66 percent correct. Based on percentages of correct answers significantly below 66 percent, the answer to RQ1 appears to be that areas of the grammar portion of the assessment most problematic for students at the beginning of the course included aspects of subject–verb agreement, pronoun case, misplaced modifiers, and punctuating quotes. Meanwhile, the most problematic areas of the assessment’s math portion for beginning students appeared to be calculating a percentage change and calculating the area of a circle.

Table 2.

Item Analysis of Pre- and Post-semester Assessment Results.

Question	Topic	Pre-semester correct (count)	Pre-semester correct (%)	Post-semester correct (count)	Post-semester correct (%)	Change (percentage points)
Q1	Subject–verb	125	93*	125	93*	0
Q2	Subject–verb	70	52*	76	57*	5
Q3	Subject–verb	134	100*	130	97*	−3
Q4	Subject–verb	99	74	103	77*	3
Q5	Subject–verb	92	69	96	72	3
Q6	Its/it’s	110	82*	105	78*	−4
Q7	Their/there/they’re	133	99*	131	98*	−1
Q8	Pronoun case	62	46*	86	64	18
Q9	Pronoun case	46	34*	54	40*	6
Q10	Possessives	89	66	94	70	4
Q11	Misplaced modifier	82	61	97	72	11
Q12	Misplaced modifier	78	58*	100	75*	17
Q13	Misplaced modifier	85	63	92	69	6
Q14	Misplaced modifier	100	75*	117	87*	12
Q15	Misplaced modifier	105	78*	118	88*	10
Q16	Ellipticals	115	86*	120	90*	4
Q17	Parallel structure	122	91*	122	91*	0
Q18	Run-on sentences	80	60	79	59	−1
Q19	Comma splice	103	77*	107	80*	3
Q20	Sentence fragment	99	74	105	78*	4
Q21	Sentence fragment	113	84*	116	87*	3
Q22	Commas	100	75*	95	71	−4
Q23	Punctuating quotes	42	31*	61	46*	15
Q24	Punctuating quotes	98	73	101	75*	2
Q25	Punctuating quotes	94	70	84	63	−7
Q26	Percentages	115	86*	122	91*	5
Q27	Percent change	73	54*	88	66	12
Q28	Averaging	110	82*	118	88*	6
Q29	Rate × Time = Distance	98	73	97	72	−1
Q30	Area of a circle	25	19*	36	27*	8

*Significantly different from two-thirds (66 percent) correct, according to a chi-square test (p < .05).

Table 3 shows the results of paired-samples t tests contrasting the average number of correct answers on the entire assessment, the grammar portion, and the math portion at the beginning and end of the semester. Averages for the assessment overall, the grammar portion, and the math portion rose nonrandomly by the end of the semester. The increases were modest in an absolute sense, though: about a third of a question for the math portion, about one question for the grammar portion, and about a question and a half for the overall assessment. Thus, the data support a qualified “yes” to RQ2.

Table 4 repeats the paired-samples t test analysis for each of three groups consisting of the students with the lowest, midrange, and highest percentile scores, respectively, on the initial overall assessment. Among students in the lowest percentile group, average overall and grammar scores rose significantly, by about two and a half questions and about two questions, respectively, and average math scores rose significantly by about half a question. By contrast, none of the gains among the middle percentile group were significant, and only one gain among members of the highest percentile group, about one question on the overall assessment, proved significant. Thus, for both the grammar and math portions of the assessment, the qualified “yes” to RQ2 depends partly on students’ initial ability. The largest knowledge gains, albeit still modest ones, occurred among members of the group with the lowest initial overall scores, while slightly positive but nonsignificant gains occurred among midrange students and, for the most part, among the most advanced students.
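The paired-samples t statistic in Tables 3 and 4 is the mean of the post-minus-pre differences divided by its standard error. The sketch computes t and its degrees of freedom and leaves the p-value to a statistics library (for example, SciPy’s ttest_rel); the helper t_from_summary recomputes t from reported summary values as a consistency check. The raw differences in the first function’s example would be hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired-samples t statistic on post-minus-pre differences.

    Returns (mean difference, t, df). Converting t to a p-value needs the
    t distribution, e.g. scipy.stats.ttest_rel, which this sketch avoids.
    """
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs), mean(diffs) / se, n - 1


def t_from_summary(mean_diff, sd_diff, n):
    """Recompute t from a reported mean difference, SD, and sample size."""
    return mean_diff / (sd_diff / math.sqrt(n))


# Table 3, "Overall" row: M = 1.457, SD = 2.846, n = 81.
print(round(t_from_summary(1.457, 2.846, 81), 2))
```

For the Overall row, the check returns about 4.61, matching the reported 4.606 within rounding; the Grammar and Math rows check out the same way.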

Table 3.

Average Change in Post-semester Assessment Scores Compared with Pre-semester Assessment Scores.

	Mean difference	SD	SE	95% CI lower	95% CI upper	t	df	p (two-tailed)
Overall	1.457	2.846	.316	0.827	2.086	4.606	80	<.001
Grammar	1.136	2.509	.279	0.581	1.691	4.075	80	<.001
Math	0.321	1.127	.125	0.072	0.570	2.563	80	.012

Note. CI = confidence interval.

Table 4.

Average Change in Post-semester Assessment Scores Compared with Pre-semester Assessment Scores by Initial Overall Score Percentile.

Percentile group	Mean difference	SD	SE	95% CI lower	95% CI upper	t	df	p (two-tailed)
Lowest
Overall	2.448	3.031	.563	1.295	3.601	4.350	28	<.001
Grammar	1.931	2.828	.525	0.855	3.007	3.678	28	.001
Math	0.517	1.122	.208	0.091	0.944	2.483	28	.019
Middle
Overall	0.692	2.724	.534	−0.408	1.793	1.296	25	.207
Grammar	0.615	2.334	.458	−0.327	1.558	1.345	25	.191
Math	0.077	1.164	.228	−0.393	0.547	0.337	25	.739
Highest
Overall	1.115	2.519	.494	0.098	2.133	2.258	25	.033
Grammar	0.769	2.141	.420	−0.096	1.634	1.832	25	.079
Math	0.346	1.093	.214	−0.095	0.788	1.614	25	.119

Note. CI = confidence interval.

Spearman’s rho for the data depicted in Table 2, that is, the number of correct pre- and post-semester answers on each of the thirty items, was .939, a statistically significant correlation (p < .01). The finding indicated that the rank order of the items changed little between the pre- and post-semester assessments: what post-semester gains there were tended not to change which items the students were most and least likely to answer correctly.

Krippendorff’s alpha for the two professors’ A to F ratings of the leads written by the students at the end of the semester was .491, on a scale on which 1 indicates perfect agreement and 0 indicates agreement no better than chance. Based on 1,000 bootstrap samples, the alpha’s 95 percent confidence interval, which ranged from .246 to .661, did not include the .667 minimum for drawing even tentative conclusions in situations where the consequences of incorrect decisions are unknown.⁴⁷ Several, but not all, components of the proposed rubric produced much better alphas, as Table 5 shows. The raters showed high agreement on the rubric items pertaining to lead length, first verb placement and voice, and inclusion of a “when” element. Agreement levels closer to chance appeared for items assessing whether the first verb focused on the most newsworthy “what” element of the story, whether the lead included a “where” element, and whether the lead contained grammar, punctuation, or spelling errors. Rater agreement on the item assessing the correct use of AP style fell below chance level (α = −.133), although its confidence interval included zero. Thus, the answer to RQ3 appeared to be that, while the proposed rubric was far from perfect, the majority of its items showed markedly more reliability than the A to F scores awarded based on the raters’ individual grading criteria, even with no advance training in the rubric’s use.

Table 5.

Krippendorff’s Alpha Scores for Items in the Proposed Lead-Writing Rubric.

Item	Alpha	95% CI lower	95% CI upper
Single-sentence lead	.867	0.629	1
Lead of thirty or fewer words	.936	0.809	1
Lead’s first verb expresses the most newsworthy “what”	.010	−0.287	0.307
Lead’s first verb appears in the first half of the lead	.774	0.436	1
Lead’s first verb is active voice	.844	0.636	1
Lead includes, but does not begin with, a “where”	.151	−0.274	0.505
Lead includes, but does not begin with, a “when”	.876	0.690	1
Lead omits attribution, or places it at the end*	n/a	n/a	n/a
Lead uses correct grammar, punctuation, and spelling	.457	0.132	0.729
Lead uses correct Associated Press style	−.133	−0.628	0.292

*Both raters unanimously answered “yes” to this item, so alpha could not be computed.

Combined, the scores given by the raters when using the proposed rubric correlated positively and significantly with the students’ scores on the end-of-semester grammar assessment (r = .349, p < .05). The combined scores given by the raters when using the A to F grading showed a .298 correlation with the post-semester grammar assessment, a relationship that fell just short of statistical significance (p = .059). The answer to RQ4, then, is that the proposed rubric performed at least marginally better than the A to F approach as a predictor of student grammar scores.

Conclusion

The results do not add up to a ringing endorsement of either measure’s quality. Results of the grammar and math assessments showed only modest progress by the end of the semester. Students at the start of the semester averaged in the mid-seventies on the overall assessment and on the assessment’s grammar portion but lower—about 66 percent—on the assessment’s math portion. Interpreted in light of the typical academic A to F scale, the overall and grammar averages would be in the “C” range, while the math averages would be in the upper end of the “D” range. Average scores for all three measures increased significantly on the post-semester assessment but remained in the seventies. Assuming at least a minimal level of instructional effectiveness in the courses the students had taken, one might expect valid measures of math and grammar skills to show more impressive degrees of progress. Meanwhile, about half of the proposed rubric’s items yielded subpar Krippendorff’s alpha figures.

Other results, though, suggest that both the grammar and math assessments and the proposed lead-writing rubric show promise. The grammar and math assessments produced acceptable Cronbach’s alpha reliability scores, at least by the end of the semester. Both assessments also showed a number of statistically significant patterns that have rational explanations. On the math assessment, for example, students tended to do well on calculating averages and percentages. They performed poorly, though, on other, perhaps more technical math skills, such as calculating a percentage change and calculating the area of a circle. Both questions require knowledge and recall of a fairly specific formula, and applying either formula requires following the order of operations correctly. The calculations themselves may have been difficult for some students, too, even though the questions were designed to require fairly easy calculations based on numbers rounded to the nearest 10 or 100. Furthermore, both the grammar and the math measures showed nonrandom improvement by the end of the semester. One would expect such improvement, assuming at least a minimal level of instructional effectiveness. In addition, the significant Spearman’s rho correlation suggested that the components of the grammar and math assessments produced stable results over time. That outcome would be unlikely if the measures had produced merely random results. Finally, the grammar assessment scores correlated significantly and positively with scores from the two instructors’ rubric-based evaluations of students’ straight news leads. One explanation for the correlation is that both sets of scores reflect related aspects of overall writing ability.
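The following Python sketch makes the order-of-operations point concrete by evaluating the two formulas students struggled with. It uses hypothetical round numbers in the spirit of the assessment, since the actual questions are not reproduced here. The second printed line shows a plausible student error: without the parentheses, the division happens before the subtraction.

```python
import math


def percent_change(old, new):
    """Percentage change: subtract first, divide by the ORIGINAL value, then scale."""
    return (new - old) / old * 100


def circle_area(radius):
    """Area of a circle: the exponent is applied before multiplying by pi."""
    return math.pi * radius ** 2


# Hypothetical figures: a budget that grew from 200 to 250.
print(percent_change(200, 250))        # 25.0
# Dropping the parentheses divides 200 by 200 first, yielding a meaningless 150.0.
print(250 - 200 / 200 * 100)           # 150.0
# A circle with a 20-foot diameter has a 10-foot radius.
print(round(circle_area(20 / 2), 1))   # 314.2
```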

Meanwhile, several, albeit not all, of the proposed straight news lead rubric’s items posted good Krippendorff’s alpha reliability measures. Future research may demonstrate that training evaluators in using the rubric can increase reliability scores for the rubric’s problematic components. For example, reliability was surprisingly low for the straight news lead rubric item indicating whether the lead contained, but did not begin with, a “where” element. A review of the students’ leads, however, showed that many had included datelines in their leads. The coders may have arrived at different conclusions about whether the dateline’s location information should be considered part of the lead. The matter would be easy to resolve in training. If nothing else, the approach described in this article provides a standard method for testing such rubrics and singling out their unreliable aspects for refinement. And despite its shortcomings, the proposed rubric proved appreciably more reliable than the “A to F” approach many instructors may default to using in the absence of a rubric. Overall, the results suggest that the proposed grammar assessment, math assessment, and straight news lead rubric may be promising tools for measuring learning in a basic media writing course.

If one accepts that the measures offered reasonably reliable and valid indications of students’ learning levels, the results also offer some practical insights for media writing instructors. Any assessment of student success in the classroom should consider what its results imply for teaching the course being assessed. Specifically, examining improvement on the math and grammar assessments by initial ability level reveals an uneven pattern of benefit in the school’s media writing course. The students who benefited most from the grammar aspects of the course appeared to be those who entered with the lowest levels of grammar ability. Students with the highest initial grammar ability seemed to receive the next-highest degree of benefit. In contrast, students in the midrange of initial grammar ability appeared to receive virtually no benefit. One interpretation of this finding is that the course tends to bring low-end students up to moderate proficiency more than it improves the skills of midrange to high-end students. Given that media writing is often a journalism school’s entry-level writing course, this “leveling” effect may be appropriate. However, the apparent lack of benefit for midrange students is troubling. Student success is the driving force at many universities. The results therefore suggest that journalism schools may need to consider how best to connect with and motivate students who bring an average level of grammar ability to the course, without neglecting students at either extreme. It also is possible that taking the initial assessment alarmed, or piqued the curiosity of, students at the extremes. Meanwhile, it may have reassured students in the midrange that their present skill levels were sufficient.
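The comparison by initial ability level can be sketched in Python as follows. The sketch assumes students are sorted by pre-semester score and split into three equal-size groups (low, middle, high). The grouping rule actually used is not described in this section, and the scores below are invented solely to show the mechanics.

```python
def mean_gain_by_group(pre, post, groups=3):
    """Sort students by pre-semester score, split them into equal-size groups,
    and return each group's mean gain (post minus pre), lowest group first."""
    n = len(pre)
    if n != len(post) or n < groups:
        raise ValueError("need matching pre/post lists with at least one student per group")
    order = sorted(range(n), key=lambda i: pre[i])
    means = []
    for g in range(groups):
        # Integer boundaries so every student lands in exactly one group.
        members = order[g * n // groups:(g + 1) * n // groups]
        gains = [post[i] - pre[i] for i in members]
        means.append(sum(gains) / len(gains))
    return means


# Invented percentage scores for six students.
pre = [50, 60, 70, 80, 90, 100]
post = [60, 70, 70, 80, 95, 100]
print(mean_gain_by_group(pre, post))  # [10.0, 0.0, 2.5]
```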

The significant Spearman’s rho correlation between the pre- and post-semester components of the math and grammar assessments suggested that what improvement there was occurred broadly, across most of the assessment’s measures, rather than only in particular measures. The pattern may suggest a need to focus more instruction on the problem areas identified by the assessment rather than on a broad range of grammar and math skills, some of which students have mastered already. While we are not recommending “teaching to the test,” we are recommending identifying problem areas and teaching the principles students need to know to perform better in those areas.
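The item-level stability check can be sketched as a Spearman correlation between each item’s pre-semester and post-semester proportion of correct answers. This assumes that is how the components were compared, which the section does not spell out. Spearman’s rho is the Pearson correlation of the ranks, with tied values sharing the average of their rank positions. The proportions below are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical proportion of students answering each of six items correctly.
pre_items = [0.90, 0.85, 0.40, 0.30, 0.75, 0.60]
post_items = [0.95, 0.88, 0.45, 0.50, 0.80, 0.70]
print(round(spearman_rho(pre_items, post_items), 3))  # 0.943
```

A high rho here means items that were hard before the semester remained relatively hard afterward, even if every item improved. That is the pattern the paragraph above interprets as broad rather than targeted improvement.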

Limitations

This study has several limitations. A variety of factors may have accounted for the generally poor performance detected by the math and grammar assessments. Students may not have been motivated to perform their best on the assessments, knowing that their scores would not affect their grades. Furthermore, the proposed rubric had shortcomings in clarity, and training in applying it was deliberately omitted. Together, these probably allowed the raters’ varying backgrounds and feelings about writing to influence their ratings. Finally, because the data came from a single school of journalism, the results are not generalizable, although they may be suitable for guiding similar efforts at other institutions.

Despite these limitations, this study may suggest directions for assessment practice and further research. Future studies should examine the criteria educators consider important in grading media writing samples, trends in students’ grammar abilities, and the use of rubrics in assessing media writing.

Appendix

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

Author Biographies

Tricia M. Farwell is an associate professor in the School of Journalism at Middle Tennessee State University.

Leon Alligood is an associate professor in the School of Journalism at Middle Tennessee State University.

Sharon Fitzgerald is a lecturer in the School of Journalism at Middle Tennessee State University.

Ken Blake is an associate professor in the School of Journalism at Middle Tennessee State University.