Abstract
This article examines how marketing students' ratings of instructors and classes on online rating sites such as RateMyProfessors.com can be biased by prior student ratings of the same class. Research has identified potential sources of bias in online student reviews administered by universities. Less has been done on the sources of bias inherent in a ratings site where raters can see prior ratings. To measure how students' online ratings of a course can be influenced by existing online ratings, the study used five prior-ratings experimental conditions: mildly negative prior ratings, strongly negative prior ratings, mildly positive prior ratings, strongly positive prior ratings, and a control condition of no prior ratings. The results suggest that prior online ratings, both positive and negative, bias subsequent online ratings. There are several implications. First, both negative and positive ratings can bias subsequent ratings. Second, negative prior ratings must sometimes be strong in valence to bias subsequent ratings, whereas even mildly positive ratings can have an impact. Last, this bias can potentially influence student course selection.
Introduction
Online reviews and review sites have an important impact on consumers and businesses. Yelp alone had 115 million cumulative reviews by November 2016 (Smith, 2016). Consumers increasingly rely on electronic word-of-mouth (e-WOM) for their purchasing decisions, so ratings can have a major impact on sales (Chevalier & Mayzlin, 2006; Clemons, Gao, & Hitt, 2006; Liu, 2006). Similarly, in marketing education, online reviews can influence student choice of classes and instructors.
Some research has been done on the efficacy and potential sources of bias of online student reviews administered by universities (Estelami, 2015; Guder & Malliaris, 2010, 2013). Less research has been done on the impact of allowing students to rate instructors and courses after seeing prior ratings, as happens on third-party rating sites. Research has looked at RateMyProfessors.com, with some studies finding bias. For example, Legg and Wilson (2010) found that students who voluntarily went online to rate their professors gave them poorer evaluations than those who had been prompted to do so. On the other hand, Otto, Sanford, and Ross (2008) found that the scale items on RateMyProfessors.com (helpfulness, clarity, and ease) were not biased measures of student learning. This study seeks to shed light on another area of potential bias: the degree to which recent online student reviews on sites such as RateMyProfessors.com can influence subsequent reviews. Implications for the validity of online reviews are suggested.
Online Product Reviews and Valence
Online product reviews are an important source of information for consumers shopping both online and in non-online environments (Chevalier & Mayzlin, 2006). Such reviews are written by proactive online consumers who create large amounts of information that influence other consumers’ decisions. There are many general sites, such as Amazon.com and Yelp, and many more specialized sites that act as guides for consumers for specific categories of products and services. Such reviews can be found in places as diverse as online retailing sites, message boards on websites (Chiou & Cheng, 2003), online social networks (J. Brown, Broderick, & Lee, 2007), and virtual online communities (Bagozzi & Dholakia, 2002). Online consumer reviews have become a starting point for many consumers in their search for products and have been shown to be useful in predicting sales of entertainment (Dellarocas, Zhang, & Awad, 2007).
Such online consumer reviews extend the scope and reach of WOM and they are in fact a great source of information for e-WOM, providing information as well as reviews and opinions about products and services (Chatterjee, 2001). WOM is in general more trusted than marketer-generated information, such as advertising, but it is limited by how many people individual consumers have contact with. By contrast, consumer online reviews have a potentially unlimited audience. As with in-person WOM, consumers have a variety of motives for writing online reviews that range from altruistic to self-interested (Hennig-Thurau, Gwinner, Walsh, & Gremler, 2004). Some of the reviews are generated by the marketers of the products. It is sometimes hard for consumers to know for sure how credible such e-WOM is from online reviews.
As a result of this anonymity, consumers may not trust online reviews as much as in-person WOM since they are usually not familiar with those writing them. Racherla, Mandviwalla, and Connolly (2012) found that greater perceived similarity and argument quality of reviewers will lead to greater trust by consumers. The valence of the review may also have an impact. One negative or positive review may not have an impact because it will tend to be attributed to the reviewer, but repeated valenced reviews will influence consumer views (Kim & Gupta, 2012).
Such valenced reviews also serve an informational purpose. Repeated good reviews will be viewed as a source of positive information about the product or service, suggesting it as a good means to fulfill various types of needs, just as numerous negative reviews serve as a warning to consumers about it. Negative information can lead to greater credibility of the review when the reviewer’s identity is known (Kusumasondjaja, Shanka, & Marchegiani, 2012).
Instructor Online Reviews
Just as products and services are rated, instructors are rated as well. Universities themselves conduct teaching evaluations that rate instructors and courses. Online surveys can save administrators the cost of data entry and are more convenient for students, who can provide more extensive comments (Guder & Malliaris, 2010), but administering them raises new challenges, such as timing (Estelami, 2015) and response rates (Guder & Malliaris, 2013).
This research extends these findings by looking at what happens when instructor and course ratings are available online for students to see and rate their instructors. Certainly one benefit of allowing students to see instructor ratings online is that it will provide them information necessary to make good decisions. The information would also preempt less accurate information from rumors or other less professional sources.
Are Online Reviews Accurate?
Given the impact of online reviews/e-WOM, an important question is the issue of veracity. How accurate are these reviews? Are they subject to influences other than just the reviewer’s experience with the product or service?
There are external instructor rating sites, such as RateMyProfessors.com, that give students the ability both to rate instructors and to see the ratings of others. Such rating sites are commonly used by students. M. J. Brown, Baillie, and Fraser (2009) found that about 40% of the overall variance in university-administered student teaching ratings could be related to the variance in teaching ratings on RateMyProfessors.com. Valence is an important part of ratings: the vast majority are either positive or negative, which suggests that, just as in product and service ratings, students come in with strongly valenced feelings about their experiences in a class (Hartman & Hunt, 2013).
Sridhar and Srinivasan (2012) found that prior online ratings did have a moderating effect on the positive and negative aspects of a product or service experience, affecting the reviewer’s rating. This suggests that reviews can be influenced by previous reviews in ways that in turn influence overall perceptions. The question is how this process works. This article suggests two potential processes by which prior online reviews can affect a reviewer’s rating: social modelling and psychological reactance.
Social Modelling
On the one hand, there should be a general tendency for the valence of prior ratings to influence a reviewer’s rating in the same direction. Social modelling suggests that reviewers will use existing ratings as a cue for how they should rate the product. Research has found that families, peers, and other reference groups can have an influence on purchase decisions (Bearden & Etzel, 1981; Childers & Rao, 1992). In an actual experiment, Bevelander, Anschutz, and Engels (2010) found social modelling affected food purchases at a supermarket. What peers purchased in the store influenced the types of products purchased by subjects.
Research on polling finds a similar process. Polls that show majority support for one or another side of an issue can influence public opinion through what is known as the “bandwagon effect” (Marsh, 1985; Rothschild & Malhotra, 2014; Simon, 1954). For example, in an experiment, Marsh (1985) found that perceived public opinion had an influence on attitude toward abortion laws. Later, Rothschild and Malhotra (2014) found bandwagon effects changed opinion to conform to perceived consensus, especially for issues in which subjects did not have very strong prior opinions.
Similarly, social modelling suggests that the valence of ratings others have given to a product or service online should influence subsequent ratings. On an online course rating site, the comments and especially the rating of a course could provide information that would guide students providing new ratings. If there are predominantly negative ratings, there should be a negative influence on new ratings.
Evaluations of various aspects of the instructor and the course can also be influenced by the valence of the prior ratings. Evaluation of the helpfulness or clarity of the instructor or overall perceived satisfaction with such a course can be affected by negative information from ratings and also by positivity as well (Larsen, Smith, Kyle, & Cacioppo, 1998). Thus,
On the other hand, if there are predominantly positive ratings, the ratings of online reviewers should be influenced to be more positive. Thus,
There may also be circumstances in which an opposing process interferes with the peer influence of social modelling on the valence of online ratings. Sometimes raters are either not influenced by existing ratings or are even influenced in the direction opposite to their valence. Comments on ratings sites include statements such as “I don’t understand all the (positive/negative) comments” or complaints about how “unfair” the existing comments are.
Psychological Reactance
Reactance theory suggests that when freedom to make a certain choice or engage in a behavior is threatened, that threatened choice or behavior becomes more attractive (Lessne & Notarantonio, 1988). The classic example of reactance is increased desire in children when they are told they cannot have something (Rummel, Howard, Swinton, & Seymour, 2000). One of the key aspects of the reactance effect is that it is greater as the pressure to comply increases (Brehm & Sensenig, 1966). Fitzsimons and Lehmann (2004) found that unsolicited recommendations by experts contradicting a respondent’s initial impressions lead to reactance, which led to a behavioral backlash and behavior contradicting the advice. These findings suggest that prior online ratings that strongly contradict a reviewer’s initial impression will influence his or her rating in the opposite direction.
Applying this to student ratings of courses, the rating of students who see strongly negative or strongly positive reviews prior to rating may be more likely to experience reactance than if there were mildly negative or positive reviews. Rather than social modelling, these prior reviews may trigger a backlash, leading students to view strongly negative ones as “mean” or strongly positive ones as exaggerated. As a result, such ratings may have less of an impact than moderately valenced ratings. If there is reactance, the overall valence of respondents will similarly influence evaluation of all of the various aspects of the instructor and course, so
Study
This study examines the process of how valence of prior ratings affects subsequent ratings and tests the prior hypotheses. Specifically, it attempts to determine if students will model their ratings and evaluations of a course after those that rated previously. A total of 162 usable responses from undergraduate marketing students from a large public university in the southwestern United States were collected. There were 32 in the mildly negative ratings condition, 31 in the strongly negative ratings condition, 32 in the mildly positive ratings condition, 38 in the strongly positive ratings condition, and 29 in the no ratings control condition.
Respondents were asked to view a video clip, a little more than a minute in length, of a lecture for an advertising course. This method was used because video clips of even 30 seconds in length have been found to be predictive of end-of-the-semester teaching evaluations, so the clip offered a realistic student evaluation of teaching (SET) stimulus (Ambady & Rosenthal, 1993). The exact same video was shown to respondents in all experimental conditions. As a manipulation check of whether they had viewed the video, respondents were first asked to identify the topic of the course: accounting, advertising, or management. Responses from respondents who failed to correctly identify the course were discarded. Respondents were then asked to think about this course and lecturer and make evaluations.
Respondents were shown prior ratings that they were told were the latest compiled from those of students who had recently completed the questionnaire. These ratings ranged from 1 to 5, 1 = terrible and 5 = outstanding. Students in the mildly negative ratings section were shown prior ratings that averaged 2.75, and in the strongly negative ratings section the ratings averaged 1.5. Students in the mildly positive ratings section were provided with ratings that averaged 4.0, and in the strongly positive ratings section the ratings averaged 4.75. Students in the control condition were shown no ratings.
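The design described above can be summarized compactly. The following is only an illustrative sketch: the class name `Condition`, the constant `SCALE_MIDPOINT`, and the offset calculation are assumptions introduced here and are not part of the original study materials. The prior-rating averages and cell sizes are the ones reported in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Condition:
    """One experimental cell (hypothetical helper type, not from the article)."""
    name: str
    prior_mean: Optional[float]  # average prior rating shown (1-5 scale); None = no ratings shown
    n: int                       # usable responses reported for this cell


# Values taken from the Study section.
CONDITIONS = [
    Condition("mildly negative", 2.75, 32),
    Condition("strongly negative", 1.5, 31),
    Condition("mildly positive", 4.0, 32),
    Condition("strongly positive", 4.75, 38),
    Condition("control", None, 29),
]

# Assumption: 3 is treated as the neutral point of the 1-5 scale.
SCALE_MIDPOINT = 3.0

total_n = sum(c.n for c in CONDITIONS)

# Signed distance of each displayed average from the scale midpoint.
offsets = {
    c.name: c.prior_mean - SCALE_MIDPOINT
    for c in CONDITIONS
    if c.prior_mean is not None
}

if __name__ == "__main__":
    print("total usable responses:", total_n)
    for name, offset in offsets.items():
        print(f"{name}: {offset:+.2f} from midpoint")
```

The cell sizes sum to the 162 usable responses reported for the study.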
Students were then asked to provide their own rating of the class, from 1 star (terrible) to 5 stars (outstanding), the rating system used on RateMyProfessors.com. They were then asked to estimate the percentage chance that they would recommend this class to other business school students, a measure commonly used to rate products and services (Kozak, Rimmington, & Kozak, 2000). The last of the introductory questions asked students to provide three words that they felt best described the class, to provide qualitative information about their perceptions. This manipulation imitated the way the most recent ratings on online rating sites are the most visible to students.
Students then completed an online questionnaire. To be realistic, course-related measures were similar to those found in RateMyProfessors.com and were informed by instructor rating measures used in other educational research studies (Ackerman & Gross, 2010; Sidelinger & McCrowsky, 1997). There were three-item measures asking how easy (M = 4.15, α = .78), clear (M = 4.89, α = .92), and helpful (M = 4.80, α = .93) students felt the instructor was. There was also a measure of student satisfaction with the class (M = 4.85, α = .96). To examine discriminant validity, the dimensionality of the items was tested through confirmatory factor analysis. The analysis found a five-factor model to have the best fit. This five-factor model had a goodness-of-fit index of .93, a root mean square residual of .12, and total explained variance of 86%.
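The reliability coefficients (α) reported for the three-item measures are presumably Cronbach's alpha. As a minimal sketch of how such a coefficient is computed, the function below applies the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of summed scores). The item scores are made-up illustrative values on an assumed 7-point scale, not data from the study, and the names `cronbach_alpha` and `helpful_items` are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    items: list of k lists; each inner list holds one item's scores,
           with one score per respondent in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs one score per respondent")

    # Each respondent's total score across the k items.
    totals = [sum(item[i] for item in items) for i in range(n)]

    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance(totals)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses of six students to three "helpfulness" items.
helpful_items = [
    [5, 6, 4, 7, 3, 5],
    [5, 7, 4, 6, 3, 4],
    [6, 6, 5, 7, 2, 5],
]

if __name__ == "__main__":
    print(f"alpha = {cronbach_alpha(helpful_items):.2f}")
```

Values close to 1 indicate that the items move together across respondents, which is the pattern the reported coefficients of .78 to .96 suggest.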
Measures related to perceptions of the instructor and class loaded cleanly onto five factors using varimax rotation. Questions a1 to a3 measured the helpfulness of the instructor, with loadings of .71 to .73. Questions a4 to a6 pertained to the clarity of the instructor, with loadings of .81 to .85. Questions a7 to a9 measured the degree to which students felt the instructor’s course would be easy, with loadings of .71 to .88. Last, perceived satisfaction with the course, measured by Questions a10 to a12, had loadings of .86 to .88. These factors are displayed in the appendix.
Last, there were measures assessing the students themselves. There was a three-item measure of self-efficacy (Gist & Mitchell, 1992; Gist, Stevens, & Bavetta, 1991) regarding assessment of a business course (M = 5.72, α = .96), Questions b1 to b3. There was also a three-item scale (M = 3.18, α = .94) measuring students’ implicit theories of performance and ability (Dweck, 2005; Dweck & Leggett, 1988), Questions b4 to b6. A high score on this scale indicates the belief that one’s abilities are fixed and cannot change (an entity view), whereas a low score indicates the belief that abilities are malleable and can change (an incremental view). Last, there was a three-item measure (M = 4.86, α = .82) of need for cognition (Chang & Yen, 2013; Haugtvedt & Petty, 1992), Questions b7 to b9.
These measures are displayed in the appendix. Afterward, students were debriefed regarding the nature of the study.
Results
A MANCOVA of prior ratings condition (“mildly negative,” “strongly negative,” “mildly positive,” “strongly positive,” and “none”) on the dependent variables was conducted to determine students’ basic reactions to the prior ratings they were exposed to. Self-efficacy regarding marketing was used as a covariate to remove its influence on the dependent variables (Field, 2013). This analysis found significant differences for ratings as well as for several of the evaluations of the instructor and the course (F[5, 126] = 5.42, p < .01, Wilks’ Lambda = .84, partial eta-squared = .16). The results are displayed in Table 1.
Table 1. Means for Prior Ratings (n = 162).
Note. Means with completely different superscripts are significantly different from each other.
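The Wilks’ Lambda and partial eta-squared values reported for the omnibus test are related by a conversion commonly used for multivariate effect sizes, partial η² = 1 − Λ^(1/s). The sketch below illustrates that arithmetic only. The function name is hypothetical, and the value of `s` is an assumption: the article does not report it, but s = 1 reproduces the published figures.

```python
def partial_eta_squared_from_wilks(wilks_lambda, s=1):
    """Convert Wilks' Lambda to a multivariate partial eta-squared.

    wilks_lambda: reported Wilks' Lambda, in the interval (0, 1].
    s: number of dimensions used in the conversion (assumed to be 1 here,
       because the article does not state it).
    """
    if not 0 < wilks_lambda <= 1:
        raise ValueError("Wilks' Lambda must lie in (0, 1]")
    if s < 1:
        raise ValueError("s must be at least 1")
    return 1 - wilks_lambda ** (1 / s)


if __name__ == "__main__":
    # Reported value: Wilks' Lambda = .84
    print(round(partial_eta_squared_from_wilks(0.84), 2))
```

With Λ = .84 and s = 1, the conversion gives .16, which matches the reported partial eta-squared.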
Impact on Ratings
Results of analysis of means for the chance that students would recommend the class to others found that means were significantly higher in the strongly positive condition and significantly lower in the strongly negative condition, but not significantly different between the mildly negative, mildly positive, and control conditions.

Course rating by condition.
Similarly, the chance of recommending the course to others was significantly higher in the strongly positive condition than in the other conditions and significantly lower in the strongly negative condition, but not significantly different between the mildly negative, mildly positive, and control conditions.
Impact on Evaluations of the Instructor and the Course
For the perceived ease of the course, the means for mildly and strongly positive prior ratings were not significantly different from each other but were significantly higher than the means for mildly negative ratings. Perceived clarity of the instructor was highest when there were either mildly or strongly positive ratings and lowest when there were strongly negative ratings. There were no differences between the mildly negative and control conditions.
Qualitative Results
The descriptive words students wrote about the course were counted by condition and tabulated in order from most frequent to least frequent. The five most frequently mentioned words for the mildly negative ratings condition, in order, were “boring, interesting, useful, difficult, informative.” The five most frequently mentioned words for the strongly negative ratings condition, in order, were “boring, easy, confusing, dull, not helpful.” The five most frequently mentioned words for the mildly positive ratings condition, in order, were “interesting, practical, hard, easy, informative.” The five most frequently mentioned words for the strongly positive ratings condition, in order, were “interesting, practical, informative, time consuming, insightful.” Last, the five most frequently mentioned words for the control condition, in order, were “informative, interesting, boring, interactive, common sense.”
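The tabulation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: the function name `top_words`, the lowercasing and whitespace trimming, and the sample responses are all hypothetical. Each response is treated as a whole phrase so that multiword descriptions such as “not helpful” or “time consuming” are counted as one entry.

```python
from collections import Counter


def top_words(responses, k=5):
    """Return the k most frequent descriptive words or phrases.

    responses: list of per-student lists, each holding the (up to) three
               words or phrases that student wrote.
    """
    counts = Counter(
        phrase.strip().lower()   # assumed normalization: trim and lowercase
        for student in responses
        for phrase in student
        if phrase.strip()        # skip blank answers
    )
    return [phrase for phrase, _ in counts.most_common(k)]


# Hypothetical responses from four students in one condition.
sample_condition = [
    ["Boring", "easy", "not helpful"],
    ["boring ", "confusing", "easy"],
    ["boring", "dull", "confusing"],
    ["Boring", "easy", "long"],
]

if __name__ == "__main__":
    print(top_words(sample_condition, k=3))
```

Running the same function once per condition would yield one ranked list for each of the five groups, in the form reported above.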
Discussion
The findings of this study suggest that social modelling has a significant impact on online ratings on sites like RateMyProfessors.com, with a few visible positive or negative ratings biasing subsequent ratings. This is bad news for those concerned about the impact of the few students who use such sites to take revenge for a bad grade. Subsequent ratings will fall in line with such negative ratings and influence overall averages for a long time in a vicious cycle. On the other hand, a few good ratings can also have a long-term impact on overall ratings in what could be thought of as a virtuous cycle. This suggests that online rating results can be gamed either by individual raters or by the individual being rated, resulting in significant inaccuracies.
These results support findings of research in political polling, which find that polls not only reflect but can also influence public opinion through a bandwagon effect. Seeing polls that find a majority of people believe one way or another on a certain issue can influence others to fall in line as long as they do not have strong prior opinions about that topic. The magnitude of this impact can depend on the strength and presumably valence of prior opinions (Marsh, 1985; Rothschild & Malhotra, 2014).
A second important finding is that social modelling may have a different impact depending on whether ratings are positive or negative. More often than not, mildly negative ratings did not have a significant impact on the dependent variables and only strongly negative prior ratings did. By contrast, for most of the dependent variables, both mildly and strongly positive prior ratings influenced responses. This suggests that those doing online teaching ratings may discount mildly negative prior ratings, perhaps perceiving them as less informative, until the ratings reach a certain level of negativity, at which point they have a large influence. For the overall rating, respondent reaction to positivity in prior ratings was similar (no impact for mild positivity but a strong impact for strong positivity), yet even mild positivity influenced perceptions of the instructor. Perhaps there is a tendency to discount mild negativity as the norm in rating people, especially for students rating instructors after grades are turned in, whereas positivity is a believable cue that has more impact.
A third finding of this study is that, at least for evaluations of instruction, lecture, and people in general, the above-mentioned results are remarkably robust. The results were invariant to personality and outlook of student. Social modelling affected students in exactly the same way regardless of implicit theory, need for cognition, or self-efficacy about classes. Regardless of high or low levels of these personal attributes, student ratings were influenced by prior ratings, sometimes with large differences from the control condition.
This study also found that, beyond the ratings themselves, various perceptions of classes or instructors can be influenced merely by the positivity or negativity of the ratings of others. The perceived clarity and helpfulness of the instructor, and even satisfaction with the course, are influenced by the valence of prior ratings. The more negative the prior ratings, the more difficult respondents perceived the course to be. Given that all of these aspects of a course are factors in student choice, prior ratings bias can have a significant influence on enrollment in a course.
Last, pretests suggest that the results of this study may be less applicable when no person is being rated. A pretest asking for an evaluation of a new course syllabus, with no lecture or person involved, found that when students saw unusually high or unusually low ratings, they did not react as much as when they saw less extreme ratings. On the other hand, even in this pretest, the ratings in the strongly negative and strongly positive prior ratings conditions were still lower and higher, respectively, than in the control condition, suggesting that social modelling still has an impact even when there is no human aspect to what is being rated.
Limitations
This study is limited in that, although students had time to make a judgment of the lecturer in the video clip, they did not have the time to form the emotional attachment over the course of a semester that might mitigate the effects of social modelling. Future research could find a way to examine the impact of prior ratings on the ratings of real classes and real instructors. Interviews or verbal protocols would also help to examine the process in more depth.
This study is also limited in that subjects were not given the option to opt out of rating the instructor and class. It is likely that under some conditions, students would choose not to provide a rating. Many instructors on RateMyProfessors.com are rated by only one or two people a semester at most. Future research needs to look at who rates on online rating sites and at the influence of prior ratings on the propensity of students to post a rating. Perhaps extreme ratings, especially negative ones, discourage them from rating a class. On the other hand, ratings in the middle may increase the propensity of students to evaluate a class on an online site.
Implications
This study has public policy implications for the dissemination of information about marketing courses and instructors. Some business schools and universities provide student-evaluation information about courses online. The results of this study suggest that such evaluations may help combat the inaccuracies of third-party rating sites, such as RateMyProfessors.com. University-administered SETs are collected from all students and before course grades are turned in, so retaliation is not an issue and the evaluations are likely to be more accurate.
The results also have implications for the validity of third party rating sites that allow evaluation of marketing instruction. The negative implications are that teaching ratings sites have low validity in general and are easily influenced by factors that have no relation to actual perceptions. A few bad or malicious reviews can influence subsequent evaluations on that same site for some time to come. Conversely, marketing instructors can rely on past positive evaluations to boost ratings of future teaching performance.
These results have implications for research in online evaluation not just for education but also for products and services. Angry consumers or even competitors could cause significant damage to a firm’s reputation by posting a series of negative ratings. They could lower future evaluations for a period of time, which could in turn influence other evaluations as well, creating a domino effect. It is also possible that strongly negative ratings could prevent those who would otherwise post positive ratings from doing so. Conversely, a series of positive ratings could help a firm’s reputation above and beyond just the impact of those ratings alone.
These results also have implications for polling research in the impact of valence. It is likely that polls showing majority support for an issue or person may have a different impact on public opinion than those that show majority disagreement. As in online ratings, it is possible that negativity may require a higher threshold than positivity to create the bandwagon effect observed in that literature.
Last, this research also has implications for the relatively recent phenomenon of online mobs on social media and rating sites. It suggests that such attacks, often with hundreds of extremely negative ratings in a short period of time by those who may or may not have directly experienced what they are rating, can present an inaccurate view and do long-term damage to the reputation of a person or business. The results suggest that after a significant number of negative ratings appear on an online rating site, subsequent ratings will not simply return to normal over time. For whatever reason, people will model, not resist, the online mob sentiment. Given these findings, such sites may need to consider starting ratings anew every year or half year to combat such abuses.
Appendix
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
