While college students have been completing evaluations of faculty since the 1960s, the importance of these evaluations has increased. There are various reasons for this. One is a conceptual shift towards the idea that a college is a business and students are customers; on this model, student evaluations of faculty are part of the customer satisfaction survey process. A second is an ideological shift in education: education is increasingly seen as a private good and as something in need of quantification. This is also tied to the notion that the education system is, like a forest, a worker, or an oilfield, a resource to be exploited for profit. Student evaluations provide a cheap method of assessing the value provided by faculty and, best of all, they provide numbers.
Obviously, I agree with the need to assess performance. As a gamer and runner, I am obsessed with measuring my athletic and gaming performance, and I am comfortable letting that obsession spread into my professional life. I want to know if my teaching is effective, what works, what does not, and what impact I am having. Of course, I want to be sure the assessment methods are useful. Having been in education for decades, I do have concerns about the usefulness of student evaluations of faculty.
The first and most obvious concern is that students are, almost by definition, not experts in assessing education. While they obviously take classes and observe faculty, they usually lack any training in assessment. Having students evaluate faculty could be seen as on par with having sports fans assess coaching. While fans and students can have strong opinions, this does not qualify them to provide meaningful professional assessment.
Using the sports analogy, this can be countered by pointing out that while a fan might not be a professional at assessing coaching, they usually know good or bad coaching when they see it. Likewise, a student who is not an expert at education can still recognize good or bad teaching.
A second concern is the self-selection problem. While students have access to the evaluation forms and can easily go to Rate My Professors, those who take the time to complete an evaluation will usually have stronger feelings about the professor. These feelings can distort the results so that they are more positive or more negative than they should be. The counter to this is that the creation of such strong feelings is itself relevant to the assessment of the professor. A practical way to counter the bias is to ensure that most (if not all) students in a course complete the evaluations.
Third, people often base their assessments on irrelevant factors. These include such things as age, gender, appearance, and personality. The concern is that these factors make evaluations a popularity contest: professors who are liked will be evaluated as better than professors who are not as well liked. There is also the concern that students tend to give younger professors and female professors worse evaluations than older professors and male professors; these sorts of gender and age biases lower the credibility of evaluations.
A stock reply to this is that these factors do not influence students as strongly as critics might claim. So, for example, a professor might be well-liked yet still get poor evaluations in regard to certain aspects of the course. There are also those who question the impact of alleged age and gender bias.
Fourth, people often base assessments on irrelevant factors about the course, such as how easy it is, their grade, or whether they like the subject. Not surprisingly, it is commonly held that students give better evaluations to professors they regard as easy and downgrade those they see as hard.
Given that people often base assessments on irrelevant factors (a standard problem in critical thinking), this is a real concern. Anecdotally, my own experience indicates that student assessment varies based on irrelevant factors the students explicitly mention. I have a 4.0 on Rate My Professors, but there are inconsistencies between evaluations. Some students claim that my classes are incredibly easy (“he is so easy”), while others claim they are incredibly hard (“the hardest class I have ever taken”). I am also described as being both very boring and very interesting, both helpful and unhelpful, and so on. This sort of inconsistency is common and raises concerns about the usefulness of such evaluations.
One counter is that the information is still useful despite these inconsistencies. Another is that appropriate methods of statistical analysis can address the concern. Those who defend evaluations also point out that students tend to be generally consistent in their assessments. Of course, consistency in evaluations does not entail accuracy.
To close, there are two general concerns about evaluations of faculty. The first is a concern about values: what is it that makes a good educator? This is a matter of determining what we are supposed to assess and what to use as the standard of assessment. The second is a concern about how well the method of assessment works.
In the case of student evaluations of faculty, we do not seem to be very clear about what we are trying to assess, nor do we seem to be entirely clear about what counts as being a good educator. As for the efficacy of the evaluations, to know whether they measure well we would need some other means of determining whether a professor is good. But if there were such a method, then student evaluations would seem unnecessary: we could simply use that method instead. To use an analogy, when it comes to football we do not need fans to fill out evaluation forms to determine who is a good or bad athlete: there are clear, objective standards of performance.
