My adventures in assessment began in 2004 when I was assigned to the General Education Assessment Committee. I have served on that committee since then and have become the co-chair. I have also always done the assessment for the Philosophy & Religion unit (it is part of a larger department formed during the consolidations of the 1970s). As would be expected, assessment for the unit is just an extra smooshed into the infinitely expandable 20% of my Assignment of Responsibility (which also includes advising, research, web mastering, facilitating, publishing, and 4-9 committees).
When I first started in the assessment business, most of it seemed quite arbitrary. For example, we were told by the assessment guru that we needed five outcomes to assess. I still assess five outcomes. As another example, we were told that our goal should be that 80% of Xs do Y; for example, that 80% of students achieve a rating of competent or better in an outcome. When one asked why 80% was picked, the answer was simply that this was what was to be done. To be fair, almost everyone uses 70% as the cutoff for a C grade because that is what has always been done and what everyone does. Which is to say that we use an appeal to tradition and an appeal to common practice.
As is always the case, the years have seen more and more added into the assessment process—I now attend multiple committee meetings, collect data from all the faculty in the unit, and create stacks of spreadsheets with sweet pie charts. I then do an analysis of each outcome, craft improvement narratives and write an extensive reflection. All of this gets loaded into a LiveText system for review by multiple levels of administrators. Its final fate is to be part of various collective reports. While much could be written about any part of this system, the focus of this essay is on the improvement narrative.
As noted above, the performance of the unit is assessed in terms of five outcomes. For example, the written communication skills of students are assessed as part of the written communication outcome. Once the data for the academic year is gathered and properly spreadsheeted, I do an analysis and then write an improvement narrative for each assessment method within the outcome. For example, I do an analysis of draft performance relative to final paper performance as part of the written communication outcome and then write an improvement narrative that maps out a timetable and budget for the improvement. The budget part is easy: we do not get a budget for any improvement, so that is a simple copy-paste operation.
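To make the bookkeeping concrete, here is a minimal sketch in Python of that sort of outcome calculation. The rubric scale, the scores, and the names are all hypothetical; the only things taken from the process above are the "competent or better" threshold and the 80% target.

```python
# Hypothetical rubric scores (a 1-4 scale is assumed) for the same
# papers at the draft stage and at the final stage.
draft_scores = [2, 3, 4, 2, 3, 3, 1, 4, 3, 2]
final_scores = [3, 4, 4, 3, 3, 4, 2, 4, 3, 3]

COMPETENT = 3   # assumed rubric level for "competent"
TARGET = 0.80   # the traditional 80% target discussed above

def pct_competent(scores, threshold=COMPETENT):
    """Fraction of students rated 'competent' or better."""
    return sum(s >= threshold for s in scores) / len(scores)

for label, scores in [("draft", draft_scores), ("final", final_scores)]:
    pct = pct_competent(scores)
    verdict = "meets" if pct >= TARGET else "falls short of"
    print(f"{label}: {pct:.0%} competent or better "
          f"({verdict} the {TARGET:.0%} target)")
```

The real version, of course, involves stacks of spreadsheets and pie charts rather than ten invented numbers, but the arithmetic behind each outcome is no deeper than this.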
The idea of writing an improvement narrative can be reasonable. After all, if the unit is falling short on an outcome or otherwise encountering problems, then it makes sense to develop a plan for addressing these shortcomings or problems. For example, the students tend to do poorly in the area of written communication. One obvious reason for this is that students are generally less prepared in K-12 in writing argumentative papers than they were in the past. As such, the faculty must address this shortcoming in student preparation in order to bring our results up to the 80% target. That said, improvement narratives can also be problematic.
One practical concern is that there is a limit to improvement. To use an obvious analogy to sports, an athlete can only improve so far before the cost of improvement becomes prohibitive, and there is a point at which further improvement is no longer possible. The same is true in education: even where improvement remains possible, the cost of achieving it can become prohibitive, and hence there is a point at which improvement is not rational. One example of this is when trying to improve in one area must come at the expense of making another area worse. To illustrate, the more a philosophy professor tries to make up for the K-12 educational shortfall in writing skills, the less time they will have to address everything else, such as philosophy. Continuing the athlete analogy, it makes sense that professors also have skill caps: they will reach a point at which they cannot get any better and improvement will no longer be possible. It might be objected that a professor can always improve something and hence there must always be improvement narratives. This leads to the more abstract concern of endless improvement.
As might be imagined, one cannot simply write “we are meeting our goals and need no improvement” as part of the improvement narrative. There must always be an improvement narrative, and it must always include improvements. This assumes the possibility of endless improvement. On the one hand, it could be claimed that there is always room for improvement—unless the professor is a perfect being teaching a perfect class to perfect students. This is obviously true. On the other hand, endless improvement is a practical impossibility. One could even say that there is a seeming paradox here: imperfect beings can always improve, but imperfect beings will always hit their limits. As such, imperfect beings cannot always improve. So, what should be done about improvement narratives?
One approach is to be practical and realistic: focus on actual problems and areas that can be improved while having the freedom to claim that areas where the acceptable levels are being maintained are not going to be improved. That is, to say that the performance is good enough given the available resources and pay. This is what I strive to do and is what good assessment should be like. I take the same approach to my running: I know where I can improve and, more importantly, that time and injuries have put a hard cap on my performance. That is to say that without magic or technology, I’ll never run another 2:45 marathon or a 33 minute 10K.
Another approach is to engage in the narrative for the sake of the narrative: if people must write an improvement narrative for everything, even when they know damn well that there is no way that something is going to be improved, people will craft suitable fictions to appease the administration. On the plus side, this will check all the requisite boxes. On the minus side, it is demoralizing to do this and, to be honest, dishonest. I try to avoid doing this.
One especially interesting thing about improvement narratives is the expectation, as noted above, of endless improvement. However, most faculty salaries are largely detached from performance and improvement: faculty receive no improved compensation or benefits for improving the results of outcomes. The only practical incentive is to stay above the level at which one can be fired for cause and to perform well enough for tenure and promotion; beyond that, there is no practical incentive to improve. Most faculty, of course, are motivated by pride and professionalism to do a good job and improve—so it is good that people are not solely motivated by gain. But higher education is being pushed ever deeper into the business model, and a key concept in business is that you get what you pay for. So, if constant improvement is expected, then there should be a corresponding increase in compensation. If such improvement is valued and required, then there should also be adequate support provided to achieve it. Expecting someone to always improve without any improvement in compensation and without any increase in resources is to expect far too much. Now, back to my data analysis and narrative crafting.
“just an extra smooshed into the infinitely expandable 20% of my Assignment of Responsibility (which also includes advising, research, web mastering, facilitating, publishing, and 4-9 committees).”
Our responsibilities are somewhat less defined, but have the same kind of “scope creep”. We have three main responsibilities – teaching, scholarship, and service. Each of those general categories has sub-categories, divisions, and levels. As of right now, “Scholarship” includes a wide variety of activities related to research, publication, conference participation, and (for those in the arts) performance or exhibition. There are various levels – for example, speaking at an international conference carries more weight than speaking at a local or regional one – but the common denominator is peer-review. Self-publication does not count, nor does an exhibit in a gallery for which there is no jury.
Similarly, “service” means service to the department, service to the school, the college, the institute, or the community. Committee work is in that mix – and I’ve served on the portfolio committee (school), the facilities committee (college), and the Academic Affairs committee (institute), and have chaired a local professional chapter of an international computer graphics association (community).
During my tenure process, one of my mentors described this all as “swimming upstream towards a moving target”. I think that by design there is always room for improvement.
“Teaching” is supposed to be evaluated on a combination of factors – peer review, outcomes, student evaluations, and others – but is largely (and erroneously) based on student evaluations. I served for the last few years on a task force charged with the assessment of this student evaluation system, which of course bears with it the responsibility of making recommendations for improvement. The system, regardless of what “flavor” is used, is a nationally accepted one – and there is a steady stream of critical articles, published almost monthly, claiming inherent bias, unconscious bias, influence of irrelevant data, misinterpretation of scores, and a host of other factors that might lead one to believe that the most effective methodology for improvement would be to scrap the system altogether. We can’t do that, of course, because that would leave us with nothing (“So?” we might ask … ), and so the improvement narratives continue.
Your dilemma is a little different, but reminds me of a couple of things. The first is Zeno’s paradox of Achilles and the tortoise. While “perfection” is an ideal, it is impossible if only by decree. Therefore, your improvement narrative must be incremental by percentage, and therefore infinite. How to measure this, of course, is tricky – especially in fields as subjective as writing or philosophy. Art and design also have this inherent imprecision – maybe more so, inasmuch as sometimes quality or innovation is based on breaking the rules we teach. So part of our assessment process is a general look at things like placement or application numbers, or overall portfolio or SAT scores of applicants. While certainly not an exact metric, if we notice that the SAT scores of applicants trend upward over, say, a 10-year period, and the placement numbers and values follow suit, our general assessment is that we’re getting better.
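Purely as an illustration of that kind of trend check, here is a rough Python sketch. The numbers are invented; the only idea borrowed from the comment above is fitting a trend line over roughly a decade of applicant metrics.

```python
# Hypothetical 10-year window of mean applicant SAT scores.
years = list(range(2014, 2024))
avg_sat = [1150, 1160, 1155, 1170, 1180, 1175,
           1190, 1200, 1195, 1210]

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

trend = slope(years, avg_sat)
print(f"Applicant SAT trend: {trend:+.1f} points/year "
      f"({'improving' if trend > 0 else 'not improving'})")
```

A positive slope over the window is exactly the "not an exact metric, but we seem to be getting better" signal described above, nothing more.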
The second is tied up with Jewish thought, and how we expect, accept, and embrace our own imperfection. (Of course, this line of thinking is put forth by those with skull caps, not skill caps, but it still applies). Without it, of course, there is no improvement, no struggle. The thought continues to a more spiritual end, counseling that to strive to absolute perfection is in itself a sin, because it is an attempt to become God.
There is a tradition on Passover wherein during the week leading up to the holiday we clean house – it is an attempt to rid ourselves of chometz, or leaven. This serves a practical as well as spiritual purpose. The practical purpose is akin to spring cleaning – we clean our cabinets and remove dishes, seeking every last crumb of leaven, even though we know the expectation of completion is futile. The spiritual purpose is the same, a kind of mental or spiritual cleansing, an attempt to address personal flaws – which is also futile.
And so the tradition includes a loophole, which is basically the admission of imperfection – “I’ve done everything I can, but I know there’s more to do …”. So we write up a blanket contract, selling all remaining chometz to the rabbi – thus disavowing any ownership of, or responsibility for it.
But I digress.
“That is to say that without magic or technology, I’ll never run another 2:45 marathon or a 33 minute 10K.”
You could always buy a broken watch.
Good points; you should write an article on applying the philosophy of Passover to assessment!
You are right about the standards of assessing teaching. As you note, student evaluations are not good measures – nor is their self-evaluation. As part of the unit assessment, I run a survey of the students that includes their self-reporting on their competency in the 5 areas we assess. I do the same thing for the GENED courses. As I am sure you have already guessed, about 97-99% of students rank themselves at competent or better in everything, although their performance in the courses tells a rather different tale.
Thanks to the widespread use of chip timing, broken watches are no good. I’d need a magic watch.
How odd – that “self reporting” and “performance metrics” would yield different results!
Whenever I teach a course other than an intro course – that is, one that is second or third in a sequence – I routinely give a quiz or administer a survey based on the learning objectives of the previous course. The results help me to establish a baseline for beginning my class, or identify a student or group of students who need some remedial help in order to bring them up to the level of the class.
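A tiny sketch of that baseline idea, with an invented threshold and invented scores, might look like this:

```python
# Flag students whose quiz on the previous course's learning objectives
# falls below the level this course assumes. Cutoff and data invented.
BASELINE = 0.70  # hypothetical mastery cutoff for the prior objectives

quiz_scores = {"Avery": 0.85, "Blake": 0.55, "Casey": 0.72, "Drew": 0.48}

needs_review = [name for name, score in quiz_scores.items()
                if score < BASELINE]
class_avg = sum(quiz_scores.values()) / len(quiz_scores)

print(f"Class baseline: {class_avg:.0%}")
print(f"Students needing remedial review: {', '.join(needs_review) or 'none'}")
```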
Occasionally, I will see a trend – where the students emerging from section 01 perform markedly worse than those from section 02 – which could very well point to the relative effectiveness of the instructors, but this data is not used.
Time and maturity are also unmeasured factors. In my own education, there was one particular required class that I absolutely hated – I hated everything about it. I failed to see the relevance to my education, and I thought the professor was overly defensive about the importance of the subject matter. I think I got a “C” in the class, and I think it was a gift. The professor fared worse in my evaluations.
Today, I look back and regard that as one of the most important classes I’ve ever taken; I refer to it in almost every class I teach in one way or another, and I use what I learned there in my personal work as well. I wish the professor were alive today so I could tell him – but sadly he passed away very shortly after my graduation.
As for the race timing thing, do you remember Rosie Ruiz?
I know; it was an amazing discovery! I should get some sort of prize for that.
Good point. At any workplace, people tend to know who does good work and who does not, but in higher education not much is done to address professors who do a bad job teaching. In fact, there are cases in which the worst teachers are the most lauded professors, at least as researchers or academic stars. Research is important, but students who are there to learn certainly deserve quality teaching. Then again, if they are just there to check off a graduation requirement, then they might deserve a professor who is just there to check off a work requirement before getting back to working on that book on Marx and Facebook.
Somewhere I read that student evaluations were anti-correlated with performance in the next class in the sequence.
Students are not the customers, they are the product.
So who is the customer?
Society as a whole, and also future employers.
“Society”? How do they do that? How does “society” choose which providers are giving them value for their money? Does “society” really make those decisions, or does some significantly powerful subset thereof do so? How are future employers, seeing as they are out in the future, getting to make these customer decisions? Time machine? Quantum physics? Seems the die is already cast at that point.
“student evaluations were anti-correlated with performance in the next class in the sequence.”
This is not untrue, but it’s far from the whole story. In our investigation and analysis of several thousand classes across multiple disciplines over a number of years, we did find that there was an inverse relationship between the student evaluations of the instructor & course and the level of difficulty of the class. So a difficult class may produce more successful students, but at the expense of the poor professor who has to live with bad evaluations. The converse is also true – sometimes the “easy” professors who don’t prepare their students well for the next course get the better evaluations.
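To make the shape of that finding concrete, here is a toy Python sketch. Every number is invented, and `statistics.correlation` (Python 3.10+) merely stands in for whatever analysis was actually run across those thousands of classes.

```python
import statistics

# Hypothetical mean instructor evaluations, and hypothetical mean GPA
# earned by the same students in the next course in the sequence.
eval_scores = [4.8, 4.5, 4.6, 3.9, 3.5, 3.2, 4.1, 3.0]
next_course_gpa = [2.4, 2.6, 2.5, 3.0, 3.2, 3.4, 2.9, 3.5]

r = statistics.correlation(eval_scores, next_course_gpa)  # Pearson's r
print(f"evaluations vs. next-course performance: r = {r:+.2f}")
# A clearly negative r would be consistent with the inverse relationship
# described above; it would not, by itself, explain why it occurs.
```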
The point is that there are all sorts of factors that affect the data. The raw data is meaningless until it is interpreted – and the elusive “truth” lies in the interpretation.
It is important that a chair or dean be aware of this correlation anomaly – along with all the other flaws in the system – and use the data as a springboard for conversation more than as a final numeric indication of performance. It’s when the numbers are taken as fact, without nuance, that the system breaks; perhaps the most alarming result is a professor being passed over for a promotion or raise as a result of the misinterpretation of an inverse correlation like the one you mention.
That does often seem to be the case. Students probably tend to evaluate professors the way they would evaluate a movie, restaurant, or video game: by how much they enjoyed the experience rather than by the educational benefit they received.
Just a snippet. There’s more if any of you are interested.