When I began my academic career, most of my effort went into teaching, research and advising. Over the years, administrative tasks have devoured more and more of my time. One of the most ravenous beasts in the administrative pack is assessment. Assessment data is now critical for accreditation and is an essential part of how state funding is allocated under the punitive performance-based funding system the state has imposed on public higher education.
Back when assessment became part of the bureaucracy, the main goal was to fill binders with paper containing outcome data. After several years of this, it was realized that outcome data by itself did not show the value added by the educational process. To use an obvious analogy, if you looked only at the end-of-season times of a cross country team with a new coach, you would be hard pressed to determine the coach's impact; you would also need the times from the start of the season. Because of this, measuring improvement eventually became a thing to do.
As part of the current assessment project, I am required (but not actually paid, since it is not part of my official assigned responsibilities) to collect data from the philosophy and religion classes that measures improvement over the course of the semester. To this end, I have used an obvious approach: assessing draft papers and comparing them to the final papers. However, I also need other statistical data, and the more objective the better. So I created an Argument Basics Assessment Instrument (ABAI, because bureaucracy loves acronyms), which is being developed into a standardized “test” to be used in the classes. The objective of the ABAI is to determine students’ starting and ending ability in the course. This approach is obviously not perfect, since any improvement could be due to other factors, such as skills and knowledge acquired in other courses. There are also some problems inherent to before-and-after assessment in a class.
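For readers who like to see the arithmetic, here is a minimal sketch of how such before/after scores might be compared. This is not the actual ABAI tooling: the student identifiers, the 0–100 scale, and the sample values are all hypothetical, and only students who completed both assessments are counted.

```python
# A minimal sketch (not the actual ABAI tooling) of comparing before/after
# scores. Student IDs, the 0-100 scale, and the sample values are all
# hypothetical; only students with both scores are compared.

def improvement_report(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-student gain (after minus before) for students with both scores."""
    return {s: after[s] - before[s] for s in sorted(before.keys() & after.keys())}

# Hypothetical ABAI scores out of 100.
before = {"s1": 40.0, "s2": 55.0, "s3": 62.0}
after = {"s1": 70.0, "s2": 60.0, "s3": 58.0}

gains = improvement_report(before, after)
mean_gain = sum(gains.values()) / len(gains)
print(gains)                          # {'s1': 30.0, 's2': 5.0, 's3': -4.0}
print(f"mean gain: {mean_gain:.1f}")  # mean gain: 10.3
```

Even this trivial computation makes one limitation visible: a positive mean gain says nothing about why the scores changed, which is the attribution problem noted above.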
One obvious problem is motivation. If students know the assessment device does not count towards their grade, they are far less likely to take it seriously or even do it at all. This will tend to bias the results towards poor performance. If the device does count towards their grade, other concerns arise. Cheating becomes a factor, which can make the data less accurate. Students are also more likely to prepare for the assessment, which can throw off the results when the goal is to capture “before” data (and even “after” data). For example, if the first ABAI counted as a grade, then students might study argument basics before taking it, and the assessment would reflect that studying rather than their starting point. As such, the challenge is a dilemma: if the assessment does not count towards the grade, then students will tend not to take it seriously, which skews the results in a negative way. If the assessment does count towards their grade, then students will be more inclined to prepare for it and perhaps even cheat, thus skewing the results in a positive way. One approach to solving this problem, though perhaps a morally questionable one, is to surprise students with the assessment and not reveal whether it counts or not. Another approach is to be honest with the students and note in the assessment report that the results will be biased by whichever approach is taken.
Another obvious problem is that if the assessment devices count towards the grade, then the “before” assessments will tend to have poor results, which could hurt the students’ grades. One solution is to scale the first assessment, but this can cause students to take it less seriously. Another solution is to count only the “after” assessment, but this runs into the same problem: students have little incentive to take the “before” assessment seriously. In the case of paper drafts used for before/after assessment, students generally put effort into drafts when they are optional, but this tends to skew the results because it is the better students who tend to do optional drafts. When drafts are mandatory but do not count towards the grade, many students tend not to take them seriously, which also skews the results.
The cynical approach is obvious: simply pick the method that generates the sort of data the bureaucracy wants (which is usually that X% of students improve Y%). A less cynical approach is to test various methods to see which seems to yield the most accurate results, but there is the obvious problem of calibrating the system. After all, assessing the assessment leads to the sort of infinite regress that plagues epistemology: how do you know the method is accurate? If you use a second method to assess the first, then you must ask how you know that method is accurate. And so on, into infinity. Not surprisingly, the bureaucracy tends to avoid peering too deeply under the hood.
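The statistic itself, for what it is worth, is trivial to compute; the hard part is everything above. Here is a hedged sketch, building on the earlier example, that computes the share of students whose gain meets some threshold. The threshold is a stand-in, since the text deliberately leaves the X and Y figures open.

```python
# A sketch of the "X% of students improve Y%" statistic. The threshold
# value is a hypothetical stand-in; X and Y are left open in the text.

def percent_improved(gains: dict[str, float], threshold: float) -> float:
    """Percentage of students whose gain meets or exceeds the threshold."""
    if not gains:
        return 0.0
    improved = sum(1 for g in gains.values() if g >= threshold)
    return 100.0 * improved / len(gains)

# Using the hypothetical gains from the earlier sketch:
print(percent_improved({"s1": 30.0, "s2": 5.0, "s3": -4.0}, threshold=5.0))
# 66.66666666666667 -- two of three students gained at least 5 points
```

Note how easily the cynical approach works here: nudge the threshold down until the percentage looks the way the report needs it to look.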