Another behind-the-scenes look at assessment. I will be presenting this on November 12.
Introduction
I am involved in assessment in three roles. The first is as the professor who completes the assessment plan and report for Philosophy & Religion. The second is as the chair of the General Education Assessment Committee (GEAC). The third is as a member of the Institutional Level Assessment Committee (ILAC). These roles have shaped my perspective on assessment in useful ways. I see it from the perspective of a person who provides data, but also from the perspective of a collector. I see it primarily from the perspective of a faculty member focused on teaching and research, but I also somewhat understand the administrative perspective. My assessment journey began in 2004, so I have been doing it for a while. But my purpose here is not to expound on my backstory, but to discuss Automated Assessment Instruments.
The Challenges
One fundamental challenge of assessment is getting the needed quantity of quality data. Within this challenge are various sub-challenges. One of these is motivating faculty to provide such data—that is, getting them to buy into the data collection. If faculty buy-in is not earned, they are more likely to provide incomplete assessment data or even no data at all. They are also more likely to provide low-quality data and might even provide fabricated data simply to get the process over with. Demotivated faculty will tend to provide garbage data and, as the old saying goes, garbage in, garbage out.
A second sub-challenge is that data collection costs resources. While people often think of monetary costs, there are also the costs of time, attention, motivation and so on. To illustrate, even if a faculty member is willing to collect data “for free”, it still takes up their limited time, diverts their attention from such tasks as teaching, and chips away at their motivation.
Since resources are always limited (especially at public institutions), reducing the cost is desirable. But cost reduction is not without risk: cost-cutting measures often reduce both cost and quality, and sometimes only quality. As such, the goal is to develop means of collecting quality data from cooperative faculty at low cost. One way to advance towards this goal is the use of Automated Assessment Instruments (AAIs). While talk of automation might lead to thoughts of complicated systems, it is wisest to keep things simple.
Keep It Simple, Smarty
Before getting into a discussion of automation, it is wise to start with a plea: keep your automation process and implementation simple rather than complicated. A complicated assessment system is analogous to the tax code: problematic, convoluted, torturous, difficult, and inconsistent. Complicated systems are a problem because they are challenging to understand and impose unnecessary costs. Effective simplification makes the system easier to understand and lowers its cost. Proper simplicity also improves assessment quality by concentrating effort and using resources more effectively.
Focusing on keeping things simple can yield many positive results. For example, if every faculty member uses different methods and rubrics to assess the same outcome in a course, the result is a complicated system in which comparing results and completing a unified assessment report is difficult. If all the faculty agree to use one streamlined automated assessment instrument, then matters are much simpler. As another example, individual assessment instruments can themselves be overly complicated or excessive in scope. I am also including reduction here in the discussion of simplification; one can say KISKIS: Keep It Simple, Keep It Small.
While a systematic guide is beyond the scope of this presentation, philosophy provides some general simplification and reduction principles. Our good dead friend Aristotle provides a fine starting point: ask whether a part of something contributes positively. If not, remove it, “For a thing whose presence or absence makes no visible difference, is not an organic part of the whole.” Parts that have a negative impact should be excised immediately. When erring, it is usually better to err on the side of simplification, but simplification is not without its hazards.
Simplification can sometimes make assessment less useful. For example, while a simplified assessment survey or test is easier to complete than a more complicated one, it collects less data. As a guide to what to keep, Aristotle’s and Confucius’ mark of virtue serves well: one must find the mean between the two extremes. For example, a well-designed instrument or survey balances between the extremes of demanding excessive data and being uselessly simplistic. The instrument or survey should have the right questions, at the right time, for the right reasons and presented in the right way.
A consequentialist approach also serves here: each addition should be assessed in terms of costs and benefits, both individually and in total. While a part taken in isolation might create more benefit than cost, the cost of the entire system could end up exceeding its benefits. In general, having information about student performance is more beneficial than not having it, so any sensible data collection would appear good when assessed in isolation. But adding every such collection to the process would have an overall negative consequence: it would impose an absurd burden.
A simplified system need not be simplistic—it could be quite sophisticated. One way to operate a sophisticated system with greater simplicity and ease is with automation, which provides a nice segue into our primary topic.
Automation: Default Participation
As a matter of psychology, people are more likely to stick with a default inclusion when opting out requires effort. An excellent example of this is retirement savings: when employees are automatically enrolled in a retirement plan and must opt out, they enroll at a significantly higher rate than when they must opt in to the plan. This generalizes to most human behavior and can be used to increase faculty participation in assessment. While it might appear that I have forgotten about automation and taken up a new topic, the connection between defaults and automation will, I hope, be made clear shortly.
While making faculty participation in assessment the default and requiring them to opt out might result in more participation, the obvious problem is that it is generally much easier to opt out of assessment than to participate. So, merely making participation the default might have no positive impact on participation and could cause some resentment—faculty might dislike the assumption that they will do extra work. The fix is to make opting out less attractive than participating.
As a faculty member, I would never suggest coercing participation. This would annoy faculty and lower the quality of participation. The better solution is to develop means of participation that impose a minimal cost on faculty. Ideally, participation would also have positive value.
One way of doing this is to have a default in which faculty agree to allow others to gather and assess data from their classes. But this still puts a burden on those who do the gathering and assessment, and these are often other faculty. As with many tasks, an obvious way to make it easier in the long term is to automate it as much as possible. As such, combining default participation with automated assessment can improve faculty participation. The default participation, properly handled, decreases the chances of faculty opting out and the automation lowers the cost of participation. In cases where no effort is required on the part of faculty, participation could be quite good.
While it might be tempting to make even effortless participation unavoidable, faculty should always be given a choice to opt out, as a matter of both ethics and practicality. In terms of ethics, professors are the moral custodians of their courses and the data those courses generate, and forcing them to share that data would be morally problematic—with the obvious exception of final student grades being entered into the grading system. There is also the practical concern that faculty could be put off by mandatory participation, and this could reduce the quality of their participation.
While some faculty will choose to opt out of participation, effective automation can reduce this number. One can, of course, have certain aspects of assessment that are default and others that require opting in—as a rule, the default participation should be for aspects of assessment that take no or minimal effort on the part of faculty. As an illustration, including an AAI in standardized class content and having the data collected from Canvas by a designated data collector would be suitable as a form of no-effort default participation.
In the process of discussing automating some aspects of assessment at Florida A&M University, faculty have expressed reasonable concerns about people (or software) poking around inside their classes on the LMS. This was not because faculty had anything to hide (one hopes) but because of legitimate worries about academic freedom, intrusions into privacy, and the possibility that such “poking about” might cause glitches or other issues. As such, faculty should be informed about such matters and, obviously, such automation should be designed to avoid causing problems. To return to the illustration, faculty would need to be informed that they are expected to use the AAI and that someone else will have access to their course to collect the assessment data from it.
Automation: Costs & Benefits
As noted above, one challenge of assessment is the cost. One effective means of reducing this cost while increasing quality is the use of properly constructed Automated Assessment Instruments (AAIs). While AAIs might sound difficult to create and costly to deploy, experienced faculty can create them easily and their cost can be minimal. Best of all, they require little effort to deploy, and gathering the data they generate is a relatively simple process on Canvas (or another LMS).
The automated aspect of an AAI refers to scoring: the instrument should run and be scored without manual effort. This can be accomplished by using Canvas’ quiz capabilities to create an instrument that Canvas scores automatically. The assessment part refers to the instrument’s purpose: to assess a specific competency.
A reasonable question is “why should we bother to create AAIs when we can just use existing quizzes?” If you already have quizzes that do all that AAIs are supposed to do, then there is no need to create them: you already have AAIs by another name. But if you do not already have them, AAIs do have advantages that make them worth the effort.
First, while there is an initial cost in creation, their automation makes them “cheap” to operate. Once a suitable AAI is created, it can easily be imported into any appropriate course and assigned to the students. Once the students complete it, Canvas (or other LMS) will score it. Then someone merely needs to collect the results and add them to the data being collected for the relevant competency. While this is likely to be a manual process, there is hope that future improvements will allow for automatic data collection and combination.
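As a concrete illustration of how light the collection step can already be, here is a minimal Python sketch of pulling AAI scores from Canvas through its REST API and pooling them across sections. The base URL, token, and IDs are placeholders, the field names reflect my reading of the Canvas quiz submissions endpoint (verify against your own instance), and large classes would also need to follow Canvas’ pagination links.

# Minimal sketch: collect scores for one AAI deployed as a Canvas quiz.
# The URL, token, and IDs are placeholders; confirm the response fields
# against your Canvas instance and add pagination for large classes.
import requests

CANVAS_URL = "https://your-school.instructure.com"    # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # placeholder access token

def aai_scores(course_id: int, quiz_id: int) -> list[float]:
    """Return the scores students earned on one AAI in one course."""
    url = f"{CANVAS_URL}/api/v1/courses/{course_id}/quizzes/{quiz_id}/submissions"
    resp = requests.get(url, headers=HEADERS, params={"per_page": 100})
    resp.raise_for_status()
    submissions = resp.json().get("quiz_submissions", [])
    return [s["score"] for s in submissions if s.get("score") is not None]

# Pool the results from several sections for one competency report.
sections = {"PHI 2010-01": (11111, 222), "PHI 2010-02": (11112, 223)}  # placeholder IDs
all_scores = []
for name, (course_id, quiz_id) in sections.items():
    scores = aai_scores(course_id, quiz_id)
    print(f"{name}: {len(scores)} scored submissions")
    all_scores.extend(scores)
if all_scores:
    print(f"Combined mean score: {sum(all_scores) / len(all_scores):.1f}")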
Second, an AAI can be crafted to be both specific enough and generic enough for broad deployment while providing useful data—this allows comparing standardized results across the relevant classes. For example, all the philosophy classes I teach can make use of the Argument Basics Assessment Instrument that I use as a direct measure of critical thinking competence. This is because the basics of arguments are universal. Standardized assessments are useful because they allow for meaningful comparisons between classes and the results can be combined into an overall assessment. Obviously, not all competencies allow for such universal AAIs. For example, a single AAI cannot assess content knowledge across a range of courses. To illustrate, the content of my Ethics and Metaphysics classes differs, so I cannot use the same AAI to assess content knowledge in both classes. Fortunately, it is often possible to use similar instruments and adjust the content accordingly. While this makes comparisons between classes less reliable and creates some issues with combining results, these issues can be mitigated if the AAIs are designed in analogous ways.
Third, an AAI can be created to assess a specific competency using the relevant rubric, making it well suited to assessment. There has long been a saying that “grades are not assessment.” While this has changed in recent years, it is still worth distinguishing between a grade and an assessment. To use an easy and obvious analogy, a paper might have an extra-credit option that can and should affect the grade. But, in general, that extra credit should not be applied to the assessment of a competency. To illustrate, if a student gets extra credit on a paper for completing a survey for the professor, that improves the grade but does not improve the competency shown in written communication.
Fourth, pre-made AAIs can be made available for easy customization and use in classes. While classes and professors differ, similar courses overlap enough that a pre-made AAI can often be modified to account for the differences. As noted above, standardization is generally desirable when it comes to assessment.
Fifth, and a bit selfishly, AAIs can be used to easily assess certain GENED competencies for GEAC. In some cases, the competencies assessed by a department or unit will be the same as, or similar to, those assessed by GEAC. This allows departments to easily contribute data to GEAC and creates opportunities for GEAC to provide departments with pre-made AAIs for assessing GENED competencies.
Looking ahead, an automated system could extract assessment data from classes on Canvas and perform the relevant functions to turn it into useful information. While this is well within current technology, there is the obvious problem of securing the resources to create such a system. For FAMU and most public institutions, a realistic option is establishing some degree of integration between the school’s LMS and the software it uses for assessment purposes, such as Nuventive, to make data collection and analysis easier. Florida A&M University is currently conducting a test of such an integration. Combining AAIs with this level of integration would automate a significant part of the assessment process, thus lowering the cost to faculty.
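To make that hand-off a bit more concrete, here is a minimal sketch of the reporting end: rolling pooled AAI scores into a simple CSV summary that a designated data collector could upload or transcribe into the assessment system. The 70-point competence cutoff, the section scores, and the column names are placeholders; this is not Nuventive’s import format, which I am not assuming here.

# Minimal sketch: summarize pooled AAI scores into a CSV for an assessment report.
# The cutoff, scores, and column names are placeholders, not Nuventive's format.
import csv
from statistics import mean

CUTOFF = 70.0  # placeholder score treated as "competent"

def write_summary(results: dict[str, list[float]], path: str) -> None:
    """Write one row per section: student count, mean score, percent competent."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["section", "n_students", "mean_score", "percent_competent"])
        for section, scores in results.items():
            pct = 100 * sum(s >= CUTOFF for s in scores) / len(scores)
            writer.writerow([section, len(scores), round(mean(scores), 1), round(pct, 1)])

# Example with made-up scores for two sections.
write_summary(
    {"PHI 2010-01": [82, 74, 91, 65], "PHI 2010-02": [77, 88, 69, 95, 71]},
    "critical_thinking_summary.csv",
)

Even this modest level of automation removes the error-prone copying between the LMS and the report, which is much of the cost the integration is meant to eliminate.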
Automation: Creation
While the creation of an AAI will require content knowledge, there are some general guides. First, just as you should keep the overall assessment process simple and short, you should ensure that your AAI is as simple and short as possible. This, as noted earlier, must be weighed against the negative consequences of simplification.
A longer and more complicated AAI does allow for the collection of more data, and more accurate data. To use an absurd illustration, if your AAI could ask every relevant question in a competence area, then you would have an excellent picture of the overall competence of the students. But it would probably take the students months to complete the test. Smaller sets of questions are like smaller nets: they will capture less. More concretely, smaller sets of questions can leave out things the student might know or be competent in, and they can also leave out things the student might not know or lack competency in. Such omissions reduce the accuracy of the assessment. But increasing the size or complexity of an AAI can have negative consequences as well.
Longer AAIs take up more student time and, if they are not a meaningful part of the course grade, students might just click through them. AAIs that are too complicated can also have a negative impact by lowering the quality of student performance. Fortunately, professors already have considerable experience in creating tests and quizzes and the same skills can be applied to creating an AAI.
Second, an AAI should have a clearly defined mission. This involves determining the competency to be assessed and the rubric that will be employed in this assessment. The focus should be narrow enough so that it is clear what competency is being assessed, but not so narrow that the data is of limited value as a general assessment. In general, an AAI will tend to have a broader scope than a traditional test or quiz. To illustrate, a test in my Ethics class might focus on specific moral theories rather than being a broad assessment of competence in ethical reasoning. That said, AAIs modeled on traditional tests can be used effectively with due consideration of their purpose. For example, the Ethics test mentioned before would assess specific content knowledge.
Focusing clearly on the mission of each AAI will help avoid swamping the students with assessment assignments and, where possible, these should be used as part of the course rather than mere additions. An integrated AAI also has the mission of being part of the course grade. While, as noted above, it has long been said that grades are not assessment, assessment instruments can be used to generate grades. To illustrate, an exam could be both an AAI for assessing a specific competency in the class and one of the course grades. Such integrated AAIs have two main advantages. The first is that they allow for more assessment without increasing the work students and faculty need to do. The second is that by making the assessment count as part of the grade, students are more likely to take it seriously. In my own experience, when I was testing AAIs and only provided extra-credit for doing them, students seemed to just click through them to get the completion credit. Now that they are part of the class grade, students are taking more time to complete them. As always, the assessments should undergo review and even some beta testing before being deployed for grades.
Third, there should be pre and post versions of each AAI that allow for assessment of student competence upon entering the class and upon completing it. When my assessment journey began in 2004, there seemed to be little concern about pre/post assessment. We simply gathered up data about various outcomes and reported what we found. This tradition has proven persistent, perhaps because such data has also been deemed adequate. While this data does have value in that it informs us of how students performed in a competency area, it does not inform us of how much impact we have had on them. Consider, if you will, the following analogy.
Imagine that you are looking for a running coach. You look at the times of the athletes and see that one coach, Samantha, has several runners who run a 5K in under 18 minutes, which is quite good. In contrast, Coach Monica’s runners all run the 5K in the 19–20-minute range. On the face of it, Samantha would seem to be the better coach: her runners are faster than Monica’s. So, you hire Samantha to coach you and are surprised that you do not get any faster. How could that happen? You decide to do some more research and find that Samantha’s runners ran under 18 minutes before she started coaching them and they run about the same after being coached by her. Looking at Monica’s runners, you find that before she started coaching them, they ran the 5K in the 25–30-minute range; so her coaching appears to be very effective—but you would not know this just by looking at the runners’ current times. This also applies to teaching—without pre and post data, we cannot adequately assess the effectiveness of our classes.
Naturally, the goals for outcomes should include specified improvements between the pre and post assessments. This data can also help revise related goals. To illustrate, in Philosophy & Religion we have always used a target of 80% of students being assessed as competent. This is another relic from the beginning of assessment: everyone was told to assess five outcomes and achieve a result of 80% in each. Because reasons? We often fell short of this arbitrary goal in some competency areas, reported our failure and worked on our improvement narrative. When I started doing pre/post assessments, we found some rather important facts. The first was that students improved significantly in the competence areas—thus showing the classes were more successful than they seemed. The second was that students, obviously enough, varied in their starting competence from class to class and semester to semester. This suggested that having a seemingly arbitrary fixed percentage goal is a mistake—it is better to focus on student improvement. This will, of course, improve the overall results as well. But it is worth taking away the advice that outcome goals should be regularly reconsidered and recalibrated.
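To make the shift from a fixed percentage target to an improvement target concrete, here is a minimal sketch of how pre/post AAI results might be summarized; the scores and the 70-point competence cutoff are made up for illustration.

# Minimal sketch: report pre/post improvement rather than only a fixed target.
# The scores and the 70-point cutoff are made up for illustration.
PRE = [55, 62, 48, 70, 58, 66, 51, 73]    # pre-test scores, one per student
POST = [78, 81, 69, 85, 74, 88, 72, 90]   # post-test scores, same students in order
CUTOFF = 70

gains = [post - pre for pre, post in zip(PRE, POST)]
print(f"Mean gain: {sum(gains) / len(gains):.1f} points")
print(f"Students who improved: {sum(g > 0 for g in gains)} of {len(gains)}")
print(f"Competent before: {sum(s >= CUTOFF for s in PRE) / len(PRE):.0%}")
print(f"Competent after:  {sum(s >= CUTOFF for s in POST) / len(POST):.0%}")

A report built this way can still note whether a class cleared a percentage threshold, but it foregrounds how much the students improved, which is what the pre/post design is for.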
Fourth, the AAI should ideally include an extensive bank of questions to randomly select from—to minimize students’ ability to “cooperate” on the AAI. Building such a bank takes time, but it can be done as an ongoing project. It is also a good idea to update the AAI regularly based on information gathered from student performance; for example, performance data can reveal questions that are too difficult or too easy. While this is but a starting guide, it is my hope it will prove useful. Of course, even the best guide is useless if no one wishes to use AAIs. This takes us to the matter of motivation.
Motivation
While AAIs make assessment easier, there is still the challenge of motivating faculty to use them and to be active in assessment in general. As a faculty member, I have found that a pragmatic appeal can be useful, if blunt. Florida A&M University is subject to performance-based funding: schools are rewarded and punished based on their assessed performance. While somewhat hyperbolic, my stock line is that participating in assessment impacts the continued employment of faculty. This can be effective.
All accredited schools must periodically undergo reaccreditation, and assessment is now a key part of that process. So, I can honestly tell faculty that their continued employment depends, in part, on their participation. Threatening faculty would be counterproductive and unethical, but clearly and honestly presenting the stakes can be a great motivator. Fortunately, there are also positive options.
Traditionally, faculty could be motivated to take on unpaid extra work by appeals to the good of the students. However, the ascendancy of the business model has weakened these traditional motivators. As a practical matter, motivation must be considered within the expanding conception of the university as a business. Universities have consciously embraced this model and profess to see themselves more as brands and less as institutions of learning and research. Assessment itself is often seen as an intruder from the business world.
In this context, money (or release time) is the most obvious motivator. If resources are available, this would be the best solution to the motivation problem: compensate those working on assessment adequately and assessment will (probably) improve.
Unfortunately, most universities lack the resources or the desire. Fortunately, there are low- or no-cost motivators. These are used in business and other contexts when someone wants to motivate people but is unwilling to provide adequate compensation, or, less cynically, when people want to show gratitude in non-financial ways.
The GEAC committee members discussed this matter and proposed some options. An obvious free motivator is a sincere expression of gratitude. Faculty sometimes just want to be appreciated. This is, quite literally, the least that should be done.
One low-cost motivator is the certificate of appreciation—this puts the expression of gratitude into a tangible and visible form. There is a risk that overdoing it can make such certificates meaningless or even a joke.
Another low-cost motivator is a letter evidencing service. Such letters are in demand by faculty going up for tenure and promotion and provide a form of compensation that faculty value. In the case of GEAC, explicitly offering such letters for service has proven to be an effective recruiting tool. On the downside, they only work for faculty seeking tenure and promotion, and some faculty drop out afterwards.
Digital badges have become quite popular; they are modeled on achievements in video games and rely on a similar psychological mechanism. Some represent skills and accomplishments and could be seen as icon or emoji versions of resume entries. In the case of AAIs, there could be suitable badges for creating and updating them. Given the current popularity of badges, they are worth considering. These could be created and distributed within the university—essentially digital icons performing the same role as certificates. There are also services that offer badge systems—although these often involve a subscription cost. Badges might be a passing fad—or they might be like Pokémon—something that will endure, and people will want to catch them all. Yes, I have often suggested creating Assessémons to incentivize faculty.
Conclusion
While AAIs do have an up-front cost, when properly constructed they can significantly reduce the cost of assessment for faculty. The main assessment challenge is designing instruments that properly assess the students in the relevant competency areas. The main academic challenge is integrating them meaningfully into courses. The main pragmatic problem is motivating faculty to create and use them. Fortunately, all these problems can be solved to the benefit of assessment, faculty, and students.