Perverse incentives in assessment

Marking, by Terry Freedman

What is a perverse incentive in assessment?

Some years ago I heard that a university had hit upon a great scheme to bring lagging students up to speed. If they failed an end-of-topic test, they were given a set of notes to help them revise and correct their misconceptions. Those who passed the test were assumed not to need such revision aids, and so were given nothing.

Well, you can probably guess what happened next. The good students realised that they would be better off failing the tests than passing them. That way, they would be able to build up a huge ring binder full of revision notes without expending any effort whatsoever.

This is an example of what Dylan Wiliam, in an answer to a question I asked him at a recent conference, called a perverse incentive.

Help required – or not

The question I asked related to something I discovered while working at the Qualifications and Curriculum Authority on the on-screen test for assessing ICT capability. The main part of the test was scenario-based, and we had built in help that appeared automatically when the program judged, from the length of a student's inactivity, that assistance was probably needed. In some cases we found that the students who were not so good at ICT found the help intrusive, because it interrupted their thinking while they were jotting notes on paper to work out their next steps. The better students, however, sometimes used the system to obtain further ideas about possible solutions.

The students may or may not have realised that receiving the assistance incurred "penalty points": their overall score was lower than it would have been had they not received any help. (To be honest, I can't remember whether we told them.)
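
Although I can't recall the exact details of the QCA system, the general shape of the logic is easy to sketch. In the rough Python illustration below, the class name, thresholds and penalty values are all invented for the purpose; treat them as assumptions, not a record of the real test:

```python
import time

# All names and numbers here are invented for illustration; the real
# test's thresholds and penalties are not something I can vouch for.
INACTIVITY_THRESHOLD = 120   # seconds of idleness before help appears
HELP_PENALTY = 2             # points deducted per automatic hint

class ScenarioTask:
    def __init__(self, max_score):
        self.max_score = max_score
        self.hints_shown = 0
        self.last_activity = time.monotonic()

    def record_activity(self):
        """Call this whenever the student interacts with the task."""
        self.last_activity = time.monotonic()

    def maybe_show_hint(self):
        """Offer a hint once the student has been idle long enough."""
        idle = time.monotonic() - self.last_activity
        if idle >= INACTIVITY_THRESHOLD:
            self.hints_shown += 1
            self.last_activity = time.monotonic()  # restart the idle clock
            return "Hint: re-read the scenario brief for the next step."
        return None

    def final_score(self, raw_score):
        """Deduct the accumulated help penalty from the raw score."""
        return max(0, min(raw_score, self.max_score)
                   - self.hints_shown * HELP_PENALTY)
```

Notice where the trouble starts: if HELP_PENALTY is small relative to the marks a hint can unlock, a shrewd student can simply sit on their hands, harvest the hint, and still come out ahead.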

Should collaboration be penalised or rewarded?

I have to say that I now think this is a double-edged sword. On the one hand, from an assessment point of view it seems right to give more credit to students who come up with a solution without help. On the other hand, in real life it doesn't matter how you reach a solution, as long as you get there in the end. Indeed, as the OECD's new emphasis on testing collaboration suggests, getting help from one's peers is regarded as a good thing. The distinction matters: if you are assessing students' knowledge, skills or understanding per se, then helping them reach a right answer is bound to reduce your confidence in their abilities; if, however, you are assessing them in the context of group work, you want them to help each other rather than work on their own. But I digress.

What an economist would say

It seems to me that it is very hard to design systems that respond to students' need for extra help without thereby creating a "perverse incentive". To work as intended, such systems need to ensure that, to use the economist's terminology, the marginal cost of receiving extra help (especially help that is not strictly required) exceeds the marginal benefit of receiving it.
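
Expressed as a designer's check, and using entirely invented figures, the condition amounts to something like this:

```python
def hint_creates_perverse_incentive(penalty_points, expected_marks_gained):
    """A hint invites gaming whenever its marginal benefit (the marks a
    student can expect it to unlock) meets or exceeds its marginal cost
    (the penalty for taking it)."""
    return expected_marks_gained >= penalty_points

# Invented figures: a 2-point penalty set against a hint likely to
# unlock 3 marks means deliberately stalling is a winning strategy.
print(hint_creates_perverse_incentive(penalty_points=2,
                                      expected_marks_gained=3))  # True
```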

I think designing such a system poses some difficulties, if I may employ typical English understatement. When I was a full-time teacher of ICT and Computing, I would spend more time with particular students to make sure they weren't lagging behind; I offered alternative activities in the classroom, to provide more steps on the way from A (no understanding) to B (full understanding); and I ran after-school help. As far as I am aware, that approach did not have the sort of unintended consequences I described earlier.

Where does "high stakes" fit in?

You may be tempted to think that the most effective way to ensure that students learn is to make end-of-topic or end-of-year tests high stakes in some way. During my first year at university we were informed that if we failed just one of our courses, even if it had nothing to do with our main degree, we would have to repeat the entire year. I didn't and don't agree with this approach, but it certainly provided an incentive to learn enough to pass all the exams. But as we all know from recent experience, that approach in itself has a perverse incentive: it encourages teaching (or, in this case, learning) to the test.

I was not very good at mathematics when I was at school, but one of my teachers came up with a brilliant idea. He would set a test every Thursday, and anyone who didn't pass would receive a detention lasting 90 minutes. During that detention he offered no help or feedback, but made us re-do the test while sitting in silence for the whole time. This provided the incentive for me to sit next to the best maths student in class and learn how to copy his answers without being caught. How's that for a perverse incentive?

What counts as "punishment"?

It occurs to me that in the realm of Computing, "failing" in some sense at building a robot (say) and making it move could, if someone like my maths teacher were involved, lead to the student in question being made to spend even more time on it: a "punishment" that many students would regard as a reward. Even without the punishment aspect, anyone who loved such activity would perhaps have an incentive to make it not quite work, although in my experience the pleasure of succeeding would probably outweigh such considerations.

Conclusions

As you will have surmised from these musings, I believe that however you try to take assistance into account when assessing someone's performance, there will almost always be an unintended consequence arising from a perverse incentive.

It seems to me that there is a case for having (high stakes?) summative tests every so often, and to use the information arising from ongoing formative assessment to help you judge the results of those summative tests. For example, if a student has never seemed to "get it" in class, and then scores 90% on a test, you might infer that something strange was going on, and that it needs looking into. This is the approach I suggested in my article How to create a grade-prediction system in Excel, and save yourself loads of time.
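
For those who prefer code to spreadsheets, the underlying check is easy to sketch. What follows is not the Excel system from that article but a rough Python equivalent, with an invented threshold, that flags students whose test score sits a long way from what their formative marks would predict:

```python
# A minimal sketch of the idea: flag students whose summative test
# score is far from the average of their ongoing formative marks.
FLAG_THRESHOLD = 25  # invented value: percentage-point gap worth investigating

def flag_anomalies(students):
    """students: dict of name -> (list of formative %, summative test %)."""
    flagged = []
    for name, (formative, test) in students.items():
        predicted = sum(formative) / len(formative)
        if abs(test - predicted) >= FLAG_THRESHOLD:
            flagged.append((name, round(predicted), test))
    return flagged

students = {
    "Alice": ([55, 60, 58], 90),   # never seemed to "get it", then scored 90%
    "Bilal": ([82, 85, 88], 86),   # consistent: nothing strange here
}
for name, predicted, test in flag_anomalies(students):
    print(f"{name}: predicted ~{predicted}%, scored {test}% - worth looking into")
```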

Another conclusion I've come to is that the more apparently clever a system is, the more likely it is to lead to unintended consequences in practice, perhaps because of perverse incentives. Obviously, that's a hypothesis, and I look forward to receiving news that it has been disproved.

Perhaps, when all is said and done, teachers simply need to do the best they can to judge students' abilities, while always being aware of the possible shortcomings of different forms of assessment. Indeed, I strongly believe in the efficacy of using one's professional judgement in assessing Computing, or anything else for that matter.


This article first appeared in Digital Education, a free newsletter for anyone with a professional interest in education technology, ICT or Computing.