This blog posting is a response to Tom Aston's blog posting: Rubrics as a harness for complexity
I have just reviewed an evaluation of the effectiveness of policy-influencing activities of programmes funded by HMG as part of the International Carbon Finance Initiative. The technical report uses rubrics in a number of places to explain how various judgements were made. Here, for example, is one summarising the strength of evidence found during process tracing exercises:
- Strong support – smoking gun (or doubly decisive, DD) tests passed and no hoop tests (nor DDs) failed.
- Some support – multiple straw-in-the-wind tests passed and no hoop tests (nor DDs) failed; also, no smoking guns nor DDs passed.
- Mixed – a mixture of smoking gun or DD tests passed but some hoop tests (or DDs) failed – this required the CMO to be revised.
- Failed – some hoop (or DD) tests failed and no doubly decisive or smoking gun tests passed – this required the theory to be rejected and the CMO abandoned or significantly revised.
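To make the decision logic in this rubric concrete, here is a minimal sketch in Python. The category boundaries follow the rubric wording above, but the function name, the way test results are encoded, and the final fallback (a pattern the rubric does not explicitly cover) are my own additions rather than anything taken from the evaluation report.

```python
from collections import Counter

def strength_of_evidence(passed: list[str], failed: list[str]) -> str:
    """Classify process-tracing results into the four rubric categories.

    'passed' and 'failed' are lists of test types: 'smoking_gun',
    'doubly_decisive', 'hoop', 'straw_in_the_wind'.
    """
    p, f = Counter(passed), Counter(failed)
    decisive_passed = p["smoking_gun"] + p["doubly_decisive"] > 0
    hoop_or_dd_failed = f["hoop"] + f["doubly_decisive"] > 0

    if decisive_passed and not hoop_or_dd_failed:
        return "Strong support"
    if decisive_passed and hoop_or_dd_failed:
        return "Mixed"        # the CMO has to be revised
    if hoop_or_dd_failed:
        return "Failed"       # theory rejected; CMO abandoned or significantly revised
    if p["straw_in_the_wind"] >= 2:
        return "Some support"
    return "Unclassified"     # a pattern the rubric does not explicitly cover

# Example: one smoking gun test passed, but one hoop test failed
print(strength_of_evidence(["smoking_gun"], ["hoop"]))  # -> "Mixed"
```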
Another rubric described in great detail how three different levels of strength of evidence were differentiated (Convincing, Plausible, Tentative). There was no doubt in my mind that these rubrics contributed significantly to the value of the evaluation report, particularly by giving readers confidence in the judgements made by the evaluation team.
But…
I can't help feeling that the enthusiasm for rubrics is out of proportion to their role within an evaluation. They are a useful measurement device that can make complex judgements more transparent and thus more accountable. Note the emphasis on the ‘more’…
There are often plenty of not-so-transparent judgements present in the explanatory text used to annotate each point on a rubric scale. Take, for example, the first line of text in Tom Aston’s first example here, which reads: “Excellent: Clear example of exemplary performance or very good practice in this domain: no weakness”.
As noted in Tom’s blog, it has been argued that rubrics have a wider value, i.e. “rubrics are useful when trying to describe and agree what success looks like for tracking changes in complex phenomena”. This is where I would definitely argue “buyer beware”, because rubrics have serious limitations in respect of this task.
The first problem is that description and valuation are separate cognitive tasks. Events that take place can be described; they can also be given a particular value by observers (e.g. good or bad). This dual process is implied in the above definition of how rubrics are useful. Both of these types of judgements are often present in a rubric’s explanatory text, e.g. “Clear example of exemplary performance or very good practice in this domain: no weakness”.
The second problem is that complex events usually have multiple facets, each of which has both a descriptive and a value aspect. This is evident in the use of multiple statements linked by colons in the same example rubric I refer to above.
So, for any point on a rubric’s scale, the explanatory text has quite a big task on its hands. It has to describe a specific subset of events and give a particular value to each of those. In addition, each adjacent point on the scale has to do the same, in a way that suggests there are only small incremental differences between the judgements at each of these points. And because it is a linear scale, this suggests, or even requires, that there is only one path from the bottom to the top of the scale. Say goodbye to equifinality!
So, what alternatives are there for describing and agreeing on what success looks like when trying to track changes in complex phenomena? One solution I have argued for intermittently over a period of years is the wider use of weighted checklists. These are described at length here.
Their design addresses the three problems mentioned above. Firstly, description and valuation are separated out as two distinct judgements. Secondly, the events that are described and valued can be quite numerous, and yet each can be separately judged on these two criteria, with a mechanism for combining those judgements into an aggregate scale. Thirdly, there is more than one route from the bottom to the top of this aggregate scale.
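As a rough illustration of that mechanism (the items, weights and scoring rule below are invented for the example, not taken from the paper linked above), a weighted checklist might be aggregated like this:

```python
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    description: str   # the descriptive judgement: what is being looked for
    weight: float      # the value judgement: how much it matters if present
    observed: bool     # whether it was in fact observed

def aggregate_score(items: list[ChecklistItem]) -> float:
    """Weighted proportion of the checklist that was observed (0 to 1)."""
    total = sum(i.weight for i in items)
    achieved = sum(i.weight for i in items if i.observed)
    return achieved / total if total else 0.0

items = [
    ChecklistItem("policy brief cited in the ministry's strategy", 3.0, True),
    ChecklistItem("officials request follow-up analysis", 2.0, False),
    ChecklistItem("findings covered in national media", 1.0, True),
]
print(aggregate_score(items))  # 4.0 / 6.0 ≈ 0.67
```

Because the aggregate is a weighted sum, two quite different mixes of observed items can produce the same overall score, which is precisely the equifinality that a single linear rubric scale shuts out.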
“The proof is in the pudding”. One particular weighted checklist, known as the Basic Necessities Survey, was designed to measure and track changes in household-level poverty. Changes in poverty levels must surely qualify as ‘complex phenomena’.
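For readers who want a feel for how such a survey can generate a poverty score, here is a simplified sketch. The items and figures are invented, and the scoring rule (items weighted by the share of respondents rating them a necessity, households scored by the weighted share of those items they possess) is my shorthand summary of the general approach, not a specification of any particular BNS implementation.

```python
# Invented items and figures, for illustration only.
# Assumed scoring rule: each item is weighted by the share of respondents
# who call it a basic necessity; a household's score is the weighted share
# of those items it actually possesses.
necessity_votes = {
    "all children able to attend school": 0.95,
    "two meals a day": 0.90,
    "a mosquito net": 0.70,
    "a radio": 0.55,
}

def household_score(items_possessed: set[str]) -> float:
    total_weight = sum(necessity_votes.values())
    possessed_weight = sum(
        weight for item, weight in necessity_votes.items()
        if item in items_possessed
    )
    return possessed_weight / total_weight

# Households with quite different baskets of items can end up with similar scores
print(household_score({"two meals a day", "a mosquito net", "a radio"}))           # ≈ 0.69
print(household_score({"all children able to attend school", "two meals a day"}))  # ≈ 0.60
```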
Since its development in the 1990s, the Basic Necessities Survey has been widely used in Africa and Asia by international environment/conservation organisations. There is now a bibliography available online describing some of its users and uses: https://www.zotero.org/groups/2440491/basic_necessities_survey/library