Friday, December 11, 2020

"If you want to think outside of the box, you first need to find the box" - some practical evaluative thinking about Futures Literacy




Over the last two days, I have participated in a Futures Literacy Lab, run by Riel Miller and organised as part of UNESCO's Futures Literacy Summit. Here are some off-the-cuff reflections.

Firstly the definition of futures literacy. I could not find a decent one, but my search was brief so I expect readers of this blog posting will quickly come up with a decent one. Until then this is my provisional interpretation. Futures literacy includes two types of skills, both of which need to be mastered, although some people will be better at one type than the other:


1. The ability to generate many different alternative views of what might happen in the future.


2. The ability to evaluate a diversity of alternative views of the future, using a range of potentially relevant criteria.

There is probably also a third skill, i.e. the ability to extract useful implications for action from the above two activities,  

The process that I took part in highlighted to me (perhaps not surprising because I'm an evaluator) the importance of the second type of skill above - evaluation. There are two reasons I can think of for taking this view:


1. The ability to critically evaluate one's ideas (e.g. multiple different views of the possible future) is a metacognitive skill which is essential. There is no value in being able to to generate many imagined futures if one is then incapable of sorting the "wheat from the chaff" - however that may be defined.


2. The ability to evaluate a diversity of alternative views of the future, can actually have a useful feedback effect, enabling us to improve the way we search for other imagined futures


Here is my argument for the second claim. In the first part of the exercise yesterday each participant was asked to imagine a possible future development in the way that evaluation will be done, and the role of evaluators, in the year 2050. We were asked to place these ideas on Post-It Notes on an online whiteboard, on a linear scale that ranged between Optimistic and Pessimistic. 

Then a second and orthogonal scale was introduced, which ranged from "I can make a difference" to I can't make a difference". When that second axis was introduced we were asked to adjust our Post-It Notes into a new position that represented our view of its possibility and our ability to make a difference to that event.  These two steps can be seen as a form of self-evaluation of our own imagined futures. Here is the result (don't bother try to read the note details).


Later on, as the process proceeded we were encouraged to 'think out of the box" But how do you do that, ...how do you know what is "out of the box"? Unless you deliberately go to extremes, with the associated risk that whatever you come up with be less useful (however defined)

Looking back at that task now, it strikes me that what the above scatterplot does is show you where the box is, so to speak.  And by contrast, where outside the box also is located.  "Inside the box" is the part of scatterplot where the biggest concentration of posts is located.  The emptiest area and thus most "out of the box" area is the top right quadrant.  There is only one Post-it Note there. So, if more out of the box thinking is needed in this particular exercise setting then perhaps we should start brainstorming about "Optimistic future possibilities and of a kind where I think "I can't make a difference"  - now there is a challenge!

The above example can be considered as a kind of toy model, a simple version of a larger and more complex range of possible applications. That is, that any combination of evaluative dimensions will generate a combinatorial space, which will be densely populated with ideas about possible futures in some areas and empty in others To explore those kinds of areas we will need to do some imaginative thinking at a higher level of abstraction, i.e. of the different kinds of evaluative dimensions that might be relevant. My impression is that this meta-territory has not yet been explored very much. When you look at the futures/foresight literature the most common evaluative dimensions are those of "possibility" and "desirability" (and ones I have used myself, within the ParEvo app). But there must be others that are also relevant in various circumstances.

Postscript 2020 12 11: This afternoon we had a meeting to review the Futures Literacy Lab experience. In that meeting one of the facilitators produced this definition of Futures Literacy, which I have visibly edited, to improve it :-)



 Lots more to be discussed here, for example:

1. Different search strategies that can be used to find interesting alternate futures. For example, random search, and "the adjacent possible" searches are two that come to mind

2. Ways of getting more value from the alternate futures already identified e.g. by recombination 

3. Ways of mapping the diversity of alternate futures that have already been identified e.g using network maps of kind I discussed earlier on this blog (Evaluating Innovation)

4. The potential worth of getting independent third parties to review/evaluate the (a) contents generated by participants, and (b) participants' self-evaluations of their content


For an earlier discussion of mine that might be of interest, see 

"Evaluating the Future"Podcast and paper prepared with and for the EU Evaluation Support Services Unit, 2020




Monday, December 07, 2020

Has the meaning of impact evaluation been hijacked?

 



This morning I have been reading, with interest, Giel Ton's 2020 paper: 
Development policy and impact evaluation: Contribution analysis for learning and accountability in private sector development 

 I have one immediate reaction; which I must admit I have been storing up for some time.  It is to do with what I would call the hijacking of the meaning or definition of 'impact evaluation'.  These days impact evaluation seems to be all about causal attribution. But I think this is an overly narrow definition and almost self-serving of the interests of those trying to promote methods specifically dealing with causal attribution e.g., experimental studies, realist evaluation, contribution analysis and process tracing. (PS: This is not something I am accusing Giel of doing!)

 I would like to see impact evaluations widen their perspective in the following way:

1. Description: Spend time describing the many forms of impact a particular intervention is having. I think the technical term here is multifinality. In a private-sector development programme, multifinality is an extremely likely phenomenon.  I think Giel has in effect said so at the beginning of his paper: " Generally, PSD programmes generate outcomes in a wide range of private sector firms in the recipient country (and often also in the donor country), directly or indirectly."

 2. Valuation: Spend time seeking relevant participants’ valuations of the different forms of impact they are experiencing and/or observing. I'm not talking here about narrow economic definitions of value, but the wider moral perspective on how people value things - the interpretations and associated judgements they make. Participatory approaches to development and evaluation in the 1990s gave a lot of attention to people's valuation of their experiences, but this perspective seems to have disappeared into the background in most discussions of impact evaluation these days. In my view, how people value what is happening should be at the heart of evaluation, not an afterthought. Perhaps we need to routinely highlight the stem of the word Evaluation.

 3. Explanation: Yes, do also seek explanations of how different interventions worked and failed to work (aka causal attribution).  Paying attention of course to heterogeneity, both in the forms of equifinality and multifinality Please Note: I am not arguing that causal attribution should be ignored - just placed within a wider perspective! It is part of the picture, not the whole picture.

 4. Prediction: And in the process don't be too dismissive of the value of identifying reliable predictions that may be useful in future programmes, even if the causal mechanisms are not known or perhaps are not even there.  When it comes to future events there are some that we may be able to change or influence, because we have accumulated useful explanatory knowledge.  But there are also many which we acknowledge are beyond our ability to change, but where with good predictive knowledge we still may be able to respond to appropriately.

Two examples, one contemporary, one very old: If someone could give me a predictive model of sharemarket price movements that had even a modest 55% accuracy I would grab it and run, even though the likelihood of finding any associated causal mechanism would probably be very slim.  Because I’m not a billionaire investor, I have no expectation of being able to use an explanatory model to actually change the way markets behave.  But I do think I could respond in a timely way if I had relevant predictive knowledge.

 Similarly, with the movements of the sun, people have had predictive knowledge about the movement of the sun for millennia, and this informed their agricultural practices.  But even now that we have much improved explanatory knowledge about the sun’s movement few feel this would think that this will help us change the way the seasons progress.

 I will now continue reading Giel's paper…


2021 02 19: I have just come across a special issue of the Evaluation journal of Australasia, on the subject of values. Here is the Editorial section.

Sunday, December 06, 2020

Quality of Evidence criteria that can be applied to Most Significant Change (MSC) stories

 


Two recent documents have prompted me to do some thinking on this subject

If we view Most Significant Change (MSC) stories as evidence of change (and what people think about those changes) what should we look for in terms of quality - what are the attributes of quality we should look for?

Some suggestions that others might like to edit or add to, or even delete...

1. There is clear ownership of an MSC story and the reasons for its selection by the storyteller. Without this, there is no possibility of clarification of any elements of the story and its meaning, let alone more detailed investigation/verification

2. There was some protection against random/impulsive choice. The person who told the story was asked to identify a range of changes that had happened, before being asked to identify the one which was most significant 

3. There was some protection against interpreter/observer error. If another person recorded the story, did they read back their version to the storyteller, to enable them to make any necessary corrections?

4. There has been no violation of ethical standards: Confidentiality has been offered and then respected. Care has been taken not only with the interests of the storyteller but also of those mentioned in a story.

5. Have any intended sources of bias been identified and explained? Sometimes it may be appropriate to ask about " most significant changes caused by....xx..." or "most significant changes of ...x ...type"

6. Have any unintended sources of bias been anticipated and responded to? For example, by also asking about "most significant negative changes " or "any other changes that are most significant"?

7. There is transparency of sources. If stories were solicited from a number of people, we know how these people were identified and who was excluded and why so. If respondents were self-selected we know how they compare to those that did not self-select.

8. There is transparency of selection process: If multiple stories were initially collected then the most significant of these have been selected then reported and used elsewhere, the details of the selection process should be available, including (a) who was involved, (b) how choices were made, and (c) the reasons given for the final choice(s) made

9. Fidelity: Has the written account of why a selection panel chose a story as most significant done the participants' discussion justice? Was it sufficiently detailed, as well as being truthful?

10. Have potential biases in the selection processes been considered? Do most of the finally selected most significant change stories come from people of one kind versus another e.g. men rather than women, one ethnic or religious group versus others? In other words, is the membership of the selection panel transparent? (thanks to Maleeha below).

11.    your thoughts here on.. (using the Comment facility below).

Please note 

1. That in focusing here on "quality of evidence" I am not suggesting that the only use of MSC stories is to serve as forms of evidence. Often the process of dialogue is immensely important and it is the clarification of values and who values what and why so, that is most important. And there are also bound to be other purposes also served

2. (Perhaps the same point, expressed in another way) The above list is intentionally focused on minimal rather than optimal criteria. As noted above, a major part of the MSC process is about the discovery of what is of value, to the participants.  

For more on the MSC technique, see the resources here.