Tuesday, December 15, 2020

The implications of complex program designs: Six proposals worth exploring?

Last week I was involved in a seminar discussion of a draft CEDIL paper reviewing methods that can be used to evaluate complex interventions. That discussion prompted the following speculations, which could have practical implications for the evaluation of complex interventions.

Caveat: As might be expected, any discussion in this area will hinge on the definition of complexity. My provisional definition is based on a network perspective, something I've advocated for almost two decades now (Davies, 2003). That is, the degree of complexity depends on the number of nodes (e.g. people, objects or events), and the density and diversity of the types of interactions between them. Some might object that what I have described here is merely complicated rather than complex. But I think I can be fairly confident in saying that as you move along this scale of increasing complexity (as I have defined it here), the behaviour of the network will become more unpredictable. Unpredictability, or at least difficulty of prediction, is a fairly widely recognised characteristic of complex systems (but see the footnote).
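To make this definition a little more tangible, here is a minimal sketch (in Python) of how those three ingredients – number of nodes, density of connections, and diversity of interaction types – could be counted for a small intervention network. The actor names and interaction types are invented for illustration, and reporting three separate quantities rather than one combined score is my own choice, not anything from Davies (2003).

```python
def complexity_profile(nodes, edges):
    """Describe a network's complexity via three quantities: size
    (number of nodes), density (share of possible pairs actually
    connected), and diversity (number of distinct interaction types).
    Higher values on all three suggest less predictable behaviour."""
    possible_pairs = len(nodes) * (len(nodes) - 1) / 2
    connected_pairs = {frozenset((a, b)) for a, b, _ in edges}
    interaction_types = {kind for _, _, kind in edges}
    return {
        "nodes": len(nodes),
        "density": round(len(connected_pairs) / possible_pairs, 2) if possible_pairs else 0.0,
        "diversity": len(interaction_types),
    }

# A toy intervention network: actors linked by typed interactions.
nodes = ["donor", "NGO", "community", "local_govt"]
edges = [
    ("donor", "NGO", "funding"),
    ("NGO", "community", "training"),
    ("NGO", "local_govt", "reporting"),
    ("community", "local_govt", "advocacy"),
]
print(complexity_profile(nodes, edges))
# -> {'nodes': 4, 'density': 0.67, 'diversity': 4}
```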

The proposals:

Proposal 1. As the complexity of an intervention increases, the task of model development (e.g. a Theory of Change), and especially model specification, becomes increasingly important relative to that of model testing. This is because there are more and more parameters that could make a difference, or be "wrongly" specified.

Proposal 2. When the confident specification of model parameters becomes more difficult, model testing will perhaps become more like an exploratory search of a combinatorial space than focused hypothesis testing. This probably has some implications for the types of methods that can be used. For example, more attention to the use of simulations, or predictive analytics.
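To illustrate what such an exploratory search might look like in practice, here is a minimal sketch: a toy model with three parameters, each with three candidate values, whose full combinatorial space is enumerated and ranked by a simulated outcome. The parameter names, value ranges, and outcome function are all invented; with more parameters the space grows multiplicatively, and exhaustive enumeration would have to give way to sampling.

```python
import itertools

# Hypothetical model parameters, each with a few candidate values.
param_grid = {
    "training_intensity":   [0.2, 0.5, 0.8],
    "community_engagement": [0.1, 0.4, 0.9],
    "funding_level":        [0.3, 0.6, 1.0],
}

def simulated_outcome(p):
    # A stand-in for a real simulation of the intervention:
    # the outcome depends on an interaction between parameters.
    return p["training_intensity"] * p["community_engagement"] + 0.5 * p["funding_level"]

# Enumerate the whole combinatorial space (3**3 = 27 cells here)
# and rank the specifications by their simulated outcome.
names = list(param_grid)
combos = [dict(zip(names, values))
          for values in itertools.product(*param_grid.values())]
ranked = sorted(combos, key=simulated_outcome, reverse=True)
print(f"{len(combos)} combinations explored; best:")
print(ranked[0], "->", round(simulated_outcome(ranked[0]), 2))
```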

Proposal 3. In this situation where more exploration is needed, where will all the relevant empirical data come from, to test the effects of different specifications? Might it be that as complexity increases there is more and more need for monitoring (time-series) data, relative to evaluation (once-off) data?

Proposal 4. And if a complex intervention may lead to complex effects – in terms of behaviour over time – then the timing of any collection of relevant data becomes important. A once-off data collection would capture the state of the intervention+context system at one point in an impact trajectory that could actually take many different shapes (e.g. linear, sinusoidal, exponential, etc.). The conclusions drawn could be seriously misleading.
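Here is a worked illustration of that risk, using three invented trajectories deliberately scaled so that they all report exactly the same impact at a once-off measurement twelve months in, while telling very different stories at any other time.

```python
import math

# Three hypothetical impact trajectories, all scaled so that they
# agree exactly at a once-off measurement taken at t = 12 months.
def linear(t):
    return t / 12.0

def plateau(t):  # rapid change that then levels off
    return (1 - math.exp(-t / 4.0)) / (1 - math.exp(-3.0))

def sinusoidal(t):  # gains that later reverse
    return math.sin(math.pi * t / 24.0)

for t in (6, 12, 24):
    print(f"t={t:>2} months: linear={linear(t):.2f}, "
          f"plateau={plateau(t):.2f}, sinusoidal={sinusoidal(t):.2f}")
# At t=12 all three report the same impact (1.00); at t=24 they tell
# three very different stories (2.00 vs 1.05 vs 0.00).
```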

Proposal 5. And going back to model specification, what sort of impact trajectory is the intervention aiming for? One where change happens and then plateaus, or one where there is an ongoing increase? This needs specification because it will affect the timing and type of data collection needed.

Proposal 6. And there may be implications for the process of model building. As the intervention gets more complex – in terms of nodes in the network – there will be more actors involved, each of whom will have a view on how the parts, and perhaps the whole package, are and should be working, and on the role of their particular part in that process. Participatory, or at least consultative, design approaches would seem to become more necessary.

Are there any other implications that can be identified? Please use the Comment facility below.

Footnote: Yes, I know you can also find complex (as in difficult to predict) behaviour in relatively simple systems, like the logistic map used in population ecology to model the growth of a single population. And there may be some quite complex systems (by my definition) that are relatively stable. My definition of complexity is more probabilistic than deterministic.
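For readers who want to see the footnote's point in action, here is the standard example: the discrete logistic map, a one-line model of population growth whose behaviour at r = 4 is effectively unpredictable, because tiny differences in starting conditions are rapidly amplified.

```python
def logistic_map(x0, r=4.0, steps=20):
    """Iterate x -> r * x * (1 - x): a very simple system that can
    nonetheless behave chaotically for r near 4."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

# Two starting populations differing by one part in a million end up,
# after only 20 steps, in completely unrelated places.
print(logistic_map(0.200000))
print(logistic_map(0.200001))
```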

Friday, December 11, 2020

"If you want to think outside of the box, you first need to find the box" - some practical evaluative thinking about Futures Literacy




Over the last two days, I have participated in a Futures Literacy Lab, run by Riel Miller and organised as part of UNESCO's Futures Literacy Summit. Here are some off-the-cuff reflections.

Firstly, the definition of futures literacy. I could not find a decent one, but my search was brief, so I expect readers of this blog post will quickly come up with one. Until then, this is my provisional interpretation. Futures literacy includes two types of skills, both of which need to be mastered, although some people will be better at one type than the other:


1. The ability to generate many different alternative views of what might happen in the future.


2. The ability to evaluate a diversity of alternative views of the future, using a range of potentially relevant criteria.

There is probably also a third skill, i.e. the ability to extract useful implications for action from the above two activities.

The process that I took part in highlighted to me (perhaps not surprisingly, given that I'm an evaluator) the importance of the second type of skill above – evaluation. There are two reasons I can think of for taking this view:


1. The ability to critically evaluate one's ideas (e.g. multiple different views of the possible future) is an essential metacognitive skill. There is no value in being able to generate many imagined futures if one is then incapable of sorting the "wheat from the chaff" – however that may be defined.


2. The ability to evaluate a diversity of alternative views of the future can actually have a useful feedback effect, enabling us to improve the way we search for other imagined futures.


Here is my argument for the second claim. In the first part of the exercise yesterday, each participant was asked to imagine a possible future development in the way that evaluation will be done, and in the role of evaluators, in the year 2050. We were asked to place these ideas on Post-It Notes on an online whiteboard, on a linear scale that ranged between Optimistic and Pessimistic.

Then a second and orthogonal scale was introduced, which ranged from "I can make a difference" to "I can't make a difference". When that second axis was introduced, we were asked to move our Post-It Notes to a new position representing both our optimism about the imagined event and our ability to make a difference to it. These two steps can be seen as a form of self-evaluation of our own imagined futures. Here is the result (don't bother trying to read the note details).


Later on, as the process proceeded, we were encouraged to "think out of the box". But how do you do that... how do you know what is "out of the box"? Unless you deliberately go to extremes, with the associated risk that whatever you come up with will be less useful (however that is defined).

Looking back at that task now, it strikes me that what the above scatterplot does is show you where the box is, so to speak – and, by contrast, where outside the box is located. "Inside the box" is the part of the scatterplot where the biggest concentration of notes is located. The emptiest, and thus most "out of the box", area is the top right quadrant. There is only one Post-It Note there. So, if more out-of-the-box thinking is needed in this particular exercise setting, then perhaps we should start brainstorming about optimistic future possibilities of a kind where I think "I can't make a difference" – now there is a challenge!
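Mechanically, "finding the box" in this exercise amounts to a quadrant count. The sketch below does exactly that for a handful of invented (optimism, agency) coordinates – the real whiteboard data would of course replace them.

```python
from collections import Counter

# Each imagined future is a point (optimism, agency), both in [-1, 1]:
# optimism > 0 means optimistic; agency > 0 means "I can make a difference".
# Coordinates are invented for illustration.
notes = [(-0.6, 0.4), (-0.2, 0.7), (-0.5, -0.3), (-0.1, -0.6),
         (0.3, 0.5), (0.6, 0.8), (0.7, -0.2)]

def quadrant(x, y):
    return ("optimistic" if x > 0 else "pessimistic",
            "can make a difference" if y > 0 else "can't make a difference")

counts = Counter(quadrant(x, y) for x, y in notes)
for quad, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(n, quad)
# The least-populated quadrant - here (optimistic, can't make a
# difference) - marks the most "out of the box" territory.
```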

The above example can be considered a kind of toy model – a simple version of a larger and more complex range of possible applications. That is, any combination of evaluative dimensions will generate a combinatorial space, which will be densely populated with ideas about possible futures in some areas and empty in others. To explore those empty areas we will need to do some imaginative thinking at a higher level of abstraction, i.e. about the different kinds of evaluative dimensions that might be relevant. My impression is that this meta-territory has not yet been explored very much. When you look at the futures/foresight literature, the most common evaluative dimensions are those of "possibility" and "desirability" (ones I have used myself, within the ParEvo app). But there must be others that are also relevant in various circumstances.
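The same quadrant-counting idea generalises to any number of evaluative dimensions: score each idea on each dimension, bucket the ideas into the cells of the resulting combinatorial space, and treat the empty cells as candidate territory for further imagining. In this sketch the dimensions, the scoring levels, and the ideas themselves are all invented placeholders.

```python
from itertools import product

# Hypothetical evaluative dimensions, each scored "low" or "high".
dimensions = ["possibility", "desirability", "agency"]
levels = ["low", "high"]

# Invented scores for a handful of imagined futures of evaluation.
ideas = {
    "AI-assisted evaluation":     ("high", "high", "high"),
    "evaluation abolished":       ("low",  "low",  "low"),
    "citizen-led evaluation":     ("high", "high", "low"),
    "fully automated evaluators": ("high", "low",  "low"),
}

# Any cell of the combinatorial space not occupied by an existing idea
# is unexplored territory.
occupied = set(ideas.values())
empty = [cell for cell in product(levels, repeat=len(dimensions))
         if cell not in occupied]
print(f"{len(empty)} of {len(levels) ** len(dimensions)} cells are empty:")
for cell in empty:
    print(dict(zip(dimensions, cell)))
```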

Postscript 2020 12 11: This afternoon we had a meeting to review the Futures Literacy Lab experience. In that meeting one of the facilitators produced this definition of Futures Literacy, which I have visibly edited to improve it :-)



Lots more to be discussed here, for example:

1. Different search strategies that can be used to find interesting alternate futures. For example, random search and "adjacent possible" searches are two that come to mind (see the sketch after this list).

2. Ways of getting more value from the alternate futures already identified, e.g. by recombination.

3. Ways of mapping the diversity of alternate futures that have already been identified, e.g. using network maps of the kind I discussed earlier on this blog (Evaluating Innovation).

4. The potential worth of getting independent third parties to review/evaluate (a) the contents generated by participants, and (b) participants' self-evaluations of their content.
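On point 1, here is a minimal sketch of the difference between the two search strategies, over a toy space in which each alternate future is just a set of features. The feature names are invented; random search jumps anywhere in the space, while an "adjacent possible" search changes exactly one feature of a future we already have.

```python
import random

random.seed(1)  # for reproducibility

FEATURES = ["AI judges", "citizen panels", "real-time data", "no donors",
            "global standards", "local languages", "open models"]

def random_search():
    """Jump anywhere: sample a future as a uniformly random subset of features."""
    return frozenset(f for f in FEATURES if random.random() < 0.5)

def adjacent_possible(known):
    """Step to the 'adjacent possible': take a future we already have
    and toggle exactly one of its features."""
    base = set(random.choice(list(known)))
    base.symmetric_difference_update({random.choice(FEATURES)})
    return frozenset(base)

known = [frozenset({"AI judges", "real-time data"})]
print("random jump:  ", sorted(random_search()))
print("adjacent step:", sorted(adjacent_possible(known)))
```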


For an earlier discussion of mine that might be of interest, see:

"Evaluating the Future" – podcast and paper prepared with and for the EU Evaluation Support Services Unit, 2020.