Tuesday, October 18, 2022

Four types of futures that should be covered by a Theory of Change


ParEvo.org is a web app that enables the collaborative exploration of alternative futures online. In the evaluation stage, participants are asked to identify which of the surviving storylines fall into each of these categories:

  • Most desirable
  • Least desirable
  • Most likely
  • Least likely
In one part of the analysis of the storylines generated during a ParEvo exercise, the storylines are plotted on a scatter plot where the two dimensions are likelihood and desirability, as seen in this example:
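For readers who want to see the mechanics, here is a minimal sketch of such a plot. The storyline names and scores below are invented for illustration; ParEvo's own analysis output will differ.

```python
import matplotlib.pyplot as plt

# Hypothetical ratings: storyline -> (likelihood, desirability), e.g. the
# proportion of participants nominating the storyline on each dimension.
ratings = {
    "Storyline A": (0.8, 0.6),
    "Storyline B": (0.2, 0.9),
    "Storyline C": (0.7, 0.1),
    "Storyline D": (0.3, 0.3),
}

fig, ax = plt.subplots()
for name, (likelihood, desirability) in ratings.items():
    ax.scatter(likelihood, desirability)
    ax.annotate(name, (likelihood, desirability))
ax.set_xlabel("Likelihood")
ax.set_ylabel("Desirability")
ax.set_title("Storylines plotted by likelihood and desirability")
plt.show()
```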


Most Theories of Change that I have come across, when working as an evaluator, focus on a future that is seen as desirable and likely (as in expected). At best, the undesirable futures will be mentioned in an accompanying section on risks and their management.

A less myopic approach might be useful, one which would orient the users of the Theory of Change to a more adaptive stance towards the future.

One way forward would be to think of a four-part Theory of Change, each part of which has different implications, as follows:


The desirable-and-likely cell may already be covered by an existing Theory of Change. For the desirable-but-unlikely and undesirable-but-likely cells, it would be useful to have ordered lists that describe events, what needs to be done before they happen, and what needs to be done after they happen. For the undesirable-and-unlikely cell, plans for monitoring the status of these events need to be spelled out, and updated on an ongoing basis.
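As a minimal sketch of this four-part structure (the cell contents are paraphrased from the paragraph above; the representation itself is illustrative, not part of ParEvo):

```python
# Map each (desirable?, likely?) combination to the suggested treatment.
four_part_toc = {
    (True, True): "Conventional Theory of Change territory",
    (True, False): "Ordered list of events, with actions needed before and after each",
    (False, True): "Ordered list of events, with actions needed before and after each",
    (False, False): "Monitoring plans for event status, updated on an ongoing basis",
}

def toc_implication(desirable: bool, likely: bool) -> str:
    """Look up the suggested treatment for one of the four futures."""
    return four_part_toc[(desirable, likely)]

print(toc_implication(desirable=False, likely=True))
```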



Thursday, October 13, 2022

We need more doubt and uncertainty!


This week the Swedish Evaluation Society (SVUK) is holding its annual conference. I took part in a session today on Theories of Change. The first part of my presentation summarised the points I made in a 2018 CEDIL Inception Report titled 'Theories of Change: Technical Challenges with Evaluation Consequences'. Following the presentation I was asked by Gustav Petersson, the discussant, whether we should pay more attention to the process of generating diagrammatic Theories of Change. I could only agree, reflecting that, for example, it is not uncommon for a representative of a conference working group to summarise a comprehensive and in-depth discussion in all too brief and succinct terms when reporting back to a plenary, leaving out or understating the uncertainties, ambiguities and disagreements. Similarly, the completed version of a diagrammatic Theory of Change is likely to suffer from the same limitations, being an overly simplified version of the much more complex and nuanced discussions between those involved in its construction that went on beforehand.

Later in the day I was reminded of the section in The Hitchhiker's Guide to the Galaxy where Vroomfondel, representing a group of striking philosophers, shouted: "That's right! We demand rigidly defined areas of doubt and uncertainty!"

I'm inclined to make a similar kind of request of those developing Theories of Change, and of those subsequently charged with assessing the evaluability of the associated intervention, including its Theory of Change. What I mean is that the description of the Theory of Change should make clear which parts of the theory the owner(s) of that theory are more confident in, and which less so, along with descriptions of the nature of the doubt or uncertainty and its causes, e.g. first-hand experience, or supporting evidence (or the lack of it) from other sources.

Those undertaking an evaluability assessment could go a step further and convert specific forms of doubt and uncertainty into evaluation questions that could form an important part of the Terms of Reference for an evaluation. This might go some way to remedying another problem discussed during the session: the all too common (in my experience) phenomenon of Terms of Reference making only generic references to an intervention's Theory of Change, for example by asking in broad terms about "what works and in what circumstances", rather than asking for tests of specific parts of that theory, which would arguably be more useful and a better use of limited time and resources.
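As a purely illustrative sketch of that idea (the links, confidence labels and question template below are all invented for the example):

```python
# Hypothetical Theory of Change links, annotated with the owners' confidence
# and the stated basis for that confidence.
toc_links = [
    {"link": "Training -> improved practice", "confidence": "high",
     "basis": "first-hand experience"},
    {"link": "Improved practice -> better outcomes", "confidence": "low",
     "basis": "little supporting evidence from other sources"},
]

def evaluation_questions(links):
    """Turn the least-confident parts of the theory into candidate
    evaluation questions for a Terms of Reference."""
    return [
        f"Does the link '{l['link']}' hold, and under what conditions? "
        f"(owners' confidence: {l['confidence']}; basis: {l['basis']})"
        for l in links if l["confidence"] == "low"
    ]

for question in evaluation_questions(toc_links):
    print(question)
```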

The bottom line: The articulation of a Theory of Change should conclude with a list of important evaluation questions. Unless there are good reasons to the contrary, those questions should then appear in the Terms of Reference for a subsequent evaluation.



PS: Vroomfondel is a philosopher. He appears in chapter 25 of The Hitchhiker's Guide to the Galaxy, along with his colleague Majikthise, as a representative of the Amalgamated Union of Philosophers, Sages, Luminaries and Other Thinking Persons (AUPSLOTP; the BBC TV version inserts 'Professional' before 'Thinking'). The Union is protesting about Deep Thought, the computer which is being asked to determine the Answer to the Ultimate Question of Life, the Universe and Everything. See https://hitchhikers.fandom.com/wiki/Vroomfondel



Thursday, June 30, 2022

Using ParEvo to conduct thought experiments


I have just had an interesting conversation with an NGO network that has been developing some criteria to: (a) help speed up the approval and release of funding in humanitarian emergencies, while (b) at the same time minimising the risk of poor use of those funds.

They think these criteria are useful but are not entirely sure whether those seeking funding will agree.  So they are exploring ways of testing out their applicability through a wider consultation process.

One way of doing this, which we have been discussing, involves the use of ParEvo.org. The plan is that a group of participants representing potential grantees will develop a set of storylines that start off with a particular organisation seeking funding for a particular humanitarian emergency. A branching structure of possible subsequent storyline developments will then be articulated through the usual ParEvo process.

After those storylines have been developed there will be an evaluation phase, as is now common practice with most ParEvo exercises. At this point the participants will be asked two generic types of questions (and variations on these), as described below; a schematic sketch of the underlying cross-tabulation follows the list:

1.  Which of the criteria in the current framework would be most likely to help avoid or mitigate the problems seen in storyline X? (Answer=Description & Explanation) 

  • and if the answer is none, are there any other criteria that could be included in the framework that might have helped?

2.  Which of the storylines in the current exercise would have most benefited from criterion X in the current framework, in the sense that the problems described there would have been avoided or mitigated? (Answer=Description & Explanation)

  • and if the answer is none, does this suggest that the criterion is irrelevant and could be removed?
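Here is the schematic sketch of the cross-tabulation behind these two question types. The criteria names, storyline names and the sample answer are all hypothetical:

```python
# Participants' answers: (criterion, storyline) -> explanation of how the
# criterion would have avoided or mitigated the problems in that storyline.
criteria = ["Rapid needs assessment", "Local partner vetting", "Spending audit trail"]
storylines = ["Storyline A", "Storyline B", "Storyline C"]
answers = {
    ("Local partner vetting", "Storyline B"):
        "Would have flagged the diversion of funds early",
}

# Question type 1: for each storyline, which criteria would have helped?
for s in storylines:
    helped = [c for c in criteria if (c, s) in answers]
    print(s, "->", helped or "none: are there missing criteria?")

# Question type 2: for each criterion, which storylines would have benefited?
for c in criteria:
    benefited = [s for s in storylines if (c, s) in answers]
    print(c, "->", benefited or "none: candidate for removal?")
```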

Postscript: One interesting thing about this type of thought experiment is that the theory (the proposed funding criteria) and the possible realities it may be applied to (where the theory may or may not work as expected) are constructed by different parties, independent of each other. This is not usually the case with thought experiments, and could be seen as a positive variation.

Stay tuned to see if and when this idea flies, then soars or crashes.


Courtesy https://xkcd.com/

For more on thought experiments, see Armchair science



Friday, June 17, 2022

Alternative futures as "search strategies"




When you read the phrase "search strategy" this may bring to mind what you need when doing a literature search on the Internet. Or you may think of the different forms of supervised machine learning, which involve different types of search strategies. For example, in my Excel-based EvalC3 prediction modelling app there are four different search strategies that users can choose from, to help find the most accurate predictive model describing which combinations of attributes are the best predictors of a particular outcome. Or you may have heard of James March, an organisational theorist who in 1981 wrote a paper called 'A model of adaptive organizational search', where he talks about how organisations find the right new technologies to develop and explore. This is probably the closest thing to the type of search process that I'm describing below.
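To make the idea concrete, here is a minimal sketch of one such search strategy: an exhaustive search over small combinations of attributes for the combination that best predicts an outcome. This is a generic illustration, not EvalC3's actual code, and the toy data are invented:

```python
from itertools import combinations

def accuracy(model_attrs, cases):
    """Share of cases where the prediction 'outcome occurs if all model
    attributes are present' matches the observed outcome."""
    correct = sum(
        1 for attrs, outcome in cases
        if (model_attrs <= attrs) == outcome
    )
    return correct / len(cases)

def exhaustive_search(attributes, cases, max_size=3):
    """Try every combination of attributes up to max_size and return the
    most accurate predictive model found."""
    best, best_acc = None, -1.0
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(attributes), size):
            acc = accuracy(set(combo), cases)
            if acc > best_acc:
                best, best_acc = set(combo), acc
    return best, best_acc

# Toy data: each case = (attributes present, was the outcome achieved?).
cases = [
    ({"training", "funding"}, True),
    ({"training"}, False),
    ({"funding", "local partner"}, True),
    ({"local partner"}, False),
]
print(exhaustive_search({"training", "funding", "local partner"}, cases))
```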

Right now I am in the process of helping some other consultants design a ParEvo exercise, in which recipients of research grants from the same foundation will collaboratively develop a number of alternative storylines describing how their efforts to ensure the uptake and use of research findings take place (and sometimes fail to take place) over the coming three years. Because these are descriptions of possible futures they are inherently a form of fiction. But please note they are not an attempt at "predictive" fiction. Rather, they are more like a form of 'preparedness-enabling' fiction.

As part of the planning process for this exercise we have had to articulate our expectations of what will come out of it, in terms of possible desirable benefits for both the participants and the foundation. In other words, the beginnings of a Theory of Change, which needs to be supplemented by details of how the exercise will best be run in this particular instance, and thus hopefully deliver these results.

When thinking about reasonable expectations for this exercise I came up with the following possibilities, which are now under discussion:

1. Participants will hear different interpretations and views of:
  1. What other participants mean when they use the term "research uptake"
  2. What successful, and unsuccessful, research uptake looks like in its various forms, to various participants
  3. How the process of research uptake can be facilitated, and inhibited, by a range of factors – some within researchers' control and some beyond it.
2. This experience may then inform how each participant proceeds with their own work on facilitating research uptake

3. The storylines that are generated by the end of the exercise will provide the participants and the XXXX trust with a flexible set of expectations against which actual progress with research uptake can be compared at a later date.

So, my current thinking is that what we have here is a description of a particular kind of search strategy, where both the objectives worth pursuing and the means of achieving them are being explored at the same time, at least within the ParEvo exercise. Though other things will also be happening after the exercise, hopefully involving some use of the ideas generated during it (see possibility 2 above).

There is also another facet of the idea of search strategies which needs to be mentioned here. When search is used in a machine learning context it is always accompanied by an evaluation function, which determines whether the search continues or comes to a stop because the best possibility has been identified (a 'stopping rule', I think, is the term involved). So, of the three possibilities listed above, the last one describes a possible evaluation function. Exactly how it will work needs more thinking, but I think it will be along the lines of asking participants in the prior exercise to identify the extent to which their experience in the interim period has fitted any of the storylines that were developed earlier, in what ways it has and has not, and why so in both cases. Stay tuned...
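Continuing the machine learning analogy, here is a hedged sketch of what a search loop with an evaluation function and a stopping rule can look like. The function names, scores and threshold are illustrative only:

```python
def search_with_stopping_rule(candidates, evaluate, good_enough, max_steps=100):
    """Generic search loop: keep examining candidates until the evaluation
    function says one is good enough (the stopping rule), or the step
    budget runs out."""
    best, best_score = None, float("-inf")
    for step, candidate in enumerate(candidates):
        if step >= max_steps:
            break
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
        if good_enough(best_score):  # stopping rule satisfied
            break
    return best, best_score

# Example: search storyline candidates for a good enough fit with experience.
fit_scores = {"Storyline A": 0.4, "Storyline B": 0.85, "Storyline C": 0.6}
result = search_with_stopping_rule(
    fit_scores, evaluate=fit_scores.get, good_enough=lambda s: s >= 0.8
)
print(result)  # ('Storyline B', 0.85)
```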




Thursday, April 28, 2022

Budgets as theories


A government has a new climate policy. It outlines how climate investments will be spread across a number of different ministries, and implemented by those ministries using a range of modalities. Some funding will be channelled to various multilateral organisations. Some will be spent directly by the ministries. Some will be channelled on to the private sector. At some stage in the future this government wants to evaluate the impact of this climate policy. But before then it has been suggested that an evaluability assessment might be useful, to ask if, how and when such an evaluation might be feasible.

This could be a challenge for those tasked with undertaking the evaluability assessment, and even for those planning the Terms of Reference for that assessment. The climate policy is not yet finalised. And if the history of most government policy statements (that I have seen) holds any lesson, it is that you can't expect to see the kind of clearly articulated Theory of Change you might find in the design of a particular aid programme.

My provisional suggestion at this stage is that the evaluability assessment should treat the government's budget, particularly those parts involving the funding of climate investments, as a theory of what is intended, and treat the actual flows of funding that subsequently occur as the implementation of that theory. My naïve understanding of the budget is that it consists of categories of funding, along with subcategories and sub-subcategories, et cetera: in other words, a type of tree structure involving a nested series of choices about where more versus less funds should go. So, the first task of an evaluability assessment would be to map out the theory, i.e. the intentions as captured by budget statements at different levels of detail, moving from national to ministerial and then to smaller units thereafter, and to comment on the adequacy of these descriptions and any gaps that need to be addressed.
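A minimal sketch of this tree-of-choices view of a budget. The class, category names and amounts below are all hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class BudgetNode:
    """One category in the nested budget 'theory': planned allocations are
    the theory; actual flows, recorded later, are its implementation."""
    name: str
    planned: float = 0.0
    actual: float = 0.0
    children: list = field(default_factory=list)

def choice_points(node, path=()):
    """Walk the tree and yield each set of sibling categories: the choice
    points an evaluability assessment could ask informants about."""
    if len(node.children) > 1:
        yield path + (node.name,), [(c.name, c.planned) for c in node.children]
    for child in node.children:
        yield from choice_points(child, path + (node.name,))

budget = BudgetNode("National climate budget", children=[
    BudgetNode("Ministry of Energy", planned=40, children=[
        BudgetNode("Renewables", planned=30),
        BudgetNode("Grid upgrades", planned=10),
    ]),
    BudgetNode("Ministry of Agriculture", planned=15),
])
for path, options in choice_points(budget):
    print(" > ".join(path), "-> choice between:", options)
```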

This exercise on its own will not be sufficient as an explication of the climate policy theory, because it will not tell us how these different flows of funding are expected to do their work. One option would be to follow each flow down to its 'final recipient', if such a thing can actually be identified. But that would be a lot of work and would probably leave us with a huge diversity of detailed mechanisms. Alternatively, one might do this on a sampling basis, but how would appropriate samples be selected?

There is an alternative, which could be seen as a necessity that could then be complemented by a sampling process. This would involve examining each binary choice, starting from the very top of the budget structure, and asking key informants why climate funding was present in one category but not the other, or more in one category than the other. This question on its own might have limited value, because budgeting decisions are likely to have a complex and often muddy history, and the responses received might have a substantial element of 'constructed rationality'. Nevertheless the answers could provide some useful context.

A more useful follow-up question would be to ask the same informants about their expectations of differences in the performance of climate financing delivered via category X versus category Y. Followed by a question about how they expect to hear about the achievement of that performance, if at all. Followed by a question about what they would most like to know about performance in this area. Here performance could be seen as a continuum of behaviours, ranging from simple delivery of the amount of funds as originally planned, to their complete expenditure, followed by some form of reporting on outputs and outcomes, and maybe even some form of evaluation reporting on some form of change.

These three follow-up questions would address three facets of an evaluability assessment (EA): (a) the Theory of Change, about expected changes; (b) data availability; (c) stakeholder interests. Questions would involve two types of comparisons: funding versus no funding, and more versus less funding. The fourth EA question, about the surrounding institutional context, typically asks about the factors that may enable and/or limit an evaluation of what actually happened (more on evaluability assessments here).

There will of course be complications in this sort of approach. Budget documents will not simply be a nested series of binary choices; at each level there may be multiple categories available rather than just two. However, informants could be asked to identify 'the most significant difference' between all these categories, in effect introducing an intermediary binary comparison. There could also be a great number of different levels to the budget documents, with each new level in effect doubling the number of choices and associated questions that need to be asked. Prioritisation of enquiries would be needed, possibly based on a 'follow the (biggest amount of) money' principle. It is also possible that quite a few informants will have limited ideas or information about the binary comparisons they are asked about. A wider selection of informants might help fill that gap. Finally there is the question of how to 'validate' the views expressed about expected differences in performance, the availability of performance information, and relevant questions about performance. Validation might take the form of surveying a wider constituency of stakeholders within the organisation of interest about the views expressed by the informants.
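Extending the hypothetical sketch above, a 'follow the (biggest amount of) money' ordering of those choice points might look like this (again purely illustrative, and reusing BudgetNode and choice_points from the earlier sketch):

```python
def follow_the_money(points):
    """Order choice points by the total planned funds at stake, largest
    first, so the biggest budget choices are enquired about first.
    `points` is the (path, options) pairs yielded by choice_points above."""
    ranked = sorted(points, key=lambda p: -sum(amount for _, amount in p[1]))
    for path, options in ranked:
        total = sum(amount for _, amount in options)
        print(f"{total}: {' > '.join(path)} -> {options}")

# Continuing the example above:
follow_the_money(list(choice_points(budget)))
```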

PS: Regarding this comment in the third paragraph above: "And to treat the actual flows of funding that subsequently occur as the implementation of that theory". One challenge the EA team might face is that while it may have access to detailed budget documents, in many places it may not yet be clear where funds have been tagged as climate finance spending. That itself would be an important EA finding.

To be continued...

Sunday, April 24, 2022

Making small samples of large populations useful


I was recently contacted by someone working for a consulting firm that has a contract to evaluate the implementation of a large-scale health program covering a huge number of countries. Their client had questioned their choice of six countries as case studies, encouraging the consulting firm to expand the number of country case studies, apparently in the belief that this would make the sample of country cases more representative of the population of countries as a whole. However, the consulting firm wasn't planning to aggregate the results of the six country case studies and then make claims about the generalisability of findings across the whole population of countries. Quite the opposite: the intention was that each country case study would provide a detailed perspective on one or more particular issues well exemplified by that case.

In our discussions I ended up suggesting a strategy that might satisfy both parties, in that it addressed to some extent the question of generalisable knowledge while at the same time being designed to exploit the particularities of individual country cases. My suggestion was relatively simple, although implementing it might take a bit of work, making use of whatever data is available on the full population of countries. The suggestion was that for each individual case study the first step would be to identify and explain the interesting particularities of that case, within the context of the evaluation's objectives. Then the evaluation team would look through whatever data is available on the whole population of countries, with the aim of identifying a sub-set of other countries that share similar characteristics (perhaps both generic, e.g. political and socio-economic indicators, and issue-specific) with the case study country. These would then be assumed to be the countries where the case study findings and recommendations could be most relevant.
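A hedged sketch of that matching step, using invented country names and indicator scores (real work would use whatever comparable data exist across the full population, and a similarity measure chosen to suit them):

```python
import math

# Hypothetical normalised indicators per country, e.g. political,
# socio-economic and issue-specific scores scaled to [0, 1].
indicators = {
    "CaseStudyCountry": [0.7, 0.4, 0.9],
    "CountryA": [0.6, 0.5, 0.8],
    "CountryB": [0.1, 0.9, 0.2],
    "CountryC": [0.8, 0.3, 0.85],
}

def similar_countries(case, pool, threshold=0.3):
    """Countries within a Euclidean-distance threshold of the case study
    country: candidates for 'findings may be most relevant here'."""
    case_vector = pool[case]
    return sorted(
        name for name, vector in pool.items()
        if name != case and math.dist(case_vector, vector) <= threshold
    )

print(similar_countries("CaseStudyCountry", indicators))
# ['CountryA', 'CountryC']
```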

As shown in the diagram below, it is possible that the sub-sets of countries relevant to each case study country might overlap to some extent. Even when only one case study country is examined, it may have more than one particularity of interest, each of whose analysis might be usefully generalised to a limited number of other countries. And those different sub-sets of countries may themselves overlap to some extent (not shown below).


Green nodes = case study countries
Red nodes = remainder of the whole population 
Red nodes connected to green nodes = countries that might find green node country case study findings relevant
Unconnected red nodes = parts of the whole population where case study findings are not expected to have any relevance

Another possibility, perhaps seen as inadvisable in normal circumstances, would be to identify the countries relevant to any case study analysis after the fact, not necessarily or only before. After the case study had actually been carried out there would be much more information available on the relevant particularities of the case study country, which might make it easier to identify which other countries the findings were most relevant to. However, the client of the evaluation might need to be given some reassurance in advance, for example by ensuring that at least some of these (red node) countries were identified at the beginning, before the case studies were underway.

PS: It is possible to quantify the nature of this kind of sampling. For example, in the above diagram:
Total number of cases = 37 (red and green)
Case study cases = 5 (14% of all cases)
Relevant-to-case-study cases = 17 (46% of all cases)
Relevant to >1 case study = 3 (8% of all cases)
Not-relevant-to-case-study* cases = 15 (40% of all cases)

*Bear in mind also that in many evaluations case studies will not be the only means of inquiry. For example, there are probably internal and external data sets that can be gathered and analysed re the whole set of 37 countries.
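These proportions can be computed directly from the diagram's node sets. A small sketch, where the node identifiers are invented stand-ins for the diagram and the post's figures above round to whole percentages:

```python
# Stand-in node sets matching the counts in the diagram above.
case_study = {f"G{i}" for i in range(1, 6)}    # 5 green nodes
relevant = {f"R{i}" for i in range(1, 18)}     # 17 red nodes linked to a green node
multi_relevant = {"R1", "R2", "R3"}            # 3 linked to more than one green node
total = 37

def share(n):
    return f"{n} ({100 * n / total:.1f}% of all cases)"

print("Case study cases:", share(len(case_study)))
print("Relevant-to-case-study cases:", share(len(relevant)))
print("Relevant to >1 case study:", share(len(multi_relevant)))
print("Not relevant:", share(total - len(case_study) - len(relevant)))
```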

Conclusion: We should not be thinking in terms of binary options. It is not true that a case is either part of a representative sample of a whole population, or unrepresentative and of interest only to itself. It can be relevant to a sub-set of the population.