Wednesday, April 24, 2024

Developing and using a Configurational Theory of Change within an evaluation


Figure 1

Ho-hum, yet another evaluation brand being promoted in an already crowded marketplace. FFS...

Yes, I think this reaction is understandable, but I think there is something here captured under this title (Developing and using a Configurational Theory of Change... ) which has potential value.  I will try to explain...

Many evaluators make use of theories of change, as part of a theory-based approach to evaluation. Many theories of change are described in some type of diagrammatic form. And a typical feature of those diagrams is their convergent nature. That is, they start off with a range of different types of inputs and activities which follow various causal pathways towards a limited number of final outcomes.

This image is almost the complete opposite of what happens in actual practice on the ground. Financial inputs come from a limited number of sources, these become available to a small range of partners who carry out their own range of activities, in a variety of different locations each with their own populations, including those intended and not intended to be affected. This description is of course a simplification, but it applies to many development aid programme designs. The point I'm making here is that this in-reality process is not convergent, it is divergent! It seems like the diagrammatic theories of change I have described are a type of Procrustean bed.

This blog posting has been most immediately prompted by a report I have just reviewed on potential evaluation strategies for a large national level climate finance strategy (CFS). The theory of change describes multiple causal pathways connecting the initial provision of government finance through to four types of expected impacts. With two of these causal pathways alone the number of projects being funded is in the hundreds. The report struggled with the issue of how to measure the expected impacts given the scale and likely diversity of events on the ground, and with the corresponding challenge of how to sample those projects. Part of my diagnosis of the problem here was the evaluation team's measurement-led approach. The other part was the weakness of the conceptual framework, i.e. the incapacity of the theory of change to capture the diversity of what was taking place.

Describing the alternative to my client is now my challenge. I think the alternative has two parts. Firstly, one should start at the beginning, where the money becomes available, and then follow the money (and the people responsible) as it gets distributed according to its intended purposes. If things are not happening as expected early on in this process then this affects expectations of what might and might not be observable later on in the form of 'outcomes' or 'impacts'. Put crudely, there is no point trying to observe the impact of something that has not yet been delivered. And in the case of strategies like the CFS, a large part of success can simply be getting the money to where it should be spent.

Secondly, as money is distributed from a central fund, decisions are going to be made about how it should be parcelled out in different amounts for different purposes through different institutions. Each time that happens the decisions that have been made about how to do this are hopefully not random. Evaluating how those decisions were made may not necessarily be all that useful, because often there will be opaque mini, meso and macro political processes involved. But the announced decisions may include some intentionally explicit expectations about the official purposes of different allocations. Interviews with those responsible for those allocations might also elicit more informal and more current expectations about what might be the short and longer term effects of some of these allocations, when compared to others. The point I am emphasising here is that sometimes we can come to evaluative judgements not through the use of any overriding predetermined criteria, but by using a more inductive process, where we compare one option to another. This is an excuse for me to quote Marx (G):

Friend says to Marx – 'Life is difficult'.

Marx replies to friend – 'Compared to what?'

This type of inductive comparative evaluation doesn't have to be completely free form. It is conceivable for example that we could look at two tranches of government climate finance funding and ask (those with proximate responsibilities for that funding) what difference there might be between those blocks of funding in terms of how each might meet one or more of the OECD criteria. (These range in their concerns from the more immediate issues of coherence and efficiency to later concerns with effectiveness and impact.) Respondents' answers in the form of expectations can be seen as mini theories, a.k.a. hypotheses, that then might be testable through the gathering of relevant data.

Before these questions can be posed the cases that are going to be compared would need to be identified. The 'cases' in this example would be particular blocks of funding. Further along the implementation process the cases could be partners who are receiving funding, or activities that those partners are implementing, or communities those activities are directed towards. Nevertheless, at any point along this chain there is still a challenge, which is how to select cases for comparison. For example, if we are looking at a particular budget document which distributes funding into multiple purpose categories we will be faced with the question of which of these categories to compare.

One way forward is to let the interviewed person decide, especially if they have responsibilities in this area. Using hierarchical card sorting (HCS) the interviewer starts with a request, which is phrased like this: 'What is the most significant difference between all these budget categories in terms of how they will achieve the objectives of the climate finance strategy? Please sort the budget categories into two piles according to this difference and then explain it to me'. Having identified piles of types of cases that can be compared, the respondent can then be asked for details about their expectations of the cases in one pile versus the other (See FN1). The same question can then be reiterated by focusing on each of those two piles in turn and getting the respondent to break them into two smaller sub-piles. When their answers are followed by explanations this will help differentiate expectations in further detail.

Figure 2

Figure 2 shows the results of such an exercise, where the respondents were NGO staff responsible for the development and management of a portfolio of projects. They were asked to sort the projects into two piles according to "What they saw as the most significant difference between the projects, of a kind that would make a difference to what they could achieve". Their choices generated the tree structure. They were then asked to make a series of binary choices at each branching point, identifying which of the two types of projects described there "they expected to be most successful, in terms of the extent to which they will contribute to the achievement of the overall objectives of the portfolio". Their choices are shown by the red links. In this diagram their responses have been sorted such that the preferred red option is always shown above the non-preferred option. The aggregate result is a ranked set of 8 types of projects, with the highest rank (1) at the top. Each of these types is not an isolated category of its own, but part of a configuration that can be read along each branch, from left to right.
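The way a ranking falls out of a sorted tree like this can be sketched in a few lines of Python. This is only an illustration, not the tool used to produce Figure 2. The `Node` class and the `rank_configurations` function are hypothetical names, and the small two-level tree is an invented arrangement that borrows type names from this post; it is not the actual structure of the Figure 2 tree. The only idea carried over is that, at every branching point, the preferred pile is read before the non-preferred one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One pile in a hierarchical card sort (hypothetical structure)."""
    label: str
    preferred: Optional["Node"] = None  # sub-pile expected to be more successful
    other: Optional["Node"] = None      # sub-pile expected to be less successful


def rank_configurations(node: Node, path: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return the leaf configurations, highest expected rank first.

    Each configuration is the sequence of labels read along one branch,
    from the root to a leaf. Visiting the preferred child before the
    other child at every split produces the overall ranking.
    """
    path = path + (node.label,)
    if node.preferred is None and node.other is None:
        return [path]
    ranked: List[Tuple[str, ...]] = []
    for child in (node.preferred, node.other):
        if child is not None:
            ranked.extend(rank_configurations(child, path))
    return ranked


# Invented example tree: two levels of binary sorting give four types.
tree = Node(
    "All projects",
    preferred=Node("Wider focus",
                   preferred=Node("Locally driven"),
                   other=Node("UK driven")),
    other=Node("Local focus",
               preferred=Node("Locally driven"),
               other=Node("UK driven")),
)

ranking = rank_configurations(tree)
for rank, configuration in enumerate(ranking, start=1):
    print(rank, " > ".join(configuration))
```

Three levels of sorting on a full tree would give eight ranked types, as in Figure 2. Note that each ranked item is a whole branch, not a single label, which is the sense in which the types are configurations.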

Here are some of the type descriptions and the reasons why one versus the other was selected as most likely to contribute to the portfolio objectives. Further discussion would be needed to establish how the presence/absence of these characteristics could be identified on the ground.

Wider focus

Aim to influence wider policy and environment, and have more sustainable and wider impact beyond children and their families.

Likely to be more successful: Because it will have a wider reach and be more sustainable

Local focus

More hands on work with children on a day to day basis. Impact may be sustained but it will be limited to children and their families.

Likely to be less successful:

Locally driven

Partner and the projects are locally rooted, driven by local needs and priorities. They are more likely to “get it right”. They can’t walk away when Comic Relief funding ends. More likely to be sustainable.

Likely to be more successful: More embedded in the context, will outlast the project, be more responsive.

UK driven

UK driven projects, almost sub-contracting. They have a set end-point.

Likely to be less successful:

There is a larger question here of course that also relates to sampling. Who are you going to interview in this way? The suggestion above was 'to follow the money'. In other words, to follow lines of responsibility and interview people about the domains of activity they are responsible for, using HCS as a means of structuring the discussion. There is a strategy choice here between what are known as breadth-first and depth-first search strategies. From a given point in a flow of funds (and of responsibilities) there can be distributions going in different directions, each of which could be explored. Following all of these is a form of breadth-first search. Alternatively the focus could be on just one of those distributions, following the subsequent distribution of funding and responsibility further down one (or a few) lines. This is a form of depth-first search. Which of those search strategies to pursue is probably a matter to be decided by the evaluation client. But it may also need to be adaptive, informed by what was found by the evaluation team in prior interviews.

Courtesy Jacky Lieu: Comparison of Breadth-First Search and Depth-First Search: Understanding Their Methods and Uses 
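For readers who have not met these two search strategies before, here is a minimal Python sketch of how each would order a set of interviews. The `funding_flow` dictionary and all the names in it (ministries, partners, communities) are invented for illustration and do not describe the CFS. The two functions are textbook breadth-first and depth-first traversals, nothing specific to evaluation.

```python
from collections import deque

# Hypothetical flow of funds: each holder of funds maps to those it passes funds on to.
funding_flow = {
    "Central fund": ["Ministry A", "Ministry B"],
    "Ministry A": ["Partner A1", "Partner A2"],
    "Ministry B": ["Partner B1"],
    "Partner A1": ["Community A1a"],
    "Partner A2": [],
    "Partner B1": ["Community B1a"],
    "Community A1a": [],
    "Community B1a": [],
}


def breadth_first(flow, start):
    """Cover every recipient at one level before moving down to the next level."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for recipient in flow.get(node, []):
            if recipient not in seen:  # a recipient may be funded from two sources
                seen.add(recipient)
                queue.append(recipient)
    return order


def depth_first(flow, start):
    """Follow one line of funding all the way down before returning to the next line."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Reverse so the first-listed recipient is followed first.
        stack.extend(reversed(flow.get(node, [])))
    return order


bfs_order = breadth_first(funding_flow, "Central fund")
dfs_order = depth_first(funding_flow, "Central fund")
print("Breadth-first:", bfs_order)
print("Depth-first:  ", dfs_order)
```

In the breadth-first ordering both ministries are interviewed before any partner. In the depth-first ordering the team reaches a community after only three interviews, but knows nothing yet about Ministry B. An adaptive strategy would amount to re-ordering the queue or stack after each interview.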

But what about aggregation?

If you followed my suggested approach, the closer you got to the people whose lives were of final/main concern, the smaller the segments of all the funding you would be looking at. These would be more comparable than when looked at as part of a larger group, with more customised, context-specific assessments of expected and actual impact. But how would you / the evaluation team then be able to make any overall statement about the strategy as a whole?

The way forward is to think of performance measurement in slightly different terms than just using a simple indicator based measure. Imagine a scatter plot, with one dimension X describing relative i.e. ranked expectations of achievement and the other dimension Y describing ranked actual/observed/assessed achievements. The entities in the scatter plot are the groups of cases in the smallest available sub-categories that were developed. Their rank position, relative to each other, is evident when all the binary assessments of expected performance are generated through the process described above. See here for more on how this is done. The scatter plot can in turn be summarised in at least two different ways: using a measure of rank correlation (of how achievement relates to expectations) and using Classification Accuracy, if and when a minimum rank position of achievement is identified. Equally importantly, qualitative descriptions can be given of cases that exemplify performance that most meets expectations, and the reverse, along with positive and negative deviants (outliers).
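Both summary measures are simple to compute. The Python sketch below uses invented ranks for eight types of cases, not data from any evaluation. It makes two assumptions that should be stated loudly: there are no tied ranks, so the short form of Spearman's rank correlation applies, and the same minimum rank position (here the top four) is used as the cut-off for both expected and observed achievement. The function names are hypothetical.

```python
def spearman_rho(expected_ranks, observed_ranks):
    """Spearman rank correlation, short form (assumes no tied ranks)."""
    n = len(expected_ranks)
    sum_d_squared = sum((e - o) ** 2 for e, o in zip(expected_ranks, observed_ranks))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


def classification_accuracy(expected_ranks, observed_ranks, cutoff):
    """Share of types where expectation and observation fall on the same side of the cut-off.

    A type counts as 'expected to achieve' if its expected rank is within
    the cut-off, and as 'achieved' if its observed rank is within it.
    """
    agreements = sum(
        (e <= cutoff) == (o <= cutoff)
        for e, o in zip(expected_ranks, observed_ranks)
    )
    return agreements / len(expected_ranks)


# Invented data: eight types, ranked 1 (best) to 8 (worst).
expected = [1, 2, 3, 4, 5, 6, 7, 8]  # X axis: ranked expectations
observed = [1, 3, 2, 6, 4, 5, 8, 7]  # Y axis: ranked observed achievement

rho = spearman_rho(expected, observed)
accuracy = classification_accuracy(expected, observed, cutoff=4)
print(f"Rank correlation: {rho:.3f}")
print(f"Classification accuracy: {accuracy:.2f}")
```

In this invented example the fourth and fifth types are the ones that cross the cut-off in opposite directions, so they would be the obvious candidates for the qualitative descriptions of negative and positive deviants.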

What we could end up with is a tree structure documenting multiple routes to both high and low performance, implemented in different contexts (describable at different levels of scale).

Other scatter plot designs are more relevant to assessments of strategies. The ranking generated by Figure 2 was plotted against the age of the projects and their grant size, which might be expected to be influenced by the contents of a funding strategy. Neither of these two measures showed any relationship to perceived strategic priorities!

To be continued....

PS1: When asking about expected effects of one type of allocation versus another, it may make sense to encourage a focus on more immediately expected effects first, and then later ones. They may be more likely, more easily articulated and more evaluable.

PS2: Hughes-McLure, S. (2022). Follow the money. Environment and Planning A: Economy and Space, 54(7), 1299–1322.