Sunday, April 25, 2010

Evaluating a composite Theory of Change (ToC)

One NGO I have been working with has a project that is being implemented via partner organisations in four different countries. There is one over-arching LogFrame capturing the generic ToC, but because the situation in each country is quite different, country-specific LogFrames were developed as well. However, for ease of reporting to the back donor, the progress report format has been based on the contents of the generic LogFrame. When it comes to the Mid-Term Review (MTR), more attention will need to be paid to the country-specific LogFrames. But then how will the four MTR results be systematically aggregated into one synthesis report?

Other colleagues of mine have had to review a funding mechanism involving 30 or more project partners, where the diversity of activities on the ground is even greater. Rather than focus on the original LogFrame that describes the purpose of the funding mechanism, they got all the project staff and local partners together to construct a retrospective ToC that fitted all the funded activities. That is how they have dealt with the micro-macro compatibility issue.

How then have they evaluated the 30 projects in terms of this composite ToC? Because the ToC was reconstructed retrospectively, there was no readily available set of monitoring data, such as data based on indicators in a LogFrame. Instead, the main source of evidence has been qualitative data gathered from interviews with a large and diverse set of stakeholders. This approach has not met with approval in some quarters, which prompted me to think about how the problem could be solved. I should start by saying that I do like the idea of a retrospectively constructed ToC, so long as there is also some accountability for its transition from any prior form, such as a LogFrame.

My first thought was to treat the 30 projects as “units of evidence” and to try to classify them as having achieved, or not achieved, each stage in the ToC. The ToC was in graphic form, with multiple events happening at different stages and lines representing the causal links connecting them. The problem would then be how to classify each project. One possible means would be to get stakeholders to “success rank” the projects in relation to a given outcome event in the ToC, and then identify a cut-off point in the ranking that represents an acceptable level of achievement. This cut-off point would need to be explained and justified. To start with, the focus of these success-ranking exercises should be on the events most central to the ToC, i.e. those with many incoming and outgoing causal links. Given sufficient time, we would end up with a percentage-of-projects measure for each outcome, which could if necessary be weighted by scale of expenditure.
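
As a hypothetical sketch of this classification step (the project names, the ranking, and the cut-off below are all invented for illustration), the percentage-of-projects measure could be derived like this:

```python
# Hypothetical sketch: turn a stakeholder "success ranking" of projects on
# one outcome event into a percentage-of-projects measure, given an agreed
# (and justified) cut-off rank. All project names are invented.

def percent_achieving(ranked_projects, cutoff_rank):
    """ranked_projects: project ids, most successful first.
    cutoff_rank: 1-based rank; projects at or above it count as achieving."""
    achieved = set(ranked_projects[:cutoff_rank])
    return 100.0 * len(achieved) / len(ranked_projects), achieved

# Stakeholders rank six (imaginary) projects on one outcome event:
ranking = ["P3", "P1", "P6", "P2", "P5", "P4"]
pct, achievers = percent_achieving(ranking, cutoff_rank=4)
print(pct)  # 4 of the 6 projects fall at or above the cut-off
```

The set of achievers per outcome event is worth keeping, since the same sets would feed directly into any later cross-tabulation of pairs of events.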

We could then turn our attention to examining what happened to the expected causal relationships between the events in the ToC. At this stage it might be possible to do simple 2 x 2 cross-tabulations of the relationships between related pairs of events in the ToC, by counting the number of projects that achieved both events, achieved one but not the other, or achieved neither. A chi-square test could then tell us whether the observed association was statistically significant.
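
A minimal sketch of that test, using only the standard library (the counts are invented; for a 2 x 2 table the Pearson chi-square statistic has one degree of freedom, so its p-value can be obtained from the complementary error function):

```python
import math

# Hypothetical sketch: Pearson chi-square test (no continuity correction)
# on a 2 x 2 cross-tabulation of two ToC events. All counts are invented.
#                      achieved E2   did not achieve E2
# achieved E1               a                b
# did not achieve E1        c                d

def chi_square_2x2(a, b, c, d):
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the chi-square survival function reduces to erfc:
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Of 30 imaginary projects: 12 achieved both events, 3 achieved only E1,
# 4 achieved only E2, and 11 achieved neither.
chi2, p = chi_square_2x2(12, 3, 4, 11)
print(round(chi2, 2), p < 0.05)  # 8.57 True
```

With only 30 projects some cells may be very small, in which case Fisher's exact test would be the safer choice than chi-square.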

There is, however, a potential difficulty with the next step in the process, which is to look at the larger picture of how all of the events are causally linked together to make the whole ToC work as expected. Event A may be linked to Event B by project X (and others) achieving both, and Event B may be linked to Event C by project Y (and others) achieving both. But if projects X and Y are in different locations, then the connection between Events A and C would seem very questionable, because the two Event Bs probably also happened in two different locations. In the worst case we could end up with a ToC where many of the individual causal links were working as expected, but where there was no evidence of the whole set of causal links working together. At the least, we would have to investigate the relationships between the projects that created the larger causal pathway. In the imagined example above, we would need to look at projects X and Y and how their achievements of Event B inter-related.
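
One way to make this concern concrete is to check, for each pair of adjacent links, whether any single project (or at least a set of connected, co-located projects) supports both. A small sketch using the imagined example above, with invented project memberships:

```python
# Hypothetical sketch: a causal link is "supported" by the projects that
# achieved both of its events. A two-step pathway A -> B -> C is only
# evidenced end-to-end if the supporting sets of its two links overlap
# (or the projects involved are otherwise connected, e.g. co-located).
# All event and project names are invented.

achieved = {  # event -> set of projects judged to have achieved it
    "A": {"X", "Z"},
    "B": {"X", "Y"},
    "C": {"Y"},
}

def link_support(event1, event2):
    """Projects that achieved both events of one causal link."""
    return achieved[event1] & achieved[event2]

ab = link_support("A", "B")
bc = link_support("B", "C")
print(ab & bc)  # empty: no single project carries A -> B -> C
```

An empty intersection does not prove the pathway failed, only that the evidence for it rests on the (unexamined) relationship between the projects on either side of Event B.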

This is, and sounds, complex. One means of simplification would be to identify and focus on the most important causal pathway in the ToC. In my colleagues’ ToC there were at least two major pathways and a number of minor variations. Identifying points of failure in the network of events would be one way forward. This could be done in two stages:

  1. Success ranking might show that some outcome events in the ToC were not satisfactorily achieved by any project. In that case, any pathway such an event belonged to would be broken.
  2. Chi-square tests might show that, although some pairs of events both took place, there was no significant association between them (in terms of the number of projects achieving both). Again, the pathway they were part of would be broken.
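
The two-stage check just described can be sketched as a search over the ToC graph: remove the events no project achieved and the links with no significant association, then ask whether any pathway still connects the initial event to the final outcome. The graph and the failure verdicts below are invented for illustration:

```python
# Hypothetical sketch: after stage 1 (events achieved by no project) and
# stage 2 (links with no significant association) have flagged failures,
# test whether any causal pathway from the initial event to the final
# outcome survives. The ToC graph and verdicts are invented.

def surviving_path(links, start, goal, dead_events, dead_links):
    """Depth-first search over the ToC graph, skipping broken parts."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen or node in dead_events:
            continue
        seen.add(node)
        for nxt in links.get(node, []):
            if (node, nxt) not in dead_links and nxt not in dead_events:
                stack.append(nxt)
    return False

toc = {"A": ["B", "D"], "B": ["C"], "D": ["C"], "C": ["Outcome"]}
# Stage 1: no project achieved event D; stage 2: link B -> C not significant.
print(surviving_path(toc, "A", "Outcome",
                     dead_events={"D"}, dead_links={("B", "C")}))  # False
```

With both verdicts in place, every route from A to the Outcome is broken; with neither, the same call returns True, which is what "the pathway would be broken" amounts to operationally.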

This reflection on causal pathways makes me think that the challenge of sorting out “the attribution problem” for the final event in the ToC might be what could be called “a problem we would like to have”. It presumes that we have already established a plausible pathway of influence for our own activities. The discussion above suggests that this may not be so easy in some cases.


  1. Your description is worthy of a thoughtful response. I think many who work with M&E systems to meet the needs of multiple stakeholders (both local and international) of international development programs face the kinds of challenges you describe - particularly if you are working through partners. There have to be many lessons learned that can be shared. I will do my best to send you a description of our experiences with applying a theory of change for survivors of trauma due to war and conflict in 10 or more countries and trying to measure results against a conceptual framework and log frames. I hope others join in so that this topic doesn't just drop off the map. I consider it extremely important for current and future M&E.

  2. This is a very important issue. Sue Funnell and I used an overall program theory for an evaluation of 635 diverse projects funded under a program for community and family strengthening. It was developed before the evaluation began, then adapted during the program and customised for different projects. We used this to synthesise evidence across the projects, which was not as easy as it sounds, since the projects were not required to structure their evaluations around this framework, so there was quite a bit of retrofitting of data, and gaps. We presented this at an AES conference and the slides are available. I'd be interested in any comments.
    Patricia Rogers
    CIRCLE at RMIT University, Melbourne Australia

  3. Hi Patricia
    Thanks for your comment and the PDF.
    Re "which was developed before the evaluation began and then adapted during the program and customised for different projects." If the program theory was customised for the 635 projects (or sample thereof) how did you then aggregate your judgements about the performance of these different projects?