Friday, March 28, 2014

The challenges of using QCA



This blog posting is a response to my reading of the Inception Report written by the team who are undertaking a review of evaluations of interventions relating to violence against women and girls. The process of the review is well documented in a dedicated blog – EVAW Review

The Inception Report is well worth reading, which is not something I say about many evaluation reports! One reason is to benefit from the amount of careful attention the authors have given to the nuts and bolts of the process. Another is to see the kind of intensive questioning the process has been subjected to by the external quality assurance agents and the considered responses by the evaluation team. I found that many of the questions that came to my mind while reading the main text of the report were dealt with when I read the annex containing the issues raised by SEQUAS and the team’s responses to them.

I will focus on one issue that is a challenge for both QCA and data mining methods like Decision Trees (which I have discussed elsewhere on this blog): the ratio of conditions to cases. In QCA, conditions are attributes of the cases under examination that are provisionally considered as possible parts of causal configurations that explain at least some of the outcomes. After an exhaustive search and selection process the team has ended up with a set of 39 evaluations they will use as cases in a QCA analysis. After a close reading of these and other sources they have come up with a list of 20 conditions that might contribute to 5 different outcomes. With 20 conditions, each either present or absent, there are 2^20 (i.e. 1,048,576) different possible configurations that could explain some or all of the outcomes. But there are only 39 evaluations, which at best will represent only 0.004% of the possible configurations. In QCA the remaining 1,048,537 are known as “logical remainders”. Some of these can usually be used in a QCA analysis through a process using explicit assumptions, e.g. about particular combinations of configurations and outcomes that by definition could not occur in real life. However, from what I understand of QCA practice, logical remainders would not usually exceed 50% of all possible configurations.
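The arithmetic behind these figures can be reproduced in a few lines of Python. This is a minimal sketch that assumes crisp (present/absent) conditions and the best case, in which every one of the 39 evaluations has a different configuration:

```python
# Figures from the EVAW review: 20 binary conditions, 39 cases.
n_conditions = 20
n_cases = 39

n_configurations = 2 ** n_conditions          # every present/absent combination
coverage = n_cases / n_configurations         # best case: each case is a distinct configuration
logical_remainders = n_configurations - n_cases

print(f"{n_configurations:,} possible configurations")  # 1,048,576 possible configurations
print(f"coverage at best: {coverage:.4%}")              # coverage at best: 0.0037%
print(f"{logical_remainders:,} logical remainders")     # 1,048,537 logical remainders
```

Even in this best case, well over 99.99% of the configurations are logical remainders, far beyond the 50% mentioned above.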

The review team has dealt with this problem by summarising the 20 conditions and 5 outcomes into 5 conditions and one outcome. This means there are 2^5 (i.e. 32) possible causal configurations, which is more reasonable considering there are 39 cases available to analyse. However there is a price to be paid for this solution, which is the increased level of abstraction/generality in the terms used to describe the conditions. This makes the task of coding the known cases more challenging, and it will make the task of interpreting the results and then generalising from them more challenging as well. You can see the two versions of their model in the diagram below, taken from their report.
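To make the idea of a truth table concrete, here is a minimal Python sketch that enumerates all 2^5 configurations of five present/absent conditions and counts how many remain as logical remainders for a given set of coded cases. Apart from “convincing methodology”, the condition labels are hypothetical placeholders, and the case codings are made up for illustration; they are not the review team's data.

```python
from itertools import product

# Hypothetical labels for the 5 summary conditions; only "convincing
# methodology" is named in the review, the others are placeholders.
conditions = ["C1", "C2", "C3", "C4", "convincing_methodology"]

# All 2**5 = 32 present(1)/absent(0) combinations: the rows of a QCA truth table.
all_configs = list(product([0, 1], repeat=len(conditions)))


def logical_remainders(observed_cases):
    """Return the configurations not matched by any observed case."""
    observed = {tuple(case) for case in observed_cases}
    return [cfg for cfg in all_configs if cfg not in observed]


# Hypothetical coding of a few cases (not data from the review).
cases = [
    (1, 1, 0, 1, 1),
    (1, 1, 0, 1, 1),   # duplicates share a truth-table row
    (0, 1, 1, 0, 1),
    (1, 0, 0, 0, 0),
]
print(len(all_configs))                 # 32
print(len(logical_remainders(cases)))   # 29
```

Because cases with identical codings fall into the same row, 39 cases will usually cover fewer than 32 rows, so some logical remainders are likely even with this reduced model.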
 
What fascinated me was the role of evaluation method in this model (see “Convincing methodology”). It is only one of five conditions that could explain some or all of the outcomes. It is therefore quite possible that all or some of the case outcomes could be explained without the use of this condition. This is quite radical, considering the centrality of evaluation methodology in much of the literature on evaluations. It may also be worrying to DFID, given that one of their expectations of this review was that it would “generate a robust understanding of the strengths, weaknesses and appropriateness of evaluation approaches and methods”. The other potential problem is that even if methodology is shown to be an important condition, its singular description does not provide any means of discriminating between forms which are more or less helpful.

The team seems to have responded to this problem by proposing additional QCA analyses, where there will be an additional condition that differentiates cases according to whether they used qualitative or quantitative methods. However, reviewers have still questioned whether this is sufficient. In response, the team have commented that they will “add to the model further conditions that represent methodological choice after we have fully assessed the range of methodologies present in the set, to be able to differentiate between common methodological choices”. It will be interesting to see how they go about doing this while avoiding the problem of “insufficient diversity” of cases already mentioned above.

One possible way forward has been illustrated in a recent CIFOR Working Paper (Sehring et al, 2013), an approach that is also covered in Schneider and Wagemann (2012). These sources show how it is possible to do a “two-step QCA”, which differentiates between remote and proximate conditions. In the VAWG review this could take the form of an analysis of conditions other than methodology first, then a second analysis focusing on a number of methodology conditions. This process essentially reduces a larger number of remote conditions down to a smaller number of configurations that do make a difference to outcomes, which are then included in the second level of the analysis, which uses the more proximate conditions. It has the effect of reducing the number of logical remainders. It will be interesting to see if this is the direction that the VAWG review team are heading.
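As a rough illustration of the two-step logic, the Python sketch below first groups cases by their remote conditions and keeps the remote configurations that are consistently linked to the outcome. It then recodes each case as one “favourable context” condition plus a few methodology conditions. This is a simplified reading under stated assumptions (binary conditions, a hypothetical 0.8 consistency threshold, made-up condition names and cases), not the exact procedure described by Sehring et al or Schneider and Wagemann.

```python
from collections import defaultdict

# Hypothetical threshold for treating a remote configuration as a
# "favourable context" (an assumption, not a value from the sources).
CONSISTENCY_THRESHOLD = 0.8


def remote_key(case, remote):
    """The case's configuration of remote conditions as a tuple of 0/1 values."""
    return tuple(case[c] for c in remote)


def favourable_contexts(cases, remote):
    """Step 1: remote configurations whose cases mostly show the outcome."""
    groups = defaultdict(list)
    for case in cases:
        groups[remote_key(case, remote)].append(case["outcome"])
    return {key for key, outcomes in groups.items()
            if sum(outcomes) / len(outcomes) >= CONSISTENCY_THRESHOLD}


def second_step_rows(cases, remote, proximate):
    """Step 2: recode each case as (context, proximate conditions...) -> outcome."""
    contexts = favourable_contexts(cases, remote)
    rows = []
    for case in cases:
        context = int(remote_key(case, remote) in contexts)
        config = (context,) + tuple(case[c] for c in proximate)
        rows.append((config, case["outcome"]))
    return rows


# Hypothetical cases: two remote conditions, two methodology conditions.
cases = [
    {"R1": 1, "R2": 1, "qualitative": 1, "quantitative": 0, "outcome": 1},
    {"R1": 1, "R2": 1, "qualitative": 0, "quantitative": 1, "outcome": 1},
    {"R1": 0, "R2": 1, "qualitative": 1, "quantitative": 0, "outcome": 0},
    {"R1": 0, "R2": 0, "qualitative": 0, "quantitative": 1, "outcome": 0},
]
remote = ["R1", "R2"]
proximate = ["qualitative", "quantitative"]

print(favourable_contexts(cases, remote))   # {(1, 1)}
for row in second_step_rows(cases, remote, proximate):
    print(row)
```

With p proximate conditions, the second-step truth table has 2^(1+p) rows however many remote conditions were summarised in step 1, although step 1 still faces its own 2^r possible remote configurations.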

PS 2014 03 30: I have found some further references to two-level QCA:
 And for people wanting a good introduction to QCA, see

Monday, January 13, 2014

Thinking about set relationships within monitoring data


I have just re-read Howard White's informative blog posting on "Using the causal chain to make sense of the numbers", which refers to what he calls "the funnel of attrition". I have reproduced a copy of his diagram here, which represents the situation in an imaginary project. He uses the diagram to emphasise the need to do basic analyses of implementation data (informed by a theory of change) before launching into sophisticated analyses of relationships between outputs and impacts.
The same set of data can be represented using a Venn diagram, to show the relationship between these 8 sets of people, as shown in this truncated version below:


Venn diagrams like these can also be read as describing relationships of necessity and sufficiency. According to the above diagram, knowing about the intervention is a necessary condition of taking part in the intervention. There are no cases (in the above sets) where people have taken part without already knowing about the intervention.
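In set terms, a condition is necessary when its set contains the outcome set, and sufficient when its set is contained in the outcome set. A minimal Python sketch with hypothetical person IDs (not White's data) shows how this can be checked once you know who belongs to each set:

```python
# Hypothetical person IDs for two sets in the funnel (not White's data).
knew_about = {"p01", "p02", "p03", "p04", "p05"}
took_part = {"p01", "p02", "p03"}


def is_necessary(condition_set, outcome_set):
    """Necessary: every member of the outcome set is also in the condition set."""
    return outcome_set <= condition_set


def is_sufficient(condition_set, outcome_set):
    """Sufficient: every member of the condition set is also in the outcome set."""
    return condition_set <= outcome_set


print(is_necessary(knew_about, took_part))   # True
print(is_sufficient(knew_about, took_part))  # False: p04 and p05 knew but did not take part

# One person assigned without prior knowledge is enough to break necessity.
assigned = {"p09"}
print(is_necessary(knew_about, took_part | assigned))  # False
```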

However, it is conceivable that some people could be assigned to an intervention without knowing about it in advance and making their own choice. In that case the set relationships could look more like the diagram below (yellow being participants who were assigned without any prior knowledge). Here the key change is in the overlap of their memberships; the actual numbers of people could well be the same.


It's possible to imagine other complexities in this model. For example, some people may change their behavior without necessarily changing their attitudes beforehand, because of compulsion or pressure of some kind. So the revised model might look more like this (brown being participants changing their behavior due to compulsion):


In both these examples above, what was a necessary condition has become a sufficient condition. Knowing about an intervention is sufficient to enable a person to participate in the intervention, but it is not the only way. People can also be assigned to the intervention. Similarly, changing their attitudes is one means whereby a person will change their behavior, but behavior may also be changed through other means, e.g. compulsion.

The point of these two examples is that when monitoring implementation it is not good enough to simply record and compare the relative numbers who belong to each consecutive group in the "funnel of attrition". Doing so implies that the Theory of Change (or Theory of Action, as some people might prefer to call this) is the only (i.e. necessary) means by which a desired outcome can occur, which seems highly unlikely. Instead, what is needed is a comparison of the membership relationships between one set and the next, to identify whether other conditions might also be sufficient for the expected change to happen. This can be done using nothing more complicated than cross-tabulations in Excel.
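The same cross-tabulation can be sketched in a few lines of Python, assuming hypothetical person-level records with a 1/0 membership flag for two consecutive sets in the funnel. The (0, 1) cell, people who reached the later step without passing through the earlier one, is the one that reveals another route to the change:

```python
from collections import Counter

# Hypothetical person-level monitoring records: one row per person,
# with membership (1/0) of two consecutive sets in the funnel.
records = [
    {"id": "p01", "changed_attitude": 1, "changed_behavior": 1},
    {"id": "p02", "changed_attitude": 1, "changed_behavior": 1},
    {"id": "p03", "changed_attitude": 1, "changed_behavior": 0},
    {"id": "p04", "changed_attitude": 0, "changed_behavior": 1},  # e.g. compulsion
    {"id": "p05", "changed_attitude": 0, "changed_behavior": 0},
]


def crosstab(rows, a, b):
    """Count people in each (a, b) membership cell, like a 2x2 pivot table."""
    return Counter((row[a], row[b]) for row in rows)


table = crosstab(records, "changed_attitude", "changed_behavior")
for cell in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(cell, table[cell])

# A non-empty (0, 1) cell means the earlier step was not necessary:
# some people reached the later step by another route.
print("attitude change necessary?", table[(0, 1)] == 0)  # False
```

This only works if records identify individuals, which is the point made in the next paragraph.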

But this view does have significant implications for how we monitor project interventions. It means it is not good enough to simply track numbers of people participating in various activities. In order to identify possible relationships of necessity and sufficiency between these events we need to know who participated in each activity, so we can identify the extent to which membership in one set overlapped with another. In my experience this level of implementation monitoring is not very common.

PS: For more reading on set relationships and concepts of sufficient and necessary causal conditions, I highly recommend: 

Goertz, Gary, and James Mahoney. 2012. A Tale of Two Cultures: Qualitative and Quantitative Research in the Social Sciences. Princeton University Press. http://www.amazon.co.uk/Tale-Two-Cultures-Qualitative-Quantitative/dp/0691149712/ref=sr_1_1?ie=UTF8&qid=1353850106&sr=8-1.


Saturday, October 26, 2013

Complex Theories of Change: Recipes for failure or for learning?


The diagram below is a summary of a Theory of Change for interventions in the education sector in Country X. It did not stand on its own, it was supplemented by an extensive text description.



It's complex in the sense that there are many different parts to it and many interconnections between them, including some feedback loops. It seems realistic in the sense of capturing some of the complexity of social change. But it may be unrealistic if it is a prescription for achieving change. Whether it is the latter depends on how we interpret the diagram, which I discuss below.

One way of viewing the Theory of Change is in terms of conditions (the elements in the diagram) that may or may not be necessary and/or sufficient for the final outcome to occur. The ideas of necessary and/or sufficient causal conditions are central to the notion of “configurational” models of causation, described by Mahoney and Goertz (2012) and others. A configuration is a set of conditions that may be either sufficient or necessary for an outcome e.g. Condition X + Condition T + Condition D + Condition P -> Outcome. This is in contrast to simpler notions of an outcome having a single cause e.g. Condition T -> Outcome.

The philosopher John Mackie (1974) argued that most of the “causes” that we talk about in everyday life are what are called INUS causes. That is, they are conditions that are an Insufficient but Necessary part of a configuration of conditions, a configuration which is itself Unnecessary but Sufficient for an outcome to occur. For example, smoking is a contributory cause of lung cancer, but it is neither necessary nor sufficient for getting cancer: there are other ways of getting cancer, and not all smokers get cancer.
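Mackie's four INUS properties can be checked mechanically for a toy model. The Python sketch below assumes a hypothetical outcome that occurs if any one of several configurations is fully present (conditions joined by AND within a configuration, configurations joined by OR), using placeholder conditions A to D rather than any real causal model:

```python
from itertools import product

# Toy model (hypothetical): the outcome occurs if either configuration is fully present.
CONDITIONS = ["A", "B", "C", "D"]
CONFIGURATIONS = [{"A", "B"}, {"C", "D"}]


def outcome(present, configurations):
    """True if every condition of at least one configuration is present."""
    return any(config <= present for config in configurations)


def is_inus(condition, config, conditions, configurations):
    """Check Mackie's INUS properties for `condition` as part of `config`."""
    if condition not in config:
        return False
    # Insufficient: with only this condition present, the outcome does not occur.
    insufficient = not outcome({condition}, configurations)
    # Necessary part: the rest of the configuration, on its own, falls short.
    necessary_part = not outcome(config - {condition}, configurations)
    # Sufficient: the whole configuration on its own produces the outcome.
    sufficient = outcome(set(config), configurations)
    # Unnecessary: some state produces the outcome without this configuration.
    states = (
        {c for c, on in zip(conditions, bits) if on}
        for bits in product([False, True], repeat=len(conditions))
    )
    unnecessary = any(
        outcome(s, configurations) and not config <= s for s in states
    )
    return insufficient and necessary_part and sufficient and unnecessary


print(is_inus("A", {"A", "B"}, CONDITIONS, CONFIGURATIONS))  # True
# With a single configuration as the only route, A is necessary outright, not INUS.
single = [{"A", "B", "C", "D"}]
print(is_inus("A", {"A", "B", "C", "D"}, CONDITIONS, single))  # False
```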


The interesting question for me is whether the above Theory of Change represents one or more than one causal configuration. I look at both possibilities and their implications.

If the Theory of Change represents a single configuration, then each element, such as “More efficient management of teacher recruitment and deployment”, would be insufficient by itself but a necessary part of the whole configuration. In other words, every element in the Theory of Change has to work or else the outcome won’t occur. This is quite a demanding expectation. The more complex this “single configuration” model becomes (i.e. the more conditions it has), the more vulnerable it will become to implementation failure, because if even one part does not work, the whole process will fail. One saving grace is that it would be relatively easy to test this kind of theory. In any location where the outcome did occur, all the elements would be expected to be present. If some were not, then the missing elements would not qualify as insufficient but necessary conditions.
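Both points can be sketched in Python. The first function applies the test just described to hypothetical location data: any element missing from a location where the outcome occurred cannot be a necessary part of a single configuration. The second illustrates the vulnerability argument under an assumption not made in the original, namely that each element works independently with the same probability p, so the chance of all n working is p^n. Apart from teacher deployment, the element names are placeholders.

```python
# Placeholder element names; only teacher deployment comes from the diagram.
ELEMENTS = ["teacher_deployment", "curriculum", "school_funding", "community_support"]

# Hypothetical locations: 1 = element in place, plus whether the outcome occurred.
locations = [
    {"teacher_deployment": 1, "curriculum": 1, "school_funding": 1,
     "community_support": 1, "outcome": 1},
    {"teacher_deployment": 1, "curriculum": 1, "school_funding": 0,
     "community_support": 1, "outcome": 1},
    {"teacher_deployment": 1, "curriculum": 0, "school_funding": 1,
     "community_support": 1, "outcome": 0},
]


def not_necessary(rows, elements):
    """Elements absent from at least one location where the outcome occurred.
    Under a single-configuration reading, these fail the necessity test."""
    positives = [r for r in rows if r["outcome"] == 1]
    return [e for e in elements if any(r[e] == 0 for r in positives)]


def chance_all_work(p, n):
    """Probability that all n elements work, ASSUMING each works
    independently with the same probability p."""
    return p ** n


print(not_necessary(locations, ELEMENTS))   # ['school_funding']
print(round(chance_all_work(0.9, 5), 3))    # 0.59
print(round(chance_all_work(0.9, 20), 3))   # 0.122
```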

The alternative perspective is to see the above Theory of Change as representing multiple causal configurations, i.e. multiple possible combinations of conditions, each of which can lead to the desired outcome. So any condition, again such as “More efficient management of teacher recruitment and deployment”, may not be necessary under all circumstances. Instead it may be an insufficient but necessary part of one of the configurations, but not of the others. Viewed from this perspective, the Theory of Change seems less doomed to implementation failure, because there is more than one route to success.

However, if there are multiple routes, the challenge is then how to identify the different configurations that may be associated with successful outcomes. As it stands the current Theory of Change gives little guidance. Like many Theories of Change at this macro-level / sector perspective, it tends towards showing “everything connected to everything”. In fact this limitation seems unavoidable, because with increasing scale there is often a corresponding increase in the diversity of actors, interventions and contexts. In such circumstances there are likely to be many more causal pathways at work. This view suggests that at such a macro level it might be more appropriate for a Theory of Change to initially have relatively modest ambitions and to limit itself to identifying the conditions that are likely to be involved in the various causal configurations.

The focus would then move to what can be done through subsequent monitoring and evaluation efforts. This could involve three tasks: (a) identifying where the outcomes have and have not occurred; (b) identifying how these cases differed in terms of the configuration of conditions that were associated with the outcomes (and absent where the outcomes did not occur), which would involve cross-case comparisons; and (c) establishing plausible causal linkages between the observed conditions within each configuration, which would involve within-case analyses. Ideally, the overall findings about the configurations involved would help ensure the sustainability and replicability of the expected outcomes.
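Tasks (a) and (b) amount to building what QCA calls a truth table. A minimal Python sketch, assuming hypothetical binary codings of the conditions X, T, D and P used in the configuration example earlier: cases are grouped by configuration, and the share of cases in each row that show the outcome (the row's consistency) is reported.

```python
from collections import defaultdict


def truth_table(cases, conditions):
    """Group cases by configuration; return {configuration: (n_cases, consistency)},
    where consistency is the share of the row's cases that show the outcome."""
    rows = defaultdict(list)
    for case in cases:
        rows[tuple(case[c] for c in conditions)].append(case["outcome"])
    return {cfg: (len(outs), sum(outs) / len(outs)) for cfg, outs in rows.items()}


# Hypothetical cases (1 = condition present / outcome occurred).
cases = [
    {"X": 1, "T": 1, "D": 1, "P": 1, "outcome": 1},
    {"X": 1, "T": 1, "D": 1, "P": 1, "outcome": 1},
    {"X": 0, "T": 1, "D": 0, "P": 1, "outcome": 1},
    {"X": 0, "T": 1, "D": 0, "P": 1, "outcome": 0},
    {"X": 1, "T": 0, "D": 0, "P": 0, "outcome": 0},
]
table = truth_table(cases, ["X", "T", "D", "P"])
for cfg, (n, consistency) in sorted(table.items(), reverse=True):
    print(cfg, n, round(consistency, 2))
```

Rows with a consistency between 0 and 1, like (0, 1, 0, 1) here, are contradictory, and they point to where the within-case analysis in task (c) is most needed.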

The Theory of Change will still be useful in as much as it successfully anticipates the various conditions making up the configurations associated with outcomes, and their absence. It will be less useful if it has omitted many elements, or included many that are irrelevant. Its usefulness could actually be measured! Going back to the recipe metaphor in the title, a good Theory of Change will have at least an appropriate list of ingredients but it will be really up to subsequent monitoring and evaluation efforts to identify what combinations of these produce the best results and how they do so (e.g. by looking at the causal mechanisms connecting these elements).
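One way that usefulness might be measured, offered here as an assumption rather than an established method, is to borrow the precision and recall idea from data mining: what share of the conditions a Theory of Change listed turned out to be part of configurations associated with outcomes, and what share of those relevant conditions it anticipated. A sketch with hypothetical condition names:

```python
def toc_usefulness(toc_conditions, found_conditions):
    """Compare the conditions a Theory of Change anticipated with those found in
    configurations linked to outcomes. Returns (share of ToC conditions that proved
    relevant, share of relevant conditions the ToC anticipated)."""
    toc, found = set(toc_conditions), set(found_conditions)
    hits = toc & found
    relevance = len(hits) / len(toc) if toc else 0.0
    anticipation = len(hits) / len(found) if found else 0.0
    return relevance, anticipation


toc = ["X", "T", "D", "P", "Q"]   # hypothetical ingredients listed in the ToC
found = ["X", "T", "R"]           # hypothetical conditions found by later M&E
print(toc_usefulness(toc, found))  # (0.4, 0.6666666666666666)
```

A low first score flags many irrelevant elements; a low second score flags important omissions.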

Some useful references to follow up:
Causality for Beginners, Ray Pawson, 2008
Qualitative Comparative Analysis, at Better Evaluation
Process Tracing, at Better Evaluation
Generalisation, at Better Evaluation

Postscript:

I have just read Owen Barder's review of Ben Ramalingam's new book "Aid on the Edge of Chaos". In that review he makes two comments that are relevant to the argument presented above:
"As Tim Harford showed in his book Adapt, all successful complex systems are the result of adaptation and evolution.  Many in the world of development policy accepted intellectually the story in Adapt but were left wondering how they could, practically and ethically, manage aid projects adaptively when they were dealing with human lives"
"Managing development programmes in a complex world does not mean abandoning the drive to improve value for money. Iteration and adaptation will often require the collection of more data and more rigorous analysis - indeed, it often calls for a focus on results and 'learning by measuring' which many people in development may find uncomfortable."
The point made in the last paragraph about requiring the collection of more data needs to be clearly recognised, as early as possible. Where there are likely to be many possible causal relationships at work, and few if any of these can be confidently hypothesised in advance, the coverage of data collection will need to be wider. Data collection (and then analysis) in this situation is like casting a net onto the waters, albeit still with some idea of where the fish may be. The net needs to be big enough to cover the possibilities.