Wednesday, October 20, 2010

Counter-factuals and counter-theories

Thinking about the counter-factual means thinking about something that did not happen. So consider a project involving the provision of savings and credit services, with the expectation of reducing levels of poverty amongst the participating households. The counter-factual is the situation where the savings and credit services were not provided. This can either be imagined, or monitored through the use of a control group, which is a group of similar households in a similar context.

In the course of 20 years' work on monitoring and evaluation of development aid projects I have only come across one good opportunity to analyse changes in household poverty levels through the comparison of participating and non-participating households (i.e. the so-called double difference method: comparing participants and non-participants, before and after the intervention). This was in Can Loc District, Ha Tinh province, in Vietnam. In 1996 ActionAid Vietnam began a savings and credit program in Can Loc. In 1997 I helped them design and implement a baseline survey of almost 600 households, being a 10% sample of the population in three communes of Can Loc District, covering participants and non-participants in the savings and credit services (which reached about 25% of all households). This was done using the Basic Necessities Survey (BNS), an instrument that I have described in detail elsewhere.
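For readers unfamiliar with the double difference method, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the poverty rates are invented for the example and are not results from the Can Loc surveys.

```python
def double_difference(part_before, part_after, non_before, non_after):
    """Change among participants minus change among non-participants."""
    change_participants = part_after - part_before
    change_non_participants = non_after - non_before
    return change_participants - change_non_participants


# Hypothetical poverty rates (% of households classed as poor), NOT survey data.
estimate = double_difference(
    part_before=60.0, part_after=30.0,   # participants: down 30 points
    non_before=40.0, non_after=25.0,     # non-participants: down 15 points
)
print(estimate)  # -15.0, i.e. 15 points more poverty reduction among participants
```

The estimate is only credible as a measure of impact if the two groups would otherwise have changed in the same way, which is the assumption questioned later in this post.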

A few years later the responsibility for the project was handed over to a Vietnamese NGO called the Pro-Poor Centre (PPC), which had been formed by ex-ActionAid staff who used to work in Ha Tinh. They continued to manage the savings and credit program over the following years. In 2006, nine years after the baseline survey, an ex-ActionAid staff member who was now working for a foundation in Hanoi held discussions with the PPC about doing a follow-up survey of the Ha Tinh households. I was brought in to assist the re-use of the same BNS instrument as in 1997. At this stage the main interest was simply to see how much households' situations had improved over the nine-year period, a period of rapid economic growth throughout much of Vietnam.

The survey went ahead, and was implemented with particular care and diligence by the PPC staff. A copy of the 2006 survey report can be found here (see pages 23-25 especially). Fortunately the PPC had carefully kept hard copy records of the 1997 baseline survey (including the sample frame) and I had also kept digital copies of the data. This meant it was possible to make a number of comparisons:
  • Of households' poverty status in 2006 compared to 1997
  • Of changes in the poverty status of households who were and were not participating in the savings and credit program over this period, i.e.:
    1. Those who had never participated
    2. Those who were in (in 1997) but dropped out (by 2006)
    3. Those who were not in (in 1997) but joined later (before 2007)
    4. Those who were always in (in 1997 and 2006)
Somewhat to my surprise, I found what seemed an ideal set of results. Poverty levels had dropped the most in the 4th group ("always members"), then almost as much in the 2nd group ("ex-members"), less in the 3rd group ("new members") and least in the 1st group ("never members"). The 3rd group might have been expected to have changed less because over the years the project had expanded its coverage to include the less poor, reaching 43% of all households by 2006.
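The four-way comparison above can be sketched as follows. This is a minimal illustration, not the analysis actually used for the survey report: the field names (`member_1997`, `member_2006`, `score_1997`, `score_2006`) and the scores are hypothetical, with a higher score standing for a less poor household.

```python
from collections import defaultdict


def membership_group(member_1997: bool, member_2006: bool) -> str:
    """Assign a household to one of the four groups described in the post."""
    if member_1997 and member_2006:
        return "always members"
    if member_1997:
        return "ex-members"
    if member_2006:
        return "new members"
    return "never members"


def mean_change_by_group(households):
    """Average change in (hypothetical) poverty score per membership group."""
    changes = defaultdict(list)
    for h in households:
        group = membership_group(h["member_1997"], h["member_2006"])
        changes[group].append(h["score_2006"] - h["score_1997"])
    return {group: sum(vals) / len(vals) for group, vals in changes.items()}


# Invented records for illustration only.
sample = [
    {"member_1997": True, "member_2006": True, "score_1997": 0.25, "score_2006": 0.75},
    {"member_1997": True, "member_2006": False, "score_1997": 0.25, "score_2006": 0.5},
    {"member_1997": False, "member_2006": True, "score_1997": 0.5, "score_2006": 0.75},
    {"member_1997": False, "member_2006": False, "score_1997": 0.5, "score_2006": 0.5},
    {"member_1997": False, "member_2006": False, "score_1997": 0.25, "score_2006": 0.75},
]
result = mean_change_by_group(sample)
```

Ranking the groups by their mean change is then a matter of sorting `result` by value.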

However, the project's focus on the poorest was also a problem. The members of the savings and credit program had not been randomly chosen, so the non-participants were not really a control group: the two sets of households were not comparable. (I had not heard of the propensity score matching method at the time, and I still do not know how to use it.)

The alternative to considering the effects of a counter-factual (i.e. a non-intervention) is, I guess, what could be called a “counter-theoretical”. That is, an alternative theory of what has happened, with the existing intervention.

My counter-theoretical centered on the idea of dependency ratios: poor families typically have high dependency ratios (i.e. many young children, relatively few adults). As families age this ratio will change, with dependent children growing up, becoming more able-bodied, and able to take on workloads and/or generate income. Even without access to a savings and credit program, this demographic fact alone might have explained why the participating families did better over the nine-year period. It could also explain why the 2nd group did almost as well, if they were selected on the same basis of being the poorest, but had been participants for a shorter period of time.
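To make the dependency ratio argument concrete, here is a small sketch. It assumes, purely for illustration, that household members aged 15 to 64 count as working-age and everyone else as dependent; the age cut-offs and the example household are my assumptions, not data from the surveys.

```python
def dependency_ratio(ages, min_working_age=15, max_working_age=64):
    """Dependents per working-age member (age cut-offs are assumptions)."""
    workers = sum(1 for age in ages if min_working_age <= age <= max_working_age)
    dependents = len(ages) - workers
    if workers == 0:
        return float("inf")  # no working-age members at all
    return dependents / workers


# A hypothetical household: two adults and three young children in 1997.
ages_1997 = [35, 33, 12, 9, 7]
ages_2006 = [age + 9 for age in ages_1997]  # the same people nine years later

print(dependency_ratio(ages_1997))  # 1.5
print(dependency_ratio(ages_2006))  # 0.0
```

If the family structure data were available, comparing the change in this ratio across the four membership groups would be one way of testing the counter-theoretical against the savings and credit explanation.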

What I could have done, and should have done, was to go back to the PPC and see what data they had on the family structure of the interviewed households. Given their close involvement with the community, it is quite likely they would have had the relevant data: the ages of all family members. Unfortunately at that time there was not much interest in the impact assessment aspect of the survey, by either the foundation, the PPC or ActionAid, and their support was necessary for any further analysis. Perhaps I gave up too quickly…

Nevertheless, reflection on this experience makes me wonder how often it would be well worthwhile, in the absence of good control group data, giving more attention to identifying and testing “counter-theoreticals” about the existing intervention, as part of a more rigorous process of coming to conclusions about impacts.

PS1 3rd November 2010: I have since recalled that as part of the 2006 survey I met with the staff of ActionAid in Hanoi to explain the survey process and to solicit from them their views on the likely causes of any improvements. The attached file shows two lists, one relating to ActionAid interventions in the district, and the other relating to interventions in the same district by other organisations, including government. Micro-finance was at the top of the list of the ActionAid interventions seen as likely causes of change, but there were 7 others, as well as 12 non-ActionAid interventions that were possible causes. This raises the spectre of 12 possible alternative hypotheses, let alone various combinations of these. One approach I subsequently toyed with for generating composite predictions in this kind of multiple-location/multiple-intervention situation was the "Prediction Matrix".

PS2 3rd November 2010: The current edition of Evaluation (16(4), 2010) has an article by Nicoletta Stame, titled "What doesn’t work? Three Failures, Many Answers", which includes a section on "Rival Explanations" which I have taken the liberty of copying and pasting below:
"The link between complexity and causation has been at the centre of evaluation theory ever since and has nurtured thinking about 'plausible rival hypotheses' (Campbell, 1969). Although it was originally treated as a methodological problem of validity, it has recently been revisited from the substantive perspective of programme theory. Commenting on Campbell's interest in 'reforms', that are by definition 'complex social change', Yin contrasts two strategies of Campbell: that of the experimental design and that of using rival explanations. He concludes that the second - as Campbell himself came to admit in Campbell (199a) - is better suited to complex interventions (that are changing and multifaceted), as it is with the complex case studies that have been Yin's turf for a long time (Yin, 2000: 242). The use of rival explanations is common in other crafts (journalism, detective work, forensic science and astronomy), where 'the investigator defines the most compelling explanations, tests them by fairly collecting data that can support or refute them, and - given sufficiently consistent and clear evidence - concludes that one explanation but not the others is the most acceptable' (Yin, 2000: 243). These crafts are empirical: their advantage is that while a 'whole host of societal changes may be amenable to empirical investigation', especially those where stakes are currently the highest, they are 'freed from having to impose an experimental design' ('the broader and in fact more common use of rival explanations covers real-life, not craft, rivals', Yin, 2000: 248). Nonetheless, rival explanations are by no means alien to evaluation, as is shown by how Campbell himself has offered Pawson good arguments for criticizing the way systematic reviews are conducted (Pawson, 2006)."
"The problem that remains is how to identify rival explanations. From a methodological starting point, Yin says that 'evaluation literature offers virtually no guidance on how to identify and define real-life rivals'. He proposes a typology of real-life rivals, that can variously relate to targeted interventions, to implementation, to theory, to external conditions; and proposes examples of how to deal with them taken from such fields as decline in crime rates, support for industrial development, technological innovations, etc. However, Yin appears to overlook something that had indeed fascinated theory-based evaluation since its first appearance: the possible existence of different theories to explain the working of a programme, and the need to choose among them in order to test them. And - as Patton (1989: 377) has advised - it should be noted that in this way it would be possible to engage stakeholders in conceptualizing their own programme’s theories of action. Nevertheless, Yin’s contribution in its explicitness and methodological “correctness” is an important step forward."
"Weiss responded to Yin’s provocative stance. In an article entitled “What to do until the random assigners come”, she locates Yin’s contribution as the next step beyond Campbell’s ideas about plausible rival hypotheses: “where Campbell focused primarily on rival explanations stemming from methodological artifacts, Yin proposes to identify substantive rival explanations” (Weiss, 2002: 217). She describes the process whereby the evaluator “looks around and collects whatever information and qualitative data are relevant to the issue at hand” (2002: 219), in order to see “whether any [other factor, such as other programs or policies, environmental conditions, social and cultural conditions] could have brought about the kinds of outcomes that the target program was trying to affect”, thus setting up systematic inquiries into the situation. Weiss concludes that alternative means to random assignment in order to solve the causality dilemma can be “a combination of Theory-Based Evaluation and Ruling-Out” (the rival explanation)."
I recommend the whole article...

PS3 6th December. On re-reading this post, especially Nicoletta's quote, I wondered about the potential usefulness of the "Evolving Storylines" method I developed some years ago. It could be used as a means of developing a small range of alternative histories of a project, each of which could then be subject to some testing (by focusing on the most vulnerable point in each story).

    1 comment:

    1. I asked Howard White for possible references to similar approaches to mine above, and he replied as follows:

      Hi Rick

      It sounds to me a similar approach to Scriven's General Elimination Methodology since GEM identifies other possible causes of the observed change in outcomes, and discusses whether they could be responsible - see for example Scriven, Michael (2008) ‘A Summative Evaluation of RCT Methodology: & An Alternative Approach to Causal Research’ Journal of MultiDisciplinary Evaluation 5(9): 15-24. Of course that is about the intervention versus other possible causes, whereas it sounds like you want to test competing ideas about what happened inside the intervention. When discussing theory-based designs I do mention the possibility of counter theories (which would tend to suggest the desired outcome is not observed); see Carvalho and White 2004 American Journal of Evaluation.

      Hope that is of some help
      Best wishes