I have been reading Eric Siegel's book on Predictive Analytics. Though it is a "pop science" account, with the usual "this will change the world" subtitle, it is definitely a worthwhile read.
In chapter 7 he talks the reader through what are called "uplift models", which are Decision Tree models that can not only differentiate groups who respond differently to an intervention, but how much differently when compared to a control group where there is no intervention. All this is in the context of companies marketing their products to the population at large, not the world of development aid organisations.
(Temporarily putting aside the idea of uplift models...) In this chapter he happens to use the matrix below, to illustrate the different possible sets of consumers that exist, given two types of scenarios that can be found where both a control and intervention group are being used.
But this leaves aside two other sets of households who also surely deserve at least equal attention.One are the "hard cases", that did not improve in either setting. Possibly the poorest of the poor. How often are their numbers identified with the same attention to statistical detail? The other are the "Confused", who have improved in the control group, but not in the intervention group. Perhaps these are the ones we should really worry about, or at least be able to enumerate. Evaluators are often asked, in their ToRs, to also give attention to negative project impacts, but how often do we systematically look for such evidence?
Okay, but how will we recognise these groups? One way is to look at the distribution of cases that are possible. Each group can be characterised by how cases are distributed in the control and intervention group, as shown below. The first group (in green) are probably "self-help'ers" because the same proportion also improved in the control group. The second group are more likely to be "need-help'ers" because fewer people improved in the control group. The third group are likely to be the "confused" because more of them did not improve in the intervention group than in the control group. The fourth group are likely to be the "hard cases" if the same high proportion did not improve in the control group either.
How do we find if and where the other groups are? One way of doing this is to split the total population into sub-groups, using one household attribute at a time, to see what difference it makes to the distribution of results. For example, I thought that household’s wealth ranking might be associated with differences in outcomes. So I examined the distribution of outcomes for the poorest and least poor of the four wealth ranked groups. In the poorest group, those who benefited were the “need-help’ers” , but in the “Well-Off” group those who benefited were the “self-help’ers”, perhaps as expected
There are still the two other kinds of outcomes that might exist in some sub-groups – the “hard cases” and the “confused” How can we find where they are? At this point my theory-directed search fails me. I have no idea where to look for them. There are too many household attributes in the data set to consider manually examining how different their particular distribution of outcomes is from the aggregate distribution.