I have been reading Eric Siegel's book on Predictive Analytics. Though it is a "pop science" account, with the usual "this will change the world" subtitle, it is definitely a worthwhile read.
In chapter 7 he talks the reader through what are called "uplift models": Decision Tree models that can not only differentiate groups who respond differently to an intervention, but also estimate how differently they respond when compared to a control group where there is no intervention. All this is in the context of companies marketing their products to the population at large, not the world of development aid organisations.
(Temporarily putting aside the idea of uplift models...) In this chapter he uses the matrix below to illustrate the different possible sets of consumers that exist, given the two types of scenarios that arise when both a control group and an intervention group are being used.
But what happens if we re-label the matrix, using more development project type language? Here is my revised version below:
Looking at this new matrix it struck me that evaluators of development projects may have a relatively impoverished view of the potential uses of control groups. Normally the focus is on the net difference in improvement between households in the control and intervention groups: how big is it, and is it statistically significant? In other words, how many of those in the intervention group were really "self-help'ers" who would have improved anyway, versus "need-help'ers" who would not have improved without the intervention?
But this leaves aside two other sets of households who also surely deserve at least equal attention. One set are the "hard cases", who did not improve in either setting: possibly the poorest of the poor. How often are their numbers identified with the same attention to statistical detail? The other are the "confused", who improved in the control group but not in the intervention group. Perhaps these are the ones we should really worry about, or at least be able to enumerate. Evaluators are often asked, in their ToRs, to give attention to negative project impacts, but how often do we systematically look for such evidence?
Okay, but how will we recognise these groups? One way is to look at the distribution of cases that are possible. Each group can be characterised by how cases are distributed in the control and intervention group, as shown below. The first group (in green) are probably "self-help'ers" because the same proportion also improved in the control group. The second group are more likely to be "need-help'ers" because fewer people improved in the control group. The third group are likely to be the "confused" because more of them did not improve in the intervention group than in the control group. The fourth group are likely to be the "hard cases" if the same high proportion did not improve in the control group either.
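To make this concrete, here is a minimal sketch in Python of how a sub-group's position in the matrix might be classified from its two improvement proportions. The function name and the "high"/"low" thresholds are my own assumptions for illustration, not anything from Siegel's book or from the data.

```python
def classify_outcome_pattern(p_improved_control, p_improved_intervention,
                             high=0.7, low=0.3):
    """Crudely classify a sub-group into one of the four outcome types.

    p_improved_control / p_improved_intervention: proportion of households
    that improved in each group. The high/low cut-offs are arbitrary
    illustrative assumptions, not values from the post.
    """
    if p_improved_intervention >= high and p_improved_control >= high:
        return "self-help'ers"   # improved with or without the project
    if p_improved_intervention >= high and p_improved_control <= low:
        return "need-help'ers"   # improved only with the project
    if p_improved_intervention <= low and p_improved_control >= high:
        return "confused"        # improved only WITHOUT the project
    if p_improved_intervention <= low and p_improved_control <= low:
        return "hard cases"      # improved in neither setting
    return "mixed / unclear"     # distribution is not extreme enough
```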
How do we find if and where the other groups are? One way of doing this is to split the total population into sub-groups, using one household attribute at a time, to see what difference it makes to the distribution of results. For example, I thought that a household's wealth ranking might be associated with differences in outcomes, so I examined the distribution of outcomes for the poorest and least poor of the four wealth-ranked groups. In the poorest group, those who benefited were the "need-help'ers", but in the "Well-Off" group those who benefited were the "self-help'ers", perhaps as expected.
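A sketch of what that one-attribute-at-a-time inspection might look like with pandas. It assumes a hypothetical data frame with one row per household and columns named improved (0/1), arm ("control" or "intervention"), and attributes such as wealth_rank; all of these names are illustrative assumptions about the data layout, not the actual data set.

```python
import pandas as pd

def outcome_distribution(df: pd.DataFrame, attribute: str) -> pd.DataFrame:
    """Proportion of households that improved, in each arm,
    within each value of the given attribute."""
    return (df.groupby([attribute, "arm"])["improved"]
              .mean()             # mean of a 0/1 column = proportion improved
              .unstack("arm"))    # arms become columns, attribute values rows

# e.g. outcome_distribution(households, "wealth_rank")
```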
There are still the two other kinds of outcomes that might exist in some sub-groups – the "hard cases" and the "confused". How can we find where they are? At this point my theory-directed search fails me: I have no idea where to look for them. There are too many household attributes in the data set to consider manually examining how different each particular distribution of outcomes is from the aggregate distribution.
This is the territory where an automated algorithm would be useful. Using one attribute at a time, it would split the main population into two sub-groups, and search for the attribute that made the biggest difference. The difference to look for would be extremity of range, as measured by the Standard Deviation. The reason for this approach is that the most extreme range would be where one cell in the control group was 100% and the other was 0%, and similarly in the intervention group. These would be pure examples of the four types of outcome distributions shown above. [Note that in the two wealth-ranked sub-groups above, the Standard Deviation of the distributions was 14% and 15%, versus 7% in the whole group.]
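A rough sketch of that search, under the same assumed data layout as before. It follows the idea in the post: treat the four cells (% improved and % not improved, in each arm) as a distribution and score a sub-group by their standard deviation, which peaks at 0.5 for all four "pure" patterns (e.g. 100%/0% in both arms). The function names are mine, and a real analysis would also need minimum sub-group sizes to avoid empty or tiny cells.

```python
import numpy as np
import pandas as pd

def cell_sd(df: pd.DataFrame) -> float:
    """SD across the four cells of the outcome table: % improved and
    % not improved, in the control and intervention arms."""
    p_control = df.loc[df["arm"] == "control", "improved"].mean()
    p_interv = df.loc[df["arm"] == "intervention", "improved"].mean()
    cells = np.array([p_control, 1 - p_control, p_interv, 1 - p_interv])
    return float(cells.std())

def best_split(df: pd.DataFrame, attributes: list) -> tuple:
    """Try each attribute in turn; return the one whose sub-groups contain
    the most extreme outcome distribution (the highest cell SD)."""
    scores = {}
    for attr in attributes:
        # score a candidate split by its most extreme sub-group
        scores[attr] = max(cell_sd(sub) for _, sub in df.groupby(attr))
    best = max(scores, key=scores.get)
    return best, scores
```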
This is the same sort of work that a Decision Tree algorithm does, except that Decision Trees usually search for binary outcomes and use different "splitting" criteria. I am not sure if they can use the Standard Deviation, or another measure which would deliver the same results (i.e. identify the four possible types of outcomes).
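For what it is worth, the cell-SD score could in principle be dropped into a simple recursive splitter, in the style of a Decision Tree. The sketch below reuses cell_sd and best_split from above; it is only a toy illustration of the idea, not a standard tree-learning algorithm, and the depth and size limits are arbitrary.

```python
def grow_tree(df, attributes, depth=0, max_depth=3, min_size=30):
    """Recursively split on the attribute whose sub-groups have the most
    extreme outcome distributions, stopping at a depth limit or when a
    node gets too small (both thresholds are arbitrary assumptions)."""
    if depth >= max_depth or len(df) < min_size or not attributes:
        # leaf: report the node's size and how extreme its distribution is
        return {"n": len(df), "cell_sd": cell_sd(df)}
    attr, _ = best_split(df, attributes)
    remaining = [a for a in attributes if a != attr]
    return {"split_on": attr,
            "branches": {value: grow_tree(sub, remaining, depth + 1,
                                          max_depth, min_size)
                         for value, sub in df.groupby(attr)}}
```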