Tuesday, May 19, 2015

How to select which hypotheses to test?

I have been reviewing an evaluation that has made use of QCA (Qualitative Comparative Analysis). An important part of the report is the findings section, which lists a number of hypotheses that have been tested and the results of those tests. All of these hypotheses are fairly complex, involving a configuration of different contexts and interventions, as you might expect in a QCA-oriented evaluation. There were three main hypotheses, which the results section disaggregated into six more specific hypotheses. The question for me, which has much wider relevance, is how do you select hypotheses for testing, given the limited time and resources available in any evaluation?

The evaluation team have developed three different data sets, each with 11 cases, and with 6, 6 and 9 attributes of these cases respectively (shown in columns), known as "conditions" in QCA jargon. If each condition is simply present or absent, this means there are $2^6 + 2^6 + 2^9 = 64 + 64 + 512 = 640$ possible combinations of these conditions that could be associated with, and cause, the outcome of interest. Each of the hypotheses being explored by the evaluation team represents one of these configurations. In this type of situation, the task of choosing appropriate hypotheses seems a little like looking for a needle in a haystack.
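The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming crisp-set conditions (each simply present or absent), so that k conditions give 2^k distinct combinations; the function names are hypothetical and chosen only for illustration.

```python
# Count the possible configurations (truth-table rows) in each data set.
# ASSUMPTION: every condition is crisp-set, i.e. simply present (1) or absent (0),
# so k conditions give 2**k distinct combinations.
from itertools import product


def n_configurations(k: int) -> int:
    """Number of distinct present/absent combinations of k binary conditions."""
    return 2 ** k


def list_configurations(k: int):
    """All present/absent combinations of k conditions, as tuples of 0s and 1s."""
    return list(product([0, 1], repeat=k))


conditions_per_dataset = [6, 6, 9]  # attributes in the three data sets
total = sum(n_configurations(k) for k in conditions_per_dataset)
print(total)  # 640

# For a small number of conditions the rows can be listed explicitly:
print(list_configurations(2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Note that each case occupies exactly one of these rows, so with only 11 cases per data set at most 11 of the 64 (or 512) rows can actually be observed in any one data set.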

It seems there are at least three options, which could be combined. The first is to review the literature, find what claims (supported by evidence) are made there about "what works", and select from these the ones worth testing, e.g. one that seems to have wide practical use, and/or one that would have different and significant program design implications depending on whether it is right or wrong. This seems to be the approach the evaluation team has taken, though I am not so sure to what extent they have used the program design implications as a further filter.

The second approach is to look for constituencies of interest among the staff of the client who contracted the evaluation. There have been consultations, but it is not clear what sort of constituency each of the tested hypotheses has. There were some early indications that some of the selected hypotheses are not easy to understand. That is clearly an important issue, potentially limiting the use of the evaluation findings.

The third approach is an inductive search, using QCA or other software, for configurations of conditions associated with an outcome that have both a high level of consistency (i.e. they are always associated with the presence, or the absence, of an outcome) and a high level of coverage (i.e. they apply to a large proportion of the outcomes of interest). In their barest form these configurations can be considered as hypotheses. I was surprised to find that this approach had not been used, or at least not reported on, in the evaluation report I read. If it had been used but no potentially useful configurations were found, then this should have been reported (as a fact, not a fault).
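To make the idea of an inductive search concrete, here is a minimal brute-force sketch in Python. It assumes crisp-set (0/1) data, treats a configuration as a conjunction of conditions each required to be present or absent, and uses the usual crisp-set definitions: consistency is the share of cases with the configuration that also show the outcome, and coverage is the share of outcome cases that have the configuration. The conditions A, B, C, the outcome Y and the six cases are hypothetical toy data, not data from the evaluation. This is not the QCA Boolean minimisation algorithm; it simply tries every conjunction, so it will report redundant configurations that QCA software would reduce. The absence of an outcome can be searched for in the same way by recoding Y.

```python
# Brute-force inductive search over crisp-set (0/1) data.
# ASSUMPTIONS: the conditions, outcome and cases below are hypothetical toy data;
# a "configuration" is a conjunction of conditions, each present (1) or absent (0).
from itertools import combinations, product

CASES = [
    {"A": 1, "B": 1, "C": 0, "Y": 1},
    {"A": 1, "B": 1, "C": 1, "Y": 1},
    {"A": 1, "B": 0, "C": 1, "Y": 0},
    {"A": 0, "B": 1, "C": 0, "Y": 0},
    {"A": 0, "B": 0, "C": 1, "Y": 0},
    {"A": 1, "B": 1, "C": 1, "Y": 1},
]
CONDITIONS = ["A", "B", "C"]


def matches(case, config):
    """True if the case has every condition value required by the configuration."""
    return all(case[cond] == value for cond, value in config.items())


def consistency_and_coverage(cases, config, outcome="Y"):
    """Consistency: share of matching cases that show the outcome.
    Coverage: share of outcome cases that the configuration matches."""
    with_config = [c for c in cases if matches(c, config)]
    with_outcome = [c for c in cases if c[outcome] == 1]
    both = [c for c in with_config if c[outcome] == 1]
    consistency = len(both) / len(with_config) if with_config else 0.0
    coverage = len(both) / len(with_outcome) if with_outcome else 0.0
    return consistency, coverage


def search(cases, conditions, min_consistency=1.0, min_coverage=0.5):
    """Try every conjunction of 1..k conditions, each set present or absent,
    and keep those meeting both thresholds."""
    found = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            for values in product([1, 0], repeat=size):
                config = dict(zip(subset, values))
                cons, cov = consistency_and_coverage(cases, config)
                if cons >= min_consistency and cov >= min_coverage:
                    found.append((config, cons, cov))
    return found


for config, cons, cov in search(CASES, CONDITIONS):
    print(config, f"consistency={cons:.2f}", f"coverage={cov:.2f}")
```

On this toy data the search keeps three configurations, the strongest being A and B both present, with consistency and coverage both equal to 1.00. The thresholds are parameters, so a looser consistency cut-off would admit more candidate hypotheses.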

Somewhat incidentally, I have been playing around with the design of an Excel worksheet and managed to build in a set of formulas for automatically testing how well different configurations of conditions of particular interest (aka hypotheses) account for a set of outcomes of interest, for a given data set. The tests use measures taken from QCA (consistency and coverage, as above) and from machine learning practice (the Confusion Matrix). This set-up makes it possible to quickly filter a larger number of hypotheses than an evaluation team might initially be willing to consider (e.g. the six above). It would not be as efficient a search as the QCA algorithm, but it would be a search that could be directed according to specific interests. Ideally this directed search process would identify configurations that are both necessary and sufficient (for more than a small minority of outcomes). A second-best result would be configurations that are necessary but insufficient, or vice versa. (I will elaborate on these possibilities and their measurement in another blog posting.)
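The link between the QCA measures and a Confusion Matrix can be sketched in Python. This is an illustration under stated assumptions, not a copy of the formulas in the Excel worksheet: it assumes crisp-set (0/1) data, and the hypotheses H1 to H4 and the eight-case outcome list are hypothetical. With one row per case, consistency corresponds to precision, TP / (TP + FP), and coverage to recall, TP / (TP + FN). A configuration that never occurs without the outcome (FP = 0) is treated as sufficient, and one whose absence never coincides with the outcome (FN = 0) as necessary.

```python
# Screening hypotheses with a confusion matrix.
# ASSUMPTIONS: crisp-set (0/1) data; OUTCOME and HYPOTHESES are hypothetical,
# one entry per case; this is not the formula set used in the Excel worksheet.
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # configuration present, outcome present
    fp: int  # configuration present, outcome absent
    fn: int  # configuration absent, outcome present
    tn: int  # configuration absent, outcome absent

    @property
    def consistency(self) -> float:
        # QCA consistency (of sufficiency) = precision in machine-learning terms
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    @property
    def coverage(self) -> float:
        # QCA coverage = recall (sensitivity) in machine-learning terms
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    def verdict(self) -> str:
        sufficient = self.tp > 0 and self.fp == 0  # never present without the outcome
        necessary = self.tp > 0 and self.fn == 0   # outcome never occurs without it
        if sufficient and necessary:
            return "necessary and sufficient"
        if sufficient:
            return "sufficient but not necessary"
        if necessary:
            return "necessary but not sufficient"
        return "neither"


def confusion_matrix(config_present, outcome_present):
    """Cross-tabulate configuration presence against outcome presence, case by case."""
    pairs = list(zip(config_present, outcome_present))
    return ConfusionMatrix(
        tp=sum(1 for c, o in pairs if c and o),
        fp=sum(1 for c, o in pairs if c and not o),
        fn=sum(1 for c, o in pairs if not c and o),
        tn=sum(1 for c, o in pairs if not c and not o),
    )


OUTCOME = [1, 1, 1, 0, 0, 0, 1, 0]
HYPOTHESES = {
    "H1": [1, 1, 1, 0, 0, 0, 1, 0],
    "H2": [1, 1, 0, 0, 0, 0, 0, 0],
    "H3": [1, 1, 1, 1, 0, 1, 1, 0],
    "H4": [1, 0, 0, 1, 0, 0, 0, 1],
}

for name, config in HYPOTHESES.items():
    cm = confusion_matrix(config, OUTCOME)
    print(f"{name}: {cm} consistency={cm.consistency:.2f} "
          f"coverage={cm.coverage:.2f} -> {cm.verdict()}")
```

In this toy example H1 comes out as necessary and sufficient, H2 as sufficient but not necessary, H3 as necessary but not sufficient, and H4 as neither, which mirrors the ranking of ideal and second-best results described above.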

The wider point to make here is that, with a quick screening capacity available, the evaluation team should be able, in its consultations with the client, to broaden the focus of discussion away from what are currently quite specific hypotheses and towards a menu of a limited number of conditions that can make up not only these hypotheses but also alternative versions of them. It is the choice of these particular conditions that will really make the difference to the scale and usability of the results of a QCA-oriented evaluation. More optimistically, the search facility could even be made available online, for continued use by those interested in the evaluation results and their possible variants.

The Excel file for quick hypothesis testing is here: http://wp.me/afibj-1ux
