(Updated 2015 05 25)
When such a data set is available a search can be made for configurations of conditions that are strongly associated with an outcome of interest. This can be done inductively or deductively. Inductive searches involve the uses of systematic search processes (aka algorithms), of which there are a number available. QCA uses the Quine–McCluskey algorithm. Deductive searches involve the development of specific hypotheses from a body of theory, for example about the relationship between the context, intervention and outcome.
Regardless of which approach is used, the resulting claims of association need evaluation. There are a number of different approaches to doing this that I know of, and probably more. All involve, in the simplest form, the analysis of a truth table in this form:
In this truth table the cell values refer to the number of cases that have each combination of configuration and outcome. For further reference below I will label each cell as A and B (top row) and C and D (bottom row)
The first approach to testing is a statistical approach. I am sure that there are a number of ways of doing this, but the one I am most familiar with is the widely used Chi-Square test. Results will be seen as most statistically significant when all cases are in the A and D cells. They will be least significant when they are equally distributed across all four cells.
The second approach to testing is the one used by QCA. There are two performance measures. One is Consistency, which is the proportion of all cases where the configuration is present and the outcome is also present (=A/(A+B)). The other is Coverage, which is the proportion of all outcomes that are associated with the configuration (=(A/(A+C)).
When some of the cells have 0 values three categorical judgements can also be made. If only cell B is empty then it can be said that the configuration is Sufficient but not Necessary. Because there are still values in cell C this means there are other ways of achieving the outcome in addition to this configuration.
If only cell C is empty then it can be said that the configuration is Necessary but not Sufficient. Because there are still values in cell B this means there are other additional conditions that are needed to ensure the outcome.
If cells B and C are empty then it can be said that the configuration is both Necessary and Sufficient
In all three situations there only needs to be one case to be found in a previously empty cell(s) to disprove the standing proposition. This is a logical test, not a statistical test.
The third approach is one used in the field of machine learning, where the above matrix is known as a Confusion Matrix. Here there is a profusion of performance measures available (at least 13). Some of the more immediately useful measures are:
- Accuracy: (A+D)/(A+B+C+D), which is similar to but different from the Chi Square measure above
- True Positives: A/(A+B), which corresponds to QCA consistency
- True Negatives: D/(B+D)
- False Positives: C/(A+C)
- False Negatives: B/(B+D)
- Positive predictive value: A/(A+C), which corresponds to QCA coverage
- Negative predictive value: D/(C+D)
If you want to find how "robust" a hypothesis is, you could calculate the diversity present in the configurations of all the cases covered by the hypothesis (i.e. not just the attributes specified by the hypotheses, which will be all the same). If that percentage is large this suggests the hypothesis works in a greater diversity of circumstances, a feature that could be of real practical value.
This notion of diversity is to some extent implicit in the Coverage measure. More coverage implies more diversity of circumstances. But two hypotheses with the same coverage (i.e. proportion of cases they apply to) could be working in circumstances with quite different degrees of diversity (i.e. the cases covered were much more diverse in their overall configurations).
Similarity: Each row in a QCA like data set is a string of binary values. The similarity of these configurations of attributes can be measured in a number of ways:
- Jaccard index, the proportion of all instances in two configurations where the binary value 1 is present in the same position i.e. the same attribute is present.
- Hamming distance, the number of positions at which the corresponding values in two configurations are different. This includes the values 0 and 1, whereas Jaccard only looks at 1 values
- If you want to find a "representative" case in a data set, you would look for the case with the lowest average Hamming distance in the whole data set
- If you wanted to compare the two most similar cases, you would look for the pair of cases with the lowest Hamming distance.
Applying the criteria: My immediate interest is in the use of these kinds of tests for two evaluation purposes. The first is selective screening of hypotheses about causal configurations that are worth more time intensive investigations, an issue raised in a recent blog.
- Configurations that are Sufficient and not Necessary or Necessary but not Sufficient.
- Among these, configurations which were Sufficient but not Necessary, and with high coverage should be selected,
- And configurations which were Necessary but not Sufficient, and with high consistency, should also be selected.
- Plus all configurations that were Sufficient and Necessary (which are likely to be less common)
Having done the filtering suggested above, the following kinds of within-case investigations would seem useful:
- Are there any common casual mechanisms underlying all the cases found to be Necessary and Sufficient, i.e those within cell A?
- A good starting point would be a case within this set of cases that had the lowest average Hamming distance, i.e. one with the highest level of similarity with all the other cases.
- Once one or more plausible mechanism were discovered in that case a check could be made to see if they are present in other cases in that set, this could be done in two ways: (a) incrementally, by examining adjacent cases, i.e cases with the lowest Hamming distance from the representative case, (b) by partitioning the rest of the cases, and examining a case with a median level Hamming distance, i.e. half way between being the most similar and most different cases.
- Where the configuration is Necessary but not Sufficient, how do the cases in cell B differ from those in cell A, in ways that might shed more light on how the same configuration leads to different outcomes? This is what has been called a MostSimilarDifferentOutcome (MSDO) comparison,
- If there are many cases this could be quite a challenge, because the cases could differ on many dimensions (i.e. on many attributes). But using the Hamming distance measure we could make this problem more manageable by selecting a case from cell A and B that had the lowest possible Hamming distance. Then a within-case investigation could find additional undocumented differences that account for some or all of the difference in outcomes.
- That difference could then be incorporated into the current hypothesis (and data set) enabling more cases from cell B to now be found in cell A i..e Consistency would be improved
- Where the configuration is Sufficient but not Necessary,in what ways are the cases in cell C the same as those in cell A, in ways that might shed more light on how the same outcome is achieved by different configurations? This is what has been called a MostDifferentSimilarOutcome (MDSO) comparison,
- As above, if there are many cases this could be quite a challenge. Here I am less clear, but my current inclination would be to find cases in cells A and C that were most similar, in terms of Hamming distance. Then a within-case investigation may find undocumented similarities that account for some of the similar outcomes
- That difference could then be incorporated into the current hypothesis (and data set) enabling more cases from cell C to now be found in cell A i..e Coverage would be improved