I was recently asked whether EvalC3 could be used to do a synthesis study,
analysing the results from multiple evaluations. My immediate response was yes, in principle, but it probably needs more thought.
I then recalled that I had seen somewhere an Oxfam synthesis
study of multiple evaluation results that used QCA. This is the reference, in case you want to
read it, which I suggest you do.
Shephard, D., Ellersiek, A., Meuer, J., & Rupietta, C. (2018). Influencing Policy and Civic Space: A
meta-review of Oxfam’s Policy Influence, Citizen Voice and Good Governance
Effectiveness Reviews. Oxfam. https://policy-practice.oxfam.org.uk/publications/*
Like other good examples of QCA analyses in practice, this
paper includes the original data set in an appendix, in the form of a truth
table. This means it is possible for
other people like me to reanalyse this data using other methods that might be
of interest, including EvalC3. So, this
is what I did.
The Oxfam dataset includes five conditions, a.k.a. attributes
of the programs that were evaluated, along
with two outcomes, each pursued by some of the programs. In total there was data on the attributes and
outcomes of fifteen programs concerned with expanding civic space and
twenty-two programs concerned with policy influence. These were subject to two different QCA analyses.
The analysis of civic space outcomes
In the Oxfam analysis of the fifteen programs concerned with expanding civic space, QCA analysis found four “solutions” a.k.a. combinations of
conditions which were associated with the outcome of successful civic space. Each of these combinations of conditions was
found to be sufficient for the outcome to occur. Together they accounted for the outcomes
found in 93% or fourteen of the fifteen cases.
But there was overlap in the cases covered by each of these solutions,
leaving the question open as to which solution best fitted/explained those
cases. Six of the fourteen cases had two
or more solutions that fitted them.
In contrast, the EvalC3 analysis found two predictive models
(=solutions) which are associated with the outcome of expanded civic space. Each of these combinations of conditions was
found to be sufficient for the outcome to occur. Together they accounted for all fifteen cases
where the outcome occurred. In addition,
there was no overlap in the cases covered by each of these models.
The analysis of policy influencing outcomes
In the Oxfam analysis of the twenty-two programs concerned
with policy influencing, the QCA analysis found two solutions associated with
the outcome of policy influence. Each of these was sufficient for the outcome,
and together they accounted for all the outcomes. But there was some overlap in coverage: one
of the six cases was covered by both solutions.
In contrast, the EvalC3 analysis found one predictive model
which was necessary and sufficient for the outcome, and which accounted for all
the outcomes achieved.
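The terms "sufficient" and "necessary" used above can be made concrete with a small sketch. It assumes a model is a set of required condition values, that a sufficient model has no False Positives, and that a necessary model has no False Negatives. The case data, condition names ("a", "b") and function names are invented for illustration. They are not taken from the Oxfam truth table or from EvalC3 itself.

```python
# Toy data, invented for illustration; NOT the Oxfam truth table.
CASES = [
    {"id": "P1", "a": 1, "b": 1, "outcome": 1},
    {"id": "P2", "a": 1, "b": 0, "outcome": 1},
    {"id": "P3", "a": 0, "b": 1, "outcome": 0},
    {"id": "P4", "a": 0, "b": 0, "outcome": 0},
]


def matches(model, case):
    """True if the case has every condition value the model requires."""
    return all(case[cond] == value for cond, value in model.items())


def confusion(model, cases):
    """Count True/False Positives and Negatives for a model."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for case in cases:
        predicted, actual = matches(model, case), bool(case["outcome"])
        key = ("T" if predicted == actual else "F") + ("P" if predicted else "N")
        counts[key] += 1
    return counts


def is_sufficient(counts):
    """Every case the model covers has the outcome (no False Positives)."""
    return counts["TP"] > 0 and counts["FP"] == 0


def is_necessary(counts):
    """Every case with the outcome is covered (no False Negatives)."""
    return counts["TP"] > 0 and counts["FN"] == 0


broad = confusion({"a": 1}, CASES)           # hypothetical model "a is present"
narrow = confusion({"a": 1, "b": 1}, CASES)  # hypothetical model "a and b present"
print(broad, is_sufficient(broad), is_necessary(broad))
print(narrow, is_sufficient(narrow), is_necessary(narrow))
```

In this toy example the broader model is both necessary and sufficient, while the narrower one is sufficient only, because it misses one case where the outcome occurred.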
Conclusions?
Based on parsimony alone, the EvalC3 solutions/predictive
models would be preferable. But
parsimony is not the only appropriate criterion for evaluating a model. Arguably a more important criterion is the
extent to which a model fits the details of the cases covered when those cases
are closely examined. So, really, what the EvalC3
analysis has done is to generate some extra models that need close attention,
in addition to those already generated by the QCA analysis. The number of cases covered by multiple models
has been increased.
In the Oxfam study, there was no follow-on attention given to
resolving what was happening in the cases that were identified by more than one
solution/predictive model. In my
experience of reading other QCA analyses, this lack of follow-up
is not uncommon.
However, in the Oxfam study, for each of the solutions found, at least one
detailed description was given of an example case covered by that
solution. In principle, this is good practice. But unfortunately, as far as I
can see, it was not clear whether that particular case was exclusively covered by
that solution, or was also covered by another solution.
Even amongst those cases which were exclusively covered by a solution
there are still choices that need to be made (and explained) about how to
select particular cases as exemplars and/or for a detailed examination of any
causal mechanisms at work.
QCA software does not provide any help with this task. However, I did find some guidance in
a specialist text on QCA: Schneider, C.
Q., & Wagemann, C. (2012). Set-Theoretic Methods for the Social Sciences: A
Guide to Qualitative Comparative Analysis. Cambridge University Press. https://doi.org/10.1017/CBO9781139004244
(parts of this book are a heavy read, but overall it is very informative). In section 11.4, titled Set-Theoretic Methods
and Case Selection, the authors note: ‘Much emphasis is put on the importance
of intimate case knowledge for a successful QCA. As a matter of fact, the idea of QCA as a
research approach and of going back-and-forth between ideas and evidence
largely consists of combining comparative within-case studies and QCA as a
technique. So far, the literature has
focused mainly on how to choose cases prior to and during but not after a QCA,
where by QCA we here refer to the analytic moment of analysing a truth table. It
is therefore puzzling that little systematic and specific guidance has so far
been provided on which cases to select for within-case studies based on the
results of, i.e. after, a QCA…’ The authors then go on to provide some
guidance (a total of 7 pages out of 320).
In contrast to QCA software, EvalC3 has a number of built-in tools and some
associated guidance on the EvalC3 website, on how to think about case selection
as a step between cross-case analysis and subsequent within-case analysis. One of the steps in the seven-stage EvalC3
workflow (Compare Models) is the generation of a table that compares the case
coverage of multiple selected alternative models found by one’s analysis to
that point. This enables the
identification of cases which are covered by two or more models. These types of cases would clearly warrant
subsequent within-case investigation.
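The kind of coverage comparison described above can be sketched in a few lines. This is only an illustration of the idea, not EvalC3's own implementation (EvalC3 is an Excel application). The cases, conditions and the model names "M1" and "M2" are all hypothetical.

```python
# Toy data, invented for illustration; NOT the Oxfam truth table.
CASES = [
    {"id": "P1", "a": 1, "b": 1, "c": 0},
    {"id": "P2", "a": 1, "b": 0, "c": 1},
    {"id": "P3", "a": 0, "b": 1, "c": 1},
    {"id": "P4", "a": 0, "b": 0, "c": 0},
]

# Hypothetical models: each one lists the condition values it requires.
MODELS = {"M1": {"a": 1}, "M2": {"c": 1}}


def matches(model, case):
    """True if the case has every condition value the model requires."""
    return all(case[cond] == value for cond, value in model.items())


def coverage_table(models, cases):
    """Map each case id to the names of the models that cover it."""
    return {
        case["id"]: [name for name, model in models.items() if matches(model, case)]
        for case in cases
    }


table = coverage_table(MODELS, CASES)
shared = sorted(cid for cid, names in table.items() if len(names) >= 2)
uncovered = sorted(cid for cid, names in table.items() if not names)

for cid, names in table.items():
    print(cid, names)
print("covered by two or more models:", shared)
print("covered by no model:", uncovered)
```

Here case P2 is covered by both models, so it would be a candidate for within-case investigation to see which model better describes what actually happened.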
Another step in the EvalC3 workflow, called Compare Cases,
provides another means of identifying specific cases for follow-up within-case
investigations. In this worksheet,
individual cases can be identified as modal or extreme examples within various
categories that may be of interest, e.g. True Positives, False Positives, et
cetera. It is also possible to identify,
for a chosen case, which other cases are most similar to and most different from that
case, when all of its attributes available in the dataset are considered. These measurement capacities are backed up by
technical advice on the EvalC3 website on the particular types of questions that
can be asked in relation to different types of cases selected on the basis of
their similarities and differences. Your comments on these suggested strategies would be very welcome.
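One simple way to find the most similar and most different case is to count the attributes on which two cases differ (the Hamming distance). The sketch below assumes that measure and binary attributes. EvalC3's own calculation may differ in detail, and the cases and attribute names here are invented for illustration.

```python
# Toy data, invented for illustration; NOT the Oxfam truth table.
ATTRIBUTES = ["a", "b", "c", "d"]
CASES = {
    "P1": {"a": 1, "b": 1, "c": 0, "d": 1},
    "P2": {"a": 1, "b": 1, "c": 0, "d": 0},
    "P3": {"a": 0, "b": 0, "c": 1, "d": 0},
    "P4": {"a": 0, "b": 0, "c": 0, "d": 1},
}


def hamming(case_x, case_y, attributes):
    """Number of attributes on which the two cases differ."""
    return sum(case_x[attr] != case_y[attr] for attr in attributes)


def nearest_and_farthest(chosen_id, cases, attributes):
    """Return (most similar id, most different id) for the chosen case."""
    chosen = cases[chosen_id]
    distances = {
        cid: hamming(chosen, case, attributes)
        for cid, case in cases.items()
        if cid != chosen_id
    }
    nearest = min(distances, key=distances.get)
    farthest = max(distances, key=distances.get)
    return nearest, farthest


nearest, farthest = nearest_and_farthest("P1", CASES, ATTRIBUTES)
print("most similar to P1:", nearest)
print("most different from P1:", farthest)
```

When two cases tie on distance, this sketch simply returns the first one it encounters. In practice, ties would need to be reported and resolved using case knowledge.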
I should explain...
...why the models found by EvalC3 were different from those found by the QCA analysis. QCA software finds solutions, i.e. predictive models, by reducing all the configurations found in a truth table down to the smallest possible set, using a minimisation algorithm known as the Quine-McCluskey algorithm.
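For readers unfamiliar with this kind of minimisation, here is a minimal sketch of its first stage only: repeatedly merging configurations that differ on a single condition. It is not the code used by any QCA package, and it omits the later stage in which a minimal set of the resulting terms is selected. The three configurations are invented; each string gives the values of three hypothetical conditions A, B and C, and "-" means the condition has been dropped as irrelevant.

```python
from itertools import combinations


def merge(x, y):
    """Merge two configurations that differ in exactly one position, else None."""
    diff = [i for i, (p, q) in enumerate(zip(x, y)) if p != q]
    if len(diff) == 1 and "-" not in (x[diff[0]], y[diff[0]]):
        i = diff[0]
        return x[:i] + "-" + x[i + 1:]
    return None


def prime_implicants(configurations):
    """Merge repeatedly; keep every term that cannot be merged any further."""
    current, primes = set(configurations), set()
    while current:
        merged, used = set(), set()
        for x, y in combinations(sorted(current), 2):
            m = merge(x, y)
            if m is not None:
                merged.add(m)
                used.update((x, y))
        primes |= current - used  # terms that found no merge partner
        current = merged
    return primes


# Hypothetical configurations (A, B, C) in which the outcome was present.
observed = ["110", "111", "011"]
result = prime_implicants(observed)
print(sorted(result))
```

With these toy configurations the three rows reduce to two simpler terms: "11-" (A and B present, C irrelevant) and "-11" (B and C present, A irrelevant).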
In contrast, EvalC3 provides users with a choice of four different search algorithms combined with multiple alternative performance measures that can be used to automatically assess the results generated by those search algorithms. All algorithms have their strengths and weaknesses, in terms of the kinds of results they can and cannot find, including the QCA Quine-McCluskey algorithm and the simple machine learning algorithms built into EvalC3. I think the Quine-McCluskey algorithm has particular problems with datasets which have limited diversity, in other words, where cases only represent a small proportion of all the possible combinations of the conditions documented in the dataset. The simple search algorithms built into EvalC3, by contrast, don't experience that as a difficulty. This is my conjecture, not yet rigorously tested.
[In the two data sets analysed above, the cases represented 47% and 68% of all the possible configurations, given the presence of five different conditions.]
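The arithmetic behind these percentages can be sketched as follows. Five binary conditions allow 2^5 = 32 possible configurations. The figures quoted are consistent with 15 and 22 distinct observed configurations respectively, but that reading is my assumption, since it requires every case in each data set to have a different configuration. The function name and the small two-condition example are hypothetical.

```python
def diversity(configurations, n_conditions):
    """Share of all possible binary configurations that are actually observed."""
    observed = len(set(configurations))  # duplicate configurations count once
    return observed / 2 ** n_conditions


# Invented two-condition example: two distinct configurations out of four.
toy = diversity([(1, 0), (1, 0), (0, 1)], n_conditions=2)
print(toy)

# Assumed reading of the figures above: 15 and 22 distinct configurations
# out of the 32 that five binary conditions make possible.
possible = 2 ** 5
civic_space_share = 15 / possible
policy_influence_share = 22 / possible
print(possible, civic_space_share, policy_influence_share)
```

On that assumption the shares are 0.46875 and 0.6875, which match the reported 47% and 68% allowing for rounding down in the second figure.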
While the EvalC3 results described above did differ from the QCA analyses, they were not in outright contradiction. The same has been my experience when I have reanalysed other QCA datasets. EvalC3 will either simply duplicate the QCA findings or produce variations on those, often ones which perform better.