This blog post continues a dialogue based on Michaela Raab and Wolfgang Stuppert's EVAW blog. I would have preferred to post my response below via their blog's comment facility, but it can't cope with long responses or hypertext links. They in turn have had difficulty posting comments on my YouTube site, where this EES presentation (Triangulating the results of Qualitative Comparative Analyses, EES Dublin 2014) can be seen. It was this presentation that prompted their response here on their blog.
Hi Michaela and Wolfgang
Thanks for going to the trouble of responding in detail to
my EES presentation.
Before responding in detail, I should point out to readers that the EES presentation was about triangulation, and specifically about how to compare QCA and Decision Tree models when they are applied to the same data set. In my view it is unlikely that either of these methods will produce the "best" results in all circumstances. The interesting challenge is to develop ways of thinking about how to compare and choose between specific models generated by these, and perhaps other comparable, methods of analysis. The penultimate slide (#17) of the presentation highlights the options I think we can try out when faced with different kinds of differences between models.
The rest of this post responds to particular points that have been made by Michaela and Wolfgang, and then makes a more general conclusion.
Re "1. The decision tree analysis is not based on the same data set as our QCA": This is correct. I was in a bit of a quandary because, while the original data set was fuzzy set (i.e. there are intermediate values between 0 and 1), the solutions that were found were described in binary form, i.e. the conditions and outcomes either were or were not present. I did produce a Decision Tree with the fuzzy-set data, but I had no easy means of comparing the results with the binary results of the QCA model. That said, Michaela and Wolfgang are right to expect that such a model would be more complex and have more configurations.
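The conversion from fuzzy to binary values can be sketched as follows. This is a minimal illustration, assuming the conventional 0.5 crossover point is used to dichotomise membership scores; the condition names and scores below are invented, not taken from the EVAW data.

```python
# Sketch: dichotomising fuzzy-set membership scores (0..1) at the 0.5
# crossover point, so a fuzzy data set can be compared with a crisp
# (binary) QCA solution. Names and values are hypothetical.

def to_crisp(fuzzy_scores, crossover=0.5):
    """Convert fuzzy membership scores to crisp 0/1 values."""
    return {cond: 1 if score > crossover else 0
            for cond, score in fuzzy_scores.items()}

case = {"sensit": 0.67, "parti": 0.33, "compevi": 0.9}
print(to_crisp(case))  # {'sensit': 1, 'parti': 0, 'compevi': 1}
```

Note that a score of exactly 0.5 is maximally ambiguous in fuzzy-set terms, which is one reason information is inevitably lost in this conversion.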
Re "2. Decision tree analysis is compared with a type of QCA solution that is not meant to maximise parsimony": I agree that "If the purpose was to compare the parsimony of QCA results with those of decision trees, then the 'parsimonious' QCA solution should be used". But the intermediate solution was the one available to me, and parsimony was not the only criterion of interest in my presentation. Accuracy (or consistency, in QCA terms) was also of interest. It was the difference in parsimony, however, that stood out most in this particular model comparison.
Re "3. The decision tree analysis performs less well than stated in the presentation": Here I think I disagree. The focus of the presentation is on the consistency of only those configurations that predict effective evaluations (indicated in the tree diagram by squares with a 0.0 value rather than a 1.0 value), not the whole model. Among the three configurations that predict effective evaluations, the consistency was 82%. Slide 15 may have confused the discussion, because the figures there refer to coverage rather than consistency (I should have made this clear).
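Since the coverage/consistency distinction caused confusion, here is a minimal sketch of the two measures for crisp (binary) data. The case numbers are invented for illustration; for fuzzy data the formulas use sums of minimum memberships instead of set counts.

```python
# Sketch: crisp-set consistency vs coverage for one configuration.
# Cases are identified by number; the data here are illustrative.

def consistency(config_cases, outcome_cases):
    """Share of cases covered by the configuration that also show the outcome."""
    return len(config_cases & outcome_cases) / len(config_cases)

def coverage(config_cases, outcome_cases):
    """Share of all outcome cases that the configuration accounts for."""
    return len(config_cases & outcome_cases) / len(outcome_cases)

config = {1, 2, 3, 4, 5}         # cases matching the configuration
outcome = {2, 3, 4, 5, 6, 7, 8}  # cases with effective evaluations

print(consistency(config, outcome))  # 0.8
print(coverage(config, outcome))     # 4/7, about 0.57
```

A configuration can thus be highly consistent (most of its cases show the outcome) while having low coverage (it accounts for few of the outcome cases), which is why the two figures must not be read interchangeably.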
Re "none of the paths in our QCA is redundant": The basis for my claim here was some simple color coding of each case according to which QCA configuration applied to it. Looking back at the Excel file, it appears to me that cases 14 and 16 were covered by two configurations, and cases 16 and 32 by another two configurations. BUT bear in mind this was done with the binary (crisp) data, not the fuzzy-valued data. (The two configurations that did not seem to cover unique cases were quanqca*sensit*parti_2 and qualqca*quanqca*sensit*compevi_3.) The important point is not that redundancy is "bad", but that where it is found it can prompt us to think about how to investigate such cases if and when they arise (including when two different models provide alternate configurations for the same cases).
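The color-coding check described above can also be done programmatically. This is a sketch under invented case assignments (only the two configuration labels come from the analysis above); it simply inverts the configuration-to-cases mapping and flags any case claimed by more than one configuration.

```python
# Sketch: flagging cases covered by more than one QCA configuration.
# The two configuration labels echo those mentioned in the text;
# the case assignments themselves are hypothetical.
from collections import defaultdict

config_to_cases = {
    "quanqca*sensit*parti_2": [14, 16],
    "qualqca*quanqca*sensit*compevi_3": [16, 32],
    "some_other_config": [5, 7, 9],
}

case_to_configs = defaultdict(list)
for config, covered in config_to_cases.items():
    for case in covered:
        case_to_configs[case].append(config)

overlaps = {case: configs for case, configs in case_to_configs.items()
            if len(configs) > 1}
print(overlaps)  # here, case 16 is covered by two configurations
```

Cases that appear under more than one configuration are exactly the ones worth a closer qualitative look, as suggested above.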
Re "4. The decision tree consistency measure is less rigorous than in QCA": I am not sure that this matters for the comparison at hand, but it may matter when other comparisons are made. I say this because, on the measures given on slide 13, the QCA model actually seems to perform better than the Decision Tree model. BUT again, a possibly confounding factor is the use of crisp versus fuzzy values behind the two measures. There is nevertheless a positive message here, which is to look carefully into how the consistency measures are calculated for any two models being compared. On a wider note, there is an extensive array of performance measures for Decision Tree (aka classification) models that can be summarised in a structure known as a Confusion Matrix. Here is a good summary of these: http://www.saedsayad.com/model_evaluation_c.htm
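For readers unfamiliar with the Confusion Matrix, the main measures derived from it can be sketched as follows. The counts below are invented for illustration; the analogies to QCA measures in the comments are my own reading, not standard terminology.

```python
# Sketch: performance measures derived from a 2x2 confusion matrix
# for a binary classifier. The counts are hypothetical.

tp, fp, fn, tn = 40, 10, 5, 45  # true/false positives and negatives

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)  # roughly analogous to QCA consistency
recall = tp / (tp + fn)     # roughly analogous to QCA coverage
f1 = 2 * precision * recall / (precision + recall)

print(accuracy)   # 0.85
print(precision)  # 0.8
```

Comparing models on a shared set of such measures, rather than on each method's in-house statistic, is one way to make the kind of triangulation discussed here more systematic.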
Moving on, I am pleased that Michaela and Wolfgang have taken this extra step: "Intrigued by the idea of 'triangulating' QCA results with decision tree analysis, we have converted our QCA dataset into a binary format (as Rick did, see point 1 above) and conducted a csQCA with that data". Their results show that the QCA model does better in three of four comparisons (twice on consistency levels and once on number of configurations). However, we differ in how to measure the performance of the Decision Tree model. Their count of configurations seems to involve double counting (4+4 for the two types of outcome), whereas I count 3 and 2, reflecting the total of 5 that exist in the tree. On this basis I see the Decision Tree model doing better on parsimony for both types of outcome, but the QCA model doing better on consistency for both types of outcome.
What would be really interesting to explore, now that we have two more comparable models, is how much overlap there is in the configurations found by the two analyses, and the actual contents of those configurations, i.e. the specific conditions involved. That is probably what will be of most interest to the donor (DFID) who funded the EVAW work. The findings could have operational consequences.
In addition to exploring the concrete differences between models based on the same data, one other area that will be interesting to explore is how often the best levels of parsimony and accuracy can be found within a single model, versus one being available only at the cost of the other. I suspect QCA may privilege consistency, whereas Decision Tree algorithms might not, though this may simply reflect the analysis settings chosen for a particular analysis. This question has some wider relevance, since some parties might want to prioritise accuracy whereas others might want to prioritise parsimony. For example, a stock market investor could do well with a model that has 55% accuracy, whereas a surgeon might need 98%. Others might want to optimise both.
And a final word of thanks is appropriate to Michaela and Wolfgang for making their data set publicly available for others to analyse. This is all too rare an event, but hopefully one that will become more common in the future, encouraged by donors and modeled by examples such as theirs.