Saturday, September 26, 2020

EvalC3 versus QCA - compared via a re-analysis of one data set


I was recently asked whether that EvalC3 could be used to do a synthesis study, analysing the results from multiple evaluations.  My immediate response was yes, in principle.  But it probably needs more thought.

I then recalled that I had seen somewhere an Oxfam synthesis study of multiple evaluation results that used QCA.  This is the reference, in case you want to read, which I suggest you do.

Shephard, D., Ellersiek, A., Meuer, J., & Rupietta, C. (2018). Influencing Policy and Civic Space: A meta-review of Oxfam’s Policy Influence, Citizen Voice and Good Governance Effectiveness Reviews | Oxfam Policy & Practice. Oxfam. https://policy-practice.oxfam.org.uk/publications/*

Like other good examples of QCA analyses in practice, this paper includes the original data set in an appendix, in the form of a truth table.  This means it is possible for other people like me to reanalyse this data using other methods that might be of interest, including EvalC3.  So, this is what I did.

The Oxfam dataset includes five conditions a.k.a. attributes of the programs that were evaluated.  Along with two outcomes each pursued by some of the programs.  In total there was data on the attributes and outcomes of twenty-two programs concerned with expanding civic space and fifteen programs concerned with policy influence.  These were subject to two different QCA analyses.

The analysis of civic space outcomes

In the Oxfam analysis of the fifteen programs concerned with expanding civic space, QCA analysis found four “solutions” a.k.a. combinations of conditions which were associated with the outcome of successful civic space.  Each of these combinations of conditions was found to be sufficient for the outcome to occur.  Together they accounted for the outcomes found in 93% or fourteen of the fifteen cases.  But there was overlap in the cases covered by each of these solutions, leaving the question open as to which solution best fitted/explained those cases.  Six of the fourteen cases had two or more solutions that fitted them.

In contrast, the EvalC3 analysis found two predictive models (=solutions) which are associated with the outcome of expanded civic space.  Each of these combinations of conditions was found to be sufficient for the outcome to occur.  Together they accounted for all fifteen cases where the outcome occurred.  In addition, there was no overlap in the cases covered by each of these models.

The analysis of policy influencing outcomes

in the Oxfam analysis of the twenty-two programs concerned with policy influencing the QCA analysis found two solutions associated with the outcome of expanding civic space.  Each of these was sufficient for the outcome, and together they accounted for all the outcomes.  But there was some overlap in coverage, one of the six cases were covered by both solutions.

In contrast, the EvalC3 analysis found one predictive model which was necessary and sufficient for the outcome, and which accounted for all the outcomes achieved.

Conclusions?

Based on parsimony alone, the EvalC3 solutions/predictive models would be preferable.  But parsimony is not the only appropriate criteria to evaluate a model.  Arguably a more important criterion is the extent which a model fits the details of the cases covered when those cases are closely examined.  So, really what the EvalC3 analysis has done is to generate some extra models that need close attention, in addition to those already generated by the QCA analysis.  The number of cases covered by multiple models is been increased.

In the Oxfam study, there was no follow-on attention given to resolving what was happening in the cases that were identified by more than one solution/predictive model.  In my experience of reading other QCA analyses, this lack of follow-up is not uncommon.

However, in the Oxfam study for each of the solutions found at least one detailed description was given of an example case that that solution covered.  In principle, this is good practice. But unfortunately, as far as I can see, it was not clear whether that particular case was exclusively covered by that solution, or part of a shared solution.  Even amongst those cases which were exclusively covered by a solution there are still choices that need to be made (and explained) about how to select particular cases as exemplars and/or for a detailed examination of any causal mechanisms at work.  

QCA software does not provide any help with this task.  However, I did find some guidance in specialist text on QCA:  Schneider, C. Q., & Wagemann, C. (2012). Set-Theoretic Methods for the Social Sciences: A Guide to Qualitative Comparative Analysis. Cambridge University Press. https://doi.org/10.1017/CBO9781139004244 (it’s a heavy read in part of this book but overall, it is very informative).  In section 11.4 titled Set-Theoretic Methods and Case Selection, the authors note ‘Much emphasis is put on the importance of intimate case knowledge for a successful QCA.  As a matter of fact, the idea of QCA is a research approach and of going back-and-forth between ideas and evidence largely consists of combining comparative within case studies and QCA is a technique.  So far, the literature has focused mainly on how to choose cases prior to and during but not after a QCA were by QCA we here refer to the analytic moment of analysing a truth table it is therefore puzzling that little systematic and specific guidance has so far been provided on which cases to select for within case studies based on the results of i.e. after a QCA… ‘The authors then go on to provide some guidance (a total of 7 pages of 320).

In contrast to QCA software, EvalC3 has a number of built-in tools and some associated guidance on the EvalC3 website, on how to think about case selection as a step between cross-case analysis and subsequent within-case analysis.  One of the steps in the seven-stage EvalC3 workflow (Compare Models) is the generation of a table that compares the case coverage of multiple selected alternative models found by one’s analysis to that point.  This enables the identification of cases which are covered by two or more models.  These types of cases would clearly warrant subsequent within-case investigation.

Another step in the EvalC3 workflow called Compare Cases, provides another means of identifying specific cases for follow-up within-case investigations.  In this worksheet individual cases can be identified as modal or extreme examples within various categories that may be of interest e.g. True Positives, False Positives, et cetera.  It is also possible to identify for a chosen case what other case is most similar and most different to that case, when all its attributes available in the dataset are considered.  These measurement capacities are backed up by technical advice on the EvalC3 website on the particular types of questions that can be asked in relation to different types of cases selected on the basis of their similarities and differences. Your comments on these suggested strategies would be very welcome.

I should explain...

...why the models found by EvalC3 were different from those found from the QCA analysis.  QCA software finds solutions i.e. predictive models by reducing all the configurations found in a truth table down to the smallest possible set, using what is known as a minimisation algorithm called the Quine McCluskey algorithm.

In contrast, EvalC3 provides users with a choice of four different search algorithms combined with multiple alternative performance measures that can be used to automatically assess the results generated by those search algorithms. All algorithms have their strengths and weaknesses, in terms of the kinds of results they can find and cannot find, including the QCA Quine McCluskey algorithm and the simple machine learning algorithms built into EvalC3. I think the McCluskey algorithm has particular problems with datasets which have limited diversity, in other words, where cases only represent a small proportion of all the possible combinations of the conditions documented in the dataset. Whereas the simple search algorithms built into EvalC3 don't experience that is a difficulty. This is my conjecture, not yet rigorously tested.

[In the above data set the cases represented the two data sets analysed represented 47% and 68% of all the possible configurations given the presence of five different conditions]

While EvalC3 results described above did differ from the QCA analyses, they were not in outright contradiction. The same has been my experience when I reanalysed other QCA datasets. EvalC3 will either simply duplicate the QCA findings or produce variations on those, and often those which are better performing. 


Wednesday, July 29, 2020

Converting a continuous variable into a binary variable i.e. dichotomising


If you Google "dichotomising data" you will find lots of warnings that this is basically a bad idea!. Why so? Because if you do so you will lose information. All those fine details of differences between observations will be lost.

But what if you are dealing with something like responses to an attitude survey? Typically these have five-pointed scales ranging from disagree to neutral to agree, or the like. Quite a few of the fine differences in ratings on this scale may well be nothing more than "noise", i.e. variations unconnected with the phenomenon you are trying to measure. A more likely explanation is that they reflect differences in respondents "response styles", or something more random

Aggregation or "binning" of observations into two classes (higher and lower) can be done in different ways. You could simply find the median value and split the observations at that point. Or, you could look for a  "natural" gap in the frequency distribution and make the split there. Or, you may have a prior theoretical reason that it makes sense to split the range of observations at some other specific point.

I have been trying out a different approach. This involved not just looking at the continuous variable I wanted to dichotomise, but also its relationship with an outcome that will be of interest in subsequent analyses. This could also be a continuous variable or a binary measure.

There are two ways of doing this. The first is a relatively simple manual approach. In the first approach, the cut-off point for the outcome variable has already been decided, by one means or another.  We then needed to vary the cut-off point in the range of values for the independent variable to see what effect they had on the numbers of observations of the outcome above and below its cut-off value. For any specific cut-off value for the independent variable an Excel spreadsheet will be used to calculate the following:
  1. # of True Positives - where the independent variable value was high and so was the outcome variable value
  2. # of False Positives - where the independent variable value was high but the outcome variable value was low
  3. # of False Negative - where the independent variable value was low but the outcome variable value was high
  4. # of True Negatives - where the independent variable value was low and the outcome variable value was low
When doing this we are in effect treating cut-off criteria for the independent variable as a predictor of the dependent variable.  Or more precisely, a predictor of the prevalence of observations with values above a specified cut-off point on the dependent variable.

In Excel, I constructed the following:
  • Cells for entering the raw data - the values of each variable for each observation
  • Cells for entering the cut-off points
  • Cells for defining the status of each observation  
  • A Confusion Matrix, to summarise the total number of observations with each of the four possible types described above.
  • A set of 6 widely used performance measures, calculated using the number of observations in each cell of the Confusion Matrix.
    • These performance measures tell me how good the chosen cut-off point is as a predictor of the outcome as specified. At best, all those observations fitting the cut-off criterion will be in the True Positive group and all those not fitting it would be in the True Negative group. In reality, there are also likely to be observations in the False Positive and False Negative groups.
By varying the cut-off points it is possible to find the best possible predictor i.e. one with very few False Positive and very few False Negatives. This can be done manually when the cut-off point for the outcome variable has already been decided.

Alternatively, if the cut-off point has not been decided for the outcome variable, a search algorithm can be used to find the best combination of two cut-off points (one for the independent and one for the dependent variable).  Within Excel, there is an add-in called Solver, which uses an evolutionary algorithm to do such a search, to find the optimal combination

Postscript 2020 11 05: An Excel file with the dichotomization formula and a built-in example data set is available here  

  
2020 08 13: Also relevant: 

Hofstad, T. (2019). QCA and the Robustness Range of Calibration Thresholds: How Sensitive are Solution Terms to Changing Calibrations? COMPASSS Working Papers, 2019–92. http://www.compasss.org/wpseries/Hofstad2019.pdf  

 This paper emphasises the importance of declaring the range of original (pre-dichotomised) values over which the performance of a predictive model remains stable