Monday, June 14, 2021

Paired case comparisons as an alternative to a configurational analysis (QCA or otherwise)

[working draft]

The challenge

The other day I was asked for some advice on how to implement a QCA type of analysis within an evaluation plan that was already fairly circumscribed in its design, both by the commissioner and by the team proposing to carry out the evaluation. The commissioner had already indicated that they wanted a case study oriented approach and had even identified the maximum number of case studies that they wanted to see (ten). While the evaluation team could see the potential use of a QCA type of analysis, they were already committed to undertaking a process type evaluation, and did not want a QCA type of analysis to dominate their approach. In addition, it appeared that there was already a quite developed conceptual framework that included many different factors which might be contributory causes of the outcomes of interest.

As is often the case, there seemed to be a shortage of cases and an excess of potentially explanatory variables. In addition, there were doubts within the evaluation team as to whether a thorough QCA analysis would be possible or justifiable given the available resources and priorities.

Paired case comparisons as the alternative

My first suggestion to the evaluation team was to recognise that there is a middle ground between across-case analysis involving medium to large numbers of cases, and a within-case analysis. Typically, a QCA analysis will use both, going back and forth, using one to inform the other, over a number of iterations. The middle ground between these two options is case comparisons, particularly comparisons of pairs of cases. Although in the situation described above there will be a maximum of 10 cases that can be explored, the number of pairs of these cases that can be compared is still quite big (45). With these sorts of numbers some sort of strategy is necessary for making choices about the types of pairs of cases that will be compared. Fortunately there is already a large literature on case selection. My favourite summary is the one by Gerring, J., & Cojocaru, L. (2015). Case-Selection: A Diversity of Methods and Criteria.
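
The figure of 45 comes from the number of unordered pairs among n cases, n(n-1)/2, which for ten cases is 10 x 9 / 2. As a quick check, here is a minimal sketch in Python; the case labels are hypothetical placeholders, not the actual cases in the evaluation.

```python
from itertools import combinations

# Ten hypothetical case labels standing in for the ten case studies.
cases = [f"case_{i}" for i in range(1, 11)]

# Every unordered pair of distinct cases that could, in principle, be compared.
pairs = list(combinations(cases, 2))

print(len(pairs))  # 45
```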

My suggested approach was to use what is known as the Confusion Matrix as the basis for structuring the choice of cases to be compared.  A Confusion Matrix is a simple truth table, showing a combination of two sets of possibilities, for example as follows:

Inside the Confusion Matrix are four types of cases: 
  1. True Positives where there are cases with attributes that fit my theory and where the expected outcome is present
  2. False Positives, where there are cases with attributes that fit my theory but where the expected outcome is absent
  3. False Negatives, where there are cases which do not have attributes that fit my theory but where nevertheless the outcome is present
  4. True Negatives, where there are cases which do not have attributes that fit my theory and where the outcome is absent as expected
Both QCA and supervised machine learning approaches are good at identifying individual attributes (or packages of attributes) which are good predictors of when outcomes are present or absent, in other words where there are large numbers of True Positive and True Negative cases. They are also good at identifying the exceptions, the False Positives and False Negatives. But this type of cross-case led analysis did not seem to be available as an option to the evaluation team I have mentioned above.
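
To make the four categories concrete, here is a minimal sketch of how a case could be placed in a Confusion Matrix cell, given two yes/no judgements: whether the case has the attributes that fit the theory, and whether the outcome is present. The function name and the four example cases are hypothetical illustrations only.

```python
def confusion_cell(fits_theory: bool, outcome_present: bool) -> str:
    """Return the Confusion Matrix category for one case."""
    if fits_theory and outcome_present:
        return "True Positive"
    if fits_theory and not outcome_present:
        return "False Positive"
    if not fits_theory and outcome_present:
        return "False Negative"
    return "True Negative"

# Hypothetical cases: (fits theory?, outcome present?)
example_cases = {
    "A": (True, True),
    "B": (True, False),
    "C": (False, True),
    "D": (False, False),
}

cells = {name: confusion_cell(*judgement) for name, judgement in example_cases.items()}
for name, cell in cells.items():
    print(name, cell)
```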

1. Starting with True Positives

So my suggestion has been to look at the 10 cases that they have at hand, and start by focusing in on those cases where the outcome is present. Imagine there are 5. And then to start by looking at one of these. When examining that case they should identify one or more attributes which they think are the most likely explanation for the outcome being present. Please note that this initial theory is coming from a single within-case analysis, not from a cross-case analysis. The evaluation team will now have a single case in the category of True Positive.

2. Comparing False Positives and True Positives

The next step in the analysis is to identify at least one most relevant case which can be provisionally described as a False Positive. This False Positive case should be one that is as similar as possible in all its attributes to the True Positive case, with the obvious exception of the outcome not being present. This type of analysis choice is called MSDO, standing for most similar design, different outcome (see the de Meur references below). Also see below on how to measure similarity.

When making the comparison between the True Positive and False Positive cases there are two possible kinds of explanation that might be found for the difference in the outcome. One might be the presence of a blocking factor in the False Positive case that prevents the hypothesised causal attribute from working as expected. The other might be the absence, in the False Positive case, of some additional enabling factor that otherwise enables the hypothesised causal attribute to work as expected. If either can be found then the original theory regarding the True Positive case can be updated, and the (previously) False Positive case can now be moved into that category. The theory describing the two True Positive cases can now be seen as provisionally "sufficient" for the outcome, until another False Positive case is found and needs to be examined in a similar fashion. But if no explanation can be found the case can remain as a False Positive.

3. Comparing False Negatives and True Positives

The third step in the analysis is to identify at least one most relevant case which can be described as a False Negative. This False Negative case should be one that is as different as possible in all its attributes to the True Positive case. This type of analysis choice is called MDSO, standing for most different design, same outcome.

The aim here should be to try to identify whether the same or different causal mechanisms are at work, when compared to those seen in the True Positive case. If the mechanism is the same, then that case can now be reclassed as a True Positive. As in the previous step, the theory describing the now two True Positive cases can be seen as provisionally "necessary" for the outcome, until another False Negative case is found and examined in a similar fashion. If the causal mechanism seems to be different then the case remains as a False Negative.

Both the second and third steps will help both elaborate the details, and establish the limits of the scope, of the theory identified in step one. This suggested process makes use of the Confusion Matrix as a kind of very simple chess board, where pieces (aka cases) are introduced on to the board, one at a time, and then sometimes moved to other adjacent positions (depending on their relation to other pieces on the board).

If there are only ten cases available to study, and these have an even distribution of outcomes present and absent, then this three step process of analysis could be reiterated five times, i.e. once for each case where the outcome was present. This would involve up to 10 case comparisons, out of the 45 possible.

Measuring similarity

The above process depends on the ability to make systematic and transparent judgements about similarity. One way of doing this, which I have previously built into an Excel app called EvalC3, is to start by describing each case with a string of binary coded attributes of the same kind as used in QCA, and in some forms of supervised machine learning. An example set of workings can be seen in this Excel sheet, showing an imagined data set of 10 cases, with 10 different attributes, and then the calculation and use of Hamming Distance as the similarity measure to choose cases for the kinds of comparisons described above. That list of attributes, and the Hamming Distance measures based on it, are likely to need to be updated as the investigation of False Positives and False Negatives proceeds.
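
For readers who prefer code to spreadsheets, the selection logic can be sketched as follows. This is not the EvalC3 workings or the Excel sheet mentioned above. It is a minimal illustration, assuming a small invented data set (five cases, six binary attributes) and hypothetical function names. Hamming Distance is simply the number of attributes on which two cases differ. Note that the sketch filters only on outcome and distance. Whether a selected case really fits the current theory remains a within-case judgement.

```python
def hamming(a, b):
    """Number of positions at which two equal-length attribute strings differ."""
    if len(a) != len(b):
        raise ValueError("cases must be coded on the same attributes")
    return sum(x != y for x, y in zip(a, b))

# Invented data: case name -> (binary attributes, outcome present?)
cases = {
    "C1": ((1, 1, 0, 1, 0, 1), 1),  # the starting True Positive case
    "C2": ((1, 1, 0, 1, 0, 0), 0),
    "C3": ((0, 0, 1, 0, 1, 0), 1),
    "C4": ((1, 0, 0, 1, 1, 1), 0),
    "C5": ((1, 1, 1, 1, 0, 1), 1),
}

def msdo(reference, cases):
    """Most similar design, different outcome: a candidate False Positive."""
    ref_attrs, ref_outcome = cases[reference]
    candidates = [n for n, (_, o) in cases.items() if n != reference and o != ref_outcome]
    return min(candidates, key=lambda n: hamming(ref_attrs, cases[n][0]))

def mdso(reference, cases):
    """Most different design, same outcome: a candidate False Negative."""
    ref_attrs, ref_outcome = cases[reference]
    candidates = [n for n, (_, o) in cases.items() if n != reference and o == ref_outcome]
    return max(candidates, key=lambda n: hamming(ref_attrs, cases[n][0]))

print(msdo("C1", cases))  # C2 (differs from C1 on one attribute only)
print(mdso("C1", cases))  # C3 (differs from C1 on all six attributes)
```

Where two candidates are equally distant, this sketch simply returns the first one it encounters. In practice that tie would be better resolved by judgement about which case is most relevant.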

Incidentally, the more attributes that have been coded per case, the more discriminating this kind of approach can become. This is in contrast to cross-case analysis, where an increase in the number of attributes per case is usually problematic.

Related sources

For some of my earlier thoughts on case comparative analysis see here. These were developed for use within the context of a cross-case analysis process. But the argument above is about how to proceed when the starting point is a within-case analysis.

See also:
  • Nielsen, R. A. (2014). Case Selection via Matching
  • de Meur, G., Bursens, P., & Gottcheiner, A. (2006). MSDO/MDSO Revisited for Public Policy Analysis. In B. Rihoux & H. Grimm (Eds.), Innovative Comparative Methods for Policy Analysis (pp. 67–94). Springer US. 
  • de Meur, G., & Gottcheiner, A. (2012). The Logic and Assumptions of MDSO–MSDO Designs. In The SAGE Handbook of Case-Based Methods (pp. 208–221). 
  • Rihoux, B., & Ragin, C. C. (Eds.). (2009). Configurational Comparative Methods: Qualitative Comparative Analysis (QCA) and Related Techniques. Sage. Pages 28-32 for a description of "MSDO/MDSO: A systematic  procedure for matching cases and conditions". 
  • Goertz, G. (2017). Multimethod research, causal mechanisms, and case studies: An integrated approach. Princeton University Press.

Monday, May 24, 2021

The potential use of Scenario Planning methods to help articulate a Theory of Change

Over the past few months I have been engaged in discussions with other members of the Association of Professional Futurists (APF) Evaluation Task Force about how activities and outcomes in the field of foresight/alternative futures/scenario planning can usefully be evaluated.

Just recently the subject of Theories of Change has come up, and it struck me that there are at least three ways of looking at Theories of Change in this context:

The first perspective: A particular scenario (i.e. an elaborated view of the future) can contain within it a particular theory of change. One view of the future may imply that technological change will be the main driver of what happens. Another might emphasise the major long-term causal influence of demographic change.

The second perspective: Those organising a scenario planning exercise are also likely to have a Theory of Change (explicit, implicit, or a mixture of both) of how their exercise is expected to influence the participants, and the influence those participants will have on others.

The third perspective looks in the opposite direction and raises the possibility that in other settings a Theory of Change may contain a particular type of future scenario. I'm thinking here particularly of Theories of Change as used by organisations planning economic and/or social interventions in developed and developing economies. This territory has been explored recently in a paper by Derbyshire (2019), titled "Use of scenario planning as a theory-driven evaluation tool", FUTURES & FORESIGHT SCIENCE, 1(1), 1–13. In that paper he puts forward a good argument for the use of scenario planning methods as a way of developing improved Theories of Change. They would be improved in a number of ways: firstly, a much more detailed articulation of the causal processes involved; secondly, more adequate attention to risks and unintended consequences; thirdly, more adequate involvement of stakeholders in these two processes.

Both the task force discussions and my revisiting of the paper by Derbyshire have prompted me to think about the potential use of a ParEvo exercise as a means of articulating the contents of a Theory of Change for a development intervention. And to start to reach out to people who might be interested in testing such possibilities. The following possibilities come to mind:

1.  A ParEvo exercise could be set up to explore what happens when X project is set up in Y circumstances with Z resources and expectations.  A description of this initial setting would form the seed paragraph(s) of the ParEvo exercise. The subsequent iterations would describe the various possible developments that took place over a series of calendar periods, reflecting the expected lifespan of the intervention, and perhaps a limited period thereafter. The participants would be, or act in the role of, different stakeholders in the intervention. Commentators of the emerging storylines could be independent parties with different forms of expertise relevant to the intervention and its context. 

2.  As with all previous ParEvo exercises to date, after the final iteration there would be an evaluation stage, completed by at least the participants and the commentators, but possibly also by others in observer roles.  You can see a copy of a recent evaluation survey form here, to see the types of evaluative judgements that would be sought from those involved and observing.

3.  There seem to be at least two possible ways of using the storylines that have been generated, to inform the design of a Theory of Change. One is to take whole storylines as units of analysis. For example, a storyline evaluated as both most likely and most desirable, by more participants than any other storyline, would seem an immediately useful source of detailed information about a causal pathway that should go into a Theory of Change. Other storylines identified as most likely but least desirable would warrant attention as risks that also need to be built into a Theory of Change, along with any potential means of preventing and/or mitigating those risks. Other storylines identified as least likely but most desirable would warrant attention as opportunities, also to be built into a Theory of Change, along with means of enabling and exploiting those opportunities.

4.  The second possible approach would give less respect to the existing branch structure, and focus more on the contents of individual contributions, i.e. paragraphs in the storylines. Individual contributions could be sorted into categories familiar to those developing Theories of Change: activities, outputs, outcomes, and impacts. These could then be recombined into one or more causal pathways that the participants thought were both possible and desirable. In effect, a kind of linear jigsaw puzzle. If the four categories of event types were seen as being too rigid a schema (a reasonable complaint!), but still an unfortunate necessity, they could be introduced after the recombination process, rather than before. Either way, it probably would be useful to include another evaluation stage, making a comparative evaluation of the different combinations of contributions that had been created, using the same metrics as are already being used with existing ParEvo exercises.

More ideas will follow...

The beginnings of a bibliography...

Derbyshire, J. (2019). Use of scenario planning as a theory-driven evaluation tool. FUTURES & FORESIGHT SCIENCE, 1(1), 1–13.
Ganguli, S. (2017). Using Scenario Planning to Surface Invisible Risks (SSIR). Stanford Social Innovation Review.



Sunday, March 21, 2021

Mapping the "structure of cooperation": Adding the time dimension and thinking about further analyses


In October 2020 I wrote the first blog of the same name, based on some experiences with analysing the results of a ParEvo exercise. (ParEvo is a web assisted participatory scenario planning process).

The focus of that blog posting was a scatter plot of the kind shown below. 

Figure 1: Blue nodes = ParEvo exercise participants. Indegree and Outdegree explained below. Green lines = average indegree and average outdegree

The two axes describe two very basic aspects of network structures, including human social networks. Indegree, in the above example, is the number of other participants who built on that participant's contributions. Outdegree is the number of other participants whose contributions that participant built on. Combining these two measures we can generate (in classic consultants' 2 x 2 matrix style!) four broad categories of behavior, as labelled above. Behaviors, not types of people, because in the above instance we have no idea how generalisable the participants' behaviors are across different contexts.

There is another way of labelling two of the quarters of the scatter plot, using a distinction widely used in evolutionary theory and the study of organisational behavior (March, 1991; Wilden et al, 2019). Bridging behavior can be seen as a form of "exploitation" behavior, i.e., it involves making use of others' prior contributions, and in turn having one's contributions built on by others. Isolating behavior can be seen as a form of "exploration" behavior, i.e., building storylines with minimal help from other participants. General opinion suggests that there is no ideal balance of these two approaches; rather, it is thought to be context dependent. But in stable environments exploitation is thought to be more relevant, whereas in unstable environments exploration is seen as more relevant.
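
The four-way categorisation can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The participants and "built on" links are invented. The average indegree and outdegree are used as the cutoffs. A value equal to the average is treated as low. And I am assuming that Leading means above-average indegree only and Following means above-average outdegree only, since those labels are defined in the figure rather than in the text.

```python
from collections import defaultdict

def degrees(edges, nodes):
    """edges are (builder, built_on) pairs; returns node -> (indegree, outdegree)."""
    built_on_by = defaultdict(set)  # who built on this node's contributions
    builds_on = defaultdict(set)    # whose contributions this node built on
    for builder, target in edges:
        if builder != target:       # ignore people building on their own work
            builds_on[builder].add(target)
            built_on_by[target].add(builder)
    return {n: (len(built_on_by[n]), len(builds_on[n])) for n in nodes}

def categorise(deg):
    """Label each node using average indegree and outdegree as cutoffs."""
    avg_in = sum(i for i, _ in deg.values()) / len(deg)
    avg_out = sum(o for _, o in deg.values()) / len(deg)
    labels = {}
    for node, (indeg, outdeg) in deg.items():
        high_in, high_out = indeg > avg_in, outdeg > avg_out
        if high_in and high_out:
            labels[node] = "Bridging / exploitation"
        elif high_in:
            labels[node] = "Leading"      # assumed definition
        elif high_out:
            labels[node] = "Following"    # assumed definition
        else:
            labels[node] = "Isolating / exploration"
    return labels

# Invented example: four participants and who built on whom.
nodes = ["P1", "P2", "P3", "P4"]
edges = [("P1", "P2"), ("P1", "P3"), ("P3", "P1"), ("P3", "P2"), ("P2", "P1")]

labels = categorise(degrees(edges, nodes))
for node in nodes:
    print(node, labels[node])
```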

What does interest me is the possibility of applying this updated analytical framework to other contexts. In particular to: (a) citation networks, (b) systems mapping exercises. I will explore citation networks first. Here is an example of a citation network extracted from a public online bibliographic database covering the field of computer science. Any research funding programme will be able to generate such data, both from funding applications and subsequent research generated publications.

Figure 2: A network of published papers, linked by cited references

Looking at the indegree and outdegree attributes of all the documents within this network the average indegree, and outdegree, was 3.9. When this was used as a cutoff value for identifying the four types of cooperation behavior, their distribution was as follows: 

  • Isolating / exploration = 59% of publications
  • Leading = 17%
  • Following = 15%
  • Bridging / exploitation = 8%
Their location within the Figure 2 network diagram is shown below in this set of filtered views.

Figure 3: Top view = all four types, Yellow view = Bridging/Exploitation, Blue = Following, Red = Leading, Green = Isolating/Exploration

It makes some sense to find the bridging/exploitation type papers in the center of the network, and the isolating/exploration type papers more scattered and especially out in the disconnected peripheries. 

It would be interesting to see whether the apparently high emphasis on exploration found in this data set would be found in other research areas. 

The examination of citation networks suggests a third possible dimension to the cooperation structure scatter plot. This is time, as represented in the above example by year of publication. Not surprisingly, the oldest papers have the higher indegree and the newest papers have the lower. Older papers (by definition, within an age bounded set of papers) have lower outdegree compared to newer papers. But what is interesting here is the potential occurrence of outliers, of two types: "rising stars" and "laggards". That is, new papers with higher than expected indegree ("rising stars") and old papers with lower than expected indegree ("laggards", or a better name??), as seen in the imagined examples (a) and (b) below.
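
One way such outliers might be flagged is sketched below. What counts as "expected" indegree is not settled above, so this sketch makes some loud assumptions. Expected indegree is taken to be the mean indegree of papers published in the same year. "New" means published after the median year in the set. And the margin that counts as an outlier is an arbitrary, hypothetical threshold. The papers and their citation counts are invented.

```python
from collections import defaultdict
from statistics import mean, median

def flag_outliers(papers, margin):
    """papers: list of (paper_id, year, indegree). Returns paper_id -> flag."""
    by_year = defaultdict(list)
    for _, year, indegree in papers:
        by_year[year].append(indegree)
    # Assumption: expected indegree = mean indegree of same-year papers.
    expected = {year: mean(values) for year, values in by_year.items()}
    mid_year = median(year for _, year, _ in papers)

    flags = {}
    for paper_id, year, indegree in papers:
        gap = indegree - expected[year]
        if year > mid_year and gap > margin:
            flags[paper_id] = "rising star"   # new, cited more than expected
        elif year <= mid_year and gap < -margin:
            flags[paper_id] = "laggard"       # old, cited less than expected
    return flags

# Invented data: three older and three newer papers.
papers = [
    ("a", 2010, 10), ("b", 2010, 12), ("c", 2010, 2),
    ("d", 2020, 1), ("e", 2020, 0), ("f", 2020, 8),
]

flags = flag_outliers(papers, margin=4)
print(flags)  # {'c': 'laggard', 'f': 'rising star'}
```

With real citation data a regression of indegree on age, or a comparison within narrower age bands, might be a better definition of "expected". The point here is only to show the shape of the calculation.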

Another implication of considering the time dimension is the possibility of tracking the pathways of individual authors over time, across the scatter plot space. Their strategies may change over time. "If we take the scientist .. it is reasonable to assume that his/her optimal strategy as a graduate student should differ considerably from his/her optimal strategy once he/she received tenure" (Berger-Tal et al, 2014). They might start by exploring, then following, then bridging, then leading.

Figure 4: Red line = Imagined career path of one publication author. A and B = "Rising Star" and "Laggard" authors

There seem to be two types of opportunities present here for further analyses:
  1. Macro-level analysis of differences, in the structure of cooperation across different fields of research. Are there significant differences in the scatter plot distribution of behaviors? If so, to what extent are these differences associated with different types of outcomes across those fields? And if so, is there a plausible causal relationship that could be explored and even tested?  
  2. Micro-level analysis of differences, in the behavior of individual researchers within a given field. Do individuals tend to stick to one type of cooperation behavior (as categorised above)? Or is their behavior more variable over time? If the latter, is there any relatively common trajectory? What are the implications of these micro-level behaviors for the balance of exploration and exploitation taking place in a particular field?

Thursday, January 28, 2021

Connecting Scenario Planning and Theories of Change

This blog posting was prompted by Tom Aston’s recent comment at the end of an article about theories of change and their difficulties. There he said “I do think that there are opportunities to combine Theories Of Change with scenario planning. In particular, context monitoring and assumption monitoring are intimately connected. So, there’s an area for further exploration”.

Scenario planning, in its various forms, typically generates multiple narratives about what might happen in the future. A Theory Of Change does something similar but in a different way. It is usually in a more diagrammatic rather than narrative form. Often it is simply about one particular view of how change might happen, i.e., a particular causal pathway or package thereof. But in more complex network representations Theories Of Change do implicitly present multiple views of the future, in as much as there are multiple causal pathways that can work through these networks.

ParEvo is a participatory approach to scenario planning which I have developed and which has some relevance to discussion of the relationship between scenario planning and Theories Of Change. ParEvo is different from many scenario planning methods in that it typically generates a larger number of alternative narratives about the future, and these narratives precede rather than follow a more abstract analysis of causal processes that might be at work generating those narratives. My notion is that this narrative-first approach involves fewer cognitive demands on the participants, and is an easier activity to get participants engaged in from the beginning. Another point worth noting about the narratives is that they are collectively constructed, by different self-identified combinations of (anonymised) participants.

At the end of a ParEvo exercise participants are asked to rate all the surviving storylines in terms of their likelihood of happening in real life and their desirability. These ratings can then be displayed in a scatterplot, of the kind shown in the two examples below. The numbered points in the scatterplot are IDs for specific storylines generated in the same ParEvo exercise. Each of the two scatterplots represents a different ParEvo exercise.


The location of particular storylines in a scatterplot has consequences. I would argue that storylines which are in the likely but undesirable quadrant of the scatterplot deserve the most immediate attention. They constitute risks which, if at all possible, need to be forfended, or at least responded to appropriately when they do take place. The storylines in the unlikely but desirable quadrant probably justify the next lot of attention. This is the territory of opportunity. The focus here would be on identifying ways of enabling aspects of those developments to take place.

Then attention could move to the likely and desirable quadrant.  Here attention could be given to the relationship between what is anticipated in the storylines and any pre-existing Theory Of Change.  The narratives in this quadrant may suggest necessary revisions to the Theory Of Change.  Or, the Theory of Change may highlight what is missing or misconceived in the narratives. The early reflections on the risk and opportunity quadrants might also have implications for revisions to the Theory Of Change.

The fourth quadrant contains those storylines which are seen as unlikely and undesirable. Perhaps the appropriate response here is simply to periodically check and update the judgements about likelihood and undesirability.

These four views can be likened to the different views seen from within a car. There is the front view, which is concerned with likely and desirable events, our expected and intended direction of change. Then there are two peripheral views, to the right and left, which are concerned with risks and opportunities, present in the undesirable but likely and the desirable but unlikely quadrants respectively. Then there is the rear view, out the back, looking at undesirable and unlikely events.
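
The order of attention described above can be sketched as a simple sorting rule. This is only a sketch. It assumes that each storyline has been given a net likelihood score and a net desirability score on a scale from -1 to +1, with zero as the quadrant boundary. Both the scale and the storyline scores below are hypothetical, as are the short quadrant names.

```python
# Order of attention argued for above: risks, then opportunities,
# then the expected pathway, then the periodic check.
PRIORITY = {"risk": 1, "opportunity": 2, "expected": 3, "watch": 4}

def quadrant(likelihood, desirability, cutoff=0.0):
    """Place a storyline in one of the four quadrants (cutoff is an assumption)."""
    likely = likelihood > cutoff
    desirable = desirability > cutoff
    if likely and not desirable:
        return "risk"         # likely but undesirable
    if desirable and not likely:
        return "opportunity"  # unlikely but desirable
    if likely and desirable:
        return "expected"     # compare with any pre-existing Theory Of Change
    return "watch"            # unlikely and undesirable

# Hypothetical storylines: id -> (net likelihood, net desirability)
storylines = {
    "S3": (0.8, 0.5),
    "S4": (-0.3, -0.6),
    "S1": (0.6, -0.4),
    "S2": (-0.5, 0.7),
}

ordered = sorted(storylines, key=lambda s: PRIORITY[quadrant(*storylines[s])])
print(ordered)  # ['S1', 'S2', 'S3', 'S4']
```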

In this explanation I have talked about storylines in different quadrants, but in the actual scatterplots developed so far the picture is a bit more complex. Some storylines are way out in the corners of the scatterplot and clearly need attention, but others are more muted and mixed in their position characteristics, so prioritising which of these to give attention to first versus later could be a challenge.

There is also a less visible third dimension to this scatterplot. Some of the participants' judgements about likelihood and desirability were not unanimous. These are the red dots in the scatterplot above. In these instances some resolution of differences of opinion about the storylines would need to be the first priority. However, it is likely that some of these differences will not be resolvable, so these particular storylines will fall into the category of "Knightian uncertainties", where probabilities are simply unknown. These types of developments can't be planned for in the same way as the others, where some judgements about likelihood could be made. This is the territory where bet hedging strategies are appropriate, a strategy seen both in evolutionary biology and in human affairs. Bet hedging is a response which will be functional in most situations but optimal in none. An example is the accumulation of capital reserves in a company, which provides insurance against unexpected shocks, but at the cost of efficient use of capital.

There are some other opportunities for connecting thinking about Theories Of Change and the multiple alternative futures that can be identified through a ParEvo process. These relate to systems type modelling that can be done by extracting keywords from the narratives and mapping their co-occurrence in the paragraphs that make up these narratives, using social network analysis visualisation software. I will describe these in more detail in the near future, hopefully.
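
In the meantime, the basic idea of co-occurrence mapping can be sketched as follows. This is an illustration only, not a description of how any particular software does it. The paragraphs and keyword list are invented, keywords are matched as whole lower-case words, and the result is a weighted list of keyword pairs of the kind that network visualisation software can import as links.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(paragraphs, keywords):
    """Count how many paragraphs each pair of keywords appears in together."""
    counts = Counter()
    for text in paragraphs:
        words = set(re.findall(r"[a-z]+", text.lower()))
        present = sorted(k for k in keywords if k in words)
        counts.update(combinations(present, 2))
    return counts

# Invented storyline paragraphs and a hypothetical keyword list.
paragraphs = [
    "The drought reduced the harvest and prices rose.",
    "Prices rose again after the drought.",
    "A good harvest followed.",
]
keywords = {"drought", "harvest", "prices"}

links = cooccurrence(paragraphs, keywords)
for (a, b), weight in links.most_common():
    print(a, b, weight)
```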