Monday, April 20, 2015

In defense of the (careful) use of algorithms and the need for dialogue between tacit (expertise) and explicit (rules) forms of knowledge

This blog posting is a response to the following paper now available online
Greenhalgh, T., Howick, J., Maskrey, N., 2014. Evidence based medicine: a movement in crisis? BMJ 348,
Background: Chris Roche passed this very interesting paper on to me, received via "Kate", who posted a comment on Chris's posting on "What has cancer taught me about the links between medicine and development?", which can be found on Duncan Green's "From Poverty to Power" blog.

The paper is interesting in the first instance because both the debate about and the practice of evidence based policy and practice seem to be much further ahead in the field of medicine than they are in the field of development aid (...broad generalisation that this is...).

It is also of interest to reflect on the problems and solutions copied below and to think how many of these kinds of issues can also be seen in development aid programs.

 According to the paper, the problems with the current version of evidence based medicine include:

  1. Distortion of the evidence based brand: "The first problem is that the evidence based “quality mark” has been misappropriated and distorted by vested interests. In particular, the drug and medical devices industries increasingly set the research agenda. They define what counts as disease ... They also decide which tests and treatments will be compared in empirical studies and choose (often surrogate) outcome measures for establishing “efficacy.”"
  2. Too much evidence: "The second aspect of evidence based medicine’s crisis (and yet, ironically, also a measure of its success) is the sheer volume of evidence available. In particular, the number of clinical guidelines is now both unmanageable and unfathomable. One 2005 audit of a 24 hour medical take in an acute hospital, for example, included 18 patients with 44 diagnoses and identified 3679 pages of national guidelines (an estimated 122 hours of reading) relevant to their immediate care"
  3. Marginal gains and a shift from disease to risk: "Large trials designed to achieve marginal gains in a near saturated therapeutic field typically overestimate potential benefits (because trial samples are unrepresentative and, if the trial is overpowered, effects may be statistically but not clinically significant) and underestimate harms (because adverse events tend to be under detected or under reported)."
  4. Overemphasis on following algorithmic rules: "Well intentioned efforts to automate use of evidence through computerised decision support systems, structured templates, and point of care prompts can crowd out the local, individualised, and patient initiated elements of the clinical consultation"
  5. Poor fit for multi-morbidity. "Multi-morbidity (a single condition only in name) affects every person differently and seems to defy efforts to produce or apply objective scores, metrics, interventions, or guidelines"
The paper's proposed solutions or ways forward include:
  1. Individualised for the patient: Real evidence based medicine has the care of individual patients as its top priority, asking, “what is the best course of action for this patient, in these circumstances, at this point in their illness or condition?” It consciously and reflexively refuses to let process (doing tests, prescribing medicines) dominate outcomes (the agreed goal of management in an individual case). 
  2. Judgment not rules. Real evidence based medicine is not bound by rules.  
  3. Aligned with professional, relationship based care.  Research evidence may still be key to making the right decision—but it does not determine that decision. Clinicians may provide information, but they are also trained to make ethical and technical judgments, and they hold a socially recognised role to care, comfort, and bear witness to suffering.
  4. Public health dimension. Although we have focused on individual clinical care, there is also an important evidence base relating to population level interventions aimed at improving public health (such as pricing and labelling of consumables, fluoridation of water, and sex education). These are often complex, multifaceted programmes with important ethical and practical dimensions, but the same principles apply as in clinical care.
  5. Delivering real evidence based medicine. To deliver real evidence based medicine, the movement’s stakeholders must be proactive and persistent. Patients (for whose care the movement exists) must demand better evidence, better presented, better explained, and applied in a more personalised way with sensitivity to context and individual goals.
  6. Training must be reoriented from rule following. Critical appraisal skills, including basic numeracy, electronic database searching, and the ability systematically to ask questions of a research study, are prerequisites for competence in evidence based medicine. But clinicians need to be able to apply them to real case examples.
  7. Evidence must be usable as well as robust. Another precondition for real evidence based medicine is that those who produce and summarise research evidence must attend more closely to the needs of those who might use it.
  8. Publishers must raise the bar. This raises an imperative for publishing standards. Just as journal editors shifted the expression of probability from potentially misleading P values to more meaningful confidence intervals by requiring them in publication standards, so they should now raise the bar for authors to improve the usability of evidence, and especially to require that research findings are presented in a way that informs individualised conversations.
  9. ...and more
While many of these complaints and claims make a lot of sense, I think there is also a risk of "throwing the baby out with the bathwater" if care is not taken with some of them. I will focus on a couple of ideas that run through the paper.

The risk lies in seeing two alternative modes of practice as exclusive choices. One is rule based, focused on average effects when trying to meet common needs in populations, and the other is expertise based, focused on the specific and often unique needs of individuals. Parallels could be drawn between different types of aid programs, e.g. centrally planned and nationally rolled out services meeting basic needs like water supply or education, and much more person centered participatory rural development programs.

Alternatively, one can see these two approaches as having complementary roles that can help and enrich each other. The authors describe one theory of learning, which probably applies in many fields, including medicine, as "...beginning with the novice who learns the basic rules and applies them mechanically with no attention to context. The next two stages involve increasing depth of knowledge and sensitivity to context when applying rules. In the fourth and fifth stages, rule following gives way to expert judgments, characterised by rapid, intuitive reasoning informed by imagination, common sense, and judiciously selected research evidence and other rules". During this process a lot of explicit knowledge becomes tacit, and almost automated, with conscious attention left for the more case specific features of a situation. It is an economical use of human cognitive powers. Michael Polanyi wrote about this process years ago (1966, The Tacit Dimension).

The other side of this process is when tacit knowledge gets converted into explicit knowledge. That's what some anthropologists and ethnographers do. They seek to get into the inner world of their subjects and to make it accessible to others. One practitioner whose work interests me in particular is Christina Gladwin, who wrote a book on Ethnographic Decision Trees in 1989. This was all about eliciting how people, like small farmers in West Africa, made decisions about what crops to plant. The result was a decision tree model that summarised all the key choices farmers could make, and the final outcomes those different choices would lead to. This was not a model of how they actually thought, but a model of how different combinations of choices were associated with different outcomes of interest. These decision trees are not so far removed from those used in medical practice today.

A new farmer coming into the same location could arguably make use of such a decision tree to decide what crops to plant. Alternatively, they could work with one of the farmers for a number of seasons, which then might cover all the eventualities in the decision tree, and learn from that direct experience. But this would take much more time. In this type of setting explicit rule based knowledge is an easier and quicker means of transferring knowledge between people. Rule based knowledge that can be quickly and reliably communicated is also testable knowledge: following the same pattern of rules in another context may or may not lead to the expected outcome.

And now a word about algorithms. An algorithm is a clearly defined sequence of steps that will lead to a desired end, sometimes involving some iteration until that end state gets closer. A sequence of choices in a decision tree is an algorithm. At each choice point the answer will dictate which choice is to be made next. These are the rules mentioned in the paper above. There are also algorithms for constructing such algorithms. On this blog I have made a number of postings about QCA and (automated) Decision Tree models, both of which are means of constructing testable causal models. Both involve computerised processes for finding rules that best predict outcomes of interest. I think they have a lot of potential in the field of development aid.
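To make the point concrete, a sequence of choices in a decision tree can be written down as a very short program. The sketch below is purely illustrative: the questions and crop names are hypothetical and are not taken from Gladwin's work or from any real data.

```python
# Hypothetical decision tree: each internal node asks a yes/no question,
# each leaf is a crop choice. Nothing here is based on real field data.
TREE = {
    "question": "has_irrigation",
    "yes": {
        "question": "has_market_access",
        "yes": "vegetables",
        "no": "rice",
    },
    "no": {
        "question": "has_drought_tolerant_seed",
        "yes": "sorghum",
        "no": "millet",
    },
}


def decide(tree, answers):
    """Walk the tree: at each choice point the answer dictates the next choice."""
    node = tree
    path = []
    while isinstance(node, dict):
        question = node["question"]
        branch = "yes" if answers[question] else "no"
        path.append((question, branch))
        node = node[branch]
    return node, path


# A hypothetical farmer without irrigation but with drought tolerant seed.
farmer = {"has_irrigation": False, "has_drought_tolerant_seed": True}
crop, path = decide(TREE, farmer)
print(crop, path)
```

The path that is returned is the explicit rule that was followed, which is what makes this kind of knowledge easy to communicate and to test in another setting.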

But returning to the problems of evidence based medicine, it is very important to note that algorithms are means of achieving specific goals. Deciding which goals need to be pursued remains a very human choice. Even within the use of both QCA and (automated) Decision Tree modeling, users have to decide the extent to which they want to focus on finding rules that are very accurate or those which are less accurate but which apply to a wider range of circumstances (usually simple rather than complex rules).
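This trade-off can be illustrated with a small sketch. It assumes a made-up data set of six cases with two binary conditions (A and B) and a binary outcome, and it uses two simple measures: consistency (how often cases matching a rule show the outcome) and coverage (what share of the cases with the outcome the rule applies to). The data and function names are hypothetical.

```python
# Hypothetical cases: two binary conditions and a binary outcome.
CASES = [
    {"A": 1, "B": 1, "outcome": 1},
    {"A": 1, "B": 1, "outcome": 1},
    {"A": 1, "B": 0, "outcome": 1},
    {"A": 1, "B": 0, "outcome": 0},
    {"A": 0, "B": 1, "outcome": 1},
    {"A": 0, "B": 0, "outcome": 0},
]


def consistency_and_coverage(cases, rule):
    """rule maps condition name -> required value.

    consistency: share of cases matching the rule that show the outcome.
    coverage: share of cases with the outcome that the rule matches.
    """
    matched = [c for c in cases if all(c[k] == v for k, v in rule.items())]
    with_outcome = [c for c in cases if c["outcome"] == 1]
    hits = [c for c in matched if c["outcome"] == 1]
    consistency = len(hits) / len(matched) if matched else 0.0
    coverage = len(hits) / len(with_outcome) if with_outcome else 0.0
    return consistency, coverage


narrow = consistency_and_coverage(CASES, {"A": 1, "B": 1})  # more complex rule
broad = consistency_and_coverage(CASES, {"A": 1})           # simpler rule
print("A and B:", narrow)
print("A only: ", broad)
```

In this toy data the more complex rule (A and B) is always right but applies to only half of the cases with the outcome, while the simpler rule (A alone) applies more widely but is sometimes wrong. Which of the two is preferable is a human choice, not something the algorithm can decide.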

So, in summary, in any move towards evidence based practice, we need to ensure that tacit and explicit forms of knowledge build upon each other rather than getting separated as different and competing forms of knowledge. And while we should develop, test and use good algorithms, we should remember they are always means to an end, and we remain responsible for choosing the ends we are trying to achieve.

Postscript 2015 05 04: Please also read this recent cautionary analysis of the use of algorithms for the purposes of public policy implementation. The author points out that algorithms can embody and perpetuate cultural biases. How is that possible? It is possible because all evidence-based algorithms are developed using historical data, i.e. data sets of what has happened in the past. Those data sets, e.g. of arrest and conviction data in a given city, reflect historical practice by human institutions in that city, with all their biases, conscious and not so conscious. They don't reflect ideal practice, simply the actual practice at the time. Where an algorithm is not based on analysis of historical data, it may have its origins in a more ethnographic study of the practice of human experts in the domain of interest. Their practice, and their interpretations of their practice, are equally subject to cultural biases. The analysis by Virginia Eubanks includes four useful suggestions to counter these risks, one of which is that "We need to learn more about how policy algorithms work" by demanding more transparency about the design of a given algorithm and its decisions. But this may not be possible, or in some cases publicly desirable. One alternative method of interest is the algorithmic audit.

Saturday, April 18, 2015

A mistaken criticism of the value of binary data

When reviewing a recent evaluation report I came across the following comment:
"Crisp set QCA where binary codings are used to establish the presence or absence of certain conditions does not facilitate a nuanced or granular analysis."
Wrong. Simply wrong.

A DFID strategy for promoting "improved governance" could be coded  as present or absent. This does seem crude, given the varieties of ways in which a governance strategy could actually be implemented. But the answer is not to ditch binary coding, but to extend it.

This can be done by breaking down the concept of "a strategy for improved governance" into a number of component parts or attributes (say 10), and then coding for their presence/absence. The initial conception of the governance strategy is then deemed present if all 10 attributes are present. But it only takes a single change in one attribute at a time to produce 10 new versions of almost the same strategy. If you change two attributes at a time, there are (I think... (10 x 9) / 2 =) 45 new versions. If any number of attributes can be changed then there are 2 to the power of 10 (1,024) possible configurations of the strategy, some of which may be very different from the present strategy. Basically it does not take much tweaking of the initial configuration before you will have nuances by the bucketful!
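A quick way to check counts like these is to enumerate them. The Python sketch below assumes 10 binary attributes, with the initial strategy being the configuration in which all 10 are present; the variable names are illustrative only.

```python
from itertools import product
from math import comb

N_ATTRIBUTES = 10

# Closed-form counts: variants differing from the initial strategy
# in exactly one attribute, in exactly two, and all configurations.
one_change = comb(N_ATTRIBUTES, 1)
two_changes = comb(N_ATTRIBUTES, 2)
all_configurations = 2 ** N_ATTRIBUTES

# Cross-check by brute-force enumeration of every binary configuration.
initial = (1,) * N_ATTRIBUTES
configs = list(product((0, 1), repeat=N_ATTRIBUTES))


def changes(config):
    """Number of attributes whose coding differs from the initial strategy."""
    return sum(a != b for a, b in zip(config, initial))


enumerated_one = sum(1 for c in configs if changes(c) == 1)
enumerated_two = sum(1 for c in configs if changes(c) == 2)

print(one_change, two_changes, all_configurations)
print(enumerated_one, enumerated_two, len(configs))
```

With 10 attributes this gives 10 single-attribute variants, 45 two-attribute variants, and 1,024 configurations in total.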

The limitations of the dis-aggregation-into-components approach have nothing to do with the nature of binary coding, but rather with whether there are enough cases available to allow identification of the kinds of outcomes associated with the different varieties of configurations arising from the more micro-level coding of attributes.

If there are enough cases available, then learning about what works through the emergence (or planned development) of variations in the initial configuration becomes possible. Some of these new versions of a governance strategy may work more effectively than the initial model, and others less so. Incremental exploration becomes possible.

For more on the idea of exploring adjacent variations in causal configurations see Andreas Wagner's very interesting (2014) book titled "The Arrival of the Fittest" which explores a theory of how innovation is possible in biological systems. Here is a review of the book, in the Times Higher Education website.

There is also a connection here, I think, with Stuart Kauffman's concept of "the adjacent possible", an idea also taken up by Steven Johnson in his book "Where Good Ideas Come From: The Natural History of Innovation". Here is a review of the book in the Guardian.

Postscript 2015 05 14: I heard the same "binary is crude" criticism again today from a person attending a QCA presentation at the UK Evaluation Society Conference in London.

This time I will present another response. Binary judgments can be, and often are, derived from a dichotomised scale that captures gradations of the phenomenon of interest. As Carroll Patterson pointed out today, with current QCA software it is now possible to experiment with varying the location of the cut-off point on such scales, and observe the consequences for the quality of the configurations that are then identified as the best fitting solutions. The same approach is also possible with searches for best-fitting configurations using an evolutionary algorithm, which is another approach I have been experimenting with recently. It is also possible to go much further into the specific details of the underlying concept being measured by a scale by basing it on the aggregated output of a weighted checklist, like the kind I have described elsewhere. Basically, the limit to what is possible is defined by the imagination of the researcher/evaluator, not any inherent limitation of binary measures.
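The idea of experimenting with the cut-off point can be sketched in a few lines. This is not how any particular QCA package does it; it is a minimal illustration using a made-up set of eight cases, each with a score on a 1-5 scale and a binary outcome, and two simple measures of fit (consistency and coverage).

```python
# Hypothetical cases: (score on a 1-5 scale, binary outcome).
CASES = [(1, 0), (2, 0), (2, 0), (3, 1), (3, 0), (4, 1), (4, 1), (5, 1)]


def dichotomise(score, cutoff):
    """Code the condition as present (1) if the score is at or above the cutoff."""
    return 1 if score >= cutoff else 0


def fit_at_cutoff(cases, cutoff):
    """Return (consistency, coverage) of 'condition present -> outcome present'."""
    coded = [(dichotomise(score, cutoff), outcome) for score, outcome in cases]
    present = [outcome for condition, outcome in coded if condition == 1]
    total_outcomes = sum(outcome for _, outcome in coded)
    hits = sum(present)
    consistency = hits / len(present) if present else 0.0
    coverage = hits / total_outcomes if total_outcomes else 0.0
    return consistency, coverage


# Try each possible cut-off point and compare the results.
results = {cutoff: fit_at_cutoff(CASES, cutoff) for cutoff in range(2, 6)}
for cutoff, (consistency, coverage) in results.items():
    print(f"cutoff {cutoff}: consistency {consistency:.2f}, coverage {coverage:.2f}")
```

In this toy example, moving the cut-off up the scale makes the binary condition a more reliable predictor of the outcome but one that applies to fewer cases, so the choice of cut-off is itself something that can be examined and justified rather than imposed arbitrarily.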

Postscript 2015 05 17: I tried to post a Comment below in reply to Anon's comment below, but the comment box won't accept any HTML formatting, so I will place the comment here instead.

RE "If, to combat the reductiveness of binary coding, you introduce a scale of 4-6 points, you still face the same problem in coding something more complex – a remote non expert is reducing a complex context and process to a number in an arbitrary way. "
Coding for QCA (and other purposes, such as when using NVIVO) should always be done in a way that is transparent and replicable, with attention to inter-rater reliability. It should certainly not be done in an “arbitrary way”.
RE "Grading a large, diverse and complicated country on a scale of 0-1 or 1-5 on 'improved governance' is just ridiculous. Anyone who has studied the way people actually behave, governance, how decisions are really made or projects succeed or fail, will tell you that this reductiveness does not helpfully or accurately reflect reality."
QCA has been used in the field of political science since the 1980s and many of these applications have been cross-country analyses of political systems.
RE "QCA is not qualitative – as it seeks to reduce a complex qualitative issue to a quantitative score - a number."
In a crisp-set QCA data set the “number” 0 or 1 is actually a category, not a numerical value. QCA could be done just as well by replacing the 0’s and 1’s with the words “absent” and “present”.
RE "QCA is not comparative – the serious comparative part comes afterwards in some form of qualitative analysis, which researchers can choose. Looking at the truth table for patterns is the only form of comparison that QCA offers."
There are two levels of analysis involved in QCA: within-case analysis and between-case analysis. At the beginning, within-case analysis informs the selection of conditions to be included in a data set. When inconsistencies are found in an examination of configurations in a data set, good practice advises a return to within-case analysis to identify missing conditions that can resolve these inconsistencies. When these have been resolved and a set of configurations has been identified that accounts for all cases in the most parsimonious way possible, these then need to be interpreted by reference to the details of specific cases, with particular attention to the more detailed processes that connect the conditions making up the configurations.
RE "In my view QCA is a quantitative form of data management and pattern identification."
It does depend on what you mean by quantitative. It is based on a form of mathematics known as set theory, but that is about logical relationships, not quantities. In case there is any reservation about its significance, pattern identification is very important. In a data set with 10 different conditions there are 2 to the power of 10 (1,024) different possible combinations of these that might be consistently associated with an outcome of interest. Finding these is like looking for a needle in a haystack. QCA and other methods, like decision tree algorithms, help us find the part of the haystack in which the needle is most likely to be found. But as I said at the end of my section of the UKES presentation, finding a plausible configuration is not enough. It is necessary but not sufficient for a strong causal claim. There also needs to be a plausible account of the likely causal mechanisms at work that connect the conditions in the configurations. These will only be found and confirmed through detailed within-case investigations, using methods like (but not only) process tracing. And the pattern finding has to be systematic, and transparent in the way it has been done. This is the case with QCA and Decision Tree modeling, where there are specific algorithms used, both with their known limitations.
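As a rough illustration of what systematic and transparent pattern finding can look like, here is a minimal sketch of the first step only: grouping cases into truth table rows and flagging any configuration whose cases disagree on the outcome. It is not the minimisation algorithm used by QCA software, and the five cases and three conditions are invented for the example.

```python
from collections import defaultdict

# Hypothetical crisp-set data: (case name, codings for conditions A, B, C, outcome).
CASES = [
    ("case1", (1, 1, 0), 1),
    ("case2", (1, 1, 0), 1),
    ("case3", (1, 0, 1), 0),
    ("case4", (1, 0, 1), 1),
    ("case5", (0, 0, 1), 0),
]


def truth_table(cases):
    """Group cases by their configuration of conditions (one row per configuration)."""
    rows = defaultdict(list)
    for name, config, outcome in cases:
        rows[config].append((name, outcome))
    return dict(rows)


def contradictory_rows(table):
    """Configurations whose member cases do not all share the same outcome."""
    return {
        config: members
        for config, members in table.items()
        if len({outcome for _, outcome in members}) > 1
    }


table = truth_table(CASES)
contradictions = contradictory_rows(table)
possible = 2 ** 3  # three binary conditions

print(f"{len(table)} of {possible} possible configurations observed")
print("contradictory configurations:", contradictions)
```

Here case3 and case4 share a configuration but have different outcomes. That is the kind of inconsistency that sends the analyst back to within-case analysis to look for a missing condition.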

There is a useful reference that may be of interest: Wagemann, C., Schneider, C.Q., 2007. Standards of Good Practice in Qualitative Comparative Analysis (QCA) and Fuzzy-Sets.