
Monday, April 20, 2015

In defense of the (careful) use of algorithms and the need for dialogue between tacit (expertise) and explicit (rules) forms of knowledge



This blog posting is a response to the following paper now available online
Greenhalgh, T., Howick, J., Maskrey, N., 2014. Evidence based medicine: a movement in crisis? BMJ 348, http://www.bmj.com/content/348/bmj.g3725
Background: Chris Roche passed this very interesting paper on to me, received via "Kate", who posted a comment on Chris's posting on "What has cancer taught me about the links between medicine and development?", which can be found on Duncan Green's "From Poverty to Power" blog.

The paper is interesting in the first instance because both the debate about evidence based policy and practice, and the practice itself, seem to be much further ahead in the field of medicine than in the field of development aid (...broad generalisation that this is...).

It is also of interest to reflect on the problems and solutions copied below and to think how many of these kinds of issues can also be seen in development aid programs.

 According to the paper, the problems with the current version of evidence based medicine include:

  1. Distortion of the evidence based brand: "The first problem is that the evidence based “quality mark” has been misappropriated and distorted by vested interests. In particular, the drug and medical devices industries increasingly set the research agenda. They define what counts as disease ... They also decide which tests and treatments will be compared in empirical studies and choose (often surrogate) outcome measures for establishing “efficacy.”"
  2. Too much evidence: "The second aspect of evidence based medicine’s crisis (and yet, ironically, also a measure of its success) is the sheer volume of evidence available. In particular, the number of clinical guidelines is now both unmanageable and unfathomable. One 2005 audit of a 24 hour medical take in an acute hospital, for example, included 18 patients with 44 diagnoses and identified 3679 pages of national guidelines (an estimated 122 hours of reading) relevant to their immediate care"
  3. Marginal gains and a shift from disease to risk: "Large trials designed to achieve marginal gains in a near saturated therapeutic field typically overestimate potential benefits (because trial samples are unrepresentative and, if the trial is overpowered, effects may be statistically but not clinically significant) and underestimate harms (because adverse events tend to be under detected or under reported)."
  4. Overemphasis on following algorithmic rules: "Well intentioned efforts to automate use of evidence through computerised decision support systems, structured templates, and point of care prompts can crowd out the local, individualised, and patient initiated elements of the clinical consultation"
  5. Poor fit for multi-morbidity. "Multi-morbidity (a single condition only in name) affects every person differently and seems to defy efforts to produce or apply objective scores, metrics, interventions, or guidelines"
The paper's proposed solutions or ways forward include:
  1. Individualised for the patient: Real evidence based medicine has the care of individual patients as its top priority, asking, “what is the best course of action for this patient, in these circumstances, at this point in their illness or condition?” It consciously and reflexively refuses to let process (doing tests, prescribing medicines) dominate outcomes (the agreed goal of management in an individual case). 
  2. Judgment not rules. Real evidence based medicine is not bound by rules.  
  3. Aligned with professional, relationship based care.  Research evidence may still be key to making the right decision—but it does not determine that decision. Clinicians may provide information, but they are also trained to make ethical and technical judgments, and they hold a socially recognised role to care, comfort, and bear witness to suffering.
  4. Public health dimension . Although we have focused on individual clinical care, there is also an important evidence base relating to population level interventions aimed at improving public health (such as pricing and labelling of consumables, fluoridation of water, and sex education). These are often complex, multifaceted programmes with important ethical and practical dimensions, but the same principles apply as in clinical care. 
  5. Delivering real evidence based medicine. To deliver real evidence based medicine, the movement’s stakeholders must be proactive and persistent. Patients (for whose care the movement exists) must demand better evidence, better presented, better explained, and applied in a more personalised way with sensitivity to context and individual goals.
  6. Training must be reoriented from rule following. Critical appraisal skills—including basic numeracy, electronic database searching, and the ability systematically to ask questions of a research study—are prerequisites for competence in evidence based medicine. But clinicians need to be able to apply them to real case examples.
  7. Evidence must be usable as well as robust. Another precondition for real evidence based medicine is that those who produce and summarise research evidence must attend more closely to the needs of those who might use it.
  8. Publishers must raise the bar. This raises an imperative for publishing standards. Just as journal editors shifted the expression of probability from potentially misleading P values to more meaningful confidence intervals by requiring them in publication standards, so they should now raise the bar for authors to improve the usability of evidence, and especially to require that research findings are presented in a way that informs individualised conversations.
  9. ...and more
While many of these complaints and claims make a lot of sense, I think there is also a risk of "throwing the baby out with the bathwater" if care is not taken with some of them. I will focus on a couple of ideas that run through the paper.

The risk lies in seeing two alternative modes of practice as exclusive choices. One is rule based, focused on average effects when trying to meet common needs in populations, and the other is expertise focused on the specific and often unique needs of individuals. Parallels could be drawn between different types of aid programs, e.g. centrally planned and nationally rolled out services meeting basic needs like water supply or education, and much more person centered participatory rural development programs.

Alternatively, one can see these two approaches as having complementary roles that can help and enrich each other. The authors describe one theory of learning which probably applies in many fields, including medicine: The first stage "...beginning with the novice who learns the basic rules and applies them mechanically with no attention to context. The next two stages involve increasing depth of knowledge and sensitivity to context when applying rules. In the fourth and fifth stages, rule following gives way to expert judgments, characterised by rapid, intuitive reasoning informed by imagination, common sense, and judiciously selected research evidence and other rules". During this process a lot of explicit knowledge becomes tacit, and almost automated, with conscious attention left for the more case specific features of a situation. It is an economical use of human cognitive powers. Michael Polanyi wrote about this process years ago (1966, The Tacit Dimension).

The other side of this process is when tacit knowledge gets converted into explicit knowledge. That's what some anthropologists and ethnographers do. They seek to get into the inner world of their subjects and to make it accessible to others. One practitioner whose work interests me in particular is Christina Gladwin, who wrote a book on Ethnographic Decision Trees in 1989. This was all about eliciting how people, like small farmers in West Africa, made decisions about what crops to plant. The result was a decision tree model that summarised all the key choices farmers could make, and the final outcomes those different choices would lead to. This was not a model of how they actually thought, but a model of how different combinations of choices were associated with different outcomes of interest. These decision trees are not so far removed from those used in medical practice today.

A new farmer coming into the same location could arguably make use of such a decision tree to decide what crops to plant. Alternatively they could work with one of the farmers for a number of seasons, which then might cover all the eventualities in the decision tree, and learn from that direct experience. But this would take much more time. In this type of setting explicit rule based knowledge is an easier and quicker means of transferring knowledge between people. Rule based knowledge that can be quickly and reliably communicated is also testable knowledge. Following the same pattern of rules may or may not lead to the expected outcome in another context.

And now a word about algorithms. An algorithm is a clearly defined sequence of steps that will lead to a desired end, sometimes involving some iteration until that end state gets closer. A sequence of choices in a decision tree is an algorithm. At each choice point the answer will dictate which choices are to be made next. These are the rules mentioned in the paper above. There are also algorithms for constructing such algorithms. On this blog I have made a number of postings about QCA and (automated) Decision Tree models, both of which are means of constructing testable causal models. Both involve computerised processes for finding rules that best predict outcomes of interest. I think they have a lot of potential in the field of development aid.
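To make the idea of a decision tree as an algorithm concrete, here is a minimal Python sketch. The questions and crops in the tree are entirely hypothetical (they are not taken from Gladwin's work); the point is only to show how the answer at each choice point dictates which question comes next, until an outcome is reached.

```python
# Hypothetical tree: each internal node asks a yes/no question,
# each leaf (a plain string) is an outcome.
TREE = {
    "question": "has_irrigation",
    "yes": {"question": "has_market_access", "yes": "vegetables", "no": "rice"},
    "no": {"question": "has_credit", "yes": "maize", "no": "millet"},
}


def follow_tree(node, answers):
    """Follow the tree from the root until a leaf (an outcome) is reached."""
    path = []
    while isinstance(node, dict):
        question = node["question"]
        branch = "yes" if answers[question] else "no"
        path.append((question, branch))
        node = node[branch]  # the answer decides which node is visited next
    return node, path


# A made-up farm, described by its answers to the three questions.
farm = {"has_irrigation": False, "has_market_access": True, "has_credit": True}
outcome, path = follow_tree(TREE, farm)
print(outcome)  # maize
print(path)     # [('has_irrigation', 'no'), ('has_credit', 'yes')]
```

Note that the question about market access is never asked for this farm: the rules followed depend on the answers given along the way.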

But returning to the problems of evidence based medicine, it is very important to note that algorithms are means of achieving specific goals. Deciding which goals need to be pursued remains a very human choice. Even within the use of both QCA and (automated) Decision Tree modeling, users have to decide the extent to which they want to focus on finding rules that are very accurate or those which are less accurate but which apply to a wider range of circumstances (usually simple rather than complex rules).
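The trade-off between accurate rules and widely applicable rules can be illustrated with a small Python sketch. The six cases and the two rules below are invented for illustration. "Consistency" is used here in the crisp-set sense (the share of cases matching a rule that show the outcome) and "coverage" as the share of all cases with the outcome that the rule accounts for.

```python
# Hypothetical crisp-set data: two conditions (a, b) and one outcome.
cases = [
    {"a": 1, "b": 1, "outcome": 1},
    {"a": 1, "b": 1, "outcome": 1},
    {"a": 1, "b": 0, "outcome": 1},
    {"a": 1, "b": 0, "outcome": 0},
    {"a": 0, "b": 1, "outcome": 1},
    {"a": 0, "b": 0, "outcome": 0},
]


def simple_rule(case):
    """IF a THEN outcome."""
    return case["a"] == 1


def complex_rule(case):
    """IF a AND b THEN outcome."""
    return case["a"] == 1 and case["b"] == 1


def consistency_and_coverage(rule, cases):
    """Return (consistency, coverage) of a rule over crisp-set cases."""
    matched = [c for c in cases if rule(c)]
    matched_with_outcome = [c for c in matched if c["outcome"] == 1]
    with_outcome = [c for c in cases if c["outcome"] == 1]
    consistency = len(matched_with_outcome) / len(matched)
    coverage = len(matched_with_outcome) / len(with_outcome)
    return consistency, coverage


print(consistency_and_coverage(simple_rule, cases))   # (0.75, 0.75)
print(consistency_and_coverage(complex_rule, cases))  # (1.0, 0.5)
```

In this toy example the more complex rule is perfectly consistent but accounts for only half of the cases with the outcome, while the simpler rule covers more cases at the cost of some accuracy. Which of the two is preferable is the human choice referred to above.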

So, in summary, in any move towards evidence based practice, we need to ensure that tacit and explicit forms of knowledge build upon each other rather than getting separated as different and competing forms of knowledge. And while we should develop, test and use good algorithms, we should remember they are always means to an end, and we remain responsible for choosing the ends we are trying to achieve.

Postscript 2015 05 04: Please also read this recent cautionary analysis of the use of algorithms for the purposes of public policy implementation. The author points out that algorithms can embody and perpetuate cultural biases. How is that possible? It is possible because all evidence-based algorithms are developed using historical data i.e. data sets of what has happened in the past. Those data sets, e.g. of arrest and conviction data in a given city, reflect historical practice by human institutions in that city, with all their biases, conscious and not so conscious. They don't reflect ideal practice, simply the actual practice at the time. Where an algorithm is not based on analysis of historical data then it may have its origins in a more ethnographic study of the practice of human experts in the domain of interest. Their practice, and their interpretations of their practice, are equally subject to cultural biases. The analysis by Virginia Eubanks includes four useful suggestions to counter these risks, one of which is that "We need to learn more about how policy algorithms work" by demanding more transparency about the design of a given algorithm and its decisions. But this may not be possible, or in some cases publicly desirable. One alternative method of interest is the algorithmic audit.

Tuesday, October 07, 2014

Comparing QCA and Decision Tree models - an ongoing discussion



This blog posting is a continuation of a dialogue that is based on Michaela Raab and Wolfgang Stuppert's EVAW blog. I would have preferred to post my response below via their blog's Comment facility, but it cannot cope with long responses or hypertext links. They in turn have had difficulty posting comments on my YouTube site, where this EES presentation (Triangulating the results of Qualitative Comparative Analyses, EES Dublin 2014) can be seen. It was this presentation that prompted their response here on their blog.

Hi Michaela and Wolfgang

Thanks for going to the trouble of responding in detail to my EES presentation.

Before responding in detail I should point out to readers that the EES presentation was on the subject of triangulation, and how to compare QCA and Decision Tree models when applied to the same data set. In my view it is unlikely that either of these methods will produce the “best” results in all circumstances. The interesting challenge is to develop ways of thinking about how to compare and choose between specific models generated by these and other comparable methods of analysis. The penultimate slide (#17) in the presentation highlights the options I think we can try out when faced with different kinds of differences between models.

The rest of this post responds to particular points that have been made by Michaela and Wolfgang, and then makes a more general conclusion.

Re “1. The decision tree analysis is not based on the same data set as our QCA” This is correct. I was in a bit of a quandary because while the original data set was fuzzy set (i.e. there were intermediate values between 0 and 1) the solutions that were found were described in binary form i.e. the conditions and outcomes either were or were not present. I did produce a Decision Tree with the fuzzy set data but I had no easy means of comparing the results with the binary results of the QCA model. That said, Michaela and Wolfgang are right in expecting that such a model would be more complex and have more configurations.
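For readers unfamiliar with the difference between fuzzy and binary (crisp) data, here is a minimal Python sketch of the kind of conversion involved. The membership scores and the 0.5 cut-off are assumptions made for illustration only; they are not taken from the EVAW data set, and the conversion actually used for the presentation may have differed.

```python
def to_crisp(score, cutoff=0.5):
    """Convert a fuzzy membership score (0 to 1) into a crisp 0 or 1.

    The 0.5 cut-off is an assumption; scores at or below it become 0.
    """
    return 1 if score > cutoff else 0


fuzzy_scores = [0.0, 0.33, 0.67, 1.0]  # hypothetical four-value fuzzy set
crisp_scores = [to_crisp(s) for s in fuzzy_scores]
print(crisp_scores)  # [0, 0, 1, 1]
```

The intermediate values 0.33 and 0.67 disappear in the conversion, which is one reason why models built on the two versions of a data set may not be directly comparable.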

Re “2. Decision tree analysis is compared with a type of QCA solution that is not meant to maximise parsimony.” I agree that “If the purpose was to compare the parsimony of QCA results with those of decision trees, then the 'parsimonious' QCA solution should be used”. But the intermediate solution was the solution that was available to me, and parsimony was not the only criterion of interest in my presentation. Accuracy (or consistency in QCA terms) was also of interest. But it was the difference in parsimony that stood out the most in this particular model comparison.

Re “3. The decision tree analysis performs less well than stated in the presentation” Here I think I disagree. The focus of the presentation is on consistency of those configurations that predict effective evaluations only (indicated in the tree diagram by squares with 0.0 value rather than 1.0 value), not the whole model. Among the three configurations that predict effective evaluations the consistency was 82%. Slide 15 may have confused the discussion because the figures there refer to coverage rather than consistency (I should have made this clear).

Re “none of the paths in our QCA is redundant”. The basis for my claim here was some simple color coding of each case according to which QCA configuration applied to them. Looking back at the Excel file it appears to me that cases 14 and 16 were covered by two configurations and cases 16 and 32 by another two configurations. BUT bear in mind this was done with the binary (crisp) data, not the fuzzy valued data. (The two configurations that did not seem to cover unique cases were quanqca*sensit*parti_2 and qualqca*quanqca*sensit*compevi_3). The important point here is not that redundancy is “bad” but that where it is found it can prompt us to think about how to investigate such cases if and when they arise (including when two different models provide alternate configurations for the same cases).
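The kind of check described above (which configurations cover no cases of their own) can be sketched in a few lines of Python. The configuration names and case numbers below are hypothetical placeholders, not the actual EVAW configurations or cases.

```python
# Hypothetical mapping of configurations to the case IDs they cover.
coverage = {
    "config_1": {1, 2, 3},
    "config_2": {3, 4},
    "config_3": {2, 3},
}


def unique_cases(coverage):
    """For each configuration, return the cases that no other configuration covers."""
    result = {}
    for name, cases in coverage.items():
        # Pool the cases covered by every other configuration.
        others = set().union(*(c for n, c in coverage.items() if n != name))
        result[name] = cases - others
    return result


uniquely_covered = unique_cases(coverage)
print(uniquely_covered)  # {'config_1': {1}, 'config_2': {4}, 'config_3': set()}

# Configurations with no unique cases are candidates for closer investigation.
no_unique = [name for name, cases in uniquely_covered.items() if not cases]
print(no_unique)  # ['config_3']
```

In this made-up example every case covered by config_3 is also covered by another configuration, which is the situation that prompted my comment about redundancy.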

Re “4. The decision tree consistency measure is less rigorous than in QCA” I am not sure that this matters in the case of the comparison at hand but it may matter when other comparisons are made. I say this because on the measures given on slide 13 the QCA model actually seems to perform better than the Decision Tree model. BUT again, a possibly confounding factor is the use of crisp versus fuzzy values behind the two measures. There is nevertheless a positive message here, which is to look carefully into how the consistency measures are calculated for any two models being compared. On a wider note, there is an extensive array of performance measures for Decision Tree (aka classification) models that can be summarised in a structure known as a Confusion Matrix. Here is a good summary of these: http://www.saedsayad.com/model_evaluation_c.htm
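As a rough illustration of what a Confusion Matrix summarises, here is a short Python sketch that derives four common performance measures from the four cells of the matrix. The counts are invented and do not come from the models discussed here. With crisp data, precision plays a similar role to consistency in QCA, and recall a similar role to coverage.

```python
def classification_measures(tp, fp, fn, tn):
    """Compute common measures from the four cells of a confusion matrix.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all cases predicted correctly
        "precision": tp / (tp + fp),     # share of predicted positives that are right
        "recall": tp / (tp + fn),        # share of actual positives that are found
        "specificity": tn / (tn + fp),   # share of actual negatives that are found
    }


# Hypothetical counts for a model applied to 20 cases.
measures = classification_measures(tp=8, fp=2, fn=4, tn=6)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

For these counts the sketch prints an accuracy of 0.70, precision of 0.80, recall of 0.67 and specificity of 0.75. A model can look good on one of these measures and poor on another, which is why it matters which measure is being compared.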

Moving on, I am pleased that Michaela and Wolfgang have taken this extra step: “Intrigued by the idea of 'triangulating' QCA results with decision tree analysis, we have converted our QCA dataset into a binary format (as Rick did, see point 1 above) and conducted a csQCA with that data”. Their results show that the QCA model does better in three of four comparisons (twice on consistency levels and once on number of configurations). However, we differ in how to measure the performance of the Decision Tree model. Their count of configurations seems to involve double counting (4+4 for both types of outcome), whereas I count 3 and 2, reflecting a total of the 5 that exist in the tree. On this basis I see the Decision Tree model doing better on parsimony for both types of outcome but the QCA model doing better on consistency for both types of outcome.

What would be really interesting to explore,  now that we have two more comparable models, is how much overlap there was in the contents of the configurations found by the two analyses, and the actual contents of those configurations i.e. the specific conditions involved. That is what will probably be of most interest to the donor (DFID) who funded the EVAW work. The findings could have operational consequences.

In addition to exploring the concrete differences between models based on the same data I think one other area that will be interesting to explore is how often the best levels of parsimony and accuracy can be found in one model versus one being available at the cost of the other in any given model. I suspect QCA may privilege consistency whereas Decision Tree algorithms might not do so. But this may simply reflect variations in analysis settings given for a particular analysis. This question has some wider relevance, since some parties might want to prioritise accuracy whereas others might want to prioritise parsimony. For example, a stock market investor could do well with a model that has 55% accuracy, whereas a surgeon might need 98%. Others might want to optimise both.

And a final word of thanks is appropriate, to Michaela and Wolfgang for making their data set publicly available for others to analyse. This is all too rare an event, but hopefully one that will become more common in the future, encouraged by donors and modeled by examples such as theirs.


Friday, March 28, 2014

The challenges of using QCA



This blog posting is a response to my reading of the Inception Report written by the team who are undertaking a review of evaluations of interventions relating to violence against women and girls. The process of the review is well documented in a dedicated blog – EVAW Review

The Inception Report is well worth reading, which is not something I say about many evaluation reports! One reason is to benefit from the amount of careful attention the authors have given to the nuts and bolts of the process. Another is to see the kind of intensive questioning the process has been subjected to by the external quality assurance agents and the considered responses by the evaluation team. I found that many of the questions that came to my mind while reading the main text of the report were dealt with when I read the annex containing the issues raised by SEQUAS and the team’s responses to them.

I will focus on one issue that is a challenge for both QCA and data mining methods like Decision Trees (which I have discussed elsewhere on this blog). That is the ratio of conditions to cases. In QCA conditions are attributes of the cases under examination that are provisionally considered as possible parts of causal configurations that explain at least some of the outcomes. After an exhaustive search and selection process the team has ended up with a set of 39 evaluations they will use as cases in a QCA analysis. After a close reading of these and other sources they have come up with a list of 20 conditions that might contribute to 5 different outcomes. With 20 different conditions there are 2^20 (i.e. 1,048,576) different possible configurations that could explain some or all of the outcomes. But there are only 39 evaluations, which at best will represent only 0.004% of the possible configurations. In QCA the remaining 1,048,537 are known as “logical remainders”. Some of these can usually be used in a QCA analysis through a process using explicit assumptions e.g. about particular configurations plus outcomes which by definition would be impossible to occur in real life. However, from what I understand of QCA practice, logical remainders would not usually exceed 50% of all possible configurations.
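The arithmetic behind these figures can be checked with a few lines of Python. The function name is my own; the only assumptions are that each condition is binary (present or absent) and that, at best, every case has a different configuration.

```python
def limited_diversity(n_conditions, n_cases):
    """Return (possible configurations, minimum logical remainders, maximum share observed).

    Assumes binary conditions, so there are 2 ** n_conditions configurations.
    """
    possible = 2 ** n_conditions
    # At best each case shows a different configuration.
    observed_at_most = min(n_cases, possible)
    remainders_at_least = possible - observed_at_most
    share_observed = observed_at_most / possible
    return possible, remainders_at_least, share_observed


possible, remainders, share = limited_diversity(n_conditions=20, n_cases=39)
print(possible)        # 1048576
print(remainders)      # 1048537
print(f"{share:.3%}")  # 0.004%
```

If several cases share the same configuration, the number of logical remainders will be higher and the share of configurations observed lower than these figures.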

The review team has dealt with this problem by summarising the 20 conditions and 5 outcomes into 5 conditions and one outcome. This means there are 2^5 (i.e. 32) possible causal configurations, which is more reasonable considering there are 39 cases available to analyse. However there is a price to be paid for this solution, which is the increased level of abstraction/generality in the terms used to describe the conditions. This makes the task of coding the known cases more challenging and it will make the task of interpreting the results and then generalising from them more challenging as well. You can see the two versions of their model in the diagram below, taken from their report.
 
What fascinated me was the role of evaluation method in this model (see “Convincing methodology”). It is only one of five conditions that could explain some or all of the outcomes. It is quite possible therefore that all or some of the case outcomes could be explained without the use of this condition. This is quite radical, considering the centrality of evaluation methodology in much of the literature on evaluations. It may also be worrying to DFID in that one of their expectations of this review was that it would “generate a robust understanding of the strengths, weaknesses and appropriateness of evaluation approaches and methods”. The other potential problem is that even if methodology is shown to be an important condition, its singular description does not provide any means of discriminating between forms which are more or less helpful.

The team seems to have responded to this problem by proposing additional QCA analyses, where there will be an additional condition that differentiates cases according to whether they used qualitative or quantitative methods.  However reviewers have still questioned whether this is sufficient. The team in return have commented that they will “add to the model further conditions that represent methodological choice after we have fully assessed the range of methodologies present in the set, to be able to differentiate between common methodological choices” It will be interesting to see how they go about doing this, while avoiding the problem of “insufficient diversity” of cases already mentioned above.

One possible way forward has been illustrated in a recent CIFOR Working Paper (Sehring et al, 2013), and is also covered in Schneider and Wagemann (2012). They have illustrated how it is possible to do a “two-step QCA”, which differentiates between remote and proximate conditions. In the VAWG review this could take the form of an analysis of conditions other than methodology first, then a second analysis focusing on a number of methodology conditions. This process essentially reduces a larger number of remote conditions down to a smaller number of configurations that do make a difference to outcomes, which are then included in the second level of the analysis, which uses the more proximate conditions. It has the effect of reducing the number of logical remainders. It will be interesting to see if this is the direction that the VAWG review team are heading.

PS 2014 03 30: I have found some further references to two-level QCA:
 And for people wanting a good introduction to QCA, see