Friday, March 16, 2012

Can we evolve explanations of observed outcomes?

In mathematics and computer science, an optimization problem is the problem of finding the best solution from all feasible solutions. There are various techniques for doing so.

Science as a whole can be seen as an optimisation process, involving a search for explanations that have the best fit with observed reality.

In evaluation we often have a similar task: identifying which aspects of one or more project interventions best explain the observed outcomes of interest, for example the effects of various kinds of improvements in health systems on rates of infant mortality. This can be done in two ways. One is to look internally at the design of a project and its expected workings, and then try to find evidence of whether it worked that way in practice. This is the territory of theory-led evaluation. The other is to look externally, at alternative explanations involving other influences, and seek to test those. This is ostensibly good practice but not very common in reality, because it can be time consuming and to some extent inconclusive: there may always be other explanations not yet identified and thus untested. This is where randomised control trials (RCTs) come in. Randomised allocation of subjects between control and intervention groups balances out, on average, the possible influence of other external causes. Qualitative Comparative Analysis (QCA) takes a slightly different approach, searching for the multiple possible configurations of conditions which are both necessary and sufficient to explain all observed outcomes (both positive and negative instances).

The value of theory-led approaches, including QCA, is that the evaluator's theories guide the search for relevant data amongst the myriad of possibly relevant design characteristics, and combinations thereof. The absence of a clear theory of change is one reason why baseline surveys are often so expansive in content, yet so rarely used. Without a halfway decent theory we can easily get lost. It is true that "there is nothing as practical as a good theory" (Kurt Lewin).

The alternative to theory-led approaches

There is, however, an alternative search process which does not require a prior theory: the evolutionary algorithm, the kernel of the process of evolution. The evolutionary processes of variation, selection and retention, iterated many times over, have been able to solve many complex optimisation problems, such as the design of a bird that can both fly long distances and dive deep in the sea for fish to eat. Genetic algorithms (GAs) embody the same kind of process in software, in order to solve problems of interest to scientists and businesses. They are useful in two respects. One is their ability to search very large combinatorial spaces very quickly. The other is that they can come up with solutions involving particular combinations of attributes that might not have been obvious to a human observer.
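To make the variation-selection-retention loop concrete, here is a minimal sketch of a genetic algorithm in Python. The toy fitness function (maximising the number of ones in a bit string) and all the parameter values are illustrative assumptions, not anything from a real evaluation problem:

```python
import random

def fitness(candidate):
    # Toy objective: how close the bit string is to all-ones.
    return sum(candidate)

def mutate(candidate, rate=0.05):
    # Variation: flip each bit with a small probability.
    return [1 - bit if random.random() < rate else bit for bit in candidate]

def evolve(pop_size=50, length=20, generations=200):
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: rank by fitness and retain the better half.
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]
        # Retention + variation: survivors carry over unchanged,
        # mutated copies of them fill the rest of the next generation.
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=fitness)

print(evolve())  # typically converges on (or near) the all-ones string
```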

Development projects have attributes that vary. These include both the contexts in which they operate and the mechanisms by which they seek to work. There are many possible combinations of these attributes, but only some are likely to be associated with achieving a positive impact on people's lives. If such combinations were relatively common then implementing development aid projects would not be so difficult. The challenge is how to find the right combination of attributes. Trial and error, by varying project designs and their implementation on the ground, is a good idea in principle, but in practice it is slow. There is also a huge amount of systemic memory loss, for various reasons, including poor or non-existent communication between the various iterations of a project design taking place in different locations.

Can we instead develop models of projects, which combine real data about the distribution of project attributes with variable views of their relative importance, in order to generate an aggregate predicted result? This expected result can then be compared to an observed result (ideally from independent sources). By varying the influence of the different attributes a range of predicted results can be generated, some of which may be more accurate than others. The best way to search this large space of possibilities is by using a GA. Fortunately Excel's Solver add-in now includes an evolutionary solving method based on genetic algorithms.

The following spreadsheet shows a very basic example of what such a model could look like, using a totally fictitious data set. The projects and their observed scores on four attributes (A-D) are shown on the left. Below them is a set of weights, reflecting the possible importance of each attribute for the aggregate performance of the projects. The Expected Outcome score for each project is the sum of its score on each attribute multiplied by the weight for that attribute. In other words, the more a project has of an important attribute (or combination of attributes), the higher its Expected Outcome score will be. That score is important only as a relative measure, relative to that of the other projects in the model.
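To illustrate with invented numbers: a project scoring 3, 1, 4 and 2 on attributes A to D, with a weight of 25 on each attribute, would get an Expected Outcome score of (3 x 25) + (1 x 25) + (4 x 25) + (2 x 25) = 250.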

The Expected Outcome score for each project is then compared to an Observed Outcome measure (ideally converted to a comparable scale), and the difference is shown as the Prediction Error. At the bottom left is an aggregate measure of prediction error, the Standard Deviation of those errors. The original data can be found in this Excel file.
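For readers who prefer code to spreadsheets, here is a minimal sketch of the same model in Python. The attribute scores and observed outcomes below are invented stand-ins (not the figures in the Excel file), and SciPy's differential_evolution is used as a stand-in for Solver's evolutionary engine: it varies the four weights so as to minimise the standard deviation of the prediction errors:

```python
import numpy as np
from scipy.optimize import differential_evolution

# Rows = projects, columns = scores on attributes A-D (invented numbers).
scores = np.array([
    [3, 1, 4, 2],
    [2, 5, 1, 3],
    [5, 2, 3, 1],
    [1, 4, 2, 5],
    [4, 3, 5, 2],
])
# Observed Outcomes, converted to a scale comparable with the
# weighted attribute scores (also invented).
observed = np.array([2.8, 3.1, 2.5, 3.9, 3.4])

def prediction_error_sd(weights):
    """SD of Prediction Errors: the quantity the evolutionary search minimises."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalise so the weights sum to 1
    expected = scores @ w                # Expected Outcome score per project
    return np.std(expected - observed)   # Prediction Errors, aggregated as an SD

# Let each of the four weights evolve between (near) 0 and 100.
result = differential_evolution(prediction_error_sd,
                                bounds=[(0.01, 100)] * 4, seed=0)
print(result.x / result.x.sum() * 100)  # evolved weights, rescaled to sum to 100
print(result.fun)                       # the residual SD of prediction errors
```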


The initial weights were set at 25 for each attribute, in effect reflecting the absence of any view about which might be more important. With those weights, the SD of the Prediction Errors was 1.25. After 60,000+ iterations in the space of one minute the SD had been reduced to 0.97. This was achieved with this new combination of weights: Attribute A: 19, Attribute B: 0, Attribute C: 19, Attribute D: 61. The substantial error that remains can be considered as due to causal factors outside the model, i.e. factors not captured by its list of attributes[1].

It seems that it is also possible to find the least appropriate solutions, i.e. those which make the least accurate Outcome Predictions. Using the GA set to find the maximum error instead, it was found that in the above example a 100% weighting given to Attribute A generated an SD of 1.87. This is the nearest that such an evolutionary approach comes to disproving a theory.
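Continuing the code sketch above, the same search can be pointed at the least accurate solution simply by maximising the error, i.e. minimising its negative:

```python
# Evolve the worst-fitting weights by minimising the negated error measure.
worst = differential_evolution(lambda w: -prediction_error_sd(w),
                               bounds=[(0.01, 100)] * 4, seed=0)
print(worst.x / worst.x.sum() * 100)  # least appropriate weights, summing to 100
print(-worst.fun)                     # the (maximised) SD of prediction errors
```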

GAs deliver functional rather than logical proofs that certain explanations are better than others. Unlike logical proofs, they are not immortal. With more projects included in the model it is possible that a fitter solution may be found which applies to the wider set. However, the original solution to the smaller set would still stand.

Models of complex processes can sometimes be sensitive to starting conditions: different results can be generated from initial settings that are very similar. This was not the case in this exercise, with widely different initial weightings converging on almost identical sets of final weightings (e.g. 19, 0, 19, 62 versus 19, 0, 19, 61) and producing the same final error rate. This robustness is probably due to the absence of feedback loops in the model, which could be created where the weighted score of one attribute affected those of another. That would be a much more complex model, possibly worth exploring at another time.

Small changes in attribute scores made a more noticeable difference to the Prediction Error. In the above model, varying Project 8's score on attribute A from 3 to 4 increased the average error by 0.02. Changes in other cells varied in the direction of their effects. In more realistic models, with more kinds of attributes and more project cases, the results are likely to be less sensitive to such small differences in attribute scores.
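This kind of sensitivity check is easy to reproduce in the code sketch above: perturb one attribute score by a point and re-measure the error at the evolved weights (the project and attribute chosen here are arbitrary):

```python
# Perturb one score by one point and measure the change in prediction error.
baseline = prediction_error_sd(result.x)
scores[4, 0] += 1                                # bump project 5's score on attribute A
print(prediction_error_sd(result.x) - baseline)  # change in the SD of prediction errors
scores[4, 0] -= 1                                # restore the original data
```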

The heading of this post asks “Can we evolve explanations of observed outcomes?” My argument above suggests that in principle it should be possible. However there is a caveat. A set of weighted attributes that are associated with success might better be described as the ingredients of an explanation. Further investigative work would be needed to find out how those attributes actually interact together in real life.  Before then, it would be interesting to do some testing of this use of GAs on real project datasets.

Your comments please...

PS 6 April 2012: I have just come across the Kaggle website. This site hosts competitions to solve various kinds of prediction problems (re both past and future events) using a data set available to all entrants, and gives prizes to the winner, who must provide not only their prediction but also the algorithm that generated it. Have a look. Perhaps we should outsource the prediction and testing of results of development projects via this website? :-) Though even to do this the project managers would still have a major task on hand: to gather and provide reliable data about implementation characteristics, as well as measures of observed outcomes. Though this might be easier with some projects that generate lots of data, say micro-finance or education system projects.

View this Australian TV video, which explains how the site works and some of its achievements so far, and the Fast Company interview with the CEO.

PS 9 April 2012: I have just discovered that there is a whole literature on the use of genetic algorithms for rule discovery: "In a nutshell, the motivation for applying evolutionary algorithms to data mining is that evolutionary algorithms are robust search methods which perform a global search in the space of candidate solutions (rules or another form of knowledge representation)" (Freitas, 2002). The rules referred to are typically "IF...THEN..." type statements.
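To give a flavour of what such rules look like as evolvable objects, here is a small sketch in Python. The attribute name, threshold and outcome label are all invented for illustration; a GA would vary these fields and select the rules whose predictions best match the data:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    attribute: str    # e.g. "Attribute D" (invented name)
    threshold: float  # condition: score >= threshold
    outcome: str      # predicted class, e.g. "high impact"

    def matches(self, project: dict) -> bool:
        # The IF part: does the project satisfy the condition?
        return project.get(self.attribute, 0) >= self.threshold

rule = Rule("Attribute D", 4, "high impact")
project = {"Attribute A": 2, "Attribute D": 5}
if rule.matches(project):   # IF score on Attribute D >= 4 ...
    print(rule.outcome)     # ... THEN predict "high impact"
```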

[1] Bear in mind that this example set of attribute scores and observed outcome measures is totally fictitious, so the inability to find a really good set of fitting attributes should not be surprising. In reality some sets of attributes will not be found co-existing, because of their incompatibility, e.g. corrupt project management plus highly committed staff.
