In mathematics and computer science, an optimization problem
is the problem of finding the best solution from all feasible solutions. There
are various techniques for doing so.
Science as a whole can be seen as an optimisation process,
involving a search for explanations that have the best fit with observed
reality.
In evaluation we often have a similar task, of identifying
what aspects of one or more project interventions best explain the observed
outcomes of interest: for example, the effects of various kinds of improvements in health
systems on rates of infant mortality. This can be done in two ways. One is to look internally at
the design of a project, at its expected workings, and then try to find
evidence of whether it worked that way in practice. This is the territory of
theory-led evaluation. The other way is to look
externally, at alternative explanations involving other influences, and to seek
to test those. This is ostensibly good practice but not very common in reality,
because it can be time consuming and to some extent inconclusive, in that there
may always be other explanations not yet identified and thus untested. This is
where randomised controlled trials (RCTs) come in. Randomised allocation of
subjects between control and intervention groups is designed to balance out, across the two groups, the possible influence of other external causes. Qualitative Comparative Analysis (QCA) takes
a slightly different approach, searching for the multiple possible configurations of conditions which are both
necessary and sufficient to explain all observed outcomes (both positive and
negative instances).
The value of theory-led approaches, including QCA, is that the
evaluator’s theories help the search for relevant data, amongst the myriad of
possibly relevant design characteristics, and combinations thereof. The absence of a clear theory of change is often
one reason why baseline surveys are so expansive in content, yet so rarely
used. Without a halfway-decent theory we can easily get lost. As Kurt Lewin put it, "There is nothing as practical as a good theory."
The alternative to theory-led approaches
There is, however, an alternative search process which does
not require a prior theory: the evolutionary algorithm, the kernel of the process of evolution. The evolutionary processes of variation, selection and
retention, iterated many times over, have been able to solve many complex
optimisation problems such as the design of a bird that can both fly long distances and dive
deep in the sea for fish to eat. Genetic algorithms
(GA) are embodiments of the same kinds of process in software programs, in
order to solve problems of interest to scientists and businesses. These are
useful in two respects. One is their ability to search very large combinatorial
spaces very quickly. The other is that they can come up with solutions involving
particular combinations of attributes that might not have been so obvious to a
human observer.
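To make the variation-selection-retention cycle concrete, here is a minimal Python sketch of a genetic algorithm. Everything in it is a made-up toy example (not anything from the discussion above): the "genome" is a string of bits, and the objective is simply to evolve a string of all ones.

```python
import random

random.seed(0)  # reproducible toy run

GENOME_LEN = 20
POP_SIZE = 30
GENERATIONS = 60

def fitness(genome):
    # Toy objective ("OneMax"): count the 1s; the optimum is all ones.
    return sum(genome)

def mutate(genome, rate=0.05):
    # Variation: flip each bit with a small probability.
    return [bit ^ 1 if random.random() < rate else bit for bit in genome]

def crossover(a, b):
    # Variation: splice two parents at a random point.
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

# Start from a random population of candidate solutions.
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    # Selection: keep the fitter half of the population.
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]
    # Retention plus variation: survivors stay, and mutated offspring
    # bred from random parent pairs fill the remaining slots.
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best = max(population, key=fitness)
```

Because the fitter half always survives, the best fitness never decreases from one generation to the next; after a few dozen generations the population converges on, or very near, the all-ones optimum.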
Development projects have attributes that vary. These
include both the context in which they operate and the mechanisms by which they
seek to work. There are many possible combinations of these attributes, but
only some of these are likely to be associated with achieving a positive impact
on people’s lives. If such combinations were relatively common then implementing development
aid projects would not be so difficult. The challenge is how to find the right
combination of attributes. Trial and error, by varying project designs and their implementation on the ground, is a good idea in
principle, but in practice it is slow. There is also a huge amount of systemic
memory loss, for various reasons including poor or non-existent communications
between various iterations of a project design taking place in different
locations.
Can we instead develop models
of projects, which combine real data about the distribution of project attributes with variable
views of their relative importance in order to generate an aggregate predicted result?
This expected result can then be compared to an observed result (ideally from
independent sources). By varying the influence
of the different attributes a range of predicted results can be generated, some
of which may be more accurate than others. The best way to search this large
space of possibilities is by using a GA. Fortunately, Excel's Solver add-in now includes a
simple evolutionary solving method of this kind.
The following spreadsheet shows a very basic example of what
such a model could look like, using a totally fictitious data set. The projects
and their observed scores on four attributes (A-D) are shown on the left. Below them is
a set of weights, reflecting the possible importance of each attribute for the
aggregate performance of the projects. The Expected Outcome score for each
project is the sum of each attribute score multiplied by the weight for that
attribute. In other words, the more a project has of an important attribute (or combination of attributes), the higher its Expected Outcome score will be. That score matters only as a relative measure, relative to those of the other projects in the model.
The Expected Outcome score for each project is then compared to an Observed Outcome measure (ideally converted to a comparable scale), and the difference is shown as the Prediction Error. On the bottom left is an aggregate measure of prediction error, the standard deviation (SD). The original data can be found in this Excel file.
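To show the mechanics outside a spreadsheet, here is a minimal Python sketch of the same kind of model. All of the numbers below are invented for illustration (this is not the dataset in the Excel file), and the simple mutate-and-keep-if-better loop is only a stand-in for Solver's evolutionary engine.

```python
import random
import statistics

# Hypothetical data: eight projects scored 0-5 on four attributes (A-D),
# plus an independently observed outcome for each, on a comparable scale.
scores = [
    [3, 1, 4, 2],
    [1, 2, 2, 5],
    [4, 4, 1, 3],
    [2, 5, 3, 1],
    [5, 2, 2, 4],
    [3, 3, 5, 2],
    [1, 4, 3, 3],
    [4, 1, 1, 5],
]
observed = [2.4, 3.9, 2.8, 1.9, 3.6, 2.7, 2.9, 3.8]

def prediction_error_sd(weights):
    # Expected Outcome = sum of (attribute score x weight), rescaled so that
    # weights summing to 100 keep it on the same 0-5 scale as the observations.
    errors = [sum(s * w for s, w in zip(row, weights)) / 100 - obs
              for row, obs in zip(scores, observed)]
    return statistics.stdev(errors)

# Start with equal weights (25 each): no prior view about which attribute matters.
baseline = prediction_error_sd([25, 25, 25, 25])

random.seed(1)
best = [25.0, 25.0, 25.0, 25.0]
best_sd = baseline
for _ in range(20000):
    # Variation: nudge one weight, then renormalise so the weights sum to 100.
    candidate = best[:]
    i = random.randrange(4)
    candidate[i] = max(0.0, candidate[i] + random.randint(-5, 5))
    total = sum(candidate) or 1.0
    candidate = [100 * w / total for w in candidate]
    # Selection and retention: keep the candidate only if it predicts better.
    sd = prediction_error_sd(candidate)
    if sd < best_sd:
        best, best_sd = candidate, sd
```

The evolved `best` weights are then the model's answer to which attributes best "explain" the observed outcomes, with `best_sd` as the residual, unexplained error.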
The initial weights were set at 25 for each attribute, in
effect reflecting the absence of any view about which might be more important.
With those weights, the SD of the Prediction Errors was 1.25. After 60,000+
iterations in the space of a minute the SD had been reduced
to 0.97. This was achieved with this new combination of weights: Attribute A: 19, Attribute B: 0, Attribute C: 19, Attribute D: 61. The
substantial error that remains can be considered as due to causal factors
outside of the model (i.e. outside what is described by the list of attributes)[1].
It seems that it is also possible to find the least appropriate
solutions, i.e. those which make the least accurate Outcome Predictions. Using the GA set to find the maximum error, it was found that in the
above example a 100% weighting given to Attribute A generated a SD of 1.87.
This is the nearest that such an evolutionary approach comes to disproving a theory.
GAs deliver functional rather than logical proofs that
certain explanations are better than others. Unlike logical proofs, they are
not immortal. With more projects included in the model it is possible that there may be a fitter
solution, which applies to this wider set. However, the original solution to
the smaller set would still stand.
Models of complex processes can sometimes be sensitive to starting conditions: different results can be generated from initial settings that are very similar. This was not the case in this exercise, with widely different initial weightings evolving and converging on almost identical sets of final weightings (e.g. 19, 0, 19, 62 versus 19, 0, 19, 61), producing the same final error rate. This robustness is probably due to the absence of feedback loops in the model, which could be created where the weighted score of one attribute affected those of another. That would be a much more complex model, possibly worth exploring at another time.
Small changes in Attribute scores made a more noticeable difference to the Prediction Error. In the above model,
varying Project 8’s score on Attribute A from 3 to 4 increased the average
error by 0.02. Changes in other cells varied in the direction of their effects. In more realistic models, with more kinds of attributes and more project cases, the results are likely to be less sensitive to such small differences in
attribute scores.
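One way to probe this kind of sensitivity is to perturb a single cell and recompute the aggregate error. The mini-model below is entirely hypothetical (two attributes, four projects, invented numbers and weights), purely to show the mechanics:

```python
import statistics

# Hypothetical mini-model: four projects scored on two attributes,
# with fixed weights of 60 and 40 (summing to 100).
scores = [[3, 1], [1, 4], [4, 2], [2, 3]]
observed = [2.1, 2.3, 3.0, 2.6]
weights = [60, 40]

def error_sd(score_table):
    # SD of (Expected Outcome - Observed Outcome) across projects.
    errors = [sum(s * w for s, w in zip(row, weights)) / 100 - obs
              for row, obs in zip(score_table, observed)]
    return statistics.stdev(errors)

before = error_sd(scores)

# Perturb one cell: raise the first project's score on the first attribute by 1.
perturbed = [row[:] for row in scores]
perturbed[0][0] += 1
after = error_sd(perturbed)

delta = after - before  # how much one cell's change moves the aggregate error
```

Repeating this for every cell gives a rough sensitivity map of the model, showing which attribute scores the prediction error depends on most.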
The heading of this post asks “Can we evolve explanations of
observed outcomes?” My argument above suggests that in principle it should be
possible. However, there is a caveat. A set of weighted attributes that are
associated with success might better be described as the ingredients of an explanation. Further investigative work would be
needed to find out how those attributes actually interact together in real life. Before then, it would be interesting to do some testing
of this use of GAs on real project datasets.
Your comments please...
PS 6 April 2012: I have just come across the Kaggle website. This site hosts competitions to solve various kinds of prediction problems (re both past and future events) using a data set available to all entrants, and gives prizes to the winner - who must provide not only their prediction but also the algorithm that generated it. Have a look. Perhaps we should outsource the prediction and testing of results of development projects via this website? :-) Though even to do this, the project managers would still have a major task on hand: to gather and provide reliable data about implementation characteristics, as well as measures of observed outcomes. This might be easier with projects that generate lots of data, say micro-finance or education system projects.
View this Australian TV video, explaining how the site works and some of its achievements so far. And the Fast Company interview of the CEO
PS 9 April 2012: I have just discovered that there is a whole literature on the use of genetic algorithms for rule discovery: "In a nutshell, the motivation for applying evolutionary algorithms to data mining is that evolutionary algorithms are robust search methods which perform a global search in the space of candidate solutions (rules or another form of knowledge representation)" (Freitas, 2002). The rules referred to are typically "IF...THEN..." type statements.
[1] Bear in mind that this example set of attribute scores
and observed outcome measures is totally fictitious, so the inability to find a
really good set of fitting attributes should not be surprising. In reality, some sets
of attributes will not be found co-existing, because of their incompatibility,
e.g. corrupt project management plus highly committed staff.