Friday, June 26, 2015

Evolving better performing hypotheses, using Excel



You may not know this, but there is an Excel add-in called Solver. This can be used to evolve better solutions to computation problems.

It can also be used to identify hypotheses that have a better fit with available data. Here is how I have been using Solver in Excel....

I start by entering a data set into Excel that is made up of cases (e.g.25 projects), listed row by row. The columns then describe attributes of those projects, e.g. as captured in project completion reports. These attributes can include aspects of the project context, the way the project was working plus some outcome performance measures.

A filtering mechanism is then set up in Excel, where you can chose a specific sub-set of the attributes that are of interest (aka our hypothesis), and then all the projects that have these attributes are automatically highlighted.

The performance of this sub-set of projects is then tested, using a simple device called a Confusion Matrix , which tells us what proportion of the selected projects have "successful" versus "unsuccessful" outcomes (if that is the outcome of interest). Ideally, if we have selected the right set of attributes then the proportion of "successful" projects in the selected sub-set will be greater than their prevalence rate in the whole set of projects.

But given that the number of possible hypotheses available to test doubles with each extra attribute added into the data set, this sort of intuitive/theory led search could take us a long time to find the best available solution. Especially when the data set has a large number of attributes.

This is where Solver helps. Using its evolutionary algorithm the Solver add-in provides a quick means of searching a very large space of possible solutions. To do this there are three parameter which need to be set, before setting it to work. The first is the Objective, which is the value you want to maximise. I usually choose the "Accuracy" measure in the Confusion Matrix. The second is the range of cells whose values can be varied. These are the ones identifying the set of project attributes, which can be used to make a hypothesis. They can be set to (1), absent (0) or not relevant (2). The third is is the Constraints, which limit the values that these variable cells can take e.g. not negative, and nothing other than these three types of values.

In the space of a minute Solver then explores up to 70,000 possible combinations of project attributes to find the combination that generate the most accurate prediction i.e. a set of projects with the highest possible proportion of "successful" projects. In my recent trials, using a real data set, the accuracy levels have been up around 90%. I have been able to compare the results with those found using another algorithm which I have written about in earlier posts here, called a  Decision Tree algorithm. Despite being quite different algorithms, the solutions that both algorithms have found (i.e. the specific set of project attributes) have been very similar in content (i.e. the attributes in the solution), and both had the same level of accuracy.

An important side benefit of this approach to finding out "what works" is that by manually tweaking the solutions found by Solver you can measure the extent to which each attribute in the winning solution makes a difference to its overall accuracy. This is the kind of question many donors want to answer re the projects they fund, when they ask "What difference does my particular contribution make?"

If you want help setting up Excel to try out this method, and have a data set ready to work on, feel free to contact me for advice: rick.davies@gmail.com, or leave a comment below

2 comments:

  1. Hi Rick. I'm very interested in the application of solver in Excel, as you have described above. I'm wanting to have a tentative play but it would really help to have a look at the a sample set to see exactly how you've set up the parameters and what the output looks like in Excel. Would you be prepared to make your sample set available for me to have a look at. I may still need your help but would like to make a preliminary attempt myself. Thanks for this and many useful articles besides. Best regards Geoff Stone (geoff@ripple.gen.nz)

    ReplyDelete
    Replies
    1. Hi Geoff

      My apologies for not replying sooner. Please now go to https://evalc3.net/ and then get back to me as needed

      regards, rick davies

      Delete