Prediction markets as a source of independent and continuous evaluation for development projects?

Imagine that at the beginning of a development project, even during the planning stage, the designers identified a number of observable events, which were expected to be achieved at different points in time throughout the lifetime of the project. Some might be very immediate, such as spending 90% of the annual budget by the end of each year, while others might be much longer term in nature, such as primary school completion rates in district x exceeding 90% by 2010. Well, something like this happens already in most development projects, you might say. These types of events are described in the planning documents and associated work plans, and later on in progress reports.

But these statements of intentions are often not very publicly accessible, and they rarely have much consequence. Cutting off funding if specific performance targets are not reached rarely happens (in my experience), and probably for good reason: it is a very crude “sledgehammer” type of response. And often the donors themselves are not independent judges. They need their projects to be seen to be succeeding, and cutting off funding is effectively a criticism of their own previous judgements, as well as of the recent performance of their grantee.

There is an alternative which might be able to provide more independent and continuous assessments of project progress. These are called "prediction markets" (also sometimes called information markets, decision markets, idea futures, and virtual markets).

Prediction markets allow a group of people to express an opinion over a period of time about the probability of an event occurring. A question is posed and people buy and sell shares in stocks representing possible answers to that question. The highest priced stock at the end of a period of time is the group's prediction. (A definition provided by

Prediction markets have been championed in James Surowiecki’s 2004 book, “The Wisdom of Crowds”, and widely discussed and used since then (see list of links below). They have been used to successfully predict political events (election outcomes), sports events (winners), market success of commercial products, and many other types of events. Recent well known users have included Google, Yahoo, Microsoft, and HP. The most important claim that has been made is that prediction markets can generate more accurate predictions of events than individual experts or highly structured planning / design processes involving multiple specialists.

It is possible that prediction markets could also be usefully applied to development projects. Two types of benefits might arise.

  • Firstly, the existence of the markets might generate incentives for a wider variety and larger number of people to become engaged in a discussion about aid and development. Even prediction markets that do not involve real money bets do manage to attract large numbers of participants, who get rewarded by social recognition and self-esteem, if they win.
  • Secondly, those responsible for aid budgets might get more accurate information about the expected performance of their portfolio of projects than they do from those who are directly responsible for the implementation of these projects.

The challenge would be how to create incentives for project managers to publicly disclose information about their project and its performance. For example, via project websites. Rewards could be given to project managers where:

  • the number of participants in the prediction market was large, relative to previous comparable markets
  • the most favored bet was successful (predicted the observed outcome)

But this then raises the question of who would provide the rewards. Donors to the project might have the same reservations as project managers about disclosure of information, and the same reluctance to see negative predictions proved correct. The alternative would be to find independent third parties, possibly specialist NGOs, who might have an interest in promoting greater transparency by aid agencies.

Project prediction markets could have different uses at different stages. During the implementation of the project the prediction market would provide real-time feedback on expectations about whether a project was likely to succeed, which might encourage corrective behaviours by project managers. At the end of the project, when success/failure has been defined and winning bets paid off, it would be useful to compare the project manager’s own bets against the market as a whole, and to analyse any discrepancies between them and any lessons to be learnt from these.
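
One way the end-of-project comparison might be done is sketched below. It is only an illustration under assumptions of my own: that final market prices and the manager's disclosed bets can both be read as probabilities between 0 and 1, and that a Brier score (a standard forecasting measure, not something specified in the proposal above) is an acceptable yardstick. All figures are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecaster, and a constant 50%
    forecast scores 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty lists")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data for four project targets.
outcomes = [1, 0, 1, 1]               # 1 = target achieved, 0 = not achieved
market_prices = [0.7, 0.4, 0.6, 0.8]  # final market prices, read as probabilities
manager_bets = [0.9, 0.8, 0.9, 0.9]   # the project manager's disclosed positions

market_score = brier_score(market_prices, outcomes)
manager_score = brier_score(manager_bets, outcomes)

# Per-target gap between manager and market, to pick out cases worth reviewing.
discrepancies = [round(m - p, 2) for m, p in zip(manager_bets, market_prices)]

print(f"market: {market_score:.4f}  manager: {manager_score:.4f}")
print("manager minus market, per target:", discrepancies)
```

In this made-up case the manager was more optimistic than the market on every target, and the largest gap is on the one target that was missed. That is the kind of discrepancy that would merit a closer look.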

Prediction markets can be open to the public, or internal (as used in Google, Yahoo, Microsoft, and HP). The proposal outlined above is for the use of public prediction markets in development project outcomes, but with project “insiders” allowed and encouraged to participate, on condition that their bets are disclosed. This is similar to the way directors of companies can buy and sell shares in their own company, but are normally required to disclose these dealings.

Incidentally, the operation of prediction markets might also generate a modest income for development purposes, by using open source, proprietary or web-hosted software to host the market (see the Wikipedia listing).

An experiment

Please take part in this very experimental prediction market, where the prediction concerns the achievement of one of the Millennium Development Goals (MDG). Go to
You can leave your comments and questions in the Discussion section. Note that you have $5,000 token dollars available to spend. These are called "inkles". Bear in mind that this particular MDG prediction market is very much in the beta stage, where I expect there will be quite a few problems that will need to be sorted out.


Assumptions, evidence and multiple stakeholders

Over the last few months I have been on the sidelines of a review of an NGO funding mechanism. The review report has been drafted, then re-written. But as yet, as far as I can see, three major issues have not been addressed. These are likely to be relevant to many other multi-donor NGO funding mechanisms.

Issue No. 1: The treatment of key assumptions

The first issue is core funding of NGOs (national and local). At the centre of the original project design was the belief that provision of core funding will make an important difference to how NGOs work. The review team recognised this idea. But they did not then question or explore in any detail how the provision of core funding will lead to better development outcomes. Yet this was undoubtedly the potential killer assumption in the centre of the project design. In fact there are two linked assumptions here, that both needed examination, even if only at a desk level.

The first assumption is that core funding will increase the freedom and autonomy of NGOs. This assumption could have been explored by looking at the different NGOs that had been funded by the project, and then making some comparisons.

Firstly, by comparing NGOs where project-provided core funding was a big versus small proportion of the NGO's overall budget. In the review there was no table showing such figures, though they were readily available, and though there were significant differences between NGOs in this respect. Is there any evidence of autonomy being greater where core funding was a bigger proportion of an NGO's income? Or are other factors more important in determining autonomy? An even bet, I suspect.

Secondly, by comparing the extent of the constraints imposed on NGOs by the core funding mechanism, versus the constraints the NGOs experienced when using other sources of funding. Did the review team ask NGOs to compare the project (as core funder) to their other donors in terms of the constraints they imposed, and what did they find out about the differences? Complaints about funding procedures need a comparator.

The second assumption is that increased freedom and autonomy of NGOs will lead to better development outcomes.

Here it would be useful to compare the core funded NGOs’ performance against that of other NGOs who are more constrained by their donors (e.g. as a result of their project specific funding), but working on the same type of development outcomes. For example, where both NGOs work on education sector issues. At the very least it would have been possible to identify some of these cases through interviews with NGOs, and maybe even interview some of them, to at least get to the stage of developing some indicative hypotheses.

The reason for making such a comparison is that there are some good counter arguments in favour of constraint as a necessary component of creativity, versus privileging freedom and autonomy. Biological evolution is the most creative process we know of, and that process works through the imposition of a very severe constraint: the need to be able to adapt to the current environment, or die. Architecture is another field where it is recognised that the presence of constraints can drive creative solutions. There has also been extensive research on the role of constraints in the fields of art and literature.

Issue No. 2: The use of evidence

The second issue was about the use of evidence. Although there had been an annual review earlier in the year, important lessons have not yet been learned from the experience. In that review there was extensive and selective use of unattributed comments by NGOs, with no information presented on how representative each of these views was. Understandably that caused major problems for the acceptance of that review. The first and second drafts of the mid-term review seemed to continue that questionable tradition, albeit with a little more balance.

The alternative approach, which had been proposed before the most recent review, was that by default, all comments made by NGOs should be from identifiable sources. Exceptions could then be made where there were explainable reasons why identities had to be withheld. The assumptions behind this proposal were that:

* NGOs are mature organisations led by mature people who have a working relationship with the funder, which can withstand open expression of criticisms. Not the reverse.

* NGOs need to be confident and assertive, if they are to be effective advocates. If they cannot openly express their critical views to their own donor, how can they ever be effective advocates of critical views to less sympathetic audiences?

In contrast, the review made a brief and sympathetic reference to the earlier review's use of evidence from NGOs, and then focused on the issue of whether results of the review interviews should be quantified or not. This was not the primary issue, and does not even need to be seen as an either/or choice. The primary issue with both review methodologies was how transparent and trustworthy the process of data collection and analysis is. The continued selective use of unattributed comments weakened the value of both reviews.

Given there are only a small number of NGOs that had been funded it would have been quite easy to tabulate, using text not numbers, all the answers given to a number of key questions, using one table per question, and to use these in the respective relevant sections of the report.

Issue No. 3: Multi-stakeholder involvement

The third issue was multi-stakeholder involvement. The problematic nature of multi-stakeholder involvement in strategy design and evaluation was barely recognised. The project design required a common goal and convergent activities working towards that goal. Yet it involved working with NGOs who are autonomous, diverse and sometimes conflicting in their views of the world. The project's ability to find a common goal, or to mobilise people around a common goal, was in practice very limited. Similar challenges face the whole issue of appropriate NGO representation on the project’s governing body.

Nevertheless, in this context the review proposed “Widening the dialogue on problem definition and strategy development, bringing together NGOs, government, donors and others, and using competitive funding to NGO consortia to channel demand and support the identified priorities”. And at the same time “Limit the role of the [management team] to administering grants and allied activities, avoiding other activities of a more interventionist type that might undermine the central aim”.

The review proposed that the project’s strategy be defined via the proposed steering committee, and supported via “strategic issues” meetings involving a wider group of stakeholders. Although proposals were made regarding the inclusion of different categories of people, based largely on expertise categories, how they are chosen, and subsequently re-appointed and replaced, was the more challenging question, and it was left unanswered.

There is a counter argument that an independent review team could have given some attention to. That is, the project’s strategy should be developed by a limited group of identifiable stakeholders with visible interests, and NGOs using project funding should also be encouraged to seek funding from alternative sources. Overall, the project (or its donors) should be encouraging the development of a plurality of funding sources, representing a diversity of strategies, rather than trying to merge many conflicting interests within the strategy of one funding mechanism, in a non-transparent fudge that satisfies no one.

Evidence that the (development) world is getting better

...a new approach to monitoring and evaluation ;-)

I have recently been reviewing the language we use in the world of development aid, and have come to the conclusion that there is an accumulating body of evidence that the world is getting better.

Here are some examples of changes I have noticed. If you have noticed other similar changes, please post them as Comments below.

  • In the past we only had projects, now we have initiatives (updated 20 May 2010) 
    • In the past we only had projects, but now we have interventions (June 2011, see DFID Business Plan How to Do guidance)
  • In the past we had plans, but now we have strategies 
  • In the past we did research, but now we do analytic work 
  • In the past we interpreted data, but now we are involved in sensemaking  (updated 29 May 2010) 
  • In the past we just had stories, but now we have narratives  (updated 29 May 2010) 
  • In the past we just did monitoring and evaluation, but now we do management for development results (MDR)
  • In the past we were concerned about coordination, but now we are concerned about harmonisation 
  • In the past we only wanted things to work but now we expect them to be fit-for-purpose 
  • In the past we only had interests, but now we have passions 
  • In the past we had problems, but now we only have issues 
  • In the past we only had news, but now we have breaking news 
  • In the past we were just donors, but now we are development partners 
  • In the past we were just NGOs, but now we are Civil Society Organisations 
  • In the past we had some concepts that were not very practically useful but now we have "sensitising concepts" 
  • In the past we took a particular perspective... now we have an analytical lens (13 April 2011) 
  • In the past we used to stimulate discussion, but now we open a space for a dialogue (or versions thereof) (14 April 2011) 
  • In the past we used to ask a question or make a point in a conference, but now we make an intervention (or is this now also passe?) (14 April 2011)
  • In the past we used to search the internet, but now we do horizon scanning work (July 17, 2012)

Integrating funding applications and baseline surveys

This idea falls into the category of "things I should have learned years ago!"

Over the last year or so I have been working on a number of research funding mechanisms in Ghana and Vietnam. Both involve something like the traditional two stage process of inviting simple / short Concept Notes for research, then from amongst the best of these, inviting fully developed Proposals for research. Quite a lot of information is provided by the grantee-to-be through this process, as well as by those who don't end up qualifying as grantees.

But up to now it has never occurred to me that we should design this process to simultaneously gather information about the baseline status of these organisations and their activities, for subsequent monitoring and evaluation purposes. Especially information about their relationships with other actors at this early stage, which is of increasing interest to me. Instead, in one instance, we have organised a separate baseline survey some months later, involving the approved grantees only. Needless to say, this did not impress the new grantees, who had thought they had finished with form filling for the time being!

Another advantage of this approach is that by including the non-successful applicants, we gather some wider contextual data, that will put the characteristics of the approved grantees in a broader perspective. Some of this information may reflect on the capability of the non-successful applicant, but other data may be more independent.

I have also been pushing a number of grant making bodies to use the application process to generate predictions of subsequent success, on a numerical scale. These predictions can later be compared to actual / perceived success, some years down the road. Not only is the correlation between the prediction and outcome of interest, so will be the positive and negative outliers (the unexpected successes and the unexpected failures). This is where case study investigations could help us learn a lot about what makes the difference between success and failure.
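
A rough sketch of how that comparison could be run is given below. The 1 to 5 scale, the grantee names, the scores and the two-point threshold for calling something "unexpected" are all hypothetical choices made for illustration, not anything the grant making bodies have agreed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def find_outliers(grants, threshold=2):
    """Flag grants whose later score differs from the prediction by >= threshold."""
    flagged = []
    for name, predicted, actual in grants:
        gap = actual - predicted
        if gap >= threshold:
            flagged.append((name, "unexpected success"))
        elif gap <= -threshold:
            flagged.append((name, "unexpected failure"))
    return flagged


# Hypothetical records: (grantee, success predicted at application, success judged later).
grants = [("A", 4, 4), ("B", 2, 5), ("C", 5, 2), ("D", 3, 3), ("E", 1, 2)]

r = pearson([g[1] for g in grants], [g[2] for g in grants])
outliers = find_outliers(grants)

print(f"correlation between prediction and outcome: {r:.2f}")
print("candidates for case study:", outliers)
```

With these invented figures the correlation is close to zero, and grantees B and C are the two cases that a follow-up case study would start with.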

The risks of big increases in aid flows to poor countries

The latest IDS Policy Briefing (Issue 25) is titled "Increased Aid: Minimising Problems, Maximising Gains", and contains a summary of papers in a recent IDS Bulletin on the same theme. Many of the concerns raised in this Briefing relate to my own experience and echo my pre-existing concerns. In particular:
- Donors will be pre-occupied with issues of quantity, and their attention will be diverted away from the more crucial question of the quality and effectiveness of aid.

- Absorptive capacity - the ability to put aid to effective use - is already in doubt as a result of poor governance, rigid and unresponsive administrative systems, and above all, the shortage of human resources.

- Governance reforms, that might help with absorption problems, take time. Speeding them up to enable bigger aid flows, is unlikely to work

- Increased aid flows will weaken incentives for recipient government to reprioritise their existing expenditures and to improve governance

- Aid coordination efforts will not keep up with increased aid flows from multiple directions, and the burden on the host government of managing its aid will increase.

- If much of the increased aid comes in the form of loans rather than grants (from WB and others) then this will impede recipient governments' efforts to break out of indebtedness and dependence on aid.

- In smaller countries the increased volume of aid might drive up the value of the national currency, making exports more expensive and undermining efforts to base growth on rising exports.

In one of the (summarised) IDS papers, Ros Eyben argues that a surge in aid could magnify certain donor practices that disempower recipient governments and civil society. She is especially concerned about the potentially damaging impact of "results-based management", endorsed by donors at Monterrey in 2002 as the optimal approach. Apparently this is because it enables donors to define what is happening, and undermines the ability of recipient governments to do so.

I am a little skeptical about the ability of any M&E method or approach to have much of an impact, but I do share her concern about the impact of large increases of foreign aid on national sovereignty and the accountability of governments to their people. Some time ago I sat in on a meeting held to discuss the findings of a report on the cost of achieving the MDGs in country x. The focus of the discussion was almost wholly on the accuracy of the calculations. What was ignored was the massive scale of the required increases and how (if delivered) they would effectively dwarf the country's own revenue sources. And even more importantly, the political implications of such a situation. What sensible government would pay much attention to its own population's expressed needs, when the bulk of its revenue was coming from external aid? No amount of "governance reform" would seem to be able to redress the effects of these perverse incentives.

I was reminded of the slogan of the American revolution "No taxation without representation", and wondered whether the truth of the reverse also needs some publicity: "No (effective) representation without taxation"

I suspect that the use of results-based management may be more a symptom, rather than a cause; a too-simple response to the perverse accountability effects of large aid flows: governments becoming less accountable to their people. RBM might be more useful if it more directly addressed the causes of this quandary: the imbalance of aid revenue versus tax revenue. It could do this by including as a key performance measure the ability of the recipient government to increase its tax revenues. Achievement could be associated with increased aid revenues, but not otherwise. The target ratio between tax and aid revenues would of course have to be identified country by country, because tax raising capacity will vary widely. Ironically this is one performance measure that the people in the recipient countries would probably not be very happy with, but which could in the longer term empower them more in relation to their governments, than any large increase in aid.
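
To make the arithmetic of such a measure concrete, here is a minimal sketch. The function names, the revenue figures and the target ratio of 1.0 are hypothetical. As noted above, any real target would have to be set country by country.

```python
def aid_increase_allowed(tax_revenue, aid_revenue, target_ratio):
    """True if the tax-to-aid ratio has reached the country's agreed target."""
    if aid_revenue <= 0:
        raise ValueError("aid_revenue must be positive")
    return tax_revenue / aid_revenue >= target_ratio


def max_aid(tax_revenue, target_ratio):
    """Largest aid flow that still keeps tax / aid at or above the target."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    return tax_revenue / target_ratio


# Hypothetical country, figures in millions, target of 1.0 (tax at least equal to aid).
before = aid_increase_allowed(tax_revenue=400, aid_revenue=500, target_ratio=1.0)
after = aid_increase_allowed(tax_revenue=600, aid_revenue=500, target_ratio=1.0)
ceiling = max_aid(tax_revenue=600, target_ratio=1.0)

print("more aid justified before tax reform:", before)
print("more aid justified after tax reform:", after)
print("aid ceiling after tax reform:", ceiling)
```

In this example, raising tax revenue from 400 to 600 is what unlocks any further aid, and the ceiling moves with the country's own revenue rather than with donor budgets.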

I am sure there are already many cases where increased taxation is an objective of concern to donors and host governments. But how often, if ever, have donors been willing to limit their aid flows to any percentage or multiple of recipient countries' tax revenues? Here there is another set of perverse incentives at work, to do with the nature of the "aid business": to limit aid in this way would be to undermine the resource base on which aid bureaucrats earn their living and where they can find future promotion opportunities. Some will be able to create career opportunities out of a well argued need to cut aid or to limit aid growth, but not all - by definition.

Further disincentives would probably lie ahead, should maximum levels of aid per country ever be agreed. Donor countries would have to agree who would contribute how much to a limited pot, when in fact many were seeking to maximise their influence. This would be the ultimate "harmonisation" challenge!

Late Note (23/03/06) From "Business in Africa"

"...Taking a look at the country’s national budget, virtually all spend is draft estimates and not accounted for cent by cent. An example of this lack of budget control is highlighted in the SAIIA report, where the director of monitoring and evaluation in the ministry of economic development and planning is quoted as saying, “What we are saying is that out of the 60 billion kwacha in the 2002/2003 budget, we only know of 37 billion kwacha that was used. We don’t know what happened to the rest because of lack of data.”

Foreign support makes up close to half of Malawi’s budget, which means that it is difficult to implement policies when the budget is dependent largely on sources from outside the country. In 2000, International Monetary Fund (IMF) aid and other donor funding was suspended following high-level corruption, archaic fiscal discipline and poor governance under the Bakili Muluzi administration. The IMF and World Bank resumed aid in 2004 when the economic minister in Muluzi’s cabinet, Bingu wa Mutharika, was named president."

Okay, so running a close second is the need for a performance indicator on the availability of adequate government accounts. This would be in the interests of citizens and supporting donors. No conflict of interest here, unless donors have privileged access to government accounts that citizens do not. Unfortunately this is not an unknown occurrence.

Another part of this latest story is the stop-go nature of aid flows. In highly aid dependent countries that cannot be a good way to proceed. What sort of performance monitoring system would generate such clumsy responses? Either one that was only barely working, or being ignored by its own proponents (or both).