Monday, March 07, 2016

Why I am sick of Evaluation Questions!

[Beginning of rant] Evaluation questions are a cop out, and not only that, they are an expensive cop out. Donors commissioning evaluations should not be posing lists of sundry open ended questions about how their funded activities are working and/or having an impact.

They should have at least some idea of what is working (or not) and they should be able to articulate these ideas. Not only that, they should be willing, and even obliged, to use evaluations to test those claims. These guys are spending public monies, and the public hopefully expects that they have some idea about what they are doing i.e. what works. [voice of inner skeptic: they are constantly rotated through different jobs, so probably don't have much idea about what is working, at all]

If open ended evaluation questions were replaced by specific claims or hypotheses then evaluation efforts could be much more focused and in-depth, rather than broad ranging and shallow. And then we might have some progress in the accumulation of knowledge about what works.

The use of swathes of open ended evaluation questions also relates to the subject of institutional memory about what has worked in the past. The use of open ended questions suggests that little has been retained from the past, OR that what has been retained is no longer deemed to be of any value. Alas and alack, all is lost, either way [end of rant]

Background: I am reviewing yet another inception report, which includes a lot of discussion about how evaluation questions will be developed. Some example questions being considered:
How can we value ecosystem goods and services and biodiversity?  

How does capacity building for better climate risk management at the institutional level translate into positive changes in resilience?

What are the links between protected/improved livelihoods and the resilience of people and communities, and what are the limits to livelihood-based approaches to improving resilience?


  1. What a delightful rant, Rick!

    Tho perhaps you could have titled it 'Why I am sick of particular types of evaluation questions' or 'Why I am sick of particular types of evaluation commissioners'.

    The evaluations that I have done lately have commenced with the commissioners and me sitting down together to develop the evaluation protocol, including the eval questions. What a fantastic improvement over when I used to do evaluations for UN agencies in which I would have foisted on me pages (literally!) of evaluation questions, most of which were to do with largely meaningless minutiae that missed the mark.

  2. Totally sensible view. Sometimes the questions listed go on for pages and they still expect the report within 20 pages... That's completely nonsensical. Time some hue & cry is made, and glad to see you are leading the charge!

  3. maiwada zubairu has left a new comment on your post "Why I am sick of Evaluation Questions!":

    I agree with David - the title is misleading. One would have thought we are throwing away evaluation questions and alternatives are offered. Agree - what is needed is re-phrasing the evaluation questions to meet the two fundamentals of evaluations: Accountability and Learning.

  4. On a related rant -- having to do with the NUMBER of questions in an evaluation ToR, much less the purpose of the questions themselves -- I was helping develop training materials for an agency. We agreed that ideally an evaluation should be focused on no more than 5-or-so MAIN questions. We tried to find actual examples of evaluation ToRs to use as case studies. It was hard to find any that had fewer than 30 questions; some had as many as 125 questions (like the pages of questions David McDonald referred to above). You are right, Rick, in your main point: There is something wrong if they (the agencies/implementers) didn't already have pretty good answers to some of their questions. And they need to be both realistic and strategic in identifying the questions on which the evaluative effort should really be focused.

    1. Hi Jim,

      We've gotten traction with the need for fewer questions working with the USAID Mission in Colombia. As a small number of questions is recommended in USAID guidance now, we have something to back up our assertion that this is a wise way to go. See:,, (Okay, sometimes they go to six EQs. But we're 16 task orders in, and three years, and I can think of only one that went above that.) However, I have to say I'm intrigued with Rick's idea that a hypothesis makes a bolder starting point than some of these wide-ranging, open-ended questions. I might put it in front of our open-minded USAID client and see if we can try a different angle - maybe hybrid to start.

    2. Hi Keri

      I think hybrid is the way to go. I would not throw out open ended questions but I would probably put them in as second priority, after eliciting and testing hypotheses reflecting the beliefs of the donors and implementers about how their programs are working

      In other recent contexts I have suggested that after having started with open ended questions (and perhaps being required to) the evaluation team tries to convert these into hypotheses by asking various stakeholders what they think the answers to these questions are, and then getting these questions shaped into a form that can then be tested through systematic inquiry during an evaluation

  5. Thank you, Rick, for raising fundamental questions here. I hope we will have a rich debate that may shape the way evaluation questions are constructed. In my opinion, it is time evaluation commissioners became specific about what they want to measure. I would propose that at this point the indicators set out in the initial project should be turned into specific questions. It is time we began to ask ourselves: were we able to achieve this or that, based on the indicators we set out? Most evaluation questions are based on the objectives, but the objectives had measurable indicators. So these indicators should form the basis of the evaluation questions, and in addition to answering these questions we can add issues of spillover... end of ranting.

  6. I whole-heartedly agree. I think the use of too many, non-specific questions (often drawn from 'master lists' in the public domain, because 'that's what everyone else uses') has led to many meaningless evaluations. Often, in fact usually (!), there is not the resource envelope to accompany the huge list of questions/issues The Evaluator is asked to look at, and therefore what results is broad-based evaluation, trying to 'tick boxes' but of little depth and utilisation in terms of real learning and subsequent change. I think this issue of better Evaluation Questions ties in really closely with recent debates about over-ambitious evaluations and inadequate resources to implement them. I also worry that we are not using institutional memory or expertise/experience adequately - that if previous M&E and reflection/evaluation hasn't fitted in with the current 'modes of favour' then they are not robust enough to be worthy of consideration. Thanks for your thoughts, I think rants can be both cathartic and helpful to others as they realise they are not alone in struggling or being frustrated with certain aspects of practice.

  7. 'Evaluation questions' are indeed useless in providing value, as any private sector management consultant can tell you (they used the approach in the 1950s-1960s, but abandoned it because it was inefficient, made for meaningless/boring findings, and failed to yield evidence-based recommendations).

    Since then, they have used a hypothesis-driven approach, as you suggest (see:

    It's imperfect, and takes more effort/skill to pull off successfully. But it's greatly superior to the 'evaluation questions' approach most evaluators still use. Why don't evaluators learn from what the private sector already learned half a century ago?

    1. Hi Anon

      Thanks for your link above, which I have now followed up

      regards, rick

  8. Hi Rick, following Jim Rugh's line of thought, the mini-book "Actionable Evaluation Basics", by Jane Davidson, came to my mind. She states there that almost any evaluation needs around 5 main questions (plus or minus two). Compared with the cases that are being mentioned here, well... ;-)

    Best, Pablo

  9. I certainly identify with the rant and the comments, although I would not substitute hypotheses for questions. Applying the notion of evaluation utility, I find that negotiating a manageable number of USEFUL questions that, in AEA’s words, “ensure that an evaluation will serve the information needs of intended users” is the solution.
    It raises another issue, however: the prevalent tendering madness of launching terms of reference that are non-negotiable. I propose a solution that comes from my own experience hiring staff: an evaluator-centred commissioning process.
    1) Recruit evaluator(s) rather than call for proposals with pre-determined TORs.
    2) Engage with the best candidates based on their potential best match for you.
    3) Consult with references.
    4) Hire the best qualified candidate and develop the terms of reference together.

    1. Apologies. The authorship of this post above was intended to be made visible. It was made by

      Ricardo Wilson-Grau Consultoria em Gestão Empresarial Ltda
      Evaluation | Outcome Harvesting
      Rua Marechal Marques Porto 2/402, Tijuca, Rio de Janeiro, CEP 20270-260, Brasil
      Telephone: 55 21 2284 6889; Skype: ricardowilsongrau
      Direct via VOIP, dialing locally from USA or Canada: 1 347 404 5379

  10. Rick: we avoid this problem by narrowing down who will use and own the evaluation, then challenging them to define PURPOSES that are specific; and for each purpose we have them draft Key Evaluation Questions (all of this is part of Utilization-focused Evaluation). We spend a lot of time with them editing the Key Evaluation Questions. More on this at

    Cheers, Ricardo Ramirez

  11. This resonates with my experience as well. I work for an INGO that both commissions and conducts evaluations. Reflecting on why we stretch ourselves – and the evaluators we hire – so thin, I often feel like the OECD-DAC evaluation criteria play a role. The 5 criteria (relevance, effectiveness, etc.) – plus several others for evaluations of humanitarian interventions (coherence, connectedness, etc.) – lead to dozens of evaluation questions. I believe some donors’ evaluation policies state that the DAC criteria should be followed in all (final) evaluations. But even when we have more flexibility to define our own evaluation questions, I find program managers default to including most or all of the DAC criteria in their evaluation SOWs. ALNAP makes it clear in their guidance to “only use the criteria that relate to the questions you want answered”. I’d like to see a similar warning label for the DAC criteria on OECD’s materials.

  12. I find evaluation commissioners, especially the UN agencies I've worked with, are not only attached to the long list of evaluation questions they have posed, but also wary of any deviation or exploration around those, e.g., asking contribution-related questions on results. Gender integration in evaluation is an example: a separate section of women's participation questions was posed in an evaluation I did recently, and all they wanted was a 'rubber stamp' on their narrow view of gender integration.

  13. It is hard to disagree that multiple evaluation questions covering an entire programme are a recipe for disaster (and a waste of money) and likely to result in an evaluation report impressive in equal measure for its breadth and its shallowness.

    Here at DFID we have attempted to deal with the problem by trying to reduce the number of evaluation questions to a bare minimum (commensurate with the scale and budget of the evaluation) and then verifying exactly how the evidence generated against each evaluation question will be used.

    Part of the problem of course is the consultation process that evaluation terms of reference go through as they develop which involves multiple advisers, programme managers and others all wanting to add a question of interest to them. The net result can quickly become those long lists of evaluation questions that various commentators have referred to. This is not to suggest that consultations should necessarily be pared down, but rather that evaluation commissioners need to be more savvy in discarding, truncating and editing suggestions for additional evaluation questions.

    I like the idea of setting and testing hypotheses, which forces the commissioner to focus on and test particular aspects of the theory of change that underpins the intervention.

  14. Hi Jonathan

    Thanks for your comment, it was good to get a DFID perspective here (while recognizing that views may vary widely even within DFID)

    regards, rick