Tuesday, March 03, 2026

Optimising method selection: an introduction to the set covering problem — and a tool to help





Caveat Emptor: I delegated the writing of this blog posting to Claude AI(Sonnet 4.6), based on an extended prior dialogue on the subject below, then a summary prompt of what was wanted. My post production edits were quite limited
--0O0--

There is a class of problem that turns up repeatedly in evaluation planning, and in many other fields, that most of us solve informally and imprecisely. The problem is this: given a set of needs to be addressed, what is the smallest combination of responses that covers all of them?

In evaluation this might be: given a set of evaluation questions I need to answer, what is the minimum combination of methods that gives me adequate coverage across all of them? In public health it might be: which combination of clinics ensures every neighbourhood has access to at least one? In software testing: which set of test cases exercises every code path? In logistics: which depots can serve every delivery zone?

These are all instances of what computer scientists call the Set Covering Problem — a well-studied problem in combinatorial optimisation that dates back to the 1970s. The formal version asks: given a collection of sets, find the smallest sub-collection whose union covers all elements of a target universe.

Why it matters for evaluation practice

When designing an evaluation, practitioners face exactly this structure. We have a list of evaluation questions (or dimensions of quality, or stakeholder concerns), and a repertoire of methods — each of which can address some but not all of those questions to a satisfactory level. Choosing methods one at a time, based on familiarity or habit, rarely produces the most efficient combination. We either end up with redundant overlaps in some areas and blind spots in others, or with far more methods than the budget or timeline can support.

A more systematic approach asks: which combination of methods is both complete (covers all questions at an adequate level) and minimal (uses as few methods, and as little resource, as possible)?

How the optimisation works

For a small number of methods and questions, you could check all possible combinations by hand. But the number of combinations grows exponentially — with ten methods, there are over a thousand possible subsets to evaluate. This is where an algorithm helps.

The simplest approach is a greedy algorithm: at each step, pick the method that covers the most currently-uncovered questions, then repeat until everything is covered. This is fast and usually finds a good solution, but not necessarily the best one.

A more thorough approach — exhaustive search — systematically checks all combinations up to a specified size and returns every minimal solution. This is slower but reveals the full landscape of equally-good options, which is often more useful than a single answer, particularly when cost or other practical constraints come into play.

The Coverage Optimiser

With substantial coding help from Claude AI, I have been developing a small browser-based tool — the Coverage Optimiser — that applies both approaches to a user-defined matrix of methods and question types.

▶  Try the Coverage Optimiser [When you get there click on Introduction, to get suggestions on how to explore the functions of the app]

The default example matrix uses ten foresight methods (Scenario Planning, Delphi, Horizon Scanning, and others) rated against five question types — Descriptive, Valuative, Explanatory, Predictive, and Prescriptive — at HIGH, MEDIUM, or LOW in terms of their usefulness in building a futures perspective into an evaluation. The tool finds the minimal combinations of methods that achieve the desired coverage level across all question types.

Each solution is displayed with:

  • the methods involved, each with its cost rating
  • a question-by-question coverage check
  • the total cost of the combination
  • an overlap score — the number of questions covered by more than one method in the solution

Overlap is worth attending to: it indicates redundancy, which in evaluation terms means resilience. If one method proves impractical in the field, a solution with higher overlap is more likely to remain viable.

The matrix itself is fully editable in a dedicated Matrix Editor tab. You can rename methods and question types, adjust HIGH/MEDIUM/LOW ratings on a choosen criteria, set importance weights for each question type (0–10), set cost ratings for each method (0–10), sort rows by any column, import and export as JSON, and print or save the matrix or results as PDF.

Note: The Coverage Optimiser runs entirely in your browser — no login required, no data sent anywhere.

Beyond methods and questions

The tool is not limited to foresight methods or evaluation questions. The underlying logic applies wherever you have a set of options and a set of requirements, and where each option partially addresses some requirements. Some other framings that could be loaded into the same tool:

  • Solutions × Problems: which combination of policy interventions covers the broadest range of identified problems?
  • Stakeholders × Information needs: which combination of engagement activities ensures all key stakeholder groups have their core information needs met?
  • Data sources × Indicators: which combination of data collection instruments covers all required indicators, at minimum cost?
  • Partners × Geographic areas: which combination of implementing partners ensures all target districts are reached?

In each case the matrix structure is the same, the optimisation logic is identical, and only the labels and rating criteria change.

An invitation to experiment

The tool is best understood by using it. I would encourage anyone planning a multi-method evaluation, or foresight exercise, to load in their own methods and questions — even with rough ratings to start — and see what combinations emerge. The exhaustive search mode is particularly useful for revealing that several equally-minimal combinations exist, which opens up a more deliberate conversation about which is preferable given cost, feasibility, or complementarity.

I am continuing to develop the tool and would welcome feedback on the matrix structure, the rating scales, or applications I have not yet considered.

Those who have used EvalC3 will find a family resemblance here: both tools use systematic search to find efficient combinations — EvalC3 searching for attribute combinations that predict specific individual outcomes, the Coverage Optimiser searching for method combinations that cover multiple requirements. The underlying computational logic is related, even if the problems look different on the surface. 

Accessing the code: In addition to trying out the app, as it already exists online, you can also download a copy of the code and have it working independently on your own website. Go to the app online, right click your mouse, select View Page Source, copy the entire code, paste it into a txt file, rename the text file to end in .html not .txt, click on that file in a directory to open it up in a web browser. Simples, yes?


Further reading


  • A lot of the available reading on set covering algorithms is in the computer science domain. For a more accessible starting point, see the Wikipedia article on the Set cover problem 
  • The next blog posting explores comparisons with QCA.
  • This exercise in method development was an unplanned outcome of the writing of a book chapter on bridging the fields of evaluation and foresight. The book is...
    • Lea Kleinsorg, Jan Tobias Polak, Christian Grünwald (2027): Futures-informed evaluation: Methodological approaches and empirical applications, SpringerNature, Heidelberg.




Thursday, December 25, 2025

Exracting additional knowledge and performance from a configurational model that already has wide coverage

 
A decision tree algorithm, as available within EvalC3, can generate a classification tree  (a set of predictive models) of the kind shown here.


Some of the models (each branch is a model) are very detailed (i.e. has lots of attributes) and have narrow coverage. Such as HasQuotas+NotPost Conflict Situation+High Level of Human Development+ Low Womens Status = Low levels of womens representation in Parliament, which covers two cases (Senegal, Tanzania). 

Others are quite simple, with only two or three attributes and can have much wider coverage. Such as HasQuotas+ IsPostConflict = High levels of womens representation in Parliament, which covers two six cases (Burundi, Ethiopia, Mozambique, Namibia, South Africa, Uganda)

These wide coverage models may have unexplored potential, in the form of unexploited information content within the cases they cover. The raw (i.e. numerical) outcome data for the cases they cover only can be examined and recalibrated i.e re-dichotomised into two new sub-groups representing relatively higher versus lower outcome values within that set only

A new configurational analysis can then focus on that sub-set of cases to see if (a) any of the pre-existing attrubutes could predict membership of the two sub-groups, or (b) if any additional attributes, based on other knowledge of these cases, could do so.The ability to predict such finer grained performance differences would be a significant improvement.

This analytic step is a complementary move to that known as "pruning", where the removal of a mode attribute improves coverage, at the cost of precision.  Here an extra attribute is sought that will improve precision but at the cost of coverage. Perhaps it could be called "grafting"...

Postscript: But how significant will this addition to the model be? If, as above, there are six cases involved, there are 2^6 possible binary groupings of these case i.e 32. So any one grouping of two sets of cases has a 1/32 or 3.125% chance of occuring randomly (if the cases are causally independent). 

Tuesday, December 02, 2025

Objectives as data: The potential uses of updatable outcome targets

 The context

A specialist agency is funding more than 40 different partner organisations, each working in a different part of the country but with the same overall objective of increasing people's levels of physical activity (because of the positive health consequences). These partners are often working with quite different communities, and all have substantial degree of independence about how they work towards the overall objective. 

Some agency representatives have asked about the nature of the target that program as a whole is working towards, and have emphasised how essential it is that there be clarity in this area. By target they mean an actual number. Specifically the percentage of people self-reporting that they achieve a certain level of physical activity each week, as identified by an annual survey that is already underway and will be repeated in future.

Possible responses

In principle it would be possible to set a target for the proportion of the population reporting being physically active. Such as 75%.  But it would be very hard to identify an optimal target percentage, given the diversity of partner localities, and the communities within these. 

Relative targets may be more appropriate. Such as a 25% increase in reported activity levels. Especially if partners were each asked to identify what they think are achievable percentage increases in their own localities within the next survey period. This estimation would take place in a context where these partners already have experience working in those locations, identifying some of the things that work and dont work. My hypothesis, yet to be tested, would be that these partners will make quite conservative estimates. And if so, this might come as some surprise to the donor and perhaps lead to some revision of their own expectations

Taking this idea further, partners could be periodically asked if they wanted to adjust their expectations upwards or downwards , of the change that could be achieved - in the time remaining in the interventions lifespan. Subject to being able to explain the rationale for doing so. My second hypothesis is that this number, and commentary, could be a valid and useful form of progress reporting in its own right.

Making sense of the responses

An assessment of overall progress over longer time scale would need to consider both the scale of ambitions and the extent of their achievement. These can't be combined into one number based on a simple formula because any such number could be achieved by adjustment of expectations and or performance. However it could be usefully represented by a scatterplot, with data points reflecting each of the partners, of the kind shown below.

The location of partners in different quadrants suggests different implications about how the different partners should be managed

  • High ambition/low achievement: May need additional support, capacity building, or problem-solving
  • Low ambition/low achievement: May need fundamental partnership restructuring or exit considerations
  • High ambition/high achievement: Candidates for scaling, sharing learning, reduced support intensity
  • Modest ambition/high achievement: Opportunities to stretch ambitions

This framework also provides plenty of potentially useful analytic questions

  • Are ambitions increasing or decreasing?
  • Is the gap between expected and actual narrowing or widening?
  • For a given level of actual achievement did differences in expectations have any role or consequences
  • For a given level of expected change what might explain the differences in the partners actual achievements
  • How do individual partners positions within this matrix change over time? Are there distinct types of trajectories and how can these differences be explained?

In summary

A single numerical value based on the data in this matrix will provide a meaningless simplification. 

In contrast, a scatterplot visualisation can generate multiple potentially useful perspectives. 

It is more useful to see targets as necessarily malleable responses to changing conditions, than as unarguable reference points.

Postscript

There is a type  of reinforcement learning algorithm known as Temporal Difference Learning (TDD), that embodies a very similar process. It is described as "a model-free reinforcement learning method that learns by updating its predictions based on the difference between future predictions and the current estimate". Model-free means it has no built in model of the world it is working in. 

When implemented as a human process it is vulnerable to gaming, because the agents (humans) are aware of the system's mechanics, unlike the neural networks or simplified agents typically used in computational TD learning. But one adaptation, suggested by Gemini AI, is to "reward partners not just for the +/-gap, but for the accuracy of their final predictions over multiple cycles". Relatively higher accuracy, over multiple time periods, might be indicative of potentially generalisable / replicable delivery capacity, usable beyound the current context.

Tuesday, June 18, 2024

On two types of Theories of Change: Temporal and atemporal, and how they might be bridged



There are two quite different ways of representing theories of change – of the kind that might be useful when planning and monitoring development programmes of one kind or another.

The first kind is seen in representational devices such as the Logical Framework, Logic Models and boxes-and-arrows type diagrams. These differentiate events according to their location at different points over time, taking place between the initial provision of funding, its allocation and use and then it's subsequent effects and final impacts. These are temporal models.

The second kind, seen much less often, are seen in the analyses generated by Qualitative Comparative Analysis (QCA) and simple machine learning methods known as Decision Trees or Classifiers.  Here the theory is in the form of multiple configurations of different attributes that are associated with desired outcome, and its absence. Those attributes may be of the intervention and/or its context.  The defining feature of this approach is the focus on cases and differences between cases, rather than different points or periods in time. These cases are often geographical entities, or groups or persons, which have some persistence over time. They are effectively atemporal models.

Each of these approaches have their own merits. Theory of change which describes the sequence of expected events over time and how they relate to each other is useful for planning, monitoring and evaluation purposes.  But it runs the risk of assuming a homogeneity of effects across all locations where it is is implemented. On the other hand, a QCA-type configurational approach helps us identify diversity in contexts and implementations, and its consequences. But it may not have any immediate management consequences, about what needs to be done when.

One of my current interests is exploring the possibility of combining these two approaches, such that we have theories of change that differentiate those events over time, while also differentiating cases across space where those events may or may not be happening. 

One paper which I've just been told about is exploring these possibilities, as seen from a QCA starting point:Pagliarin, S., & Gerrits, L. (2020). Trajectory-based Qualitative Comparative Analysis: Accounting for case-based time dynamics. Methodological Innovations, 13.  In this paper the authors introduce the innovative idea of cases as different periods of time in the same location, where each of those subsequent periods of time may have various attributes of interest present or absent, along with an outcome of interest being present or absent.  This approach seems to have potential for enabling a systematic approach to within-case investigations complementing what might have been prior cross-case investigations.  There is the potential to identify specific attributes, or combinations of these, which are necessary or sufficient for changes to take place within a given case.

Somewhat tangentially...

The same paper reminded me reminded me of some evaluation fieldwork I did in Burkina Faso in 1992, where I was interviewing farmers about the history of their development of a small market garden using irrigation water obtained from a nearby lake. Looking back at the history of the market, which I think was about six years old at the time, I asked them to identify the most significant change that had taken place during the period of time. They identified installation of the water pump in year 198?, and pointed out how it expanded the scale of their cultivation thereafter. I can remember also asking, but with less recall of what they then said, follow-up questions about the most significant change that it taken place in each smaller time period either side of that event, and then its consequences.  I was in effect asking them to carve up the history of the garden into segments, and sub-segments, of time not defined by calendar, but by key events – each of which had consequences. These were in effect temporal "cases". Each of these had a configuration of multiple attributes, i.e. being attributes of the nested set of time periods that it belonged to. Associated with each of these were differenting judgements about the  the productivity of the market garden.  But with our team's time being short supply, I never got the opportunity to gather a full data set, so to speak. 

Another of my current interests, prompted by the above conjectures, is the possible use of specific form of Hierarchical Card Sorting (HCS) as a means introducing a temporal element into case-based configurational analysis. The HCS process generates a tree structure of nested binary distinctions between cases. It is concievable that different broad criteria could be introduced for the type of differences being identified at each level of the branching structure. For example, at the top level the "most significant difference" being sought could be specified as being  "in terms of funding received", then at the next level, "in terms of outputs generated" , and so on (Criteria 1,2,3 etc in Figure 1 below) .

Figure 1 below