Wednesday, December 02, 2009

Reflections on Dave Snowden’s presentations on sense-making and complexity

... at the Wageningen Innovation Dialogue, 30 November - 1 December 2009

From my point of view, one of the most interesting and important challenges is how to create useful representations of large, complex, dynamic structures, especially as seen by participants in those structures, for example multi-stakeholder processes in operation at national and international levels. Behind this view is an assumption that if we have better representations then this will provide us with more informed choices about how to respond to that complexity. Note that the key phrase here is "respond to", not "manage". The scale of ambition is more modest. Management of complexity only seems feasible when it is on a small scale, such as the children’s play group example cited by Dave Snowden (DS).

I have had a long standing interest in one particular set of tools that can be used for producing representations of complex structures. These are social network analysis (SNA) methods and associated software. During the workshop Steve Waddell provided a good introduction to SNA and related tools.

DS’s presentations on the sense-making approach provided a useful complementary perspective. This was all about making use of large sets of qualitative data, of a kind that cannot be easily used by SNA tools. Much of this data was about people’s voices, values and concerns, all in the form of fairly unstructured and impromptu responses to questions asked by their peers (who were trained to do so). These are called “micro-narratives” (MNs).

DS’s sense-making process (and associated software) is innovative in at least three respects. Firstly, in terms of its huge scale: up to 30,000 items of text were collected and analysed in one application. In many cases this would be more like a census than a sample survey. I have never heard of qualitative data being collected on this scale before. Nor as promptly, including the time spent on analysis, in the case of the Pakistan example. Secondly, and related to this, is the sophistication and apparent user friendliness of the bespoke software and hardware that was used.

More interesting, and more important, was the decision to ask respondents to “self-signify” the qualitative information they had provided. This was done by asking the respondents to describe their own MNs by using two different kinds of scales, to rate the presence of different attributes already identified by the researchers as being of concern. The consequence of respondents providing this meta-data was that all the MNs could be given a location in a three dimensional space. In fact a number of different kinds of three dimensional spaces, if many self-signifiers were used. Within that space it was then possible for the researcher to look for clusters of MNs. Of special interest were clusters of MNs that were outliers, i.e. those that were not part of the centre of the overall distribution of MNs.

There are echoes here of the expectation that the collection and analysis of Most Significant Change (MSC) stories will help organisations identify “the edges of experience”, which they wanted to see more examples of in future (if positive), or less (if negative). The difference is DS's use of quantitative data to make these outliers more identifiable, in a transparent manner.

As far as I understand it, an additional purpose of using self-signifiers to identify clusters of MNs is to prevent premature completion of the process of interpretation by the researcher, and thus to strengthen the trustworthiness of the analysis that is made.

On the first day of the workshop I had two reservations about the approach that had been described. The first was about the “fitness landscape” that was drawn within the three dimensional space. How it was constructed, and why it was needed, was unclear to me. My understanding now is that this surface is a mathematical projection from the 30,000 data points in that 3-D space (in the Pakistan example), a bit like a regression line in a 2D graph. One advantage of this constructed landscape is that it enables observers to have a clearer understanding of how these numerous MNs relate to each other on the three dimensions. When they are simply dots hanging in space this is much more difficult to do.

I also wondered why “peak” locations were designated as peaks, and not troughs, and vice versa. This seems to be a matter of researcher choice. This seems okay, if the landscape has no more significance than a visual aid, as suggested above. But in some complexity studies peaks in landscapes are presented as unstable locations, and troughs as stable points, acting as “attractors”. Is it likely that any pole of any of the self-signifying scales will show this type of behaviour? If not, might it be better not to talk about fitness landscapes, or at least be very careful about not giving them more apparent significance than they merit? A related claim seems to have been made when DS said “Fitness landscapes show people where change is possible”. But is this really the case? I can’t see how it can be, unless desirable/undesirable attributes are built into the self-signifying scales chosen to create the 3D space. There is a risk that the technical language that is being used imputes more independent analytic capacity than the software has in reality.

The other concern I had was about who chooses the scales used to self-signify. I should say that I do think it is okay to derive these from a relevant academic field, or from the concerns of the client for the research. But might it provide an even more independent structuring of the MN data, if these scales were somehow also derived from the respondents themselves? On reflection, there seems to be no way of doing this when the sense-making approach is applied on a large scale.

But on a much smaller scale I think there may be ways of doing this, by using a reiterated process of inquiry, rather than a once off process. I can provide an example by using data borrowed from a stakeholder consultation process held in rural Australia a few years ago. In the first stage respondents generated the equivalent of MNs. In this case they were short statements about how they expected a new fire prevention programme to help them and their community. These statements were in effect informal “objectives”, written in ordinary day-to-day language, on small filing cards. In the next stage the same individual stakeholders were each asked to sort these statements into a number of groups (of their own choosing), each group describing a different kind of expectation. Each of these groups was then labelled by the respondent who created it. The data from these card-sorting exercises was then aggregated into a single cards x cards matrix, where each cell value described how often the row card had been placed in the same group as the column card.
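The aggregation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure or software used in the original consultation: the function name `cooccurrence_matrix`, the zero-based card IDs and the three example sorts are all hypothetical.

```python
from itertools import combinations

def cooccurrence_matrix(sorts, n_cards):
    """Build a cards x cards matrix of co-placement counts.

    sorts: one entry per respondent, each a list of groups,
           each group a list of card IDs (0 .. n_cards - 1).
    Cell [a][b] counts how often card a shared a group with card b.
    """
    matrix = [[0] * n_cards for _ in range(n_cards)]
    for groups in sorts:                 # one respondent's card sort
        for group in groups:             # one group created by that respondent
            for a, b in combinations(sorted(set(group)), 2):
                matrix[a][b] += 1
                matrix[b][a] += 1        # keep the matrix symmetric
    return matrix

# Hypothetical data: three respondents sorting five cards (IDs 0 to 4)
sorts = [
    [[0, 1], [2, 3, 4]],
    [[0, 1, 2], [3, 4]],
    [[0], [1, 2], [3, 4]],
]
matrix = cooccurrence_matrix(sorts, 5)
for row in matrix:
    print(row)
```

The diagonal is left at zero here, since a card's co-occurrence with itself carries no information for the network diagram. Group labels are not stored in this sketch; in practice they would be kept alongside each group so they can be shown with the cards later.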

Here the card sorting exercise was in effect another means of self-signifying. It was generating meta-data, statements (group labels) about the statements (individual expectations). Unlike the tripolar and bipolar scales used in DS’s sense-making approach, it did not enable a 3D space to be generated where all the 30 statements could be given a specific location. However, the cards x cards matrix was a data set that many SNA software tools can easily use to construct a network diagram, which is a 2D presentation of complex structures. The structure that was generated is shown below. Each node is a card, and each link between two cards represents the fact that those two cards were placed in the same group one or more times (shown by line thickness). Clusters of cards all linked to each other were all placed in the same group one or more times. When using one software package (Visualyzer), a “mouseover” on any node can be used to show not only the original card contents (the expectation), but also the labels of the one or more groups that the card was later placed in. In this adapted use of self-signifiers the process of grouping cards helps add additional qualitative information and meaning to that already there in the card contents.

As well as being able to identify respondent defined clusters of statements, we can also sometimes see links between these clusters. The links are like a more skeletal version of the landscape surface discussed above. The “peaks” of that landscape are the nodes connected by strong links (i.e. the two cards were placed in the same groups multiple times). These can be made easier to identify by applying a filter to screen out the weaker links. This is the metaphorical equivalent of raising the sea level, and covering the lower levels of the landscape.
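The "raising the sea level" filter can be sketched as follows. This is a minimal illustration, not how any particular SNA package implements it. The function name `strong_links` and the example matrix are hypothetical, and the default threshold (the mean weight of the non-zero links) is an assumption about what "above the average" means.

```python
def strong_links(matrix, threshold=None):
    """Return (card_a, card_b, weight) for links stronger than the threshold.

    matrix: symmetric cards x cards co-occurrence counts.
    threshold: the "sea level"; defaults to the mean weight of all
               non-zero links (an assumption, other cut-offs are possible).
    """
    n = len(matrix)
    links = [(i, j, matrix[i][j])
             for i in range(n) for j in range(i + 1, n)
             if matrix[i][j] > 0]
    if not links:
        return []
    if threshold is None:
        threshold = sum(w for _, _, w in links) / len(links)
    return [(i, j, w) for i, j, w in links if w > threshold]

# Hypothetical co-occurrence counts for five cards (IDs 0 to 4)
matrix = [
    [0, 2, 1, 0, 0],
    [2, 0, 2, 0, 0],
    [1, 2, 0, 1, 1],
    [0, 0, 1, 0, 3],
    [0, 0, 1, 3, 0],
]
print(strong_links(matrix))               # links above the average weight
print(strong_links(matrix, threshold=2))  # a higher "sea level"
```

Raising the threshold step by step shows which clusters survive longest, which is one way of reading the "peaks" described above.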

The virtue of this network approach to analysing MNs is its very participative nature. Its limitation is its modest scalability. The literature on sorting methods suggests an upper limit of 50 or so cards (I will investigate this further). While this is much less than 30,000, many structured stakeholder consultation processes can involve smaller numbers of participants than this.

Key: Numbers represent the IDs of each card. Links indicate that the two cards were placed in the same group, one or more times. Thicker links = placed in the same group more often. Yellow nodes = most conspicuous cliques of cards (all often co-occurring). This image shows the strongest links only (i.e. above the average number). The mouseover function is not available for this image copy.

My final set of comments is about some of the risks and possible limitations of DS’s sense-making approach. The first concern is about transparency of method. To newcomers, the complexity terminology that is used when introducing the method was challenging, to say the least. At worst I wonder whether it is an unnecessary obstruction, and whether a shorter route to understanding the method would exist if less complexity science terminology were used. The proprietary nature of the associated software is a related concern to me, though I have been told that there is an intention to make an open source version available. Open source means open to critique and open to improvement, through collective effort, which is what the progress of science is ideally all about. The extensive use of complexity science terms also seems to make the approach vulnerable to corruption and possible ridicule, as people decide to “pick and mix” the bits and pieces of complexity ideas they are interested in, without understanding the basics of the whole idea of complexity.

Another issue is commensurate benefits. After seeing the scale of the data gathering involved, and the sophistication of the software used, both of which are impressive, I did also wonder whether the benefits obtained from the analysis were commensurate with the costs and efforts that had been invested, at least in the examples we were told about. Other concerns are not exclusive to the sense-making approach. What about the stories not told? Perhaps with almost census-like coverage of some groups of concern this is less of a concern than with other large scale ethnographic inquiries. What about unexpected stories? Is the search for outliers leading to the discovery of views which are a surprise to clients of the research, and of possible consequence to their plans on how to relate to the respondents in the future? And are these surprises enough in number, or are they dramatic enough, to counterbalance the resources invested to find them?

"At the heart of all major discoveries in the physical sciences is the discovery of novel methods of representation" (Steven Toulmin)

Friday, October 30, 2009

On the poverty of baselines and targets...

I have been surprised to see how demanding DFID has become on the subject of baseline data. On page 13 of the new DFID Guidance (on using the new formatted Logical Framework) it is stated that "All projects should have baseline data at all levels before they are approved. In exceptional circumstances, projects may be approved without baseline data at Output level..." Closer to the ground I have witnessed a UK NGO being pressed by DFID-appointed managers of a funding mechanism to deliver the required baseline data. This is despite the fact that the NGO's project will be implemented in a number of countries over a period of years, not all at once.

Meanwhile, in Uganda and Indonesia, I am watching two projects coming to an end. Both had baseline data collected shortly after they started. Neither is showing any signs of intending to do a re-survey at the end of the project period. Is anyone bothered? Not that I can see. Including DFID, who is a donor supporting one of the projects. And in both cases baseline surveys were expensive investments. To make matters worse, in one country the project performance targets were set before the baseline study, and in the other they have never really been agreed on.

I have just completed the final review of one project. We have diligently compared progress made on a set of indicators, against all the original targets. There are of course the usual problems of weak and missing data, and questionable causal links with project interventions. But what bothers me more is how outdated and ill-fitting some of these initial performance measures are. And how little justice this mode of assessment seems to be doing to what the project has been able to do since it started, especially the flexibility of its response in the face of the changing needs of the main partner organisation. Of even greater concern is the fact that this project is being implemented in a large number of districts, in a country that has been going through a significant process of decentralisation. Each district's capacities and needs are different, and not surprisingly the project's activities and results have varied from district to district. There is in fact no one single project. Yet our review process, like many others, has in effect treated these district variations as "noise", obscuring what were expected to be region-wide trends over time.

I am now working on some ideas of how to do things differently in my next project review, in the same country. This time the focus will be more on internal comparisons: (a) between locations, (b) between time periods during the project period.

Tuesday, October 27, 2009

Why we should make economists work harder

"Why we should make life harder for aid agencies" is the title of an article by Tim Harford ("The Undercover Economist") in last weekend's Financial Times magazine section.

I agree with the sentiment, but not with the analysis. I was expecting better, given what I have read of Tim in the past.

Tim's article starts with the problem of how we, as individual donors, can be sure that our aid goes in the right direction and has the expected impact. The next problem, as seen by Tim, is that aid agencies are bureaucracies. The solution is competition via a more open market. From within this perspective recent efforts at aid "harmonisation" are viewed by Tim with suspicion, and seen as almost the equivalent of establishing a cartel.

He then asks whether agencies could be made to compete, not only with each other, but even with private companies, to get funding from donor organisations. And whether money (or rather vouchers) could be given directly to aid recipients to spend, redeemable for services provided by a range of charities and aid agencies. These ideas he sees as "radical" and possibly "far fetched". More immediately, he suggests we could "start by asking simple questions about where aid comes from, where it goes, how effective it is and how much is lost to administration – or worse."

I hope Tim will be pleasantly surprised to find his ideas are not seen as radical or far fetched, and in fact have been in play now for quite some time. What Tim really needs to do (apart from more homework before writing articles like this) is to start questioning the assumptions behind his analysis of the nature and benefits of competition amongst aid agencies.

1. In most ordinary markets the purchaser and user are one and the same person. The purchaser and user of aid agency services are different parties, separated by continents and cultures.

2. In between them is not a single supplier, but a large and complex international aid supply network. See my map of one of the simpler aid supply networks, in a Guardian funded development project in Uganda (map is at the end of the article)

3. The quality of the product/service being provided is much more difficult to assess than that found in many goods and services markets in the UK. Measurement of poverty reduction is a field of its own. Improved governance is another order of magnitude more difficult to assess, but is nevertheless a common development objective. There are some more measurable outcomes, such as those captured via the Millennium Development Goals, e.g. reduced maternal mortality. But these usually require changes in the performance of institutions, e.g. national health services. These sorts of change are not simple to measure, let alone achieve. Aid agencies can avoid this challenge by directly supplying health services to poor communities, but they will then fail on another performance metric: sustainability.

Tim's idea of vouchers (above) could best be described as quaint. It is now commonplace for aid agencies in humanitarian emergencies to give cash handouts to families in need, not just vouchers. So they can buy what they need from anyone, not simply "a range of charities and aid agencies". Cash transfers are also being tested for their usefulness in development programmes, where there is no emergency present.

Competition between aid agencies is happening all over the place. DFID has, for years, invited tenders from a wide range of organisations to implement its aid programmes. See their Current Contract Opportunities page. But what difference is this making? That is the question. By contracting out work to others DFID moves its own "overhead costs" off its own books, onto others. But the overheads are still there. In fact they are multiplied, because in order to win contracts multiple organisations invest substantial amounts of time and effort into producing complex documents, but only one wins. Those losing bids are not products that can easily be sold to other possible buyers, like unused factory stock. Instead the cost of their non-use is factored into subsequent bids, including those that win.

So, costs will have gone up, but what about effectiveness? If that has improved, then the increased costs would be justifiable. The problem is, as touched upon above, that it is very difficult to measure the effectiveness of many contracted-out projects, because of the scale and complexity of the changes they are trying to achieve.

Wednesday, September 23, 2009

Constructing longer term perspectives

A few weeks ago a friend asked me for help with ideas for a presentation that needed to be made on "challenges for the international development sector..."

Not an easy task, where do you start? But I knuckled down and did some reflection. Work I am doing on DFID and AusAID funded projects in Indonesia ended up as the source of some ideas that may be useful. I have been working on these since late 2005 and the work continues until early 2010.

My short reply to my friend was as follows:

How to ensure that development interventions are designed and implemented within a long term perspective, that extends way beyond the typical 3-5 year planning cycles

There is a massive contradiction between the short term nature of project designs and what most people know about how long development can take (both technological and social).

If project planning cycles cannot be lengthened (e.g. because of government budget cycles and election cycles) then how can we make sure that these planning cycles are better linked up, into a more coherent longer term intervention? This is no easy task when there is constant staff turnover both within government and aid agencies. Strategy papers by themselves are not much use, because they have their own continuity problems: no new boss wants to simply say, yes, we will do more of the same. Everyone wants to re-write the strategy in their own image.

In Jakarta in October we will be holding an end-of-project review workshop for the DFID funded, GTZ implemented, AusAID monitored, government owned SISKES project (maternal and neonatal health). One of the two workshop objectives is

To engage participants in a longer term perspective on MNH development, that exceeds the typical 3-5 year project lifespan

  • By looking back, on developments since the beginning of the decade
  • By looking forward to up to four years in the future.
We will be including some people associated with a new AusAID MNH project in one of the 2 districts that GTZ are pulling out of. Plus a cast of thousands (well, 52 other participants so far).

One of the workshop exercises will be to engage participants in predicting trends in key service provision indicators over the next four years, based on their knowledge acquired through the SISKES project, and other sources. And then analysing the implications of these expected trends for the incoming projects, including the new AusAID MNH project.

There is also a need for more connecting events at the design stage as well, where various stakeholders from prior and parallel related developments are brought in to inform planning decisions, or at least the choices to be considered. Often the consultants on design missions are about the only bridges to the past.

For other people's efforts to promote really long term thinking, see the Long Now Foundation

Monday, August 10, 2009

Bibliographic Timelines

It is a simple idea, but one that looks useful

During a recent mid-term review of AMREF's Katine Community Partnerships Project, I started to create a bibliography of project related documents, with a difference. Normally documents listed in a bibliography are arranged in alphabetical order, by the author's name. This helps you find the document if you know the author's name, but not much more.

This time I listed all the project documents in time order, by the year and the month when they were produced, starting with the oldest. In the text of most reports referenced documents are usually referred to by their author and date, so it is still easy to find cited documents in this chronologically ordered list. The added advantage of this "bibliographic timeline" is that it also gives you (the reader and/or writer) a quick sense of the history of the project. Most document titles make some reference to the event they are describing (e.g. baseline studies, needs assessments, workplans, annual reports, etc), so by scanning down the bibliography you can quickly get a rough sense of the sequence of activities that have taken place, even though there may be a time lag between an event and when it is documented (say in the next month).
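For anyone keeping their document list in a spreadsheet or script, the re-ordering is a one-line sort. The sketch below is only an illustration: the authors, titles and dates are hypothetical placeholders, not the actual Katine project documents.

```python
from datetime import date

# Hypothetical entries; only the year and month of production matter here
documents = [
    {"author": "Author C", "title": "Annual report", "produced": date(2008, 12, 1)},
    {"author": "Author A", "title": "Needs assessment", "produced": date(2007, 11, 1)},
    {"author": "Author D", "title": "Workplan", "produced": date(2008, 2, 1)},
    {"author": "Author B", "title": "Baseline study", "produced": date(2008, 1, 1)},
]

# Oldest first, instead of the usual alphabetical order by author
timeline = sorted(documents, key=lambda doc: doc["produced"])

for doc in timeline:
    print(f'{doc["produced"]:%Y-%m}  {doc["author"]}: {doc["title"]}')
```

Because each entry still carries its author and date, a citation in the report text can be matched to the list just as easily as in an alphabetical bibliography.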

I have attached below a graphic image of the "bibliographic timeline" that was produced this way. Click on the image to get more detail.