Over the past few years I have read quite a few guidance documents on how to do monitoring and evaluation (M&E). Looking back at this literature, what strikes me is how little attention is given to data analysis, relative to data collection. There are gaps both in (a) guidance on "how to do it" and (b) how to be transparent and accountable for what you planned to do and what you then actually did. In this post I want to offer some suggestions that might help fill those gaps.
But first, a story to provide some background. In 2015 I did some data analysis for a UK consultancy firm. For the previous five years they had been managing a "Challenge Fund", a grant-making facility funded by DFID, and in the process had accumulated a lot of data. When I looked at the data I found approximately 170 fields. Many different analyses could be made from this data, even bearing in mind the one approach we had already discussed and agreed on: the development of predictive models of the outcomes of the funded projects.
I tackled this by developing a "data analysis matrix", seen below. The categories in the left column and top row refer to different sub-groups of fields in the data set. Each cell represents the possibility of analyzing the relationship between the row sub-group of data and the column sub-group of data. The colored cells are the data relationships the stakeholders decided would be analyzed, and the initials in those cells identify the stakeholder wanting that analysis. Equally importantly, the blank cells indicate what would not be analyzed.
We added a summary row at the bottom and a summary column on the right. The cells in the summary row signal the relative importance given to the events in each column. The cells in the summary column signal the relative confidence in the quality of the data available in each row sub-group. Other forms of meta-data could also have been provided in such summary rows and columns, to help inform stakeholders' choices about which relationships in the data should be analyzed.
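For readers who prefer to keep such a plan alongside their analysis scripts, here is a minimal Python sketch of how a matrix like this might be recorded. The sub-group names, stakeholder initials and 1 to 3 scores are all hypothetical placeholders, not the categories or ratings used in the study. Planned cells map a (row, column) pair to the initials of the stakeholders who asked for that analysis; any pair absent from the plan is a "blank" cell, i.e. a relationship that will not be analyzed.

```python
from itertools import product

# Hypothetical sub-groups of fields (not the study's actual categories).
SUBGROUPS = ["Grantee", "Project design", "Context", "Outcomes"]

# Hypothetical summary column: confidence in data quality per row (1 = low, 3 = high).
DATA_CONFIDENCE = {"Grantee": 3, "Project design": 2, "Context": 1, "Outcomes": 2}

# Hypothetical summary row: importance given to each column (1 = low, 3 = high).
COLUMN_IMPORTANCE = {"Grantee": 1, "Project design": 2, "Context": 2, "Outcomes": 3}

# Planned cells: (row, column) -> initials of the stakeholders wanting that analysis.
# Pairs that are missing here are the blank cells, i.e. what will NOT be analyzed.
planned = {
    ("Grantee", "Outcomes"): {"AB"},
    ("Project design", "Outcomes"): {"AB", "CD"},
    ("Context", "Project design"): {"CD"},
}


def blank_cells(subgroups, plan):
    """Return every row/column pair that is not in the plan."""
    return [(r, c) for r, c in product(subgroups, repeat=2) if (r, c) not in plan]


def render(subgroups, plan, importance, confidence, width=16):
    """Build a plain-text grid with a summary column (confidence) and summary row (importance)."""
    lines = ["".ljust(width) + "".join(c.ljust(width) for c in subgroups) + "Confidence"]
    for r in subgroups:
        # Each cell shows the sorted initials of the requesting stakeholders, or stays empty.
        cells = [",".join(sorted(plan.get((r, c), set()))).ljust(width) for c in subgroups]
        lines.append(r.ljust(width) + "".join(cells) + str(confidence[r]))
    lines.append("Importance".ljust(width)
                 + "".join(str(importance[c]).ljust(width) for c in subgroups))
    return "\n".join(lines)


print(render(SUBGROUPS, planned, COLUMN_IMPORTANCE, DATA_CONFIDENCE))
print(len(blank_cells(SUBGROUPS, planned)), "cells will not be analyzed")
```

Listing the blank cells explicitly, rather than only the planned ones, mirrors the point above that what will not be analyzed matters as much as what will.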
A more general version of the same kind of matrix can be used to show the different kinds of analysis that can be carried out with any set of data. In the matrices below, the row and column letters refer to different variables / attributes / fields in a data set. These matrices illustrate three main types of analysis, one of which has three sub-types:
- Univariate - looking at one measure only
- Bivariate - looking at the relationships between two measures
- Multivariate - looking at the relationship between multiple measures
Within the multivariate option there are three alternatives, looking at:
- Many to one relationships
- One to many relationships
- Many to many relationships
On the right side of each matrix below, I have listed some of the forms each kind of analysis can take.
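One way to make this typology concrete is to treat an analysis as a selection of row variables and column variables in the matrix, and classify it by how many are selected on each side. The sketch below assumes that a single diagonal cell (a variable paired with itself) stands for a univariate analysis and that rows act as "from" and columns as "to"; both readings are my assumptions, and the function name `classify` is hypothetical.

```python
def classify(rows, cols):
    """Classify an analysis by the variables selected on each axis of the matrix.

    rows: variables selected on the row axis (assumed to be the "from" side)
    cols: variables selected on the column axis (assumed to be the "to" side)
    """
    rows, cols = set(rows), set(cols)
    if not rows or not cols:
        raise ValueError("select at least one row and one column variable")
    # Assumption: a single diagonal cell means looking at one measure only.
    if len(rows) == 1 and rows == cols:
        return "univariate"
    if len(rows) == 1 and len(cols) == 1:
        return "bivariate"
    if len(cols) == 1:
        return "multivariate: many to one"
    if len(rows) == 1:
        return "multivariate: one to many"
    return "multivariate: many to many"


# Example: several candidate predictors (A, B, C) against one outcome (D).
print(classify(["A", "B", "C"], ["D"]))
```

A predictive model of a single project outcome, like the one agreed in the story above, would fall under "many to one" in this reading.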
What I am proposing is that studies or evaluations involving data collection and analysis should develop a transparent analysis plan, using a "data analysis matrix" of the kind shown above. At a minimum, the cells should show which relationships will be investigated. This does not mean investigators can't change their minds later as the study or evaluation progresses. But it does mean that both the original intentions and the final choices will be more visible and accountable.
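If both the original plan and the final set of analyses are recorded as sets of matrix cells, making the difference between them visible takes only a few set operations. This is a minimal sketch with hypothetical variable letters; `compare_plans` is an illustrative name, not part of any existing tool.

```python
def compare_plans(planned, actual):
    """Compare planned and actually performed analyses, each a set of (row, col) cells."""
    planned, actual = set(planned), set(actual)
    return {
        "done as planned": sorted(planned & actual),
        "planned but dropped": sorted(planned - actual),
        "added later": sorted(actual - planned),
    }


# Hypothetical cells: letters stand for variables, as in the matrices above.
original_plan = {("A", "D"), ("B", "D"), ("C", "C")}
final_analyses = {("A", "D"), ("C", "C"), ("B", "E")}

for status, cells in compare_plans(original_plan, final_analyses).items():
    print(f"{status}: {cells}")
```

Publishing the "planned but dropped" and "added later" lists alongside the results is one simple way to account for changes of mind during the study.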
Postscript: For details of the study mentioned above, see Learning from the Civil Society Challenge Fund: Predictive Modelling, Briefing Paper, September 2015.