Monday, May 25, 2015

Characterising purposive samples



In some situations it is not possible to develop a random sample of cases to examine for evaluation purposes. There may be more immediate challenges, such as finding enough cases with sufficient information and sufficient quality of information.

The problem then is knowing to what extent, if at all, the findings from this purposive sample can be generalised, even in the more informal sense of speculating on the relevance of findings to other cases in the same general population.

One way this process can be facilitated is by "characterising" the sample, a term I have taken from elsewhere. It means to describe the distinctive features of something. This could best be done using attributes or measures that can, and probably already have been, used to describe the wider population where the sample came from. For example, the sample of people could be described as being of average age of 35 versus 25 in the whole population, and 35% women versus 55% in the wider population. This seems a rather basic idea, but it is not always applied.

Another more holistic way of doing so is to measure the diversity of the sample. This is relatively easy to do when the data set associated with the sample is in binary form, as for example is used in QCA analysis (i.e. cases are rows, columns are attributes and cell values of 0 or 1 indicate if the attributes was absent or present)

As noted in earlier blog postings,Simpsons Reciprocal Index is a useful measure of diversity. This takes into account two aspects of diversity: (a) richness, which in a data set could be seen in the number of unique configurations of attributes found across all the cases( think metaphorically of organisms - cases, chromosomes-configurations and genes-attributes) and (b) evenness, which could be seen in the relative number of cases having particular configurations. When the number of cases is evenly distributed across all configurations this is seen as being more diverse than when the number of cases per configuration varies.

The degree of diversity in a data base can have consequences. Where a data set that has little diversity in terms of "richness" there is a possibility that configurations that are identified by QCA or other algorithmic based methods, will have limited external validity, because they may easily be contradicted by cases outside the sample data set that are different from already encountered configurations. A simple way of measuring this form of diversity is to calculate the original number of unique configurations in the sample data set as a percentage of the total number possible, given the number of binary attributes in the sample data set (which is 2 to the power of the number of attributes). The higher the percentage, the less risk that the findings will be contradicted by configurations found in new sets of data (all other things being constant).

Where a data set has little diversity in terms of "balance" it will be more difficult to assess the consistency of any configuration's association with an outcome, compared to others, because there will be more cases associated with some configurations than others. Where there are more cases of a given configuration there will be more opportunities for its consistency of association with an outcome to be challenged by contrary cases.

My suggestion therefore is that when results are published from the analysis of purposive samples there should be adequate characterisation of the sample, both in terms of: (a) simple descriptive statistics available on the sample and wider population, and (b) the internal diversity of the sample, relative to the maximum scores possible on the two aspects of diversity.