Wednesday, July 29, 2020

Converting a continuous variable into a binary variable i.e. dichotomising


If you Google "dichotomising data" you will find lots of warnings that this is basically a bad idea!. Why so? Because if you do so you will lose information. All those fine details of differences between observations will be lost.

But what if you are dealing with something like responses to an attitude survey? Typically these have five-pointed scales ranging from disagree to neutral to agree, or the like. Quite a few of the fine differences in ratings on this scale may well be nothing more than "noise", i.e. variations unconnected with the phenomenon you are trying to measure. A more likely explanation is that they reflect differences in respondents "response styles", or something more random

Aggregation or "binning" of observations into two classes (higher and lower) can be done in different ways. You could simply find the median value and split the observations at that point. Or, you could look for a  "natural" gap in the frequency distribution and make the split there. Or, you may have a prior theoretical reason that it makes sense to split the range of observations at some other specific point.

I have been trying out a different approach. This involved not just looking at the continuous variable I wanted to dichotomise, but also its relationship with an outcome that will be of interest in subsequent analyses. This could also be a continuous variable or a binary measure.

There are two ways of doing this. The first is a relatively simple manual approach. In the first approach, the cut-off point for the outcome variable has already been decided, by one means or another.  We then needed to vary the cut-off point in the range of values for the independent variable to see what effect they had on the numbers of observations of the outcome above and below its cut-off value. For any specific cut-off value for the independent variable an Excel spreadsheet will be used to calculate the following:
  1. # of True Positives - where the independent variable value was high and so was the outcome variable value
  2. # of False Positives - where the independent variable value was high but the outcome variable value was low
  3. # of False Negative - where the independent variable value was low but the outcome variable value was high
  4. # of True Negatives - where the independent variable value was low and the outcome variable value was low
When doing this we are in effect treating cut-off criteria for the independent variable as a predictor of the dependent variable.  Or more precisely, a predictor of the prevalence of observations with values above a specified cut-off point on the dependent variable.

In Excel, I constructed the following:
  • Cells for entering the raw data - the values of each variable for each observation
  • Cells for entering the cut-off points
  • Cells for defining the status of each observation  
  • A Confusion Matrix, to summarise the total number of observations with each of the four possible types described above.
  • A set of 6 widely used performance measures, calculated using the number of observations in each cell of the Confusion Matrix.
    • These performance measures tell me how good the chosen cut-off point is as a predictor of the outcome as specified. At best, all those observations fitting the cut-off criterion will be in the True Positive group and all those not fitting it would be in the True Negative group. In reality, there are also likely to be observations in the False Positive and False Negative groups.
By varying the cut-off points it is possible to find the best possible predictor i.e. one with very few False Positive and very few False Negatives. This can be done manually when the cut-off point for the outcome variable has already been decided.

Alternatively, if the cut-off point has not been decided for the outcome variable, a search algorithm can be used to find the best combination of two cut-off points (one for the independent and one for the dependent variable).  Within Excel, there is an add-in called Solver, which uses an evolutionary algorithm to do such a search, to find the optimal combination

Postscript 2020 11 05: An Excel file with the dichotomization formula and a built-in example data set is available here  

  
2020 08 13: Also relevant: 

Hofstad, T. (2019). QCA and the Robustness Range of Calibration Thresholds: How Sensitive are Solution Terms to Changing Calibrations? COMPASSS Working Papers, 2019–92. http://www.compasss.org/wpseries/Hofstad2019.pdf  

 This paper emphasises the importance of declaring the range of original (pre-dichotomised) values over which the performance of a predictive model remains stable 

Tuesday, April 07, 2020

Rubrics? Yes, but...




The blog posting is a response to Tom Aston's blog posting: Rubrics as a harness for complexity

I have just reviewed an evaluation of the effectiveness of policy influencing activities of programs funded by HMG as part of the International Carbon Finance Initiative.  In the technical report there are a number of uses of rubrics to explain how various judgements were made.  Here, for example, is one summarising the strength of evidence found during process tracing exercises:
  • Strong support – smoking gun (or DD) tests passed and no hoop tests (nor DDs) failed.
  • Some support – multiple straw in the wind tests passed and no hoop tests (nor DDs) failed; also, no smoking guns nor DDs passed.
  • Mixed – mixture of smoking gun or DD tests passed but some hoop tests (or DDs) failed – this required the CMO to be revised.
  • Failed – some hoop (or DD) tests failed, no double decisive or smoking gun tests passed – this required the theory to be rejected and the CMO abandoned or significantly revised. 

Another rubric described in great detail how three different levels of strength of evidence were differentiated (Convincing Plausible, Tentative).  There was no doubt in my mind that these rubrics contributed significantly to the value of the evaluation report.  Particularly by giving readers confidence in the judgements that were made by the evaluation team.

But… I can't help feel that the enthusiasm for rubrics seems to be out of proportion with their role within an evaluation.  They are a useful measurement device that can make complex judgements more transparent and thus more accountable.  Note the emphasis on the ‘more‘… There are often plenty of not necessarily so transparent judgements present in the explanatory text which is used to annotate each point in a rubric scale.  Take, for example, the first line of text in Tom Aston’s first example here, which reads “Excellent: Clear example of exemplary performance or very good practice in this domain: no weakness”

As noted in Tom’s blog it has been argued that rubrics have a wider value i.e. rubrics are useful when trying to describe and agree what success looks like for tracking changes in complex phenomena”.  This is where I would definitely argue “Buyer beware” because rubrics have serious limitations in respect of this task.

The first problem is that description and valuation are separate cognitive tasks.  Events that take place can be described, they can also be given a particular value by observers (e.g. good or bad).  This dual process is implied in the above definition of how rubrics are useful.  Both of these types of judgements are often present in a rubrics explanatory text e.g. Clear example of exemplary performance or very good practice in this domain: no weakness”

The second problem is that complex events usually have multiple facets, each of which has a descriptive and value aspect.  This is evident in the use of multiple statements linked by colons in the same example rubric I refer to above.

So for any point in a rubric’s scale the explanatory text has quite a big task on its hands.  It has to describe a specific subset of events and give a particular value to each of those.  In addition, each adjacent point on the scale has to do the same in a way that suggests there are only small incremental differences between each of these points judgements. And being a linear scale, this suggests or even requires, that there is only one path from the bottom to the top of the scale. Say goodbye to equifinality!

So, what alternatives are there, for describing and agreeing on what success looks like when trying to track changes in complex phenomena?  One solution which I have argued for, intermittently, over a period of years, is the wider use of weighted checklists.  These are described at length here.  

Their design addresses three problems mentioned above.  Firstly, description and valuation are separated out as two distinct judgements.  Secondly, the events that are described and valued can be quite numerous and yet each can be separately judged on these two criteria.  There is then a mechanism for combining these judgements in an aggregate scale. And there is more than one route from the bottom to the top of this aggregate scale.

“The proof is in the pudding”.  One particular weighted checklist, known as the Basic Necessities Survey, was designed to measure and track changes in household-level poverty.  Changes in poverty levels must surely qualify as ‘complex phenomena ‘.  Since its development in the 1990s, the Basic Necessities Survey has been widely used in Africa and Asia by international environment/conservation organisations.  There is now a bibliography available online describing some of its users and uses. https://www.zotero.org/groups/2440491/basic_necessities_survey/library








 [RD1]Impressive rubric