Tuesday, November 05, 2019

Combining the use of the Confusion Matrix as a visualisation tool with a Bayesian view of probability

Caveat: This blog posting is total re-write of an earlier version on the same subject. Hopefully, this one will be more coherent and more useful!

Quick Summary
In this revised blog I:
1. Explain what a Confusion Matrix is and what Bayes Theorem says
2. Explain three possible uses for Bayes Theorem when combined with a Confusion Matrix

What is a Confusion Matrix?

A Confusion Matrix is a tabular structure that displays four possible combinations of two types of events, each of which may have happened, or not happened. Wikipedia provides a good description.

Here is an example, with real data, taken from an EvalC3 analysis.

    TP = True Positive, FP = False Positive, FN = False Negative, TN = True Negative

In this example, the top row of the table tells us that when the attributes of a particular predictive model (as was identified by EvalC3) are present there are 8 cases where the expected outcome is also present (True Positives). But there are also 4 cases where the expected outcome is not also present (False Positives). In all the remaining cases (all in the bottom row), which do not have the attributes of the predictive model, there is one case where the outcome is nevertheless present (False Negative) and 13 cases where the outcome is not present (True Negative). As can be seen in the Wikipedia article, and also in EvalC3, there is a range of performance measures which can be used to tell is how well this particular predictive model is performing – and all of these measures are based on particular combinations of the types of values in this Confusion Matrix.

Bayes theorem

According to Wikipedia, 'Bayes' theorem (alternatively Bayes's theorem, Bayes's law or Bayes's rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event '

The Bayes formula reads as follows:

P(A|B) = The probability of A, given the presence of B
P(B|A) = The probability of B, given the presence of A
P(A) = The probability of A
P(B) = The probability of B

This formula can be calculated using data represented within a Confusion Matrix. Using the example above, the outcome being present = A in the formula, and the model attributes being present = B in the formula.  So this formula could tell us the probability of finding True Positives i.e when these are both present. Here is how the various parts of the formula are calculated, along with some alternate names for their parts:

So far, in this blog posting, the Bayes formula simply provides one additional means of evaluating the usefulness of prediction models found through the use of machine learning algorithms, using EvalC3 or other means.

Process Tracing application

But I'm more interested here in the use of the Bayes formula for process tracing purposes, something that Barbara Befani has written about. Process tracing is all about articulating and evaluating conjectured causal processes in detail. A process tracing analysis is usually focused on one case or instance, not multiple cases. It is a within-case rather than cross-case method of analysis. 

In this context, the rows and columns of the Confusion Matrix have slightly different names. The columns described whether a theory is true or not, and the rows described whether evidence of a particular kind is present or not. More importantly, the values in the cells are not numbers of actual observed cases.  Rather, they are the analyst's interpretation of what is described as the "conditional probabilities" of what is happening in one case or instance.  In the two cells in the first column, the analyst puts their own probability estimates, between zero and one, reflecting the likelihood: (a) that if the evidence is present in the theory is true, that a man was the murderer, and (B) that if the evidence is absent the theory is true. In the two cells in the second column, the analyst puts their own probability estimates, between one and two, reflecting the likelihood: (a) that if the evidence is theory is not true, that a man was not the murderer, and (B) that if the evidence is absent the theory is not true. 

Here is a notional example. The theory is that a man was the murderer. The available evidence suggests that the murderer would have needed exceptional strength.
The analyst also needs to enter their "priors". That is, their belief about the overall prevalence of the theory being true i.e. men are most often the murderers. Wikipedia suggests that 80% of murders are committed by men.These prior probabilities are entered in the third row of the Confusion Matrix, as shown below. The main cell values are then updated in the light of those new value, as also shown below

Using the Bayes formula provided above, we can now calculate P(A|B),i.e. man being the murderer if the evidence x was found..  P(A|B) = TP/(TP+FP) = 0.97

"Naive Bayes" 

This is another useful application, based on an algorithm of this name, described here. 
On that web page, an example is given of a data set where each row describes three attributes of a car (color, type and origin) and whether the car was stolen or not. Predictive models (Bayesian or otherwise)  could be developed to identify how well each of these attributes predicts whether a car is stolen or not. In addition, we may want to know how good a predictor the combination of all these three individual predictors is. But the dataset does not include any examples of these types of cases.

The article then explains how the probability of a combination of all three of these attributes can be used to predict whether a car is stolen or not.

1. Calculate (TP/(TP+FP)) for color * (TP/(TP+FP)) for type * (TP/(TP+FP)) for origin 
2. Calculate (FP/(TP+FP)) for color * (FP/(TP+FP)) for type * (FP/(TP+FP)) for origin
3. Compare the two calculated values. If the first is higher classify a car as most likely stolen. If it is lower, classify a car as most likely not stolen.

A caution: Naive Bayes calculations assume (as its name suggests) that each of the attributes of the predictive model is causally independent. This may not always be the case.

In summary

Bayes formula seems to have three uses:

1. As an additional performance measure when evaluating predictive models generated by any algorithm, or other means. Here the cell values do represent numbers of individual cases.

2.  As a way of measuring the probability of a particular causal mechanism working as proposed, within the context of a process-tracing exercise. Here the cell values are conjectures about relative probabilities relating to a specific case, not numbers of individual cases.

3.  As a way of measuring the probability of a combination of predictive models being a good predictor of an outcome of concern. Here the cell values could represent either multiple real cases or conjectured probabilities (part of a Bayesian analysis of a causal mechanism) regarding events within one case only.