Tuesday, November 05, 2019

Using Bayesian probability measures to evaluate Confusion Matrix data


A few years ago, thanks to Barbara Befani, I was introduced to Bayesian probability as a way of thinking about the analysis of causes. But I can't say I found Bayes' Theorem an easy idea to get my head around. More recently, I was encouraged to try again while reading Pedro Domingos' "The Master Algorithm", which has a useful chapter titled "In the Church of Reverend Bayes".

I was especially interested in how a Bayesian measure known as the Posterior Probability of an association between two events could be calculated from the values found in a Confusion Matrix. The Confusion Matrix is a simple 2 x 2 matrix (i.e. a truth table of possible combinations) that is the basis of a wide range of measures of the accuracy of predictive models (aka classifiers), amongst other things. A Confusion Matrix is built into EvalC3 as a means of evaluating the performance of the search algorithms used to find combinations of attributes that are the best predictors of an outcome of interest. My idea was that a formula for Posterior Probability could be added as an additional, optional performance measure.

The challenge for me was how to translate Bayesian terminology into Confusion Matrix terminology, and thus be able to implement the Bayes formula using the contents of a Confusion Matrix.

The Bayes formula is: P(A|B) = (P(B|A) * P(A)) / P(B)
              which can be read as saying...
                       P(cause|effect) = (P(effect|cause) * P(cause)) / P(effect)

The expression "cause|effect" refers to the probability of a proposed cause being present amongst all those cases where the effect is present

In a Confusion Matrix, as shown below in an example from EvalC3, the top row represents cases where a proposed cause is present, and the bottom row represents cases where it is absent. The left column represents cases where the effect is present, and the right column represents cases where the effect is absent.
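In this layout, the four cells hold the familiar classification counts (TP = True Positive, FP = False Positive, FN = False Negative, TN = True Negative):

                     Effect present    Effect absent
    Cause present         TP                FP
    Cause absent          FN                TN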


So, the Bayes formula above can be converted into a Confusion Matrix-based formula. Each of the Bayesian terms corresponds to a ratio of cell counts:

P(cause|effect) = TP / (TP + FN)
P(effect|cause) = TP / (TP + FP)
P(cause) = (TP + FP) / (TP + FP + FN + TN)
P(effect) = (TP + FN) / (TP + FP + FN + TN)

Substituting these into the Bayes formula gives:

TP / (TP + FN) = (TP / (TP + FP) * (TP + FP) / (TP + FP + FN + TN)) / ((TP + FN) / (TP + FP + FN + TN))

For the above Confusion Matrix example, this translates as 0.88 = (0.67 * 0.46) / 0.35. That is, P(effect|cause) = 0.67, P(cause) = 0.46, P(effect) = 0.35, and so P(cause|effect) = 0.88.
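The same calculation can be sketched in a few lines of Python. This is a minimal sketch, not EvalC3's implementation, and the cell counts used at the end are illustrative assumptions, chosen only because they reproduce, approximately, the rounded probabilities quoted above:

```python
def posterior_probability(tp, fp, fn, tn):
    """P(cause|effect) via the Bayes formula, from the four
    cells of a 2 x 2 Confusion Matrix."""
    n = tp + fp + fn + tn
    p_effect_given_cause = tp / (tp + fp)  # P(effect|cause)
    p_cause = (tp + fp) / n                # P(cause)
    p_effect = (tp + fn) / n               # P(effect)
    # The n terms cancel, so this is simply tp / (tp + fn).
    return p_effect_given_cause * p_cause / p_effect

# The worked example above, computed from the already-rounded probabilities:
print(round((0.67 * 0.46) / 0.35, 2))  # 0.88

# Illustrative counts (an assumption, not the actual EvalC3 values) that give
# roughly the same probabilities: 8/12 = 0.67, 12/26 = 0.46, 9/26 = 0.35.
print(round(posterior_probability(tp=8, fp=4, fn=1, tn=13), 2))  # 0.89
```

The small gap between 0.88 and 0.89 is just rounding: the figure in the text was computed from probabilities already rounded to two decimal places, while the function works from the raw counts.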

So what?


How useful is any of this as a means of evaluating search algorithm results? I tried it out as a performance measure using the Krook data set built into EvalC3. It led to the presence of quotas being identified as the best single predictor of a higher level of women's representation in the parliaments of 26 African countries. Seven other performance measures built into EvalC3 identified the same predictor.

When this search was repeated, this time looking for the combination of two attributes that was the best predictor, the Bayes formula came up with the same model as the other measures. But this model performed worse, not better, than the single-attribute model when measured in terms of posterior probability, whereas it performed better than the single-attribute model when the other performance measures were used. Right now, I am not sure how to interpret this difference! It may be that there is no inconsistency, in that different things are being measured.