I was especially interested in how a Bayesian measure known as the Posterior Probability of two events being associated could be calculated, given the values that can be found in a Confusion Matrix. The Confusion Matrix is a simple 2 x 2 matrix (i.e. a truth table of possible combinations) that is the basis of a wide range of measures that can be used to measure the accuracy of predictive models (aka classifiers), amongst other things. A Confusion Matrix is built into EvalC3, as a means of evaluating the performance of search algorithms used to find combinations of attributes that are the best predictors of an outcome of interest. My idea was that a formula for Posterior Probability could be added in as an additional optional performance measure.

The Bayes formula is: P(A|B) = (P(B|A)*P(A))/ P(B),

which can be read as saying...

P(cause|effect) = (P(effect|cause) * P(cause)) / P(effect)

The expression "cause|effect" refers to the probability of a proposed cause being present amongst all those cases where the effect is present.

In a Confusion Matrix, as shown below in an example from EvalC3, the top row represents cases where a proposed cause is present, and the bottom row represents cases where it is absent. The left column represents cases where the effect is present, and the right column represents cases where the effect is absent.

So, the Bayes formula above can be converted into a Confusion Matrix-based formula like this:

TP/(TP+FN) = ((TP/(TP+FP)) * ((TP+FP)/(TP+FP+FN+TN))) / ((TP+FN)/(TP+FP+FN+TN))

For the above Confusion Matrix example, this translates as 0.88 = (0.67 * 0.46) / 0.35
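The conversion can be sketched in a few lines of Python. This is only an illustration, not part of EvalC3: the function name `posterior_from_confusion_matrix` and the cell counts are hypothetical, chosen for easy arithmetic, and are not the counts behind the example above. The sketch computes P(cause|effect) via the Bayes formula and compares it with the direct ratio TP/(TP+FN).

```python
def posterior_from_confusion_matrix(tp, fp, fn, tn):
    """Return P(cause|effect) computed via the Bayes formula.

    Rows are cause present/absent, columns are effect present/absent,
    so TP = cause and effect present, FP = cause present, effect absent,
    FN = cause absent, effect present, TN = both absent.
    """
    total = tp + fp + fn + tn
    if total == 0 or (tp + fp) == 0 or (tp + fn) == 0:
        raise ValueError("need at least one case with the cause and one with the effect")

    p_effect_given_cause = tp / (tp + fp)   # P(effect|cause)
    p_cause = (tp + fp) / total             # P(cause)
    p_effect = (tp + fn) / total            # P(effect)
    return (p_effect_given_cause * p_cause) / p_effect


# Hypothetical counts, for illustration only
tp, fp, fn, tn = 30, 10, 20, 40

via_bayes = posterior_from_confusion_matrix(tp, fp, fn, tn)
direct = tp / (tp + fn)  # the left-hand side of the converted formula

print(round(via_bayes, 2), round(direct, 2))
```

The two values agree because the (TP+FP) and (TP+FP+FN+TN) terms cancel, leaving TP/(TP+FN).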

### So what?

When this search was reiterated, to look for a combination of two attributes that were the best predictors, the Bayes formula came up with the same model that was found by the other measures. But this model performed worse, not better, than the single-attribute model when measured in terms of posterior probability, whereas the same two-attribute model performed better than the single-attribute models when other performance measures were used. Right now, I am not sure how to interpret this difference! It may be that there is no inconsistency, in that different things are being measured.
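The sketch below shows one way two measures can pull in different directions. It rests on loud assumptions: all counts are hypothetical, and TP/(TP+FP) is used only as a stand-in for the unnamed "other performance measures". It is not a reconstruction of the actual EvalC3 results. In it, adding a second attribute narrows the set of cases where the "cause" is present. This removes many FPs, but it also turns some TPs into FNs.

```python
def p_cause_given_effect(tp, fp, fn, tn):
    # Posterior probability as defined above: TP / (TP + FN)
    return tp / (tp + fn)


def p_effect_given_cause(tp, fp, fn, tn):
    # Stand-in for another performance measure (assumption): TP / (TP + FP)
    return tp / (tp + fp)


# Hypothetical counts; both models cover the same 100 cases, 40 with the effect
single_attribute = dict(tp=30, fp=20, fn=10, tn=40)
two_attributes = dict(tp=24, fp=4, fn=16, tn=56)

for name, cm in [("single", single_attribute), ("two", two_attributes)]:
    print(
        name,
        round(p_cause_given_effect(**cm), 2),
        round(p_effect_given_cause(**cm), 2),
    )
```

With these made-up counts, the two-attribute model scores lower on TP/(TP+FN) but higher on TP/(TP+FP), which fits the idea that the measures capture different things.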