NeuroCOLT

Neural Networks and Computational Learning Theory

 

NeuroCOLT Technical Report NC-TR-98-008


Bayesian Classifiers are Large Margin Hyperplanes in a Hilbert Space


Nello Cristianini
Bristol

John Shawe-Taylor
RHUL

Peter Sykacek
ARIAI
Vienna

Keywords: Bayesian classifiers; Hilbert space; large margin

Received: 06-MAR-1998


Abstract
Bayesian algorithms for Neural Networks are known to produce classifiers which are very resistant to overfitting. It is often claimed that one of the main distinctive features of Bayesian learning algorithms is that they do not simply output one hypothesis, but rather an entire probability distribution over a hypothesis set: the Bayes posterior. An alternative perspective is that they output a linear combination of classifiers, whose coefficients are given by Bayes' theorem. One of the concepts used to deal with thresholded convex combinations is the 'margin' of the hyperplane with respect to the training sample, which is correlated with the predictive power of the hypothesis itself. We provide a novel theoretical analysis of such classifiers, based on data-dependent VC theory, proving that they can be expected to be large margin hyperplanes in a Hilbert space. We then present experimental evidence that the predictions of our model are correct, i.e. that Bayesian classifiers really find hypotheses which have large margin on the training examples. This not only explains the remarkable resistance to overfitting exhibited by such classifiers, but also places them in the same class as other systems, such as Support Vector Machines and AdaBoost, which have similar performance.
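
As an illustration of the "linear combination of classifiers" view, the following is a minimal sketch, not the report's method or experimental setup. It assumes a toy finite hypothesis set of one-dimensional threshold classifiers, a uniform prior, and a label-noise likelihood with a hypothetical noise rate of 0.1; the function names are invented for this example. The posterior weights form a convex combination, and the margin of a training example (x, y) is y times the weighted vote, a value in [-1, 1].

```python
import numpy as np

def stump_predictions(x, thresholds):
    # One row per hypothesis, one column per example:
    # h_t(x) = +1 if x > t, else -1.
    return np.where(x[None, :] > thresholds[:, None], 1.0, -1.0)

def bayes_posterior(preds, y, noise=0.1):
    # Assumed likelihood: each label is flipped independently with
    # probability `noise`; the prior over hypotheses is uniform.
    errors = (preds != y[None, :]).sum(axis=1)
    n = y.shape[0]
    log_post = errors * np.log(noise) + (n - errors) * np.log(1.0 - noise)
    log_post -= log_post.max()          # stabilise before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()      # convex-combination coefficients

def margins(preds, weights, y):
    # Posterior-weighted vote f(x) in [-1, 1]; margin of (x, y) is y * f(x).
    vote = weights @ preds
    return y * vote

# Toy, linearly separable training sample (illustrative data only).
x = np.array([-2.0, -1.5, -0.5, 0.5, 1.0, 2.0])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
thresholds = np.linspace(-2.5, 2.5, 11)

preds = stump_predictions(x, thresholds)
weights = bayes_posterior(preds, y)
example_margins = margins(preds, weights, y)

print("posterior weights:", np.round(weights, 3))
print("margins:", np.round(example_margins, 3))
print("minimum margin:", example_margins.min())
```

Thresholding the weighted vote at zero gives the classifier's prediction; a positive margin on every training example means the thresholded combination classifies the whole sample correctly, and the smallest margin measures how confidently it does so.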
