NeuroCOLT Technical Report NC-TR-98-008
Bayesian Classifiers are Large Margin Hyperplanes in a Hilbert Space
Nello Cristianini, Bristol
John Shawe-Taylor, RHUL
Peter Sykacek, ARIAI, Vienna
Keywords: Bayesian classifiers; Hilbert space; large margin
Received: 06-MAR-1998
Abstract
Bayesian algorithms for neural networks are known to produce classifiers
that are very resistant to overfitting. It is often claimed that
one of the main distinctive features of Bayesian learning algorithms
is that they do not simply output one hypothesis, but rather an entire
probability distribution over a hypothesis set: the Bayes posterior.
An alternative perspective is that they output a linear combination
of classifiers, whose coefficients are given by Bayes' theorem. One
of the concepts used to deal with thresholded convex combinations
is the `margin' of the hyperplane with respect to the training sample,
which is correlated with the predictive power of the hypothesis itself.
We provide a novel theoretical analysis of such classifiers, based
on data-dependent VC theory, proving that they can be expected to
be large margin hyperplanes in a Hilbert space. We then present experimental
evidence that the predictions of our model are correct, i.e. that
Bayesian classifiers really do find hypotheses which have large margin
on the training examples. This not only explains the remarkable
resistance to overfitting exhibited by such classifiers, but also
places them in the same class as other systems, such as Support Vector
Machines and AdaBoost, which have similar performance.
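To make the "linear combination with coefficients given by Bayes' theorem" view and the margin quantity concrete, here is a minimal sketch. It is not the report's analysis (which uses data-dependent VC theory in a Hilbert space); it only illustrates the objects involved, under loudly stated assumptions: a small finite hypothesis set of hypothetical 1-D threshold classifiers with outputs in {-1, +1}, a uniform prior, and a likelihood that assumes independent label flips with probability `noise = 0.1`. The function names `posterior_weights` and `margins` are hypothetical. The margin of example i is taken to be y_i times the posterior-weighted vote, which lies in [-1, 1] because the weights sum to one.

```python
import numpy as np

def posterior_weights(preds, y, noise=0.1, prior=None):
    """Posterior P(h | S) over a finite hypothesis set (illustrative assumption).

    preds: shape (n_hyp, n_samples), entries in {-1, +1}
    y:     shape (n_samples,), entries in {-1, +1}
    noise: assumed label-flip probability used in the likelihood
    """
    n_hyp, n_samples = preds.shape
    if prior is None:
        prior = np.full(n_hyp, 1.0 / n_hyp)  # uniform prior (assumption)
    errors = (preds != y).sum(axis=1)
    correct = n_samples - errors
    # Log-likelihood under independent label noise.
    log_post = np.log(prior) + correct * np.log(1 - noise) + errors * np.log(noise)
    log_post -= log_post.max()  # shift for numerical stability
    w = np.exp(log_post)
    return w / w.sum()  # normalize: Bayes' theorem coefficients

def margins(preds, y, w):
    """Margin y_i * sum_h w_h h(x_i) of the thresholded convex combination."""
    return y * (w @ preds)

# Hypothetical toy data: 1-D points and threshold classifiers h_t(x) = sign(x - t).
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([-1, -1, -1, 1, 1, 1])
thresholds = np.linspace(-2.5, 2.5, 11)
preds = np.where(x[None, :] > thresholds[:, None], 1, -1)

w = posterior_weights(preds, y)
m = margins(preds, y, w)
print("posterior weights:", np.round(w, 3))
print("smallest training margin:", m.min())
```

The prediction of the combined classifier is the sign of `w @ preds`; a training point is correctly classified exactly when its margin is positive, and a large minimum margin means the posterior-weighted vote is confidently correct on every training example.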