Abstract
Maximum a posteriori optimization of parameters and the Laplace approximation for the marginal likelihood are both basis-dependent methods. This note compares two choices of basis for models parameterized by probabilities, showing that it is possible to improve on the traditional choice, the probability simplex, by transforming to the 'softmax' basis.
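The basis dependence described in the abstract can be illustrated with a small numerical sketch. It is not taken from the note itself: it assumes a two-outcome model with an unnormalized Beta-shaped posterior p^(a-1) (1-p)^(b-1), and it uses the one-parameter logit transform p = 1/(1 + exp(-x)) as a simplified stand-in for the 'softmax' basis. The function names and the values a = 3, b = 2 are hypothetical choices for illustration. In this setting the exact normalizing constant is the Beta function B(a, b), so the two Laplace estimates can be compared against it directly.

```python
import math


def log_evidence_exact(a, b):
    """Exact log normalizer of p^(a-1) (1-p)^(b-1) on [0, 1]: log B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def laplace_simplex(a, b):
    """Laplace approximation in the probability basis p.

    Assumes a > 1 and b > 1 so that the mode lies strictly inside (0, 1).
    Returns (mode in p, approximate log normalizer).
    """
    p = (a - 1.0) / (a + b - 2.0)  # MAP estimate in the p basis
    log_f = (a - 1.0) * math.log(p) + (b - 1.0) * math.log(1.0 - p)
    # Negative second derivative of the log density with respect to p.
    curvature = (a - 1.0) / p**2 + (b - 1.0) / (1.0 - p) ** 2
    return p, log_f + 0.5 * math.log(2.0 * math.pi / curvature)


def laplace_softmax(a, b):
    """Laplace approximation in the logit basis x, where p = 1/(1+exp(-x)).

    The Jacobian dp/dx = p (1-p) turns the density into p^a (1-p)^b in x.
    Returns (mode mapped back to p, approximate log normalizer).
    """
    p = a / (a + b)  # MAP estimate in the x basis, mapped back to p
    log_g = a * math.log(p) + b * math.log(1.0 - p)
    # Negative second derivative of the log density with respect to x.
    curvature = (a + b) * p * (1.0 - p)
    return p, log_g + 0.5 * math.log(2.0 * math.pi / curvature)


if __name__ == "__main__":
    a, b = 3.0, 2.0  # hypothetical posterior parameters
    exact = log_evidence_exact(a, b)
    p_simplex, log_z_simplex = laplace_simplex(a, b)
    p_softmax, log_z_softmax = laplace_softmax(a, b)
    print("MAP in p basis:      ", p_simplex)
    print("MAP in logit basis:  ", p_softmax)
    print("log Z exact:         ", exact)
    print("log Z Laplace (p):   ", log_z_simplex)
    print("log Z Laplace (logit):", log_z_softmax)
```

Under these assumptions the two bases give different MAP estimates for the same posterior, and for this choice of a and b the logit-basis estimate of the normalizing constant lies closer to the exact value than the simplex-basis estimate. This is one example only and does not establish the general result.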
Cite this article
MacKay, D.J. Choice of Basis for Laplace Approximation. Machine Learning 33, 77–86 (1998). https://doi.org/10.1023/A:1007558615313
DOI: https://doi.org/10.1023/A:1007558615313