Abstract
Several scoring metrics are used in different search procedures for learning probabilistic networks. We study the properties of cross entropy in learning a decomposable Markov network. Although entropy and related scoring metrics have been widely used, their “microscopic” properties and asymptotic behavior during a search have not been analyzed. We present such a “microscopic” study of a minimum entropy search algorithm, and show that it learns an I-map of the domain model when the data size is large.
Search procedures that modify a network structure one link at a time have been commonly used for efficiency. Our study indicates that a class of domain models cannot be learned by such procedures. This suggests that prior knowledge about the problem domain, combined with a multi-link search strategy, provides an effective way to uncover many domain models.
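The single-link search discussed above can be sketched as a greedy loop that repeatedly adds the edge buying the largest empirical entropy decrement, i.e., the pair with the highest mutual information. The sketch below is illustrative only, not the authors' algorithm; the function names and the stopping threshold are assumptions. A parity-style distribution (one variable the XOR of two others) shows the limitation the abstract describes: every pairwise mutual information is zero, so a one-link-at-a-time search adds nothing even though a strong three-way dependence exists.

```python
# A minimal sketch (not the paper's algorithm) of single-link greedy search:
# repeatedly add the edge whose empirical mutual information -- the entropy
# decrement it buys -- is largest. `threshold` is an illustrative assumption.
from collections import Counter
from itertools import combinations
from math import log2

def entropy(counts, n):
    """Empirical entropy (bits) from a Counter of outcomes over n samples."""
    return -sum(c / n * log2(c / n) for c in counts.values() if c)

def mutual_information(data, i, j):
    """I(X_i; X_j) = H(X_i) + H(X_j) - H(X_i, X_j), from tuples in `data`."""
    n = len(data)
    hi = entropy(Counter(r[i] for r in data), n)
    hj = entropy(Counter(r[j] for r in data), n)
    hij = entropy(Counter((r[i], r[j]) for r in data), n)
    return hi + hj - hij

def greedy_single_link(data, n_vars, threshold=0.01):
    """Add one link at a time, always the best-scoring remaining pair."""
    edges = []
    candidates = set(combinations(range(n_vars), 2))
    while candidates:
        best = max(candidates, key=lambda e: mutual_information(data, *e))
        if mutual_information(data, *best) < threshold:
            break  # no single link lowers entropy enough -> stop
        edges.append(best)
        candidates.remove(best)
    return edges

# Parity data: X3 = X1 XOR X2, uniform over the consistent triples.
xor_data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(greedy_single_link(xor_data, 3))  # -> [] : every pairwise MI is zero
```

On this parity sample the search terminates with no edges, whereas a multi-link step that considers the triple jointly would detect the dependence, which is the intuition behind the multi-link strategy suggested above.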
Cite this article
Xiang, Y., Wong, S. & Cercone, N. A “Microscopic” Study of Minimum Entropy Search in Learning Decomposable Markov Networks. Machine Learning 26, 65–92 (1997). https://doi.org/10.1023/A:1007324100110