III. Probabilistic models
A probability distribution model for information retrieval

https://doi.org/10.1016/0306-4573(89)90090-3

Abstract

A probability distribution model for information retrieval is proposed in this article. In contrast to other approaches, each document in the new model is characterized by a simple probability measure on the set of index terms. Based on this interpretation of document representation, two different retrieval strategies are discussed. One is based on utility theory and the other is derived from information theory. The new model not only enhances retrieval effectiveness as demonstrated by experiments, but also provides valuable insight into many fundamental concepts introduced over the years in a variety of retrieval models.
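As a rough illustration of the document representation described in the abstract, the sketch below treats each document as a probability measure on a fixed set of index terms and ranks documents against a query with an information-theoretic score. The abstract does not give the article's actual formulas, so everything specific here is an assumption made for illustration: the use of normalized term counts as the probability measure, the choice of Kullback-Leibler divergence as the measure, the `epsilon` smoothing, and the helper names `term_distribution`, `kl_divergence`, and `rank`. It should not be read as the authors' method.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary):
    """Turn a token list into a probability measure over the index terms.

    Assumption: probabilities are normalized term counts.
    """
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no index terms")
    return {term: counts[term] / total for term in vocabulary}


def kl_divergence(p, q, epsilon=1e-9):
    """Kullback-Leibler divergence KL(p || q).

    Assumption: q is smoothed by a small additive epsilon so that terms
    absent from a document give a large finite penalty instead of infinity.
    """
    return sum(
        p_t * math.log(p_t / (q.get(term, 0.0) + epsilon))
        for term, p_t in p.items()
        if p_t > 0
    )


def rank(query_tokens, documents, vocabulary):
    """Rank documents by increasing divergence from the query distribution."""
    query_dist = term_distribution(query_tokens, vocabulary)
    scored = [
        (name, kl_divergence(query_dist, term_distribution(tokens, vocabulary)))
        for name, tokens in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy collection, for illustration only.
vocabulary = ["probability", "retrieval", "vector", "space", "model"]
documents = {
    "d1": ["probability", "retrieval", "retrieval"],
    "d2": ["vector", "space", "model"],
}
ranking = rank(["probability", "retrieval"], documents, vocabulary)
print([name for name, _ in ranking])
```

In this toy collection the query shares both of its terms with `d1` and none with `d2`, so `d1` ranks first. A utility-theoretic strategy, the other approach the abstract mentions, would replace `kl_divergence` with a scoring function of a different form; the abstract does not specify it, so none is sketched here.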


An early version of this work was presented during the New Orleans ACM SIGIR meeting, June 3–5, 1987, and appeared as “A statistical similarity measure” on pages 3–12 in Proceedings of the Tenth Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, edited by C.T. Yu and C.J. van Rijsbergen. This final version was submitted January 27, 1988.
