Abstract
This paper discusses a novel distributed adaptive algorithm and representation used to construct populations of adaptive Web agents. These InfoSpiders browse networked information environments on-line in search of pages relevant to the user, by traversing hyperlinks in an autonomous and intelligent fashion. Each agent adapts to the spatial and temporal regularities of its local context thanks to a combination of machine learning techniques inspired by ecological models: evolutionary adaptation with local selection, reinforcement learning and selective query expansion by internalization of environmental signals, and optional relevance feedback. We evaluate the feasibility and performance of these methods in three domains: a general class of artificial graph environments, a controlled subset of the Web, and (preliminarly) the full Web. Our results suggest that InfoSpiders could take advantage of the starting points provided by search engines, based on global word statistics, and then use linkage topology to guide their search on-line. We show how this approach can complement the current state of the art, especially with respect to the scalability challenge.
Article PDF
Similar content being viewed by others
References
Armstrong, R., Freitag, D., Joachims, T., & Mitchell, T. (1995). Webwatcher: A learning apprentice for the world wide web. AAAI Spring Symposium Information Gathering from Heterogeneous, Distributed Environments.
Arocena, G.O., Mendelzon, A.O., & Mihaila, G.A. (1997). Applications of a web query language. Proc. 6th International World Wide Web Conference.
Balabanović, M. (1997). An adaptive web page recommendation service. Proc. 1st International Conference on Autonomous Agents.
Belew, R.K. (forthcoming). Finding out about: Search engine technology from a cognitive perspective. Cambridge University Press.
Belew, R.K. & Mitchell, M. (Eds.) (1996), Adaptive individuals in evolving populations: models and algorithms. Santa Fe Institute Studies in the Sciences of Complexity. Reading, MA: Addison Wesley.
Bollacker, K.D., Lawrence, S., & Giles, C.L. (1998). CiteSeer: An autonomous web agent for automatic retrieval and identification of interesting publications. Proc. 2nd International Conference on Autonomous Agents.
Bowman, C.M., Danzig, P.B., Manber, U., & Schwartz, M.F. (1994). Scalable internet resource discovery: Research problems and approaches. Communications of the ACM, 37(8), 98–107.
Chakrabarti, S., Dom, B., Gibson, D., Kumar, S.R., Raghavan, P., Rajagopalan, S., & Tomkins, A. (1998a). Experiments in topic distillation. ACM SIGIR Workshop on Hypertext Information Retrieval on the Web.
Chakrabarti, S., Dom, B., & Indyk, P. (1998b). Enhanced hypertext categorization using hyperlinks. Proc. ACM SIGMOD, Seattle, WA.
Chakrabarti, S., Dom, B., Raghavan, P., Rajagopalan, S., Gibson, D., & Kleinberg, J. (1998c). Automatic resource compilation by analyzing hyperlink structure and associated text. Proc. 7th International World Wide Web Conference.
Cohen, W.W. (1996). Learning trees and rules with set-valued features. Proc. 13th AI Conference (pp. 709–716).
Cooper, W.S. (1968). Expected search length: A single measure of retrieval effectiveness based on weak ordering action of retrieval systems. Journal of the American Society for Information Science, 19, 30–41.
Croft, W.B. & Turtle, H.R. (1993). Retrieval strategies for hypertext. Information Processing and Management, 29(3), 313–324.
De Bra, P.M.E. & Post, R.D.J. (1994). Information retrieval in the world wide web: Making client-based searching feasible. Proc. 1st International World Wide Web Conference, Geneva.
De Jong, K.A. (1975). An analysis of the behavior of a class of genetic adaptive systems. Ph.D. Thesis, University of Michigan.
De Jong, K.A. & Sarma, J. (1995). On decentralizing selection algorithms. Proc. 6th International Conference on Genetic Algorithms.
Digital Equipment Corporation. http://altavista.digital.com.
Eichmann, D. (1994). The RBSE spider-Balancing effective search against Web load. Proc. 1st International World Wide Web Conference. Encyclopaedia Britannica, Inc. http://www.eb.com. Excite. http://www.excite.com.
Fox, C. (1992). Lexical analysis and stop lists. Information retrieval: Data structures and algorithms. Prentice-Hall.
Frakes, W.B. (1992). Stemming algorithms. Information retrieval: Data structures and algorithms. Prentice-Hall.
Goldberg, D.E. & Richardson, J. (1987). Genetic algorithms with sharing for multimodal function optimization. Proc. 2nd International Conference on Genetic Algorithms.
Harman, D. (1992). Relevance feedback and other query modification techniques. Information retrieval: Data structures and algorithms. Prentice-Hall.
Hart, W.E. & Belew, R.K. (1996). Optimization with genetic algorithm hybrids that use local search. Adaptive individuals in evolving populations: Models and algorithms. Addison Wesley.
Huberman, B.A., Pirolli, P.L.T., Pitkow, J.E., & Lukose, R.M. (1998). Strong regularities in world wide web surfing. Science, 280(5360), 95–97.
Koutsoupias, E., Papadimitriou, C., & Yannakakis, M. (1996). Searching a fixed graph. Proc. 23rd International Colloquium on Automata, Languages and Programming, Berlin, Germany (pp. 280–289). Springer-Verlag.
Kwok, K.L. (1988). On the use of bibliographically related titles for the enhancement of document representations. Information Processing and Management, 24(2), 123–131.
Larson, R.R. (1996). Bibliometrics of the world wide web: An exploratory analysis of the intellectual structure of cyberspace. Proc. 1996 Annual ASIS Meeting.
Lawrence, S.R. & Giles, C.L. (1998). Searching the world wide web. Science, 280, 98–100.
Lewis, D.D. (1995). Active by accident: Relevance feedback in information retrieval. AAAI Fall Symposium on Active Learning.
Lewis, D.D. (1996). Information retrieval and the statistics of large data sets. Proc. NRC Massive Data Sets Workshop, Washington, DC.
Lewis, D.D. (1997). Challenges in machine learning for text classification. Proc. 9th Annual Conference on Computational Learning Theory, New York, NY.
Lewis, D.D. & Hayes, P.H. (1994). Special issue on text classification (guest editorial). ACM Transactions on Information Systems, 12(3), 231.
Lewis, D.D., Schapire, R.E., Callan, J.P., & Papka, R. (1996). Training algorithms for linear text classifiers. Proc. ACM SIGIR (pp. 298–306).
Lieberman, H. (1997). Autonomous interface agents. Proc. ACMConference on Computers and Human Interface, Atlanta, GA.
Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning, and teaching. Machine Learning, 8, 293–321.
Lukose, R.M. & Huberman, B.A. (1998). Surfing as a real option. Proc. 4th Intl. Conf. on Computational Economics.
Lycos. http://www.lycos.com.
Mahfoud, S.W. (1994). Population sizing for sharing methods. Foundations of Genetic Algorithms 3.
Mahfoud, S.W. (1995). A comparison of parallel and sequential niching methods. Proc. 6th International Conference on Genetic Algorithms.
Menczer, F. (1997). ARACHNID: Adaptive retrieval agents choosing heuristic neighborhoods for information discovery. Proc. 14th International Conference on Machine Learning.
Menczer, F. (1998). Life-like agents: Internalizing local cues for reinforcement learning and evolution. Ph.D. Thesis, University of California, San Diego.
Menczer, F. & Belew, R.K. (1996). Latent energy environments. Adaptive individuals in evolving populations: Models and algorithms. Addison Wesley.
Menczer, F. & Belew, R.K. (1998a). Adaptive information agents in distributed textual environments. Proc. 2nd International Conference on Autonomous Agents, Minneapolis, MN.
Menczer, F. & Belew, R.K. (1998b). Local selection. Proc. 7th Annual Conference on Evolutionary Programming, San Diego, CA.
Menczer, F. & Monge, A.E. (1999). Scalable web search by adaptive online agents: An InfoSpiders case study. In M. Klusch (Ed.), Intelligent information agents: Agent-based Information discovery and management on the internet. Springer.
Menczer, F., Willuhn, W., & Belew, R.K. (1994). An endogenous fitness paradigm for adaptive information agents. CIKM Workshop on Intelligent Information Agents.
Mitchell, T. (1997). Machine Learning (Ch. 4). New York, NY: McGraw-Hill.
Monge, A.E. & Elkan, C.P. (1996). The WEBFIND tool for finding scientific papers over the worldwide web. Proceedings of the 3rd International Congress on Computer Science Research.
Moukas, A. & Zacharia, G. (1997). Evolving a multi-agent information filtering solution in amalthaea. Proc. 1st International Conference on Autonomous Agents.
Pack Kaelbling, L., Littman, M.L., & Moore, A.W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.
Pasquale, J. (1998). The UCSD Active Web. http://www.cs.ucsd.edu/Department/Infra.
Pinkerton, B. (1994). Finding what people want: Experiences with the Webcrawler. Proc. 1st International World Wide Web Conference.
Rivlin, E., Botafogo, R., & Shneiderman, B. (1994). Navigating in hyperspace: Designing a structure-based toolbox. Communications of the ACM, 37(2), 87–96.
Robertson, S.E. & Spark Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science (pp. 129–146).
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge MA: Bradford Books (MIT Press).
Salton, G. (1963). Associative document retrieval techniques using bibliographic information. Journal of the ACM, 10(4), 440–457.
Salton, G. & McGill, M.J. (1983). An introduction to modern information retrieval. New York, NY: McGraw-Hill.
Savoy, J. (1996). An extended vector processing scheme for searching information in hypertext. Information Processing and Management, 32(2), 155–170.
Shakes, J., Langheinrich, M., & Etzioni, O. (1997). Dynamic reference sifting: A case study in the homepage domain. Proc. 6th International World Wide Web Conference.
Sparck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 111–121.
Sparck Jones, K. (1979). Experiments in relevance weighting of search terms. Information Processing and Management, 15, 133–144.
Spertus, E. (1997). Parasite: Mining structural information on the web. Proc. 6th International World Wide Web Conference.
Srinivasan, P. (1998). Personal communication.
Steier, A.M. (1994). Statistical semantics of phrases in hierarchical contexts. Ph.D. Thesis, Computer Science and Engineering Department, U.C. San Diego.
Steier, A.M. & Belew, R.K. (1994). Exporting phrases: A statistical analysis of topical language. In R. Casey & B. Croft (Eds.), 2nd Symposium on Document Analysis and Information Retrieval.
Sutton, R. (1996). Reinforcement learning and information access. AAAI Spring Symposium on Machine Learning and Information Access.
van Rijsbergen, C.J. (1979). Information Retrieval, second edition, London: Butterworths.
Weiss, R., Velez, B., Sheldon, M., Nemprempre, C., Szilagyi, P., & Giffor, D.K. (1996). Hypursuit: A hierarchical network search engine that exploits content-link hypertext clustering. Proc. Seventh ACM Conference on Hypertext.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Menczer, F., Belew, R.K. Adaptive Retrieval Agents: Internalizing Local Context and Scaling up to the Web. Machine Learning 39, 203–242 (2000). https://doi.org/10.1023/A:1007653114902
Issue Date:
DOI: https://doi.org/10.1023/A:1007653114902