Library

  • 1
    Digital media
    Palo Alto, Calif. : Annual Reviews
    Annual Review of Neuroscience 26 (2003), pp. 381-410
    ISSN: 0147-006X
    Source: Annual Reviews Electronic Back Volume Collection 1932-2001ff
    Subject: Biology, Medicine
    Notes: Abstract: In the vertebrate nervous system, sensory stimuli are typically encoded through the concerted activity of large populations of neurons. Classically, these patterns of activity have been treated as encoding the value of the stimulus (e.g., the orientation of a contour), and computation has been formalized in terms of function approximation. More recently, there have been several suggestions that neural computation is akin to a Bayesian inference process, with population activity patterns representing uncertainty about stimuli in the form of probability distributions (e.g., the probability density function over the orientation of a contour). This paper reviews both approaches, with a particular emphasis on the latter, which we see as a very promising framework for future modeling and experimental work. (An illustrative decoding sketch follows this record.)
    Material type: Digital media
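The review in record 1 treats population activity as representing probability distributions over stimuli. As a purely illustrative sketch (not taken from the paper), the Python snippet below decodes a posterior over stimulus orientation from independent Poisson spike counts with assumed Gaussian tuning curves; every parameter value and the flat prior are assumptions.

```python
import numpy as np

# Hypothetical setup: 32 neurons with Gaussian tuning curves over orientation.
# All numbers below are illustrative assumptions, not values from the paper.
orientations = np.linspace(0.0, 180.0, 181)   # candidate stimulus values (degrees)
preferred = np.linspace(0.0, 180.0, 32)       # preferred orientations of the neurons
tuning_width = 20.0                           # tuning-curve width (degrees)
peak_rate = 30.0                              # peak spike count per counting window

def tuning(theta):
    """Mean spike count of each neuron for stimulus orientation theta."""
    return peak_rate * np.exp(-0.5 * ((preferred - theta) / tuning_width) ** 2)

def posterior(spike_counts):
    """Posterior over orientation given independent Poisson counts (flat prior)."""
    # log P(r | theta) = sum_i [ r_i * log f_i(theta) - f_i(theta) ]   (constants dropped)
    log_like = np.array([
        np.sum(spike_counts * np.log(tuning(th) + 1e-12) - tuning(th))
        for th in orientations
    ])
    log_like -= log_like.max()                # numerical stability before exponentiating
    p = np.exp(log_like)
    return p / p.sum()

# Example: simulate a response to a 60-degree stimulus and decode it.
rng = np.random.default_rng(0)
counts = rng.poisson(tuning(60.0))
p = posterior(counts)
print("MAP estimate:", orientations[np.argmax(p)], "degrees")
```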
  • 2
    Digital media
    [s.l.] : Nature Publishing Group
    Nature 377 (1995), pp. 725-728
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Subject: Biology, Chemistry and Pharmacy, Medicine, General Natural Sciences, Physics
    Notes: [Excerpt] Real and colleagues [8] performed a series of experiments on bumblebees foraging on artificial blue and yellow flowers whose colours were the only predictor of the nectar delivery. They examined how bees respond to the mean and variability of this delivery in a foraging version of a ...
    Material type: Digital media
  • 3
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Subject: Biology, Chemistry and Pharmacy, Medicine, General Natural Sciences, Physics
    Notes: [Excerpt] The ability to use environmental stimuli to predict impending harm is critical for survival. Such predictions should be available as early as they are reliable. In Pavlovian conditioning, chains of successively earlier predictors are studied in terms of higher-order relationships, and have ...
    Material type: Digital media
  • 4
    Digital media
    [s.l.] : Nature Publishing Group
    Nature 441 (2006), pp. 876-879
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Subject: Biology, Chemistry and Pharmacy, Medicine, General Natural Sciences, Physics
    Notes: [Excerpt] Decision making in an uncertain environment poses a conflict between the opposing demands of gathering and exploiting information. In a classic illustration of this ‘exploration–exploitation’ dilemma, a gambler choosing between multiple slot machines balances the desire to select what seems, on ... (An illustrative bandit sketch follows this record.)
    Material type: Digital media
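Record 4's excerpt frames choice among slot machines as an exploration-exploitation trade-off. The sketch below illustrates that dilemma generically with a softmax rule over running payoff estimates; the payoff probabilities, temperature, and trial count are assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bandit: the reward probabilities below are illustrative assumptions.
true_payoff = np.array([0.3, 0.5, 0.7, 0.4])   # reward probability of each "slot machine"
n_arms = len(true_payoff)

estimates = np.zeros(n_arms)   # running estimate of each arm's payoff
counts = np.zeros(n_arms)      # how often each arm has been tried
temperature = 0.1              # softmax temperature: lower = more exploitation

for t in range(2000):
    # Softmax over current estimates: exploration shrinks as the estimates separate.
    prefs = estimates / temperature
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    arm = rng.choice(n_arms, p=probs)

    reward = float(rng.random() < true_payoff[arm])
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]   # incremental mean

print("estimated payoffs:", np.round(estimates, 2))
```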
  • 5
    Digital media
    Springer
    Machine Learning 25 (1996), pp. 5-22
    ISSN: 0885-6125
    Keyword(s): Reinforcement learning; dynamic programming; exploration bonuses; certainty equivalence; non-stationary environment
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers, and its performance is compared with Sutton's DYNA system. (An illustrative exploration-bonus sketch follows this record.)
    Material type: Digital media
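Record 5 describes turning a statistical model of uncertainty into exploratory behaviour via exploration bonuses. The sketch below is a loose illustration, not the paper's certainty-equivalence computation: it adds a simple count-based bonus to tabular Q-values on an assumed five-state chain; the environment, the bonus form, and all constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 5-state chain: move left/right, reward 1 at the right end.
n_states, n_actions = 5, 2
alpha, gamma, bonus_scale = 0.1, 0.95, 0.5

Q = np.zeros((n_states, n_actions))
visits = np.ones((n_states, n_actions))   # visit counts (start at 1 to avoid division by zero)

def step(s, a):
    """Deterministic chain dynamics: action 0 moves left, action 1 moves right."""
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

s = 0
for t in range(5000):
    # Act greedily on Q plus a bonus that shrinks with visit count,
    # so rarely tried actions look optimistic and get explored.
    bonus = bonus_scale / np.sqrt(visits[s])
    a = int(np.argmax(Q[s] + bonus))
    s_next, r = step(s, a)
    visits[s, a] += 1
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])   # Q-learning update
    s = 0 if s_next == n_states - 1 else s_next                  # restart after reaching the goal

print(np.round(Q, 2))
```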
  • 6
    Digital media
    Springer
    Machine Learning 14 (1994), pp. 295-301
    ISSN: 0885-6125
    Keyword(s): reinforcement learning; temporal differences; Q-learning
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions converge to their correct values, as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence. (An illustrative TD(0) sketch follows this record.)
    Material type: Digital media
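Record 6 concerns the convergence of temporal-difference predictions. The sketch below shows the TD(0) update whose behaviour is at issue, on an assumed five-state random walk; the states, rewards, step size, and episode count are illustrative choices, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed Markov reward process: a random walk over states 0..6,
# where 0 and 6 are terminal and exiting on the right pays reward 1.
n_states = 7
V = np.zeros(n_states)     # value predictions being learned
alpha, gamma = 0.1, 1.0

for episode in range(5000):
    s = 3                                        # episodes start in the middle state
    while s not in (0, 6):
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == 6 else 0.0
        target = r + (0.0 if s_next in (0, 6) else gamma * V[s_next])
        V[s] += alpha * (target - V[s])          # TD(0): move prediction toward bootstrapped target
        s = s_next

print(np.round(V[1:6], 2))   # true values for this walk are 1/6, 2/6, ..., 5/6
```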
  • 7
    Digital media
    Springer
    Machine Learning 14 (1994), pp. 295-301
    ISSN: 0885-6125
    Keyword(s): reinforcement learning; temporal differences; Q-learning
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions converge to their correct values, as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence.
    Material type: Digital media
  • 8
    Digital media
    Springer
    Machine Learning 32 (1998), pp. 5-40
    ISSN: 0885-6125
    Keyword(s): reinforcement learning; temporal difference; Monte Carlo; MSE; bias; variance; eligibility trace; Markov reward process
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: We provide analytical expressions governing changes to the bias and variance of the lookup table estimators provided by various Monte Carlo and temporal difference value estimation algorithms with offline updates over trials in absorbing Markov reward processes. We have used these expressions to develop software that serves as an analysis tool: given a complete description of a Markov reward process, it rapidly yields an exact mean-square-error curve, the curve one would get from averaging together sample mean-square-error curves from an infinite number of learning trials on the given problem. We use our analysis tool to illustrate classes of mean-square-error curve behavior in a variety of example reward processes, and we show that although the various temporal difference algorithms are quite sensitive to the choice of step-size and eligibility-trace parameters, there are values of these parameters that make them similarly competent, and generally good. (An illustrative TD(λ) sketch follows this record.)
    Material type: Digital media
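Record 8 analyses how bias and variance depend on the step-size and eligibility-trace parameters. The sketch below shows the accumulating-trace TD(λ) update those parameters control, reusing the assumed random walk from the TD(0) sketch above; it only illustrates the update rule and does not reproduce the paper's analytical expressions or its analysis tool.

```python
import numpy as np

rng = np.random.default_rng(4)

# Same assumed random walk as above; here predictions are updated with
# accumulating eligibility traces controlled by lambda and the step size alpha.
n_states = 7                        # states 0 and 6 are terminal
alpha, gamma, lam = 0.1, 1.0, 0.8

V = np.zeros(n_states)

for episode in range(5000):
    e = np.zeros(n_states)          # eligibility trace per state
    s = 3
    while s not in (0, 6):
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == 6 else 0.0
        v_next = 0.0 if s_next in (0, 6) else V[s_next]
        delta = r + gamma * v_next - V[s]    # TD error
        e[s] += 1.0                          # accumulate trace for the visited state
        V += alpha * delta * e               # credit all recently visited states
        e *= gamma * lam                     # decay traces
        s = s_next

print(np.round(V[1:6], 2))
```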
  • 9
    Digital media
    Springer
    Machine Learning 8 (1992), pp. 341-362
    ISSN: 0885-6125
    Keyword(s): Reinforcement learning; temporal differences; asynchronous dynamic programming
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case which only uses information from adjacent time steps to that involving information from arbitrary ones. It also considers how this version of TD behaves in the face of linearly dependent representations for states—demonstrating that it still converges, but to a different answer from the least mean squares algorithm. Finally it adapts Watkins' theorem that Q-learning, his closely related prediction and action learning method, converges with probability one, to demonstrate this strong form of convergence for a slightly modified version of TD. (An illustrative linear-TD sketch follows this record.)
    Material type: Digital media
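Record 9 discusses how TD behaves with linearly dependent state representations. The sketch below runs semi-gradient TD(0) with a deliberately redundant linear feature set on the same assumed random walk; the features and all constants are hypothetical and are only meant to show what "linearly dependent representations" means in practice, not to reproduce the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(5)

# Linear value function: V(s) ≈ w · phi(s). The random walk and the
# (deliberately redundant) features below are illustrative assumptions.
n_states = 7                                    # states 0 and 6 are terminal
phi = np.zeros((n_states, 3))
for s in range(1, 6):
    # Third feature is the sum of the first two, so the representation
    # is linearly dependent.
    phi[s] = [s / 6.0, 1.0, s / 6.0 + 1.0]

w = np.zeros(3)
alpha, gamma = 0.05, 1.0

for episode in range(10000):
    s = 3
    while s not in (0, 6):
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == 6 else 0.0
        v_next = 0.0 if s_next in (0, 6) else phi[s_next] @ w
        delta = r + gamma * v_next - phi[s] @ w
        w += alpha * delta * phi[s]             # semi-gradient TD(0) update
        s = s_next

print(np.round(phi[1:6] @ w, 2))                # learned value estimates for states 1..5
```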
  • 10
    Digital media
    Springer
    Machine Learning 8 (1992), pp. 341-362
    ISSN: 0885-6125
    Keyword(s): Reinforcement learning; temporal differences; asynchronous dynamic programming
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case which only uses information from adjacent time steps to that involving information from arbitrary ones. It also considers how this version of TD behaves in the face of linearly dependent representations for states—demonstrating that it still converges, but to a different answer from the least mean squares algorithm. Finally it adapts Watkins' theorem that Q-learning, his closely related prediction and action learning method, converges with probability one, to demonstrate this strong form of convergence for a slightly modified version of TD.
    Material type: Digital media