Library

  • 1
    Digital media
    Palo Alto, Calif. : Annual Reviews
    Annual Review of Neuroscience 26 (2003), pp. 381-410
    ISSN: 0147-006X
    Source: Annual Reviews Electronic Back Volume Collection 1932-2001ff
    Subject: Biology, Medicine
    Notes: Abstract: In the vertebrate nervous system, sensory stimuli are typically encoded through the concerted activity of large populations of neurons. Classically, these patterns of activity have been treated as encoding the value of the stimulus (e.g., the orientation of a contour), and computation has been formalized in terms of function approximation. More recently, there have been several suggestions that neural computation is akin to a Bayesian inference process, with population activity patterns representing uncertainty about stimuli in the form of probability distributions (e.g., the probability density function over the orientation of a contour). This paper reviews both approaches, with a particular emphasis on the latter, which we see as a very promising framework for future modeling and experimental work. (An illustrative decoding sketch follows this record.)
    Material type: Digital media
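The review in record 1 treats population activity as representing probability distributions over stimuli. As a purely illustrative sketch (not taken from the paper), the Python snippet below decodes a posterior over stimulus orientation from independent Poisson spike counts with assumed Gaussian tuning curves; every parameter value and the flat prior are assumptions.

```python
import numpy as np

# Hypothetical setup: 32 neurons with Gaussian tuning curves over orientation.
# All numbers below are illustrative assumptions, not values from the paper.
orientations = np.linspace(0.0, 180.0, 181)   # candidate stimulus values (degrees)
preferred = np.linspace(0.0, 180.0, 32)       # preferred orientations of the neurons
tuning_width = 20.0                           # tuning-curve width (degrees)
peak_rate = 30.0                              # peak spike count per counting window

def tuning(theta):
    """Mean spike count of each neuron for stimulus orientation theta."""
    return peak_rate * np.exp(-0.5 * ((preferred - theta) / tuning_width) ** 2)

def posterior(spike_counts):
    """Posterior over orientation given independent Poisson counts (flat prior)."""
    # log P(r | theta) = sum_i [ r_i * log f_i(theta) - f_i(theta) ]   (constants dropped)
    log_like = np.array([
        np.sum(spike_counts * np.log(tuning(th) + 1e-12) - tuning(th))
        for th in orientations
    ])
    log_like -= log_like.max()                # numerical stability before exponentiating
    p = np.exp(log_like)
    return p / p.sum()

# Example: simulate a response to a 60-degree stimulus and decode it.
rng = np.random.default_rng(0)
counts = rng.poisson(tuning(60.0))
p = posterior(counts)
print("MAP estimate:", orientations[np.argmax(p)], "degrees")
```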
  • 2
    Digital media
    [s.l.] : Nature Publishing Group
    Nature 377 (1995), pp. 725-728
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Subject: Biology, Chemistry and Pharmacy, Medicine, General Natural Sciences, Physics
    Notes: [Excerpt] Real and colleagues [8] performed a series of experiments on bumblebees foraging on artificial blue and yellow flowers whose colours were the only predictor of the nectar delivery. They examined how bees respond to the mean and variability of this delivery in a foraging version of a ...
    Material type: Digital media
  • 3
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Subject: Biology, Chemistry and Pharmacy, Medicine, General Natural Sciences, Physics
    Notes: [Excerpt] The ability to use environmental stimuli to predict impending harm is critical for survival. Such predictions should be available as early as they are reliable. In Pavlovian conditioning, chains of successively earlier predictors are studied in terms of higher-order relationships, and have ...
    Material type: Digital media
  • 4
    Digital media
    [s.l.] : Nature Publishing Group
    Nature 441 (2006), pp. 876-879
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Subject: Biology, Chemistry and Pharmacy, Medicine, General Natural Sciences, Physics
    Notes: [Excerpt] Decision making in an uncertain environment poses a conflict between the opposing demands of gathering and exploiting information. In a classic illustration of this ‘exploration–exploitation’ dilemma, a gambler choosing between multiple slot machines balances the desire to select what seems, on ... (An illustrative bandit sketch follows this record.)
    Material type: Digital media
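Record 4's excerpt frames choice among slot machines as an exploration-exploitation trade-off. The sketch below illustrates that dilemma generically with a softmax rule over running payoff estimates; the payoff probabilities, temperature, and trial count are assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bandit: the reward probabilities below are illustrative assumptions.
true_payoff = np.array([0.3, 0.5, 0.7, 0.4])   # reward probability of each "slot machine"
n_arms = len(true_payoff)

estimates = np.zeros(n_arms)   # running estimate of each arm's payoff
counts = np.zeros(n_arms)      # how often each arm has been tried
temperature = 0.1              # softmax temperature: lower = more exploitation

for t in range(2000):
    # Softmax over current estimates: exploration shrinks as the estimates separate.
    prefs = estimates / temperature
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    arm = rng.choice(n_arms, p=probs)

    reward = float(rng.random() < true_payoff[arm])
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]   # incremental mean

print("estimated payoffs:", np.round(estimates, 2))
```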
  • 5
    Digital media
    Springer
    Machine Learning 25 (1996), pp. 5-22
    ISSN: 0885-6125
    Keyword(s): Reinforcement learning; dynamic programming; exploration bonuses; certainty equivalence; non-stationary environment
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers, and its performance is compared with Sutton's DYNA system. (An illustrative exploration-bonus sketch follows this record.)
    Material type: Digital media
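Record 5 describes turning a statistical model of uncertainty into exploratory behaviour via exploration bonuses. The sketch below is a loose illustration, not the paper's certainty-equivalence computation: it adds a simple count-based bonus to tabular Q-values on an assumed five-state chain; the environment, the bonus form, and all constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 5-state chain: move left/right, reward 1 at the right end.
n_states, n_actions = 5, 2
alpha, gamma, bonus_scale = 0.1, 0.95, 0.5

Q = np.zeros((n_states, n_actions))
visits = np.ones((n_states, n_actions))   # visit counts (start at 1 to avoid division by zero)

def step(s, a):
    """Deterministic chain dynamics: action 0 moves left, action 1 moves right."""
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

s = 0
for t in range(5000):
    # Act greedily on Q plus a bonus that shrinks with visit count,
    # so rarely tried actions look optimistic and get explored.
    bonus = bonus_scale / np.sqrt(visits[s])
    a = int(np.argmax(Q[s] + bonus))
    s_next, r = step(s, a)
    visits[s, a] += 1
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])   # Q-learning update
    s = 0 if s_next == n_states - 1 else s_next                  # restart after reaching the goal

print(np.round(Q, 2))
```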
  • 6
    Digital media
    Springer
    Machine Learning 14 (1994), pp. 295-301
    ISSN: 0885-6125
    Keyword(s): reinforcement learning; temporal differences; Q-learning
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions converge to their correct values, as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence. (An illustrative TD(0) sketch follows this record.)
    Material type: Digital media
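Record 6 concerns the convergence of temporal-difference predictions. The sketch below shows the TD(0) update whose behaviour is at issue, on an assumed five-state random walk; the states, rewards, step size, and episode count are illustrative choices, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed Markov reward process: a random walk over states 0..6,
# where 0 and 6 are terminal and exiting on the right pays reward 1.
n_states = 7
V = np.zeros(n_states)     # value predictions being learned
alpha, gamma = 0.1, 1.0

for episode in range(5000):
    s = 3                                        # episodes start in the middle state
    while s not in (0, 6):
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == 6 else 0.0
        target = r + (0.0 if s_next in (0, 6) else gamma * V[s_next])
        V[s] += alpha * (target - V[s])          # TD(0): move prediction toward bootstrapped target
        s = s_next

print(np.round(V[1:6], 2))   # true values for this walk are 1/6, 2/6, ..., 5/6
```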
  • 7
    Digital media
    Springer
    Machine Learning 14 (1994), pp. 295-301
    ISSN: 0885-6125
    Keyword(s): reinforcement learning; temporal differences; Q-learning
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions converge to their correct values, as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence.
    Material type: Digital media
  • 8
    Digital media
    Springer
    Machine Learning 32 (1998), pp. 5-40
    ISSN: 0885-6125
    Keyword(s): reinforcement learning; temporal difference; Monte Carlo; MSE; bias; variance; eligibility trace; Markov reward process
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: We provide analytical expressions governing changes to the bias and variance of the lookup table estimators provided by various Monte Carlo and temporal difference value estimation algorithms with offline updates over trials in absorbing Markov reward processes. We have used these expressions to develop software that serves as an analysis tool: given a complete description of a Markov reward process, it rapidly yields an exact mean-square-error curve, the curve one would get from averaging together sample mean-square-error curves from an infinite number of learning trials on the given problem. We use our analysis tool to illustrate classes of mean-square-error curve behavior in a variety of example reward processes, and we show that although the various temporal difference algorithms are quite sensitive to the choice of step-size and eligibility-trace parameters, there are values of these parameters that make them similarly competent, and generally good. (An illustrative TD(λ) sketch follows this record.)
    Material type: Digital media
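Record 8 analyses how bias and variance depend on the step-size and eligibility-trace parameters. The sketch below shows the accumulating-trace TD(λ) update those parameters control, reusing the assumed random walk from the TD(0) sketch above; it only illustrates the update rule and does not reproduce the paper's analytical expressions or its analysis tool.

```python
import numpy as np

rng = np.random.default_rng(4)

# Same assumed random walk as above; here predictions are updated with
# accumulating eligibility traces controlled by lambda and the step size alpha.
n_states = 7                        # states 0 and 6 are terminal
alpha, gamma, lam = 0.1, 1.0, 0.8

V = np.zeros(n_states)

for episode in range(5000):
    e = np.zeros(n_states)          # eligibility trace per state
    s = 3
    while s not in (0, 6):
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == 6 else 0.0
        v_next = 0.0 if s_next in (0, 6) else V[s_next]
        delta = r + gamma * v_next - V[s]    # TD error
        e[s] += 1.0                          # accumulate trace for the visited state
        V += alpha * delta * e               # credit all recently visited states
        e *= gamma * lam                     # decay traces
        s = s_next

print(np.round(V[1:6], 2))
```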
  • 9
    Digital media
    Springer
    Machine Learning 8 (1992), pp. 341-362
    ISSN: 0885-6125
    Keyword(s): Reinforcement learning; temporal differences; asynchronous dynamic programming
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case which only uses information from adjacent time steps to that involving information from arbitrary ones. It also considers how this version of TD behaves in the face of linearly dependent representations for states—demonstrating that it still converges, but to a different answer from the least mean squares algorithm. Finally it adapts Watkins' theorem that Q-learning, his closely related prediction and action learning method, converges with probability one, to demonstrate this strong form of convergence for a slightly modified version of TD. (An illustrative linear-TD sketch follows this record.)
    Material type: Digital media
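Record 9 discusses how TD behaves with linearly dependent state representations. The sketch below runs semi-gradient TD(0) with a deliberately redundant linear feature set on the same assumed random walk; the features and all constants are hypothetical and are only meant to show what "linearly dependent representations" means in practice, not to reproduce the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(5)

# Linear value function: V(s) ≈ w · phi(s). The random walk and the
# (deliberately redundant) features below are illustrative assumptions.
n_states = 7                                    # states 0 and 6 are terminal
phi = np.zeros((n_states, 3))
for s in range(1, 6):
    # Third feature is the sum of the first two, so the representation
    # is linearly dependent.
    phi[s] = [s / 6.0, 1.0, s / 6.0 + 1.0]

w = np.zeros(3)
alpha, gamma = 0.05, 1.0

for episode in range(10000):
    s = 3
    while s not in (0, 6):
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == 6 else 0.0
        v_next = 0.0 if s_next in (0, 6) else phi[s_next] @ w
        delta = r + gamma * v_next - phi[s] @ w
        w += alpha * delta * phi[s]             # semi-gradient TD(0) update
        s = s_next

print(np.round(phi[1:6] @ w, 2))                # learned value estimates for states 1..5
```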
  • 10
    Digital media
    Springer
    Machine Learning 8 (1992), pp. 341-362
    ISSN: 0885-6125
    Keyword(s): Reinforcement learning; temporal differences; asynchronous dynamic programming
    Source: Springer Online Journal Archives 1860-2000
    Subject: Computer Science
    Notes: Abstract: The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case which only uses information from adjacent time steps to that involving information from arbitrary ones. It also considers how this version of TD behaves in the face of linearly dependent representations for states—demonstrating that it still converges, but to a different answer from the least mean squares algorithm. Finally it adapts Watkins' theorem that Q-learning, his closely related prediction and action learning method, converges with probability one, to demonstrate this strong form of convergence for a slightly modified version of TD.
    Material type: Digital media