ZIB

1

Book

Reinforcement learning : an introduction (2018)

Sutton, Richard S. ; Barto, Andrew

Cambridge :MIT Press,

add to mindlist on the mindlist

Details

Title: Reinforcement learning : an introduction

Author: Sutton, Richard S.

Contributer: Barto, Andrew

Edition: Second edition

Publisher: Cambridge :MIT Press,

Year of publication: 2018

Pages: 526 Seiten

ISBN: 978-0-262-03924-6

Type of Medium: Book

Language: English

Permalink

Library	Location	Call Number	Volume/Issue/Year	Availability

Others were also interested in ...

ZIB Catalog

Zuse Institute Berlin

2

Electronic Resource

Associative search network: A reinforcement learning associative memory (1981)

Barto, Andrew G. ; Sutton, Richard S. ; Brouwer, Peter S.

Springer

Biological cybernetics 40 (1981), S. 201-211

add to mindlist on the mindlist

Details

ISSN: 1432-0770

Source: Springer Online Journal Archives 1860-2000

Topics: Biology , Computer Science , Physics

Notes: Abstract An associative memory system is presented which does not require a “teacher” to provide the desired associations. For each input key it conducts a search for the output pattern which optimizes an external payoff or reinforcement signal. The associative search network (ASN) combines pattern recognition and function optimization capabilities in a simple and effective way. We define the associative search problem, discuss conditions under which the associative search network is capable of solving it, and present results from computer simulations. The synthesis of sensory-motor control surfaces is discussed as an example of the associative search problem.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF00453370

Permalink

Library	Location	Call Number	Volume/Issue/Year	Availability

Others were also interested in ...

Paper (German National Licenses)

Fulltext

3

Electronic Resource

Reinforcement Learning with Replacing Eligibility Traces (1996)

Singh, Satinder P. ; Sutton, Richard S.

Springer

Machine learning 22 (1996), S. 123-158

add to mindlist on the mindlist

Details

ISSN: 0885-6125

Keywords: reinforcement learning ; temporal difference learning ; eligibility trace ; Monte Carlo method ; Markov chain ; CMAC

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that the method corresponding to replacing traces is closely related to the maximum likelihood solution for these tasks, and that its mean squared error is always lower in the long run. Computational results confirm these analyses and show that they are applicable more generally. In particular, we show that replacing traces significantly improve performance and reduce parameter sensitivity on the "Mountain-Car" task, a full reinforcement-learning problem with a continuous state space, when using a feature-based function approximator.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1023/A:1018012322525

Permalink

Library	Location	Call Number	Volume/Issue/Year	Availability

Others were also interested in ...

Paper (German National Licenses)

Fulltext

4

Electronic Resource

Learning to predict by the methods of temporal differences (1988)

Sutton, Richard S.

Springer

Machine learning 3 (1988), S. 9-44

add to mindlist on the mindlist

Details

ISSN: 0885-6125

Keywords: Incremental learning ; prediction ; connectionism ; credit assignment ; evaluation functions

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract This article introduces a class of incremental learning procedures specialized for prediction-that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF00115009

Permalink

Library	Location	Call Number	Volume/Issue/Year	Availability

Others were also interested in ...

Paper (German National Licenses)

Fulltext

5

Electronic Resource

Introduction: The Challenge of Reinforcement Learning (1992)

Sutton, Richard S.

Springer

Machine learning 8 (1992), S. 225-227

add to mindlist on the mindlist

Details

ISSN: 0885-6125

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1023/A:1022620604568

Permalink

Library	Location	Call Number	Volume/Issue/Year	Availability

Others were also interested in ...

Paper (German National Licenses)

Fulltext

6

Electronic Resource

Introduction: The challenge of reinforcement learning (1992)

Sutton, Richard S.

Springer

Machine learning 8 (1992), S. 225-227

add to mindlist on the mindlist

Details

ISSN: 0885-6125

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF00992695

Permalink

Library	Location	Call Number	Volume/Issue/Year	Availability

Others were also interested in ...

Paper (German National Licenses)

Fulltext

7

Electronic Resource

Reinforcement learning with replacing eligibility traces (1996)

Singh, Satinder P. ; Sutton, Richard S.

Springer

Machine learning 22 (1996), S. 123-158

add to mindlist on the mindlist

Details

ISSN: 0885-6125

Keywords: reinforcement learning ; temporal difference learning ; eligibility trace ; Monte Carlo method ; Markov chain ; CMAC

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, thereplacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that the method corresponding to replacing traces is closely related to the maximum likelihood solution for these tasks, and that its mean squared error is always lower in the long run. Computational results confirm these analyses and show that they are applicable more generally. In particular, we show that replacing traces significantly improve performance and reduce parameter sensitivity on the "Mountain-Car" task, a full reinforcement-learning problem with a continuous state space, when using a feature-based function approximator.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF00114726

Permalink

Library	Location	Call Number	Volume/Issue/Year	Availability

Others were also interested in ...

Paper (German National Licenses)

Fulltext

8

Electronic Resource

Learning to Predict by the Methods of Temporal Differences (1988)

Sutton, Richard S.

Springer

Machine learning 3 (1988), S. 9-44

add to mindlist on the mindlist

Details

ISSN: 0885-6125

Keywords: Incremental learning ; prediction ; connectionism ; credit assignment ; evaluation functions

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1023/A:1022633531479

Permalink

Library	Location	Call Number	Volume/Issue/Year	Availability

Others were also interested in ...

Paper (German National Licenses)

Fulltext

9

Electronic Resource

Landmark learning: An illustration of associative search (1981)

Barto, Andrew G. ; Sutton, Richard S.

Springer

Biological cybernetics 42 (1981), S. 1-8

add to mindlist on the mindlist

Details

ISSN: 1432-0770

Source: Springer Online Journal Archives 1860-2000

Topics: Biology , Computer Science , Physics

Notes: Abstract In a previous paper we defined the associative search problem and presented a system capable of solving it under certain conditions. In this paper we interpret a spatial learning problem as an associative search task and describe the behavior of an adaptive network capable of solving it. This example shows how naturally the associative search problem can arise and permits the search, association, and generalization properties of the adaptive network to bee clearly illustrated.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF00335152

Permalink

Library	Location	Call Number	Volume/Issue/Year	Availability

Others were also interested in ...

Paper (German National Licenses)

Fulltext

10

Electronic Resource

Synthesis of nonlinear control surfaces by a layered associative search network (1982)

Barto, Andrew G. ; Anderson, Charles W. ; Sutton, Richard S.

Springer

Biological cybernetics 43 (1982), S. 175-185

add to mindlist on the mindlist

Details

ISSN: 1432-0770

Source: Springer Online Journal Archives 1860-2000

Topics: Biology , Computer Science , Physics

Notes: Abstract An approach to solving nonlinear control problems is illustrated by means of a layered associative network composed of adaptive elements capable of reinforcement learning. The first layer adaptively develops a representation in terms of which the second layer can solve the problem linearly. The adaptive elements comprising the network employ a novel type of learning rule whose properties, we argue, are essential to the adaptive behavior of the layered network. The behavior of the network is illustrated by means of a spatial learning problem that requires the formation of nonlinear associations. We argue that this approach to nonlinearity can be extended to a large class of nonlinear control problems.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF00319977

Permalink

Library	Location	Call Number	Volume/Issue/Year	Availability

Others were also interested in ...

Paper (German National Licenses)

Fulltext