ZIB

1

Electronic Resource

Asynchronous stochastic approximation and Q-learning (1994)

Tsitsiklis, John N.

Springer

Machine learning 16 (1994), pp. 185-202

Details

ISSN: 0885-6125

Keywords: Reinforcement learning ; Q-learning ; dynamic programming ; stochastic approximation

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF00993306

2

Electronic Resource

Asynchronous Stochastic Approximation and Q-Learning (1994)

Tsitsiklis, John N.

Springer

Machine learning 16 (1994), pp. 185-202

Details

ISSN: 0885-6125

Keywords: Reinforcement learning ; Q-learning ; dynamic programming ; stochastic approximation

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1023/A:1022689125041

3

Electronic Resource

Feature-based methods for large scale dynamic programming (1996)

Tsitsiklis, John N. ; Van Roy, Benjamin

Springer

Machine learning 22 (1996), pp. 59-94

Details

ISSN: 0885-6125

Keywords: Compact representation ; curse of dimensionality ; dynamic programming ; features ; function approximation ; neuro-dynamic programming ; reinforcement learning

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF00114724

4

Electronic Resource

Estimation of Time-Varying Parameters in Statistical Models: An Optimization Approach (1999)

Bertsimas, Dimitris ; Gamarnik, David ; Tsitsiklis, John N.

Springer

Machine learning 35 (1999), pp. 225-245

Details

ISSN: 0885-6125

Keywords: nonparametric regression ; VC dimension ; convex optimization

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract We propose a convex optimization approach to solving the nonparametric regression estimation problem when the underlying regression function is Lipschitz continuous. This approach is based on the minimization of the sum of empirical squared errors, subject to the constraints implied by Lipschitz continuity. The resulting optimization problem has a convex objective function and linear constraints, and as a result, is efficiently solvable. The estimated function computed by this technique is proven to converge to the underlying regression function uniformly and almost surely when the sample size grows to infinity, thus providing a very strong form of consistency. We also propose a convex optimization approach to the maximum likelihood estimation of unknown parameters in statistical models, where the parameters depend continuously on some observable input variables. For a number of classical distributional forms, the objective function in the underlying optimization problem is convex and the constraints are linear. These problems are, therefore, also efficiently solvable.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1023/A:1007586831473

5

Electronic Resource

Feature-Based Methods for Large Scale Dynamic Programming (1996)

Tsitsiklis, John N. ; Van Roy, Benjamin

Springer

Machine learning 22 (1996), pp. 59-94

Details

ISSN: 0885-6125

Keywords: Compact representation ; curse of dimensionality ; dynamic programming ; features ; function approximation ; neuro-dynamic programming ; reinforcement learning

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1023/A:1018008221616

6

Electronic Resource

On the stability of asynchronous iterative processes (1987)

Tsitsiklis, John N.

Springer

Theory of computing systems 20 (1987), pp. 137-153

Details

ISSN: 1433-0490

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract We consider an iterative process in which one out of a finite set of possible operators is applied at each iteration. We obtain necessary and sufficient conditions for convergence to a common fixed point of these operators, when the order in which different operators are applied is left completely free, except for the requirement that each operator is applied infinitely many times. The theory developed is similar in spirit to Lyapunov stability theory. We also derive some qualitatively very different results for partially asynchronous iterative processes, that is, for the case where certain constraints are imposed on the order in which the different operators are applied.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF01692062

7

Electronic Resource

Lyapunov exponents of pairs of matrices, a correction (1997)

Tsitsiklis, John N. ; Blondel, Vincent D.

Springer

Mathematics of control, signals, and systems 10 (1997), p. 381

Details

ISSN: 1435-568X

Source: Springer Online Journal Archives 1860-2000

Topics: Electrical Engineering, Measurement and Control Technology , Mathematics , Technology

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF01211553

8

Electronic Resource

The Lyapunov exponent and joint spectral radius of pairs of matrices are hard—when not impossible—to compute and to approximate (1997)

Tsitsiklis, John N. ; Blondel, Vincent D.

Springer

Mathematics of control, signals, and systems 10 (1997), pp. 31-40

Details

ISSN: 1435-568X

Keywords: Lyapunov exponent ; Lyapunov indicator ; Joint spectral radius ; Generalized spectral radius ; Discrete differential inclusion ; Computational complexity ; NP-hard ; Algorithmic solvability

Source: Springer Online Journal Archives 1860-2000

Topics: Electrical Engineering, Measurement and Control Technology , Mathematics , Technology

Notes: Abstract We analyze the computability and the complexity of various definitions of spectral radii for sets of matrices. We show that the joint and generalized spectral radii of two integer matrices are not approximable in polynomial time, and that two related quantities—the lower spectral radius and the largest Lyapunov exponent—are not algorithmically approximable.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF01219774

9

Electronic Resource

Rollout Algorithms for Combinatorial Optimization (1997)

Bertsekas, Dimitri P. ; Tsitsiklis, John N. ; Wu, Cynara

Springer

Journal of heuristics 3 (1997), pp. 245-262

Details

ISSN: 1572-9397

Source: Springer Online Journal Archives 1860-2000

Topics: Mathematics

Notes: Abstract We consider the approximate solution of discrete optimization problems using procedures that are capable of magnifying the effectiveness of any given heuristic algorithm through sequential application. In particular, we embed the problem within a dynamic programming framework, and we introduce several types of rollout algorithms, which are related to notions of policy iteration. We provide conditions guaranteeing that the rollout algorithm improves the performance of the original heuristic algorithm. The method is illustrated in the context of a machine maintenance and repair problem.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1023/A:1009635226865

10

Electronic Resource

Large deviations analysis of the generalized processor sharing policy (1999)

Bertsimas, Dimitris ; Paschalidis, Ioannis Ch. ; Tsitsiklis, John N.

Springer

Queueing systems 32 (1999), pp. 319-349

Details

ISSN: 1572-9443

Keywords: large deviations ; communication networks

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract In this paper we consider a stochastic server (modeling a multiclass communication switch) fed by a set of parallel buffers. The dynamics of the system evolve in discrete-time and the generalized processor sharing (GPS) scheduling policy of [25] is implemented. The arrival process in each buffer is an arbitrary, and possibly autocorrelated, stochastic process. We obtain a large deviations asymptotic for the buffer overflow probability at each buffer. In the standard large deviations methodology, we provide a lower and a matching (up to first degree in the exponent) upper bound on the buffer overflow probabilities. We view the problem of finding a most likely sample path that leads to an overflow as an optimal control problem. Using ideas from convex optimization we analytically solve the control problem to obtain both the asymptotic exponent of the overflow probability and a characterization of most likely modes of overflow. These results have important implications for traffic management of high-speed networks. They extend the deterministic, worst-case analysis of [25] to the case where a detailed statistical model of the input traffic is available and can be used as a basis for an admission control mechanism.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1023/A:1019151423773
