Library

feed icon rss

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
  • 1
    Electronic Resource
    Electronic Resource
    Springer
    Machine learning 16 (1994), S. 185-202 
    ISSN: 0885-6125
    Keywords: Reinforcement learning ; Q-learning ; dynamic programming ; stochastic approximation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
    Type of Medium: Electronic Resource
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 2
    Electronic Resource
    Electronic Resource
    Springer
    Machine learning 16 (1994), S. 185-202 
    ISSN: 0885-6125
    Keywords: Reinforcement learning ; Q-learning ; dynamic programming ; stochastic approximation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
    Type of Medium: Electronic Resource
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 3
    Electronic Resource
    Electronic Resource
    Springer
    Machine learning 22 (1996), S. 59-94 
    ISSN: 0885-6125
    Keywords: Compact representation ; curse of dimensionality ; dynamic programming ; features ; function approximation ; neuro-dynamic programming ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
    Type of Medium: Electronic Resource
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 4
    Electronic Resource
    Electronic Resource
    Springer
    Machine learning 35 (1999), S. 225-245 
    ISSN: 0885-6125
    Keywords: nonparametric regression ; VC dimension ; convex optimization
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We propose a convex optimization approach to solving the nonparametric regression estimation problem when the underlying regression function is Lipschitz continuous. This approach is based on the minimization of the sum of empirical squared errors, subject to the constraints implied by Lipschitz continuity. The resulting optimization problem has a convex objective function and linear constraints, and as a result, is efficiently solvable. The estimated function computed by this technique, is proven to convergeto the underlying regression function uniformly and almost surely, when the sample size grows to infinity, thus providing a very strong form of consistency. Wealso propose a convex optimization approach to the maximum likelihood estimation of unknown parameters in statistical models, where the parameters depend continuously on some observable input variables. For a number of classical distributional forms, the objective function in the underlying optimization problem is convex and the constraints are linear. These problems are, therefore, also efficiently solvable.
    Type of Medium: Electronic Resource
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 5
    Electronic Resource
    Electronic Resource
    Springer
    Machine learning 22 (1996), S. 59-94 
    ISSN: 0885-6125
    Keywords: Compact representation ; curse of dimensionality ; dynamic programming ; features ; function approximation ; neuro-dynamic programming ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
    Type of Medium: Electronic Resource
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 6
    Electronic Resource
    Electronic Resource
    Springer
    Theory of computing systems 20 (1987), S. 137-153 
    ISSN: 1433-0490
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We consider an iterative process in which one out of a finite set of possible operators is applied at each iteration. We obtain necessary and sufficient conditions for convergence to a common fixed point of these operators, when the order at which different operators are applied is left completely free, except for the requirement that each operator is applied infinitely many times. The theory developed is similar in spirit to Lyapunov stability theory. We also derive some very different, qualitatively, results for partially asynchronous iterative processes, that is, for the case where certain constraints are imposed on the order at which the different operators are applied.
    Type of Medium: Electronic Resource
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 7
    Electronic Resource
    Electronic Resource
    Springer
    Mathematics of control, signals, and systems 10 (1997), S. 381-381 
    ISSN: 1435-568X
    Source: Springer Online Journal Archives 1860-2000
    Topics: Electrical Engineering, Measurement and Control Technology , Mathematics , Technology
    Type of Medium: Electronic Resource
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 8
    Electronic Resource
    Electronic Resource
    Springer
    Mathematics of control, signals, and systems 10 (1997), S. 31-40 
    ISSN: 1435-568X
    Keywords: Lyapunov exponent ; Lyapunov indicator ; Joint spectral radius ; Generalized spectral radius ; Discrete differential inclusion ; Computational complexity ; NP-hard ; Algorithmic solvability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Electrical Engineering, Measurement and Control Technology , Mathematics , Technology
    Notes: Abstract We analyze the computability and the complexity of various definitions of spectral radii for sets of matrices. We show that the joint and generalized spectral radii of two integer matrices are not approximable in polynomial time, and that two related quantities—the lower spectral radius and the largest Lyapunov exponent—are not algorithmically approximable.
    Type of Medium: Electronic Resource
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 9
    Electronic Resource
    Electronic Resource
    Springer
    Journal of heuristics 3 (1997), S. 245-262 
    ISSN: 1572-9397
    Source: Springer Online Journal Archives 1860-2000
    Topics: Mathematics
    Notes: Abstract We consider the approximate solution of discrete optimization problems using procedures that are capable of magnifying the effectiveness of any given heuristic algorithm through sequential application. In particular, we embed the problem within a dynamic programming framework, and we introduce several types of rollout algorithms, which are related to notions of policy iteration. We provide conditions guaranteeing that the rollout algorithm improves the performance of the original heuristic algorithm. The method is illustrated in the context of a machine maintenance and repair problem.
    Type of Medium: Electronic Resource
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
  • 10
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 32 (1999), S. 319-349 
    ISSN: 1572-9443
    Keywords: large deviations ; communication networks
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract In this paper we consider a stochastic server (modeling a multiclass communication switch) fed by a set of parallel buffers. The dynamics of the system evolve in discrete-time and the generalized processor sharing (GPS) scheduling policy of [25] is implemented. The arrival process in each buffer is an arbitrary, and possibly autocorrelated, stochastic process. We obtain a large deviations asymptotic for the buffer overflow probability at each buffer. In the standard large deviations methodology, we provide a lower and a matching (up to first degree in the exponent) upper bound on the buffer overflow probabilities. We view the problem of finding a most likely sample path that leads to an overflow as an optimal control problem. Using ideas from convex optimization we analytically solve the control problem to obtain both the asymptotic exponent of the overflow probability and a characterization of most likely modes of overflow. These results have important implications for traffic management of high-speed networks. They extend the deterministic, worst-case analysis of [25] to the case where a detailed statistical model of the input traffic is available and can be used as a basis for an admission control mechanism.
    Type of Medium: Electronic Resource
    Library Location Call Number Volume/Issue/Year Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. More information can be found here...