Library


  • 1
    Electronic Resource
    Oxford, UK and Boston, USA : Blackwell Publishers Inc
    Mathematical finance 8 (1998), S. 0 
    ISSN: 1467-9965
    Source: Blackwell Publishing Journal Backfiles 1879-2005
    Topics: Mathematics , Economics
    Notes: We present an on-line investment algorithm that achieves almost the same wealth as the best constant-rebalanced portfolio determined in hindsight from the actual market outcomes. The algorithm employs a multiplicative update rule derived using a framework introduced by Kivinen and Warmuth. Our algorithm is very simple to implement and requires only constant storage and computing time per stock in each trading period. We tested the performance of our algorithm on real stock data from the New York Stock Exchange accumulated during a 22-year period. On these data, our algorithm clearly outperforms the best single stock as well as Cover's universal portfolio selection algorithm. We also present results for the situation in which the investor has access to additional “side information.” (A minimal sketch of a multiplicative portfolio update of this kind appears after the list below.)
    Type of Medium: Electronic Resource
  • 2
    Electronic Resource
    Springer
    Computational complexity 5 (1995), S. 1-23 
    ISSN: 1420-8954
    Keywords: Machine learning ; computational learning theory ; on-line learning ; linear functions ; worst-case loss bounds ; adaptive filter theory ; 68T05
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We present an algorithm for the on-line learning of linear functions which is optimal to within a constant factor with respect to bounds on the sum of squared errors for a worst case sequence of trials. The bounds are logarithmic in the number of variables. Furthermore, the algorithm is shown to be optimally robust with respect to noise in the data (again to within a constant factor).
    Type of Medium: Electronic Resource
  • 3
    Electronic Resource
    Springer
    Machine learning 20 (1995), S. 245-271 
    ISSN: 0885-6125
    Keywords: on-line learning ; mistake-bounded learning ; weighted majority voting ; noise tolerance ; binary relation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract In this paper we demonstrate how weighted majority voting with multiplicative weight updating can be applied to obtain robust algorithms for learning binary relations. We first present an algorithm that obtains a nearly optimal mistake bound but at the expense of using exponential computation to make each prediction. However, the time complexity of our algorithm is significantly reduced from that of previously known algorithms that have comparable mistake bounds. The second algorithm we present is a polynomial time algorithm with a non-optimal mistake bound. Again the mistake bound of our second algorithm is significantly better than previous bounds proven for polynomial time algorithms. A key contribution of our work is that we define a “non-pure” or noisy binary relation and then by exploiting the robustness of weighted majority voting with respect to noise, we show that both of our algorithms can learn non-pure relations. These provide the first algorithms that can learn non-pure binary relations.
    Type of Medium: Electronic Resource
  • 4
    Electronic Resource
    Springer
    Machine learning 32 (1998), S. 127-150 
    ISSN: 0885-6125
    Keywords: on-line learning ; prediction ; concept drift ; WINNOW ; computational learning theory ; amortized analysis
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Littlestone developed a simple deterministic on-line learning algorithm for learning k-literal disjunctions. This algorithm (called WINNOW) keeps one weight for each of the n variables and does multiplicative updates to its weights. We develop a randomized version of WINNOW and prove bounds for an adaptation of the algorithm for the case when the disjunction may change over time. In this case a possible target disjunction schedule T is a sequence of disjunctions (one per trial) and the shift size is the total number of literals that are added/removed from the disjunctions as one progresses through the sequence. We develop an algorithm that predicts nearly as well as the best disjunction schedule for an arbitrary sequence of examples. This algorithm, which allows us to track the predictions of the best disjunction, is hardly more complex than the original version. However, the amortized analysis needed for obtaining worst-case mistake bounds requires new techniques. In some cases our lower bounds show that the upper bounds of our algorithm have the right constant in front of the leading term in the mistake bound and almost the right constant in front of the second leading term. Computer experiments support our theoretical findings. (A sketch of the basic deterministic WINNOW update appears after this list.)
    Type of Medium: Electronic Resource
  • 5
    Electronic Resource
    Springer
    Machine learning 32 (1998), S. 151-178 
    ISSN: 0885-6125
    Keywords: on-line learning ; amortized analysis ; multiplicative updates ; shifting ; experts
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We generalize the recent relative loss bounds for on-line algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound the additional loss of the algorithm over the sum of the losses of the best experts for each segment. This is to model situations in which the examples change and different experts are best for certain segments of the sequence of examples. In the single segment case, the additional loss is proportional to log n, where n is the number of experts and the constant of proportionality depends on the loss function. Our algorithms do not produce the best partition; however, the loss bound shows that our predictions are close to those of the best partition. When the number of segments is k+1 and the sequence is of length ℓ, we can bound the additional loss of our algorithm over the best partition by O(k log n + k log(ℓ/k)). For the case when the loss per trial is bounded by one, we obtain an algorithm whose additional loss over the loss of the best partition is independent of the length of the sequence. The additional loss becomes O(k log n + k log(L/k)), where L is the loss of the best partition with k+1 segments. Our algorithms for tracking the predictions of the best expert are simple adaptations of Vovk's original algorithm for the single best expert case. As in the original algorithms, we keep one weight per expert, and spend O(1) time per weight in each trial. (A sketch of an exponential-weights algorithm with a weight-sharing step appears after this list.)
    Type of Medium: Electronic Resource
  • 6
    Electronic Resource
    Springer
    Machine learning 5 (1990), S. 165-196 
    ISSN: 0885-6125
    Keywords: concept learning ; mistake bounds ; exception handling ; pac learning ; nested difference ; intersection-closed
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper introduces a new framework for constructing learning algorithms. Our methods involve master algorithms which use learning algorithms for intersection-closed concept classes as subroutines. For example, we give a master algorithm capable of learning any concept class whose members can be expressed as nested differences (for example, c1 − (c2 − (c3 − (c4 − c5)))) of concepts from an intersection-closed class. We show that our algorithms are optimal or nearly optimal with respect to several different criteria. These criteria include: the number of examples needed to produce a good hypothesis with high confidence, the worst case total number of mistakes made, and the expected number of mistakes made in the first t trials.
    Type of Medium: Electronic Resource
  • 7
    ISSN: 0885-6125
    Keywords: mixture models ; maximum likelihood ; exponentiated gradient algorithms ; EM
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We investigate the problem of estimating the proportion vector which maximizes the likelihood of a given sample for a mixture of given densities. We adapt a framework developed for supervised learning and give simple derivations for many of the standard iterative algorithms like gradient projection and EM. In this framework, the distance between the new and old proportion vectors is used as a penalty term. The square distance leads to the gradient projection update, and the relative entropy to a new update which we call the exponentiated gradient update (EG_η). Curiously, when a second order Taylor expansion of the relative entropy is used, we arrive at an update EM_η which, for η = 1, gives the usual EM update. Experimentally, both the EM_η-update and the EG_η-update for η > 1 outperform the EM algorithm and its variants. We also prove a polynomial bound on the rate of convergence of the EG_η algorithm. (A sketch of the EG_η proportion update appears after this list.)
    Type of Medium: Electronic Resource
  • 8
    Electronic Resource
    Springer
    Machine learning 9 (1992), S. 205-260 
    ISSN: 0885-6125
    Keywords: Hidden Markov models ; PAC learning model ; density estimation ; Kullback-Leibler divergence ; computational learning theory
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We introduce a rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, and analyze the complexity of the training problem as a computational problem. The PA training problem is the problem of approximating an arbitrary, unknown source distribution by distributions generated by a PA. We investigate the following question about this important, well-studied problem: Does there exist an efficient training algorithm such that the trained PAs provably converge to a model close to an optimum one with high confidence, after only a feasibly small set of training data? We model this problem in the framework of computational learning theory and analyze the sample as well as computational complexity. We show that the number of examples required for training PAs is moderate—except for some log factors the number of examples is linear in the number of transition probabilities to be trained and a low-degree polynomial in the example length and parameters quantifying the accuracy and confidence. Computationally, however, training PAs is quite demanding: Fixed state size PAs are trainable in time polynomial in the accuracy and confidence parameters and example length, but not in the alphabet size unless RP = NP. The latter result is shown via a strong non-approximability result for the single string maximum likelihood model problem for 2-state PAs, which is of independent interest.
    Type of Medium: Electronic Resource
  • 9
    Electronic Resource
    Springer
    Machine learning 25 (1996), S. 71-110 
    ISSN: 0885-6125
    Keywords: On-line learning ; conversion strategies ; noise robustness ; binomial weights ; exponential weights ; weighted majority algorithm ; expert advice ; mistake bounds ; Ulam's game
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We study the problem of deterministically predicting boolean values by combining the boolean predictions of several experts. Previous on-line algorithms for this problem predict with the weighted majority of the experts' predictions. These algorithms give each expert an exponential weight β^m, where β is a constant in [0, 1) and m is the number of mistakes made by the expert in the past. We show that it is better to use sums of binomials as weights. In particular, we present a deterministic algorithm using binomial weights that has a better worst case mistake bound than the best deterministic algorithm using exponential weights. The binomial weights naturally arise from a version space argument. We also show how both exponential and binomial weighting schemes can be used to make prediction algorithms robust against noise. (A sketch of the exponential-weights baseline appears after this list.)
    Type of Medium: Electronic Resource
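The sketches below illustrate, in Python, the kinds of update rules described in the records above. They are minimal illustrations under stated assumptions, not reconstructions of the papers' exact algorithms, loss functions, or parameter choices.

The first record (on-line portfolio selection) describes a multiplicative update that rebalances a portfolio after each trading period so that its wealth tracks the best constant-rebalanced portfolio in hindsight. A sketch of one such update, assuming price relatives arrive as a NumPy array and using an illustrative learning rate:

```python
import numpy as np

def multiplicative_portfolio(price_relatives, eta=0.05):
    """Multiplicative-update portfolio over price-relative vectors x_t,
    where x_t[i] is the closing/opening price ratio of stock i in period t.
    eta is an illustrative learning rate, not a value taken from the record.
    Returns the final wealth multiplier and the sequence of portfolios."""
    T, n = price_relatives.shape
    w = np.full(n, 1.0 / n)              # start with the uniform portfolio
    wealth = 1.0
    history = []
    for t in range(T):
        x = price_relatives[t]
        history.append(w.copy())
        period_return = float(w @ x)     # portfolio return this period
        wealth *= period_return
        # stocks that beat the portfolio return get boosted, then renormalize
        w = w * np.exp(eta * x / period_return)
        w /= w.sum()
    return wealth, np.array(history)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # synthetic price relatives around 1.0 for three hypothetical stocks
    x = 1.0 + 0.01 * rng.standard_normal((250, 3))
    final_wealth, _ = multiplicative_portfolio(x)
    print(f"wealth multiplier after 250 periods: {final_wealth:.4f}")
```

As in the record, the update needs only constant storage and computing time per stock in each trading period.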
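The record on learning shifting disjunctions builds on Littlestone's WINNOW, which keeps one weight per variable and updates them multiplicatively on mistakes; the randomized, drift-tolerant variant developed there is not reproduced here. A sketch of the basic deterministic update, with standard but illustrative parameter choices (promotion factor alpha = 2, threshold n):

```python
def winnow(examples, n, alpha=2.0):
    """Basic deterministic WINNOW for monotone disjunctions over n variables.
    examples: iterable of (x, y) with x a 0/1 list of length n and y in {0, 1}.
    alpha and the threshold are standard illustrative choices.
    Returns the number of mistakes and the final weights."""
    w = [1.0] * n
    theta = float(n)                     # prediction threshold
    mistakes = 0
    for x, y in examples:
        y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
        if y_hat != y:
            mistakes += 1
            if y == 1:   # false negative: promote the weights of active variables
                w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
            else:        # false positive: demote the weights of active variables
                w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
    return mistakes, w

if __name__ == "__main__":
    import random
    random.seed(0)
    n, relevant = 20, {0, 3, 7}          # hypothetical 3-literal target disjunction
    data = []
    for _ in range(500):
        x = [random.randint(0, 1) for _ in range(n)]
        data.append((x, 1 if any(x[i] for i in relevant) else 0))
    m, _ = winnow(data, n)
    print(f"mistakes over 500 trials: {m}")
```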
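The record on tracking the best expert across segments keeps one weight per expert and spends O(1) time per weight in each trial. A sketch of an exponential-weights predictor with a simple uniform sharing step in that spirit; the squared loss, the uniform sharing scheme, and the parameters eta and alpha are illustrative assumptions rather than the record's exact algorithm:

```python
import numpy as np

def track_best_expert(expert_preds, outcomes, eta=2.0, alpha=0.05):
    """Exponential weights plus a share step, so the predictor can follow a
    different best expert on different segments of the sequence.
    expert_preds: (T, n) array of expert predictions in [0, 1];
    outcomes: length-T array of outcomes in [0, 1].
    eta (learning rate) and alpha (share rate) are illustrative values.
    Returns the algorithm's cumulative squared loss."""
    T, n = expert_preds.shape
    w = np.full(n, 1.0 / n)
    total_loss = 0.0
    for t in range(T):
        p = expert_preds[t]
        y_hat = float(w @ p)             # weighted-average prediction
        y = float(outcomes[t])
        total_loss += (y_hat - y) ** 2
        # multiplicative update: experts that did badly this trial lose weight
        w = w * np.exp(-eta * (p - y) ** 2)
        w /= w.sum()
        # share step: mix a small amount of weight back in uniformly, so an
        # expert that becomes good in a later segment can recover quickly
        w = (1.0 - alpha) * w + alpha / n
    return total_loss
```

The share step is what distinguishes this from the single-best-expert algorithm: without it, an expert whose weight has collapsed can take a long time to regain influence when the best expert changes.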
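The mixture-estimation record derives the exponentiated gradient update EG_η for the mixing proportions. A sketch of one plausible form of that update, assuming the component densities are fixed and have been pre-evaluated on the sample; the matrix name P, the learning rate, and the iteration count are illustrative:

```python
import numpy as np

def eg_eta_proportions(P, eta=1.0, iters=100):
    """EG_eta-style update for mixture proportions.
    P: (N, n) matrix with P[t, i] = p_i(x_t), the i-th fixed component density
    evaluated at sample x_t.  eta and iters are illustrative choices.
    Returns the estimated proportion vector w on the probability simplex."""
    N, n = P.shape
    w = np.full(n, 1.0 / n)
    for _ in range(iters):
        mix = P @ w                              # mixture density at each sample
        grad = (P / mix[:, None]).mean(axis=0)   # gradient of the average log-likelihood
        w = w * np.exp(eta * grad)               # exponentiated-gradient step
        w /= w.sum()                             # renormalize onto the simplex
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # two hypothetical fixed Gaussian components; true mixing weights (0.7, 0.3)
    x = np.concatenate([rng.normal(0.0, 1.0, 700), rng.normal(4.0, 1.0, 300)])
    dens = lambda mu: np.exp(-(x - mu) ** 2 / 2) / np.sqrt(2 * np.pi)
    P = np.column_stack([dens(0.0), dens(4.0)])
    print(eg_eta_proportions(P))                 # roughly [0.7, 0.3]
```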
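The last record compares binomial weights against the classical exponential weights β^m used by deterministic weighted majority voting. The binomial-weights scheme itself is not sketched here; the following is only the exponential-weights baseline that it improves on:

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Deterministic weighted majority voting with exponential weights:
    each expert's weight is beta**(mistakes made so far).  beta = 0.5 is an
    illustrative choice of the constant in [0, 1).
    expert_preds: per-trial lists of 0/1 expert predictions;
    outcomes: the corresponding 0/1 outcomes.  Returns the mistake count."""
    n = len(expert_preds[0])
    w = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        vote_1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        vote_0 = sum(wi for wi, p in zip(w, preds) if p == 0)
        y_hat = 1 if vote_1 >= vote_0 else 0     # predict with the weighted majority
        if y_hat != y:
            mistakes += 1
        # every expert that predicted wrongly has its weight multiplied by beta
        w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
    return mistakes
```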