Library

  • 1
Electronic Resource
    Springer
Machine Learning 23 (1996), pp. 33-46
    ISSN: 0885-6125
    Keywords: inductive learning ; parallelism ; small disjuncts
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
Notes: Abstract: Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more examples may be necessary to learn important special cases with confidence. These tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller data sets, would miss. The parallel version of the learning program is preferable to the sequential version for example sets larger than about 10K examples. When learning from a public-health database consisting of 3.5 million examples, the parallel rule-learning system uncovered a surprising relationship that has led to considerable follow-up research.
    Type of Medium: Electronic Resource
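The abstract above describes a data-parallel strategy: each training example is assigned to its own processor, so a candidate rule can be tested against all examples simultaneously. As a rough illustration of that idea (not the paper's actual CM-2 implementation), the following Python sketch uses NumPy's vectorized operations to stand in for per-example SIMD parallelism; the dataset, rule representation, and function names are assumptions made for this example.

    # Minimal sketch of data-parallel rule evaluation, assuming a boolean
    # feature matrix and conjunctive rules. All names here are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical dataset: 1 million examples, 20 boolean features, binary class.
    n_examples, n_features = 1_000_000, 20
    X = rng.integers(0, 2, size=(n_examples, n_features), dtype=np.int8).astype(bool)
    y = rng.integers(0, 2, size=n_examples, dtype=np.int8).astype(bool)

    def rule_stats(X, y, conditions):
        """Coverage and accuracy of a conjunctive rule.

        `conditions` is a list of (feature_index, required_value) pairs.
        Each condition is tested against all examples in one vectorized
        operation, mirroring a one-processor-per-example layout.
        """
        covered = np.ones(len(X), dtype=bool)
        for feat, val in conditions:
            covered &= (X[:, feat] == val)   # parallel test over all examples
        n_covered = covered.sum()
        accuracy = (y & covered).sum() / n_covered if n_covered else 0.0
        return n_covered, accuracy

    # Example: the rule "feature 3 is True AND feature 7 is False".
    count, acc = rule_stats(X, y, [(3, True), (7, False)])
    print(f"covers {count} examples, accuracy {acc:.3f}")

On the CM-2, the same per-example tests would run in lockstep across SIMD processors; the vectorized NumPy expressions are a sequential-hardware analogue of that layout, which is why this style of rule learning scales naturally with the number of examples.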