Library

  • 1
    Electronic Resource
    Springer
    Statistics and Computing 9 (1999), pp. 123-143
    ISSN: 1573-1375
    Keywords: Data Mining ; noisy function optimization ; classification ; association ; rule induction
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mathematics
    Notes: Abstract Many data analytic questions can be formulated as (noisy) optimization problems. They explicitly or implicitly involve finding simultaneous combinations of values for a set of (“input”) variables that imply unusually large (or small) values of another designated (“output”) variable. Specifically, one seeks a set of subregions of the input variable space within which the value of the output variable is considerably larger (or smaller) than its average value over the entire input domain. In addition it is usually desired that these regions be describable in an interpretable form involving simple statements (“rules”) concerning the input values. This paper presents a procedure directed towards this goal based on the notion of “patient” rule induction. This patient strategy is contrasted with the greedy ones used by most rule induction methods, and semi-greedy ones used by some partitioning tree techniques such as CART. Applications involving scientific and commercial data bases are presented.
    Type of Medium: Electronic Resource
  • 2
    Electronic Resource
    Springer
    Data Mining and Knowledge Discovery 1 (1997), pp. 55-77
    ISSN: 1573-756X
    Keywords: classification ; bias ; variance ; curse-of-dimensionality ; bagging ; naive Bayes ; nearest-neighbors
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x = {x_1, ..., x_n}. At issue is how error in the estimates of these probabilities affects classification error when the estimates are used in a classification rule. These effects are seen to be somewhat counterintuitive in both their strength and nature. In particular the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves. Certain types of (very high) bias can be canceled by low variance to produce accurate classification. This can dramatically mitigate the effect of the bias associated with some simple estimators like “naive” Bayes, and the bias induced by the curse-of-dimensionality on nearest-neighbor procedures. This helps explain why such simple methods are often competitive with and sometimes superior to more sophisticated ones for classification, and why “bagging/aggregating” classifiers can often improve accuracy. These results also suggest simple modifications to these procedures that can (sometimes dramatically) further improve their classification performance.
    Type of Medium: Electronic Resource
  • 3
    Electronic Resource
    New York, NY : Wiley-Blackwell
    Journal of Chemometrics 3 (1989), pp. 463-475
    ISSN: 0886-9383
    Keywords: Classification ; Discriminant analysis ; Principal components ; Biased estimates ; Cross-validation ; Chemistry ; Analytical Chemistry and Spectroscopy
    Source: Wiley InterScience Backfile Collection 1832-2000
    Topics: Chemistry and Pharmacology
    Notes: Classification and regression techniques are among the most used tools by chemometricians. With classification, the two classic methods are discriminant analysis and SIMCA. In this paper we discuss the connection between these two methods and introduce two new ones of the same family: DASCO (discriminant analysis with shrunken covariances) and RDA (regularized discriminant analysis). We demonstrate on both simulated and real data sets that their performance is superior to the old favorites. This is especially true in small-sample/high-dimension settings typical in chemistry.
    Additional Material: 3 tables
    Type of Medium: Electronic Resource
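Entry 1 contrasts a "patient" rule-induction strategy with the greedy splitting used by most rule-induction methods. The sketch below illustrates the patient idea as top-down peeling: each step shaves only a small fraction (the hypothetical parameter `alpha`) of the remaining points off one face of a box, picking the face whose removal most raises the mean output inside. It is a minimal illustration under stated assumptions, not the paper's procedure: the names `peel`, `in_box`, `alpha`, and `min_support` are invented here, the loop stops as soon as no peel raises the mean, and refinements of the full method (such as a bottom-up pasting step) are omitted.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]  # one (low, high) bound per input variable (hypothetical type)


def in_box(x: Sequence[float], box: Box) -> bool:
    """True if point x lies inside the closed box."""
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box))


def peel(X: List[List[float]], y: List[float], alpha: float = 0.1,
         min_support: int = 10) -> Box:
    """Illustrative top-down peeling (simplified, not the paper's exact method):
    each step removes only about an alpha fraction of the points still in the
    box, from whichever single face leaves the largest mean of y inside."""
    p = len(X[0])
    box: Box = [(min(r[j] for r in X), max(r[j] for r in X)) for j in range(p)]
    inside = list(range(len(X)))
    current = sum(y[i] for i in inside) / len(inside)
    while True:
        best = None  # (mean, box, kept indices) of the best candidate peel
        for j in range(p):
            vals = sorted(X[i][j] for i in inside)
            k = max(1, int(alpha * len(vals)))  # peel at least one point
            if k >= len(vals):
                continue
            lo, hi = box[j]
            # Candidate 1 raises the low face; candidate 2 lowers the high face.
            for new_bound in ((vals[k], hi), (lo, vals[-k - 1])):
                cand = box[:j] + [new_bound] + box[j + 1:]
                kept = [i for i in inside if in_box(X[i], cand)]
                if len(kept) < min_support or len(kept) == len(inside):
                    continue  # box too small, or ties meant nothing was peeled
                m = sum(y[i] for i in kept) / len(kept)
                if best is None or m > best[0]:
                    best = (m, cand, kept)
        if best is None or best[0] <= current:
            return box  # no admissible peel raises the box mean any further
        current, box, inside = best


if __name__ == "__main__":
    X = [[float(i)] for i in range(100)]
    y = [1.0 if i >= 80 else 0.0 for i in range(100)]
    print(peel(X, y, alpha=0.1, min_support=10))
```

With one input taking the values 0 to 99 and an output equal to 1 only for inputs of at least 80, the sketch repeatedly peels small slices off the low face and ends at the box [(80.0, 99.0)]. Because each step removes only a small fraction of the data, a poor early choice costs little, which is the contrast with greedy splitting that the abstract draws.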
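Entry 2 argues that a strongly biased but low-variance probability estimate can still classify accurately, even when its squared error is worse. For two classes this can be checked numerically. The sketch below assumes the estimate f_hat of f = P(y = 1 | x) is approximately normal. It computes the probability that the rule "predict 1 iff f_hat > 1/2" disagrees with the Bayes rule, alongside the estimate's expected squared error. The normal assumption and the names `prob_disagree_with_bayes` and `squared_error` are illustrative choices, not taken from the paper.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_disagree_with_bayes(f: float, mean_hat: float, sd_hat: float) -> float:
    """Probability that 'predict class 1 iff f_hat > 1/2' disagrees with the
    Bayes rule at a point where P(y = 1 | x) = f, ASSUMING
    f_hat ~ Normal(mean_hat, sd_hat**2)."""
    if f == 0.5:
        raise ValueError("Bayes rule is ambiguous at f = 1/2")
    p_below_half = normal_cdf((0.5 - mean_hat) / sd_hat)  # P(f_hat <= 1/2)
    # Bayes predicts 1 when f > 1/2, so disagreement needs f_hat <= 1/2;
    # when f < 1/2 it needs f_hat > 1/2.
    return p_below_half if f > 0.5 else 1.0 - p_below_half


def squared_error(f: float, mean_hat: float, sd_hat: float) -> float:
    """Expected squared error of f_hat: bias**2 + variance."""
    return (mean_hat - f) ** 2 + sd_hat ** 2


if __name__ == "__main__":
    f = 0.9
    for label, m, s in [("biased, low variance", 0.55, 0.02),
                        ("unbiased, high variance", 0.9, 0.3)]:
        print(f"{label}: squared error {squared_error(f, m, s):.4f}, "
              f"P(disagree with Bayes) {prob_disagree_with_bayes(f, m, s):.4f}")
```

With f = 0.9, an estimate centred at 0.55 with standard deviation 0.02 has the larger squared error (about 0.123 versus 0.09 for an unbiased estimate with standard deviation 0.3). Yet it disagrees with the Bayes rule far less often (about 0.6% versus about 9%). What matters for 0/1 loss is whether the estimate lands on the correct side of 1/2, not how close it is to f.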
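Entry 3 introduces RDA (regularized discriminant analysis), which regularizes each class's covariance estimate. RDA is commonly described with two parameters. The first, λ, moves each class covariance toward the pooled covariance. The second, γ, shrinks the result toward a multiple of the identity. The sketch below is a simplified, unweighted version of that two-step shrinkage, written for illustration. The function names are hypothetical, and blending the covariance matrices directly (rather than applying any sample-size weighting a published formulation may use) is an assumption. Setting λ = 0, γ = 0 leaves the class covariance unchanged, as in quadratic discriminant analysis. Setting λ = 1, γ = 0 gives the pooled covariance used by linear discriminant analysis.

```python
from typing import List

Matrix = List[List[float]]


def rda_covariance(cov_k: Matrix, cov_pooled: Matrix,
                   lam: float, gamma: float) -> Matrix:
    """Regularized class covariance (simplified, unweighted sketch).
    lam in [0, 1] blends the class covariance toward the pooled one;
    gamma in [0, 1] then shrinks the blend toward a multiple of the identity
    with the same trace."""
    p = len(cov_k)
    blend = [[(1.0 - lam) * cov_k[i][j] + lam * cov_pooled[i][j]
              for j in range(p)] for i in range(p)]
    avg_eig = sum(blend[i][i] for i in range(p)) / p  # trace / p = mean eigenvalue
    return [[(1.0 - gamma) * blend[i][j] + (gamma * avg_eig if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def det2(m: Matrix) -> float:
    """Determinant of a 2x2 matrix (used only for the example below)."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


if __name__ == "__main__":
    # A rank-deficient class covariance, as can arise with very few samples.
    singular = [[1.0, 1.0], [1.0, 1.0]]
    pooled = [[2.0, 0.0], [0.0, 2.0]]
    print(det2(singular))                                    # 0.0
    print(det2(rda_covariance(singular, pooled, 0.0, 0.1)))  # about 0.19
```

The example shows why such shrinkage suits the small-sample/high-dimension settings the abstract mentions. A class covariance that is singular, and so cannot be inverted in a quadratic discriminant rule, becomes invertible after a small γ.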