ISSN:
1573-7675
Keywords:
data mining; knowledge discovery; machine learning; knowledge representation; attribute-oriented generalization; domain generalization graphs
Source:
Springer Online Journal Archives 1860-2000
Topics:
Computer Science
Notes:
Abstract: Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs (DGGs) for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
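As a rough illustration of the generate-and-test idea in the abstract, the following Python sketch enumerates every combination of generalization levels for two attributes and ranks the resulting summaries by a variance-style score. It is not the paper's Multi-Attribute Generalization algorithm: the table, the hierarchies, the names `PATHS`, `summarize`, `variance`, and `all_summaries`, and the exact variance formula (population variance of tuple probabilities around the uniform distribution) are all assumptions made for illustration. Each attribute here has a single generalization path, whereas a DGG may contain several.

```python
from collections import Counter
from itertools import product

# Hypothetical toy relation: (city, age) rows.
ROWS = [("Regina", 25), ("Saskatoon", 31), ("Calgary", 47),
        ("Calgary", 52), ("Regina", 64)]
ATTRS = ("city", "age")

# Hypothetical concept hierarchy for cities (assumption, not from the paper).
PROVINCE = {"Regina": "SK", "Saskatoon": "SK", "Calgary": "AB"}

# One generalization path per attribute; level 0 is the raw value and the
# last level is the most general concept. Every function takes a raw value.
PATHS = {
    "city": [lambda c: c, lambda c: PROVINCE[c], lambda c: "ANY"],
    "age": [lambda a: a, lambda a: "young" if a < 40 else "old",
            lambda a: "ANY"],
}

def summarize(rows, levels):
    """Generalize each row to the chosen level per attribute, then merge
    identical generalized tuples and count how many rows each one covers."""
    return Counter(
        tuple(PATHS[attr][lvl](val)
              for attr, lvl, val in zip(ATTRS, levels, row))
        for row in rows
    )

def variance(summary):
    """Assumed interestingness score: population variance of the tuple
    probabilities around the uniform probability 1/m."""
    total = sum(summary.values())
    m = len(summary)
    uniform = 1 / m
    return sum((n / total - uniform) ** 2 for n in summary.values()) / m

def all_summaries(rows):
    """Generate one summary per combination of levels (generate-and-test)."""
    level_ranges = [range(len(PATHS[attr])) for attr in ATTRS]
    return {levels: summarize(rows, levels)
            for levels in product(*level_ranges)}

if __name__ == "__main__":
    summaries = all_summaries(ROWS)
    ranked = sorted(summaries.items(),
                    key=lambda item: variance(item[1]), reverse=True)
    for levels, summary in ranked:
        print(levels, dict(summary), variance(summary))
```

Because each level combination is summarized independently, the combinations could in principle be divided among worker processes, which is the kind of partitioning the abstract describes for the parallel version.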
Type of Medium:
Electronic Resource
URL:
http://dx.doi.org/10.1023/A:1008769516670