
[ Bristol CS | Index | ML group | Peter Flach | Papers | Presentations ]
Program for the discovery of multivalued dependencies from relations
I.Savnik,
P.A.Flach
Introduction
Dependencies between attributes of a database relation express the
presence of structure in that relation. In particular, the existence
of a multivalued dependency X->>Y in a relation r(R), where X and Y
are non-overlapping subsets of R, denotes that for each possible value
of attributes X, there exist no associations between the values of
attributes from Y and Z=R-X-Y. As a consequence, the relation r(R) can
be decomposed into relations r1(XY)=Project[XY](r) and
r2(XZ)=Project[XZ](r) without loss of the information. The
decomposition of r into r1 and r2 makes explicit the internal
structure of relation r. Furthermore, the new representation requires
less storage space than the complete relation r.
Algorithms
The discovery of multivalued dependencies from relations is viewed as
a search in a hypothesis space defined in accordance with the
generalisation relationship among the multivalued dependencies. Two
algorithms for the discovery of multivalued dependencies from
relations are defined. The top-down algorithm enumerates the
hypotheses from the most general to more specific hypotheses which are
checked on the input relation. The bottom-up algorithm first computes
the set of invalid multivalued dependencies. Starting with the most
general dependencies, the bottom-up algorithm iteratively refines
the set of dependencies to conform with each particular invalid
dependency.
Applications
Traditionally, database dependencies were considered to be part of the
data model provided by the database designer.
One reason for the discovery of multivalued dependencies from
extensional data can be that the data model, or parts of it, has
been lost or is no longer accurate, so that reverse engineering is
required. Another reason may be that certain dependencies were not
foreseen by the database designer, but do occur in practice.
The discovered dependencies may be utilised for restructuring
the database but also for query optimisation.
Furthermore, the discovery of multivalued dependencies (as well as
functional dependencies) can assist in the process of discovery of
other types of interesting patterns from relational databases by
revealing the internal structure of datasets. The discovered multivalued
dependencies can be utilised for planning/guiding the overall process
of knowledge discovery, and for restructuring the relations.
Program MDEP
The program MDEP includes implementations of the top-down and
bottom-up algorithms. The program is implemented in Sicstus
Prolog. The MDEP distribution includes the
source code and a short description
of the program. If you have comments or suggestions related to MDEP,
you can send an email to
savnik@informatik.uni-freiburg.de.
Publications
- Discovery of multivalued dependencies from relations.
I.Savnik, P.A.Flach.
Intelligent Data Analysis Journal, Vol.4, No.3, 4(2000),
pp.195-211.
Also: Technical Report 135, Institut für Informatik,
Universität Freiburg, Mar 2000.
[Abstract]
[Postscript]
- Database dependency discovery: a machine learning approach.
P.A.Flach, I.Savnik.
AI Communications, Vol.12, No.3, IOS Press, 1999, pp.139-160.
[
Postscript]
Related work
P A Flach,
Peter.Flach@bristol.ac.uk. Last modified on Tuesday 17 July 2001 at 18:20. © 2001 University of Bristol