Mining Data with Probabilistic Graphical Models: New Algorithms and Applications
Probabilistic graphical models (PGMs) are a competitive tool
that allows discovering useful knowledge from data, and its
posterior exploitation (by means of inference). Although the
field of PGMs exhibits nowadays a high degree of maturity,
more research is necessary to extend its applicability as
a data mining tool to more complex problems in existing
real-world applications, or solve new ones. In this project
we propose a joint effort to advance to this research line,
by means of a coordinated project formed by four groups
that have previously demonstrated their research experience
by making substantial contributions to the state-of-the-art
on PGMs, and with a high degree of interconnection acquired
by previous coordinated projects and collaborations.
The main objective of this project is to advance in different
topics related to PGMs, so that we will obtain better results
than previous approaches, both because we enlarge the class
of in-practice solvable problems, or because we face new/recent
challenges. The core package of the algorithms to be developed
has a direct application in several stages of the Knowledge
Discovering from Databases (KDD) cycle, as preprocessing,
(supervised and unsupervised) data mining and knowledge
exploitation (inference). Also, during the development of
these new algorithms we must bear in mind a set of challenging
real applications we have selected to be included (and that
serve as testbeds) in this project. Concrete goals are:
-
Joint probability distribution function learning. In this part we
cover different aspects as the definition of new scores and the
design of structural algorithms for learning PGMs (not only Bayesian
networks) models. We propose to improve some existing
algorithms/approaches but also to use new ideas as learning
based on self-similarity, regularization, Bayesian estimates,
multicriteria, interaction, with complex data (noisy, missing,
high-dimensional) and models (hybrid, with discrete and
continuous variables, and chain graphs).
-
Supervised classification, i.e. prediction of the class label
for a given object by using (some of) its attributes. Here we
plan to gain some insight on the current domain of competence
of Bayesian networks (BN) classifiers, to advance in the design
of known BN classifiers (AODE, credal classifiers, dealing with
high-dimensionality and imbalanced classes) and to face new
challenges as multidimensional classification, classification
by regression, data streams, and utility-based classifiers
learning.
-
Inference. Though learning is a challenging activity by itself,
it is better to think that the acquired knowledge will be later
used. Reasoning over a PGM is done by means of inference algorithms.
We here propose to work mainly in developing approximate algorithms
for mixtures of truncated exponentials (MTE) hybrid networks,
credal networks, probabilistic decision graphs and precise and
imprecise influence diagrams.
-
Applications. The use of PGMs in a heterogeneous set of real
world applications like technological applications (evolutionary
computation, mobile robotics, requirements tracing and classification),
life sciences (biomedicine, agriculture, environment, genomics)
and social domains (bibliometry, prediction of arrival times of
city buses, detection of credit card frauds).
|