Mining Data with Probabilistic Graphical Models: New Algorithms and Applications

Probabilistic graphical models (PGMs) are a competitive tool that allows discovering useful knowledge from data, and its posterior exploitation (by means of inference). Although the field of PGMs exhibits nowadays a high degree of maturity, more research is necessary to extend its applicability as a data mining tool to more complex problems in existing real-world applications, or solve new ones. In this project we propose a joint effort to advance to this research line, by means of a coordinated project formed by four groups that have previously demonstrated their research experience by making substantial contributions to the state-of-the-art on PGMs, and with a high degree of interconnection acquired by previous coordinated projects and collaborations.

The main objective of this project is to advance in different topics related to PGMs, so that we will obtain better results than previous approaches, both because we enlarge the class of in-practice solvable problems, or because we face new/recent challenges. The core package of the algorithms to be developed has a direct application in several stages of the Knowledge Discovering from Databases (KDD) cycle, as preprocessing, (supervised and unsupervised) data mining and knowledge exploitation (inference). Also, during the development of these new algorithms we must bear in mind a set of challenging real applications we have selected to be included (and that serve as testbeds) in this project. Concrete goals are:

  • Joint probability distribution function learning. In this part we cover different aspects as the definition of new scores and the design of structural algorithms for learning PGMs (not only Bayesian networks) models. We propose to improve some existing algorithms/approaches but also to use new ideas as learning based on self-similarity, regularization, Bayesian estimates, multicriteria, interaction, with complex data (noisy, missing, high-dimensional) and models (hybrid, with discrete and continuous variables, and chain graphs).
  • Supervised classification, i.e. prediction of the class label for a given object by using (some of) its attributes. Here we plan to gain some insight on the current domain of competence of Bayesian networks (BN) classifiers, to advance in the design of known BN classifiers (AODE, credal classifiers, dealing with high-dimensionality and imbalanced classes) and to face new challenges as multidimensional classification, classification by regression, data streams, and utility-based classifiers learning.
  • Inference. Though learning is a challenging activity by itself, it is better to think that the acquired knowledge will be later used. Reasoning over a PGM is done by means of inference algorithms. We here propose to work mainly in developing approximate algorithms for mixtures of truncated exponentials (MTE) hybrid networks, credal networks, probabilistic decision graphs and precise and imprecise influence diagrams.
  • Applications. The use of PGMs in a heterogeneous set of real world applications like technological applications (evolutionary computation, mobile robotics, requirements tracing and classification), life sciences (biomedicine, agriculture, environment, genomics) and social domains (bibliometry, prediction of arrival times of city buses, detection of credit card frauds).
Proyecto MD - PGMs -  Webmaster -