Tijl De Bie - FORSIED

Formalizing Subjective Interestingness in Exploratory Data Mining – FORSIED

The principal investigator:

Tijl De Bie is currently Full Professor at the University of Ghent. Before moving to Ghent, he was a Reader at the University of Bristol, where he was appointed Lecturer (Assistant Professor) in 2007. Before that, he was a postdoctoral researcher at KU Leuven and the University of Southampton. He completed his PhD on machine learning and advanced optimization techniques at KU Leuven in 2005. During his PhD he also spent a combined total of one year as a visiting research scholar at UC Berkeley and UC Davis.

His main research interest is the formalization of subjective interestingness in exploratory data mining, along with related issues such as privacy preservation in the data mining process. Other interests include the use of machine learning and data mining for music informatics and for web and social media mining. He currently holds an ERC Consolidator Grant titled "Formalizing Subjective Interestingness in Exploratory Data Mining" (FORSIED), as well as an FWO Odysseus Group I grant titled "Exploring Data: Theoretical Foundations and Applications to Web, multimedia, and Omics Data".

 

Contact:

Publications: biblio.ugent.be/person/802002011673

 

The project:

The rate at which research labs, enterprises and governments accumulate data is high and rapidly increasing. Often, these data are collected for no specific purpose, or they turn out to be useful for unanticipated purposes: companies constantly look for new ways to monetize their customer databases; governments mine various databases to detect tax fraud; security agencies mine and cross-associate numerous heterogeneous information streams from publicly accessible and classified databases to understand and detect security threats. The objective in such Exploratory Data Mining (EDM) tasks is typically ill-defined, i.e. it is unclear how to formalize how interesting a pattern extracted from the data is. As a result, EDM is often a slow process of trial and error.

In this project we are developing the mathematical principles of what makes a pattern interesting in a very subjective sense. Crucial to this endeavour is research into automatic mechanisms that model and duly take into account the prior beliefs and expectations of the user for whom the EDM patterns are intended, thus relieving users of the complex task of formalizing for themselves what makes a pattern interesting to them.

The results of this project may radically change the way in which EDM research is done. Currently, researchers typically imagine a specific purpose for the patterns, try to formalize interestingness of such patterns given that purpose, and design an algorithm to mine them. However, given the variety of users, this strategy has led to a multitude of algorithms. As a result, users need to be data mining experts to understand which algorithm applies to their situation. To resolve this, we are developing a theoretically solid framework for the design of EDM systems that model the user's beliefs and expectations as much as the data itself, so as to maximize the amount of useful information transmitted to the user. This will ultimately bring the power of EDM within reach of the non-expert.
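To make the idea of modelling the user's beliefs more concrete, here is a minimal illustrative sketch in Python. It is not the project's actual method: it assumes that the user's beliefs can be encoded as independent probabilities over yes/no facts, and that the "information transmitted" by a pattern is measured as its surprisal (negative log-probability) under those beliefs. The function names and the example items are hypothetical.

```python
import math


def information_content(pattern, belief):
    """Surprisal, in bits, of `pattern` under the user's beliefs.

    pattern: dict mapping item -> observed value (True/False).
    belief:  dict mapping item -> probability the user assigns to True.
    Items are treated as independent (an illustrative assumption).
    """
    bits = 0.0
    for item, observed in pattern.items():
        p = belief[item] if observed else 1.0 - belief[item]
        if p == 0.0:
            # The pattern contradicts something the user held as certain.
            return math.inf
        bits -= math.log2(p)
    return bits


def update_belief(belief, pattern):
    """Return the user's beliefs after being shown `pattern`."""
    updated = dict(belief)
    for item, observed in pattern.items():
        updated[item] = 1.0 if observed else 0.0
    return updated


# Hypothetical prior beliefs of a user about a customer database.
belief = {"buys_bread": 0.9, "buys_caviar": 0.05}

expected_pattern = {"buys_bread": True}     # matches what the user expects
surprising_pattern = {"buys_caviar": True}  # contradicts what the user expects

low = information_content(expected_pattern, belief)
high = information_content(surprising_pattern, belief)

# Once shown, a pattern carries no further information for this user.
belief_after = update_belief(belief, surprising_pattern)
repeat = information_content(surprising_pattern, belief_after)
```

Under these assumptions, the same pattern can be highly informative to one user and uninformative to another, depending only on the `belief` dictionary, and a pattern stops being interesting once the user has seen it.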