Text to Knowledge

Recent estimates suggest that up to 90% of data on the Web and in enterprises is unstructured, e.g., natural language text. Information extraction (IE) systems discover structured information in such text, for example by converting news articles into database entries that list extracted items such as named entities, relations, and dates. Structured information in turn enables much richer querying and data mining, e.g., using semantic reasoning. Beyond understanding human-produced text, we are also interested in helping people find the (textual) information they are looking for or could be interested in, including predicting what they will (want to) read.

The overarching objective of our machine reading research group is to build systems that make sense of human-produced text.

Topics we have recently been working on in this area include the following:

  • Relation extraction for knowledge base population
  • Keyphrase extraction
  • Text similarity and categorization
  • Prediction of news adoption over social media

For these research topics, we have been using classical machine learning methods as well as various representation learning methods, and we are currently working towards methods that integrate external (structured) knowledge into neural methods for processing (unstructured) text.

In the recent past, we have also worked on information retrieval tasks, in particular federated web search, resource selection for IE, knowledge extraction from social media, and user disagreement modeling.

In this research and related projects, we regularly make use of more generic software tools produced by other teams within IDLab, e.g., LimeDS and Tengu.


Chris Develder, Bart Dhoedt, Thomas Demeester, Erik Mannens, Azarakhsh Jalavand


Lucas Sterckx, Giannis Bekoulis, Cedric De Boom, Johannes Deleu, Laurent Mertens, Gerald Haesendonck, Ben De Meester, Martin Van Brabant


Key publications