Bioinformatics

Bioinformatics is an interdisciplinary field that develops and improves upon methods for storing, retrieving, organizing and analyzing biological data.

The recent revolution in technology enables the production of molecular data at unprecedented rates. These raw data need to be stored, moved and processed, which is a non-trivial task given their sheer volume. Developing software that can extract useful biological knowledge from these data is a challenge, because most problems are underdetermined and thus require combining the proper algorithms with the right biological assumptions.

Bioinformatics combines biological knowledge with many areas of computer science, mathematics and engineering to process and analyze biological data. We combine complex machines with parallelization and distributed computing to read in and process biological data at a much faster rate than before. Relying on the latest developments in algorithm design, machine learning, artificial intelligence, soft computing, data mining, image processing, and simulation, we develop dedicated solutions for problems in the biomedical, agrotech and microbial sectors. We specifically focus on developing high-performance solutions for next-generation sequence analysis and for the integrative analysis of genotype-phenotype data.

Staff

Kathleen Marchal, Jan Fostier, Tom Dhaene, Mario Pickavet, Tijl De Bie, Pieter Audenaert, Lieven Verbeke, Dries De Maeyer, Paolo Simeone, Jefrey Lijffijt

Researchers

Camilo Andres, Dieter De Witte, Dries Decap, Mahdi Heydari, Ahmad Mel, Ine Melckenbeek, Giles Miclotte, Mushthofa Mushthofa, Joeri Ruyssinck, Sofie Van Gassen, Robin Vandaele, Bram Weytjens.

Projects

  • ICON GAP: Genome Analytics Platform
  • IWT-SBO NEMOA: Network-based approaches for the identification and mode of action determination of anti-bacterial agents
  • FWO project: De novo genome assembly using both second and third generation sequencing data
  • FWO project: Network-based analysis of genotype-expression phenotype data
  • FWO project: Genetic adaptation of bacterial catabolic pathways for pesticide biodegradation: role of historical contingency
  • Odysseus Grant “Exploring Data: Theoretical Foundations and Applications to Web, Multimedia, and Omics Data”.

Key publications

Halvade: A parallel Hadoop MapReduce implementation of a read mapping and variant calling pipeline. For a whole genome sequencing dataset (NA12878, human, 50x coverage, 1.5 billion paired-end reads), Halvade reduces the runtime from 12 days (sequential) to under two hours (15 nodes with 24 CPU cores each), corresponding to a roughly 200-fold speedup.
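
The Halvade figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and the parallel efficiency it derives assumes that the sequential baseline ran on a single core, which the text does not state.

```python
# Figures quoted in the text above.
sequential_hours = 12 * 24   # 12 days of sequential runtime
cores = 15 * 24              # 15 nodes with 24 CPU cores each
speedup = 200                # approximate speedup quoted in the text

# Runtime implied by the quoted speedup.
implied_parallel_hours = sequential_hours / speedup

# Parallel efficiency, ASSUMING a single-core sequential baseline.
parallel_efficiency = speedup / cores

print(f"implied parallel runtime: {implied_parallel_hours:.2f} h")
print(f"parallel efficiency: {parallel_efficiency:.0%}")
```

Under these assumptions, the implied parallel runtime is consistent with the "under two hours" quoted above.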

Network motifs in integrated molecular networks represent functional relationships between distinct data types. They aggregate to form dense topological structures corresponding to functional modules, which cannot be detected by traditional graph clustering algorithms. The picture shows the graphical user interface of CyClus3D, a Cytoscape plugin we developed for clustering composite three-node network motifs using a 3D spectral clustering algorithm.

Molecular interaction networks are a comprehensive and intuitive way of representing the known molecular interactions in an organism of interest. Because they are inferred from noisy information, they rarely represent biological truth, but they are useful as a scaffold for the interpretation and integration of dedicated datasets. We have successfully applied integrative network-based models to cancer driver identification and subtyping, to the study of the molecular forces that drive bacterial community behavior, and to eQTL analysis.

Different types of combinatorial regulation identified by our DISTILLER algorithm.