Biotech Data Processing and Analysis

Recent technological advances have led to the generation of vast collections of biotech data. As an example, the European Bioinformatics Institute (EBI) currently stores 20 petabytes of data and backups on genes, proteins, and small molecules. This data avalanche has created a strong need for novel mathematical and computational techniques to address challenges related to disease and our environment.

In our research, we explore how multimedia technology can be used to process and analyze biotech data. In that regard, we have two major research objectives. The first objective is to leverage state-of-the-art video compression technology for the compression of genomics data, so as to mitigate issues in terms of storage, transmission, and analysis. The second objective is to leverage deep machine learning in the context of several biotech use cases, ranging from genome annotation (e.g., splice site detection in DNA sequences) to tumor detection in medical images. Of particular interest is the application of techniques for natural language understanding to the analysis of genomics data.

Staff

Wesley De Neve, Joni Dambre

Researchers

Tom Paridaens, Jasper Zuallaert, Mijung Kim, Lionel Pigou

Key publications

The proposed solution for DNA compression allows for a flexible trade-off between efficiency, effectiveness, and functionality, setting our solution apart from the state-of-the-art.

A deep neural network has been trained to recognize whether a given image contains a malignant lesion. When this is the case, the deep neural network subsequently localizes and extracts the malignant lesion.

Automatic genome annotation includes tasks such as predicting the location of splice sites, translation initiation sites, or secondary structures. By detecting indicative patterns in a set of example DNA sequences, machine learning models can make predictions for new DNA sequences.
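
To make the idea of learning indicative patterns from example sequences concrete, here is a minimal sketch in Python. It is not the deep learning approach used in our research: it swaps in a classical position weight matrix as a much simpler stand-in. The function names (train_pwm, score, best_site), the toy training windows, and the uniform background frequency of 0.25 are all assumptions made for illustration, and the code assumes sequences contain only the letters A, C, G, and T.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train_pwm(sequences, pseudocount=1.0):
    """Estimate per-position log-odds scores from aligned example windows."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all training sequences must have the same length")
    total = len(sequences) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sequences)
        # Log-odds against an assumed uniform background of 0.25 per base.
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / 0.25)
            for base in ALPHABET
        })
    return pwm


def score(pwm, window):
    """Sum the per-position scores of a window as wide as the matrix."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_site(pwm, sequence):
    """Return the start index of the highest-scoring window in a new sequence."""
    width = len(pwm)
    scored = [
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)[1]


# Hypothetical, hand-made windows that resemble a splice donor motif.
examples = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
pwm = train_pwm(examples)
print(best_site(pwm, "TTTTCCAGGTAAGTCCCC"))
```

A deep neural network replaces the fixed per-position table with learned filters that can capture dependencies between positions, but the workflow is the same: fit on labeled example sequences, then score windows of unseen DNA.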