Module 8: Data Mining



Dates

Please note that the dates have changed:

Six evenings in May/June 2019: Monday May 6 and Thursday May 9, Monday May 20 and Thursday May 23, Monday June 3 and Thursday June 6, 2019 from 5.30 pm to 9 pm.
Please note: UGent PhD students who want a refund must open a dossier on the DS website (Application for Recognition) by April 5, 2019.


Venue

Faculty of Science, Site Sterre, Krijgslaan 281, building S9, Ghent, PC room 3.1, 3rd floor.


Description

Many modern digital applications increasingly rely on machine learning to derive predictive power from high-dimensional data sets. Compared to traditional statistics, the lack of focus on scientific hypotheses and the need to easily exploit detailed signals in the data call for a different set of models, tools, and analytical reflexes.

This course aims to bring participants to the level where they can independently tackle the analytical part of data mining projects. To that end, the most common types of projects will be addressed: regression with continuous outcomes, classification with categorical outcomes, and clustering. For each of these, the practical use of a set of standard methods will be shown, such as Random Forests, Gradient Boosting Machines, Support Vector Machines, k-Nearest-Neighbors, and k-means.

Throughout the course, concepts will be highlighted that are of concern in every statistical learning application, such as the curse of dimensionality, model capacity, overfitting, and regularization. Practical strategies will be offered to deal with them, introducing techniques such as the Lasso and ridge regression, cross-validation, bagging, and boosting. Instruction will also be given on a selection of specific techniques that are often of interest, such as modern visualization of high-dimensional data, model calibration, outlier detection using isolation forests, and explanation of black-box models.

Finally, the last lecture will introduce deep learning as a powerful tool for data analysis, discussing when and how to use it in practice, and when to shy away from it.

    Target audience

    This course targets professionals and investigators in any field who are involved in predictive modeling based on large and/or high-dimensional databases.


    Exam

    Participants can, if they wish, take part in an exam. Those who pass receive a certificate from Ghent University. The exam consists of an individual take-home project.
    Please note: UGent PhD students no longer need to take or pass this exam in order to incorporate the course in the DTP.

    Incorporation in DTP and reimbursement from DS for UGent PhD students

    To get a reimbursement of the registration fee from your Doctoral School (DS) you need to follow strict rules: please take the necessary action in time. The deadline to open a dossier on the DS website (Application for Recognition) for this course is April 5, 2019. Please note that opening a dossier does not mean that you are enrolled. You still need to enrol via the registration form on this site.

    Course prerequisites

    Participants are expected to be familiar with basic statistical modeling (as taught, for instance, in Module 2 of this program) and to have had some first experience programming in Python (as taught, for instance, in Module 3 of this program).


    Teacher

    As a Senior Data Scientist at the KBC Group Big Data center, Dr. Bart Van Rompaye heads a group of data scientists applying modern data analytical approaches to investment-related problems. He obtained his PhD at Ghent University on issues in survival analysis, and held postdoctoral positions at Ghent University and Umeå University, Sweden. He has taught numerous courses for the Master in Statistical Data Analysis, the Institute for Continuing Education in Science, and FLAMES, the Flanders Training Network for Methodology and Statistics.

    Course material

    Copies of the slides and Python code notebooks.


    Fees

    A different price applies, depending on your main type of employment.

    Employment                                              Module 8   Exam
    Industry/Private sector (1)                                  900     30
    Non-profit, government, university outside AUGent (2)        405     30
    (Doctoral) student outside AUGent (2)                        315     30

    (1) If three or more employees from the same company enrol simultaneously for this course, a 10% reduction on the module price applies.

    (2) AUGent staff and AUGent doctoral students who pay by SAP internal order/invoice can participate at these special prices.

    Enrol for this course