Data manipulation, analysis and visualisation in Python

Cluster

Research & Valorization

Target group

This course is intended for researchers who have at least basic programming skills in Python. A basic (scientific) programming course from the regular curriculum of bioscience engineering, engineering or sciences should suffice. For those with experience in another programming language (e.g. MATLAB, R), following an online Python tutorial before the course should also suffice.

Organizers

Prof. dr. Peter Dawyndt (Department of Applied Mathematics, Computer Science and Statistics)
Prof. dr. Michiel Stock (Department of Data Analysis and Mathematical Modelling)

Aim & scope

This course is intended for researchers who have at least basic programming skills in Python and want to improve their general data manipulation and analysis skills in Python.
The course is not a course in statistics or machine learning. Instead, it aims to give researchers the means to tackle commonly encountered data handling tasks effectively, and so to increase the overall efficiency of their research.

Content

Handling data is a recurring task for most scientists. Reading in experimental data, checking its properties and creating visualisations can become tedious. Making this process more efficient therefore benefits many scientists. Spreadsheet-based software supports this process poorly, because it offers little automation and repeatability. A high-level scripting language such as Python is well suited to these tasks.
This course trains participants to use Python effectively for these tasks. It focuses on data manipulation and cleaning, exploratory analysis and visualisation, using key packages such as Pandas, Seaborn and Matplotlib.

Lecturers

Joris Van den Bossche is a core contributor to Pandas, the main data analysis library in Python, and has given several tutorials on this topic at international conferences (SciPy, EuroSciPy, PyData Paris). He did a PhD in air quality research at Ghent University (Faculty of Bio-science Engineering, Department of Mathematical Modelling, Statistics and Bio-informatics) and VITO, worked at the Paris-Saclay Center for Data Science, and is currently a freelance software developer and teacher.
Stijn Van Hoey currently works as a Research Software Engineer at Fluves, an engineering company operating in the water and energy markets. Before that, he was a Research Software Engineer and Open Data Publisher at INBO, where he supported and automated the cleaning and publishing of data. Earlier, he did a PhD at Ghent University (Faculty of Bio-science Engineering, Department of Mathematical Modelling, Statistics and Bio-informatics) and VITO.

Programme

The course is scheduled as a three-day course with the following programme:

  • Day 1. Basic concepts and introduction to Pandas
    Essential concepts of the Python programming language are revisited. The day covers setting up the programming environment with the required packages using the conda package manager, and introduces the Jupyter notebook environment. It closes with an introduction to the data analysis package Pandas.
  • Day 2. Advanced Pandas
    Participants learn to use Pandas for various data cleaning and manipulation tasks and immediately put these skills into practice on real-world data sets. The day also covers automating these tasks as reusable functions or modules. Applications include time series handling, categorical data and merging data sets, among others.
  • Day 3. Combining Pandas and Matplotlib
    Data cleaning is combined with creating plots for data exploration. The essential functionality of Matplotlib and Seaborn is taught and applied directly.

Dates

Date: 29 January, 31 January and 2 February 2024
Time: 9:00 - 17:00
Venue: Leslokaal 1.4 (Campus Ledeganck)
Trainers: Joris Van den Bossche and Stijn Van Hoey

Registration

You can find the registration and waiting list here.

Registration fee

Free of charge for members of the Doctoral School. The no-show policy applies.

Course material

All course material will be available on GitHub. The course consists of hands-on sessions using Jupyter notebooks.

Participants need to bring their own laptop with Wi-Fi access.

The materials from last year’s course can be found here.

Number of participants

Maximum 25 PhD students

Evaluation criteria (doctoral training programme)

100% active participation