Data manipulation, analysis and visualisation in Python

Target group

This course is intended for PhD researchers of the Doctoral School of (Bioscience) Engineering and Natural Sciences who have at least basic programming skills in Python. A basic (scientific) programming course that is part of the regular curriculum of bioscience engineering / engineering / sciences should suffice. For those who have experience in another programming language (e.g. Matlab, R, ...), following an online Python tutorial prior to the course should also suffice.

Organizers

Prof. dr. Bernard De Baets (Department of Mathematical Modelling, Statistics and Bioinformatics)
Prof. dr. ir. Ingmar Nopens (Department of Mathematical Modelling, Statistics and Bioinformatics)
Prof. dr. Peter Dawyndt (Department of Applied Mathematics, Computer Science and Statistics)

Aim & scope

This course is intended for researchers who have at least basic programming skills in Python and who want to enhance their general data manipulation and analysis skills in Python.
The course is not a course in statistics or machine learning. It aims to give researchers the means to tackle commonly encountered data handling tasks effectively, in order to increase the overall efficiency of their research.

Content

Handling data is a recurring task for most scientists. Reading in experimental data, checking its properties and creating visualisations can become tedious, so making this process more efficient benefits many scientists. Spreadsheet-based software does not properly support this process because it lacks automation and repeatability. A high-level scripting language such as Python is well suited to these tasks.
This course trains students to use Python effectively for these tasks. The course focuses on data manipulation and cleaning, exploratory analysis and visualisation using some important packages such as Pandas, NumPy and Matplotlib.

Lecturers

Joris Van den Bossche is a core contributor to Pandas, the main data analysis library in Python, and has given several tutorials on this topic at international conferences (SciPy, EuroSciPy, PyData Paris). He did a PhD at Ghent University (Faculty of Bio-science Engineering, Department of Mathematical Modelling, Statistics and Bio-informatics) and VITO in air quality research, worked at the Paris-Saclay Center for Data Science, and is currently a freelance software developer and teacher.
Stijn Van Hoey is currently working as a Research Software Engineer at Fluves, an engineering company operating in water and energy markets. Before that, he was Research Software Engineer and Open Data Publisher at INBO, supporting and automating the cleaning and publishing of data. Earlier, he did a PhD at Ghent University (Faculty of Bio-science Engineering, Department of Mathematical Modelling, Statistics and Bio-informatics) and VITO.

Programme

The course is scheduled as a three-day course with the following programme:

  • Day 1. Basic concepts and introduction to Pandas
    Some essential concepts of the Python programming language are reviewed. The day covers setting up the programming environment with the required packages using the conda package manager, as well as an introduction to the Jupyter notebook environment. It closes with an introduction to the data analysis package Pandas.
  • Day 2. Advanced Pandas
    The use of Pandas for different data cleaning and manipulation tasks is taught, and the acquired skills are immediately put into practice on real-world data sets. In addition, the automation of these tasks as a reusable function or module is tackled. Applications include time series handling, categorical data, merging data, ...
  • Day 3. Combining Pandas and Matplotlib
    The process of data cleaning is combined with the creation of plots for data exploration. Essential Matplotlib functionality is taught and directly applied.
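
To give a flavour of the kind of task covered on Day 2, the sketch below wraps a small cleaning step in a reusable function and then aggregates a time series. It is only an illustration and is not taken from the course material: the inline data, the station table, the placeholder values "-999" and "n/a", and the function name clean_measurements are all assumptions made for this example.

```python
import io

import pandas as pd

# Hypothetical raw measurements: an inline CSV stands in for a real data file.
RAW_CSV = """timestamp,station,value
2021-05-31 09:00,A,12.5
2021-05-31 15:00,A,-999
2021-05-31 09:00,B,8.0
2021-06-01 09:00,A,14.5
2021-06-01 09:00,B,n/a
2021-06-01 15:00,B,10.0
"""

# Hypothetical station metadata to merge with the measurements.
STATIONS = pd.DataFrame({"station": ["A", "B"], "type": ["urban", "rural"]})


def clean_measurements(csv_source, stations):
    """Read, clean and annotate measurements (illustrative helper)."""
    df = pd.read_csv(
        csv_source,
        parse_dates=["timestamp"],
        na_values=["n/a", "-999"],  # assumed placeholders for missing values
    )
    # Data cleaning: drop rows without a usable measurement.
    df = df.dropna(subset=["value"])
    # Merging data: attach the station type to every measurement.
    df = df.merge(stations, on="station", how="left")
    # Categorical data: store the repeated labels as a category.
    df["type"] = df["type"].astype("category")
    return df.set_index("timestamp").sort_index()


cleaned = clean_measurements(io.StringIO(RAW_CSV), STATIONS)

# Time series handling: daily mean per station, one column per station.
daily = cleaned.groupby("station")["value"].resample("D").mean().unstack("station")
print(daily)
```

Because the cleaning lives in one function, the same steps can be rerun unchanged on a new file. With Matplotlib installed, calling daily.plot() would draw one line per station, which is the sort of combination Day 3 addresses.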

Dates

31 May (Monday) + 1 June (Tuesday) + 3 June (Thursday) 2021

From 9:00 to 17:00 (3 days, no course on Wednesday 2 June 2021!)
ONLINE

Registration

Follow this link to register: https://eventmanager.ugent.be/datainpython

If the course is fully booked, you can ask to be added to the waiting list by sending an email to doctoralschools@ugent.be

Registration fee

Free of charge for members of the Doctoral Schools of (Bioscience) Engineering and Natural Sciences of Ghent University.

Course material

All the course material will be available on GitHub. The course consists of hands-on sessions, making use of Jupyter notebooks.
The materials from last year’s course can be found here.

Number of participants

Maximum 25

Evaluation criteria (doctoral training programme)

100% participation