Data manipulation, analysis and visualisation in Python

Target group

Members of the Doctoral Schools of (Bioscience) Engineering and Natural Sciences.

This course is intended for researchers who have at least basic programming skills in Python. A basic (scientific) programming course that is part of the regular curriculum of bioscience engineering / engineering / sciences should suffice. For those who have experience in another programming language (e.g. MATLAB, R, ...), following an online Python tutorial prior to the course should also suffice.

Content of the course

Handling data is a recurring task for most scientists. Reading in experimental data, checking its properties and creating visualisations can become tedious, so making this process more efficient benefits many scientists. Spreadsheet-based software does not support this process well because it lacks automation and repeatability. A high-level scripting language such as Python is well suited to these tasks, and this course trains students to use Python effectively for them. The course focuses on data manipulation and cleaning, exploratory analysis and visualisation using some important packages such as Pandas, NumPy and Matplotlib.
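
As an illustration of the kind of workflow meant here, the following is a minimal sketch of reading in a small table, checking its properties and computing a repeatable summary with Pandas. The data, column names and variable names are made up for this example and are not taken from the course material.

```python
import io

import pandas as pd

# Hypothetical measurements, inlined so the example runs without any file.
RAW_CSV = """sample,site,temperature
s1,A,20.5
s2,A,21.0
s3,B,
s4,B,18.5
"""

# Read the data into a DataFrame; the empty field becomes a missing value (NaN).
measurements = pd.read_csv(io.StringIO(RAW_CSV))

# Check the properties of the data: size, column types, missing values.
n_rows, n_cols = measurements.shape
column_types = measurements.dtypes
n_missing = measurements["temperature"].isna().sum()

# A repeatable summary: mean temperature per site (missing values are skipped).
mean_per_site = measurements.groupby("site")["temperature"].mean()

print(n_rows, n_cols, n_missing)
print(column_types)
print(mean_per_site)
```

Because every step is written down as code, rerunning the script on an updated file repeats exactly the same checks, which is what a manual spreadsheet workflow does not offer.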

Aim and Scope

The course targets researchers who want to enhance their general data manipulation and analysis skills in Python. It does not aim to teach statistics or machine learning. Instead, it aims to give researchers the means to tackle commonly encountered data handling tasks effectively, in order to increase the overall efficiency of their research.

Organisers

Prof. dr. Bernard De Baets (Department of Mathematical Modelling, Statistics and Bioinformatics)
Prof. dr. ir. Ingmar Nopens (Department of Mathematical Modelling, Statistics and Bioinformatics)
Prof. dr. Peter Dawyndt (Department of Applied Mathematics, Computer Science and Statistics)

Programme

The course is scheduled as a three-day course with the following programme:
1. Day 1. Basic concepts and introduction to Pandas
Some essential concepts of the Python programming language are reviewed. Setting up the programming environment with the required packages using the conda package manager is covered, together with an introduction to the Jupyter notebook environment. The day closes with an introduction to the data analysis package Pandas.
2. Day 2. Advanced Pandas
The use of Pandas for different data cleaning and manipulation tasks is taught, and the acquired skills are immediately put into practice on real-world data sets. In addition, the automation of these tasks as a reusable function or module is tackled. Applications include time series handling, categorical data, merging data, ...
3. Day 3. Combining Pandas and Matplotlib
The process of data cleaning is combined with the creation of plots for data exploration. Essential functionalities of Matplotlib are taught and directly applied.
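
To give a flavour of the topics listed for days 2 and 3, the sketch below wraps a few cleaning steps in a reusable function (dropping missing values, parsing dates, merging with a metadata table, using a categorical column), resamples the result as a time series and plots it with Matplotlib. It is only an illustration: the data, the column names and the function `clean_readings` are hypothetical and do not come from the course material.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def clean_readings(raw, stations):
    """Return cleaned readings, indexed by date and merged with station metadata."""
    cleaned = raw.dropna(subset=["value"]).copy()       # drop missing measurements
    cleaned["date"] = pd.to_datetime(cleaned["date"])   # text to real timestamps
    merged = cleaned.merge(stations, on="station", how="left")
    merged["region"] = merged["region"].astype("category")
    return merged.set_index("date").sort_index()


# Hypothetical raw readings and station metadata.
raw = pd.DataFrame(
    {
        "date": ["2019-12-16", "2019-12-16", "2019-12-17", "2019-12-17", "2019-12-19"],
        "station": ["A", "B", "A", "B", "A"],
        "value": [1.0, 2.0, None, 4.0, 3.0],
    }
)
stations = pd.DataFrame({"station": ["A", "B"], "region": ["north", "south"]})

readings = clean_readings(raw, stations)

# Time series handling: daily mean over all stations (days without data give NaN).
daily_mean = readings["value"].resample("D").mean()

# Exploration plot of the cleaned data.
fig, ax = plt.subplots()
daily_mean.plot(ax=ax, marker="o")
ax.set_xlabel("date")
ax.set_ylabel("daily mean value")
```

Keeping the cleaning steps in one function means the same steps can be applied unchanged to a new data set. In a Jupyter notebook the figure is typically shown inline; in a script it could be written to disk with `fig.savefig`.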

Dates and Venue

Monday 16, Tuesday 17 and Thursday 19 December 2019, from 9:00 to 17:00 (3 days). NO COURSE on Wednesday 18 December 2019!
Sandwich lunches are included at the workshop locations.
Venue: Pc-zaal Konrad Zuse, Krijgslaan 281 (S9), 3rd floor, Campus Sterre

Lecturers

  • Joris Van den Bossche is a core contributor to Pandas, the main data analysis library in Python, and has given several tutorials on this topic at international conferences (PyData Paris and EuroSciPy). He did a PhD in air quality research at Ghent University (Faculty of Bioscience Engineering, Department of Mathematical Modelling, Statistics and Bioinformatics) and VITO, and worked at the Paris-Saclay Center for Data Science. Currently he is a freelance open source software developer and teacher.
  • Stijn Van Hoey is currently working on the Internet of Water project at VITO. Until recently, he was Research Software Engineer and Open Data Publisher at INBO, supporting and automating the cleaning and publishing of data. Before that, he did a PhD at Ghent University (Faculty of Bioscience Engineering, Department of Mathematical Modelling, Statistics and Bioinformatics) and VITO. As a teaching assistant, he taught courses on modelling and simulation of environmental systems, process control and scientific programming.

Registration

Please use the following registration link: https://eventmanager.ugent.be/datamanippython

Course material

All the course material will be available on GitHub. The course consists of hands-on sessions, making use of Jupyter notebooks. We recommend that participants bring their own laptop with wifi access, but PCs are available as well.

The materials from last year’s course can be found here.

Number of participants

Maximum 25 PhD students

Language

English

Evaluation criteria (doctoral training programme)

100% active participation on all 3 days

Registration fee

Free of charge for members of the Doctoral Schools of (Bioscience) Engineering and Natural Sciences.