Session descriptions and details - SWPC2012
Multithreading and multiprocessing
Thursday and Friday, April 19-20, 2012 - @Sterre Building S9: April 19 lecture room V1 (first floor), April 20 lecture room V3 (third floor).
Georg Hager and Jan Treibig (HPC services, Erlangen Regional Computing Center, Germany)
Chairs: Kenneth Hoste, Stijn De Weirdt
This course gives an introduction to shared-memory parallel programming and optimization on modern multicore systems. The main focus is on OpenMP, which is the dominant shared-memory programming model in computational science, but alternative approaches are also discussed.
After an introduction to parallelism, multicore architecture, and the most important shared-memory programming models, we give a solid account of OpenMP and its use on multicore-based systems. We then describe the dominant performance issues in shared-memory programming, such as synchronization overhead, ccNUMA locality, and bandwidth saturation (in cache and memory), in order to pinpoint the influence of system topology and thread affinity on the performance of typical parallel programming constructs. Multiple ways of probing system topology and establishing affinity, either through explicit coding or with separate tools, are demonstrated. The basic use of hardware counter measurements for performance analysis is discussed. Finally, we elaborate on programming techniques that help establish optimal parallel memory access patterns and/or cache reuse, with an emphasis on leveraging shared caches to improve performance. Hands-on exercises allow the students to apply the concepts right away.
Georg Hager holds a PhD in computational physics from the University of Greifswald. He has been working with high performance systems since 1995, and is now a senior research scientist in the HPC group at Erlangen Regional Computing Center (RRZE). Recent research includes architecture-specific optimization for current microprocessors, performance modeling on processor and system levels, and the efficient use of hybrid parallel systems. See his blog at http://blogs.fau.de/hager for current activities, publications, and talks.
Jan Treibig is a chemical engineer with a special focus on computational fluid dynamics and technical thermodynamics. He holds a PhD in computer science from the University of Erlangen-Nuremberg, and has worked for two years in the embedded automotive software industry as a software developer, test engineer, and quality manager. Since 2008 he has been a postdoctoral researcher in the HPC group at Erlangen Regional Computing Center (RRZE). His research activities revolve around low-level and architecture-specific optimization and performance modeling. He is also the author of the LIKWID tool suite, a set of command line tools created to support developers of high-performance multithreaded codes.
Prior knowledge: UNIX/Linux skills are required, since we will be working with Linux systems in the hands-on sessions. Students should also have some programming experience with one of the dominant HPC languages: C, C++, or Fortran.
- open lecture given by the lecturers: http://blogs.fau.de/hager/talks/
- annual 1-semester lecture: Programming Techniques for Supercomputers
- one-week block course at KTH Stockholm: Efficient multithreaded programming on modern CPUs and GPUs
- one-week annual course together with LRZ Munich: Parallel Programming of High Performance Systems
GPGPU: considerations for parallelizing code with CUDA
Friday April 27, 2012 - @Het Pand - Zaal Oude Infirmerie
Carsten Griwodz (University of Oslo, Norway)
Chairs: Ruben De Visscher, Peter Dawyndt
The steadily increasing demand for computing power in all sectors is today addressed by multi-core chips, to the extent that even mobile phones can now be considered multi-core computers. Along the way, a particular kind of support hardware, the graphics processing unit, has attracted programmers' attention because it provides processing power that exceeds that of CPUs and is available at relatively low cost. When used effectively, existing GPUs can contribute several times the processing power of a CPU to resource-demanding workloads. Using them effectively, however, is a bigger challenge than using CPUs. Designed for computing and rendering the pixels of complex visual scenes as efficiently as possible, GPUs feature wide parallel processing pipelines with very limited means for data exchange, synchronization between threads, and I/O with other units. Their architectural specialization, combined with their high raw computing power, requires that programmers consider how to combine them with the available CPU resources and make separate algorithmic choices for CPU and GPU. This course is meant to provide insight into the challenges and potential of GPU programming, using the CUDA programming framework for NVidia graphics cards as an example.
Carsten Griwodz is a department leader at the Simula Research Laboratory and a Professor at the Department of Informatics at the University of Oslo, Norway. He is interested in issues of scalability for multimedia applications. His main research interest is the improvement of mechanisms and algorithms for media servers, interactive distributed multimedia, and distribution systems. From 1993 to 1997, he worked at the IBM European Networking Center in Heidelberg, Germany. In 1997, he joined the Multimedia Communications Lab at Darmstadt University of Technology, Germany, where he received his PhD degree (Dr.-Ing.) in 2000. More information and a publication list can be found at http://home.ifi.uio.no/~griff.
Prior knowledge: General background in informatics, basic knowledge in computer architecture, basic programming skills in C or C++ and working with the command line.
- van Amesfoort, A.S., Varbanescu, A.L., Sips, H.J. and van Nieuwpoort, R.V.: Evaluating multi-core platforms for HPC data-intensive kernels, ACM Computing Frontiers, 2009.
- Stensland, H.K., Espeland, H., Griwodz, C. and Halvorsen, P.: Tips, tricks and troubles: Optimizing for Cell and GPU, ACM NOSSDAV, 2010.
- Brodtkorb, A.R., Dyken, C., Hagen, T.R., Hjelmervik, J.M. and Storaasli, O.O.: State-of-the-art in heterogeneous computing. Scientific Programming (IOS Press), 2010.
- Ian Foster: Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering, Addison-Wesley, 1995.
- Hill, M.D., Marty, M.R.: Amdahl's Law in the Multicore Era, IEEE Computer, 2008.
- NVidia: NVidia CUDA C Programming Guide 4.0, 2011.
- NVidia: NVidia CUDA C Programming Best Practices Guide 4.0, 2011.
- NVidia: NVidia CUDA API Reference Manual 4.0, 2011.
- NVidia: Compute Visual Profiler User Guide 4.0, 2011.
Message Passing Interface
Friday May 4, 2012 - @Het Pand - Prior zaal
Jan Fostier (Ghent University, Belgium)
Chairs: Tom Kuppens, Michael Vyverman
The Message Passing Interface (MPI) is a standardized library
specification for message passing between different processes. In
layman's terms: MPI provides mechanisms for handling the data
communication in a parallel program. It is particularly suited for
computational clusters, where the workstations are connected by an
interconnection network (e.g. InfiniBand, Gigabit Ethernet).
In this lecture, the applicability of MPI will be compared to that of other parallel programming paradigms such as OpenMP, CUDA, and MapReduce. Next, the basic principles of MPI will be gradually introduced (point-to-point communication, collective communication, MPI datatypes, etc.). Hands-on exercises allow the participants to immediately put the newly acquired skills into practice, using the UGent Stevin supercomputer infrastructure. Finally, some more theoretical considerations regarding the scalability of algorithms are presented.
Jan Fostier received his MS and PhD degrees in physical engineering from Ghent University in 2005 and 2009, respectively. Currently, he is appointed assistant professor in the Department of Information Technology (INTEC) at the same university. His main research interests are (parallel) algorithms for the biological sciences, high performance computing, and computational electromagnetics.
Prior knowledge: Basic knowledge of C/C++ or Fortran is required. No prior knowledge of parallel computing is required. Every participant needs an account on the UGent HPC infrastructure.
- MPI: The Complete Reference
- William Gropp: Using MPI: Portable Parallel Programming with the Message-Passing Interface
MapReduce and Hadoop
Friday May 11, 2012 - @Het Pand - Prior zaal
Robin Aly (University of Twente, the Netherlands)
Chairs: Jan Fostier, Bart Mesuere
This course provides a mix of theory and hands-on practice for managing
Big Data as it is done in the data centers of large search engines. In
contrast to existing grid computing and supercomputing paradigms, which
both employ specialized and expensive hardware, search engines use large
numbers of commodity computers. This course teaches how to carry out
large-scale distributed data analysis using the programming paradigm
MapReduce. This paradigm is inspired by the functions 'map' and 'reduce'
as found in functional programming languages such as Lisp. Students will
learn to specify algorithms using map and reduce steps and to implement
these algorithms in Java using Hadoop, an open source implementation
for analysis tasks. The course will also introduce the language Pig
Latin which can be used to specify MapReduce tasks in a declarative way.
Finally, if time permits, the course will touch on the NoSQL database
HBase, which allows structured storage of data suitable for random access.
Robin Aly is a post-doc at the University of Twente in the Netherlands, where he received his PhD in content-based multimedia retrieval. He has a strong background in data management and distributed data processing using innovative programming paradigms in the Hadoop framework, which he teaches in master's courses. He co-organized the Dutch-Belgian Information Retrieval workshop in 2009 and has served on the program committees of several international conferences.
Prior knowledge: Intermediate programming skills in Java (to follow the hands-on sessions) and basic knowledge of file systems; functional programming experience is a plus.
- Ghemawat, S., Gobioff, H. and Leung, S.: The Google file system, In ACM SIGOPS Operating Systems Review, 2003, 37, 29-43.
- Dean, J. and Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM, ACM, 2008, 51, 107-113.
- Jimmy Lin and Chris Dyer: Data-Intensive Text Processing with MapReduce. Morgan & Claypool Publishers, 2010.
- Olston, C., Reed, B., Srivastava, U., Kumar, R. and Tomkins, A.: Pig Latin: a not-so-foreign language for data processing. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, ACM, 2008, 1099-1110.
- Tom White: Hadoop: The Definitive Guide. O’Reilly Media, 2009.