I/O patterns in AI workflows and their impact on the performance of HPC simulations
Invited Talk, Schloss Dagstuhl - Leibniz Center for Informatics, Wadern, Germany
The Dagstuhl Seminar on Integrating HPC, AI, and Workflows for Scientific Data Analysis, held from Aug 27 to Sep 1, 2023, focused on understanding integrated AI+HPC workflows. It elaborated on the different modes in which AI and HPC components can be coupled within workflows and on the corresponding paradigm shift of HPC systems towards real-time interaction within workflows. My talk focused on profiling the I/O patterns of applications running on HPC systems over the years and on identifying the limitations to scaling modern applications that combine HPC and AI.
The seminar included discussions among the attendees and produced a report that identified key challenges and opportunities at the intersection of HPC and AI, such as the stochastic nature of ML and its impact on the reproducibility of data analysis on HPC systems. The report highlighted the need for holistic co-design approaches, in which workflows are introduced early and scaled from small-scale experiments to large-scale executions. Such an approach is essential for integrating the full workflow environment, including ML/AI components, early in the process, which in turn makes it possible to replace expensive simulations with fast-running surrogates and to explore interactively with the entire software environment.
Link to the event: https://want-ai-hpc.github.io/neurips2023/schedule
The report PDF can be found here.