A data-centric view on workflows that couple HPC with large-scale models
Invited Talk, NeurIPS'23: Workshop on Advancing Neural Network Training, New Orleans, Louisiana
The first edition of the Workshop on Advancing Neural Network Training (WANT), held as part of NeurIPS, presents talks on computationally efficient training, covering memory-related issues, scalability, and resource optimization. My invited talk explores the performance trade-offs within multilayer high-performance I/O systems under the strain of emerging AI applications (including deep neural networks and large language models).
Abstract:
In recent years, scientific computing workloads at HPC facilities have been undergoing a significant shift. While traditionally dominated by numerical simulations, these facilities increasingly handle AI/ML applications for training and inference, processing and producing ever-growing amounts of scientific data. Despite the focus on optimizing the execution of new AI/HPC workflows, little attention has been paid to the I/O runtime challenges they present. This talk aims to address that gap by analyzing these emerging trends from an I/O perspective. We will explore how multilayer high-performance I/O systems perform under the strain of these new workflows, which combine traditional HPC techniques with AI interacting in new and challenging ways.
Link to the event: https://want-ai-hpc.github.io/neurips2023/schedule
Link to my slides here.