Dynamic Resource Allocation for Reasoning Agents in HPC Workflows

Date:

Invited Talk, 19th Scheduling for large-scale systems workshop, Frejus, France

This talk presents a scheduling solution for a reasoning-based discovery loop in which LLM agents monitor live simulation and sensor data to detect model drift, anomalies, or points of interest. Upon detection, an agent dynamically requests burst resources to launch analysis tasks or continual learning, for example updating surrogate models on the fly to refine plasma stability predictions, running correction checks, or producing visualizations.
I will discuss the scheduling challenges inherent in this “stochastic” workflow and present preliminary results on two needs: a trade-off between token consumption and computation, and priority-based scheduling for agentic tasks, with real examples from the nuclear energy community.


Link to the event: 19th Scheduling for large-scale systems workshop Website
Link to my talk: PDF