ASPLOS’25 Tutorial: GenAI Catalyst
Abstract
In the rapidly evolving landscape of generative artificial intelligence (AI), the efficiency of underlying systems and compilers plays a crucial role in enabling scalable, sustainable, and accessible AI technologies. This tutorial aims to provide participants with a comprehensive understanding of the state-of-the-art techniques in the design and implementation of systems and compilers that optimize the performance of generative AI models, especially for large language models (LLMs).
First, we will present Mirage, the first multi-level superoptimization-based tensor program compiler, which helps developers generate fast GPU kernels without programming in CUDA or Triton. Second, we will present FlexFlow Serve, a distributed runtime system for low-latency, high-performance LLM serving. We will also introduce FlexLLM, a distributed runtime for memory-efficient LLM finetuning.
Participants will learn about the latest research in AI infrastructure for significantly improving the efficiency of generative AI applications. The tutorial will also feature interactive sessions and hands-on demonstrations, allowing participants to work directly with the systems and compilers discussed.
This tutorial
This tutorial will be held at ASPLOS 2025 in Rotterdam on the morning of Sunday, March 30, 2025, in Room Goudriaan II of the Postillion Hotel & Convention Centre WTC Rotterdam.
Tentative Schedule
Time | Topic |
---|---|
10 mins | Introduction |
60 mins | Efficient Compilers for Generative AI |
30 mins | Coffee break |
90 mins | Efficient Systems for Generative AI |
 | Closing Remarks |
Organizer
Organizer | |
---|---|
Xupeng Miao is a Kevin C. and Suzanne L. Kahn New Frontiers Assistant Professor in the Department of Computer Science at Purdue University. | |
Zhihao Jia is an assistant professor in the Computer Science Department at Carnegie Mellon University. | |
Contact us
For any further questions, please contact Xupeng Miao via xupeng@purdue.edu.