ASPLOS’25 Tutorial: GenAI Catalyst

Abstract

In the rapidly evolving landscape of generative artificial intelligence (AI), the efficiency of the underlying systems and compilers plays a crucial role in enabling scalable, sustainable, and accessible AI technologies. This tutorial aims to give participants a comprehensive understanding of state-of-the-art techniques for designing and implementing systems and compilers that optimize the performance of generative AI models, especially large language models (LLMs).

First, we will present Mirage, the first multi-level superoptimization-based tensor program compiler, which helps developers generate fast GPU kernels without writing CUDA or Triton code. Second, we will present FlexFlow Serve, a distributed runtime system for low-latency, high-performance LLM serving. We will also introduce FlexLLM, a distributed runtime for memory-efficient LLM finetuning.

Participants will learn about the latest research in AI infrastructure aimed at significantly improving the efficiency of generative AI applications. The tutorial will also feature interactive sessions and hands-on demonstrations, allowing participants to work directly with the systems and compilers discussed.

This tutorial

This tutorial will be held at ASPLOS 2025 in Rotterdam on the morning of Sunday, March 30, 2025 (starting at 9:00 AM), in Room Goudriaan II of the Postillion Hotel & Convention Centre WTC Rotterdam.

Tentative Schedule

Duration	Topic
15 mins	Introduction
75 mins	Efficient Compilers for Generative AI
30 mins	Coffee Break
60 mins	Efficient Systems for Generative AI
	Closing Remarks

Organizers

	Xupeng Miao is a Kevin C. and Suzanne L. Kahn New Frontiers Assistant Professor in the Department of Computer Science at Purdue University.
	Zhihao Jia is an Assistant Professor in the Computer Science Department at Carnegie Mellon University.

References

Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia. Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems. arXiv:2312.15234.
Mengdi Wu, Xinhao Cheng, Shengyu Liu, Chunan Shi, Jianan Ji, Kit Ao, Praveen Velliengiri, Xupeng Miao, Oded Padon, Zhihao Jia. Mirage: A Multi-Level Superoptimizer for Tensor Programs. In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI’25).
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia. SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’24).
Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, Zhihao Jia. FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning. arXiv:2402.18789.

Contact us

For further questions, please contact Xupeng Miao at xupeng@purdue.edu.