Training and deploying large AI models such as LLMs require significant GPU resources and computing time, making these models increasingly expensive and accessible primarily to major tech companies. At the same time, their growing energy consumption raises concerns about environmental impact. In this talk, we will explore methods for reducing computational cost and improving efficiency through quantization and tensor compression. Specifically, we will discuss how quantization and low-rank tensor decomposition can compress large AI models and accelerate inference, and how tensor-compressed training can make training large-scale models more efficient.
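As a rough illustration of the two compression ideas named above, the following minimal NumPy sketch (not from the talk; the matrix size, scaling rule, and rank are illustrative assumptions) quantizes a weight matrix to int8 and builds a rank-r factorization via truncated SVD, then reports the approximation errors and parameter savings.

```python
# Minimal sketch (illustrative only; not from the talk). Shows int8
# quantization and low-rank SVD compression of a single weight matrix.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic weight with approximate low-rank structure (rank ~64 plus noise),
# standing in for a trained layer whose singular values decay.
W = (rng.standard_normal((1024, 64)) @ rng.standard_normal((64, 1024))
     + 0.1 * rng.standard_normal((1024, 1024))).astype(np.float32)

# --- Quantization: float32 -> int8 with one symmetric scale ---------------
scale = np.abs(W).max() / 127.0              # map the largest magnitude to 127
W_q = np.round(W / scale).astype(np.int8)    # 4x smaller storage than float32
W_deq = W_q.astype(np.float32) * scale       # dequantized copy for comparison

# --- Low-rank compression: keep the top-r singular triplets ---------------
r = 64                                       # illustrative rank choice
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]                         # left factor, shape (1024, r)
B = Vt[:r, :]                                # right factor, shape (r, 1024)
W_lr = A @ B                                 # rank-r approximation of W

rel = lambda X: np.linalg.norm(W - X) / np.linalg.norm(W)
print(f"int8 quantization relative error: {rel(W_deq):.4f}")
print(f"rank-{r} relative error:          {rel(W_lr):.4f}")
print(f"low-rank parameter ratio:         {(A.size + B.size) / W.size:.3f}")
```

The two techniques also compose: the low-rank factors can themselves be quantized, and tensor decompositions such as tensor-train generalize this matrix factorization to the higher-order weight tensors used in tensor-compressed training.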
About the speaker
Zi Yang is an Assistant Professor in the Department of Mathematics and Statistics at the University at Albany, SUNY. He earned his Ph.D. in Mathematics from the University of California, San Diego. His research focuses on tensor computation, polynomial optimization, and efficient training and inference of AI models. Prior to his current position, he was a Postdoctoral Scholar at the University of California, Santa Barbara.