Navigating Through the LLM Zoo: How to Find the Best Model?

Speaker:  Shiqiang Wang – Exeter, United Kingdom
Topic(s):  Artificial Intelligence, Machine Learning, Computer Vision, Natural Language Processing

Abstract

Open-weight large language model (LLM) zoos provide access to numerous high-quality models, but selecting the appropriate model for a specific task remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without having to concern themselves with model technicalities, while inference service providers prioritize minimizing operating costs. In this talk, I will share some of our recent progress toward choosing the best model in the presence of these competing interests. I will first introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing that provides rigorous service level agreement (SLA) compliance guarantees. MESS+ learns the request satisfaction probabilities of LLMs in real time as users interact with the system and uses them to make model selection decisions by solving a per-request optimization problem. Then, turning to more sophisticated systems that combine lightweight local LLMs for processing simple tasks at high speed with large-scale cloud LLMs for handling multi-modal data sources, I will present TMO, a local-cloud LLM inference system with "Three-M" Offloading: Multi-modal, Multi-task, and Multi-dialogue. TMO uses a strategy based on reinforcement learning (RL) to choose the inference location and the multi-modal data sources to use for each task or dialogue, aiming to maximize the long-term reward while adhering to resource constraints. Finally, I will conclude the talk by outlining some future directions.
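
To make the idea of cost-aware request routing concrete, the following is a minimal illustrative sketch and not the MESS+ algorithm itself. It assumes a hypothetical fixed per-request cost for each model, estimates each model's satisfaction rate from simple running counts, and routes each request to the cheapest model whose estimate meets a target rate. The names ModelStats, route, and update are invented for illustration; the actual method solves a per-request optimization problem with formal SLA guarantees, which this sketch does not provide.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Running satisfaction statistics for one model (hypothetical structure)."""
    name: str
    cost: float          # assumed per-request cost, arbitrary units
    successes: int = 1   # pseudo-counts so the initial estimate is 0.5
    trials: int = 2

    @property
    def p_satisfied(self) -> float:
        # Empirical estimate of the probability that a response satisfies the user.
        return self.successes / self.trials


def route(models: list[ModelStats], target: float) -> ModelStats:
    """Return the cheapest model whose estimated satisfaction rate meets the target.

    If no model meets the target, fall back to the model with the highest estimate.
    """
    eligible = [m for m in models if m.p_satisfied >= target]
    if eligible:
        return min(eligible, key=lambda m: m.cost)
    return max(models, key=lambda m: m.p_satisfied)


def update(model: ModelStats, satisfied: bool) -> None:
    """Record user feedback for one request served by the given model."""
    model.trials += 1
    model.successes += int(satisfied)


# Example: a cheap model with little evidence and an expensive, well-tested one.
small = ModelStats(name="small", cost=1.0)
large = ModelStats(name="large", cost=10.0, successes=9, trials=10)
first_choice = route([small, large], target=0.8).name

# After ten satisfied requests, the cheap model's estimate rises above the target.
for _ in range(10):
    update(small, satisfied=True)
second_choice = route([small, large], target=0.8).name
```

In this toy setup the router initially prefers the expensive model because the cheap one has too little positive evidence, then switches to the cheap model once its estimated satisfaction rate clears the target.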

About this Lecture

Number of Slides:  30
Duration:  60 minutes
Languages Available:  English
Last Updated:  12/05/2026

Request this Lecture

To request this particular lecture, please complete this online form.

Request a Tour

To request a tour with this speaker, please complete this online form.

All requests will be sent to ACM headquarters for review.