About me

Welcome to Wei-Lin Chiang’s page!

I am a PhD student at UC Berkeley SkyLab, working with Prof. Ion Stoica.
My research focuses on building evaluation systems for AI. I’m currently working on the Chatbot Arena project, a crowdsourced AI evaluation platform at LMArena (formerly LMSYS.org).
We are fortunate to receive open-source grants from a16z and Sequoia. Check out our blog at LMArena or follow our updates on X.

News

[2024.12] Text-to-image Arena: evaluating text-to-image models in the wild.
[2024.12] WebDev Arena: evaluating AI in real-world web app development.
[2024.09] RedTeam Arena: A Community-driven Jailbreaking Platform
[2024.08] New blog post: decoupling style and substance in Chatbot Arena
[2024.06] Launched: Multimodal Arena
[2024.05] We hosted a Kaggle competition for human preference prediction
[2024.04] Arena Hard and BenchBuilder: a data curation pipeline for LLM benchmarks
[2024.03] Released: technical report on Chatbot Arena

Projects

Chatbot Arena: An Open Platform for Crowdsourced AI Benchmarking
Our website has served millions of users and collected over 2.5 million user votes for the leaderboard. We are honored to be recognized by industry leaders and researchers including Jeff Dean, Andrej Karpathy, and Sam Altman.
| Paper | Blog | Website |
LLM Judge: Automating LLM Evaluation
We are developing automated evaluation methods for LLMs, such as the MT-Bench and Arena-Hard benchmarks.
| Paper | Code |
FastChat: Multi-Model Serving Framework
FastChat is the open-source system powering Chatbot Arena and has built a strong developer community (over 30K GitHub stars and 200+ contributors).
Vicuna: high-quality LLM chatbot
Vicuna has been downloaded over 8 million times with 1000+ citations.
| Blog | Weights |
SkyPilot: An Intercloud System for AI and Batch Jobs
| GitHub | Paper |
Cluster-GCN: Scalable Training for Large GNNs
Widely integrated into platforms such as DGL and PyTorch Geometric.
| GitHub | Paper |

Work Experience

Intern@Amazon, Seattle (May 2021 - Aug. 2021)
Contrastive learning for information extraction on semi-structured webpages
Intern@Google Research, Mountain View (Dec. 2018 - Mar. 2019)
Developed algorithms for training large and deep GCN models.
Cluster-GCN paper, code
Intern@Alibaba Group, Hangzhou (July 2017 - Sept. 2017)
Distributed ML algorithms on Alibaba’s parameter server (KunPeng)
Intern@Microsoft Research Asia, Beijing (Dec. 2016 - Feb. 2017)
Distributed training for deep learning frameworks
Intern@Microsoft, Redmond (July 2016 - Oct. 2016)
Large-scale ML algorithms on Microsoft’s distributed platform (REEF)

Publications (full list on Google Scholar)

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang*, Lianmin Zheng*, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, Ion Stoica (*equal contribution)
arXiv preprint
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Lianmin Zheng*, Wei-Lin Chiang*, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang (*equal contribution)
ICLR 2024
LLM-Assisted Code Cleaning for Training Accurate Code Generators
Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica
ICLR 2024
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
Shuo Yang*, Wei-Lin Chiang*, Lianmin Zheng*, Joseph E. Gonzalez, Ion Stoica (*equal contribution)
arXiv preprint
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Lianmin Zheng*, Wei-Lin Chiang*, Ying Sheng*, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, Hao Zhang, Joseph Gonzalez, Ion Stoica (*equal contribution)
NeurIPS 2023 Datasets and Benchmarks Track
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph Gonzalez, Ion Stoica, Eric Xing (alphabetical order)
Blog post, model weights
Can’t Be Late: Optimizing Spot Instance Savings under Deadlines
Zhanghao Wu, Wei-Lin Chiang, Zongheng Yang, Eric Friedman, Scott Shenker, Ion Stoica.
NSDI 2024 (Outstanding Paper Award)
SkyPilot: An Intercloud Broker for Sky Computing
Zongheng Yang, Zhanghao Wu, Michael Luo, Wei-Lin Chiang, Romil Bhardwaj, Woosuk Kwon, Siyuan Zhuang, Frank Sifei Luan, Gautam Mittal, Scott Shenker, Ion Stoica
USENIX NSDI 2023
Balsa: Learning a Query Optimizer Without Expert Demonstrations
Zongheng Yang, Wei-Lin Chiang⁺, Sifei Luan⁺, Gautam Mittal, Michael Luo, Ion Stoica. (+ equal contribution)
ACM SIGMOD 2022
Manifold Identification for Ultimately Communication-Efficient Distributed Optimization
Yu-Sheng Li, Wei-Lin Chiang, and Ching-pei Lee.
International Conference on Machine Learning (ICML), 2020
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks [code, dataset (Amazon2M)]
Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh.
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2019 (Oral) slides, poster
Preconditioned Conjugate Gradient Methods in Truncated Newton Frameworks for Large-scale Linear Classification [supplement & code. Implementation available in LIBLINEAR after version 2.20.]
Chih-Yang Hsia, Wei-Lin Chiang, and Chih-Jen Lin.
Asian Conference on Machine Learning (ACML), 2018 (Best paper award) slides, poster
Limited-memory Common-directions Method for Distributed L1-regularized Linear Classification [supplement & code. Implementation available in Distributed LIBLINEAR.]
Wei-Lin Chiang, Yu-Sheng Li, Ching-pei Lee, and Chih-Jen Lin.
SIAM International Conference on Data Mining (SDM), 2018 slides, poster
Parallel Dual Coordinate Descent Method for Large-scale Linear Classification in Multi-core Environments [supplement, code. Implementation available in Multi-core LIBLINEAR.]
Wei-Lin Chiang, Mu-Chu Lee, and Chih-Jen Lin.
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2016 poster
Fast Matrix-vector Multiplications for Large-scale Logistic Regression on Shared-memory Systems [supplement, code. Implementation available in Multi-core LIBLINEAR.]
Mu-Chu Lee, Wei-Lin Chiang, and Chih-Jen Lin.
IEEE International Conference on Data Mining (ICDM), 2015 slides