About me
Welcome to Wei-Lin Chiang’s page!
- I am a PhD student at UC Berkeley SkyLab, working with Prof. Ion Stoica.
- My research focuses on building evaluation systems for AI. I’m currently working on the Chatbot Arena project, a crowdsourced AI evaluation platform at LMArena (formerly LMSYS.org).
- We are fortunate to receive open-source grants from a16z and Sequoia. Check out our blog at LMArena or follow our updates on X.
News
- [2024.12] Text-to-image Arena: evaluating text-to-image models in the wild.
- [2024.12] WebDev Arena: evaluating AI in real-world web app development.
- [2024.09] RedTeam Arena: A Community-driven Jailbreaking Platform
- [2024.08] New blog post: decoupling style and substance in Chatbot Arena
- [2024.06] Launched: Multimodal Arena
- [2024.05] We hosted a Kaggle competition for human preference prediction
- [2024.04] Arena Hard and BenchBuilder: a data curation pipeline for LLM benchmarks
- [2024.03] Released: technical report on Chatbot Arena
Projects
- Chatbot Arena: An Open Platform for Crowdsourced AI Benchmarking
Our website has served millions of users, collecting over 2.5 million user votes for the leaderboard. We are honored to be recognized by industry leaders and researchers including Jeff Dean, Andrej Karpathy, and Sam Altman.
| Paper | Blog | Website |
- LLM Judge: Automating LLM Evaluation
We are developing automated evaluation methods for LLMs, such as the MT-Bench and Arena-Hard benchmarks.
| Paper | Code |
- FastChat: Multi-Model Serving Framework
FastChat is an open-source system powering Chatbot Arena and has gained a strong developer community (over 30K GitHub stars and 200+ contributors).
- Vicuna: high-quality LLM chatbot
Vicuna has been downloaded over 8 million times and cited 1000+ times.
| Blog | Weights |
- SkyPilot: An Intercloud System for AI and Batch Jobs
| GitHub | Paper |
- Cluster-GCN: Scalable Training for Large GNNs
Widely integrated into platforms such as DGL and PyTorch Geometric.
| GitHub | Paper |
Work Experience
- Intern@Amazon, Seattle (May 2021 - Aug. 2021)
Contrastive learning for information extraction on semi-structured webpages
- Intern@Google Research, Mountain View (Dec. 2018 - Mar. 2019)
Developed algorithms for training large and deep GCN models.
Cluster-GCN paper, code
- Intern@Alibaba Group, Hangzhou (July 2017 - Sept. 2017)
Distributed ML algorithms on Alibaba’s parameter server (KunPeng)
- Intern@Microsoft Research Asia, Beijing (Dec. 2016 - Feb. 2017)
Distributed training for deep learning frameworks
- Intern@Microsoft, Redmond (July 2016 - Oct. 2016)
Large-scale ML algorithms on Microsoft’s distributed platform (REEF)
Publications (full list on Google Scholar)
- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang*, Lianmin Zheng*, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, Ion Stoica (*equal contribution)
arXiv preprint
- LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Lianmin Zheng*, Wei-Lin Chiang*, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang (*equal contribution)
ICLR 2024
- LLM-assisted code cleaning for training accurate code generators
Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica
ICLR 2024
- Rethinking benchmark and contamination for language models with rephrased samples
Shuo Yang*, Wei-Lin Chiang*, Lianmin Zheng*, Joseph E. Gonzalez, Ion Stoica (*equal contribution)
arXiv preprint
- Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Lianmin Zheng*, Wei-Lin Chiang*, Ying Sheng*, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, Hao Zhang, Joseph Gonzalez, Ion Stoica (*equal contribution)
NeurIPS 2023 Datasets and Benchmarks Track
- Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph Gonzalez, Ion Stoica, Eric Xing (alphabetical order)
Blog post, model weights
- Can’t Be Late: Optimizing Spot Instance Savings under Deadlines
Zhanghao Wu, Wei-Lin Chiang, Zongheng Yang, Eric Friedman, Scott Shenker, Ion Stoica.
NSDI 2024 (Outstanding Paper Award)
- SkyPilot: An Intercloud Broker for Sky Computing
Zongheng Yang, Zhanghao Wu, Michael Luo, Wei-Lin Chiang, Romil Bhardwaj, Woosuk Kwon, Siyuan Zhuang, Frank Sifei Luan, Gautam Mittal, Scott Shenker, Ion Stoica
USENIX NSDI 2023
- Balsa: Learning a Query Optimizer Without Expert Demonstrations
Zongheng Yang, Wei-Lin Chiang+, Sifei Luan+, Gautam Mittal, Michael Luo, Ion Stoica. (+ equal contribution)
ACM SIGMOD 2022
- Manifold Identification for Ultimately Communication-Efficient Distributed Optimization
Yu-Sheng Li, Wei-Lin Chiang, and Ching-pei Lee.
International Conference on Machine Learning (ICML), 2020
- Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks [code, dataset (Amazon2M)]
Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh.
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2019 (Oral) slides, poster
- Preconditioned Conjugate Gradient Methods in Truncated Newton Frameworks for Large-scale Linear Classification [supplement & code. Implementation available in LIBLINEAR after version 2.20.]
Chih-Yang Hsia, Wei-Lin Chiang, and Chih-Jen Lin.
Asian Conference on Machine Learning (ACML), 2018 (Best paper award) slides, poster
- Limited-memory Common-directions Method for Distributed L1-regularized Linear Classification [supplement & code. Implementation available in Distributed LIBLINEAR.]
Wei-Lin Chiang, Yu-Sheng Li, Ching-pei Lee, and Chih-Jen Lin.
SIAM International Conference on Data Mining (SDM), 2018 slides, poster
- Parallel Dual Coordinate Descent Method for Large-scale Linear Classification in Multi-core Environments [supplement, code. Implementation available in Multi-core LIBLINEAR.]
Wei-Lin Chiang, Mu-Chu Lee, and Chih-Jen Lin.
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2016 poster
- Fast Matrix-vector Multiplications for Large-scale Logistic Regression on Shared-memory Systems [supplement, code. Implementation available in Multi-core LIBLINEAR.]
Mu-Chu Lee, Wei-Lin Chiang, and Chih-Jen Lin.
IEEE International Conference on Data Mining (ICDM), 2015 slides