Glad to meet you here! I am Zhoujun. I am now a Ph.D. student at UC San Diego, advised by Prof. Zhiting Hu.
Prior to this, I received my B.E. and M.E. degrees in Computer Science (IEEE class) from Shanghai Jiao Tong University. I also spent a great time at HKUNLP with Prof. Tao Yu and at Microsoft Research Asia with Haoyu Dong.
My research focuses on scaling RL and agents along multiple axes:
Compute (Compute-Optimal LLM RL): how to optimally allocate RL rollout compute as it scales;
Data (Guru): scaling RL training data across domains and sizes for general reasoning;
Environments (NanoRollout): scaling parallel environments for digital agents (coding & computer-use) for large-scale agent RL and distillation.
I am also a core developer of foundation models like Mocha-Coder-32B and K2-V2-70B, which apply these insights at production scale.
I also contribute to multiple real-world agent evaluations, including CocoaBench, OSWorld, and BigCodeBench. I believe good evaluation shapes model capabilities.
Publications
* equal contribution; † equal advising.
NanoRollout: Scale Digital Agent Rollouts without Pain
Junli Wang*, Zhoujun Cheng*†, Yuxuan Zhang*, Shibo Hao, Yao Tang, Zhiting Hu, Prithviraj Ammanabrolu, Hao Zhang†
Technical Blog and Repo | blog | code | model | data
TL;DR: Scaling parallel environments for digital agents (coding & computer-use) for large-scale agent RL and distillation.
IsoCompute Playbook: Optimally Scaling Sampling Compute for RL Training of LLMs
Zhoujun Cheng*, Yutao Xie*, Yuxiao Qu*, Amrith Setlur*, Shibo Hao, Varad Pimpalkhute, Tongtong Liang, Feng Yao, Zhengzhong Liu, Eric Xing, Virginia Smith, Ruslan Salakhutdinov, Zhiting Hu, Taylor Killian, Aviral Kumar
ICML 2026 | pdf | website | slides@Northwestern MLL Lab
TL;DR: Given a fixed sampling compute budget C, how should it be allocated across problems per batch B_problem, rollouts per problem n, and sequential training steps M?
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Zhoujun Cheng*, Shibo Hao*, Tianyang Liu*, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, Nilabjo Dey, Yuheng Zha, Yi Gu, Kun Zhou, Yuqi Wang, Yuan Li, Richard Fan, Jianshu She, Chengqian Gao, Abulhair Saparov, Haonan Li, Taylor W. Killian, Mikhail Yurochkin, Zhengzhong Liu, see full list, Eric P. Xing, Zhiting Hu
NeurIPS 2025 Datasets & Benchmarks | pdf | website | dataset | code | model
TL;DR: A study of RL for LLM general reasoning with 92K curated problems across six domains.
Esoteric Language Models: Bridging Autoregressive and Masked Diffusion LLMs
Subham Sekhar Sahoo, Zhihan Yang, Yash Akhauri, Johnna Liu, Deepansha Singh, Zhoujun Cheng, Zhengzhong Liu, Eric Xing, John Thickstun, Arash Vahdat
Preprint | pdf
TL;DR: A unified LM family bridging autoregressive and masked diffusion paradigms.
MegaMath: Pushing the Limits of Open Math Corpora
Fan Zhou*, Zengzhi Wang*, Nikhil Ranjan, Zhoujun Cheng, Liping Tang, Guowei He, Zhengzhong Liu, Eric P. Xing
COLM 2025 | pdf | dataset | code
TL;DR: A 200B-scale high-quality mathematical mid-training dataset.
K2-V2: A 360-Open, Reasoning-Enhanced LLM
Core contributor (responsible for pretraining scaling law and post-training for reasoning)
Technical Report | pdf | data | code
TL;DR: A fully open 70B reasoning-enhanced LLM pretrained from scratch.
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, Zijian Wang, Binyuan Hui, Niklas Muennighoff, David Lo, Daniel Fried, Xiaoning Du, Harm de Vries, Leandro Von Werra
ICLR 2025 (Oral) | pdf | website | code
TL;DR: A benchmark for code generation with diverse function calls and complex instructions.
OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie*, Fan Zhou*, Zhoujun Cheng*, Peng Shi*, Luoxuan Weng*, Yitao Liu*, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu
COLM 2024, 4.7k GitHub stars, 7k demo users | pdf | code | demo | docs
TL;DR: An open platform for using, hosting, and building language agents.
Binding Language Models in Symbolic Languages
Zhoujun Cheng*, Tianbao Xie*, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023 (Spotlight) | pdf | code | demo
TL;DR: A training-free neural-symbolic framework mapping task inputs to programs of LLM calls + symbolic languages.