Zhoujun (Jorge) Cheng 承洲骏

Glad to meet you here! I am Zhoujun. I am now a Ph.D. student at UC San Diego, advised by Prof. Zhiting Hu.

Prior to this, I received my B.E. and M.E. degrees in Computer Science (IEEE class) from Shanghai Jiao Tong University. I also had a great time at HKUNLP with Prof. Tao Yu and at Microsoft Research Asia with Haoyu Dong.

My research focuses on scaling RL and agents along multiple axes:

Compute (Compute-Optimal LLM RL): how to optimally allocate RL rollout compute as it scales;
Data (Guru): scaling RL training data across domains and sizes for general reasoning;
Environments (NanoRollout): scaling parallel environments for digital agents (coding & computer-use) for large-scale agent RL and distillation.

I am also a core developer of foundation models such as Mocha-Coder-32B and K2-V2-70B, which apply these insights at production scale. In addition, I contribute to several real-world agent evaluations, including CocoaBench, OSWorld, and BigCodeBench. I believe good evaluation shapes model capabilities.

Publications

* equal contribution; ^† equal advising.

NanoRollout: Scale Digital Agent Rollouts without Pain
Junli Wang*, Zhoujun Cheng*^†, Yuxuan Zhang*, Shibo Hao, Yao Tang, Zhiting Hu, Prithviraj Ammanabrolu, Hao Zhang^†
Technical Blog and Repo
blog | code | model | data
TL;DR: Scaling parallel environments for digital agents (coding & computer-use) for large-scale agent RL and distillation.

IsoCompute Playbook: Optimally Scaling Sampling Compute for RL Training of LLMs
Zhoujun Cheng*, Yutao Xie*, Yuxiao Qu*, Amrith Setlur*, Shibo Hao, Varad Pimpalkhute, Tongtong Liang, Feng Yao, Zhengzhong Liu, Eric Xing, Virginia Smith, Ruslan Salakhutdinov, Zhiting Hu, Taylor Killian, Aviral Kumar
ICML 2026
pdf | website | slides @Northwestern MLL Lab
TL;DR: Given a fixed sampling compute budget C, how should it be allocated across problems per batch B_problem, rollouts per problem n, and sequential training steps M?

CocoaBench: Evaluating Unified Digital Agents in the Wild
Cocoa Team
Under Review
pdf | website | code
TL;DR: A unified agent benchmark that requires coding, computer use, and deep research.

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Zhoujun Cheng*, Shibo Hao*, Tianyang Liu*, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, see full list, Eric P. Xing, Zhiting Hu
NeurIPS 2025 Datasets & Benchmarks
pdf | website | dataset | code | model
TL;DR: A study of RL for LLM general reasoning with 92K curated problems across six domains.

Esoteric Language Models: Bridging Autoregressive and Masked Diffusion LLMs
Subham Sekhar Sahoo, Zhihan Yang, Yash Akhauri, Johnna Liu, Deepansha Singh, Zhoujun Cheng, Zhengzhong Liu, Eric Xing, John Thickstun, Arash Vahdat
Preprint
pdf
TL;DR: A unified LM family bridging autoregressive and masked diffusion paradigms.

MegaMath: Pushing the Limits of Open Math Corpora
Fan Zhou*, Zengzhi Wang*, Nikhil Ranjan, Zhoujun Cheng, Liping Tang, Guowei He, Zhengzhong Liu, Eric P. Xing
COLM 2025
pdf | dataset | code
TL;DR: A 200B-scale high-quality mathematical mid-training dataset.

K2-V2: A 360-Open, Reasoning-Enhanced LLM
Core contributor (responsible for pretraining scaling law and post-training for reasoning)
Technical Report
pdf | data | code
TL;DR: A fully open 70B reasoning-enhanced LLM pretrained from scratch.

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu
NeurIPS 2024 Datasets & Benchmarks
pdf | website | code | data | data viewer
TL;DR: A benchmark for multimodal computer-use agents on open-ended tasks in real computer environments.

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, Zijian Wang, Binyuan Hui, Niklas Muennighoff, David Lo, Daniel Fried, Xiaoning Du, Harm de Vries, Leandro Von Werra
ICLR 2025 (Oral)
pdf | website | code
TL;DR: A benchmark for code generation with diverse function calls and complex instructions.

OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie*, Fan Zhou*, Zhoujun Cheng*, Peng Shi*, Luoxuan Weng*, Yitao Liu*, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu
COLM 2024, 4.7k GitHub stars, 7k demo users
pdf | code | demo | docs
TL;DR: An open platform for using, hosting, and building language agents.

What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang, Zhoujun Cheng, Hao Zhu, Daniel Fried, Graham Neubig
COLM 2024
pdf | collection
TL;DR: An attempt to clarify and discuss some ambiguities in LM tool-using papers.

Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu*, Hongjin Su*, Chen Xing*, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu
ICLR 2024 (Spotlight)
pdf | code | checkpoint
TL;DR: A pretrained 70B agent model with balanced code-text corpora.

Batch Prompting: Efficient Inference with Large Language Model APIs
Zhoujun Cheng, Jungo Kasai, Tao Yu
EMNLP 2023 Industry Track
pdf | code
TL;DR: Batch multiple queries into a single prompt for cheaper, faster LLM API inference.

Binding Language Models in Symbolic Languages
Zhoujun Cheng*, Tianbao Xie*, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023 (Spotlight)
pdf | code | demo
TL;DR: A training-free neural-symbolic framework mapping task inputs to programs of LLM calls + symbolic languages.

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation
Zhoujun Cheng*, Haoyu Dong*, Zhiruo Wang*, Ran Jia, Jiaqi Guo, Yan Gao, Shi Han, Jian-Guang Lou, Dongmei Zhang
ACL 2022
pdf | code | dataset
TL;DR: A hierarchical table dataset for question answering and natural language generation.

FORTAP: Using Formulae for Numerical-Reasoning-Aware Table Pretraining
Zhoujun Cheng*, Haoyu Dong*, Ran Jia, Pengfei Wu, Shi Han, Fan Cheng, Dongmei Zhang
ACL 2022
pdf | code
TL;DR: Using spreadsheet formulas to strengthen the numerical reasoning skills of table models.

TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data
Fan Zhou, Mengkang Hu, Haoyu Dong, Zhoujun Cheng, Shi Han, Dongmei Zhang
EMNLP 2022 Findings
pdf
TL;DR: Pre-computing data cubes for numerical reasoning over tables.

Services & Awards

Services

Reviewer: NeurIPS 2025 · ICLR 2025–2026 · COLM 2025–2026 · ACL ARR 2023–2025 · NAACL/ACL/EMNLP/EACL/COLING 2024 · CCL 2022–2023

Teaching: Intro to Programming · Intro to Machine Learning

Awards

Shanghai Outstanding Graduates · 2021
MSRA Stars of Tomorrow · 2021
Zhiyuan Honor Scholarship (top 5%, SJTU) · 2018–2020
SJTU Excellent Scholarship (top 10%) · 2018–2020
National Scholarship · 2018

Last update: May 2026

website credit to Jon Barron and Tairan He