Glad to meet you here! I am Zhoujun (pronounced similarly to Jorge).
I am now a first-year Ph.D. student at UC San Diego, advised by Prof. Zhiting Hu.
Before that, I received my B.E. and M.E. degrees in Computer Science (IEEE class) from Shanghai Jiao Tong
University, under the supervision of Prof. Fan Cheng.
I also had a great time at HKUNLP, advised by Prof. Tao Yu, and at Microsoft Research Asia with Haoyu Dong.
My research interests lie at the intersection of building large language models and agent systems.
|
|
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer
Environments
Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua,
Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio
Savarese, Caiming Xiong, Victor Zhong, Tao Yu
NeurIPS 2024 Datasets & Benchmarks
pdf |
website |
code |
data |
data viewer
An executable environment for benchmarking multi-modal agents on computers.
|
What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang, Zhoujun Cheng, Hao Zhu, Daniel Fried, Graham Neubig
COLM 2024
pdf |
collection
An attempt to clarify and discuss some ambiguities in LM tool-using papers.
|
OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie*, Fan Zhou*, Zhoujun Cheng*, Peng Shi*, Luoxuan Weng*, Yitao Liu*, Toh
Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming
Xiong, Tao Yu
COLM 2024
pdf |
code |
demo |
docs
An open platform for using, hosting, and building language agents.
|
Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu*, Hongjin Su*, Chen Xing*, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu,
Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong,
Tao Yu
ICLR 2024 (Spotlight)
pdf |
code |
checkpoint
A pretrained 70B agent model with balanced code-text corpora.
|
Binding Language Models in Symbolic Languages
Zhoujun Cheng*, Tianbao Xie*, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming
Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023 (Spotlight)
pdf |
code |
demo
A training-free neural-symbolic framework that maps task inputs to programs combining LLM calls with
symbolic languages.
|
Batch Prompting: Efficient Inference with Large Language Model APIs
Zhoujun Cheng, Jungo Kasai, Tao Yu
EMNLP 2023 Industry Track
pdf |
code
A simple prompting approach that lets LLMs run inference in batches, saving both cost and time.
|
TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over
Tabular Data
Fan Zhou, Mengkang Hu, Haoyu Dong, Zhoujun Cheng, Shi Han, Dongmei Zhang
EMNLP 2022 (Oral)
pdf |
code
Precomputing aggregation/arithmetic results to assist table numerical reasoning.
|
HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language
Generation
Zhoujun Cheng*, Haoyu Dong*, Zhiruo Wang*, Ran Jia, Jiaqi Guo, Yan Gao, Shi Han,
Jian-Guang Lou, Dongmei Zhang
ACL 2022
pdf |
code |
dataset
A hierarchical table dataset for question answering and natural language generation.
|
FORTAP: Using Formulae for Numerical-Reasoning-Aware Table Pretraining
Zhoujun Cheng*, Haoyu Dong*, Ran Jia, Pengfei Wu, Shi Han, Fan Cheng, Dongmei Zhang
ACL 2022
pdf |
code
Adopting spreadsheet formulas to enhance numerical reasoning skills of table modeling.
|
Table Pre-training: A Survey on Model Architectures, Pretraining Objectives, and
Downstream Tasks
Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyu Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi
Han, Dongmei Zhang
IJCAI 2022 Survey Track
pdf
A survey on various tabular models, especially on the pretrained transformers.
|
KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human
Annotations
Yang You, Yujing Lou*, Chengkun Li*, Zhoujun Cheng, Liangwei Li, Lizhuang Ma, Cewu
Lu, Weiming Wang
CVPR 2020
pdf |
code
A large-scale and diverse 3D keypoint dataset.
|
Human Correspondence Consensus for 3D Object Semantic Understanding
Yujing Lou*, Yang You*, Chengkun Li*, Zhoujun Cheng, Liangwei Li, Lizhuang Ma,
Weiming Wang, Cewu Lu
ECCV 2020
pdf |
code
Learning dense semantic correspondences on 3D objects.
|
|
OpenAgents Platform
OpenAgents is an agent engineering platform for LLM-powered agents. Before the online service
was shut down, it had gained 3.5K stars on GitHub and been used by 7K+
users. It has three built-in agents:
Data Agent: code interpreter augmented with data tools
Plugins Agent: 200+ plugins for daily life
Web Agent: autonomous web browsing
demo |
code |
docs |
twitter
|
Reviewer: ARR, NAACL, ACL, EMNLP, EACL, ICLR
Teaching Assistant: Introduction to Programming, Introduction to Machine Learning
National Scholarship, 2018
Shanghai Outstanding Graduates, 2021
Beyond academics, I love exploring various cuisines and cooking (inspired by @美食作家王刚), and I have fun playing Dota 2
(inspired by @谢彬DD). I have also been an NBA fan and have enjoyed playing basketball
ever since the Mavericks won the championship, largely due to my
admiration for Dirk Nowitzki.
|