Zhoujun (Jorge) Cheng    承洲骏

Glad to meet you here! I am Zhoujun (pronounced similarly to Jorge), a first-year Ph.D. student at UC San Diego, advised by Prof. Zhiting Hu.

Before that, I received my B.E. and M.E. degrees in Computer Science (IEEE class) from Shanghai Jiao Tong University, under the supervision of Prof. Fan Cheng. I also had a great time at HKUNLP, advised by Prof. Tao Yu, and at Microsoft Research Asia, working with Haoyu Dong.

My research interests lie at the intersection of building large language models and agent systems.

Email  /  Google Scholar  /  Semantic Scholar  /  Twitter  /  GitHub


  Publications
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu
NeurIPS 2024 Datasets & Benchmarks
pdf | website | code | data | data viewer

An executable environment for benchmarking multimodal agents on real computers.

What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang, Zhoujun Cheng, Hao Zhu, Daniel Fried, Graham Neubig
COLM 2024
pdf | collection

An attempt to clarify and discuss some ambiguities in papers on LM tool use.

OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie*, Fan Zhou*, Zhoujun Cheng*, Peng Shi*, Luoxuan Weng*, Yitao Liu*, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu
COLM 2024
pdf | code | demo | docs

An open platform for using, hosting, and building language agents.

Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu*, Hongjin Su*, Chen Xing*, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu
ICLR 2024 (Spotlight)
pdf | code | checkpoint

A pretrained 70B agent model with balanced code-text corpora.

Binding Language Models in Symbolic Languages
Zhoujun Cheng*, Tianbao Xie*, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023 (Spotlight)
pdf | code | demo

A training-free neural-symbolic framework mapping task inputs to programs of LLM calls + symbolic languages.

Batch Prompting: Efficient Inference with Large Language Model APIs
Zhoujun Cheng, Jungo Kasai, Tao Yu
EMNLP 2023 Industry Track
pdf | code

A simple prompting approach that lets LLMs run inference in batches to save cost and time.

TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data
Fan Zhou, Mengkang Hu, Haoyu Dong, Zhoujun Cheng, Shi Han, Dongmei Zhang
EMNLP 2022 (Oral)
pdf | code

Precomputing aggregation/arithmetic results to assist numerical reasoning over tables.

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation
Zhoujun Cheng*, Haoyu Dong*, Zhiruo Wang*, Ran Jia, Jiaqi Guo, Yan Gao, Shi Han, Jian-Guang Lou, Dongmei Zhang
ACL 2022
pdf | code | dataset

A hierarchical table dataset for question answering and natural language generation.

FORTAP: Using Formulae for Numerical-Reasoning-Aware Table Pretraining
Zhoujun Cheng*, Haoyu Dong*, Ran Jia, Pengfei Wu, Shi Han, Fan Cheng, Dongmei Zhang
ACL 2022
pdf | code

Using spreadsheet formulas to enhance the numerical reasoning skills of table models.

Table Pre-training: A Survey on Model Architectures, Pretraining Objectives, and Downstream Tasks
Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyu Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi Han, Dongmei Zhang
IJCAI 2022 Survey Track
pdf

A survey of tabular models, with a focus on pretrained transformers.

KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations
Yang You, Yujing Lou*, Chengkun Li*, Zhoujun Cheng, Liangwei Li, Lizhuang Ma, Cewu Lu, Weiming Wang
CVPR 2020
pdf | code

A large-scale and diverse 3D keypoint dataset.

Human Correspondence Consensus for 3D Object Semantic Understanding
Yujing Lou*, Yang You*, Chengkun Li*, Zhoujun Cheng, Liangwei Li, Lizhuang Ma, Weiming Wang, Cewu Lu
ECCV 2020
pdf | code

Learning dense semantic correspondences on 3D objects.

  Projects
OpenAgents Platform

OpenAgents is an agent engineering platform for LLM-powered agents. Before its online service stopped, it gained 3.5K stars on GitHub and was used by 7K+ users. It has three built-in agents:

  • Data Agent: code interpreter augmented with data tools
  • Plugins Agent: 200+ plugins for daily life
  • Web Agent: autonomous web browsing

  • demo | code | docs | twitter

      Services & Awards
  • Reviewer: ARR, NAACL, ACL, EMNLP, EACL, ICLR
  • Teaching Assistant: Introduction to Programming, Introduction to Machine Learning
  • National Scholarship, 2018
  • Shanghai Outstanding Graduates, 2021

  Miscellanies

    Beyond academics, I love exploring various cuisines and cooking (inspired by @美食作家王刚), and I enjoy playing Dota 2 (inspired by @谢彬DD). I have also been an NBA fan and have enjoyed playing basketball ever since the Mavericks won the championship, largely because of my admiration for Dirk Nowitzki.



    Last updated in Oct. 2024

    Website credit: Jon Barron and Tairan He