Di Zhang: LLM Reasoning and Scientific Intelligence

I am a PhD candidate at Fudan University, working on chemical AI, multimodal reasoning, test-time search, reinforcement learning, and foundation models for scientific discovery. My recent work includes ChemLLM, ChemVLM, MCTSr, Llama-Berry, Critic-V, Chem-R, and related systems for reasoning and chemistry.

Since 2026, I have been a Research Resident and Head of Agent Model at MindLab. Previously, I interned at NVIDIA Research (2025 to 2026) and Shanghai AI Lab (2023 to 2025), worked full-time as a machine learning developer at Alibaba (2022 to 2023), and completed my Master of Engineering at the USTC Robotics Lab (2019 to 2022). I also interned at Ant Group and MIT Han Lab.

Updates

Jun 10, 2026 blog

CiteClaw: Reference Infrastructure for Agentic Research

Blog entry introducing CiteClaw as reference infrastructure for agentic research, with emphasis on citation tools, MCP access, Zotero state, and the bibliography curation skill.

citeclaw citations agents

Jun 8, 2026 publication

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Publication entry added for a study reframing PEFT as a substrate for persistent personal models scaled up, down, and out.

peft personal-models llm-infrastructure

Jun 7, 2026 blog

Macaron-V1-Preview

Original English repost of Mind Lab's Macaron-V1-Preview release for the 749B Mixture-of-LoRA Agent Model post-trained from GLM5.1.

macaron agent-models mol

Jun 6, 2026 blog

Agent-Own Loop Harness

Blog entry on turning OpenSpec from a human-attended workflow into an agent-owned SDD loop.

agents openspec sdd

Research

My current research is centered on the work I have pursued since 2025:

LLM reasoning: test-time scaling, reinforcement learning, tree search, self-evaluation, critic models, and controllable reasoning, represented by Llama-Berry, Control-R, SELT, Chem-R, Critic-V, and TinyEye.
Scientific intelligence: foundation models and reasoning systems for chemistry, materials science, molecules, and scientific discovery, represented by ChemVLM, Mol-R1, MolReflect, ChemAgent, MOOSE-Chem3, CMPhysBench, and LoRA-Chem.
Agentic learning: tool-using agents, memory, retrieval-time critique, scalable training/serving infrastructure, parameter-efficient personal models, and protocols for agentic model development, represented by PEFT scaling, MinT, delta-mem, Retrieval Is Not Enough, and MCP-based reasoning.

Selected Papers

Visit Google Scholar for the complete publication list.

NAACL 2025

Llama-Berry: Pairwise Optimization for Olympiad-Level Mathematical Reasoning via o1-like Monte Carlo Tree Search

Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong Li, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone, Yuqiang Li, and others.

Proceedings of NAACL-HLT 2025, Long Papers, pages 7315-7337.

HF Paper

CVPR 2025

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li, Xunzhi Wang, Yujie Liu, Zonglin Yang, Jiatong Li, Weida Wang, Suorong Yang, Jianbo Wu, and others.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, pages 9050-9061.

HF Paper

CVPR 2026

IAG: Input-Aware Backdoor Attack on VLM-Based Visual Grounding

Junxian Li, Beining Xu, Simin Chen, Jiatong Li, Jingdi Lei, Haodong Zhao, Di Zhang‡.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026, pages 27872-27883.

HF Paper

AAAI 2025

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, Jingdi Lei, Qian Tan, Cai Zhou, Wei Liu, Yaotian Yang, Xinrui Xiong, and others.

Proceedings of the AAAI Conference on Artificial Intelligence, 39(1), 415-423, 2025.

HF Paper

arXiv 2024

Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine with Llama-3 8B

Di Zhang, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang.

arXiv preprint arXiv:2406.07394, 2024.

HF Paper

TKDE 2026

MolReflect: Towards In-Context Fine-Grained Alignments Between Molecules and Texts

Jiatong Li, Yunqing Liu, Wei Liu, Jingdi Lei, Di Zhang, Wenqi Fan, Dongzhan Zhou, Yuqiang Li, Qing Li.

IEEE Transactions on Knowledge and Data Engineering, 2026.

HF Paper

Publications

2026

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters.
Mind Lab, Song Cao, Vic Cao, Kaijie Chen, Bunny Fan, Hera Feng, Huan Feng, Arthur Fu, Jun Gao, Hongquan Gu, and others. arXiv preprint arXiv:2606.02437, 2026. HF Paper
MinT: Managed Infrastructure for Training and Serving Millions of LLMs.
Mind Lab, Song Cao, Vic Cao, Andrew Chen, Kaijie Chen, Cleon Cheng, Steven Chiang, Kaixuan Fan, Hera Feng, Huan Feng, and others. arXiv preprint arXiv:2605.13779, 2026. HF Paper
delta-mem: Efficient Online Memory for Large Language Models.
Jingdi Lei, Di Zhang, Junxian Li, Weida Wang, Kaixuan Fan, Xiang Liu, Qihan Liu, Xiaoteng Ma, Baian Chen, Soujanya Poria. arXiv preprint arXiv:2605.12357, 2026. HF Paper
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text.
Ximing Lu, David Acuna, Jaehun Jung, Jian Hu, Di Zhang, Shizhe Diao, Yunheng Zou, Shaokun Zhang, Brandon Cui, Mingjie Liu, and others. arXiv preprint arXiv:2601.22975, 2026. arXiv
Retrieval Is Not Enough: Enhancing RAG Through Test-Time Critique and Optimization.
Jiaqi Wei, Hao Zhou, Xiang Zhang, Di Zhang, Zijie Qiu, Noah Wei, Jinzhe Li, Wanli Ouyang, Siqi Sun. Advances in Neural Information Processing Systems, 38:21484-21520, 2026. OpenReview
IAG: Input-Aware Backdoor Attack on VLM-Based Visual Grounding.
Junxian Li, Beining Xu, Simin Chen, Jiatong Li, Jingdi Lei, Haodong Zhao, Di Zhang‡. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 27872-27883, 2026. HF Paper
MolReflect: Towards In-Context Fine-Grained Alignments Between Molecules and Texts.
Jiatong Li, Yunqing Liu, Wei Liu, Jingdi Lei, Di Zhang, Wenqi Fan, Dongzhan Zhou, Yuqiang Li, Qing Li. IEEE Transactions on Knowledge and Data Engineering, 2026. HF Paper

2025

TinyEye: Sharpening Visual Reasoning of Tiny Models with Offline Policy Optimization.
Di Zhang, Junxian Li, Shihao Wang, Weida Wang, Guo Chen, Hao Zhang, Shizhe Diao, Mingjie Liu, Ximing Lu, Jaehun Jung, and others. OpenReview, 2025. OpenReview
Error-Free Linear Attention Is a Free Lunch: Exact Solution from Continuous-Time Dynamics.
Jingdi Lei, Di Zhang, Soujanya Poria. arXiv preprint arXiv:2512.12602, 2025. arXiv
Chem-R: Learning to Reason as a Chemist.
Weida Wang, Benteng Chen, Di Zhang, Wanhao Liu, Shuchen Pu, Ben Gao, Jin Zeng, Xiaoyong Wei, Tianshu Yu, Shuzhou Sun, and others. arXiv preprint arXiv:2510.16880, 2025. arXiv
NVIDIA Nemotron Nano V2 VL.
Amala Sanjay Deshmukh, Kateryna Chumachenko, Tuomas Rintamaki, Matthieu Le, Tyler Poon, Danial Mohseni Taheri, Ilia Karmanov, Guilin Liu, Jarno Seppanen, Guo Chen, and others. arXiv preprint arXiv:2511.03929, 2025. arXiv
NVIDIA Isaac GR00T N1.6.
NVIDIA. NVIDIA News, 2025. News
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics.
Weida Wang, Dongchen Huang, Jiatong Li, Tengchao Yang, Ziyang Zheng, Di Zhang, Dong Han, Benteng Chen, Binzhao Luo, Zhiyu Liu, and others. arXiv preprint arXiv:2508.18124, 2025. arXiv
Your Reward Function for RL Is Your Best PRM for Search: Unifying RL and Search-Based TTS.
Can Jin, Yang Zhou, Qixin Zhang, Hongwu Peng, Di Zhang, Zihan Dong, Marco Pavone, Ligong Han, Zhang-Wei Hong, Tong Che, and others. arXiv preprint arXiv:2508.14313, 2025. arXiv
Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery.
Jiatong Li, Weida Wang, Qinggang Zhang, Junxian Li, Di Zhang, Changmeng Zheng, Shufei Zhang, Xiaoyong Wei, Qing Li. arXiv preprint arXiv:2508.08401, 2025. arXiv
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training.
Mingjie Liu, Shizhe Diao, Jian Hu, Ximing Lu, Xin Dong, Hao Zhang, Alexander Bukharin, Shaokun Zhang, Jiaqi Zeng, Makesh Narsimhan Sreedhar, and others. arXiv preprint arXiv:2507.12507, 2025. arXiv
SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition.
Mengsong Wu, Di Zhang, Yuqiang Li, Dongzhan Zhou, Wenliang Chen. arXiv preprint arXiv:2506.07557, 2025. arXiv
Control-R: Towards Controllable Test-Time Scaling.
Di Zhang, Weida Wang, Junxian Li, Xunzhi Wang, Jiatong Li, Jianbo Wu, Jingdi Lei, Haonan He, Peng Ye, Shufei Zhang, and others. arXiv preprint arXiv:2506.00189, 2025. HF Paper
Exploring the Application of Model Context Protocol for Enhanced Reasoning in Large Language Models.
Jianbo Wu, Di Zhang, Wei Shu, Jie Liu. ICML 2025 Workshop NewInML Poster, 2025. ICML
ChemAgent: Enhancing LLMs for Chemistry and Materials Science Through Tree-Search Based Tool Learning.
Mengsong Wu, YaFei Wang, Di Zhang, Yidong Ming, Yuqi An, Yuwei Wan, Wenliang Chen, Binbin Lin, Yuqiang Li, Tong Xie, Dongzhan Zhou. OpenReview, 2025. HF Paper
LoRA-Chem: Modular Machine Learning for Multitask Prediction in Organic Reactions.
Ben Gao, Penghui Li, Di Zhang, Qian Tan, Wanhao Liu, Xunzhi Wang, Junxian Li, Shufei Zhang, Dongzhan Zhou, Yuqiang Li, and others. CCS Chemistry, 1-9, 2025. DOI
MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback.
Wanhao Liu, Zonglin Yang, Jue Wang, Lidong Bing, Di Zhang, Dongzhan Zhou, Yuqiang Li, Houqiang Li, Erik Cambria, Wanli Ouyang. NeurIPS 2025 AI for Science Workshop, 2025. arXiv
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning.
Di Zhang, Jingdi Lei, Junxian Li, Xunzhi Wang, Yujie Liu, Zonglin Yang, Jiatong Li, Weida Wang, Suorong Yang, Jianbo Wu, and others. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9050-9061, 2025. HF Paper
Llama-Berry: Pairwise Optimization for Olympiad-Level Mathematical Reasoning via o1-like Monte Carlo Tree Search.
Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong Li, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone, Yuqiang Li, and others. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, Long Papers, 7315-7337. HF Paper
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area.
Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, Jingdi Lei, Qian Tan, Cai Zhou, Wei Liu, Yaotian Yang, Xinrui Xiong, and others. Proceedings of the AAAI Conference on Artificial Intelligence, 39(1):415-423, 2025. HF Paper

2024 and Earlier

Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models.
Haonan He, Yuchen Ren, Yining Tang, Ziyang Xu, Junxian Li, Minghao Yang, Di Zhang, Dong Yuan, Tao Chen, Shufei Zhang, and others. EMNLP 2025 Findings. HF Paper
Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine with Llama-3 8B.
Di Zhang, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang. arXiv preprint arXiv:2406.07394, 2024. HF Paper
ChemLLM: A Chemical Large Language Model.
D. Zhang, W. Liu, Q. Tan, J. Chen, H. Yan, Y. Yan, J. Li, W. Huang, X. Yue, D. Zhou. arXiv preprint arXiv:2402.06852, 2024. HF Paper
Sentiment Analysis Dataset for Service-Oriented Places Like Electric Power Supply Offices.
Bo Zhang, Chenguang Li, Di Zhang, Bin Lu, Kaibao Zhou, Jing Zhang, Qiming Zhu, Xiaoping Chen. Journal of Computer Applications, 42(S1):37-42, 2022. Wanfang
Target Selection Model for Robot Interaction and Robot Interaction System.
Bo Zhang, Bin Lyu, Huizhou Liu, Yu Ouyang, Qian Zhao, Di Zhang, Rongya Chen, Xiaoping Chen, Liang Tang, Songlin Zuo, and others. CN Patent CN114,399,529 B, 2022. Patent
Design and Implementation of Safety and Robustness of Mobile Service Robot Navigation in Complex Pedestrian Scenarios.
Di Zhang. Master's thesis, University of Science and Technology of China, 2022.
Juvenile State Hypothesis: What We Can Learn from Lottery Ticket Hypothesis Researches?
Di Zhang. arXiv preprint arXiv:2109.03862, 2021. arXiv
Water Supply Engineering Design Scheme 2 of Municipal Services District of C City.
Di Zhang. Thesis, Hefei University of Technology, 2019.

Experience

MindLab, Research Residency, Head of Agent Model, 2026-present.
NVIDIA Research, Research Intern, 2025-2026.
Shanghai AI Lab, Research Intern, 2023-2025.
Alibaba Inc., Machine Learning Developer, 2022-2023.
Ant Group, Intern, 2021.
MIT Han Lab, Intern, 2021-2022.

Education

Fudan University, PhD Candidate, 2023-present.
University of Science and Technology of China, Master of Engineering, Robotics Lab, 2019-2022.

I am occasionally open to technically grounded collaborations and advisory conversations on agentic AI systems, research infrastructure, and scientific intelligence.