詹锟 | Kun Zhan

Head of Foundation Model Team at Li Auto


Email: zk_1028@aliyun.com

WeChat: KevinZhan1990

Beijing, China

Foundation Models for Physical Intelligence

Building embodied intelligence from autonomous driving to robotics

I lead Li Auto's MindVLA and MindGPT teams, spanning behavior intelligence, cognitive intelligence, and production-grade deployment across autonomous driving, smart cabin, and future embodied systems.


About Me

I'm Kun Zhan, leading Li Auto's MindVLA and MindGPT teams while also serving as Site Manager for the company's Silicon Valley R&D center. My work spans behavior intelligence and cognitive intelligence, building foundation models for autonomous-driving VLA, smart-cabin LLM/VLM, and speech systems, then pushing them into production with automotive-grade reliability.

My journey began with a master's degree in Automation from Beihang University, followed by leading Baidu Apollo's behavior prediction team. Since joining Li Auto in 2021, I have been responsible for architecting and deploying three generations of autonomous-driving stacks, evolving toward a unified framework that connects perception, reasoning, planning, and action, and scales from driving to broader embodied systems.

My mission is to realize embodied, physical-world AGI using autonomous driving as the starting point and expanding toward robotics and wider real-world intelligence.

Key Highlights

Leadership, production execution, and research translation at scale.

Scaled leadership

Lead Li Auto's Foundation Model organization across VLA, VLM, LLM, and World Models, covering research, training infrastructure, deployment, and on-vehicle integration.

Production impact

Delivered Highway NoA (2022), City NoA (2023), the End-to-End + VLM dual system (2024), and the new VLA stack (2025) to mass-produced Li Auto vehicles.

Global execution

Built Li Auto's U.S. research hub and aligned Silicon Valley exploration with Beijing headquarters execution.

Research Interests

The current themes that define my research and engineering agenda.

Autonomous Driving: VLA models, end-to-end driving, planning, decision making
Computer Vision: Detection, tracking, scene understanding, BEV perception
3D & World Models: Dynamic reconstruction, generative simulation, RL at fleet scale
Multimodal LLMs: Reasoning, planning, and driver-vehicle interaction
Agent Models: Reasoning to action with tool use, safety, and reliability
Robotics: Embodied AI, humanoids, real-world manipulation and navigation

Work Experience

Programs and roles that shaped my approach to applied AI systems.

Li Auto

Apr 2021 - Present

Beijing / San Jose
Head of Foundation Model Team
  • Lead R&D of VLA foundation models and coordinate their integration with Li Auto's in-house autonomous-driving chips.
  • Built Li Auto's autonomous-driving stack through E2E, VLM, and VLA architecture generations, now operating on hundreds of thousands of vehicles.
  • Mentor a 100+ member organization across perception, planning, foundation models, simulation, and deployment.
  • Established dedicated world-model and RL groups to accelerate closed-loop learning and reduce real-world testing cost.
Site Manager, U.S. R&D Center
  • Launched Li Auto's overseas research hub, covering local strategy, budgeting, and talent acquisition.
  • Bridge Silicon Valley innovation with Beijing execution through cross-border program reviews and roadmap alignment.

Baidu Apollo

Apr 2016 - Mar 2021

Beijing, China
Algorithm Lead, L4 Prediction & Planning
  • Led L4 prediction and pre-decision algorithms for robo-taxi pilots, improving motion forecasting in complex urban scenes.
  • Shipped planning-and-control modules and onboard deep-learning components for autonomous fleets in Beijing and Guangzhou.

Publications

A snapshot of papers and citations based on Google Scholar.

Papers: 49
Total citations: 1335
h-index: 15
i10-index: 21

Top 10 Cited Papers

Sorted by Google Scholar citation count.

Drivevlm: The convergence of autonomous driving and large vision-language models
X Tian, J Gu, B Li, Y Liu, Y Wang, Z Zhao, K Zhan, P Jia, X Lang, H Zhao
arXiv preprint arXiv:2402.12289, 2024. Citations: [562](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=9069990263513405041)

Street gaussians: Modeling dynamic urban scenes with gaussian splatting
Y Yan, H Lin, C Zhou, W Wang, H Sun, K Zhan, X Lang, X Zhou, S Peng
European Conference on Computer Vision, 156-173, 2024. Citations: [406](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=8138670866186059561)

Recondreamer: Crafting world models for driving scene reconstruction via online restoration
C Ni, G Zhao, X Wang, Z Zhu, W Qin, G Huang, C Liu, Y Chen, Y Wang, ...
Proceedings of the Computer Vision and Pattern Recognition Conference, 1559-1569, 2025. Citations: [86](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=10376473717473330982)

Planagent: A multi-modal large language agent for closed-loop vehicle motion planning
Y Zheng, Z Xing, Q Zhang, B Jin, P Li, Y Zheng, Z Xia, Y Chen, D Zhao
IEEE Transactions on Cognitive and Developmental Systems, 2026. Citations: [55](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=4421753326440065257)

Unleashing generalization of end-to-end autonomous driving with controllable long video generation
E Ma, L Zhou, T Tang, Z Zhang, D Han, J Jiang, K Zhan, P Jia, X Lang, ...
arXiv preprint arXiv:2406.01349, 2024. Citations: [51](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=8558179969649046939)

Streetcrafter: Street view synthesis with controllable video diffusion models
Y Yan, Z Xu, H Lin, H Jin, H Guo, Y Wang, K Zhan, X Lang, H Bao, X Zhou, ...
Proceedings of the Computer Vision and Pattern Recognition Conference, 822-832, 2025. Citations: [43](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=10025705900330225678)

Tod3cap: Towards 3d dense captioning in outdoor scenes
B Jin, Y Zheng, P Li, W Li, Y Zheng, S Hu, X Liu, J Zhu, Z Yan, H Sun, ...
European Conference on Computer Vision, 367-384, 2024. Citations: [42](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=17399025554193074791)

Drivingsphere: Building a high-fidelity 4d world for closed-loop simulation
T Yan, D Wu, W Han, J Jiang, X Zhou, K Zhan, C Xu, J Shen
Proceedings of the Computer Vision and Pattern Recognition Conference, 27531…, 2025. Citations: [32](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=1178226343426138884)

Finetuning generative trajectory model with reinforcement learning from human feedback
D Li, J Ren, Y Wang, X Wen, P Li, L Xu, K Zhan, Z Xia, P Jia, X Lang, N Xu, ...
arXiv e-prints, arXiv:2503.10434, 2025. Citations: [31](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=14639323362602446741)

Dive: Dit-based video generation with enhanced control
J Jiang, G Hong, L Zhou, E Ma, H Hu, X Zhou, J Xiang, F Liu, K Yu, H Sun, ...
arXiv preprint arXiv:2409.01595, 2024. Citations: [31](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=13549890270565481597)

Patents & Service

Research service and technology transfer beyond the production stack.

Patents

20 granted patents (18 CN, 2 US) spanning perception, planning, and HD-mapping pipelines.

Reviewer

CVPR, ICCV, ECCV, NeurIPS, AAAI, IROS, and journals including TPAMI, T-ITS, and T-IV.

Community

Organizer of the CVPR 2023 Autonomous Driving Workshop and frequent speaker on VLA deployment in production.