詹锟 | Kun Zhan

Head of Foundation Model Team at Li Auto


Email: zk_1028@aliyun.com

WeChat: KevinZhan1990

Beijing, China

Foundation Models for Physical Intelligence

Building embodied intelligence from autonomous driving to robotics

I lead Li Auto's MindVLA and MindGPT teams, spanning behavior intelligence, cognitive intelligence, and production-grade deployment across autonomous driving, smart cabin, and future embodied systems.


About Me

I'm Kun Zhan, leading Li Auto's MindVLA and MindGPT teams while also serving as Site Manager for the company's Silicon Valley R&D center. My work spans behavior intelligence and cognitive intelligence, building foundation models for autonomous-driving VLA, smart-cabin LLM/VLM, and speech systems, then pushing them into production with automotive-grade reliability.

My journey began with a master's degree in Automation from Beihang University, followed by leading Baidu Apollo's behavior prediction team. Since joining Li Auto in 2021, I have been responsible for architecting and deploying three generations of autonomous-driving stacks, evolving toward a unified framework that connects perception, reasoning, planning, and action, and scales from driving to broader embodied systems.

My mission is to realize embodied, physical-world AGI using autonomous driving as the starting point and expanding toward robotics and wider real-world intelligence.

Key Highlights

Leadership, production execution, and research translation at scale.

Scaled leadership

Lead Li Auto's Foundation Model organization across VLA, VLM, LLM, and World Models, covering research, training infrastructure, deployment, and on-vehicle integration.

Production impact

Delivered Highway NoA (2022), City NoA (2023), the End-to-End + VLM dual system (2024), and the new VLA stack (2025) to mass-produced Li Auto vehicles.

Global execution

Built Li Auto's U.S. research hub and aligned Silicon Valley exploration with Beijing headquarters execution.

Research Interests

The current themes that define my research and engineering agenda.

Autonomous Driving: VLA models, end-to-end driving, planning, decision making
Computer Vision: Detection, tracking, scene understanding, BEV perception
3D & World Models: Dynamic reconstruction, generative simulation, RL at fleet scale
Multimodal LLMs: Reasoning, planning, and driver-vehicle interaction
Agent Models: Reasoning to action with tool use, safety, and reliability
Robotics: Embodied AI, humanoids, real-world manipulation and navigation

Work Experience

Programs and roles that shaped my approach to applied AI systems.

Li Auto

Apr 2021 - Present

Beijing / San Jose
Head of Foundation Model Team
  • Lead R&D of VLA foundation models and coordinate their integration with Li Auto's in-house autonomous-driving chips.
  • Built Li Auto's autonomous-driving stack through E2E, VLM, and VLA architecture generations, now operating on hundreds of thousands of vehicles.
  • Mentor a 100+ member organization across perception, planning, foundation models, simulation, and deployment.
  • Established dedicated world-model and RL groups to accelerate closed-loop learning and reduce real-world testing cost.
Site Manager, U.S. R&D Center
  • Launched Li Auto's overseas research hub, covering local strategy, budgeting, and talent acquisition.
  • Bridge Silicon Valley innovation with Beijing execution through cross-border program reviews and roadmap alignment.

Baidu Apollo

Apr 2016 - Mar 2021

Beijing, China
Algorithm Lead, L4 Prediction & Planning
  • Led L4 prediction and pre-decision algorithms for robo-taxi pilots, improving motion forecasting in complex urban scenes.
  • Shipped planning-and-control modules and onboard deep-learning components for autonomous fleets in Beijing and Guangzhou.

Publications

A snapshot of papers and citations based on Google Scholar.

Papers: 49
Total citations: 1335
h-index: 15
i10-index: 21

Top 10 Cited Papers

Sorted by Google Scholar citation count.

Drivevlm: The convergence of autonomous driving and large vision-language models
X Tian, J Gu, B Li, Y Liu, Y Wang, Z Zhao, K Zhan, P Jia, X Lang, H Zhao
arXiv preprint arXiv:2402.12289, 2024. Citations: [562](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=9069990263513405041)

Street gaussians: Modeling dynamic urban scenes with gaussian splatting
Y Yan, H Lin, C Zhou, W Wang, H Sun, K Zhan, X Lang, X Zhou, S Peng
European Conference on Computer Vision, 156-173, 2024. Citations: [406](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=8138670866186059561)

Recondreamer: Crafting world models for driving scene reconstruction via online restoration
C Ni, G Zhao, X Wang, Z Zhu, W Qin, G Huang, C Liu, Y Chen, Y Wang, ...
Proceedings of the Computer Vision and Pattern Recognition Conference, 1559-1569, 2025. Citations: [86](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=10376473717473330982)

Planagent: A multi-modal large language agent for closed-loop vehicle motion planning
Y Zheng, Z Xing, Q Zhang, B Jin, P Li, Y Zheng, Z Xia, Y Chen, D Zhao
IEEE Transactions on Cognitive and Developmental Systems, 2026. Citations: [55](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=4421753326440065257)

Unleashing generalization of end-to-end autonomous driving with controllable long video generation
E Ma, L Zhou, T Tang, Z Zhang, D Han, J Jiang, K Zhan, P Jia, X Lang, ...
arXiv preprint arXiv:2406.01349, 2024. Citations: [51](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=8558179969649046939)

Streetcrafter: Street view synthesis with controllable video diffusion models
Y Yan, Z Xu, H Lin, H Jin, H Guo, Y Wang, K Zhan, X Lang, H Bao, X Zhou, ...
Proceedings of the Computer Vision and Pattern Recognition Conference, 822-832, 2025. Citations: [43](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=10025705900330225678)

Tod3cap: Towards 3d dense captioning in outdoor scenes
B Jin, Y Zheng, P Li, W Li, Y Zheng, S Hu, X Liu, J Zhu, Z Yan, H Sun, ...
European Conference on Computer Vision, 367-384, 2024. Citations: [42](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=17399025554193074791)

Drivingsphere: Building a high-fidelity 4d world for closed-loop simulation
T Yan, D Wu, W Han, J Jiang, X Zhou, K Zhan, C Xu, J Shen
Proceedings of the Computer Vision and Pattern Recognition Conference, 27531…, 2025. Citations: [32](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=1178226343426138884)

Finetuning generative trajectory model with reinforcement learning from human feedback
D Li, J Ren, Y Wang, X Wen, P Li, L Xu, K Zhan, Z Xia, P Jia, X Lang, N Xu, ...
arXiv e-prints, arXiv:2503.10434, 2025. Citations: [31](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=14639323362602446741)

Dive: Dit-based video generation with enhanced control
J Jiang, G Hong, L Zhou, E Ma, H Hu, X Zhou, J Xiang, F Liu, K Yu, H Sun, ...
arXiv preprint arXiv:2409.01595, 2024. Citations: [31](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=13549890270565481597)

Patents & Service

Research service and technology transfer beyond the production stack.

Patents

20 granted patents (18 CN, 2 US) spanning perception, planning, and HD-mapping pipelines.

Reviewer

CVPR, ICCV, ECCV, NeurIPS, AAAI, IROS, and journals including TPAMI, T-ITS, and T-IV.

Community

Organizer of the CVPR 2023 Autonomous Driving Workshop and frequent speaker on VLA deployment in production.