詹锟 | Kun Zhan

Head of Foundation Model Team at Li Auto


Email: zk_1028@aliyun.com

WeChat: KevinZhan1990

Beijing, China

Foundation Models for Physical Intelligence

Building embodied intelligence from autonomous driving to robotics

I lead Li Auto's MindVLA and MindGPT teams, spanning behavior intelligence, cognitive intelligence, and production-grade deployment across autonomous driving, smart cabin, and future embodied systems.


About Me

I'm Kun Zhan, leading Li Auto's MindVLA and MindGPT teams while also serving as Site Manager for the company's Silicon Valley R&D center. My work spans behavior intelligence and cognitive intelligence, building foundation models for autonomous-driving VLA, smart-cabin LLM/VLM, and speech systems, then pushing them into production with automotive-grade reliability.

My journey began with a master's degree in Automation from Beihang University, after which I led Baidu Apollo's behavior prediction team. Since joining Li Auto in 2021, I have architected and deployed three generations of autonomous-driving stacks, evolving toward a unified framework that connects perception, reasoning, planning, and action, and scales from driving to broader embodied systems.

My mission is to realize embodied, physical-world AGI using autonomous driving as the starting point and expanding toward robotics and wider real-world intelligence.

Key Highlights

Leadership, production execution, and research translation at scale.

Scaled leadership

Lead Li Auto's Foundation Model organization across VLA, VLM, LLM, and World Models, covering research, training infrastructure, deployment, and on-vehicle integration.

Production impact

Delivered Highway NoA (2022), City NoA (2023), End-to-End + VLM dual-system (2024), and the new VLA stack (2025) to mass-produced Li Auto vehicles.

Global execution

Built Li Auto's U.S. research hub and aligned Silicon Valley exploration with Beijing headquarters execution.

Research Interests

The current themes that define my research and engineering agenda.

Autonomous Driving: VLA models, end-to-end driving, planning, decision making
Computer Vision: Detection, tracking, scene understanding, BEV perception
3D & World Models: Dynamic reconstruction, generative simulation, RL at fleet scale
Multimodal LLMs: Reasoning, planning, and driver-vehicle interaction
Agent Models: Reasoning to action with tool use, safety, and reliability
Robotics: Embodied AI, humanoids, real-world manipulation and navigation

Work Experience

Programs and roles that shaped my approach to applied AI systems.

Li Auto

Apr 2021 - Present

Beijing / San Jose
Head of Foundation Model Team
  • Lead the R&D of VLA foundation models and coordinate integration with self-developed autonomous-driving chips.
  • Built Li Auto's autonomous-driving stack through successive E2E, VLM, and VLA architectures, now operating on hundreds of thousands of vehicles.
  • Mentor a 100+ member organization across perception, planning, foundation models, simulation, and deployment.
  • Established dedicated world-model and RL groups to accelerate closed-loop learning and reduce real-world testing cost.
Site Manager, U.S. R&D Center
  • Launched Li Auto's overseas research hub, covering local strategy, budgeting, and talent acquisition.
  • Bridge Silicon Valley innovation with Beijing execution through cross-border program reviews and roadmap alignment.

Baidu Apollo

Apr 2016 - Mar 2021

Beijing, China
Algorithm Lead, L4 Prediction & Planning
  • Led the L4 prediction pre-decision algorithms for robo-taxi pilots, improving motion forecasting in complex urban scenes.
  • Shipped planning-and-control modules and deep-learning onboard components for autonomous fleets in Beijing and Guangzhou.

Academic Achievements

Publication and citation snapshot based on Google Scholar.

Publications: 49
Total citations: 1335
h-index: 15
i10-index: 21

Top 10 Cited Papers

Ranked by Google Scholar citation count.

Drivevlm: The convergence of autonomous driving and large vision-language models
2024 · 492 citations
X Tian, J Gu, B Li, Y Liu, Y Wang, Z Zhao, K Zhan, P Jia, X Lang, H Zhao

Street gaussians: Modeling dynamic urban scenes with gaussian splatting
2024 · 361 citations
Y Yan, H Lin, C Zhou, W Wang, H Sun, K Zhan, X Lang, X Zhou, S Peng

Recondreamer: Crafting world models for driving scene reconstruction via online restoration
2025 · 71 citations
C Ni, G Zhao, X Wang, Z Zhu, W Qin, G Huang, C Liu, Y Chen, Y Wang, ...

Planagent: A multi-modal large language agent for closed-loop vehicle motion planning
2024 · 49 citations
Y Zheng, Z Xing, Q Zhang, B Jin, P Li, Y Zheng, Z Xia, K Zhan, X Lang, ...

Unleashing generalization of end-to-end autonomous driving with controllable long video generation
2024 · 47 citations
E Ma, L Zhou, T Tang, Z Zhang, D Han, J Jiang, K Zhan, P Jia, X Lang, ...

Tod3cap: Towards 3d dense captioning in outdoor scenes
2024 · 40 citations
B Jin, Y Zheng, P Li, W Li, Y Zheng, S Hu, X Liu, J Zhu, Z Yan, H Sun, ...

Streetcrafter: Street view synthesis with controllable video diffusion models
2025 · 37 citations
Y Yan, Z Xu, H Lin, H Jin, H Guo, Y Wang, K Zhan, X Lang, H Bao, X Zhou, ...

Dive: Dit-based video generation with enhanced control
2024 · 28 citations
J Jiang, G Hong, L Zhou, E Ma, H Hu, X Zhou, J Xiang, F Liu, K Yu, H Sun, ...

Finetuning generative trajectory model with reinforcement learning from human feedback
2025 · 26 citations
D Li, J Ren, Y Wang, X Wen, P Li, L Xu, K Zhan, Z Xia, P Jia, X Lang, N Xu, ...

Drivingsphere: Building a high-fidelity 4d world for closed-loop simulation
2025 · 23 citations
T Yan, D Wu, W Han, J Jiang, X Zhou, K Zhan, C Xu, J Shen

Patents & Service

Research service and technology transfer beyond the production stack.

Patents

20 granted patents (18 CN, 2 US) across perception, planning, and HD-mapping pipelines.

Reviewer

CVPR, ICCV, ECCV, NeurIPS, AAAI, IROS, and journals including TPAMI, T-ITS, and T-IV.

Community

Organizer of the CVPR 2023 Autonomous Driving Workshop and frequent speaker on VLA deployment in production.