Vision-Language Models | Field

AI researcher at DeepSeek working on natural language processing, code intelligence, and large language model reasoning.

Radu Soricut is a Distinguished Scientist at Google DeepMind working on natural language processing and machine learning; he previously worked at Google Research, including on Google Translate.

Haoyu Lu is a Ph.D. student at Renmin University of China working on multimodal foundation models and video understanding. His homepage highlights papers and code including DeepSeek-VL, UniAdapter, and VDT.

Wanli Ouyang is a professor at Shanghai AI Laboratory. His homepage says he is also with MMlab and the SIGMA lab, obtained a PhD from the Chinese University of Hong Kong, and works on AI4Science, computer vision, and pattern recognition.

CEO of Sand AI. His homepage describes prior work leading multimodal and vision research at BAAI and serving as a senior researcher at Microsoft Research Asia.

Jifeng Dai is a tenured associate professor in electronic engineering at Tsinghua University and founder of Fundamental Vision. His research spans computer vision, deep learning, multimodal learning, and autonomous driving. He previously worked at Microsoft Research Asia and SenseTime Research, and he received both his bachelor's and PhD degrees from Tsinghua University.

Junyang Lin (Justin Lin) is a researcher and open-source maintainer known for the Qwen family of models. His public profiles list interests in LLMs, AI agents, multimodal learning, long-horizon reasoning, world models, and reinforcement learning; multiple March 2026 news reports said he stepped down from the Qwen tech lead role.

Jifeng Dai is a tenured associate professor in the Department of Electronic Engineering at Tsinghua University. His homepage says his current research focuses on agentic AI and continual learning, and lists prior roles at Shanghai AI Lab, SenseTime Research, and Microsoft Research Asia.

Xingcheng Yao is a research scientist at Moonshot AI. His public profile notes prior work as a research engineer at Tencent AI Lab, a PhD in computer science from the University of Southern California, and research interests spanning NLP, multimodal systems, and AI agents.

Xinyi Chen is a PhD candidate in computer science at Princeton University and concurrently a research scientist at Google DeepMind. Her public homepage says she works at the intersection of machine learning, optimization, and dynamical systems, focusing on robust and efficient methods for sequential decision-making and control, and that she previously completed undergraduate studies in mathematics at Princeton.

Researcher at NVIDIA Research. Previously a PhD student in Computer Science and Engineering at HKUST, with earlier internships at International Digital Economy Academy and Microsoft Research.

Luke Zettlemoyer works on empirical methods for natural language semantics, machine learning, new tasks and datasets, and self-supervision for pre-training.

Pengyu Cheng is a researcher at Alibaba Group leading reinforcement-learning training for the Qwen large-model application team. His homepage also lists prior work with Moonshot AI and Tencent's Hunyuan large-model team.

Hao Yang works on multimodal data infrastructure at Moonshot AI. He previously worked at ByteDance ICVG and Microsoft Research Asia, and received BS and PhD degrees from Tsinghua University.

Researcher at OpenAI whose homepage highlights work on document understanding, coding agents, and computer-use agents.

Researcher at DeepSeek whose public homepage describes work on DeepSeek R1, V1, V2, V3, Math, Coder, and mixture-of-experts systems.

Jiahui Yu is a Research Lead at OpenAI leading the Perception team. His homepage notes prior co-leadership on Gemini Multimodal at Google DeepMind and work on deep learning and high-performance computing.

Senior algorithm expert at Alibaba Group working on large language models, multimodal large language models, and diffusion models.

Jingren Zhou is Chief Technology Officer of Alibaba Cloud. Public speaker biographies describe him as a computer scientist and entrepreneur whose work includes large-scale AI and cloud systems.

Jian Yang is an Associate Professor at Beihang University whose research focuses on code intelligence, large language models, and AI agents. He worked with Alibaba Qwen from 2023 to July 2025.

Researcher at DeepSeek AI working on decision-making and post-training for large language models.

Research scientist in Tongyi Lab whose public homepage and OpenReview profile describe work on large language models, multimodal learning, and visual grounding. His public profiles also list affiliations with Alibaba Group and East China Normal University.

PhD student at The Hong Kong University of Science and Technology (Guangzhou) whose research interests include large language models, vision-language models, AI agents, and multimodal retrieval.

Yulun Du is a Moonshot AI-affiliated researcher. Public profiles also show prior work and study at Carnegie Mellon University, including a Master of Language Technologies completed in 2020.

Siyuan Li is a research scientist at NVIDIA working on large language models, multimodal foundation models, and reinforcement learning. His homepage says he received a PhD in computer science from the University of Toronto in 2024 and previously worked at Meta AI, Microsoft Research, and Mila.

Researcher and engineer focused on reinforcement learning and embodied intelligence; his public profile lists work spanning Huawei Noah's Ark Lab, Momenta, Moonshot AI, and XVI Robotics, and he is credited on Moonshot AI technical reports.

Research scientist at Moonshot AI working on foundation models, multimodal large language models, and agents; previously worked at Huawei Noah's Ark Lab and studied at the Chinese University of Hong Kong.

Research scientist at Alibaba working on multimodal learning and generation; previously a postdoctoral researcher at Carnegie Mellon University.

Yixiao Ge is a Research Scientist at Shanghai AI Laboratory and OpenGVLab. His work focuses on multimodal large language models, computer vision, efficient deep learning, and vision-language understanding.

Research scientist at Moonshot AI focused on large multimodal models and large language model post-training.

Researcher focused on AGI, multimodal models, and reasoning. Coauthor of Janus and JanusFlow.

Dongliang Wang is a research scientist at Moonshot AI whose public profiles highlight multimodal large language models. His homepage also notes earlier PhD work at Shanghai AI Lab and Shanghai Jiao Tong University.

Huabin Zheng is a research scientist at Moonshot AI. His homepage says he works on large language models, multi-agent systems, code generation, and game agents.

Jun Tang works on multimodal foundation models, open-source language models, and agent systems. His personal site highlights work on Qwen and Qwen3-VL alongside related multimodal research.

PhD student at Tsinghua University focusing on multimodal large language models, reasoning, and reinforcement learning.

Researcher focused on large language models and multimodal learning, with public profiles linking Keqin Chen to Beihang University and to Qwen vision-language model work.

Assistant Professor of Computer Science at the University of Hong Kong and director of XLANG Lab, focusing on natural language processing and embodied AI agents.

PhD student at Tsinghua University focusing on LLM reasoning, RLHF, and multimodal large language models; research intern at DeepSeek.

Qwen researcher and author on the Qwen2-VL and Qwen2.5-VL technical reports, with public profiles linking his work to multimodal and vision-language systems.

Research scientist at Google DeepMind working on multimodal generative models, visual generation, and image editing; previously completed a PhD at TU Munich.

Jiahao Liu works on multimodal large language models, reasoning systems, and continual learning. His public profiles connect him to the Qwen2.5-VL technical report and related open research work.

Researcher at the Allen Institute for AI (Ai2) working on vision-language and multimodal AI, with a focus on reliable reasoning and understanding beyond text.

Xi Zhang works on multimodal and vision-language model research. Public profiles connect him to Qwen2-VL and related open research projects.

Noah A. Smith is a computer scientist and professor at the University of Washington, where he serves as Vice Provost for Artificial Intelligence and co-directs the OLMo open language modeling effort with Ai2. His research focuses on natural language processing, machine learning, and evaluation methodology.

Researcher at Salesforce AI Research and coauthor of the xLAM-2 Technical Report.

Mingkun Yang works on multimodal large language models, embodied AI, and robotics. His public profile says he is a postdoc at Zhejiang University and a research scientist at Qwen.

Research Scientist at Moonshot AI whose public work focuses on large language models, multimodal models, and embodied AI; he previously earned a PhD from Zhejiang University and was a visiting student at Oxford.

Research scientist and writer behind Scientific Spaces whose public profile lists work on large language models and service on the Kimi team at Moonshot AI.

Research scientist in Alibaba DAMO Academy's Tongyi Lab working on multimodal learning, vision-language models, and embodied AI; author on the Qwen2-VL and Qwen2.5-VL technical reports.

Lucas Beyer is an ML researcher at Google DeepMind in Zurich. His public homepage highlights prior work at Google Brain and a PhD at ETH Zurich.

Maarten Sap is an assistant professor at the University of Washington and a senior research scientist at the Allen Institute for AI. His work focuses on human-centered language technologies and social NLP.

Research scientist at Google DeepMind on the Gemini team, working on multimodal AI.

Researcher at DeepSeek and a first-year computer science PhD student at the University of Science and Technology of China; works on multimodal reasoning and world models; coauthor of Janus.

Researcher working on multimodal learning and vision-language systems, with public academic work on visual question answering and related topics.

Research scientist at Moonshot AI focused on multimodal large language models.

Algorithm expert at Alibaba Group working on computer vision, multimodal learning, and large language models.

Research scientist at DeepSeek and PhD student at the University of Illinois Urbana-Champaign working on multimodal foundation models, large language models, and embodied AI.

Associate research scientist at Moonshot AI based in Beijing, China; previously worked as a postdoctoral researcher.

Zhibo Yang works on multimodal and vision-language systems. Public profiles connect him to the Qwen2.5-VL technical report and to an individual GitHub account that links back to his personal site.

Machine learning researcher at Moonshot AI and incoming assistant professor at Shanghai Jiao Tong University.

DeepSeek report author whose DBLP record includes DeepSeek LLM, DeepSeekMath, DeepSeek-Coder-V2, DeepSeek-V3, DeepSeek-R1, Janus, and JanusFlow work.

DeepSeek report author whose DBLP-linked publication record includes DeepSeek LLM, DeepSeek-Coder-V2, Janus, DeepSeek-V3, and DeepSeek-R1 work.

DeepSeek team member and co-author of the DeepSeek-V3, DeepSeek-V2, and DeepSeek LLM technical reports.

Alibaba Qwen report author whose DBLP profile identifies an Alibaba Group affiliation and Qwen technical report authorship.

Alibaba Qwen report author whose DBLP record includes Qwen2.5-VL and Qwen technical report work on multimodal and large language models.

Google researcher whose official profile says he joined Google in September 2008 and has been with Google Brain since January 2015, with research interests spanning information retrieval, machine learning, machine translation, and natural language processing.

Dieter Schwarz Foundation Professor and Senior Fellow in Stanford Computer Science and HAI. Her public homepage notes previous roles as professor at the University of Washington and senior director at Ai2.

Leonardo Beyer is a research scientist at Google DeepMind. His public homepage highlights work across representation learning, multimodal models, and large-scale machine learning systems.

Multimodal and omni-model engineer whose public profile lists Moonshot AI experience and Kimi-VL among recent projects.

Researcher at the University of Illinois Urbana-Champaign focused on vision-language models, multimodal large language models, and physical AI.

Research intern at DeepSeek and PhD student at Princeton University whose research interests include large language models and multimodal foundation models.

Research intern at DeepSeek and master's student at Renmin University of China working on multimodal large language models and AI agents.

Tongyi Lab researcher working on large language models, vision-language models, and reinforcement learning; public profiles connect Zheren Fu to the Qwen2-VL technical report.

CEO of the Allen Institute for AI and professor of computer science at the University of Washington. His work spans computer vision, multimodal learning, reasoning, and embodied AI.

Alibaba researcher working on large language models and multimodal pretraining; public research profiles connect An Yang to Qwen-related work and earlier study at Peking University.

Researcher on Alibaba's Qwen team focused on large language models and NLP, with public research profiles listing a Nankai University background.

Chief Technology Officer at Google DeepMind, with work spanning machine learning and reinforcement learning.

Xinlong Wang is a researcher working across computer vision, embodied AI, robotics, and machine learning. Public profiles link him to OpenGVLab and Shanghai AI Laboratory, and he is a coauthor of DeepSeek-VL2.

Technical staff member at Moonshot AI whose public profile highlights work on web and app agents, multimodal systems, reinforcement learning, and LLMs.

Dikang Du is a research scientist at Moonshot AI. His homepage says he received a Ph.D. from Cornell University and works on natural language processing, machine learning, and multimodal learning.

Technical staff member at Moonshot AI working on general AI agents, reinforcement learning, and multimodal foundation models.

PhD student in computer science at the University of Hong Kong working in vision and machine intelligence.

Researcher in computer vision and multimodal learning. Public profile lists PhD study in computer science and engineering at HKUST under Qifeng Chen.

Christopher Clark is a researcher working on language models, efficient inference, and trustworthy NLP systems. His public profile highlights work at the intersection of NLP, efficiency, and model evaluation.

Research scientist on the Qwen team at Alibaba Group, focusing on foundation models and language agents. He received a PhD in computer science from the University of Illinois Urbana-Champaign.

Wenfeng Liang, also known as Liang Wenfeng, is identified in public references as the founder and CEO of DeepSeek and is credited as an author on DeepSeek technical reports.

Alibaba Qwen author listed on the Qwen, Qwen2.5, Qwen2.5-1M, Qwen3, Qwen3 Embedding, QwQ-32B, and Qwen-VL reports, which cover large language models, embeddings, reranking, and multimodal models.

Zeyu Cui is listed as an author of the Qwen3 Technical Report.

Alibaba Qwen author listed on the Qwen, Qwen2.5, Qwen3, Qwen-VL, and Qwen-Image technical reports, which cover large language models, vision-language models, and image generation.

Research scientist in Tongyi Lab whose official profile highlights post-training and multimodal large language models.

Antonio Torralba is the Delta Electronics Professor in the EECS Department at MIT and a member of CSAIL whose research focuses on computer vision, visual learning, and scene understanding.

PhD student in CSLT at Tsinghua University working on large language models, multimodal large language models, and speech-language models; publication context connects Jinbo Zhao to the Qwen2.5-VL technical report.

Wenhai Wang is a researcher working on visual perception foundation models, efficient learning, and multimodal large models. Public profiles list him with OpenGVLab and Shanghai AI Laboratory, and he is a coauthor of DeepSeek-VL2.

Yuanzhi Zhu is a Qwen researcher whose public work includes multimodal and audio-language models.

Research scientist at the Allen Institute for AI (Ai2) whose work focuses on natural language understanding and commonsense reasoning.

Research scientist on the AllenNLP team at the Allen Institute for AI, focused on post-training language models.

PhD student at the University of Hong Kong who worked as a research intern at Moonshot AI in 2025 and studies digital agents, computer-use agents, and multimodal intelligence.

Research scientist at Moonshot AI with public profiles covering large language models, diffusion models, and generative AI.

Computer science graduate from the University of Hong Kong who worked as a research intern at Moonshot AI on general-purpose computer-use agents.

Researcher at Moonshot AI with public homepage and GitHub profiles under the name Xixia Zhong.

Technical staff at Moonshot AI working on large language model reasoning, agents, and multimodal large models.

Research scientist at Google DeepMind based in Paris, focused on deep learning and computer vision.

Research scientist at Moonshot AI with public GitHub and Google Scholar profiles covering efficient inference and multimodal systems.

Research intern at DeepSeek and master's student at Tsinghua University working on large language models, multimodal models, and reinforcement learning.

Xiaohua Zhai is a researcher on the Google Research team in Zurich whose work focuses on large multimodal models and efficient deep learning.

AI researcher at Moonshot AI with a public homepage and Google Scholar profile spanning robust AI, computer vision, and multimodal systems.

Moonshot AI researcher working on large language models, coding agents, and multimodal safety; his public homepage also documents earlier study at Shanghai Jiao Tong University and Huazhong University of Science and Technology.

Researcher at Moonshot AI with a personal homepage and GitHub profile covering machine learning research.

Researcher at Moonshot AI focused on large language models, computational photography, and low-level computer vision; previously worked at Megvii and completed a PhD and postdoc at Tsinghua University.

Research scientist at Moonshot AI working on multimodal AI agents, large multimodal models, video generation, speech, machine learning systems, and AI for science.

PhD student in Computer Science and Technology at Tsinghua University with public research interests in machine learning, natural language processing, and large language models.

DeepSeek author listed on the DeepSeek-VL2 report, which covers mixture-of-experts vision-language models for multimodal understanding.

Machine learning researcher with a public homepage and GitHub profile covering AI research and engineering projects.

Research scientist at Shanghai AI Laboratory working on NLP and multimodal AI, and a co-author of InternLM-XComposer2.5.

PhD student at The Chinese University of Hong Kong focused on multimodal reasoning, optical character recognition, and document parsing; coauthor of DeepSeek-VL.

Research scientist at the Allen Institute for AI working on multimodal large language models, embodied agents, and reasoning for robots and games.

Matt Deitke is a researcher at Ai2 whose public homepage and Google Scholar profile highlight work on multimodal learning, vision-language models, embodied AI, and open models.

Molly S. Lewis is an Assistant Professor of Psychology at Princeton University whose research examines how language is shaped by social and cultural structure.

Professor at the Technion and head of the CHIA Lab, with research spanning natural language processing, machine learning, and social-good applications.

Xiuye Gu is a researcher whose public work focuses on vision-language modeling and machine learning systems.

Research scientist at Kimi AI (Moonshot AI). Previously completed a PhD in computer science at the University of Wisconsin-Madison.

Yao Lu is listed as an author of the Google technical report Gemini Robotics: Bringing AI into the Physical World.

Publicly available Moonshot AI technical reports list Zheng Zhang as a coauthor on Kimi-VL and Kimi K2. Public evidence supports research authorship on language and multimodal systems but does not include a separately verified employer profile.

Qwen researcher and co-lead whose work focuses on pretraining and post-training, multimodal models, agent systems, and large-scale model infrastructure.

Senior research scientist in Tongyi Lab whose official profile highlights post-training, AI for science, evaluation and alignment, multimodal reasoning, and large language model reasoning.

Maxwell Collins is a Research Scientist at Google DeepMind.

Researcher at Shanghai AI Laboratory and coauthor of the InternLM2 Technical Report.

First-year PhD student at Shanghai Jiao Tong University focused on multimodal large language models, text-to-image generation, and image/video generation; coauthor of DeepSeek-VL2.

Yonggang Zhang is a researcher whose public OpenReview profile includes the paper DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding.

Researcher at Shanghai AI Laboratory and coauthor of the InternLM2 Technical Report.

Researcher at Salesforce AI Research and coauthor of the XGen-7B Technical Report.

Shan Lu is listed as an author of the DeepSeek technical report JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation.

Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

Xinxing Zu is listed as an author of the Moonshot AI technical report Kimi K2.5: Visual Agentic Intelligence.

Yafei Wen is listed as an author of the MiniMax technical report MiniMax-Text-01.

Researcher at Salesforce AI Research and coauthor of the XGen-7B Technical Report.

Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

Researcher at Shanghai AI Laboratory and coauthor of the InternLM2 Technical Report.

Researcher at Shanghai AI Laboratory and coauthor of InternLM-XComposer2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

Researcher at Salesforce AI Research and coauthor of XGen-MM (BLIP-3): A Family of Open Large Multimodal Models.