Qwen2-Audio Technical Report

Computer scientist and engineer credited on OpenAI's GPT-4 public contributions page; OpenAI's 2016 team update says he previously led Dropbox's core file sync team after earlier work in Pieter Abbeel's Berkeley robotics lab.

Senior research scientist in Tongyi Lab whose official profile highlights post-training, AI for science, evaluation and alignment, multimodal reasoning, and large language model reasoning.

Alibaba researcher working on large language models and multimodal pretraining; public research profiles connect An Yang to Qwen-related work and earlier study at Peking University.

Jingren Zhou is Chief Technology Officer of Alibaba Cloud. Public speaker biographies describe him as a computer scientist and entrepreneur whose work includes large-scale AI and cloud systems.

Tianyu Liu is a researcher at Kimi working on coding and agents. He previously worked on Qwen at Alibaba and was a founding member of Tencent Hunyuan, and he earned a PhD in natural language processing from Peking University.

Yuanzhi Zhu is a Qwen researcher whose public work includes multimodal and audio-language models.

Second-year PhD student at Peking University focused on audio-language foundation models, trustworthy AI, and embodied AI; coauthor of Qwen2-Audio.

Zhen Ye is a researcher in the Qwen team at Alibaba Cloud. His public profile notes a PhD in computer science from the University of Massachusetts Amherst and research interests in natural language understanding, generation, and reasoning.

Yeyun Gong is a researcher and engineering leader focused on multimodal large language models, grounding, and large-scale knowledge systems. His homepage lists selected work including Qwen2-Audio.

Research scientist in Tongyi Lab whose public profile highlights work on speech processing, machine learning, and multimodal large language models.

Research intern at Alibaba Group focused on multimodal understanding and generation, large multimodal models, and reinforcement learning; coauthor of Qwen2-Audio.

Research scientist in Tongyi Lab and technical lead of Qwen2-Audio, with public work on audio-language models.

Shen Gao is a PhD student at Zhejiang University working on multimedia and large language models. His public profiles connect him to Qwen2-Audio and related multimodal systems including OmniParser.

Machine learning engineer and researcher interested in large language models and multimodal audio-language systems; coauthor of Qwen2-Audio.

Vice President of JD.COM and Deputy Director of JD Future Academy, leading foundation model research across language, audio, vision, and embodied AI; previously Technical Fellow at StepFun and Senior Principal Researcher at Microsoft Research Asia.

Weiqiang Wang is a PhD student working on multimedia and multimodal AI. Public profiles connect him to the Qwen2-Audio technical report and related research.

Research assistant at CUHK-Shenzhen focused on multimodal learning, efficient adaptation, alignment, and reinforcement learning; coauthor of Qwen2-Audio.

Zhengyuan Liu is a research scientist at Alibaba Group and a PhD student at the National University of Singapore. His public profile highlights work in natural language processing, vision-language models, and grounding.

Research scientist at Alibaba working on speech processing, multimodal learning, natural language processing, and efficient human-computer interaction.

Yushi Hu is a senior research engineer at Shanghai AI Laboratory and a founding member of OpenMMLab. Public arXiv records also list him as a coauthor of Qwen2-Audio.

Jiaqi Wang works on machine learning, multimodal large language models, and AI for healthcare. Public profiles connect him to the Qwen2-Audio technical report.

Associate professor at the University of Virginia and Qwen contributor whose research focuses on personalization and recommender systems, online advertising, and AI systems.

Chao Zhang is an applied scientist in the Alibaba Foundation Model team. His public profile notes a PhD in computer science from the University of Illinois Urbana-Champaign and research interests in NLP, large language models, reasoning, and multimodal generation.

PhD student at The Chinese University of Hong Kong focused on speech language understanding, audio-language multimodal learning, and efficient model adaptation; coauthor of Qwen2-Audio.

Jie Tang

Shijie Wang

An Yang

Jingren Zhou

Tianyu Liu

Yuanzhi Zhu

Qingyang Zhang

Zhen Ye

Yeyun Gong

Yongqi Wang

Mingyang Shang

Yaqi Wang

Shen Gao

Yinghao Li

Nan Duan

Weiqiang Wang

Mengzhe Chen

Zhengyuan Liu

Yongqiang Wang

Yushi Hu

Jiaqi Wang

Hongning Wang

Chao Zhang

Zejun Ma