Audio Language Models | Field

Chao Zhang is an applied scientist in the Alibaba Foundation Model team. His public profile notes a PhD in computer science from the University of Illinois Urbana-Champaign and research interests in NLP, large language models, reasoning, and multimodal generation.

Zhengyuan Liu is a research scientist at Alibaba Group and a PhD student at the National University of Singapore. His public profile highlights work in natural language processing, vision-language models, and grounding.

Tianyu Liu is a researcher at Kimi working on coding and agents. He previously worked on Qwen at Alibaba and was a founding member of Tencent Hunyuan, and he earned a PhD in natural language processing from Peking University.

Zhen Ye is a researcher in the Qwen team at Alibaba Cloud. His public profile notes a PhD in computer science from the University of Massachusetts Amherst and research interests in natural language understanding, generation, and reasoning.

Jingren Zhou is Chief Technology Officer of Alibaba Cloud. Public speaker biographies describe him as a computer scientist and entrepreneur whose work includes large-scale AI and cloud systems.

Vice President of JD.COM and Deputy Director of JD Future Academy, leading foundation model research across language, audio, vision, and embodied AI; previously Technical Fellow at StepFun and Senior Principal Researcher at Microsoft Research Asia.

Research scientist at Alibaba working on speech processing, multimodal learning, natural language processing, and efficient human-computer interaction.

Jiaqi Wang works on machine learning, multimodal large language models, and AI for healthcare. Public profiles connect him to the Qwen2-Audio technical report.

Shen Gao is a PhD student at Zhejiang University working on multimedia and large language models. His public profiles connect him to Qwen2-Audio and related multimodal systems including OmniParser.

Weiqiang Wang is a PhD student working on multimedia and multimodal AI. Public profiles connect him to the Qwen2-Audio technical report and related research.

Yeyun Gong is a researcher and engineering leader focused on multimodal large language models, grounding, and large-scale knowledge systems. His homepage lists selected work including Qwen2-Audio.

Alibaba researcher working on large language models and multimodal pretraining; public research profiles connect An Yang to Qwen-related work and earlier study at Peking University.

Zhifeng Chen's public homepage describes him as a distinguished software engineer at Google Brain focused on large-scale computer systems and machine learning applications.

Computer scientist and engineer credited on OpenAI's GPT-4 public contributions page; OpenAI's 2016 team update says he previously led Dropbox's core file sync team after earlier work in Pieter Abbeel's Berkeley robotics lab.

Associate professor at the University of Virginia and Qwen contributor whose research focuses on personalization and recommender systems, online advertising, and AI systems.

Xian-Sheng Hua is a computer vision and multimodal AI researcher known for work in visual recognition, multimedia understanding, and large AI systems. Public profiles tie him to Alibaba DAMO Academy and related academic service roles.

Xiaoyong Du works on multimodal large language models and language agents, with public profile text highlighting omni models, visual agents, and GUI agents. His homepage explicitly identifies him with Qwen.

Yuanzhi Zhu is a Qwen researcher whose public work includes multimodal and audio-language models.

Senior research scientist in Tongyi Lab whose official profile highlights post-training, AI for science, evaluation and alignment, multimodal reasoning, and large language model reasoning.

Researcher at StepFun and coauthor of "Step-Audio 2: Cascaded Multimodal Large Language Models with Versatile Speech Capabilities."

Research scientist in Tongyi Lab whose public profile highlights work on speech processing, machine learning, and multimodal large language models.

Research intern at Alibaba Group focused on multimodal understanding and generation, large multimodal models, and reinforcement learning; coauthor of Qwen2-Audio.

Second-year PhD student at Peking University focused on audio-language foundation models, trustworthy AI, and embodied AI; coauthor of Qwen2-Audio.

Research scientist in Tongyi Lab and technical lead of Qwen2-Audio, with public work on audio-language models.

Machine learning engineer and researcher interested in large language models and multimodal audio-language systems; coauthor of Qwen2-Audio.

Yushi Hu is a senior research engineer at Shanghai AI Laboratory and a founding member of OpenMMLab. Public arXiv records also list him as a coauthor of Qwen2-Audio.

Public report authorship links Dongping Wei to the Z.ai report "GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbots."

Public report authorship links Jingyun Yang to the Z.ai report "GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbots."

Public report authorship links Peiyi Wang to the Z.ai report "GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbots."

Researcher whose arXiv author listing includes Qwen-Audio and related audio-language modeling work.

Research assistant at CUHK-Shenzhen focused on multimodal learning, efficient adaptation, alignment, and reinforcement learning; coauthor of Qwen2-Audio.

Research scientist at Z.ai focused on multimodal large language models, speech interaction, and large language models. He received a bachelor's degree from Tsinghua University and a master's degree from Columbia University.

Research scientist at Z.ai focused on multimodal large language models, speech interaction, and large language models. Her work includes pre-training, post-training, and evaluation of multimodal and speech models.

Research scientist at Z.ai focused on multimodal large language models, speech interaction, and large language models. She works on pre-training, post-training, and evaluation of multimodal and speech models.

Research scientist at Z.ai focused on multimodal large language models, speech interaction, and large language models. He received a bachelor's degree from Shanghai Jiao Tong University and a master's degree from Columbia University.

Research scientist at Z.ai focused on multimodal understanding and generation, large language models, and speech interaction. He received a bachelor's degree from Tsinghua University and a master's degree from Columbia University.

Researcher at StepFun AI working on speech, language, and multimodal learning, including Step-Audio 2.

Research scientist at Z.ai focused on multimodal large language models, speech interaction, and large language models. His work includes pre-training, post-training, and evaluation of multimodal and speech models.

Research scientist at Z.ai focused on multimodal understanding and generation, large language models, and speech interaction. He received a bachelor's degree from Tsinghua University and a master's degree from the University of California, San Diego.

PhD student at The Chinese University of Hong Kong focused on speech language understanding, audio-language multimodal learning, and efficient model adaptation; coauthor of Qwen2-Audio.

Chao Zhang

Zhengyuan Liu

Tianyu Liu

Zhen Ye

Jingren Zhou

Nan Duan

Yongqiang Wang

Jiaqi Wang

Shen Gao

Weiqiang Wang

Yeyun Gong

An Yang

Zhifeng Chen

Jie Tang

Hongning Wang

Xian-Sheng Hua

Xiaoyong Du

Yuanzhi Zhu

Shijie Wang

Can Cui

Huan Yang

Hui Yu

Jiale Zhuang

Ruiqi Song

Siyao Wang

Xiao Ma

Yu Guo

Zeyi Yan

Yongqi Wang

Mingyang Shang

Qingyang Zhang

Yaqi Wang

Yinghao Li

Yushi Hu

Dongping Wei

Jingyun Yang

Peiyi Wang

Hongyin Luo

Mengzhe Chen

Mingjie Li

Na Cao

Shuang Ma

Yi Ma

Yimin Wang

Yizhou Zou

Yujie He

Zehan Wang

Zejun Ma