Large Language Models · 190 people

GLM-5: Thinking, Coding, and Agentic Intelligence

Z.ai

Large Language Models · 2602.15763 · 2026-02-17

Code Language Models · 9 people

CWM: An Open-Weights LLM for Research on Code Generation with World Models

Meta AI

Code Language Models · 2509.12054 · 2025-09-24

Multimodal Language Models · 94 people

Apple Intelligence Foundation Language Models: Tech Report 2025

Apple

Multimodal Language Models · 2507.13575 · 2025-07-16

Reasoning Models · 10 people

Magistral: Efficient Training of Small Language Models for Reasoning

Mistral AI

Reasoning Models · 2506.10910 · 2025-06-12

Speech Language Models · 37 people

Amazon Nova Sonic Technical Report

Amazon

Speech Language Models · 2505.11298 · 2025-05-15

Reasoning Models · 18 people

Phi-4-mini-reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Microsoft

Reasoning Models · 2504.21233 · 2025-04-29

Reasoning Models · 8 people

Phi-4-reasoning Technical Report

Microsoft

Reasoning Models · 2504.21318 · 2025-04-29

Reasoning Models · 11 people

Hunyuan-T1: Scaling Up Test-Time Compute with Open-Source Reinforcement Learning

Tencent Hunyuan

Reasoning Models · 2504.02234 · 2025-04-03

Large Language Models · 157 people

Command A: An Enterprise-Ready Large Language Model

Cohere

Large Language Models · 2504.00698 · 2025-04-01

Large Language Models · 18 people

Mistral Small 3.1 Technical Report

Mistral AI

Large Language Models · 2503.23335 · 2025-03-31

Reasoning Models · 5 people

QwQ-32B: Embracing the Power of Reinforcement Learning

Qwen

Reasoning Models · 2503.20735 · 2025-03-27

Multimodal Models · 10 people

Qwen2.5-Omni Technical Report

Qwen

Multimodal Models · 2503.20215 · 2025-03-23

Language Models · 15 people

Phi-4 Technical Report

Microsoft

Language Models · 2503.01743 · 2025-03-03

Vision-Language Models · 27 people

Qwen2.5-VL Technical Report

Qwen

Vision-Language Models · 2502.13923 · 2025-02-19

Large Language Models · 11 people

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Moonshot AI

Large Language Models · 2501.12599 · 2025-01-21

Large Language Models · 39 people

2 OLMo 2 Furious

Ai2

Large Language Models · 2501.00656 · 2024-12-31

Vision-Language Models · 13 people

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

DeepSeek

Vision-Language Models · 2412.10302 · 2024-12-12

Multimodal Language Models · 35 people

NVLM: Open Frontier-Class Multimodal LLMs

NVIDIA

Multimodal Language Models · 2412.04468 · 2024-12-05

Audio Language Models · 12 people

GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbots

Z.ai

Audio Language Models · 2412.02612 · 2024-12-04

Large Language Models · 23 people

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Ai2

Large Language Models · 2411.15124 · 2024-11-22

Vision-Language Models · 22 people

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

DeepSeek

Vision-Language Models · 2411.07975 · 2024-11-11

Large Language Models · 108 people

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Tencent Hunyuan

Large Language Models · 2411.02265 · 2024-11-04

Vision-Language Models · 22 people

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

DeepSeek

Vision-Language Models · 2410.13848 · 2024-10-18

Vision-Language Models · 18 people

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

Ai2

Vision-Language Models · 2409.17146 · 2024-09-25

Vision-Language Models · 26 people

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Qwen

Vision-Language Models · 2409.12191 · 2024-09-18

Large Language Models · 17 people

OLMoE: Open Mixture-of-Experts Language Models

Ai2

Large Language Models · 2409.02060 · 2024-09-03

Mathematical Reasoning Models · 8 people

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

DeepSeek

Mathematical Reasoning Models · 2408.08152 · 2024-08-14

Multimodal Language Models · 149 people

Apple Intelligence Foundation Language Models

Apple

Multimodal Language Models · 2407.21075 · 2024-07-29

Audio Language Models · 26 people

Qwen2-Audio Technical Report

Qwen

Audio Language Models · 2407.10759 · 2024-07-14

Large Language Models · 19 people

Open Instruct: A Simple Method for Aligning Language Models with Human Preferences

Ai2

Large Language Models · 2406.18405 · 2024-06-26

Code Language Models · 10 people

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

DeepSeek

Code Language Models · 2406.11931 · 2024-06-17

Language Models · 7 people

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Microsoft

Language Models · 2404.14219 · 2024-04-22

Language Models · 8 people

Jamba: A Hybrid Transformer-Mamba Language Model

AI21 Labs

Language Models · 2403.19887 · 2024-03-28

Vision-Language Models · 5 people

DeepSeek-VL: Towards Real-World Vision-Language Understanding

DeepSeek

Vision-Language Models · 2403.05525 · 2024-03-08

Large Language Models · 19 people

Nemotron-4 15B Technical Report

NVIDIA

Large Language Models · 2402.16819 · 2024-02-26

Alignment and Safety · 17 people

Many-shot Jailbreaking

Anthropic

Alignment and Safety · 2402.03206 · 2024-02-12

Speech Language Models · 14 people

SPIrit-LM: Interleaved Spoken and Written Language Model

Meta AI

Speech Language Models · 2402.05755 · 2024-02-09

Mathematical Reasoning Models · 8 people

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

DeepSeek

Mathematical Reasoning Models · 2402.03300 · 2024-02-06

Large Language Models · 14 people

Mixtral of Experts

Mistral AI

Large Language Models · 2401.04088 · 2024-01-08

Large Language Models · 26 people

Tulu 2: Demystifying the Effectiveness of RLHF and Reinforcement Learning with Human Feedback

Ai2

Large Language Models · 2311.10702 · 2023-11-17

Audio Language Models · 8 people

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Qwen

Audio Language Models · 2311.07919 · 2023-11-13

Large Language Models · 18 people

Mistral 7B

Mistral AI

Large Language Models · 2310.06825 · 2023-10-10

Alignment and RLHF · 25 people

Collective Constitutional AI: Aligning a Language Model with Public Input

Anthropic

Alignment and RLHF · 2310.01835 · 2023-10-03

Vision-Language Models · 19 people

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Qwen

Vision-Language Models · 2308.12966 · 2023-08-24

Code Language Models · 7 people

Code Llama: Open Foundation Models for Code

Meta AI

Code Language Models · 2308.12950 · 2023-08-24

Large Language Models · 14 people

LLaMA: Open and Efficient Foundation Language Models

Meta AI

Large Language Models · 2302.13971 · 2023-02-27

Alignment and RLHF · 46 people

Constitutional AI: Harmlessness from AI Feedback

Anthropic

Alignment and RLHF · 2212.08073 · 2022-12-15

Alignment and RLHF · 31 people

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Anthropic

Alignment and RLHF · 2204.05862 · 2022-04-12

Alignment · 11 people

Training language models to follow instructions with human feedback

OpenAI

Alignment · 2203.02155 · 2022-03-04

Large Language Models · 31 people

Language Models are Few-Shot Learners

OpenAI

Large Language Models · 2005.14165 · 2020-05-28

Large Language Models · 19 people

MiniMax-Text-01

MiniMax

Large Language Models · 2501.08338

Vision-Language Models · 18 people

MiniMax-VL-01

MiniMax

Vision-Language Models · 2501.08336

Interpretability · 30 people

Tracing the thoughts of a large language model

Anthropic

Interpretability · 2503.21435

Interpretability · 13 people

On the Biology of a Large Language Model

Anthropic

Interpretability · 2504.19173

Alignment and Safety · 28 people

Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Anthropic

Alignment and Safety · 2601.04603

Alignment and Safety · 13 people

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Anthropic

Alignment and Safety · 2501.18837

Speech Language Models · 100 people

Voxtral Technical Report

Mistral AI

Speech Language Models · 2507.13264

Reasoning Models · 13 people

Nemotron-CrossThink: Efficient Knowledge Distillation of Long Chain-of-Thought Reasoning

NVIDIA

Reasoning Models · 2504.13941

Reasoning Models · 11 people

Nemotron 3 Super: Open, efficient mixture-of-experts hybrid mamba-transformer model for agentic reasoning

NVIDIA

Reasoning Models · 2601.11868

Large Language Models · 6 people

NVIDIA Nemotron 3: Efficient and Open Intelligence

NVIDIA

Large Language Models · 2512.20856

Reasoning Models · 11 people

Nemotron 3 nano: Open, efficient mixture-of-experts hybrid mamba-transformer model for agentic reasoning

NVIDIA

Reasoning Models · 2512.20848

Reasoning Models · 9 people

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

NVIDIA

Reasoning Models · 2508.14444

Large Language Models · 13 people

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

NVIDIA

Large Language Models · 2504.03624

Mathematical Reasoning Models · 6 people

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning and Monte-Carlo Tree Search with Proof Assistant Feedback

DeepSeek

Mathematical Reasoning Models · 2508.03613

Language Models · 8 people

Large Concept Models: Language Modeling in a Sentence Representation Space

Meta AI

Language Models · 2502.06018

Multimodal Models · 16 people

Qwen3-Omni Technical Report

Qwen

Multimodal Models · 2509.17765

Speech Language Models · 10 people

MiniMax-Speech: Intrinsic Zero-Shot Speech Understanding for Advanced Foundation Models

MiniMax

Speech Language Models · 2505.07916

Multimodal Agent Models · 14 people

Magma: A Foundation Model for Multimodal AI Agents

Microsoft

Multimodal Agent Models · 2502.13130

Language Models · 96 people

GLM-4.5: Agentic, Reasoning, and Coding Foundation Models

Z.ai

Language Models · 2508.06471

Reasoning Models · 10 people

GLM-Z1-Rumination: An Open Frontier-Class Reasoning Model Through Test-Time Scaling

Z.ai

Reasoning Models · 2506.17434

Alignment and Safety · 13 people

Auditing language models for hidden objectives

Anthropic

Alignment and Safety · 2507.11473

Alignment and Safety · 20 people

Alignment faking in large language models

Anthropic

Alignment and Safety · 2412.14093

Alignment and Safety · 39 people

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Anthropic

Alignment and Safety · 2401.05566

Large Language Models · 61 people

Amazon Nova Premier Technical Report

Amazon

Large Language Models · 2504.01081

Multimodal Language Models · 14 people

Aya Vision: Advancing the Frontier of Multilingual Multimodality

Cohere

Multimodal Language Models · 2410.14756

Multimodal Language Models · 23 people

MM1.5: Methods, Analysis and Insights from Multimodal LLM Fine-tuning

Apple

Multimodal Language Models · 2409.20566

Multimodal Language Models · 31 people

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Apple

Multimodal Language Models · 2403.09611

Reasoning Models · 16 people

OpenAI o3 and o4-mini System Card

OpenAI

Reasoning Models · 2504.21798

Reasoning Models · 18 people

OpenAI o1 System Card

OpenAI

Reasoning Models · 2412.16720

Large Language Models · 165 people

Nemotron-4 340B Technical Report

NVIDIA

Large Language Models · 2406.11704

Language Models · 12 people

Jamba 1.5 Technical Report

AI21 Labs

Language Models · 2508.15167

Code Language Models · 8 people

Qwen2.5-Coder Technical Report

Qwen

Code Language Models · 2409.12186

Large Language Models · 33 people

OLMo: Accelerating the Science of Language Models

Ai2

Large Language Models · 2402.00838

Multimodal Large Language Models · 21 people

Pixtral 12B

Mistral AI

Multimodal Large Language Models · 2410.17897

Multimodal Large Language Models · 13 people

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

DeepSeek

Multimodal Large Language Models · 2501.17811

Reasoning Large Language Models · 144 people

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

MiniMax

Reasoning Large Language Models · 2506.13585

Multimodal Large Language Models · 26 people

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Meta AI

Multimodal Large Language Models · 2405.09818

Large Language Models · 10 people

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek

Large Language Models · 2501.12948

Multimodal Models · 12 people

GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Z.ai

Multimodal Models · 2507.01006

Large Language Models · 38 people

MiniMax-01: Scaling Foundation Models with Lightning Attention

MiniMax

Large Language Models · 2501.08313

Large Language Models · 69 people

The Llama 3 Herd of Models

Meta AI

Large Language Models · 2407.21783

Large Language Models · 23 people

Llama 2: Open Foundation and Fine-Tuned Chat Models

Meta AI

Large Language Models · 2307.09288

Multimodal Agentic Models · 324 people

Kimi K2.5: Visual Agentic Intelligence

Moonshot AI

Multimodal Agentic Models · 2602.02276

Vision-Language Models · 94 people

Kimi-VL Technical Report

Moonshot AI

Vision-Language Models · 2504.07491

Large Language Models · 86 people

DeepSeek LLM Technical Report

DeepSeek

Large Language Models · 2401.02954

Large Language Models · 156 people

DeepSeek-V2 Technical Report

DeepSeek

Large Language Models · 2405.04434

Large Language Models · 197 people

DeepSeek-V3 Technical Report

DeepSeek

Large Language Models · 2412.19437

Large Language Models · 42 people

Qwen2.5 Technical Report

Qwen

Large Language Models · 2412.15115

Large Language Models · 60 people

Qwen3 Technical Report

Qwen

Large Language Models · 2505.09388

Large Language Models · 48 people

Qwen Technical Report

Qwen

Large Language Models · 2309.16609

Large Language Models · 280 people

GPT-4 Technical Report

OpenAI

Large Language Models · 2303.08774
