
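Model Arena

AI Model Intelligence Hub

Model Rankings and Selection Dashboard

This page brings together international Arena leaderboards, category rankings of Chinese models, API pricing and a release timeline. Use it to select models quickly, watch competitors and track technical developments.

Data updated: 2026/5/19 13:58:41 · International Arena source: LMArena Leaderboard · Timeline reference: AI Flash Report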

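3 international Arena leaderboards: Text / Code / Vision
6 Chinese model categories, from general-purpose to multimodal
20+ API price samples for cost comparison
40 release timeline entries tracking model evolution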

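International Arena Leaderboards

These leaderboards follow the mainstream international evaluation methodology. They show where the current leading models stand overall in conversation, coding and visual understanding.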

Text (overall conversation) Top 1: claude-opus-4-6-thinking · Anthropic · Rating 1501.8
Code Top 1: claude-opus-4-7-thinking · Anthropic · Rating 1567.4
Vision Top 1: claude-opus-4-7-thinking · Anthropic · Rating 1306.3

Code / Web Development

Rank Model Rating Organization Votes
1 claude-opus-4-7-thinking 1567.4 Anthropic 4,176
2 claude-opus-4-7 1559.3 Anthropic 3,959
3 claude-opus-4-6-thinking 1545.6 Anthropic 7,021
4 claude-opus-4-6 1540.8 Anthropic 8,022
5 glm-5.1 1531.9 Z.ai 3,613
6 claude-sonnet-4-6 1523.6 Anthropic 10,155
7 kimi-k2.6 1518.5 Moonshot 3,230
8 muse-spark 1508.7 Meta 1,630
9 gpt-5.5-xhigh (codex-harness) 1500.6 OpenAI 3,220
10 qwen3.6-max-preview 1491.3 Alibaba 1,900
11 claude-opus-4-5-20251101-thinking-32k 1490.5 Anthropic 13,066
12 gpt-5.5-high (codex-harness) 1481.3 OpenAI 3,403
13 mimo-v2.5-pro 1472.3 Xiaomi 3,879
14 claude-opus-4-5-20251101 1467.0 Anthropic 15,306
15 qwen3.6-plus 1460.4 Alibaba 5,140
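A rating gap is easier to interpret as a head-to-head preference rate. The minimal Python sketch below converts one into the other. It assumes the Rating column uses the conventional Elo scale (base 10, 400 points), which LMArena uses to present its Bradley-Terry scores. Under that assumption, the logistic formula gives the expected share of votes one model wins against another. The function name `expected_win_rate` is hypothetical.

```python
def expected_win_rate(rating_a: float, rating_b: float,
                      scale: float = 400.0, base: float = 10.0) -> float:
    """Expected probability that model A is preferred over model B,
    assuming an Elo-style logistic scale (base 10, 400 points)."""
    return 1.0 / (1.0 + base ** ((rating_b - rating_a) / scale))


# Ratings copied from the Code table above.
code_ratings = {
    "claude-opus-4-7-thinking": 1567.4,
    "claude-opus-4-7": 1559.3,
    "glm-5.1": 1531.9,
    "qwen3.6-plus": 1460.4,
}

top = "claude-opus-4-7-thinking"
for other, rating in code_ratings.items():
    if other == top:
        continue  # skip comparing the top model with itself
    gap = code_ratings[top] - rating
    p = expected_win_rate(code_ratings[top], rating)
    print(f"{top} vs {other}: gap {gap:.1f}, expected win rate {p:.3f}")
```

Under this assumption, the 8.1-point gap between the top two Code entries corresponds to only about a 51% expected preference. Adjacent ranks can therefore be close to a tie. The vote counts show how much data stands behind each rating.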

Visual Understanding

Rank Model Rating Organization Votes
1 claude-opus-4-7-thinking 1306.3 Anthropic 6,531
2 claude-opus-4-7 1303.8 Anthropic 6,779
3 claude-opus-4-6-thinking 1299.8 Anthropic 7,135
4 muse-spark 1295.9 Meta 4,526
5 claude-opus-4-6 1292.9 Anthropic 8,549
6 gemini-3-pro 1289.0 Google 13,226
7 gpt-5.5 1287.6 OpenAI 4,662
8 gpt-5.2-chat-latest-20260210 1280.2 OpenAI 12,478
9 gpt-5.5-high 1278.5 OpenAI 4,238
10 gemini-3.1-pro-preview 1277.5 Google 16,807
11 gpt-5.4-high 1276.5 OpenAI 5,884
12 claude-sonnet-4-6 1275.3 Anthropic 8,916
13 gpt-5.5-instant 1274.6 OpenAI 3,645
14 gemini-3-flash 1271.0 Google 21,195
15 gpt-5.4 1269.4 OpenAI 5,580

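Chinese Model Rankings by Category

These category rankings are compiled in-house. They combine public leaderboards, product capability, ecosystem adoption and recent release cadence. Use them as a selection reference; they are not results from a single unified benchmark.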


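General

Overall conversation, reasoning, long context and product maturity

Top 5
#1 DeepSeek V3.2 · DeepSeek · Score 98 · Recommended
Strengths: general conversation, reasoning, long context, cost-effectiveness
Highlights: MMLU 90.1%, HumanEval 92.5%, 1M+ context
#2 GLM-5 · Zhipu AI · Score 96
Strengths: overall intelligence, low hallucination rate, support for Chinese domestic compute hardware
Highlights: HLE 50.4%, hallucination rate 1.2%
#3 Kimi K2 · Moonshot AI · Score 95
Strengths: long-document understanding, Chinese-language experience, influence as an open-weight model
Highlights: #1 open-weight model on LMSYS Arena, 1.04T parameters
#4 Doubao Seed 2.0 · ByteDance · Score 93
Strengths: productization, agent scenarios, multimodal integration
Highlights: strong multimodal capability; well suited to productization within the ByteDance ecosystem
#5 MiniMax M2.7 · MiniMax · Score 91
Strengths: balanced general capability, generative interaction, multimodal coordination
Highlights: Arena Code 1445; high overall product maturity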

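Code

Coding, agents, tool calling and engineering deployment

Top 5
#1 GLM-5 · Zhipu AI · Score 97 · Recommended
Strengths: production code generation, agent orchestration, Chinese-language developer support
Highlights: top Chinese model on Arena Code, HLE 50.4%
#2 MiniMax M2.7 · MiniMax · Score 95
Strengths: code completion, breaking down complex tasks, tool calling
Highlights: Arena Code 1445
#3 GLM-4.7 · Zhipu AI · Score 93
Strengths: stable coding, function calling, engineering Q&A
Highlights: Arena Code 1439.1
#4 DeepSeek Coder V3 · DeepSeek · Score 92
Strengths: code generation, refactoring, wide recognition in the open-source ecosystem
Highlights: strong HumanEval and repo-level coding performance
#5 Kimi K2 · Moonshot AI · Score 90
Strengths: long-context code understanding, going from documentation to implementation
Highlights: outstanding understanding of large codebases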

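TTS

Naturalness of synthesized speech, emotional expressiveness and commercial maturity

Top 5
#1 MiniMax Speech-02 · MiniMax · Score 96 · Recommended
Strengths: high naturalness, emotional expression, mature commercial deployment
Highlights: leading Chinese naturalness and character-voice performance
#2 CosyVoice 2 · FunAudioLLM / Alibaba ecosystem · Score 94
Strengths: open source and controllable, zero-shot voice cloning, good Chinese quality
Highlights: a representative open-source Chinese TTS solution
#3 Step-Audio TTS · StepFun · Score 92
Strengths: conversational speech, human-like expression, good end-to-end experience
Highlights: strong voice-interaction experience
#4 Doubao Voice · ByteDance · Score 90
Strengths: strong product integration, low latency, suited to companionship and content scenarios
Highlights: strong large-scale deployment capability
#5 Tencent Cloud TTS · Tencent · Score 88
Strengths: high stability, mature enterprise services, rich voice library
Highlights: mature enterprise integration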

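ASR

Speech-recognition accuracy, real-time performance and industry fit

Top 5
#1 Paraformer Large · FunAudioLLM / ModelScope · Score 96 · Recommended
Strengths: high Chinese recognition accuracy; mature in both streaming and non-streaming modes
Highlights: the open-source reference point for Chinese ASR
#2 SenseVoice · FunAudioLLM · Score 95
Strengths: multilingual recognition, emotion and audio-event understanding, good real-time performance
Highlights: combines ASR and speech understanding in one model
#3 Tencent Cloud ASR · Tencent · Score 91
Strengths: stable, mature engineering integration, many industry solutions
Highlights: widely deployed in enterprises
#4 iFLYTEK Spark ASR · iFLYTEK · Score 90
Strengths: deep experience in Chinese speech recognition, strong industry-vocabulary support
Highlights: strong in government, enterprise and education scenarios
#5 Baidu Speech ASR · Baidu · Score 88
Strengths: stable cloud service, mature Mandarin recognition, low barrier to integration
Highlights: broad coverage of general cloud speech scenarios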

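Video Generation

Shot stability, motion quality and consistency

Top 5
#1 Seedance 2.0 · ByteDance · Score 97 · Recommended
Strengths: synchronized audio-video generation, shot consistency, strong commercialization
Highlights: audio and video generated together in a single pass, 8+ languages
#2 Kling 2.0 · Kuaishou · Score 95
Strengths: range of motion, cinematic language, detailed character movement
Highlights: a leading Chinese video-generation product
#3 Vidu Q1 · ShengShu AI · Score 93
Strengths: narrative coherence, style control, handles Chinese prompts well
Highlights: consistently positive creator-community feedback
#4 Wan 2.1 · Alibaba · Score 91
Strengths: open-source approach, controllable generation, strong ecosystem integration
Highlights: one of the representative open-source video models
#5 Hailuo Video · MiniMax · Score 89
Strengths: character performance, short-video generation, complete product experience
Highlights: excellent consumer-grade generation experience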

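Image Generation

Understanding of Chinese prompts, aesthetic quality and controllability

Top 5
#1 FLUX China / ecosystem of domestically optimized variants · Open ecosystem · Score 94 · Recommended
Strengths: adapted to Chinese prompts, photorealistic texture, active community
Highlights: widely adopted by Chinese creators
#2 Kolors · Kuaishou · Score 93
Strengths: strong Chinese semantic understanding, good at posters and portraits
Highlights: a representative Chinese text-to-image model
#3 Tongyi Wanxiang · Alibaba · Score 92
Strengths: fits e-commerce and design scenarios, strong enterprise services
Highlights: mature enterprise image-generation deployment
#4 Doubao Image · ByteDance · Score 90
Strengths: social-media content generation, stylization, low learning curve
Highlights: growing fast in content-creation scenarios
#5 Ernie Image · Baidu · Score 88
Strengths: enterprise integration, general image generation, stable Chinese understanding
Highlights: complete cloud product lineup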

API Pricing Comparison

All prices are in USD per million tokens, so you can compare input cost, output cost and context window at a glance.

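How to filter: start with input cost, then check the output-to-input price ratio and the context window. Looking only at the unit price can hide the real cost of a task.
Model Provider Input ($/1M tokens) Output ($/1M tokens) Context Max output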
GPT-4.1 Nano OpenAI $0.10 $0.40 1M 16K
Qwen 3.5 Flash Alibaba $0.10 $0.40 1M 8K
Gemini 2.5 Flash Google $0.15 $0.60 1M 64K
Llama 4 Maverick Meta (via API) $0.20 $0.60 1M 16K
GPT-5 Mini OpenAI $0.25 $2.00 128K 16K
DeepSeek V3.2 DeepSeek $0.25 $0.40 128K 16K
Grok 3 Mini xAI $0.30 $0.50 128K 16K
GPT-4.1 Mini OpenAI $0.40 $1.60 1M 16K
DeepSeek R1 DeepSeek $0.55 $2.19 128K 16K
Claude Haiku 3.5 Anthropic $0.80 $4.00 200K 8K
o4-mini OpenAI $1.10 $4.40 200K 100K
GPT-5 OpenAI $1.25 $10.00 128K 16K
Gemini 3.1 Pro Google $1.25 $10.00 1M 64K
Gemini 2.5 Pro Google $1.25 $10.00 1M 64K
GPT-5.2 OpenAI $1.75 $14.00 400K 32K
GPT-4.1 OpenAI $2.00 $8.00 1M 32K
o3 OpenAI $2.00 $8.00 200K 100K
Claude Sonnet 4.6 Anthropic $3.00 $15.00 200K 64K
Grok 3 xAI $3.00 $15.00 128K 16K
Claude Opus 4.6 Anthropic $5.00 $25.00 200K 32K
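The filtering advice above can be turned into a per-request cost estimate. The Python sketch below is illustrative only and rests on these assumptions:
- `ApiPrice`, `task_cost`, `fits` and `rank_for_task` are hypothetical names.
- It copies four rows from the table and reads "1M" as 1,000,000 tokens and "128K" as 128,000 tokens.
- It assumes the context window must hold both input and output tokens; check each provider's documentation to confirm.

```python
from dataclasses import dataclass


@dataclass
class ApiPrice:
    model: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens
    context: int         # context window in tokens (assumed to include output)
    max_output: int      # maximum output tokens per request


# Four rows copied from the pricing table above.
PRICES = [
    ApiPrice("GPT-5 Mini", 0.25, 2.00, 128_000, 16_000),
    ApiPrice("DeepSeek V3.2", 0.25, 0.40, 128_000, 16_000),
    ApiPrice("Gemini 2.5 Flash", 0.15, 0.60, 1_000_000, 64_000),
    ApiPrice("Claude Sonnet 4.6", 3.00, 15.00, 200_000, 64_000),
]


def task_cost(p: ApiPrice, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request with the given token counts."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


def fits(p: ApiPrice, input_tokens: int, output_tokens: int) -> bool:
    """True if the request respects both the max-output limit and the context window."""
    return output_tokens <= p.max_output and input_tokens + output_tokens <= p.context


def rank_for_task(prices, input_tokens: int, output_tokens: int):
    """Models that can serve the request, cheapest first, as (cost, ApiPrice) pairs."""
    candidates = [p for p in prices if fits(p, input_tokens, output_tokens)]
    return sorted(
        ((task_cost(p, input_tokens, output_tokens), p) for p in candidates),
        key=lambda pair: pair[0],
    )


for cost, p in rank_for_task(PRICES, 20_000, 4_000):
    ratio = p.output_per_m / p.input_per_m
    print(f"{p.model}: ${cost:.4f} per request, output/input price ratio {ratio:.1f}x")
```

Take a request with 20,000 input tokens and 4,000 output tokens. GPT-5 Mini and DeepSeek V3.2 share the same $0.25 input price. However, GPT-5 Mini's 8x output ratio makes the request about twice as expensive ($0.0130 vs $0.0066). Raising the input to 150,000 tokens filters out both 128K-context models.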

Model Release Timeline

Review how often models have been iterated over time to judge where each vendor is currently placing its bets.

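What to watch: note how densely each vendor ships releases in code, reasoning, multimodal and long context. Release density reveals product roadmaps more quickly.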
2026-05-11
MiniCPM-V 4.6 1.3B · OpenBMB · LLM
AA Intelligence Index 12.7, 262K-token context. GPQA Diamond 30.5%, HLE 4.9%
2026-04-30
Grok 4.3 · xAI · LLM
AA Intelligence Index 53.2, 1M-token context, reasoning model. GPQA Diamond 90.1%, HLE 35.0%
2026-04-29
Granite 4.1 30B · IBM · LLM
AA Intelligence Index 14.7, 131K-token context. GPQA Diamond 48.1%, HLE 4.2%
2026-04-29
Granite 4.1 3B · IBM · LLM
AA Intelligence Index 8.5, 131K-token context. GPQA Diamond 31.4%, HLE 3.4%
2026-04-29
Granite 4.1 8B · IBM · LLM
AA Intelligence Index 12.4, 131K-token context. GPQA Diamond 43.3%, HLE 3.8%
2026-04-29
Mistral Medium 3.5 · Mistral · LLM
AA Intelligence Index 39.2, 256K-token context, reasoning model. GPQA Diamond 74.8%, HLE 12.8%
2026-04-29
Nemotron 3 Nano Omni 30B A3B Reasoning · NVIDIA · LLM
AA Intelligence Index 21.4, 256K-token context, reasoning model. GPQA Diamond 46.9%, HLE 5.3%
2026-04-24
DeepSeek V4 Flash · DeepSeek · LLM
AA Intelligence Index 46.5, 1M-token context, reasoning model. GPQA Diamond 89.4%, HLE 32.1%
2026-04-24
DeepSeek V4 Pro · DeepSeek · LLM
AA Intelligence Index 51.5, 1M-token context, reasoning model. GPQA Diamond 88.8%, HLE 35.9%
2026-04-23
Ling-2.6-1T · InclusionAI · LLM
AA Intelligence Index 33.6, 262K-token context. GPQA Diamond 75.2%, HLE 8.2%
2026-04-23
GPT-5.5 · OpenAI · LLM
AA Intelligence Index 60.2, 922K-token context, reasoning model. GPQA Diamond 93.5%, HLE 44.3%
2026-04-23
Hy3-preview · Tencent · LLM
AA Intelligence Index 41.9, 256K-token context, reasoning model. GPQA Diamond 86.7%, HLE 25.5%
2026-04-22
Qwen3.6 27B · Alibaba · LLM
AA Intelligence Index 45.8, 262K-token context, reasoning model. GPQA Diamond 84.2%, HLE 21.6%
2026-04-22
MiMo-V2.5 · Xiaomi · LLM
AA Intelligence Index 49.0, 1M-token context, reasoning model. GPQA Diamond 84.9%, HLE 25.2%
2026-04-22
MiMo-V2.5-Pro · Xiaomi · LLM
AA Intelligence Index 53.8, 1M-token context, reasoning model. GPQA Diamond 86.6%, HLE 33.8%
2026-04-21
Ling 2.6 Flash · InclusionAI · LLM
AA Intelligence Index 26.2, 262K-token context. GPQA Diamond 59.3%, HLE 6.2%
2026-04-20
Qwen3.6 Max Preview · Alibaba · LLM
AA Intelligence Index 51.8, 256K-token context, reasoning model. GPQA Diamond 88.8%, HLE 28.9%
2026-04-20
Kimi K2.6 · Kimi · LLM
AA Intelligence Index 53.9, 256K-token context, reasoning model. GPQA Diamond 91.1%, HLE 35.9%
2026-04-16
Qwen3.6 35B A3B · Alibaba · LLM
AA Intelligence Index 43.5, 262K-token context, reasoning model. GPQA Diamond 84.1%, HLE 20.2%
2026-04-16
Claude Opus 4.7 · Anthropic · LLM
AA Intelligence Index 57.3, 1M-token context, reasoning model. GPQA Diamond 91.4%, HLE 39.6%
2026-04-09
EXAONE 4.5 33B · LG AI Research · LLM
AA Intelligence Index 30.2, 262K-token context, reasoning model. GPQA Diamond 79.4%, HLE 11.6%
2026-04-08
Muse Spark · Meta · LLM
AA Intelligence Index 52.1, 262K-token context, reasoning model. GPQA Diamond 88.4%, HLE 39.9%
2026-04-07
GLM-5.1 · Z AI · LLM
AA Intelligence Index 51.4, 200K-token context, reasoning model. GPQA Diamond 86.8%, HLE 28.0%
2026-04-07
Grok 4.20 0309 v2 · xAI · LLM
AA Intelligence Index 49.3, 2M-token context, reasoning model. GPQA Diamond 91.1%, HLE 32.2%
2026-04-06
Solar Pro 3 · Upstage · LLM
AA Intelligence Index 25.9, 128K-token context, reasoning model. GPQA Diamond 72.4%, HLE 10.1%
2026-04-03
Gemma 4 E4B · Google · LLM
AA Intelligence Index 18.8, 128K-token context, reasoning model. GPQA Diamond 57.6%, HLE 3.7%
2026-04-02
Qwen3.6 Plus · Alibaba · LLM
AA Intelligence Index 50.0, 1M-token context, reasoning model. GPQA Diamond 88.2%, HLE 25.7%
2026-04-02
Gemma 4 26B A4B · Google · LLM
AA Intelligence Index 31.2, 256K-token context, reasoning model. GPQA Diamond 79.2%, HLE 18.3%
2026-04-02
Gemma 4 31B · Google · LLM
AA Intelligence Index 39.2, 256K-token context, reasoning model. GPQA Diamond 85.7%, HLE 22.7%
2026-04-02
Gemma 4 E2B · Google · LLM
AA Intelligence Index 15.2, 128K-token context, reasoning model. GPQA Diamond 43.3%, HLE 4.8%
2026-04-02
Step 3.5 Flash 2603 · StepFun · LLM
AA Intelligence Index 38.5, 256K-token context, reasoning model. GPQA Diamond 82.6%, HLE 22.6%
2026-04-01
Trinity Large Thinking · Arcee AI · LLM
AA Intelligence Index 31.9, 512K-token context, reasoning model. GPQA Diamond 75.2%, HLE 14.7%
2026-04-01
GLM 5V Turbo · Z AI · LLM
AA Intelligence Index 42.9, 200K-token context, reasoning model. GPQA Diamond 80.9%, HLE 15.8%
2026-03-30
Qwen3.5 Omni Flash · Alibaba · LLM
AA Intelligence Index 25.9, 256K-token context. GPQA Diamond 74.2%, HLE 7.1%
2026-03-30
Qwen3.5 Omni Plus · Alibaba · LLM
AA Intelligence Index 38.6, 256K-token context. GPQA Diamond 82.6%, HLE 13.9%
2026-03-27
MiMo-V2-Omni-0327 · Xiaomi · LLM
AA Intelligence Index 44.9, 256K-token context, reasoning model. GPQA Diamond 85.5%, HLE 20.4%
2026-03-19
Nemotron Cascade 2 30B A3B · NVIDIA · LLM
AA Intelligence Index 28.4, 1M-token context, reasoning model. GPQA Diamond 75.8%, HLE 11.4%
2026-03-19
MiMo-V2-Omni · Xiaomi · LLM
AA Intelligence Index 43.4, 256K-token context, reasoning model. GPQA Diamond 82.8%, HLE 19.9%
2026-03-18
MiniMax-M2.7 · MiniMax · LLM
AA Intelligence Index 49.6, 204K-token context, reasoning model. GPQA Diamond 87.4%, HLE 28.1%
2026-03-18
MiMo-V2-Pro · Xiaomi · LLM
AA Intelligence Index 49.2, 1M-token context, reasoning model. GPQA Diamond 87.0%, HLE 28.3%
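Release density can be measured by counting timeline entries per vendor per month. A minimal Python sketch follows. `RELEASES` is hand-copied from every Alibaba, Xiaomi, DeepSeek and OpenBMB entry in the timeline; it is not the full list. `releases_per_vendor_month` is a hypothetical helper name.

```python
from collections import Counter

# (date, model, vendor) tuples copied from the timeline above (four vendors only).
RELEASES = [
    ("2026-05-11", "MiniCPM-V 4.6 1.3B", "OpenBMB"),
    ("2026-04-24", "DeepSeek V4 Flash", "DeepSeek"),
    ("2026-04-24", "DeepSeek V4 Pro", "DeepSeek"),
    ("2026-04-22", "Qwen3.6 27B", "Alibaba"),
    ("2026-04-22", "MiMo-V2.5", "Xiaomi"),
    ("2026-04-22", "MiMo-V2.5-Pro", "Xiaomi"),
    ("2026-04-20", "Qwen3.6 Max Preview", "Alibaba"),
    ("2026-04-16", "Qwen3.6 35B A3B", "Alibaba"),
    ("2026-04-02", "Qwen3.6 Plus", "Alibaba"),
    ("2026-03-30", "Qwen3.5 Omni Flash", "Alibaba"),
    ("2026-03-30", "Qwen3.5 Omni Plus", "Alibaba"),
    ("2026-03-27", "MiMo-V2-Omni-0327", "Xiaomi"),
    ("2026-03-19", "MiMo-V2-Omni", "Xiaomi"),
    ("2026-03-18", "MiMo-V2-Pro", "Xiaomi"),
]


def releases_per_vendor_month(releases):
    """Count releases keyed by (vendor, 'YYYY-MM')."""
    return Counter((vendor, date[:7]) for date, _model, vendor in releases)


density = releases_per_vendor_month(RELEASES)
# Busiest vendor-months first; ties broken alphabetically by vendor, then month.
for (vendor, month), n in sorted(density.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{month} {vendor}: {n}")
```

On this subset, Alibaba shipped four models in April 2026 and Xiaomi three in March 2026. Extending `RELEASES` to the full timeline gives the same view for every vendor.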