
Model Arena

AI Model Intelligence Hub

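Model Rankings and Selection Dashboard

The international Arena leaderboards, domestic (Chinese) model categories, API pricing, and a release timeline on a single page, so you can quickly select models, watch competitors, and track technical developments.

Data updated: 2026/4/13 12:51:32 · International Arena source: LMArena Leaderboard · Timeline reference: AI Flash Report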

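3
International Arena leaderboards
Text / Code / Vision
6
Domestic model categories
From general-purpose to multimodal
20+
API price samples
For cost comparison
40
Release timeline entries
Tracking model evolution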

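International Arena Leaderboards

These keep the mainstream international evaluation methodology and show where today's leading models stand overall in chat, code, and visual understanding.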

Overall chat Top 1
claude-opus-4-6-thinking
Anthropic · Rating 1503.8
Code Top 1
claude-opus-4-6-thinking
Anthropic · Rating 1548.4
Vision Top 1
claude-opus-4-6-thinking
Anthropic · Rating 1302.5

Code / Web Development

Code
# Model Rating Organization Votes
1 claude-opus-4-6-thinking 1548.4 Anthropic 4,015
2 claude-opus-4-6 1542.2 Anthropic 4,841
3 glm-5.1 1529.9 Z.ai 1,046
4 claude-sonnet-4-6 1521.3 Anthropic 6,979
5 claude-opus-4-5-20251101-thinking-32k 1490.2 Anthropic 13,065
6 claude-opus-4-5-20251101 1465.9 Anthropic 14,517
7 gpt-5.4-high (codex-harness) 1456.6 OpenAI 1,485
8 gemini-3.1-pro-preview 1456.3 Google 5,819
9 qwen3.6-plus-preview 1453.3 Alibaba 2,112
10 glm-4.7 1439.4 Z.ai 4,878
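
To get a feel for what the rating gaps in this table mean, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the ratings sit on the conventional Elo-style logistic scale (base 10, 400-point scale); this page does not state the formula, so treat that as an assumption. The function name `expected_win_rate` is hypothetical.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected share of votes preferring A over B on an Elo-style scale.

    Assumption: base-10 logistic curve with a 400-point scale factor.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings copied from the Code table above.
top = 1548.4        # claude-opus-4-6-thinking (rank 1)
runner_up = 1542.2  # claude-opus-4-6 (rank 2)
tenth = 1439.4      # glm-4.7 (rank 10)

print(f"#1 vs #2:  {expected_win_rate(top, runner_up):.3f}")
print(f"#1 vs #10: {expected_win_rate(top, tenth):.3f}")
```

Under that assumption, a gap of a few points is close to a coin flip, while the roughly 109-point gap between rank 1 and rank 10 corresponds to a preference rate of about 65%. Vote counts still matter: a rating backed by about 1,000 votes is less certain than one backed by more than 10,000.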

Visual Understanding

Vision
# Model Rating Organization Votes
1 claude-opus-4-6-thinking 1302.5 Anthropic 2,388
2 muse-spark 1292.8 Meta 1,210
3 claude-opus-4-6 1288.7 Anthropic 2,525
4 gemini-3-pro 1287.8 Google 13,390
5 gemini-3.1-pro-preview 1278.4 Google 10,116
6 gpt-5.2-chat-latest-20260210 1277.5 OpenAI 7,001
7 claude-sonnet-4-6 1274.2 Anthropic 2,836
8 gemini-3-flash 1267.9 Google 16,184
9 gemini-3-flash (thinking-minimal) 1257.5 Google 14,242
10 dola-seed-2.0-preview 1257.3 Bytedance 4,640

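Domestic Model Category Rankings

The domestic category rankings are compiled by this site. They weigh public leaderboards, product capability, ecosystem adoption, and recent release cadence; use them as a selection reference, not as a single unified benchmark.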


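General

Overall chat, reasoning, long context, and product maturity

Top 5
#1
DeepSeek V3.2 DeepSeek
General chat, reasoning, long context, cost-effectiveness
MMLU 90.1%, HumanEval 92.5%, 1M+ context
98
Recommended
#2
GLM-5 Zhipu AI
Overall intelligence, low hallucination rate, adaptation to domestic compute
HLE 50.4%, Hallucination Rate 1.2%
96
#3
Kimi K2 Moonshot AI
Long-document understanding, Chinese-language experience, open-weight influence
LMSYS Arena #1 open-weight, 1.04T params
95
#4
Doubao Seed 2.0 ByteDance
Productization, agent scenarios, multimodal integration
Strong multimodal capability; well suited to productization within the ByteDance ecosystem
93
#5
MiniMax M2.7 MiniMax
Balanced general capability, generative interaction, multimodal coordination
Arena Code 1445; high overall product maturity
91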

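Code

Coding, agents, tool calling, and engineering deployment

Top 5
#1
GLM-5 Zhipu AI
Engineering code generation, agent orchestration, Chinese-language developer support
Top domestic model on Arena Code, HLE 50.4%
97
Recommended
#2
MiniMax M2.7 MiniMax
Code completion, complex task decomposition, tool calling
Arena Code 1445
95
#3
GLM-4.7 Zhipu AI
Stable coding, function calling, engineering Q&A
Arena Code 1439.1
93
#4
DeepSeek Coder V3 DeepSeek
Code generation, refactoring, strong recognition in the open-source ecosystem
Strong on HumanEval and repo-level coding
92
#5
Kimi K2 Moonshot AI
Long-context code understanding, document-to-implementation workflows
Stands out at understanding large codebases in long context
90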

TTS

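Naturalness of speech synthesis, emotional expressiveness, and commercial maturity

Top 5
#1
MiniMax Speech-02 MiniMax
High naturalness, emotional expression, mature commercial deployment
Leads in Chinese naturalness and character voices
96
Recommended
#2
CosyVoice 2 FunAudioLLM / Alibaba ecosystem
Open source and controllable, zero-shot voice cloning, good Chinese quality
A representative open-source Chinese TTS solution
94
#3
Step-Audio TTS StepFun
Conversational speech, human-like expression, good end-to-end experience
Strong voice-interaction experience
92
#4
Doubao Voice ByteDance
Strong product integration, low latency, suited to companion and content scenarios
Strong at large-scale deployment
90
#5
Tencent Cloud TTS Tencent
High stability, mature enterprise service, rich voice library
Mature enterprise-grade integration
88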

ASR

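Speech recognition accuracy, real-time performance, and industry fit

Top 5
#1
Paraformer Large FunAudioLLM / ModelScope
High Chinese recognition accuracy; mature in both streaming and non-streaming modes
Open-source benchmark for Chinese ASR
96
Recommended
#2
SenseVoice FunAudioLLM
Multilingual recognition, emotion and audio-event understanding, good real-time performance
ASR and speech understanding in one model
95
#3
Tencent Cloud ASR Tencent
Stable, mature engineering integration, rich industry solutions
Widely deployed in enterprises
91
#4
iFLYTEK Spark ASR iFLYTEK
Deep experience in Chinese speech recognition, strong industry-vocabulary support
Strong in government, enterprise, and education scenarios
90
#5
Baidu Speech ASR Baidu
Stable cloud service, mature Mandarin recognition, low barrier to integration
Broad coverage of general cloud speech scenarios
88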

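Video Generation

Shot stability, motion quality, and consistency

Top 5
#1
Seedance 2.0 ByteDance
Synchronized audio-video generation, shot consistency, strong commercialization
Synchronized audio and video in a single pass, Languages 8+
97
Recommended
#2
Kling 2.0 Kuaishou
Range of motion, cinematic language, detail in character movement
A leading domestic video generation product
95
#3
Vidu Q1 ShengShu AI
Narrative coherence, style control, friendly to Chinese prompts
Consistently positive feedback from the creator community
93
#4
Wan 2.1 Alibaba
Open-source approach, controllable generation, strong ecosystem integration
One of the representative open-source video models
91
#5
Hailuo Video MiniMax
Character performance, short-video generation, complete product experience
Excellent consumer-grade generation experience
89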

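Image Generation

Chinese prompt understanding, aesthetic quality, and controllability

Top 5
#1
FLUX China / domestically optimized ecosystem Open ecosystem
Chinese prompt adaptation, photorealistic texture, active community
Widely adopted in the domestic creator ecosystem
94
Recommended
#2
Kolors Kuaishou
Strong Chinese semantic understanding; good at posters and portraits
A representative Chinese text-to-image model
93
#3
Tongyi Wanxiang Alibaba
Fits e-commerce and design scenarios, strong enterprise service capability
Mature enterprise image-generation deployment
92
#4
Doubao Image ByteDance
Social media content generation, stylization, low barrier to entry
Fast growth in content-creation scenarios
90
#5
Ernie Image Baidu
Enterprise integration, general image generation, stable Chinese understanding
Complete cloud product lineup
88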

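API Pricing Comparison

All prices are shown in USD per million tokens, so you can quickly compare the cost structure of input, output, and context.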

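Filtering tip: look at input cost first, then at the output multiplier and the context window, so that you don't judge by unit price alone and overlook the real cost of a task.
Model Provider Input Output Context Max output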
GPT-4.1 Nano OpenAI $0.10 $0.40 1M 16K
Qwen 3.5 Flash Alibaba $0.10 $0.40 1M 8K
Gemini 2.5 Flash Google $0.15 $0.60 1M 64K
Llama 4 Maverick Meta (via API) $0.20 $0.60 1M 16K
GPT-5 Mini OpenAI $0.25 $2.00 128K 16K
DeepSeek V3.2 DeepSeek $0.25 $0.40 128K 16K
Grok 3 Mini xAI $0.30 $0.50 128K 16K
GPT-4.1 Mini OpenAI $0.40 $1.60 1M 16K
DeepSeek R1 DeepSeek $0.55 $2.19 128K 16K
Claude Haiku 3.5 Anthropic $0.80 $4.00 200K 8K
o4-mini OpenAI $1.10 $4.40 200K 100K
GPT-5 OpenAI $1.25 $10.00 128K 16K
Gemini 3.1 Pro Google $1.25 $10.00 1M 64K
Gemini 2.5 Pro Google $1.25 $10.00 1M 64K
GPT-5.2 OpenAI $1.75 $14.00 400K 32K
GPT-4.1 OpenAI $2.00 $8.00 1M 32K
o3 OpenAI $2.00 $8.00 200K 100K
Claude Sonnet 4.6 Anthropic $3.00 $15.00 200K 64K
Grok 3 xAI $3.00 $15.00 128K 16K
Claude Opus 4.6 Anthropic $5.00 $25.00 200K 32K
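
The filtering tip above (input cost first, then output multiplier, then real task cost) can be sketched as a small calculation. The prices are copied from three rows of the table; the `Price` class and the 20K-in / 4K-out workload are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens

    @property
    def output_multiplier(self) -> float:
        """How many times more an output token costs than an input token."""
        return self.output_per_m / self.input_per_m

    def task_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost in USD of one request with the given token counts."""
        total = input_tokens * self.input_per_m + output_tokens * self.output_per_m
        return total / 1_000_000


# Rows copied from the pricing table above.
PRICES = {
    "DeepSeek V3.2": Price(0.25, 0.40),
    "GPT-5 Mini": Price(0.25, 2.00),
    "Claude Sonnet 4.6": Price(3.00, 15.00),
}

# Hypothetical workload: 20K tokens in, 4K tokens out per request.
for name, price in PRICES.items():
    cost = price.task_cost(20_000, 4_000)
    print(f"{name}: x{price.output_multiplier:.1f} output, ${cost:.4f} per request")
```

DeepSeek V3.2 and GPT-5 Mini share the same $0.25 input price, yet for this workload the request cost differs by about a factor of two ($0.0066 vs $0.0130) because of the output multiplier, which is the trap the tip warns about.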

Model Release Timeline

Look back over model iterations in time order to judge where each vendor is currently placing its main bets.

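What to watch: note how densely each vendor ships releases in code, reasoning, multimodal, and long context; it is a quick way to read their product roadmaps.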
2026-02-19
Gemini 3.1 Pro Google LLM
ARC-AGI-2 77.1%, MMLU 93.8%
  • 2x reasoning improvement
  • ARC-AGI-2 score of 77.1%
  • Enhanced multimodal understanding
  • Deep Think mode
2026-02-17
Claude Sonnet 4.6 Anthropic LLM
SWE-bench 80.8%, MMLU 92.1%
  • Agent Teams: orchestrate 2-16 Claude instances
  • Near-Opus performance at 1/5th cost
  • 80.8% SWE-bench Verified
  • Fast mode research preview
2026-02-12
DeepSeek V3.2 DeepSeek LLM
MMLU 90.1%, HumanEval 92.5%
  • 1M+ token context window (10x expansion)
  • Improved reasoning capabilities
  • Open source release
  • Cost-effective inference
2026-02-11
GLM-5 Zhipu AI LLM
HLE 50.4%, Hallucination Rate 1.2%
  • First frontier model trained on Huawei Ascend chips (no NVIDIA)
  • #1 HLE score (50.4%)
  • 1.2% hallucination rate via Slime RL
  • 136x cheaper than Claude Opus 4.5
2026-02-10
Seedance 2.0 ByteDance Video
Audio Sync Excellent, Languages 8+
  • Synchronized audio + video generation in one pass
  • Lip-sync in 8+ languages
  • 10-30x cheaper than Sora 2
  • Commercial pricing: $0.10-$0.80/min
2026-02-05
GPT-5.3 Codex OpenAI Code
Terminal-Bench 77.3%, SWE-Bench Pro SOTA
  • Self-improving agentic coding
  • 25% faster than GPT-5.2-Codex
  • 1,000+ tokens/sec generation
  • First OpenAI model flagged 'high' on cybersecurity framework
2026-01-20
Kimi K2 Moonshot AI LLM
LMSYS Arena #1, Parameters 1.04T
  • First open-weight model #1 on LMSYS Chatbot Arena
  • 1.04 trillion parameters
  • K2.5 agent swarms with up to 100 sub-agents
  • $0.15/M input tokens
2025-12-18
GPT-5.2 Codex OpenAI Code
SWE-Bench SOTA, HumanEval 95.1%
  • Specialized for software engineering
  • Enhanced agentic coding
  • Multi-file refactoring
  • Advanced debugging capabilities
2025-12-15
Mistral Large 3 Mistral LLM
MMLU 89.4%, HumanEval 91.2%
  • 128K context window
  • Improved multilingual capabilities
  • Enhanced function calling
  • Competitive with GPT-5 class models
2025-12-11
GPT-5.2 OpenAI LLM
MMLU 92.8%, MATH 88.5%
  • Enhanced reasoning capabilities
  • Improved adaptive reasoning
  • Better multimodal understanding
  • Faster inference
2025-11-24
Claude Opus 4.5 Anthropic LLM
SWE-bench 80.9%, MMLU 92.8%
  • First model to break 80.9% on SWE-Bench Verified
  • 67% price reduction vs previous Opus
  • Extended reasoning capabilities
  • Advanced coding performance
2025-11-18
Gemini 3 Pro Google Multimodal
ARC-AGI 87.5%, MMLU 93.2%
  • 1M token context window
  • Deep Think reasoning mode
  • Solved 5/6 IMO 2025 problems
  • #1 on LMSYS Arena
2025-11-12
GPT-5.1 OpenAI LLM
ARC-AGI 87.5%, AIME 2025 100%
  • Adaptive reasoning modes
  • Perfect 100% on AIME 2025
  • 87.5% on ARC-AGI
  • Enhanced multimodal capabilities
2025-08-15
GPT-5 OpenAI LLM
MMLU 91.0%, HumanEval 93.5%
  • Adaptive reasoning (routes between quick and deep thinking)
  • Improved math and coding
  • Enhanced multimodal reasoning
  • New safety architecture
2025-07-15
Claude Opus 4.1 Anthropic LLM
SWE-bench 75.2%, MMLU 91.2%
  • Improved multi-file refactoring
  • Enhanced agentic capabilities
  • Better long-context performance
  • Reduced hallucinations
2025-06-20
Gemini 2.5 Flash Google Multimodal
MMLU 87.5%, Speed 2x Gemini 2.0 Flash
  • Enhanced image editing stabilization
  • Faster inference
  • Improved multimodal understanding
  • Cost-effective deployment
2025-05-22
Claude Sonnet 4 Anthropic LLM
MMLU 88.7%, HumanEval 94.5%
  • Enhanced reasoning capabilities
  • Improved safety measures
  • Advanced multimodal understanding
  • Extended context window
2025-02-24
Claude Sonnet 3.7 Anthropic LLM
MMLU 86.1%, HumanEval 93.2%
  • Improved reasoning
  • Better code generation
  • Enhanced safety
  • Reduced hallucinations
2024-12-26
DeepSeek-V3 DeepSeek LLM
MMLU 88.5%, HumanEval 82.6%
  • Mixture of Experts architecture
  • Cost-effective training
  • Open source release
  • Strong reasoning capabilities
2024-12-11
Gemini 2.0 Flash Google Multimodal
MMLU 85.8%, HumanEval 71.9%
  • Native multimodal generation
  • Real-time API
  • Agentic capabilities
  • Enhanced speed
2024-10-16
Moonshot Kimi Moonshot AI LLM
C-Eval 77.9%, CMMLU 75.2%
  • 200K context window
  • Multilingual support
  • Document processing
  • Chinese language optimization
2024-09-12
GPT-o1-preview OpenAI LLM
AIME 83rd percentile, GPQA 78%
  • Advanced reasoning capabilities
  • Chain-of-thought processing
  • Enhanced problem-solving
  • Improved mathematical reasoning
2024-08-13
Grok-2 xAI LLM
MMLU 84.0%, HumanEval 74.1%
  • Real-time information access
  • Multimodal understanding
  • X platform integration
  • Conversational AI
2024-07-25
GitHub Copilot Chat Microsoft Code
HumanEval 84.2%, MBPP 78.9%
  • Conversational coding
  • IDE integration
  • Code explanation
  • Multi-language support
2024-06-20
Claude 3.5 Sonnet Anthropic LLM
MMLU 88.7%, HumanEval 92.0%
  • 200K context window
  • Improved coding capabilities
  • Enhanced reasoning
  • Vision capabilities
2024-03-28
Grok-1.5 xAI LLM
MMLU 73.0%, HumanEval 63.2%
  • Improved reasoning
  • Code generation
  • 128K context window
  • Real-time data access
2024-03-04
Claude 3 Opus Anthropic LLM
MMLU 86.8%, HumanEval 84.9%
  • 200K context window
  • Advanced reasoning
  • Multimodal capabilities
  • Constitutional AI training
2024-03-04
Claude 3 Sonnet Anthropic LLM
MMLU 79.0%, HumanEval 73.0%
  • 200K context window
  • Balanced capability and speed
  • Multimodal input
  • Strong reasoning
2024-03-04
Claude 3 Haiku Anthropic LLM
MMLU 75.2%, HumanEval 75.9%
  • 200K context window
  • Fastest response times
  • Multimodal input
  • Cost-effective
2024-02-26
Mistral Large Mistral LLM
Top-tier reasoning model with strong multilingual capabilities. MMLU 81.2%, HumanEval 45%
  • 32K context window
  • Multilingual capabilities
  • Function calling
  • JSON mode
2024-02-15
Sora OpenAI Video
Video Quality High, Motion Coherence Excellent
  • Text-to-video generation
  • 60-second video clips
  • Complex scene generation
  • Realistic motion physics
2024-02-15
Gemini 1.5 Pro Google Multimodal
MMLU 81.9%, HumanEval 71.9%
  • 1M token context window
  • Multimodal understanding
  • Video analysis
  • Audio processing
2024-01-25
text-embedding-3-large OpenAI Embedding
MTEB Score 64.6%, Dimensions 3072
  • 3072 embedding dimensions
  • Improved retrieval performance
  • Reduced hallucinations
  • Multi-language support
2024-01-25
GPT-4 Turbo OpenAI LLM
MMLU 86.4%, HumanEval 67%
  • 128K context window
  • Improved instruction following
  • Enhanced reasoning capabilities
  • Reduced hallucinations
2023-12-21
Midjourney v6 Midjourney Vision
Image Quality Very High, Prompt Adherence 92%
  • Photorealistic image generation
  • Improved prompt understanding
  • Better human anatomy
  • Text rendering
2023-12-07
Grok-1 xAI LLM
MMLU 73.0%, HumanEval 63.2%
  • Real-time information
  • Conversational interface
  • X platform integration
  • Uncensored responses
2023-12-06
AlphaCode 2 Google DeepMind Code
Codeforces Rating 1747, Problem Solving 85th percentile
  • Advanced code generation
  • Competitive programming
  • Multi-language support
  • Problem decomposition
2023-12-06
Gemini Ultra Google Multimodal
MMLU 90.0%, HumanEval 74.4%
  • Multimodal reasoning
  • Text, image, audio, video understanding
  • Advanced mathematical reasoning
  • Code generation
2023-12-06
Gemini Pro Google Multimodal
MMLU 79.1%, HumanEval 67.7%
  • Multimodal capabilities
  • 32K context window
  • Fast inference
  • Scalable deployment
2023-11-21
Claude 2.1 Anthropic LLM
MMLU 73.1%, HumanEval 70.0%
  • 200K context window
  • Reduced hallucination rates
  • Enhanced accuracy
  • Tool use capabilities
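
The release-density reading suggested at the top of this timeline can be sketched as a simple count of releases per vendor within a time window. The (date, vendor) pairs are copied from the most recent entries above; the helper name `releases_per_vendor` and the cutoff date are hypothetical choices for illustration.

```python
from collections import Counter
from datetime import date

# (release date, vendor) pairs copied from the most recent timeline entries.
RELEASES = [
    (date(2026, 2, 19), "Google"),
    (date(2026, 2, 17), "Anthropic"),
    (date(2026, 2, 12), "DeepSeek"),
    (date(2026, 2, 11), "Zhipu AI"),
    (date(2026, 2, 10), "ByteDance"),
    (date(2026, 2, 5), "OpenAI"),
    (date(2026, 1, 20), "Moonshot AI"),
    (date(2025, 12, 18), "OpenAI"),
    (date(2025, 12, 15), "Mistral"),
    (date(2025, 12, 11), "OpenAI"),
    (date(2025, 11, 24), "Anthropic"),
    (date(2025, 11, 18), "Google"),
    (date(2025, 11, 12), "OpenAI"),
]


def releases_per_vendor(releases, since):
    """Count releases per vendor on or after the `since` date."""
    return Counter(vendor for day, vendor in releases if day >= since)


counts = releases_per_vendor(RELEASES, since=date(2025, 11, 1))
for vendor, n in counts.most_common():
    print(f"{vendor}: {n}")
```

Adding the entry type (LLM, Code, Video, and so on) as a third field would let the same count be split by focus area, which is closer to the "where is each vendor betting" question.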