AI Digest 2026-06-05

Generated: 2026/6/5 10:07:55 (UTC: 2026-06-05T02:07:55.012Z)

Featured Papers

Audio Interaction Model

👍 88 · arXiv

Audio is an inherently interactive modality, yet today’s Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatt…

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

👍 45 · arXiv

Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not wh…

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

👍 35 · arXiv

Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to…

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

👍 29 · arXiv

Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous egocentric streams, often using evidence outside the current view. Existing benchmarks ei…

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

👍 24 · arXiv

Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiable Rewards (RLVR) on Chain-of-Thoughts (CoTs). However, since long CoTs naturally contain …

Release Updates

OpenClaw v2026.6.2-beta.1

2026.6.2

Highlights

Plugin and skill installs now use an operator install policy instead of the old dangerous-code scanner path, with clearer doctor, CLI, ClawHub, and troubleshooting surfa…

Link: https://github.com/openclaw/openclaw/releases/tag/v2026.6.2-beta.1

LangChain langchain-deepseek==1.1.0

Changes since langchain-deepseek==1.0.1

chore(infra): bump langchain-tests floor to 1.1.9 (#37610) chore: bump idna from 3.10 to 3.15 in /libs/partners/deepseek (#37560) ci(infra): harden Dependabo…

Link: https://github.com/langchain-ai/langchain/releases/tag/langchain-deepseek%3D%3D1.1.0

Ollama v0.30.5

What’s Changed

Fix gemma4:12b floating point exception crash
integrations: hermes windows install by @BruceMacD in https://github.com/ollama/ollama/pull/16487

Full Changelog: https://g…

Link: https://github.com/ollama/ollama/releases/tag/v0.30.5

CrewAI 1.14.7a1

What’s Changed

Features

Add crew trained agents file support
Add native Snowflake Cortex LLM provider
Add Databricks integration guide
Add Snowflake integration guide

Bug Fixes

Link: https://github.com/crewAIInc/crewAI/releases/tag/1.14.7a1

Developer Tools

Goose v1.37.0

✨ Features

xAI SuperGrok OAuth subscription provider #9420
Replay ACP images on session load [#9496](https://github.com/aaif-goose/goose/pull/9…

Link: https://github.com/aaif-goose/goose/releases/tag/v1.37.0

OpenAI Codex CLI rust-v0.138.0-alpha.4

Release 0.138.0-alpha.4

…

Link: https://github.com/openai/codex/releases/tag/rust-v0.138.0-alpha.4

Industry News

Anthropic’s open-source framework for AI-powered vulnerability discovery

Article URL: https://github.com/anthropics/defending-code-reference-harness
Comments URL: https://news.ycombinator.com/item?id=48403980
Points: 273
Comments: 94

Source: Hacker News AI

When AI Builds Itself: Our progress toward recursive self-improvement

Article URL: https://www.anthropic.com/institute/recursive-self-improvement
Comments URL: https://news.ycombinator.com/item?id=48400842
Points: 336
Comments: 443

Source: Hacker News AI

Google employees internally share memes about how its AI sucks

Article URL: https://www.404media.co/google-employees-internally-share-memes-about-how-its-ai-sucks/
Comments URL: https://news.ycombinator.com/item?id=48400311
Points: 147
Comments: 103

Source: Hacker News AI

The LLM warnings Google fired Timnit Gebru over have all come true

Article URL: https://www.tumblr.com/dreaminginthedeepsouth/817865966907228160/darren-oconnor-timnit-gebru-was-fired-from
Comments URL: https://news.ycombinator.com/item?id=48400213
Points: 105
Comments: 100

Source: Hacker News AI

AI, Ashby Engineering, and the future

Article URL: https://www.ashbyhq.com/blog/engineering/ai-ashby-engineering-and-the-future
Comments URL: https://news.ycombinator.com/item?id=48399528
Points: 56
Comments: 38

Source: Hacker News AI

Show HN: Boxes.dev: ditch localhost; run Claude Code and Codex in the cloud

Hi HN, we’re Nick and Drew, and we’re building boxes.dev – the first cloud-only agentic dev environment (ADE) that gives every Codex and Claude Code agent its own cloud computer. We’re two engineers who previously built Gem (co-founder/CTO and first hire), and we spent the last year coding almost exc…

Source: Hacker News AI

The ways we contain Claude across products

Article URL: https://www.anthropic.com/engineering/how-we-contain-claude
Comments URL: https://news.ycombinator.com/item?id=48392082
Points: 221
Comments: 94

Source: Hacker News AI

Failing grades soar with AI usage, dwindling math skills in Berkeley CS classes

Article URL: https://www.dailycal.org/news/campus/academics/failing-grades-soar-as-professors-see-greater-ai-usage-dwindling-math-skills-in-uc-berkeley/article_16fad0bf-02cb-4b8c-8d88-888ffd9f8608.html
Comments URL: https://news.ycombinator.com/item?id=48392004
Points: 733
Comments: 716

Source: Hacker News AI
