AI Digest 2026-06-05
Generated: 2026/6/5 10:07:55 (UTC: 2026-06-05T02:07:55.012Z)
Audio Interaction Model
👍 88 · arXiv
Audio is an inherently interactive modality, yet today’s Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatt…
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
👍 45 · arXiv
Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not wh…
Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
👍 35 · arXiv
Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to…
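The rubric-as-reward setup described above can be illustrated with a minimal sketch. It rests on several assumptions, none of them from the paper. The reward is taken to be a weighted mean of per-criterion judge scores, each clamped to [0, 1]. The names `RubricItem`, `rubric_reward`, `keyword_judge`, and `length_biased_judge` are hypothetical. The "judges" are toy keyword matchers standing in for an LLM-as-a-Judge. The length-biased judge shows how a latent judge bias can raise the reward of a padded answer that misses a criterion.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical judge interface: (response, criterion) -> score, ideally in [0, 1].
# In a real rubric-based RL setup this would be an LLM-as-a-Judge call.
Judge = Callable[[str, str], float]


@dataclass
class RubricItem:
    criterion: str  # hypothetical: a keyword the toy judges look for
    weight: float


def rubric_reward(response: str, rubric: list[RubricItem], judge: Judge) -> float:
    """Weighted mean of clamped per-criterion judge scores (assumed aggregation)."""
    total_weight = sum(item.weight for item in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    weighted = 0.0
    for item in rubric:
        score = min(max(judge(response, item.criterion), 0.0), 1.0)  # clamp to [0, 1]
        weighted += item.weight * score
    return weighted / total_weight


def keyword_judge(response: str, criterion: str) -> float:
    """Toy unbiased judge: full credit only if the criterion keyword appears."""
    return 1.0 if criterion.lower() in response.lower() else 0.0


def length_biased_judge(response: str, criterion: str) -> float:
    """Toy biased judge: adds a length bonus (capped at 0.5) regardless of content."""
    return keyword_judge(response, criterion) + min(len(response) / 1000, 0.5)


rubric = [RubricItem("source", 2.0), RubricItem("summary", 1.0)]
honest = "Summary: X. Source: Y."
padded = "Summary: X. " + "filler " * 200  # misses the "source" criterion

print(rubric_reward(honest, rubric, keyword_judge))        # 1.0
print(rubric_reward(padded, rubric, keyword_judge))        # 0.333...
print(rubric_reward(padded, rubric, length_biased_judge))  # 0.666...
```

Under the biased judge, the padded answer's reward doubles even though it still omits a required element. That is the kind of gap a policy could learn to exploit.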
OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs
👍 29 · arXiv
Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous egocentric streams, often using evidence outside the current view. Existing benchmarks ei…
ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
👍 24 · arXiv
Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiable Rewards (RLVR) on Chain-of-Thoughts (CoTs). However, since long CoTs naturally contain …
OpenClaw v2026.6.2-beta.1
2026.6.2
Highlights
- Plugin and skill installs now use an operator install policy instead of the old dangerous-code scanner path, with clearer doctor, CLI, ClawHub, and troubleshooting surfa…
Link: https://github.com/openclaw/openclaw/releases/tag/v2026.6.2-beta.1
LangChain langchain-deepseek==1.1.0
Changes since langchain-deepseek==1.0.1
chore(infra): bump langchain-tests floor to 1.1.9 (#37610)
chore: bump idna from 3.10 to 3.15 in /libs/partners/deepseek (#37560)
ci(infra): harden Dependabo…
Link: https://github.com/langchain-ai/langchain/releases/tag/langchain-deepseek%3D%3D1.1.0
Ollama v0.30.5
What’s Changed
- Fix gemma4:12b floating point exception crash
- integrations: hermes windows install by @BruceMacD in https://github.com/ollama/ollama/pull/16487
Full Changelog: https://g…
Link: https://github.com/ollama/ollama/releases/tag/v0.30.5
CrewAI 1.14.7a1
What’s Changed
Features
- Add crew trained agents file support
- Add native Snowflake Cortex LLM provider
- Add Databricks integration guide
- Add Snowflake integration guide
Bug Fixes
- …
Link: https://github.com/crewAIInc/crewAI/releases/tag/1.14.7a1
Goose v1.37.0
✨ Features
- xAI SuperGrok OAuth subscription provider #9420
- Replay ACP images on session load [#9496](https://github.com/aaif-goose/goose/pull/9…
Link: https://github.com/aaif-goose/goose/releases/tag/v1.37.0
OpenAI Codex CLI rust-v0.138.0-alpha.4
Release 0.138.0-alpha.4
…
Link: https://github.com/openai/codex/releases/tag/rust-v0.138.0-alpha.4
Anthropic’s open-source framework for AI-powered vulnerability discovery
Article URL: https://github.com/anthropics/defending-code-reference-harness
Comments URL: https://news.ycombinator.com/item?id=48403980
Points: 273
Comments: 94
When AI Builds Itself: Our progress toward recursive self-improvement
Article URL: https://www.anthropic.com/institute/recursive-self-improvement
Comments URL: https://news.ycombinator.com/item?id=48400842
Points: 336
Comments: 443
Google employees internally share memes about how its AI sucks
Article URL: https://www.404media.co/google-employees-internally-share-memes-about-how-its-ai-sucks/
Comments URL: https://news.ycombinator.com/item?id=48400311
Points: 147
Comments: 103
The LLM warnings Google fired Timnit Gebru over have all come true
Article URL: https://www.tumblr.com/dreaminginthedeepsouth/817865966907228160/darren-oconnor-timnit-gebru-was-fired-from
Comments URL: https://news.ycombinator.com/item?id=48400213
Points: 105
Comments: 100
AI, Ashby Engineering, and the future
Article URL: https://www.ashbyhq.com/blog/engineering/ai-ashby-engineering-and-the-future
Comments URL: https://news.ycombinator.com/item?id=48399528
Points: 56
Comments: 38
Show HN: Boxes.dev: ditch localhost; run Claude Code and Codex in the cloud
Hi HN, we’re Nick and Drew, and we’re building boxes.dev – the first cloud-only agentic dev environment (ADE) that gives every Codex and Claude Code agent its own cloud computer. We’re two engineers who previously built Gem (co-founder/CTO and first hire), and we spent the last year coding almost exc
The ways we contain Claude across products
Article URL: https://www.anthropic.com/engineering/how-we-contain-claude
Comments URL: https://news.ycombinator.com/item?id=48392082
Points: 221
Comments: 94
Failing grades soar with AI usage, dwindling math skills in Berkeley CS classes
Article URL: https://www.dailycal.org/news/campus/academics/failing-grades-soar-as-professors-see-greater-ai-usage-dwindling-math-skills-in-uc-berkeley/article_16fad0bf-02cb-4b8c-8d88-888ffd9f8608.html
Comments URL: https://news.ycombinator.com/item?id=48392004
Points: 733