AI Digest 2026-06-05

Generated: 2026/6/5 10:07:55 (UTC: 2026-06-05T02:07:55.012Z)

👍 88 · arXiv

Audio is an inherently interactive modality, yet today’s Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatt…

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

👍 45 · arXiv

Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not wh…

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

👍 35 · arXiv

Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to…

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

👍 29 · arXiv

Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous egocentric streams, often using evidence outside the current view. Existing benchmarks ei…

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

👍 24 · arXiv

Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiable Rewards (RLVR) on Chain-of-Thoughts (CoTs). However, since long CoTs naturally contain …

  • Plugin and skill installs now use an operator install policy instead of the old dangerous-code scanner path, with clearer doctor, CLI, ClawHub, and troubleshooting surfa…

Link: https://github.com/openclaw/openclaw/releases/tag/v2026.6.2-beta.1

Changes since langchain-deepseek==1.0.1

chore(infra): bump langchain-tests floor to 1.1.9 (#37610) chore: bump idna from 3.10 to 3.15 in /libs/partners/deepseek (#37560) ci(infra): harden Dependabo…

Link: https://github.com/langchain-ai/langchain/releases/tag/langchain-deepseek%3D%3D1.1.0

Full Changelog: https://g

Link: https://github.com/ollama/ollama/releases/tag/v0.30.5

  • Add crew trained agents file support
  • Add native Snowflake Cortex LLM provider
  • Add Databricks integration guide
  • Add Snowflake integration guide

Link: https://github.com/crewAIInc/crewAI/releases/tag/1.14.7a1

Link: https://github.com/aaif-goose/goose/releases/tag/v1.37.0

Release 0.138.0-alpha.4

Link: https://github.com/openai/codex/releases/tag/rust-v0.138.0-alpha.4

Anthropic’s open-source framework for AI-powered vulnerability discovery

Article URL: https://github.com/anthropics/defending-code-reference-harness Comments URL: https://news.ycombinator.com/item?id=48403980 Points: 273

Source: Hacker News AI

When AI Builds Itself: Our progress toward recursive self-improvement

Article URL: https://www.anthropic.com/institute/recursive-self-improvement Comments URL: https://news.ycombinator.com/item?id=48400842 Points: 336

Source: Hacker News AI

Google employees internally share memes about how its AI sucks

Article URL: https://www.404media.co/google-employees-internally-share-memes-about-how-its-ai-sucks/ Comments URL: https://news.ycombinator.com/item?id=48400311 Points: 147

Source: Hacker News AI

The LLM warnings Google fired Timnit Gebru over have all come true

Article URL: https://www.tumblr.com/dreaminginthedeepsouth/817865966907228160/darren-oconnor-timnit-gebru-was-fired-from Comments URL: https://news.ycombinator.com/item?id=48400213 Points: 105

Source: Hacker News AI

Article URL: https://www.ashbyhq.com/blog/engineering/ai-ashby-engineering-and-the-future Comments URL: https://news.ycombinator.com/item?id=48399528 Points: 56

Source: Hacker News AI

Show HN: Boxes.dev: ditch localhost; run Claude Code and Codex in the cloud

Hi HN, we’re Nick and Drew, and we’re building boxes.dev – the first cloud-only agentic dev environment (ADE) that gives every Codex and Claude Code agent its own cloud computer. We’re two engineers who previously built Gem (co-founder/CTO and first hire), and we spent the last year coding almost exc

Source: Hacker News AI

The ways we contain Claude across products

Article URL: https://www.anthropic.com/engineering/how-we-contain-claude Comments URL: https://news.ycombinator.com/item?id=48392082 Points: 221

Source: Hacker News AI

Failing grades soar with AI usage, dwindling math skills in Berkeley CS classes

Article URL: https://www.dailycal.org/news/campus/academics/failing-grades-soar-as-professors-see-greater-ai-usage-dwindling-math-skills-in-uc-berkeley/article_16fad0bf-02cb-4b8c-8d88-888ffd9f8608.html Comments URL: https://news.ycombinator.com/item?id=48392004 Points: 733

Source: Hacker News AI