AI Digest 2026-06-14

Generated: 2026/6/14 10:30:04 (UTC: 2026-06-14T02:30:04.025Z)

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

👍 118 · arXiv

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dyna…

👍 103 · arXiv

Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundre…

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

👍 94 · arXiv

Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line execution, code editing, browsers, and external tools. Existing benchmarks, however, often…

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

👍 80 · arXiv

Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attemp…

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

👍 74 · arXiv

Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in visual understanding, yet their performance degrades significantly under real-world visual corruptions. While existing …

  • Telegram and WhatsApp channel delivery are richer and less brittle: Telegram can send structured rich text with tables, lists, expandable blockquotes, prompt-preserving …

Link: https://github.com/openclaw/openclaw/releases/tag/v2026.6.8-beta.1

Changes since langchain-openai==1.3.1

release(openai): 1.3.2 (#38130)…

Link: https://github.com/langchain-ai/langchain/releases/tag/langchain-openai%3D%3D1.3.2

Please note that Minimax M3 is not yet supported in this version. Please follow the vLLM recipe for M3 usage guides.

Link: https://github.com/vllm-project/vllm/releases/tag/v0.23.0

  • Fixed ollama launch selecting the wrong provider in some cases
  • Improved prompt caching by decoupling it from context shift for better KV cache reuse
  • More stable MLX infere…

Link: https://github.com/ollama/ollama/releases/tag/v0.30.8

Release 0.140.0-alpha.19

Link: https://github.com/openai/codex/releases/tag/rust-v0.140.0-alpha.19

Meta reportedly moves to unwind $2B Manus deal after Beijing’s demand

Meta starts dismantling its $2 billion Manus acquisition after Beijing ordered the deal reversed.

Source: TechCrunch AI

KPMG pulls report on AI usage due to apparent hallucinations

Once again, AI proves to be an unreliable source of information about AI.

Source: TechCrunch AI

Amazon CEO reportedly raised Anthropic model concerns before government crackdown

Amazon CEO Andy Jassy may have been the source of security concerns that led Anthropic to cut off worldwide access to two models on Friday.

Source: TechCrunch AI

OpenAI faces investigation from state attorneys general

It’s not clear which states are involved, but they’re asking about everything from OpenAI’s ad policies to its handling of health data.

Source: TechCrunch AI

Andrew Yang thinks the next big startup opportunity is lowering the cost of living

Andrew Yang made a list of everything Americans overpay for — housing, food, wireless — and thinks the next startup gold rush is giving that money back.

Source: TechCrunch AI

Anthropic’s safety warnings may have just backfired — the government has pulled the plug on its most powerful AI

Anthropic isn’t hiding its frustration. “We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people,” the company wrote in a blog post.

Source: TechCrunch AI

SpaceX IPO: Live updates on everything you need to know

TechCrunch has followed SpaceX’s start, struggles, and successes from the early days. And we’re here for what happens next too. This package of SpaceX IPO coverage includes who stands to win (and maybe some who won’t), pre-IPO deals, and what’s tucked inside its S-1 registration document.

Source: TechCrunch AI

Meta’s months-old AI unit is a soul-crushing gulag, say the engineers stuck inside it

A new report suggests the unit, which employs 6,500 people, is on the verge of revolt.

Source: TechCrunch AI