Weekly Report 2026-06-08 ~ 2026-06-14

Generated: 2026/6/14 13:52:07 (UTC: 2026-06-14T05:52:07.694Z)

This week's automatic summary was not enabled or the call failed; the raw content is merged below.

Generated: 2026/6/8 10:31:33 (UTC: 2026-06-08T02:31:33.927Z)

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

👍 72 · arXiv

Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysi…

ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

👍 45 · arXiv

Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, not maintain a fixed persona. Existing benchmarks measure factual recall at a give…

TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration

👍 38 · arXiv

Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on explicit user requests, which surface only the problems the user has noticed, while many o…

AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

👍 37 · arXiv

Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. Howeve…

RobotValues: Evaluating Household Robots When Human Values Conflict

👍 24 · arXiv

While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize …

  • QQBot now strips model reasoning/thinking scaffolding before native delivery, preventing raw <thinking> content from leaking into channel replies. (#89913, #90132) Thanks @openper…

Link: https://github.com/openclaw/openclaw/releases/tag/v2026.6.5-beta.2

Link: https://github.com/ollama/ollama/releases/tag/v0.30.7-rc1

Release 0.138.0-alpha.6

Link: https://github.com/openai/codex/releases/tag/rust-v0.138.0-alpha.6

We’re likely to see more price increases as the big AI companies plan to go public.

Source: TechCrunch AI

Notion restores access to Anthropic after service disruption

Notion’s head of product said he was “astonished” at “the amount of people RT-ing this.”

Source: TechCrunch AI

OpenAI is still working on that ‘super app’

“Chat is dead” — at least, according to a senior OpenAI employee.

Source: TechCrunch AI

OpenAI unveils Lockdown Mode to protect sensitive data from prompt injection attacks

Even with Lockdown Mode, ChatGPT could be still vulnerable to prompt injections, but the goal is to reduce the likelihood that sensitive data gets shared in the process.

Source: TechCrunch AI

What to expect from WWDC 2026: Siri’s highly anticipated revamp and Apple Intelligence updates

Apple’s WWDC nears: Here’s what you can look forward to.

Source: TechCrunch AI

Sriram Krishnan is leaving his role as White House AI advisor

Krishnan is reportedly starting a new institution to continue shaping Trump’s AI policy.

Source: TechCrunch AI

The Trump administration might take an equity stake in OpenAI

President Donald Trump said he’s discussing deals “where the American people can benefit from the success of AI.”

Source: TechCrunch AI

Startup Battlefield 200 applications officially close in 3 days

Applications for Startup Battlefield 200 officially close on June 8, 11:59 p.m. PT. Don’t wait any longer. Secure your shot at competing on the Disrupt Stage at TechCrunch Disrupt 2026 this October at San Francisco’s Moscone West.

Source: TechCrunch AI


Generated: 2026/6/9 09:56:03 (UTC: 2026-06-09T01:56:03.617Z)

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

👍 71 · arXiv

Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. However, they struggle to function as off-the-shelf embedding models, leading to suboptimal per…

SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations

👍 43 · arXiv

Evaluating LLM mediators remains challenging, as mediation unfolds as a real-time trajectory shaped by disputants’ shifting emotions, intentions, and context. Existing testbeds rely on a few expert-au…

GENEB: Why Genomic Models Are Hard to Compare

👍 42 · arXiv

Progress in genomic foundation models is difficult to assess due to fragmented benchmarks, incompatible evaluation protocols, and task-specific reporting. As a result, claims of superiority or general…

MMAE: A Massive Multitask Audio Editing Benchmark

👍 39 · arXiv

We introduce MMAE, a Massive Multitask Audio Editing benchmark, serving as the first comprehensive evaluation testbed designed for general-purpose instruction-based audio editing. Spurred by the shift…

AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization

👍 24 · arXiv

Despite being a pivotal frontier, interactive world modeling remains underexplored in terms of the versatile controllability required by practical scenarios. To bridge this gap, we present AnchorWorld…

  • QQBot now strips model reasoning/thinking scaffolding before native delivery, preventing raw <thinking> content from leaking into channel replies. (#89913, #90132) Thanks @openper…

Link: https://github.com/openclaw/openclaw/releases/tag/v2026.6.5-beta.5

Changes since langchain-core==1.4.1

release(core): 1.4.2 (#37968) feat(core): deprecate problematic dict() method (#31685)…

Link: https://github.com/langchain-ai/langchain/releases/tag/langchain-core%3D%3D1.4.2

Ollama Launch now supports Hermes Desktop, a native desktop interface for the Hermes agent. Run it alongside your Hermes agent to get a visual interface for managing conversations, integrations, and m…

Link: https://github.com/ollama/ollama/releases/tag/v0.30.7

  • The /app command can now hand off the current CLI thread into Codex Desktop on macOS and native Windows, and Windows workspace launches can open directly into Desktop instead of s…

Link: https://github.com/openai/codex/releases/tag/rust-v0.138.0

Apple reveals new AI architecture built around Google Gemini models

Article URL: https://www.macrumors.com/2026/06/08/apple-reveals-new-ai-architecture/ Comments URL: https://news.ycombinator.com/item?id=48450142 Points: 332

Source: Hacker News AI

Article URL: https://developer.apple.com/documentation/coreai/ Comments URL: https://news.ycombinator.com/item?id=48449665 Points: 196

Source: Hacker News AI

Ask HN: What are tools you have made for yourself since the advent of AI?

Comments URL: https://news.ycombinator.com/item?id=48449187 Points: 148

Source: Hacker News AI

Article URL: https://www.apple.com/apple-intelligence/ Comments URL: https://news.ycombinator.com/item?id=48449084 Points: 412

Source: Hacker News AI

Article URL: https://www.wheresyoured.at/ai-is-slowing-down/ Comments URL: https://news.ycombinator.com/item?id=48446893 Points: 397

Source: Hacker News AI

SDSU Wired Its Dorms with 1,300 AI Cameras Without Telling Students

Article URL: https://reclaimthenet.org/sdsu-adds-1300-ai-cameras-330-in-student-dorms Comments URL: https://news.ycombinator.com/item?id=48440994 Points: 52

Source: Hacker News AI

DeepSeek V4 Pro beats GPT-5.5 Pro on precision

Article URL: https://runtimewire.com/article/deepseek-v4-pro-beats-gpt-5-5-pro-on-precision Comments URL: https://news.ycombinator.com/item?id=48440448 Points: 389

Source: Hacker News AI

Article URL: https://leoveanu.com/2026-06-06-qwen3.7max/ Comments URL: https://news.ycombinator.com/item?id=48435371 Points: 144

Source: Hacker News AI


Generated: 2026/6/10 10:08:06 (UTC: 2026-06-10T02:08:06.131Z)

👍 83 · arXiv

Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue tha…

LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

👍 51 · arXiv

Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills into the prompt at every step incurs substantial context overhead and exposes skill content…

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

👍 16 · arXiv

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, …

A Geometric Account of Activation Steering through Angle-Norm Decomposition

👍 15 · arXiv

Linear activation steering has gained popularity as a simple and empirically effective way to control language model behavior. More recently, spherical steering paradigms have been proposed to address…

SwiftVR: Real-Time One-Step Generative Video Restoration

👍 12 · arXiv

Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on…

  • QQBot now strips model reasoning/thinking scaffolding before native delivery, preventing raw <thinking> content from leaking into channel replies. (#89913, #90132) Thanks @openper…

Link: https://github.com/openclaw/openclaw/releases/tag/v2026.6.5

Changes since langchain==1.3.5

release(langchain): 1.3.6 (#38001) fix(langchain): preserve summarization trigger compatibility (#38000)…

Link: https://github.com/langchain-ai/langchain/releases/tag/langchain%3D%3D1.3.6

  • Migrate @listen/@router runtime to read from FlowDefinition
  • Add pluggable default backends for memory, knowledge, rag, and flow
  • Update changelo…

Link: https://github.com/crewAIInc/crewAI/releases/tag/1.14.7a4

  • Code mode can now call standalone web search directly, including from nested JavaScript tool calls, and receive plaintext search results. (#26719)
  • Tool and connector input schemas …

Link: https://github.com/openai/codex/releases/tag/rust-v0.139.0

AI misidentification results in wrongful arrest; man seeks justice

Article URL: https://www.wsoctv.com/news/local/ai-misidentification-results-wrongful-arrest-man-seeks-justice/I7UQJWV33FBN3LMKHCSXI6FIVA/ Comments URL: https://news.ycombinator.com/item?id=48468789 Points: 75

Source: Hacker News AI

If Claude Fable stops helping you, you’ll never know

Article URL: https://jonready.com/blog/posts/claude-fable5-is-allowed-to-sabotage-your-app-if-youre-a-competitor.html Comments URL: https://news.ycombinator.com/item?id=48467896 Points: 504

Source: Hacker News AI

Apple’s AI Can Now Change Your Passwords. What Could Possibly Go Wrong?

Article URL: https://www.kylereddoch.me/blog/apples-ai-can-now-change-your-passwords-what-could-possibly-go-wrong/ Comments URL: https://news.ycombinator.com/item?id=48465744 Points: 78

Source: Hacker News AI

CEOs who think AI replaces their employees are just bad CEOs

Article URL: https://www.techdirt.com/2026/06/09/ceos-who-think-ai-replaces-their-employees-are-just-bad-ceos/ Comments URL: https://news.ycombinator.com/item?id=48465675 Points: 435

Source: Hacker News AI

Article URL: https://naokishibuya.github.io/blog/2022-12-30-gpt-2-2019/ Comments URL: https://news.ycombinator.com/item?id=48465269 Points: 255

Source: Hacker News AI

Article URL: https://www.apollo.com/wealth/the-daily-spark/where-is-the-ai-jobs-crisis Comments URL: https://news.ycombinator.com/item?id=48464333 Points: 137

Source: Hacker News AI

System Card: Claude Fable 5 and Claude Mythos 5 [pdf]

Article URL: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf Comments URL: https://news.ycombinator.com/item?id=48463811 Points: 211

Source: Hacker News AI

System Card [pdf]: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3

Comments URL: https://news.ycombinator.com/item?id=48463808 Points: 1807

Source: Hacker News AI


Generated: 2026/6/11 10:31:51 (UTC: 2026-06-11T02:31:51.497Z)

👍 196 · arXiv

We present ABot-Earth 0.5, a generative 3D framework designed to synthesize vast, seamless 3D environments from ubiquitous, geospatially referenced satellite imagery. To achieve this, we propose a nov…

👍 171 · arXiv

We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challen…

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

👍 73 · arXiv

Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, …

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

👍 48 · arXiv

AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods…

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

👍 46 · arXiv

Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recen…

  • Security boundaries are substantially tighter across transcripts, sandbox binds, host environment inheritance, MCP stdio, Codex HTTP access, native search policy, elevat…

Link: https://github.com/openclaw/openclaw/releases/tag/v2026.6.6-beta.1

Changes since langchain==1.3.6

release(langchain): 1.3.7 (#38024) style(langchain): add ruff rules ARG (#34435) feat(langchain): add ProviderToolSearchMiddleware (#37969) chore(langchain): activate…

Link: https://github.com/langchain-ai/langchain/releases/tag/langchain%3D%3D1.3.7

  • Add reset_runtime_state to release accumulated bus state
  • Handle supporting both custom prompts
  • Decouple conversation logic from runtime and add a `conversationa…

Link: https://github.com/crewAIInc/crewAI/releases/tag/1.14.7rc1

Release 0.140.0-alpha.7

Link: https://github.com/openai/codex/releases/tag/rust-v0.140.0-alpha.7

xAI fired an engineer who raised alarms about Grok safety, new lawsuit claims

A former xAI engineer is suing the company and SpaceX, alleging he was fired for raising AI safety concerns about Grok days before SpaceX’s historic IPO.

Source: TechCrunch AI

Fresh off bond sale, Amazon borrows $17.5B from banks as AI spending continues

Companies are burning through exorbitant sums of money to keep pace in the AI arms race. Debt is climbing.

Source: TechCrunch AI

‘AI-pilled’ firms spend $7,500 per employee each month on AI

The most AI-obsessed firms are spending roughly $7,500 monthly per employee on AI, per Ramp AI Index. That’s not more than an engineer’s salary — yet.

Source: TechCrunch AI

New research suggests that AI memory systems can degrade model performance and encourage sycophantic tendencies.

Source: TechCrunch AI

Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

Cybersecurity researchers are complaining that Anthropic’s new model Fable has guardrails that are too strict for any cybersecurity work.

Source: TechCrunch AI

Datadog veterans launch AI coding startup Niteshift on a bet against Big AI lock-in

AI coding agent startup Niteshift has raised a $7 million seed round from a who’s who of angels. It’s betting companies will want power over, not lock-in with model makers.

Source: TechCrunch AI

The three hard-tech moonshots fueling SpaceX’s unbelievable IPO

Most of the value in SpaceX’s IPO is effectively a call option on the company’s ambitious space data center plans.

Source: TechCrunch AI

Warner Music acquires AI attribution startup Sureel AI

Through the acquisition, WMG aims to better track when its artists’ work is used in AI-generated content or for training AI models.

Source: TechCrunch AI


Generated: 2026/6/12 10:14:50 (UTC: 2026-06-12T02:14:50.395Z)

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

👍 76 · arXiv

Router is the cornerstone component to the Mixture-of-Experts models. Serving as expert proxies, the rows of the router matrix compute their similarity to the MoE inputs to determine which subset of e…

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

👍 71 · arXiv

Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into lat…

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

👍 56 · arXiv

General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the…

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

👍 56 · arXiv

Environments serve as interactive systems for large language model (LLM) based agents across diverse scenarios and play a crucial role in driving the continual evolution of model capabilities. Despite…

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

👍 52 · arXiv

Reward models are central to text-to-image post-training, but visual preference is subjective and better represented as a distribution over rubric scores than as a deterministic scalar. Existing scala…

  • Security boundaries are substantially tighter across transcripts, sandbox binds, host environment inheritance, MCP stdio, Codex HTTP access, native search policy, elevat…

Link: https://github.com/openclaw/openclaw/releases/tag/v2026.6.6-beta.1

Changes since langchain-model-profiles==0.0.5

release(model-profiles): 0.0.6 (#38057) feat(standard-tests): validate tool call chunks during streaming (#34707) hotfix(core): bump lockfile(s) (#38032)…

Link: https://github.com/langchain-ai/langchain/releases/tag/langchain-model-profiles%3D%3D0.0.6

  • Add pluggable default backends for memory, knowledge, rag, and flow.
  • Surface real finish_reason, sampling params, and response.id on LLM events.
  • Type DSL triggers…

Link: https://github.com/crewAIInc/crewAI/releases/tag/1.14.7

Release 0.140.0-alpha.13

Link: https://github.com/openai/codex/releases/tag/rust-v0.140.0-alpha.13

Theker just raised $85M to build the factory robot that doesn’t specialize in anything

Unlike humanoid robots designed around a fixed form — think Boston Dynamics — Theker’s machines are built to be reconfigured.

Source: TechCrunch AI

Jeff Bezos’s Prometheus raises $12B to build an ‘artificial general engineer’ for the physical world

The new round values the physical AI startup that aims to automate heavy engineering and drug design at $41 billion.

Source: TechCrunch AI

SpaceX officially prices shares at $135 in the largest IPO ever

With its official share pricing announcement, SpaceX’s IPO has begun.

Source: TechCrunch AI

SpaceX SPV investors won’t know their true holdings until post-IPO lock-ups lift

After SpaceX makes its public debut, lower-tier SPV investors face hidden fees, lengthy payout delays, and the risk of outright fraud.

Source: TechCrunch AI

Deezer’s new tool can identify AI music from Spotify, Apple Music, and others

Deezer introduced a tool that scans playlists from Spotify, Apple Music, and other platforms to identify AI music.

Source: TechCrunch AI

Pool’s new app turns your screenshots into something useful

Pool’s new app automatically sorts screenshots into personalized collections, tracks down the original links behind saved content, and helps you rediscover products, recipes, travel ideas, and other things you meant to revisit.

Source: TechCrunch AI

DoorDash’s new AI chatbot lets you order with prompts and photos

The new chatbot, called Ask DoorDash, allows users to search the app for what they’re looking for in their own words instead of having to scroll through restaurants and stores to build a cart.

Source: TechCrunch AI

Opendoor’s India exit is fueling a bigger conversation about AI and outsourcing

The decision comes as India emerges as the world’s largest GCC market.

Source: TechCrunch AI


Generated: 2026/6/13 10:07:34 (UTC: 2026-06-13T02:07:34.635Z)

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

👍 105 · arXiv

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dyna…

👍 83 · arXiv

Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundre…

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

👍 80 · arXiv

Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attemp…

InterleaveThinker: Reinforcing Agentic Interleaved Generation

👍 73 · arXiv

Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they canno…

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

👍 71 · arXiv

Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in visual understanding, yet their performance degrades significantly under real-world visual corruptions. While existing …

  • Security boundaries are substantially tighter across transcripts, sandbox binds, host environment inheritance, MCP stdio, Codex HTTP access, native search policy, elevated sender ch…

Link: https://github.com/openclaw/openclaw/releases/tag/v2026.6.6

Changes since langchain==1.3.8

release(anthropic): 1.4.6 (#38105) release(langchain): 1.3.9 (#38104) fix(langchain,anthropic): confine file-search results and tighten anthropic allowed_prefixes (#3…

Link: https://github.com/langchain-ai/langchain/releases/tag/langchain%3D%3D1.3.9

Please note that Minimax M3 is not yet supported in this version. Please follow vLLM recipe for usage guides for M3.

Link: https://github.com/vllm-project/vllm/releases/tag/v0.23.0

  • Fixed ollama launch selecting the wrong provider in some cases
  • Improved prompt caching by decoupling it from context shift for better KV cache reuse
  • More stable MLX infere…

Link: https://github.com/ollama/ollama/releases/tag/v0.30.8

  • Add pluggable default backends for memory, knowledge, rag, and flow.
  • Surface real finish_reason, sampling params, and response.id on LLM events.
  • Type DSL triggers…

Link: https://github.com/crewAIInc/crewAI/releases/tag/1.14.7

Release 0.140.0-alpha.17

Link: https://github.com/openai/codex/releases/tag/rust-v0.140.0-alpha.17

How to setup a local coding agent on macOS

Article URL: https://ikyle.me/blog/2026/how-to-setup-a-local-coding-agent-on-macos Comments URL: https://news.ycombinator.com/item?id=48507020 Points: 266

Source: Hacker News AI

Show HN: Script to bulk delete Claude chats from the web UI

I haven’t found a way to delete all chats in bulk like you can on ChatGPT. With Claude, you have to scroll to the bottom, select everything, and delete. The problem is, if you have a lot of chats, it becomes impossible. I created this script. It does it alone. I hope it helps someone. (conversations

Source: Hacker News AI

Slightly reducing the sloppiness of AI generated front end

Article URL: https://envs.net/~volpe/blog/posts/reduce-slop.html Comments URL: https://news.ycombinator.com/item?id=48504912 Points: 165

Source: Hacker News AI

AI agent bankrupted their operator while trying to scan DN42

Article URL: https://lantian.pub/en/article/fun/ai-agent-bankrupted-their-operator-scan-dn42lantian.lantian/ Comments URL: https://news.ycombinator.com/item?id=48500012 Points: 1394

Source: Hacker News AI

Article URL: https://simonwillison.net/2026/Jun/11/fable-is-relentlessly-proactive/ Comments URL: https://news.ycombinator.com/item?id=48498573 Points: 727

Source: Hacker News AI

Shall we play a game? My AI nuclear simulation

https://arxiv.org/pdf/2602.14740

Comments URL: https://news.ycombinator.com/item?id=48495575 Points: 204

Source: Hacker News AI

Article URL: https://xenodium.com/agent-shell-0-55-updates Comments URL: https://news.ycombinator.com/item?id=48493273 Points: 62

Source: Hacker News AI

Claude Fable 5: mid-tier results on coding tasks

Article URL: https://www.endorlabs.com/learn/claude-fable-5-mythos-grade-hype Comments URL: https://news.ycombinator.com/item?id=48492210 Points: 394

Source: Hacker News AI