
Silmaril CTO Weekly

The Week Agents Entered the Blast Radius


Summary

For six minutes on Monday evening, a trusted JavaScript release path did exactly what it had been designed to do. It built packages, attached provenance, and published to npm under a legitimate identity. By the time the week ended, TanStack had documented 84 malicious versions across 42 packages, OpenAI had rotated certificates after two employee devices were hit, and Google had described what it believes was a zero-day exploit developed with AI assistance.

The read for you is that agent security is leaving the world of policy statements and entering the boring, expensive, measurable parts of software operations. The systems now under pressure are release pipelines, package caches, browser agents, coding sandboxes, mobile GUIs, humanoid controllers, and finance workflows. Every one of them has a model somewhere near an action surface.

For Silmaril, that makes the week unusually useful. It tied together three threads that usually arrive separately. Attackers are finding the tool paths around models. Competitors are selling governance primitives instead of abstract safety. Research labs are trying to make agents more capable with memory, belief states, verification, and cheaper inference. The product question is no longer whether an LLM can be tricked. It is how much of the surrounding system must become inspectable before a customer can trust the next action.

Six Minutes in the Build Cache

TanStack's postmortem (security advisory) is the cleanest incident anatomy from the window. On May 11, the project said an attacker published malicious versions across the @tanstack/* namespace by chaining a pull_request_target workflow risk, GitHub Actions cache poisoning, and extraction of an OIDC token from a runner process.

The important feature sits in the release path. The packages were published through the legitimate trusted-publisher path, while the publish workflow itself was not modified. That distinction matters for any security product built around AI development.
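
The chain described above (a pull_request_target workflow, a shared Actions cache, and an OIDC token) can be turned into a rough pre-merge check. The sketch below is a heuristic text scan over a workflow file, not a reconstruction of TanStack's actual workflows. The function names, the signal labels, and the example workflow are all assumptions made for illustration, and a regex match only marks something for human review.

```python
import re
from typing import List

# Heuristic signals only (assumed labels). A match means "worth a human look",
# not "confirmed vulnerability".
SIGNALS = {
    "pull_request_target trigger": re.compile(r"\bpull_request_target\b"),
    "checks out PR head": re.compile(r"github\.event\.pull_request\.head\.(sha|ref)"),
    "uses actions/cache": re.compile(r"uses:\s*actions/cache\b"),
    "requests OIDC token": re.compile(r"id-token:\s*write"),
}


def scan_workflow(text: str) -> List[str]:
    """Return the labels of every signal found in the workflow text."""
    return [name for name, pattern in SIGNALS.items() if pattern.search(text)]


def is_high_risk(signals: List[str]) -> bool:
    """Flag the combination where a privileged trigger runs untrusted PR code."""
    return {"pull_request_target trigger", "checks out PR head"} <= set(signals)


# Hypothetical workflow written for this example only.
EXAMPLE_WORKFLOW = """\
on: pull_request_target
permissions:
  id-token: write
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: actions/cache@v4
        with:
          path: node_modules
          key: deps
"""

if __name__ == "__main__":
    found = scan_workflow(EXAMPLE_WORKFLOW)
    for label in found:
        print("signal:", label)
    print("high risk:", is_high_risk(found))
```

A scan like this says nothing about whether a given run was actually steered through a poisoned state. It only narrows down which workflows deserve the deeper behavioral record.
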
Provenance did not disappear. It became insufficient as a standalone confidence measure. A package can be signed by the expected pipeline while that pipeline has been steered through a poisoned intermediate state. For Eduardo, the useful product lesson is to treat provenance as one input in a larger behavioral record. A detector that only asks whether the artifact came from the expected identity will miss the part of the story where the action became unsafe.

OpenAI's response (company security post) turned the same episode into an operational benchmark. The company said two employee devices in its corporate environment were impacted, with limited credential material exfiltrated from a subset of internal source repositories, while customer data, production systems, intellectual property, and software were not found to be affected. The remediation list is the part worth keeping close: system and identity isolation, session revocation, credential rotation, deployment workflow restrictions, and scrutiny of credential behavior.

That is a ready-made customer conversation. AI application security buyers do not only need a classifier that says "prompt injection" or "malware." They need a chain of custody from suspicious content to action, credential, repository, and deployment consequence. The Mini Shai-Hulud incident happened in conventional software infrastructure, yet it describes the exact failure mode agent platforms inherit when they install packages, browse pages, call tools, and write code. Once a model is allowed to operate a developer environment, supply-chain security and prompt-injection security become neighbors.

Google's AI Threat Tracker (threat intelligence report) sharpened the other side of the week. The company said it had high confidence that a threat actor used AI to help develop and weaponize a zero-day exploit against a web-based administration tool, and that it worked with the vendor before a planned mass exploitation event.
Google also described broader adversary use of AI for malware development, defense evasion, vulnerability research, and initial access. The public details are intentionally thin, and they should stay thin. The defender implication is still concrete. If AI compresses the time between bug discovery and exploit preparation, weekly or monthly review cycles become harder to defend. Silmaril's opportunity is to frame itself around earlier interruption. Catch the untrusted instruction, suspicious tool invocation, unsafe code path, or abnormal agent trace before it graduates into deployment, credential use, or customer-visible action.

The Governance Sale Becomes Specific

Competitors and platform vendors spent the same week translating agent risk into controls a CISO can approve. Anthropic's May 12 security webinar page (company post) framed agentic AI review around action scope, blast radius, scoped access, egress control, SIEM-routed telemetry, and paced rollout. Those terms are not glamorous. They are exactly the terms that make procurement possible.

Giskard's Guards launch language (company claim) moved in the same direction from a startup angle. The company described an EU-sovereign guardrail platform for regulated enterprises, with on-premise deployment, context-aware controls, policy-as-code, and compliance packs for the EU AI Act and OWASP Top 10 for LLM Applications. Whether the product outperforms alternatives is a separate diligence question. The positioning is instructive because it puts guardrails inside an enterprise control plane rather than a model-output filter.

OpenAI's Windows Codex sandbox writeup (engineering security post) gave the builder version of that story. Codex needed a Windows sandbox that could constrain writes and network access while still acting on the user's checkout, tools, and environment. The design details are Windows-specific, but the buyer-facing message generalizes. Agents need operating-system and identity boundaries.
Policy prompts alone do not create a boundary.

Anthropic's PwC announcement (company announcement) showed where this governance sale is headed commercially. PwC is pairing Claude, Claude Cowork, and Claude Code with finance transformation in regulated industries, naming auditability and accuracy as central requirements and citing production deployments such as insurance underwriting and mainframe modernization. For Silmaril, the competitive read is that agent security will increasingly be sold through workflow modernization projects rather than as a separate purity layer. When a consulting partner rewires finance, legal, or software maintenance around agents, the security decision is embedded in the delivery architecture.

This is where the product bar rises. A firewall that only returns a binary allow or block decision risks becoming invisible infrastructure. A firewall that explains the risky instruction source, the tool or data boundary involved, the credential or policy at stake, and the recommended containment action can sit inside the same governance budget that Anthropic, Giskard, and OpenAI are teaching buyers to create.

Research Moved Toward Interactive Failure

The prompt-injection papers from the window were useful because they pushed past static jailbreak scoring. IPI-proxy (preprint) proposed an intercepting proxy for red-teaming web-browsing agents by rewriting real HTTP responses from whitelisted domains and inserting attack strings into live retrieval surfaces. That is directly relevant to Silmaril because it tests the place where indirect prompt injection actually arrives: content the agent thinks it is allowed to read, rather than a benchmark page built for evaluation.

AI Agents May Always Fall for Prompt Injections (preprint) made the argument more conceptual. The authors used contextual integrity to explain why data-instruction separation fails when an adversary can reshape the apparent context of an information flow.
The paper's impossibility framing should be handled carefully, since it is a preprint and such claims need pressure. Still, it supports a practical point. The defense cannot rely on a universal separator between "instruction" and "data" because real workflows keep changing what counts as legitimate context.

ASPI (preprint) added a subtle product cue. The authors found that clarification-seeking, usually treated as a sign of a better assistant, can increase prompt-injection vulnerability. In their benchmark, models became more susceptible when they had to ask for and incorporate additional input before acting. That matters because many enterprise agents are designed to clarify ambiguity before touching a business system. The polite interaction state can become a weaker security state.

WARD (preprint) attacked the same web-agent problem from a defense angle. Its guard model was trained on a large web-agent dataset and an adversarial guard-targeting dataset, with the goal of preserving utility while improving recall under shifted domains and adaptive attacks. The details need replication, but the architecture points toward a useful Silmaril research direction: measure the guard as part of a workflow, under attack against both the agent and the guard, rather than scoring isolated strings.

Jailbreak evaluation also became less comfortable. The Great Pretender (preprint) argued that attack success rates are unstable when generation and evaluation are stochastic, and proposed consecutive-success evaluation to expose inflated results. For Eduardo, this is a reminder to be careful with public benchmark wins. A model-firewall product can win credibility by reporting stability, false positive cost, and repeated-attempt behavior rather than one neat attack-success number.

Agents Get Memory, Belief, and Cheaper Context

AI capability research this week kept pushing agents toward longer action loops.
TMAS (preprint) used specialized agents, experience banks, and guideline banks to improve test-time reasoning. Agent-BRACE (preprint) separated belief tracking from action selection by having one model maintain structured claims about the environment, with uncertainty labels, while another model chooses actions. M2A (preprint) merged mathematical reasoning ability into agent behavior in parameter space and reported a SWE-Bench Verified gain for a Qwen3-8B coding agent.

These papers disagree on mechanism while pointing in the same direction. Agent performance is being improved by adding memory structures, uncertainty representations, collaboration patterns, and reasoning depth. Each improvement creates a corresponding audit problem. What did the agent remember? Which belief was uncertain? Which previous trajectory shaped this action? Which merged capability changed the coding behavior?

That is why the systems papers on inference cost are not just infrastructure trivia. TriAxialKV (preprint) assigned mixed precision to KV-cache entries by temporal, modality, and semantic-role tags for computer-use agents. VeriCache (preprint) proposed using compressed cache drafts while verifying against a full KV cache, aiming for lossless output with higher throughput. KVCapsule (preprint) focused on vision-language KV-cache compression, where image tokens produce heavier memory pressure than text alone.

Cheaper context lets agents carry more history, more screenshots, more retrieved pages, and more tool observations. That helps product utility and increases the amount of untrusted material riding inside the decision state. Silmaril should watch for the moment customers stop thinking of a prompt as a request and start thinking of it as a moving container of memory, retrieved evidence, tool output, and partial plans. The firewall has to reason about that container.

Robots Show What "Acting" Really Costs

Deep tech sources supplied the physical version of the same lesson.
RIO (preprint) introduced an open-source robot I/O framework for cross-embodiment learning, validating workflows across single-arm, bimanual, and humanoid platforms. SafeManip (preprint) argued that task success is inadequate for robot manipulation because a successful rollout can still violate temporal safety properties such as contamination, contact, release, or enclosure access. PRIME (preprint) refined humanoid and legged robot motion estimates into physically consistent trajectories with contact forces and inertial parameters.

The robotics thread may look far from a model firewall, but it is a useful analogy for customer language. In robotics, an action is never only a text decision. It is a timed sequence with contact, force, uncertainty, and recovery. Enterprise agents are starting to look similar. A finance agent touching a spreadsheet, a coding agent changing a repository, or a support agent updating a customer record has temporal safety properties too. It matters what happened before the tool call, whether the agent had authority, whether the intermediate state was contaminated, and whether a rollback path exists.

WIRobotics' funding announcement (press release) gave that research direction a market wrapper. The company said it raised roughly 68 million dollars to expand from wearable walking assistance into humanoid robotics and Physical AI, citing data from its WIM deployments and plans around ALLEX. Isomorphic Labs' Series B announcement (company announcement) did something similar for AI drug design, raising 2.1 billion dollars to scale its drug design engine and move programs across therapeutic areas.

The startup beat this week is larger than fresh money flowing into AI. Capital is chasing domains where models touch expensive substrates: bodies, molecules, factories, and enterprise workflows. Those domains will not tolerate vague safety claims. They will ask for evidence about allowed actions, observed state, failure modes, and containment.

Monday's Operating Read

The week leaves you with a sharper product position for Silmaril. The old pitch was that prompt injection and jailbreaks are dangerous. The better pitch now is that agents create mixed-trust execution traces, and customers need a way to judge each action before it becomes an incident.

That suggests three near-term priorities. First, make provenance broader than identity. Record where the instruction came from, how it entered context, which tool boundary it approached, and whether the surrounding state had been shaped by untrusted content. Second, treat clarification, retrieval, and memory as first-class risk states. The most helpful agent behaviors may be the places where attacker-controlled context becomes harder to separate. Third, report in the language buyers used this week: scope, blast radius, egress, telemetry, policy, and rollback.

A public firewall demo that blocks a malicious string is less interesting after this week. A demo that follows a poisoned web page into a browser agent, maps the requested action to a tool and credential boundary, explains why the flow violates policy, and leaves an audit trail a CISO can read would meet the moment more closely.

The week began with a poisoned build cache and ended with a bigger lesson. Trust in agent systems is becoming an evidence problem, and evidence has to be collected before the agent acts.
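
One way to picture "provenance broader than identity" is a record that travels with every proposed action and is reviewed before the action runs. The sketch below is a minimal illustration under stated assumptions. The class names, field names, the SENSITIVE_TOOLS set, and the review rule are all hypothetical and do not describe any existing Silmaril interface.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Trust(Enum):
    TRUSTED = "trusted"      # e.g. operator-authored instructions
    UNTRUSTED = "untrusted"  # e.g. a fetched web page or raw tool output


@dataclass(frozen=True)
class ContextItem:
    source: str      # where the content came from (URL, tool name, user)
    entry_path: str  # how it entered context: retrieval, memory, clarification
    trust: Trust


@dataclass(frozen=True)
class ProposedAction:
    tool: str                               # tool boundary being approached
    credential: Optional[str]               # credential the call would use
    influenced_by: Tuple[ContextItem, ...]  # context that shaped the action
    has_rollback: bool                      # whether a rollback path exists


@dataclass
class Verdict:
    allow: bool
    reasons: List[str] = field(default_factory=list)


# Hypothetical policy: tools treated as sensitive in this example.
SENSITIVE_TOOLS = frozenset({"deploy", "publish_package", "update_customer_record"})


def review(action: ProposedAction) -> Verdict:
    """Judge an action before it runs and say why, not just allow or block."""
    reasons: List[str] = []
    sensitive = action.tool in SENSITIVE_TOOLS or action.credential is not None
    untrusted = [c for c in action.influenced_by if c.trust is Trust.UNTRUSTED]
    if sensitive:
        for item in untrusted:
            reasons.append(
                f"untrusted content from {item.source} (via {item.entry_path}) "
                f"influenced a call to {action.tool}"
            )
        if not action.has_rollback:
            reasons.append(f"{action.tool} has no rollback path")
    return Verdict(allow=not reasons, reasons=reasons)


if __name__ == "__main__":
    page = ContextItem("https://example.test/page", "retrieval", Trust.UNTRUSTED)
    verdict = review(ProposedAction("deploy", "ci-token", (page,), has_rollback=False))
    print(verdict.allow)
    for reason in verdict.reasons:
        print("-", reason)
```

The verdict carries its reasons with it, so the same record can feed the audit trail and be reported in the buyer's vocabulary of scope, credential, and rollback.
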

Sources

TanStack postmortem: npm supply-chain compromise: https://tanstack.com/blog/npm-supply-chain-compromise-postmortem
GitHub Advisory GHSA-g7cv-rxg3-hmpx: https://github.com/advisories/GHSA-g7cv-rxg3-hmpx
OpenAI response to TanStack npm supply-chain attack: https://openai.com/index/our-response-to-the-tanstack-npm-supply-chain-attack/
Google Threat Intelligence: AI vulnerability exploitation, augmented operations, and initial access: https://cloud.google.com/blog/topics/threat-intelligence/ai-vulnerability-exploitation-initial-access
OpenAI: Building a safe, effective sandbox to enable Codex on Windows: https://openai.com/index/building-codex-windows-sandbox/
Anthropic: Secure the Advantage, a CISO's Guide to Agentic AI: https://www.anthropic.com/webinars/secure-the-advantage-a-cisos-guide-to-agentic-ai
Anthropic and PwC expanded partnership: https://www.anthropic.com/news/pwc-expanded-partnership
Giskard AI security platform: https://www.giskard.ai/
Isomorphic Labs Series B announcement: https://www.isomorphiclabs.com/articles/isomorphic-labs-announces-series-b-investment-round
WIRobotics Series B funding announcement: https://www.prnewswire.com/news-releases/wirobotics-secures-approximately-krw-100-billion-usd-68-million-series-b-funding-302772164.html
IPI-proxy: https://arxiv.org/abs/2605.11868
AI Agents May Always Fall for Prompt Injections: https://arxiv.org/abs/2605.17634
ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents: https://arxiv.org/abs/2605.17324
WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections: https://arxiv.org/abs/2605.15030
The Great Pretender: A Stochasticity Problem in LLM Jailbreak: https://arxiv.org/abs/2605.14418
TMAS: Scaling Test-Time Compute via Multi-Agent Synergy: https://arxiv.org/abs/2605.10344
Agent-BRACE: https://arxiv.org/abs/2605.11436
M2A: Synergizing Mathematical and Agentic Reasoning in Large Language Models: https://arxiv.org/abs/2605.09879
TriAxialKV: https://arxiv.org/abs/2605.17170
VeriCache: https://arxiv.org/abs/2605.17613
KVCapsule: https://arxiv.org/abs/2605.16439
RIO: Flexible Real-Time Robot I/O for Cross-Embodiment Robot Learning: https://arxiv.org/abs/2605.11564
SafeManip: https://arxiv.org/abs/2605.12386
PRIME: https://arxiv.org/abs/2605.17681
