
Silmaril CTO Weekly
The Week Agents Became Infrastructure
Summary
Week ending May 10, 2026.

Eduardo, the week opened with a small change in the place where software is made. On May 5, GitHub made secret scanning generally available inside its MCP server, so a coding agent can ask GitHub to inspect local changes for exposed credentials before a commit or pull request exists. The next item on the same changelog page put dependency vulnerability scanning into public preview for the same flow. The agent edits. The agent checks. The human sees the file, line, affected package, severity, or fix before the risk leaves the machine.

That matters because the security boundary is shifting from the model endpoint into the agent session. GitHub is moving checks into the developer's hands. OpenAI, three days later, published the controls it uses to run Codex internally: sandbox rules, approval policies, network allow and deny lists, keyring-backed credential storage, managed configuration, OpenTelemetry logs, and compliance logs that preserve prompts, tool approvals, MCP usage, tool results, and network decisions. Google DeepMind, in the same window, described AlphaEvolve as an algorithm-search system now touching genomics, power-grid optimization, quantum circuits, and Google infrastructure. IBM Research said its quantum workflows are now simulating protein complexes, fusion-relevant molten salts, and materials systems at scales meant to be commercially useful.

The common thread is action under measurement. The most interesting systems this week did something outside chat: they changed code, queried a tool, explored an algorithm, simulated chemistry, governed an agent, or tried to keep an unsafe action inside a boundary. For Silmaril, that is the useful read. The question for the coming week is where the control should sit, what trace it should preserve, and how quickly a discovered failure becomes a better deployed defense.

The Control Plane Moved Closer To The Work
GitHub's May 5 release is worth treating as more than a feature note.
The MCP server now lets compatible agents and IDEs scan local code changes for secrets, and the scan honors existing push-protection customization. A separate May 5 public preview adds dependency scanning through the Dependabot toolset, returning affected packages, severity, and recommended fixed versions before the developer commits. The design pattern is simple and important: security inspection is being pulled into the agent's own path of action.

The limitation is equally useful. GitHub says the secret-scanning results surfaced through MCP are part of the working flow, not persisted as normal GitHub alerts. That makes them early enough to stop mistakes and too ephemeral to serve as the whole record. The control catches a credential before it reaches the repository. The audit system still needs its own durable account of what the agent saw, requested, and changed.

OpenAI's Codex safety post fills in that missing half. The details are ordinary in the best sense. Codex can be constrained by filesystem boundaries, network rules, blocked domains, and approval requirements. CLI and MCP OAuth credentials can live in the operating system keyring. Administrators can enforce requirements across the desktop app, CLI, and IDE extension. Logs can capture user prompts, tool approval decisions, tool results, MCP server usage, and network allow or deny events. OpenAI says it uses those Codex logs alongside an AI-powered security triage agent to understand whether a suspicious endpoint event was expected behavior, a benign mistake, or something that needs escalation.

This is the market moving toward inspectable agent runs. A static prompt filter cannot answer why a file changed or why a tool call happened. A trace can. A classifier over the right state can. Customers will ask for a timeline: original instruction, retrieved context, tool boundary, policy decision, blocked action, reviewer approval, and final effect.
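To make that timeline concrete, here is a minimal Python sketch of an append-only trace for one agent run. It is an illustration, not any vendor's log format: the names `Stage`, `TraceEvent`, `AgentRunTrace`, `why`, and `demo_trace`, and the sample event text, are all hypothetical. The stage names simply mirror the timeline listed above.

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Stage names mirror the timeline in the text; the schema itself is assumed.
    INSTRUCTION = "original_instruction"
    CONTEXT = "retrieved_context"
    TOOL_BOUNDARY = "tool_boundary"
    POLICY_DECISION = "policy_decision"
    BLOCKED_ACTION = "blocked_action"
    REVIEWER_APPROVAL = "reviewer_approval"
    FINAL_EFFECT = "final_effect"


@dataclass(frozen=True)
class TraceEvent:
    seq: int
    stage: Stage
    detail: str


@dataclass
class AgentRunTrace:
    run_id: str
    events: list = field(default_factory=list)

    def record(self, stage, detail):
        """Append one event; seq is its position in the run."""
        event = TraceEvent(seq=len(self.events), stage=stage, detail=detail)
        self.events.append(event)
        return event

    def why(self, seq):
        """Return the history up to and including one step."""
        return [e for e in self.events if e.seq <= seq]

    def to_jsonl(self):
        """Serialize the run as one JSON object per line for durable storage."""
        return "\n".join(
            json.dumps({"run_id": self.run_id, "seq": e.seq,
                        "stage": e.stage.value, "detail": e.detail})
            for e in self.events
        )


def demo_trace():
    # Hypothetical run: an instruction, a retrieved file, and a blocked call.
    trace = AgentRunTrace(run_id="run-001")
    trace.record(Stage.INSTRUCTION, "update the deploy script")
    trace.record(Stage.CONTEXT, "read README.md")
    trace.record(Stage.TOOL_BOUNDARY, "shell request: contact unknown host")
    trace.record(Stage.POLICY_DECISION, "deny: host not on allow list")
    trace.record(Stage.BLOCKED_ACTION, "request not executed")
    return trace
```

Calling `demo_trace().why(3)` returns the instruction, the retrieved context, the tool request, and the policy decision in order, which is the kind of answer a static prompt filter cannot give.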
Research This Week Was About Verification
The strongest AI research signal came from a paper with an unglamorous premise: citations in deep-research reports may work as links and still fail as evidence. "Cited but Not Verified," posted to arXiv on May 7, introduces a framework that parses Markdown citations, retrieves the actual cited content, and checks whether the claim attached to the citation is supported. The authors report that strong frontier models kept link validity above 94 percent and topical relevance above 80 percent, while factual accuracy landed between 39 and 77 percent. In one ablation, fact-check accuracy dropped by roughly 42 percent as tool calls scaled from 2 to 150.

That result should sit on the same whiteboard as prompt injection. More retrieval can make an answer look better researched while weakening the link between source and claim. More context can make an agent more capable while increasing the number of places where authority can blur. The engineering lesson is direct: the important object is the relationship between claim, source, context, and action.

"When No Benchmark Exists," also posted May 7, is a quieter but valuable companion. The authors describe comparative safety scoring when there is no labeled benchmark for a particular language, sector, or regulatory setting. Their contract is disciplined: a score only has meaning under a fixed scenario pack, rubric, auditor, judge, sampling setup, and rerun budget. In a Norwegian safety pack, safe and abliterated targets separated with AUROC values between 0.89 and 1.00, but the larger point is the paperwork around the number. A score without its measurement conditions becomes a sales asset. A score with its conditions becomes evidence.

Two other May 7 papers push the agent question forward.
"AI Co-Mathematician" describes a stateful mathematical workbench that supports ideation, literature search, computation, theorem proving, and theory building while preserving failed hypotheses and reusable artifacts. The authors report 48 percent on FrontierMath Tier 4, which they describe as a new high among evaluated systems. "Recursive Agent Optimization" trains agents to spawn subtasks and delegate work to new instances of themselves, giving them a way to attack problems that exceed one context window. For Silmaril, both papers point at the same future failure mode. The useful agent keeps memory, delegates, searches, revises, and learns from failure. The dangerous agent does those things with unclear authority and incomplete history. A defense that only reads the final answer will miss the moment when a bad source entered memory, a delegated child inherited too much access, or a failed path taught the system the wrong lesson. Deep Tech Is Turning Search Into A Production Method DeepMind's AlphaEvolve update was the broadest deep-tech item of the week. The system uses Gemini models to generate candidate code, then scores candidates with automated evaluators. DeepMind says AlphaEvolve improved DeepConsensus for DNA sequencing with a 30 percent reduction in variant-detection errors, lifted a graph neural network's feasible-solution rate for AC Optimal Power Flow from 14 percent to over 88 percent, improved natural-disaster risk prediction accuracy across 20 categories by 5 percent, and suggested quantum circuits for molecular simulations on Willow with 10x lower error than conventionally optimized baselines. The numbers are company-reported, so they deserve normal caution. The architecture is still important. AlphaEvolve searches because each candidate meets an evaluator outside the language model: a benchmark, simulator, compiler, production metric, or scientific constraint. This is what makes the system feel closer to engineering than ideation. 
IBM Research made a similar claim from a different stack. At Think 2026, IBM described quantum-centric supercomputing workflows that combine quantum hardware with classical systems. Cleveland Clinic and RIKEN used sample-based quantum diagonalization on IBM hardware to simulate a 12,635-atom protein-ligand complex, with IBM reporting a 210x accuracy improvement over prior quantum-centric approaches. Oak Ridge work used the same style of workflow on FLiBe molten-salt chemistry relevant to fusion fuel production. Q-CTRL reported a materials simulation on IBM's platform more than 3,000 times faster than a leading classical method, with accuracy within 1 percent.

The quantum claims still need ordinary buyer caution. The durable pattern is that frontier search systems become credible when they are tightly coupled to external checks.

That should sharpen Silmaril's own threat-hunting thesis. Attackers can search too. They can mutate prompt chains, poisoned documents, tool parameters, MCP configurations, and exfiltration routes until one path works. The defender needs an evaluator loop of its own: reproduce the chain, score the harm, turn it into training data, and ship the new boundary quickly.

GlazyBench, a May 7 arXiv benchmark for ceramic glaze design, is the small deep-tech story I would not skip. It packages 23,148 real glaze formulations for property prediction and image generation. The domain sounds artisanal. The lesson is industrial. New discovery systems need data about materials, process, appearance, and outcome before they can become useful. Security has the same burden. The clean trace is the dataset.

The Hacking News Was About Agent Surfaces
Cloud Security Alliance published a May 4 note on Hermes Agent and OpenClaw that reads like a checklist of what goes wrong when persistent agents behave like infrastructure before they are secured like infrastructure.
CSA cited nine OpenClaw CVEs in four days in March, including CVE-2026-22172, a CVSS 9.9 authorization issue in which an authenticated user could self-assign administrative scopes during a WebSocket handshake. It also summarized Hermes Agent audit findings: unrestricted shell execution, broad file access, containerized approval bypass, persistent skill injection, and memory-store exposure. Several of those findings are older than the source window, but CSA's May 4 synthesis is current and useful.

Persistent workstation agents combine long-lived memory, broad tool access, skill installation, provider credentials, and prompt-driven execution. That stack creates vulnerability classes that fit poorly into normal CVE thinking. The agent may read a poisoned memory entry, install a malicious skill, or execute a tool through a permission path that looks normal to endpoint software.

WorkOS published a May 7 guide that frames prompt injection in agentic systems as an action problem. The article revisits EchoLeak in Microsoft 365 Copilot, the newer public discussion around obfuscated wallet instructions, Google's Antigravity file-write exploit, and a reported Cursor incident in which an over-privileged coding agent deleted production data and backups. The useful part is the containment model: scoped credentials, resource-level authorization, invocation policy, tool-chain checks, and audit logging.

Two older advisories remain part of the context. GitHub's reviewed advisory for Flowise CVE-2026-40933 describes authenticated remote command execution through unsafe MCP adapter handling, patched in Flowise 3.1.0. OX Security's April MCP research argues that unsafe stdio configuration can turn agent tool setup into command execution across multiple implementations. I would keep these in the appendix rather than the lede. They explain why this week's control-plane releases matter, but they should not carry the week by themselves.
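The containment model from the WorkOS discussion above (scoped credentials, resource-level authorization, invocation policy, audit logging) can be sketched as a single gate in front of tool calls. This is a minimal illustration under assumed names, not the WorkOS design or any product's API: `ScopedCredential`, `ToolGate`, the scope strings, and the decision labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScopedCredential:
    agent_id: str
    scopes: frozenset      # what kinds of operation the agent may perform
    resources: frozenset   # which specific resources it may touch


@dataclass
class ToolGate:
    # Invocation policy: actions that always need a human approval (assumed list).
    needs_approval: frozenset = frozenset({"delete", "deploy"})
    audit_log: list = field(default_factory=list)

    def authorize(self, cred, action, scope, resource, approved=False):
        """Check scope, then resource, then invocation policy; log every decision."""
        if scope not in cred.scopes:
            decision = "deny:scope"
        elif resource not in cred.resources:
            decision = "deny:resource"
        elif action in self.needs_approval and not approved:
            decision = "hold:approval"
        else:
            decision = "allow"
        self.audit_log.append({
            "agent": cred.agent_id, "action": action, "scope": scope,
            "resource": resource, "decision": decision,
        })
        return decision


# Hypothetical coding agent that may write only to a staging database.
cred = ScopedCredential("coding-agent-1",
                        frozenset({"db:write"}),
                        frozenset({"staging-db"}))
gate = ToolGate()
results = [
    gate.authorize(cred, "update", "db:write", "staging-db"),
    gate.authorize(cred, "delete", "db:write", "production-db"),
    gate.authorize(cred, "delete", "db:write", "staging-db"),
    gate.authorize(cred, "delete", "db:write", "staging-db", approved=True),
]
print(results)
```

In this sketch the delete against the production database is refused at the resource check, before approval is even considered, and all four decisions land in `audit_log` whether or not the call went through.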
Competitors Are Multiplying Around The Same Pain
The competitor field widened again this week. Operant AI launched Endpoint Protector on May 4, positioning it as endpoint-native discovery and inline defense for AI tools, coding agents, MCP servers, skills, tools, and plugins. Its claimed control points include agent loop tracing, data exfiltration defense, runtime access governance for MCP clients and servers, and a CodeInjectionGuard for package and shell execution attacks. The language is promotional, but the placement is serious: Operant is betting that the workstation is where AI risk now concentrates.

SlashID launched AI Identity Governance on May 5. Its frame is different and worth watching. Instead of treating agents primarily as content risks, SlashID models OAuth grants, MCP servers, AI applications, cloud-hosted models, and shadow AI as identity edges in an access graph. The release ties itself to an April Vercel security incident involving a malicious OAuth application originating from a third-party AI tool. That is a good commercial move because it gives buyers a familiar category, identity governance, for a new problem: non-human actors with delegated access.

Guardrail Technologies launched Traffic Light for Code & AI on May 5, claiming a red, amber, green signal for AI-generated and human-written code, with results feeding into an AI Command Center. Guardrail says the product runs inside tools including Claude, OpenAI, Cursor, GitHub Copilot, and Google tools, and that the company was founded in Q2 2025 with three issued patents and six pending. Treat the patent and "first" language skeptically. The concrete signal is a new entrant trying to collapse code security, human provenance, agent behavior, and compliance proof into one operator-facing dashboard.

Collibra launched AI Command Center on May 6 with a strategic Giskard partnership.
The product promises real-time visibility into agent behavior, ownership, and decision-making, plus drift detection and compliance templates aligned with AIUC-1 standards. Giskard, meanwhile, is pushing Guards as a context-aware, EU-sovereign guardrail platform for agents, with policy-as-code, on-premise deployment, tool and parameter inspection, and quality checks. Cognizant followed on May 7 with Secure AI Services, a consulting and control-plane bundle around secure agent development, model security, data protection, identity, behavior controls, and audit-supporting evidence.

Put those launches beside GitHub and OpenAI and the pattern is clear enough to act on. Startups are attacking the endpoint, identity graph, code-creation moment, guardrail runtime, and governance dashboard. Incumbents are embedding controls into developer workflows and enterprise services. Buyers will hear many versions of the same promise: visibility, policy, proof, and control before the agent damages something.

The opening for Silmaril is precision. Most of the market is naming the problem at a high altitude. Very few will be able to show, on a real trace, that a defense understood user intent, application context, tool state, retrieved content, and accumulated execution history well enough to block a harmful outcome without smothering normal work.

Startup Watch
One clean startup signal this week came from XBOW. On May 6, the company announced an additional $35 million in Series C financing from strategic investors including Accenture Ventures, NVIDIA's venture arm, Samsung Ventures, and SentinelOne's S Ventures, framing it as an extension of a previously announced $120 million round. XBOW positions itself as autonomous offensive security, and the notable detail is the investor mix. When customers and ecosystem partners write checks, it is rarely just a belief in a category. It is usually a bet that the product is already sitting in a real workflow.
These are worth including not because they threaten Silmaril but because they show where agents will land next: warehouses, clinics, manufacturing lines, and logistics desks where a bad action has operational weight. Agent security will not stay inside developer tools. The same prompt-chain and tool-boundary questions will show up wherever a natural-language instruction can move inventory, create a clinical follow-up, update a compliance record, or touch an ERP system.

Monday Read
For this week, I would treat the center of gravity as execution state. The model-launch news matters. The research papers matter. The competitor launches matter. The practical question underneath all of them is whether a defender can see enough of an agent run to make a decision before the action lands.

That means Silmaril's story should stay concrete. Show the run. Show the contaminated source. Show the tool boundary. Show the policy node. Show the classifier decision. Show the retraining artifact. Show that the next tenant benefits without exposing the first tenant's data. The buyer does not need another sermon about prompt injection. The buyer needs to know that when a calendar invite, README, lead form, memory entry, package, MCP server, or delegated subtask bends an agent toward harm, the firewall sees the bend while there is still time to stop it.
Sources
GitHub Changelog, "Secret scanning with GitHub MCP Server is now generally available," May 5, 2026: https://github.blog/changelog/2026-05-05-secret-scanning-with-github-mcp-server-is-now-generally-available/
GitHub Changelog, "Dependency scanning with GitHub MCP Server is in public preview," May 5, 2026: https://github.blog/changelog/2026-05-05-dependency-scanning-with-github-mcp-server-is-in-public-preview/
OpenAI, "Running Codex safely at OpenAI," May 8, 2026: https://openai.com/index/running-codex-safely/
OpenAI, "GPT-5.5 Instant System Card," May 5, 2026: https://openai.com/index/gpt-5-5-instant-system-card/
Google DeepMind, "AlphaEvolve: How our Gemini-powered coding agent is scaling impact across fields," May 7, 2026: https://deepmind.google/blog/alphaevolve-impact/
arXiv, "Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents," May 7, 2026: https://arxiv.org/abs/2605.06635
arXiv, "When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels," May 7, 2026: https://arxiv.org/abs/2605.06652
arXiv, "AI Co-Mathematician: Accelerating Mathematicians with Agentic AI," May 7, 2026: https://arxiv.org/abs/2605.06651
arXiv, "Recursive Agent Optimization," May 7, 2026: https://arxiv.org/abs/2605.06639
arXiv, "GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation," May 7, 2026: https://arxiv.org/abs/2605.06641
IBM Research, "How IBM is using quantum computing to understand the operating system of the universe," May 7, 2026: https://research.ibm.com/blog/ibm-think-2026-modeling-the-universe-with-quantum-computing
Cloud Security Alliance Labs, "9 CVEs in 4 Days: What Hermes Agent Enterprises Must Learn," May 4, 2026: https://labs.cloudsecurityalliance.org/research/csa-research-note-hermes-agent-cves-20260504-csa-styled/
WorkOS, "Securing agentic apps: How to contain AI agent prompt injection," May 7, 2026: https://workos.com/blog/ai-agent-prompt-injection
Operant AI, "Operant AI Launches Endpoint Protector," May 4, 2026: https://www.globenewswire.com/news-release/2026/05/04/3286769/0/en/Operant-AI-Launches-Endpoint-Protector-Securing-Shadow-AI-Coding-Agents-and-MCP-Across-the-Enterprise.html
SlashID, "SlashID Launches AI Identity Governance," May 5, 2026: https://www.prnewswire.com/news-releases/slashid-launches-ai-identity-governance-the-first-access-graphnative-solution-built-to-govern-oauth-connected-ai-apps-agents-and-mcp-servers-302762454.html
Guardrail Technologies, "Traffic Light for Code & AI," May 5, 2026: https://www.businesswire.com/news/home/20260505225337/en/
Collibra, "Collibra Launches AI Command Center to Scale Agentic AI with Real-Time Oversight and Continuous Control," May 6, 2026: https://www.collibra.com/company/newsroom/press-releases/collibra-launches-ai-command-center-to-scale-agentic-ai
Cognizant, "Secure AI Services," May 7, 2026: https://news.cognizant.com/2026-05-07-Cognizant-Launches-Secure-AI-Services-to-Help-Enterprises-Safely-Scale-Agentic-Systems
XBOW, "XBOW Secures Additional $35M from Strategic Investors, Including Select Customers and Ecosystem Partners," May 6, 2026: https://xbow.com/news/xbow-secures-additional-35m-from-strategic-investors
GitHub Advisory Database, "Flowise: Authenticated RCE Via MCP Adapters," updated April 16, 2026: https://github.com/advisories/GHSA-c9gw-hvqq-f33r
OX Security, "Securing the AI Supply Chain: How OX VibeSec Defends Against Anthropic MCP Vulnerability," April 24, 2026: https://www.ox.security/blog/anthropic-mcp-vulnerability-ox-vibesec-ai-supply-chain/