The Missing Layer in AI Security: Execution

Reasoning Isn’t Security - Execution Is

Most of the industry is focused on using AI to reason about vulnerabilities.

However, frontier models need more than reasoning to be harnessed in modern security programs. They require memory, context, and systems that drive work to closure.

Anthropic’s own follow-up to their research points to the same conclusion: once discovery is solved, closing the patch gap becomes the defining problem for security teams.

If it’s not patched, it’s still exploitable - no matter how well it was triaged. Most security programs aren’t built for that world. A small security team cannot effectively interface with a large engineering organization to burn down the backlog. There simply isn’t enough human security talent to keep up.

So while the industry is solving for reasoning, the real gap is execution.


A handful of security engineers cannot coordinate, negotiate, and drive remediation across thousands of developers.

In practice, product security teams struggle with tasks downstream of discovery, like triage, exploit validation, negotiating with engineering, and chasing fixes across the org.

That’s thousands of hours of labor. And the humans to perform that labor do not exist.

At Nullify, we’ve built a system of agents that takes on that work. It delivers the outcomes typically expected from a fully ramped product security team or managed offering in areas like automated vulnerability remediation: customers can now patch real exploits at scale within their SLAs, without being constrained by headcount.

The Last Mile Is the Whole Problem

We didn’t start here. The first agents we built at Nullify were focused on investigating findings to triage them for exploitability and impact. As we made progress here and replaced legacy scanners by helping security teams focus on findings that were important to fix, we realised that even with a perfectly triaged backlog there is a clear dependency on labor to drive work to closure.

Signal-to-noise on the backlog is just one part of the product security value chain, and while customers had fewer, higher-quality issues, the work didn’t go away - it just moved.

Even with perfect triage, the same pattern kept showing up: findings would enter the backlog, and then get stuck in the last mile.

That last mile looks like:

  • Figuring out who should fix the issue: finding the right assignee to review the fix
    • Maybe the developer who committed the vulnerable lines has left the company?
    • Maybe the team-to-workload ownership model is convoluted, or lives in Compass or in some old Confluence page?
  • Back-and-forth over how to fix the issue: fixes getting partially implemented, then sent back for rework
    • How does security know that refactoring the insecure lines won’t cause the unit tests to break?
    • Does a Junior AppSec Engineer even have enough context on that part of the codebase to propose a fix?
  • Deciding when the work needs to be accepted: negotiating with engineering on priorities and capacity
    • What if engineering has a major product release coming up?
    • Are there story points available in the sprint to allocate?
    • What Jira project should the ticket associated with the fix PR be created in?
    • What’s the latest date by which the patch can be merged per the SLAs?
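
Two of the questions above, who to assign and what the latest merge date is, reduce to simple lookups once the inputs exist. A minimal sketch in Python, assuming a hypothetical SLA policy keyed by severity and a hypothetical fallback to a team lead when the original committer has left; the names and day counts are illustrative and do not describe Nullify’s implementation:

```python
from datetime import date, timedelta

# Hypothetical SLA policy: days allowed from detection to merged patch.
# These windows are illustrative assumptions, not a standard.
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90, "low": 180}


def merge_deadline(detected_on: date, severity: str) -> date:
    """Latest date the patch can merge without breaching the SLA."""
    return detected_on + timedelta(days=SLA_DAYS[severity])


def days_remaining(detected_on: date, severity: str, today: date) -> int:
    """Days left before breach; a negative value means the SLA has slipped."""
    return (merge_deadline(detected_on, severity) - today).days


def pick_assignee(committer: str, active_devs: set, team_lead: str) -> str:
    """Prefer the original committer; fall back to the owning team's lead."""
    return committer if committer in active_devs else team_lead
```

The hard part in practice is not this arithmetic but getting trustworthy inputs: who is still active, which team owns the code, and which severity both sides agree on.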

Debating ownership, severity, and timing while SLAs slip waiting on approval isn’t security work. It’s a coordination problem, born from a lack of security headcount that can interface with engineering and drive work to closure.

Across customers, we saw the same thing: the majority of time wasn’t spent validating our findings — it was spent negotiating, routing, and following up on fixes.

Improving triage only solved a small part of the end-to-end product security value chain.

For the rest of the chain, it only made the problem more visible.

Better signal just meant more confidence in the issue and a faster handoff to engineering, which only made the same bottlenecks in coordination, ownership, and execution more apparent.

This is when it became clear:

The last mile isn’t a small part of the problem — it is the problem.

In a post-Mythos world, where discovery is effectively infinite, this layer becomes the limiting factor. Not detection or reasoning, but execution. If a vulnerability isn’t fixed, it doesn’t matter how well it was triaged.

It’s still exploitable.

That’s the gap most product security programs are leaving open, and it’s where breaches will happen more and more.

Execution, Measured

Our answer to this gap was Campaigns, an agentic interface for orchestrating the resolution of vulnerabilities at scale.

The paradigm shift we wanted Campaigns to bring to how security teams run their product security programs was inspired by Amazon’s Security Guardians, an internal Amazon program built to empower developers to act as embedded security champions within their own product teams. These volunteers act as a "security conscience," bridging the gap between development and central security teams to improve security by design, reduce bottlenecks, and accelerate secure product launches.

It scales through distributed ownership.

But distributed ownership only works if execution does, and most companies cannot hire enough product security headcount in central security to build a program like that. Making it work without that headcount requires a different operating model.

With Campaigns, Nullify enables you to align your remediation goals with your security program’s objectives, no matter your headcount.

First, tell Nullify which subset of the backlog you want remediated, by what date, and how many of your developers’ story points you want allocated to reviewing Nullify’s fixes. Or let Nullify generate default campaigns based on what it thinks you need to fix first with the capacity you have available, using its understanding of your company’s security posture, the risk model in Vault, and your backlog.
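
To make the shape of a campaign concrete, here is a minimal sketch in Python. It illustrates the idea only and is not Nullify’s API: the `Finding` and `Campaign` types, their fields, and the greedy selection in `plan` are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Finding:
    id: str
    risk: int           # hypothetical risk score, higher is worse
    review_points: int  # assumed story points needed to review the fix


@dataclass(frozen=True)
class Campaign:
    name: str
    due: date           # target date for the selected findings to be remediated
    review_budget: int  # story points developers can spend reviewing fixes


def plan(campaign: Campaign, backlog: List[Finding]) -> List[Finding]:
    """Greedily pick the riskiest findings that fit the review budget."""
    selected, remaining = [], campaign.review_budget
    for finding in sorted(backlog, key=lambda f: f.risk, reverse=True):
        if finding.review_points <= remaining:
            selected.append(finding)
            remaining -= finding.review_points
    return selected
```

With a budget of 5 points and three findings costing 3, 5, and 2 points in descending order of risk, this sketch selects the first and third and leaves the second for a later campaign. A real planner would also need to weigh the due date, SLAs, and team ownership, which this example deliberately leaves out.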

With Campaigns, the concept of distributed security ownership is democratized, as the labor constraint is removed.

Customers achieve the output of a fully ramped product security team — without being constrained by headcount.

Exploits are not just identified, but patched at scale.

Backlogs don’t grow; they get burned down in alignment with engineering, because security can align its remediation goals with the program’s broader objectives.

With a system designed for execution:

  • Fixes are merge-ready — engineers review and ship, not rewrite
  • MTTR collapses — exploits are patched in days, not quarters
  • Negotiation disappears — impact is proven upfront, not debated
  • Work is schedulable — remediation lands when teams have capacity
  • Backlogs burn down - they don’t accumulate

Entire classes of work disappear:

  • triage debates
  • ownership routing
  • fix iteration cycles
  • SLA chasing

What remains is a system that takes responsibility for outcomes.

This is the missing layer in AI security: execution.

Next, we’ll go deeper into how we solved triage — how Vault assembles context and ontology across code, cloud, and business logic to validate exploitability with agentic tool use, and how we built systems to ingest and reason over unstructured organizational context.