OpenAI just announced [their agentic security researcher](https://openai.com/index/introducing-aardvark/). They named it Aardvark, which is either a reference to eating bugs or someone at OpenAI really wanted to be first alphabetically in the AI agent rankings. From OpenAI:
“Aardvark continuously analyzes source code repositories to identify vulnerabilities, assess exploitability, prioritize severity, and propose targeted patches. Aardvark works by monitoring commits and changes to codebases, identifying vulnerabilities, how they might be exploited, and proposing fixes. Aardvark does not rely on traditional program analysis techniques like fuzzing or software composition analysis. Instead, it uses LLM-powered reasoning and tool-use to understand code behavior and identify vulnerabilities. Aardvark looks for bugs as a human security researcher might: by reading code, analyzing it, writing and running tests, using tools, and more.”
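For my own sanity, here’s roughly what I picture “monitoring commits and using LLM-powered reasoning” boiling down to at its most basic. To be clear, this is my sketch, not anything OpenAI has published about how Aardvark is actually built; it assumes the openai Python SDK and a local git checkout, and the model name and prompt are placeholders.

```python
# Illustrative sketch only -- not Aardvark's actual pipeline.
# Assumes the openai Python SDK and a git repo in the working directory.
import subprocess
from openai import OpenAI

client = OpenAI()

def diff_for_commit(sha: str) -> str:
    """Return the full diff introduced by a single commit."""
    return subprocess.run(
        ["git", "show", "--unified=3", sha],
        capture_output=True, text=True, check=True,
    ).stdout

def review_commit(sha: str) -> str:
    """Ask the model to reason about the change the way a human reviewer might."""
    prompt = (
        "You are a security researcher reviewing a commit. Explain what the change "
        "does, then list any vulnerabilities it introduces and how each could "
        "realistically be exploited:\n\n" + diff_for_commit(sha)
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(review_commit("HEAD"))
```

The hard part, obviously, is everything this skips: the tool use, writing and running tests, and deciding whether a finding is actually exploitable.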
I don’t know if it’s just me, but I’ve never really trusted SAST and SCA tools that claim to identify vulnerable packages and code in my team’s products. The signal-to-noise ratio has always been way off, and I’ve never been able to shake the feeling that their total lack of understanding of the application architecture and threat model renders the whole thing an exercise in performative compliance (aka the worst kind). You know, the kind of security theater where the SAST tool finds 47 ‘critical’ SQL injection vulnerabilities in your GraphQL API that doesn’t use SQL.
But does Aardvark know what matters? Does it know that your admin API is only exposed to VPN’d employees, or is it going to freak out that you don’t have rate limiting on an endpoint that three people use twice a month? I’m deeply, deeply curious about how the threat modelling part of Aardvark works - do you give it a filled-out architecture (as code?) with a STRIDE analysis, or does it infer one? And how does it verify that the vulnerabilities it flags are real?
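If it’s the former, I’d expect the input to look something like a machine-readable threat model: which services exist, who can reach them, what data they hold. What follows is purely my guess at the shape of that artifact; every name, field, and rule below is invented for illustration.

```python
# Hypothetical "threat model as code" an agent could use to decide whether a
# finding actually matters. Everything here is made up for the sake of example.
from dataclasses import dataclass, field

@dataclass
class Service:
    name: str
    exposure: str                      # "internet", "vpn-only", "internal"
    data_sensitivity: str              # "public", "internal", "pii"
    notes: list[str] = field(default_factory=list)

THREAT_MODEL = [
    Service(
        name="admin-api",
        exposure="vpn-only",
        data_sensitivity="pii",
        notes=["Reachable only by VPN'd employees; three users, low request volume"],
    ),
    Service(
        name="public-graphql",
        exposure="internet",
        data_sensitivity="pii",
        notes=["Backed by a document store; there is no SQL to inject into"],
    ),
]

def severity_hint(service: Service, finding_title: str) -> str:
    """Crude example of downgrading findings the architecture already mitigates."""
    if service.exposure == "vpn-only" and "rate limit" in finding_title.lower():
        return "low"
    return "needs-triage"
```

Whether Aardvark ingests something like this, infers it from the repo, or ignores it entirely is exactly the bit I want to see.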
A few months back, I played around with building my own security-analyst agent, with mixed results. It found a few security issues with a product before it went out, but it also flagged a few theoretical issues that it swore blind were exploitable and really weren’t. My agent was like that friend who’s convinced every headache is a brain tumor - technically possible, but buddy, maybe it’s just caffeine withdrawal. Those findings still had to be investigated, and that ate up a bunch of engineering time, ultimately for nothing.
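For what it’s worth, the findings my agent produced looked roughly like this (reconstructed from memory; the names and fields are approximate, and the example finding is made up). The structure was never the problem. The problem was that `exploitable=True` was the model’s opinion, not evidence, and a human still had to go disprove it.

```python
# Approximate shape of my agent's findings, reconstructed for illustration only.
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    location: str        # file:line the agent pointed at
    exploitable: bool    # the model's claim -- not a verified fact
    rationale: str
    suggested_fix: str

# A typical "brain tumor" finding: plausible-sounding, confidently asserted,
# and ultimately not exploitable in our actual deployment.
example = Finding(
    title="Possible SSRF via user-supplied webhook URL",
    location="app/webhooks.py:88",
    exploitable=True,    # it insisted; an engineer spent half a day proving otherwise
    rationale="URL is fetched server-side without an allowlist",
    suggested_fix="Validate the target host against an allowlist before fetching",
)
```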
Look, if Aardvark can actually reason about threat models instead of just pattern-matching CVE descriptions, it’ll be huge. If it can’t, it’s just another scanner with a more expensive API bill. I’m rooting for the former, but my money’s on a lot of confused engineers wondering why the AI is so worried about their internal monitoring dashboard.