What happens when you add AI to SAST

Nearly a year ago, I wrote an article titled “How to pick the right SAST tool.” It was a look at the pros and cons of two different generations of static application security testing (SAST):

  • Traditional SAST (first generation): Deep scans offer the best coverage but create massive friction due to long run times.
  • Rules-based SAST (second generation): Faster, customizable rules prioritize developer experience, but coverage is limited to explicitly defined rules.

At that time, these two approaches were really the only options. And to be honest, neither option was all that great. Both generations were built to alert on code weaknesses that have largely been solved in other ways (i.e., improvements in compilers and frameworks eliminated whole classes of CWEs), and the tools haven't evolved at the pace of modern application development. They rely on syntactic pattern matching, occasionally enhanced with intraprocedural taint analysis. But modern applications are far more complex and often lean on middleware, frameworks, and infrastructure to address risks.

So while responsibility for weaknesses has shifted to other parts of the stack (thanks to memory safety, frameworks, and infrastructure), SAST tools spew out false positives (FPs) at the granular, code level. Whether you're using first- or second-generation SAST, 68% to 78% of findings are FPs. That's a lot of manual triaging for the security team. Worse, today's code weaknesses are more likely to come from logic flaws, abuse of legitimate features, and contextual misconfigurations. Those aren't problems a regex-based SAST can meaningfully understand, so in addition to FPs, you also get high rates of false negatives (FNs). And as organizations adopt AI code assistants at scale, we can expect even more logic and architecture flaws that SAST tools can't catch.

Can AI solve the SAST problem?

As the security community started applying AI to previously intractable problems, one question came up repeatedly: Can AI help produce a SAST that actually works?

In fact, it can. And so dawned the third generation of SAST:

  • AI SAST (third generation): Uses AI agents and multi-modal analysis to target business logic flaws and achieve extremely high FP reduction.

Let's be clear: a good AI SAST should be more than a first- or second-generation tool with a ChatGPT wrapper around it. To perform well, the tool needs the context of your code and architecture. But don't just dump your entire code base into a large language model (LLM). That will burn tokens and quickly become prohibitively costly at enterprise scale.

When evaluating AI SAST solutions, I suggest looking for a multi-modal analysis that includes a combination of rules, dataflow analysis, and LLM reasoning. This multi-modal approach replicates the same process security teams use manually: read the code, trace the dataflow, reason about business logic.

Rules for syntax

Rules are dead, long live rules!

Deterministic checks (via rules) are still an excellent way to catch specific patterns at a near-zero runtime cost. To use a security truism, a good AI SAST will leverage a defense-in-depth strategy, with the rules identifying obvious security bugs while AI is used later in the flow. For example, a rule can quickly flag the use of an outdated encryption algorithm or the absence of input validation on a critical API endpoint.
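To make the near-zero-cost idea concrete, here's a minimal sketch of a deterministic rule for the outdated-algorithm example, purely my own illustration (not any vendor's implementation), using hashlib.md5/sha1 as stand-ins for weak algorithms:

```python
import ast

# Illustrative only: a deterministic rule that flags calls to
# hashlib.md5/hashlib.sha1, two digests considered too weak for
# security-sensitive use. Real SAST rules are far richer, but the
# near-zero runtime cost is the same idea.
WEAK_DIGESTS = {"md5", "sha1"}

def flag_weak_digests(source: str) -> list:
    """Return (line_number, algorithm) for each weak-digest call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "hashlib"
            and node.func.attr in WEAK_DIGESTS
        ):
            findings.append((node.lineno, node.func.attr))
    return findings

sample = "import hashlib\ntoken = hashlib.md5(secret).hexdigest()\n"
print(flag_weak_digests(sample))  # [(2, 'md5')]
```

Because the check is a pure syntactic match, it runs in milliseconds and never needs a token budget, which is exactly why rules remain the first layer of defense.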

When looking at AI SAST products, find out where the rules come from:

  • Are they generic linters, or is there a research team tuning them for accuracy?
  • Are the rules tested against real code?
  • Do the findings include detailed context and remediation guidance?
  • Does the tool let you add natural language rules to the system? (This is key because, well, writing rules is no fun.)

All of these points benefit AI triage at scale by reducing the tokens needed to parse a code base.

Dataflow analysis

Let's suppose a rule flags the use of a vulnerable encryption function in two different places in the code. Finding those weaknesses doesn't mean they're true positives. Here's where dataflow analysis is useful. The AI SAST follows the data across multiple files and functions, performing a taint analysis that traces input from sources to sinks. The purpose of this step is to remove or deprioritize findings that aren't exploitable. (It's a bit like reachability analysis for software composition analysis, or SCA.) And while AI can do this, it's also beneficial for the tool to have a non-AI way of conducting program analysis to speed things up.
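The source-to-sink idea can be sketched in a few lines. This is a toy, straight-line illustration of my own (real dataflow analysis is interprocedural and path-sensitive, and these variable and sink names are hypothetical), but it shows how taint propagates and why the resulting path doubles as evidence:

```python
# Toy taint propagation: each statement is (target, deps, sink_call).
# Real dataflow analysis is interprocedural and path-sensitive; this
# sketch only shows the source-to-sink tracing idea.
def trace_taint(statements, sources, sinks):
    tainted = {s: [s] for s in sources}    # variable -> path from a source
    hits = []
    for target, deps, call in statements:
        for dep in deps:
            if dep in tainted:
                path = tainted[dep] + [target]
                tainted[target] = path
                if call in sinks:          # tainted data reaches a sink
                    hits.append((call, path))
    return hits

program = [
    ("user", ["request.args"], None),     # taint enters from a source
    ("query", ["user"], None),            # propagated via assignment
    ("result", ["query"], "db.execute"),  # flows into a SQL sink
]
print(trace_taint(program, {"request.args"}, {"db.execute"}))
# [('db.execute', ['request.args', 'user', 'query', 'result'])]
```

A finding with no path from an attacker-controlled source to a dangerous sink is exactly the kind that this step removes or deprioritizes.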

When you’re evaluating AI SAST products to see how they do dataflow analysis, ask:

  • How is the analysis performed? Is it done by AI or with program analysis, or both?
  • Can the tool handle multi-file, multi-function analysis?
  • What evidence is provided to justify whether the code is exploitable?
  • What percentage of false positives is the analysis able to detect?

You should expect the tool to show the path an attacker could take to exploit a weakness within the context of your application, turning hypothetical issues into actionable knowledge. Dataflow analysis is also a good use case for AI agents, so you might expect to see AI at this step.

Reasoning with LLMs

Not long ago, the combination of rules and analysis might have been considered adequate. But it still generates FPs because the tool is just flagging potential vulnerabilities without understanding what other compensating controls might be in place. The culprit is often a SAST tool’s inability to perform cross-file analysis, and unfortunately adding more rules can backfire. That’s because more patterns yield more findings, but without context, many of those findings will be of low quality. And of course, those older tools can’t catch complex logic flaws.

This is where AI SAST can add more value, by telling you if a finding is high-priority. Using AI-based triage, the tool can review findings in the context of the entire code base and any additional metadata, much like a human security expert would, to make final determinations and prioritizations. This final triage step can identify logic flaws, eliminate more FPs, or potentially downgrade the severity of findings based on specific runtime configurations, the relationships between components, or the nuances of business logic.

Some questions to ask the AI SAST vendor include:

  • Does the tool have cross-file awareness?
  • What kinds of files or documentation are fed into the tool's prompts?
  • Can the tool detect complex logic flaws?
  • Can the tool learn from engineer feedback?
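One way to picture the triage step described above (a sketch of my own, with hypothetical field names, not any vendor's pipeline) is as prompt assembly: the finding, its dataflow evidence, and the surrounding code context are packaged for an LLM to judge against compensating controls:

```python
import json

# Hypothetical sketch of assembling context for AI triage; the field
# names and the downstream LLM call are assumptions, not a real API.
def build_triage_prompt(finding: dict, context_files: dict) -> str:
    parts = [
        "You are triaging a static-analysis finding.",
        "Finding (with dataflow evidence): " + json.dumps(finding),
        "Relevant code and configuration:",
    ]
    for path, snippet in context_files.items():
        parts.append(f"--- {path} ---\n{snippet}")
    parts.append(
        "Considering compensating controls (middleware, framework "
        "validation, runtime configuration), answer with a severity of "
        "high, medium, low, or false-positive, plus a one-sentence "
        "justification."
    )
    return "\n\n".join(parts)

prompt = build_triage_prompt(
    {"rule": "weak-digest", "path": ["request.args", "db.execute"]},
    {"app/auth.py": "token = hashlib.md5(secret).hexdigest()"},
)
```

The point of the sketch is the shape of the input, not the model call: triage quality depends almost entirely on what context actually reaches the LLM, which is why the cross-file questions above matter.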

How will the vendor handle your data?

Finally, before you decide to try an AI SAST, be sure you understand the vendor’s data handling practices. Ask:

  • How is the analysis scoped?
  • Is my code retained?
  • Is my data used for training?
  • What can I opt out of, and how will that impact accuracy?

You might be tempted to say, "Well, I'll just bring my own model" (BYO LLM). That sounds like an easy fix, but hosting and maintaining your own LLM requires massive infrastructure that is neither easy nor cheap. A reasonable compromise is bringing your own API key, even something as simple as AZURE_OPENAI_API_KEY=your_azure_openai_api_key.

Is AI SAST for you?

If SAST has become a painful checkbox in your organization, with developers and security engineers alike bemoaning its existence, it's worth investigating whether an AI SAST is right for you. As AI coding tools improve, we'll reach a world where design, architecture, and logic risks are the only remaining flaws. Someday (perhaps soon), your first- or second-generation SAST may no longer detect the risks present in your code. AI SAST could well prepare you for that future.

Here's a quick reference comparing the pros and cons of each generation.

Traditional SAST (first generation)
  TL;DR: Slow but accurate
  Pros:
    • Best coverage possible
  Cons:
    • Slow, not ideal for agile workflows; happens very late in the SDLC
    • Limited customization options
    • Coverage comes at the cost of FPs
    • Can't detect complex business flaws (FNs)
    • Requires separate tools or processes

Rules-based SAST (second generation)
  TL;DR: Fast but noisy
  Pros:
    • Fast, CI/CD compatible
    • Highly customizable, tailored rules
    • Developer-oriented, seamless integration
  Cons:
    • Rule-dependent, may require expertise
    • Requires ensuring rules meet specific use cases (e.g., language support)
    • Speed comes at the cost of FNs and FPs
    • Can't detect complex business flaws (FNs)

AI SAST (third generation)
  TL;DR: Fast and accurate
  Pros:
    • Detects complex logic flaws (low FNs)
    • Understands code context (low FPs)
    • Potential for the agents to learn what matters
  Cons:
    • Must be comfortable with an LLM having access to code

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
