AI Set Up My Security Pipeline and Got It Wrong


The toughest bugs aren’t the ones that break everything; they’re the ones that slip through unnoticed. I let AI set up a security pipeline to save time.

It worked.

It passed.

The problem wasn’t AI’s ability to set up a security pipeline; it was how convincingly it appeared to get it right.

It was not correct. The setup used security standards from 2019 and pulled in a deprecated API. Everything appeared “valid”, but none of it was current, and that gap created a real AI security risk.

The Goal

The goal of this experiment was simple: automate the repetitive stuff. I set up a CI pipeline to handle:

  • Enforcing coding standards
  • Surfacing potential security issues
  • Providing early feedback on pull requests

These were all pretty standard checks. Nothing unusual here. I wanted to see what AI could do to speed up the process. It should have been straightforward.
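
To make that concrete, here is a minimal sketch of the shape that pipeline stage took. The tool choices are placeholders, assuming a Python codebase with flake8 for coding standards and bandit for security scanning; the actual stack doesn’t matter for the story.

```python
# ci_checks.py: a minimal sketch of the pipeline's three jobs.
# Assumes a Python codebase with flake8 (coding standards) and bandit
# (security scanning); the tool choices here are placeholders.
import subprocess
import sys

def run_check(name: str, cmd: list[str]) -> bool:
    """Run one check and print a pass/fail line for early PR feedback."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    passed = result.returncode == 0
    print(f"[{'PASS' if passed else 'FAIL'}] {name}")
    if not passed:
        print(result.stdout)  # surface the findings on the pull request
    return passed

checks = [
    ("coding standards", ["flake8", "src/"]),
    ("security scan", ["bandit", "-r", "src/"]),
]

# Run every check so all feedback lands at once, then fail if any failed.
results = [run_check(name, cmd) for name, cmd in checks]
if not all(results):
    sys.exit(1)
```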

The Setup

Initially, everything looked good. I got the configurations I needed, the workflow looked sound, and there were no immediate red flags. The pipeline ran successfully.

No errors, no warnings, and no obvious misconfigurations.

I ran an initial test using intentionally malformed code and known vulnerabilities to see whether the setup was actually working.
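
For illustration, the kind of canary I mean looks like this: a file seeded with patterns any current scanner should catch. I’m assuming a bandit scan of Python code here; the file and the rule IDs are illustrative, not from the original setup.

```python
# canary.py: an intentionally vulnerable fixture for validating the pipeline.
# If the security scan lets this file through, the scan itself is broken.
import subprocess

PASSWORD = "hunter2"  # hardcoded credential; bandit reports this as B105

def lookup(user_input: str) -> None:
    # Classic command injection: untrusted input interpolated into a shell
    # command. bandit reports shell=True subprocess calls as B602.
    subprocess.run(f"grep {user_input} /etc/passwd", shell=True)
```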

This is where the problems started to become visible.

The Problems

At first, nothing looked broken. But nothing was being detected either. Intentional vulnerabilities passed through every check. That is when it became clear the pipeline was failing silently.

Deprecated API

Part of the setup relied on an API version that had already been deprecated. Nothing flagged it as incorrect; the problem only surfaced during testing of the pipeline.
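
A cheap guard against this, sketched below, is to pin the API version explicitly and fail the build when that pin shows up on a known deprecation list. The version strings and the list are hypothetical placeholders; the point is that the check runs on every build instead of trusting the generated config.

```python
# check_api_version.py: fail fast when the pinned API version is deprecated.
# The version strings and the deprecation list are hypothetical placeholders.
import sys

PINNED_VERSION = "2019-10-01"  # the version the generated config pinned
DEPRECATED = {"2019-03-01", "2019-10-01"}  # kept current by hand or fetched

if PINNED_VERSION in DEPRECATED:
    print(f"ERROR: API version {PINNED_VERSION} is deprecated. Update the pin.")
    sys.exit(1)
print(f"OK: API version {PINNED_VERSION} is still supported.")
```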

Legacy security rules

The security rulesets were based on standards from 2019 and had gone years without a meaningful update. They technically executed, but the results weren’t reliable or useful. More importantly, they created a false sense of coverage.
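
A freshness guard like the sketch below would have surfaced this: record when the ruleset was last reviewed and fail CI once that date goes stale. The date field and the 180-day threshold are assumptions for illustration.

```python
# check_ruleset_age.py: fail CI when the security ruleset goes stale.
# Assumes the ruleset records a last-reviewed date; the hardcoded date
# and the 180-day threshold are illustrative.
import sys
from datetime import date, timedelta

MAX_AGE = timedelta(days=180)

# In a real setup, parse this out of the ruleset file's metadata.
last_reviewed = date(2019, 6, 1)

if date.today() - last_reviewed > MAX_AGE:
    print(f"ERROR: ruleset last reviewed {last_reviewed}; it needs a refresh.")
    sys.exit(1)
```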

The Real Problem

This wasn’t a broken pipeline. It was a misleading one, and that is the real AI security risk: a false sense of security. Because everything appeared “successful”, it was easy to assume everything was correct. The problem wasn’t that AI got it wrong; it was that it looked convincingly right.

At some point it stopped being productive to keep trying to fix it. Even if the setup could have been made to work, the approach had fundamental flaws:

  • The static rules were out of date.
  • Security configurations require ongoing maintenance.
  • Tool reliability depends on version and update frequency.

I was no longer debugging failures in the pipeline; I was debugging false confidence. So I changed direction.

The Change

Instead of relying on a single system to do everything, I split the work into layers, each with a single, narrow responsibility.

The final output still requires a human-reviewed decision based on each layer’s results. The key shift wasn’t replacing tools; it was reducing over-reliance on any single system to define what is correct. Now there are multiple signals. It is not about a perfect pipeline; it is about a quality one.
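
Roughly, the shape looks like this: each layer emits its own signal, and the pipeline surfaces all of them instead of collapsing everything into one pass/fail. The layer names and sample details are illustrative.

```python
# review_signals.py: each layer has one narrow responsibility and emits an
# independent signal; a human reads the full set on the pull request instead
# of trusting a single pass/fail. Layer names and details are illustrative.
from dataclasses import dataclass

@dataclass
class Signal:
    source: str   # which layer produced the signal
    ok: bool      # that layer's own verdict
    detail: str   # context a reviewer can act on

def collect_signals() -> list[Signal]:
    return [
        Signal("lint", True, "no style violations"),
        Signal("security-scan", True, "0 findings; ruleset reviewed recently"),
        Signal("dependency-audit", False, "1 package with a known advisory"),
    ]

# Surface every signal; the decision stays with a human reviewer.
for s in collect_signals():
    print(f"{'ok  ' if s.ok else 'WARN'} {s.source}: {s.detail}")
```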

Final Thoughts

A few things became clear along the way:

  • A passing CI does not mean the setup is correct.
  • AI is great at generating plausible configurations, even when they are outdated.
  • Outdated tools will fail quietly, requiring additional validation.
  • A human still has to interpret the signals and decide what to do with them.

The most dangerous bugs aren’t the ones that crash your system. They’re the ones that blend into it. If you have questions about how to meaningfully incorporate AI into your website framework, let’s talk.
