AI-Augmented SRE Workflows

Two recurring tasks were consuming hours of SRE time each week: manually reviewing Prisma Cloud vulnerability reports and investigating rate-limiting alerts by querying CloudFront and ALB logs. Both were analytical, multi-step, and followed a consistent logic — good candidates for automation. But neither fit neatly into a traditional script.

Why Skill Files, Not Scripts

Traditional scripts work well when the steps are deterministic. Vulnerability analysis isn’t: different CVE categories need different remediation paths, reports vary in format, and the output needs to be readable by a non-expert. A script would need hundreds of conditional branches and would still miss edge cases.

Skill files are structured instruction sets for AI CLI agents (Claude/Codex). Instead of encoding every branch, the skill describes the domain logic: what to look for, how to categorize, what format to output. The agent reasons through the specifics.

The Two Skills

Vulnerability Analysis Skill

Input: Prisma Cloud vulnerability report for the EC2 fleet
Process: the agent categorizes vulnerabilities by severity, determines the fix method (yum upgrade vs. manual patching), and identifies the packages involved
Output: structured summary of remaining P0/P1 vulnerabilities, CVEs fixable via yum upgrade, and affected packages per CVE
Time: 60 minutes manual → under 2 minutes
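
To make the target output concrete, here is a minimal Python sketch of the summary shape described above. It is illustrative only: in the skill the agent does this reasoning from instructions, not from code. The `Finding` fields, the `summarize` helper, and the rule that a CVE counts as yum-fixable only when every affected package has a fixed version are assumptions, not the Prisma Cloud report schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One row of a vulnerability report (hypothetical field names)."""
    cve: str
    severity: str       # e.g. "P0" or "P1"
    package: str
    fixed_version: str  # empty string when no packaged fix is known (assumed convention)


def summarize(findings):
    """Group findings into the three parts of the summary."""
    remaining = defaultdict(set)  # severity -> CVE ids
    packages = defaultdict(set)   # CVE id -> affected packages
    unfixed = set()               # CVEs with at least one package lacking a fix

    for f in findings:
        remaining[f.severity].add(f.cve)
        packages[f.cve].add(f.package)
        if not f.fixed_version:
            unfixed.add(f.cve)

    # Assumption: a CVE is yum-fixable only if all its packages have a fix.
    yum_fixable = sorted(set(packages) - unfixed)
    return {
        "remaining": {sev: sorted(cves) for sev, cves in sorted(remaining.items())},
        "yum_fixable": yum_fixable,
        "packages_per_cve": {cve: sorted(pkgs) for cve, pkgs in sorted(packages.items())},
    }


if __name__ == "__main__":
    # Placeholder CVE ids and versions, not real report data.
    demo = [
        Finding("CVE-0000-0001", "P0", "openssl", "1.1.1k"),
        Finding("CVE-0000-0001", "P0", "openssl-libs", "1.1.1k"),
        Finding("CVE-0000-0002", "P1", "kernel", ""),
    ]
    print(summarize(demo))
```

The sketch covers only the deterministic end state. Mapping varied report formats onto something like `Finding`, and deciding what "manual patching" means for a given CVE, is the part the skill leaves to the agent.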

DDoS Alert Investigation Skill

Input: PagerDuty alert with rate-limiting context
Process: the agent queries CloudFront logs (via Athena), correlates them with ALB logs, and identifies the source IP, targeted endpoint, user details, and attack pattern
Output: structured investigation report covering what happened, who was involved, where the traffic came from, and what was blocked
Time: 30+ minutes manual → under 5 minutes
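
As an illustration of the first step, the sketch below builds the kind of Athena query the agent might run to find the busiest source IPs and endpoints for a given day. The function name, the table name, and the column names (`request_ip`, `uri`, `status`, `date`) are assumptions and must be adapted to the actual CloudFront log table; the status code to filter on depends on how rate limiting is configured.

```python
from datetime import date


def build_top_talkers_query(table: str, day: date, status: int, limit: int = 20) -> str:
    """Return SQL listing the busiest (source IP, URI) pairs for one day.

    Column names are hypothetical and should match the real table definition.
    """
    # The table name is interpolated into the SQL, so accept only simple
    # identifiers made of letters, digits, underscores, and dots.
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")

    return (
        f"SELECT request_ip, uri, COUNT(*) AS hits\n"
        f"FROM {table}\n"
        f"WHERE \"date\" = DATE '{day.isoformat()}' AND status = {int(status)}\n"
        f"GROUP BY request_ip, uri\n"
        f"ORDER BY hits DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Hypothetical table name, day, and status code.
    print(build_top_talkers_query("cloudfront_logs", date(2024, 5, 1), status=429))
```

The resulting string could be submitted to Athena, for example with boto3's `start_query_execution`. Correlating the results with ALB logs and interpreting the attack pattern is where the skill relies on the agent's reasoning and not on a fixed query.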

Team Distribution

Both skills are packaged in a shared GitHub repo and installed with a single npx command. Any engineer can run the investigation without understanding log schemas or Athena query syntax, because the skill supplies that context.

Results

Metric	Before	After
Vulnerability analysis time	~60 min per report	Under 2 min
Alert investigation time	30+ min, inconsistent	Under 5 min, structured output
Who can investigate	SRE only	Any engineer (self-serve)
Process consistency	Variable	Uniform output format

The broader outcome: this established a pattern. Before writing a new Python script for a repetitive task, we now ask whether a skill file handles it better. Skills compose; scripts accumulate.