Kernel Upgrade Automation
Automated kernel and package upgrades across GoGuardian's EC2 fleet using Python and AWS Systems Manager — reducing security vulnerabilities by 80% and cutting the patch cycle from 2-3 days to 4 hours.
Kernel Upgrade Automation
Patching a fleet of EC2 instances manually is slow, error-prone, and doesn’t scale. Every patch cycle required SSH-ing into instances, running updates, validating services, and moving to the next — a 2-3 day process that accumulated security debt between cycles.
The Problem
Manual patching had three failure modes:
- Inconsistency — different engineers, different procedures, different outcomes
- Lag — vulnerability window between patch release and application stretched to days or weeks
- Toil — time spent patching was time not spent on anything else
Security scan reports showed the EC2 fleet consistently had a long tail of unpatched vulnerabilities because the manual process couldn’t keep up.
The Automation
Built a Python orchestration layer on top of AWS Systems Manager (SSM):
- Inventory first — query EC2 tags to build the target fleet; filter by environment, role, and patch window
- Pre-patch validation — check instance health, verify SSM agent connectivity, confirm no active deployments
- Staged execution — patch dev, wait and validate, then QA, then prod in batches
- SSM Run Command — executes
yum update-minimal --security(kernel + security packages only) on each instance; SSM handles parallel execution and output collection - Post-patch validation — check service health after reboot, flag any failures for manual review
- Reporting — generates per-run report: instances patched, packages updated, failures, before/after kernel versions
Results
| Metric | Before | After |
|---|---|---|
| Patch cycle duration | 2–3 days | 4 hours |
| Security vulnerabilities (EC2 fleet) | Baseline | 80% reduction |
| Process consistency | Manual, variable | Automated, uniform |
| Engineer time per cycle | ~2 days manual effort | ~30 min oversight |
The 80% vulnerability reduction came from two things: faster cycles (less accumulation between patches) and completeness (automation doesn’t miss instances or skip steps when time-pressured).