Kernel Upgrade Automation

Automated kernel and package upgrades across GoGuardian's EC2 fleet using Python and AWS Systems Manager — reducing security vulnerabilities by 80% and cutting the patch cycle from 2-3 days to 4 hours.

Tech Stack
Python, AWS SSM, AWS EC2, Automation, Security

Patching a fleet of EC2 instances manually is slow, error-prone, and doesn’t scale. Every patch cycle meant SSH-ing into each instance, running updates, validating services, and moving on to the next: a 2–3 day process that let security debt accumulate between cycles.

The Problem

Manual patching had three failure modes:

  1. Inconsistency — different engineers, different procedures, different outcomes
  2. Lag — vulnerability window between patch release and application stretched to days or weeks
  3. Toil — time spent patching was time not spent on anything else

Security scan reports showed the EC2 fleet consistently had a long tail of unpatched vulnerabilities because the manual process couldn’t keep up.

The Automation

Built a Python orchestration layer on top of AWS Systems Manager (SSM):

  • Inventory first — query EC2 tags to build the target fleet; filter by environment, role, and patch window
  • Pre-patch validation — check instance health, verify SSM agent connectivity, confirm no active deployments
  • Staged execution — patch dev, wait and validate, then QA, then prod in batches
  • SSM Run Command — executes yum update-minimal --security (kernel + security packages only) on each instance; SSM handles parallel execution and output collection
  • Post-patch validation — check service health after reboot, flag any failures for manual review
  • Reporting — generates per-run report: instances patched, packages updated, failures, before/after kernel versions
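
The inventory and staged-execution steps above can be sketched as a small pure-Python planner. This is a minimal sketch, not the project's actual code. The tag keys (`Environment`, `Role`, `PatchWindow`), the `Instance` record, the helpers `select_targets`, `plan_stages` and `rollout`, and the prod batch size are all hypothetical. In a real run the instance records would come from EC2 (for example `describe_instances` with `tag:` filters), and the `patch_batch` / `validate_batch` callables would wrap the SSM calls and health checks.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tag keys; the source does not describe the fleet's real tagging scheme.
ENV_TAG, ROLE_TAG, WINDOW_TAG = "Environment", "Role", "PatchWindow"


@dataclass
class Instance:
    instance_id: str
    tags: dict = field(default_factory=dict)


def select_targets(instances, patch_window, environments=None, roles=None):
    """Keep instances tagged for this patch window, optionally narrowed by environment and role."""
    targets = []
    for inst in instances:
        if inst.tags.get(WINDOW_TAG) != patch_window:
            continue
        if environments is not None and inst.tags.get(ENV_TAG, "").lower() not in environments:
            continue
        if roles is not None and inst.tags.get(ROLE_TAG) not in roles:
            continue
        targets.append(inst)
    return targets


def plan_stages(instances, prod_batch_size=10):
    """Order targets as dev, then qa, then prod split into fixed-size batches.

    Instances whose environment tag is not dev/qa/prod are left out of the plan.
    """
    by_env = {"dev": [], "qa": [], "prod": []}
    for inst in instances:
        env = inst.tags.get(ENV_TAG, "").lower()
        if env in by_env:
            by_env[env].append(inst.instance_id)
    plan = [(stage, by_env[stage]) for stage in ("dev", "qa") if by_env[stage]]
    prod = by_env["prod"]
    for start in range(0, len(prod), prod_batch_size):
        plan.append(("prod", prod[start:start + prod_batch_size]))
    return plan


def rollout(plan, patch_batch: Callable, validate_batch: Callable):
    """Patch batches in order and halt before the next batch if validation reports failures.

    Returns (completed_batches, halted_at), where halted_at is None or (stage, failed_ids).
    A soak/wait period between stages is omitted from this sketch.
    """
    completed = []
    for stage, batch in plan:
        patch_batch(stage, batch)
        failed = validate_batch(stage, batch)  # list of instance IDs that failed checks
        if failed:
            return completed, (stage, failed)
        completed.append((stage, batch))
    return completed, None


if __name__ == "__main__":
    fleet = [
        Instance("i-dev1", {"Environment": "dev", "PatchWindow": "tue-02"}),
        Instance("i-qa1", {"Environment": "qa", "PatchWindow": "tue-02"}),
        *[Instance(f"i-prod{n}", {"Environment": "prod", "PatchWindow": "tue-02"}) for n in range(5)],
        Instance("i-late", {"Environment": "prod", "PatchWindow": "thu-02"}),
    ]
    for stage, batch in plan_stages(select_targets(fleet, "tue-02"), prod_batch_size=2):
        print(stage, batch)
```

Halting the rollout at the first failed batch is what keeps a bad update from reaching the rest of prod: later batches are never patched until someone reviews the flagged instances.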
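
The SSM agent connectivity check and the Run Command step could look roughly like this with boto3's SSM client. This is a hedged sketch, not the project's code. It assumes the client is passed in (for example `boto3.client("ssm")`) and uses the managed `AWS-RunShellScript` document. The helpers `online_instances` and `run_patch`, the concurrency and error limits, and the polling interval are illustrative choices. `describe_instance_information`, `send_command` and `get_command_invocation` are real SSM APIs. Reboot handling and the `InvocationDoesNotExist` error that `get_command_invocation` can raise immediately after `send_command` are left out for brevity.

```python
import time

# Managed SSM document that runs shell commands on Linux instances.
DOCUMENT_NAME = "AWS-RunShellScript"
# Security-errata-only update, as described above; -y keeps yum non-interactive.
PATCH_COMMANDS = ["yum update-minimal --security -y"]
# Invocation statuses after which the result will not change any further.
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def online_instances(instance_information_list, candidate_ids):
    """Split candidates into (reachable, unreachable) by SSM agent PingStatus.

    `instance_information_list` is the InstanceInformationList returned by
    describe_instance_information; instances missing from it count as unreachable.
    """
    ping = {info["InstanceId"]: info.get("PingStatus") for info in instance_information_list}
    reachable = [i for i in candidate_ids if ping.get(i) == "Online"]
    unreachable = [i for i in candidate_ids if ping.get(i) != "Online"]
    return reachable, unreachable


def run_patch(ssm, instance_ids, *, max_concurrency="10", max_errors="1",
              poll_seconds=5.0, max_polls=360, sleep=time.sleep):
    """Send the patch command to one batch and poll until every invocation finishes.

    Returns (command_id, {instance_id: final_status}). SSM fans the command out
    in parallel itself, capped by MaxConcurrency and MaxErrors.
    """
    if not instance_ids:
        raise ValueError("no instances to patch")
    response = ssm.send_command(
        InstanceIds=list(instance_ids),
        DocumentName=DOCUMENT_NAME,
        Parameters={"commands": PATCH_COMMANDS},
        MaxConcurrency=max_concurrency,
        MaxErrors=max_errors,
        Comment="security patch run",
    )
    command_id = response["Command"]["CommandId"]
    results, pending = {}, list(instance_ids)
    for _ in range(max_polls):
        still_pending = []
        for instance_id in pending:
            invocation = ssm.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
            if invocation["Status"] in TERMINAL_STATUSES:
                results[instance_id] = invocation["Status"]
            else:
                still_pending.append(instance_id)
        pending = still_pending
        if not pending:
            return command_id, results
        # Wait before polling the still-running invocations again.
        sleep(poll_seconds)
    raise TimeoutError(f"command {command_id} still running on {pending}")
```

A typical call would filter a batch through `online_instances(ssm.describe_instance_information()["InstanceInformationList"], batch)` (paginated in practice), pass the reachable IDs to `run_patch(ssm, reachable)`, and send any instance whose final status is not `Success` to manual review. Injecting the client and the `sleep` function keeps the logic testable without AWS credentials.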
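
The post-patch validation and reporting steps come down to comparing per-instance facts gathered before and after the run. A minimal sketch, assuming each instance's kernel version was captured (for example with `uname -r` through Run Command) before patching and again after the reboot, and that a service health check produced a pass/fail flag. The `InstanceResult` record and the `needs_review`, `build_report` and `format_report` helpers are hypothetical names, not the project's code, and the version strings in the demo are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceResult:
    instance_id: str
    kernel_before: str                 # e.g. `uname -r` output before patching
    kernel_after: str                  # same check after the reboot
    command_status: str                # final SSM invocation status, e.g. "Success"
    services_healthy: bool             # outcome of the post-reboot health check
    packages_updated: list = field(default_factory=list)


def needs_review(result):
    """Flag an instance if the patch command failed or its services are unhealthy."""
    return result.command_status != "Success" or not result.services_healthy


def build_report(results):
    """Summarize one run; an instance counts as patched only if it needs no review."""
    flagged = [r.instance_id for r in results if needs_review(r)]
    return {
        "instances_total": len(results),
        "instances_patched": len(results) - len(flagged),
        "packages_updated": sum(len(r.packages_updated) for r in results),
        "failures": flagged,
        "kernel_changes": {
            r.instance_id: (r.kernel_before, r.kernel_after)
            for r in results
            if r.kernel_before != r.kernel_after
        },
    }


def format_report(report):
    """Render the summary as plain text, one kernel change per line."""
    lines = [
        f"Instances patched: {report['instances_patched']}/{report['instances_total']}",
        f"Packages updated: {report['packages_updated']}",
    ]
    for instance_id, (before, after) in sorted(report["kernel_changes"].items()):
        lines.append(f"  {instance_id}: kernel {before} -> {after}")
    if report["failures"]:
        lines.append("Flagged for manual review: " + ", ".join(report["failures"]))
    return "\n".join(lines)


if __name__ == "__main__":
    run = [
        InstanceResult("i-a", "5.10.1", "5.10.2", "Success", True, ["kernel", "openssl"]),
        InstanceResult("i-b", "5.10.1", "5.10.1", "Failed", True),
        InstanceResult("i-c", "5.10.1", "5.10.2", "Success", False, ["kernel"]),
    ]
    print(format_report(build_report(run)))
```

Keeping the report as a plain dictionary first makes it easy to emit the same data as text, JSON, or a chat notification without changing the validation logic.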

Results

Metric | Before | After
--- | --- | ---
Patch cycle duration | 2–3 days | 4 hours
Security vulnerabilities (EC2 fleet) | Baseline | 80% reduction
Process consistency | Manual, variable | Automated, uniform
Engineer time per cycle | ~2 days of manual effort | ~30 min of oversight

The 80% vulnerability reduction came from two things: faster cycles (fewer vulnerabilities accumulate between patches) and completeness (automation doesn’t miss instances or skip steps under time pressure).