Jenkins EC2 to EKS Migration
Migrated Jenkins from always-on EC2 agents to dynamic Kubernetes pods on EKS, cutting CI/CD infrastructure costs by 50% and reducing peak agent wait times from 20–30 minutes to under 45 seconds.
Jenkins was running on four always-on EC2 agents that sat idle most of the day but couldn’t keep up during peak hours. Engineers routinely waited 20–30 minutes for a build agent during morning deploys. The infrastructure cost was high; the developer experience was worse.
The Problem
The EC2 model had two fundamental issues: fixed capacity and always-on cost. You either over-provision (pay for idle) or under-provision (queue builds). There was no middle ground, and maintenance overhead was entirely on the SRE team.
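The trade-off can be made concrete with a toy model. The sketch below compares an always-on fleet with pod-per-build capacity over one day. The hourly demand profile is hypothetical, chosen only to show a morning peak; it is not data from this migration, and the model ignores pod start-up time, node overhead, and carry-over of queued builds.

```python
# Hypothetical demand: concurrent builds wanted in each hour of one day.
# Illustrative figures only, not measurements from the migration.
DEMAND = [0] * 8 + [10, 8, 4, 2, 2, 2, 3, 2, 1, 1] + [0] * 6  # 24 entries


def fixed_fleet(demand, agents):
    """Always-on fleet: every agent is paid for every hour; excess builds queue."""
    agent_hours_paid = agents * len(demand)
    build_hours_queued = sum(max(0, wanted - agents) for wanted in demand)
    return agent_hours_paid, build_hours_queued


def on_demand(demand):
    """Pod-per-build: capacity follows demand, so nothing idles or queues."""
    return sum(demand), 0


for size in (4, 10):
    paid, queued = fixed_fleet(DEMAND, size)
    print(f"fixed fleet of {size}: {paid} agent-hours paid, {queued} build-hours queued")

paid, queued = on_demand(DEMAND)
print(f"on demand: {paid} agent-hours paid, {queued} build-hours queued")
```

With this made-up profile, a fleet of four is paid for 96 agent-hours and still queues builds at the peak, a fleet of ten clears the queue but is paid for 240 agent-hours, and on-demand capacity pays only for the 35 agent-hours actually used.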
The Migration
Moved to an EKS-based Jenkins setup where build agents are Kubernetes pods — created on demand when a build starts, destroyed when it finishes. Key decisions:
- Custom Docker images for each agent type (JDK, Python, Docker-in-Docker) stored in ECR
- EBS snapshot for Jenkins home directory migration — zero job config loss, zero secrets re-entry
- Helm chart for Jenkins controller so upgrades are a helm upgrade, not an SSH session
- Parallel testing period: old EC2 agents stayed live for two weeks while the new setup ran in parallel, with a clean rollback path
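To illustrate what "agents are Kubernetes pods" looks like in practice, here is a minimal sketch that builds a Pod manifest for each agent type and prints it as JSON. It is not the configuration used in this migration: the registry account and region, image names, tags, labels, and resource requests are all placeholders. The only convention borrowed from the Jenkins Kubernetes plugin is naming the agent container `jnlp`.

```python
import json

# Placeholder ECR registry; the account ID and region are assumptions.
ECR_REGISTRY = "123456789012.dkr.ecr.us-east-1.amazonaws.com"

# Hypothetical image names for the three agent types listed above.
AGENT_IMAGES = {
    "jdk": f"{ECR_REGISTRY}/jenkins-agent-jdk:latest",
    "python": f"{ECR_REGISTRY}/jenkins-agent-python:latest",
    "dind": f"{ECR_REGISTRY}/jenkins-agent-dind:latest",
}


def agent_pod(agent_type, image):
    """Build a minimal Pod manifest for a one-shot build agent."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            # generateName lets the API server append a unique suffix per build.
            "generateName": f"jenkins-agent-{agent_type}-",
            "labels": {"app": "jenkins-agent", "agent-type": agent_type},
        },
        "spec": {
            # The pod exists for a single build and is never restarted.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "jnlp",  # agent container name expected by the plugin
                    "image": image,
                    # Illustrative requests; real values depend on the workload.
                    "resources": {"requests": {"cpu": "500m", "memory": "1Gi"}},
                }
            ],
        },
    }


if __name__ == "__main__":
    for agent_type, image in AGENT_IMAGES.items():
        print(json.dumps(agent_pod(agent_type, image), indent=2))
```

In a real setup the plugin creates these pods from pod templates defined in the Jenkins configuration, so a manifest like this would normally live in the Helm values or a pipeline definition rather than in a script.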
Results
| Metric | Before | After |
|---|---|---|
| CI/CD infrastructure cost | Baseline | 50% reduction |
| Agent wait time (peak) | 20–30 min | Under 45 sec |
| Queue failures | Frequent | Zero |
| Upgrade process | Manual SSH | helm upgrade |
The biggest quality-of-life win: engineers stopped timing deploys around queue availability. Builds start in under 45 seconds even at peak, and the cluster scales down to zero agents overnight.