Open to work
Hi, I'm Hemanth
Site Reliability Engineer
I'm an SRE with 5+ years of experience in cloud infrastructure, Kubernetes, and CI/CD automation on AWS. At GoGuardian, I drive reliability and cost efficiency across EKS-based systems — including AI-augmented workflows that cut hours of manual security analysis to minutes.
I'm drawn to problems where automation has a clear leverage point: a 90-minute DDoS outage that prompted a full architecture redesign, a 3-day manual patch cycle that became a 4-hour automated run, a 60-minute vulnerability review that now takes 2 minutes. My focus is building infrastructure that's observable, secure, and boring to operate.
Professional Experience
Site Reliability Engineer
- Migrated Jenkins from EC2 to EKS — cut CI/CD costs by 50%, eliminated 20–30 min agent queues
- Designed CloudFront + WAF DDoS defense that blocked 4 attacks over 2 years with zero production impact
- Led EKS cluster upgrade v1.23 → v1.28 via blue-green strategy with under 5 min of user-facing impact
- Built AI skill files (Claude/Codex) cutting vulnerability analysis from 60 min to under 2 min — enabling engineers to self-serve security investigations without SRE involvement
- Automated kernel patching across the EC2 fleet, reducing security vulnerabilities by 80% and cutting the patch cycle from 2–3 days to 4 hours
- Developed Python automation scripts reducing manual operational effort by 70%, freeing teams to focus on higher-impact work
- Built centralized Datadog dashboards covering health across all services, traffic breakdowns, and DDoS monitoring; defined SLOs with burn-rate alerts to surface reliability risk before customer impact
- Owned end-to-end incident management — configured PagerDuty routing and escalation policies, authored runbooks for common failure scenarios, and led post-mortem reviews
- Designed disaster recovery strategy with defined RTO targets: stateless workloads reprovisionable in <1 hr via Terraform; databases restorable in 1–6 hrs with tested restore procedures
- Centralized secrets management with AWS Secrets Manager, eliminating hardcoded credentials and enforcing least-privilege IAM policies across services
- Led MongoDB Atlas version upgrades via staged rollout (dev → QA → prod), validated index integrity post-upgrade, and maintained a tested revert plan
DevOps Engineer
- Replaced Cluster Autoscaler with Karpenter — 25% compute cost reduction, node startup 3–5 min → 45 sec
- Built GitLab CI/CD pipelines from scratch across 5+ microservices, reducing manual intervention by 60%
- Developed Python scripts to automate log analysis, reporting, and data processing — 300% increase in processing speed
- Designed AWS VPC architecture with public/private subnet segmentation, routing tables, IGWs, and security group/NACL rules across dev, QA, and production
- Set up and managed GKE clusters including node pool configuration, workload deployment, and version upgrades across environments
- Managed EKS cluster upgrades across environments with zero downtime
- Mentored 2 junior engineers on CI/CD, Kubernetes, and cloud infrastructure — both delivering independently within 3 months
Technical Skills
Cloud Platforms
Infrastructure as Code
Containers & Orchestration
CI/CD
Programming & AI
Observability
Security
Incident Management
Databases
Education
Bachelor of Engineering
Electronics and Communication Engineering
Visvesvaraya Technological University
2015 – 2019