We are seeking a Senior Cloud Platform Engineer (7+ years of experience) to lead the development, automation, and reliability of our AWS-based infrastructure. This role requires deep expertise in Terraform, Amazon EKS, AWS Lambda (supporting PHP-based microservices), and observability tooling. You’ll collaborate closely with engineering, product, and data teams to ensure the performance, scalability, and resilience of our cloud-native systems.
Key Responsibilities:
- Provision and maintain EKS clusters that run production MongoDB and Elasticsearch workloads, using official operators and Helm charts
- Re-deploy and operate Terraform-based infrastructure, including networking, IAM, compute, and stateful services (EKS, RDS, Redis)
- Support and maintain compute platforms, including EC2-based services (with AMI builds via Packer) and PHP Laravel microservices deployed on AWS Lambda using the Bref runtime
- Manage CI/CD pipelines using Jenkins (with custom shared libraries in Groovy) and Bitbucket Pipelines
- Manage observability tools such as Prometheus, Grafana, CloudWatch, New Relic, and StatusCake across infrastructure, APM, and alerting layers
- Manage secrets, IAM and KMS policies, and secure cross-account integrations (via VPC peering or AWS PrivateLink)
- Handle VPN infrastructure and manage infrastructure access for globally distributed teams
- Lead root cause investigations and performance optimizations for key services
- Guide and mentor junior DevOps engineers on best practices, reviews, and design
Required Experience & Skills:
- 7+ years of experience in DevOps or Cloud Infrastructure roles
- Expertise with AWS core services: EKS, Lambda, EC2, S3, IAM, Route 53, RDS, etc.
- Advanced experience with Terraform for managing AWS and Kubernetes resources
- Hands-on experience running stateful workloads (MongoDB/Elasticsearch) in Kubernetes using Helm and custom operators
- Solid understanding of container orchestration: Kubernetes (EKS preferred)
- Must hold a valid Certified Kubernetes Administrator (CKA) credential
- Strong CI/CD exposure: Jenkins, Bitbucket Pipelines, Groovy-based shared libraries
- Familiarity with PHP environments, including Laravel on Lambda (Bref)
- Deep understanding of monitoring/observability tools: Prometheus, Grafana, CloudWatch, New Relic
- Linux system administration, shell scripting, and at least one scripting language (Python/Ruby preferred)
- Experience working in multi-account AWS environments with VPC peering, NAT gateways, or PrivateLink
Nice to Have:
- Past ownership of complex migrations (K8s clusters, account transfers, IAM realignments)
- Background in infrastructure cost control (Savings Plans, RI management)
- Strong automation experience using Packer, Ansible, and scripting for operational workflows such as AMI creation, queue processing, and scheduled system reporting
- Familiarity with data pipelines or ETL processes, including DMS
- Security-first mindset, with focus on secrets management, identity policies, and audit compliance