Senior DevOps Engineer (remote) - AWS Cloud Hosting Platform
Platinumlist
- Kazakhstan
- Permanent position
- Full-time
- Own production reliability on AWS: availability, latency, throughput, capacity, and incident response.
- Architect and operate scalable infrastructure (multi-AZ as a baseline; DR strategy and regular testing).
- Build and maintain Infrastructure as Code (Terraform / CloudFormation / CDK) and Git-based workflows.
- Improve CI/CD pipelines and deployment strategies (blue/green, canary, progressive delivery).
- Implement strong observability: metrics, logs, traces, alerting, dashboards; define SLO/SLI and reduce noise.
- Own database operations on AWS (Aurora/RDS MySQL): backups/restores (including restore drills), read replicas, performance troubleshooting, and capacity planning.
- Improve caching and traffic handling (CDN, Redis/ElastiCache, queues) to sustain peak demand.
- Harden security posture: IAM least privilege, secrets management, patching, WAF, audit trails.
- Drive adoption of relevant AWS managed services (where it increases reliability and reduces ops burden).
- Drive cloud cost efficiency (FinOps): cost visibility, tagging, budgets/alerts, rightsizing, and smart usage of AWS pricing models without compromising reliability.
- Lead post-incident reviews (RCA, corrective actions, prevention), and ensure improvements are implemented and verified.
- 10+ years of experience in a similar role.
- Strong hands-on AWS experience in production (typical stack: VPC, IAM, EC2, ALB/NLB, Auto Scaling, S3, CloudFront, Route 53, CloudWatch/CloudTrail, WAF; plus Aurora/RDS).
- Proven experience designing/operating high-load web systems with strict uptime requirements.
- IaC and automation mindset (Terraform/CloudFormation/CDK, plus scripting in Bash/Python).
- Production MySQL on AWS (Aurora/RDS): backups & restores (including restore drills), read replicas, monitoring, and performance troubleshooting.
- Ability to troubleshoot production web stacks (Nginx + PHP-FPM) and identify bottlenecks across app ↔ DB ↔ infrastructure.
- Containers and deployment automation (ECS/EKS, Docker; understanding of scaling and rollout patterns).
- Solid Linux + networking fundamentals (DNS, TLS, routing, LB, troubleshooting).
- Observability practices and incident management experience.
- Must be reachable for critical production incidents; occasional after-hours support may be required (critical-only).
- PHP ecosystem familiarity (PHP-FPM/Nginx, Composer; Laravel/Symfony is a plus).
- MySQL internals/performance tuning and advanced replication/proxying (e.g., ProxySQL).
- Serverless & event-driven AWS (Lambda, SQS/SNS, EventBridge, Step Functions).
- Security & compliance frameworks; chaos testing/load testing.
- Competitive salary.
- Remote-friendly work setup.
- A chance to make a real impact in a fast-growing market.
- Space to grow, experiment, and push boundaries.