Senior DevOps Engineer (remote) - AWS Cloud Hosting Platform

Platinumlist

Kazakhstan
Permanent position
Full-time

Posted 1 month ago

About Us: Platinumlist, a pioneering leader in the online event guide and ticketing solution industry, has been revolutionizing the event landscape in the Gulf region since 2009. As the largest ticketing provider in the GCC, we proudly serve an extensive array of events across the United Arab Emirates, Saudi Arabia, Oman, Bahrain, Qatar, and Kuwait from our Dubai-based headquarters.

About the Role: We’re looking for a Senior DevOps / SRE Engineer to own and evolve our AWS infrastructure with a strong focus on reliability, scalability, performance under peak load, and safe delivery of new AWS capabilities. You’ll partner with engineering teams to ensure our platform stays fast and resilient during traffic spikes while continuously improving automation, observability, security, and cost efficiency.

Key Responsibilities:

Own production reliability on AWS: availability, latency, throughput, capacity, and incident response.
Architect and operate scalable infrastructure (multi-AZ as a baseline; DR strategy and regular testing).
Build and maintain Infrastructure as Code (Terraform / CloudFormation / CDK) and Git-based workflows.
Improve CI/CD pipelines and deployment strategies (blue/green, canary, progressive delivery).
Implement strong observability: metrics, logs, traces, alerting, dashboards; define SLO/SLI and reduce noise.
Own database operations on AWS (Aurora/RDS MySQL): backups/restores (including restore drills), read replicas, performance troubleshooting, and capacity planning.
Improve caching and traffic handling (CDN, Redis/ElastiCache, queues) to sustain peak demand.
Harden security posture: IAM least privilege, secrets management, patching, WAF, audit trails.
Drive adoption of relevant AWS managed services where they increase reliability and reduce ops burden.
Drive cloud cost efficiency (FinOps): cost visibility, tagging, budgets/alerts, rightsizing, and smart usage of AWS pricing models without compromising reliability.
Lead post-incident reviews (RCA, corrective actions, prevention), and ensure improvements are implemented and verified.

Requirements

10+ years of experience in a similar role.
Strong hands-on AWS experience in production (typical stack: VPC, IAM, EC2, ALB/NLB, Auto Scaling, S3, CloudFront, Route53, CloudWatch/CloudTrail, WAF; plus Aurora/RDS).
Proven experience designing/operating high-load web systems with strict uptime requirements.
IaC and automation mindset (Terraform/CloudFormation/CDK + scripting Bash/Python).
Production MySQL on AWS (Aurora/RDS): backups & restores (including restore drills), read replicas, monitoring, and performance troubleshooting.
Ability to troubleshoot production web stacks (Nginx + PHP-FPM) and identify bottlenecks across app ↔ DB ↔ infrastructure.
Containers and deployment automation (ECS/EKS, Docker; understanding of scaling and rollout patterns).
Solid Linux + networking fundamentals (DNS, TLS, routing, LB, troubleshooting).
Observability practices and incident management experience.
Must be reachable for critical production incidents; occasional after-hours support may be required, for critical issues only.

Nice-to-have:

PHP ecosystem familiarity (PHP-FPM/Nginx, Composer; Laravel/Symfony is a plus).
MySQL internals/performance tuning and advanced replication/proxying (e.g., ProxySQL).
Serverless & event-driven AWS (Lambda, SQS/SNS, EventBridge, Step Functions).
Security & compliance frameworks; chaos testing/load testing.

Benefits

Competitive salary.
Remote-friendly work setup.
A chance to make a real impact in a fast-growing market.
Space to grow, experiment, and push boundaries.
