Tech Lead Manager
Super Dispatch
- Kazakhstan
- Permanent position
- Full-time
- Own the technical roadmap for the Platform squad, including infrastructure modernization, reliability improvements, and cost optimization.
- Make and document architecture decisions (ADRs) that affect the entire engineering organization - service decomposition, API contracts, database strategies, and infrastructure patterns.
- Design and implement cross-cutting platform capabilities: identity/auth services, observability pipelines, deployment infrastructure, and security controls.
- Drive API-first design practices across services, including OpenAPI specifications and generated client libraries for Go, Python, and Java consumers.
- Evaluate and adopt new technologies and tools (e.g., transitioning observability to Datadog, implementing row-level security in databases, adopting infrastructure-as-code with Terraform).
- Lead cost analysis and optimization of cloud infrastructure.
- Be the go-to technical escalation point for platform-related questions from all product squads.
- Own incident response for platform-level outages (SEV-0/SEV-1), coordinating across squads to restore service.
- Define and maintain runbooks, monitoring alerts, and escalation procedures.
- Conduct post-incident reviews and drive follow-up action items to prevent recurrence.
- Set and track platform reliability metrics (uptime, latency percentiles, deployment frequency, MTTR).
- Design and implement resilience patterns: circuit breakers, graceful degradation, database failover strategies.
- Contribute directly to platform services - writing production code in Python, Go, or other languages as needed.
- Review code and architecture proposals from platform engineers and cross-squad contributions to shared infrastructure.
- Manage deployment configurations (Kubernetes manifests, Helm charts, ArgoCD), secrets management (Vault, 1Password), and CI/CD pipelines (GitHub Actions).
- Set high standards for coding, testing, deployment, and monitoring practices within the squad and across the organization.
- Manage a team of ~4-5 platform engineers with regular 1:1s, career development conversations, and performance reviews.
- Coach engineers on both technical depth and breadth - helping backend engineers grow into infrastructure and reliability expertise.
- Identify hiring needs and technical skill gaps; lead recruiting efforts for the platform squad.
- Onboard new team members effectively, building their context on a complex, cross-cutting codebase.
- Foster a collaborative culture where product squads feel supported (not blocked) by the platform team.
- Delegate effectively - empower team members to own subsystems while maintaining architectural coherence.
- Proactively communicate platform changes, maintenance windows, and new capabilities to engineering and non-engineering stakeholders.
- Partner with product squad EMs to understand their infrastructure needs and pain points.
- Coordinate with Security and Compliance on audit logging, access controls, and data protection requirements.
- Collaborate with Data/Analytics teams on database access policies, ETL pipelines, and data governance.
- Technically deep - you can debug a production database replication issue at 2 AM, design a new service architecture on a whiteboard, and review a Kubernetes deployment manifest with equal confidence.
- Proactive - you act without being told what to do. You identify reliability risks before they become incidents and technical debt before it slows the team.
- Pragmatic - you make sound trade-offs between engineering perfection and business velocity. You know when 4ms response time is good enough and when to stop optimizing.
- Move fast - you execute quickly and get things done, while maintaining the quality bar expected of platform infrastructure.
- Growth driven - you seek growth in learning and efficiency, and you celebrate wins.
- Customer focus - you treat product squads as your customers and empathize with their needs and constraints.
- Strong communicator - you can explain a database failover strategy to engineers and a platform investment to leadership with equal clarity. You communicate comfortably in English (speaking and writing).
- You have strong opinions, loosely held - you drive decisions forward while remaining open to better ideas.
- You can evaluate complex system designs and identify where they will break at scale.
- You balance "build vs. buy" decisions thoughtfully, considering long-term maintenance burden.
- You write clear ADRs and technical documentation that help future engineers understand the "why" behind decisions.
- You stay current with infrastructure and platform engineering trends without chasing every new tool.
- You can manage engineers with different skill sets from your own.
- You communicate expectations clearly, and you solicit and deliver feedback frequently.
- You run effective 1:1s, planning sessions, and retrospectives.
- You can develop processes and remove hurdles to facilitate great execution.
- You have a high tolerance for ambiguity, especially around organizational boundaries.
- You value empathetic and direct communication, particularly when giving and receiving feedback.
- Advanced-level English skills, especially speaking and writing.
- At least 7 years of experience as a software engineer, with at least 3 years in infrastructure, platform, or SRE roles.
- At least 2 years of experience managing a team of 3-10 engineers.
- Deep hands-on experience with cloud infrastructure (GCP or AWS), Kubernetes, and container orchestration.
- Strong background in at least two of Java, Python, and Go, with willingness to work across all three.
- Production experience with relational databases at scale (PostgreSQL), including replication, failover, and performance tuning.
- Experience with message brokers (RabbitMQ, Kafka, or similar) and event-driven architectures.
- Experience with observability and monitoring tools (Datadog, Grafana, Prometheus, or similar).
- Track record of leading incident response for production systems and driving reliability improvements.
- Experience with CI/CD pipeline design and deployment automation.
- Demonstrated ability to make architectural decisions and communicate them clearly through ADRs or similar documentation.
- Experience with Elasticsearch at scale (cluster management, index optimization, migration strategies).
- Experience building identity/authentication services.
- Experience with infrastructure-as-code (Terraform, Pulumi).
- Experience with API-first design and OpenAPI code generation workflows.
- Experience managing platform/infrastructure teams at startups during periods of rapid growth.
- Experience rolling out engineering practices and processes where they didn't exist before.
- Experience managing partially or entirely remote teams across multiple time zones.
- Familiarity with cost optimization strategies for cloud infrastructure.
- Experience with database security controls.