DevOps/Kubernetes Engineer
- Almaty
- Permanent position
- Full-time
- Maintain and improve managed Kubernetes clusters (control plane, node pools, autoscaling, PDB, network policies).
- Support API and ML workloads.
- Set up monitoring, alerting, logging, backups, and disaster recovery procedures.
- Investigate and resolve incidents, including on-call participation.
- Research, optimize, and automate the current infrastructure setup.
- Experience designing and maintaining production Kubernetes clusters (preferably with autoscaling, PDB, and network policies).
- Confident use of Deployments, StatefulSets, Ingress, RBAC, StorageClass, Helm/Kustomize.
- Experience integrating Kubernetes with cloud providers and their managed offerings (EKS, GKE, AKS, etc.).
- Understanding of Linux kernel internals, the networking stack, cgroups, and namespaces.
- Ability to diagnose performance issues (CPU, memory, IO, network).
- Deploying and managing GPU-based applications in Kubernetes.
- Basic knowledge of CUDA, NVIDIA drivers, and nvidia-device-plugin.
- Experience monitoring GPU utilization, memory, thermals, and errors.
- Integrating external services into Kubernetes (logging, monitoring, security, storage).
- Building monitoring and alerting aligned with SLO/SLA targets; end-to-end incident analysis.
- Writing runbooks and automating routine operations.