DevOps/k8s engineer

Tothemoon

  • Almaty
  • Permanent position
  • Full-time
  • Posted 1 day ago
About Tothemoon
Tothemoon is a user-centric, multiservice digital assets trading platform. At Tothemoon, we prioritize what matters most in finance: reliability. Whether it’s buying, selling, exchanging, or investing in cryptocurrencies, you can trust us to protect your financial interests and propel you towards a prosperous future. Join a rapidly growing community of users who choose Tothemoon for their digital transactions.
We offer hands-on experience, challenging tasks, and opportunities for professional and career growth within a dynamic fintech project. We’re looking for a specialist to test our product, including the mobile and web applications, as well as APIs and backend services.
Key Responsibilities
Production infrastructure operations and development (90%)
  • Maintain and improve managed Kubernetes clusters (control plane, node pools, autoscaling, PDB, network policies).
  • Support API and ML workloads.
  • Set up monitoring, alerting, logging, backups, and disaster recovery procedures.
  • Investigate and resolve incidents, including on-call participation.
R&D and automation (10%)
  • Research, optimize, and automate the current infrastructure setup.
Tech Stack / Core of the Project
  • Orchestration: Kubernetes (multi-pool, autoscaling, GPU workloads)
  • GPU / ML: NVIDIA H100, NVIDIA stack (CUDA, drivers, nvidia-device-plugin), LLM inference
Requirements
Deep Kubernetes experience (3+ years):
  • Designing and maintaining production clusters (preferably with autoscaling, PDB, network policies).
  • Confident use of Deployments, StatefulSets, Ingress, RBAC, StorageClass, Helm/Kustomize.
  • Experience running Kubernetes on cloud providers' managed services (EKS, GKE, AKS, etc.).
Strong Linux background:
  • Understanding of kernel operations, networking stack, cgroups, and namespaces.
  • Ability to diagnose performance issues (CPU, memory, IO, network).
GPU and high-load ML/LLM experience (a strong advantage):
  • Deploying and managing GPU-based applications in Kubernetes.
  • Basic knowledge of CUDA, NVIDIA drivers, and nvidia-device-plugin.
  • Experience monitoring GPU utilization, memory, thermals, and errors.
Operational and integration experience:
  • Integrating external services into Kubernetes (logging, monitoring, security, storage).
  • Building monitoring and alerting aligned with SLO/SLA standards; incident analysis end-to-end.
  • Writing runbooks and automating routine operations.
Why Join Us
  • A senior-level team and a friendly, collaborative environment open to innovation and experimentation.
  • Real technical challenges: high load, performance optimization, GPU infrastructure, and real-time workloads.
  • A product team, not outsourcing: your contribution directly impacts the company’s core technology.
  • Opportunities for professional growth and development in AI, ML infrastructure, and blockchain computing.
  • Supportive culture and a comfortable, modern workspace.
Conditions
  • Format: On-site work in Almaty, Kulan Business Center.
  • Compensation: Competitive salary in USDT or fiat, including paid vacation and sick leave.
  • Benefits: Comfortable office and free lunches.
  • Schedule: Full-time, flexible working hours.
