DevOps Engineer (AI Infrastructure)

Armeta KZ

  • Nur-Sultan
  • Permanent position
  • Full-time
  • Posted 22 days ago
Armeta Inc. is developing advanced AI-driven systems that transform how large-scale engineering and construction projects are evaluated and approved. Our technology automates complex, compliance-heavy processes, ensuring accuracy and trustworthiness.

We are building a high-performance, on-premise computing platform to power our complex multi-agent, data, and backend systems, and we are looking for a DevOps engineer to build and manage this critical infrastructure.

Key Responsibilities
  • Design, build, and maintain our high-availability on-premise infrastructure, built on Kubernetes and bare-metal servers (including supercomputers and NVIDIA DGX systems).
  • Develop and manage robust CI/CD pipelines (e.g., GitLab CI, Jenkins) for automated building, testing, and deployment of all services.
  • Manage the deployment, scaling, and operation of our core technology stack, including:
      – Backend microservices (FastAPI);
      – AI multi-agent systems and LLM-serving platforms;
      – Distributed compute clusters (specifically Ray);
      – Object storage systems (specifically MinIO).
  • Implement and manage comprehensive monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, ELK/Loki) to ensure system health and performance.
  • Manage NVIDIA DGX hardware, including GPU drivers, CUDA, and high-performance networking (e.g., InfiniBand).
  • Automate infrastructure provisioning and configuration management using IaC tools (e.g., Ansible, Terraform).
  • Work closely with the AI and backend teams to ensure a smooth, reliable path from research and development to production.
  • Implement and maintain on-premise security best practices, including network policies, access control, and vulnerability management.
Qualifications
  • Expert-level knowledge of Kubernetes (K8s) and the container ecosystem (Docker).
  • Proven experience managing on-premise, bare-metal server environments. Experience with public cloud (AWS, GCP) is a plus, but on-premise expertise is essential.
  • Strong experience with CI/CD tools (e.g., GitLab CI, Jenkins, GitHub Actions).
  • Strong experience with Infrastructure as Code (IaC) tools (especially Ansible, Terraform).
  • 5+ years of hands-on experience in DevOps, SRE, or a similar role.
  • Deep understanding of networking principles (TCP/IP, load balancing, firewalls, VPCs).
  • Proficiency in scripting and automation (e.g., Python, Bash).
  • Experience with monitoring and logging stacks (e.g., Prometheus, Grafana).
Preferred Qualifications (Bonus Points)
  • Strong experience with MLOps tools and platforms (e.g., Kubeflow, MLflow, Seldon Core, KServe).
  • Hands-on experience with NVIDIA GPU management, CUDA, and the NVIDIA GPU Operator for K8s.
  • Direct experience deploying and managing Ray clusters.
  • Direct experience deploying and managing MinIO clusters.
  • Experience with high-performance networking (e.g., InfiniBand).
  • Experience with distributed storage systems (e.g., Ceph).
Armeta AI is a U.S.-based startup building an engineering intelligence platform that automates construction project reviews and compliance. Our full-stack AI system understands both engineering drawings and technical documents, performing multimodal reasoning across codes, standards, and design disciplines.

Unique Advantage
Armeta AI has exclusive access to 20+ years of proprietary engineering archives through a family-owned engineering group, along with data from joint ventures and partnerships with Technip Energies, Maire Tecnimont, and FLSmidth. This unmatched data foundation enables us to train highly specialized AI models that general-purpose systems cannot replicate.

Current Focus
We are running a national pilot with Kazakhstan’s permitting authority to automate the review and approval of construction projects, creating a repeatable model for governments and large enterprises worldwide. Our early traction validates the demand for domain-specific AI that brings speed, accuracy, and compliance to the construction ecosystem.

Our mission is to turn the world’s static engineering knowledge into structured, actionable intelligence, transforming how projects are designed, verified, and approved. Armeta AI is building the foundation for a smarter, faster, and more transparent built world.

HeadHunter