Posted 12 days ago

Infrastructure Support Engineer (GPUs)

Work format: onsite
Employment type: full-time
Grade: middle
English: B2
Country: Singapore

Job description

TL;DR

Infrastructure Support Engineer (GPUs): Maintaining and troubleshooting high-performance GPU cloud infrastructure for AI workloads, with an emphasis on service reliability and rapid incident response. The role focuses on managing Kubernetes clusters, Linux-based systems, and GPU-specific diagnostics to ensure seamless AI development for customers.

Location: Singapore (with availability to travel to Nscale or customer locations)

Company

Nscale is a GPU cloud provider engineered specifically for AI startups and large enterprises to reduce the complexity of AI development.

What you will do

  • Handle day-to-day tickets and alerts within the support duty rotation, escalating complex incidents to Engineering.
  • Resolve common issues using established runbooks, and contribute improvements and incremental fixes to those runbooks.
  • Monitor, troubleshoot, and triage platform issues, capturing logs and facts for efficient handover.
  • Collaborate with cross-functional teams and serve as the escalation point for onsite operations staff.
  • Document validated steps and contribute to training materials to build team capability.
  • Identify and implement automation opportunities to optimize support processes.

Requirements

  • 2-4 years of experience in support, operations, or infrastructure engineering, ideally within cloud or data centre environments.
  • Proficiency in Linux CLI, system services, filesystems, permissions, and basic networking tools.
  • Solid grasp of networking basics: IP addressing, subnets, VLANs, routing, DNS, and firewalls.
  • Exposure to Kubernetes core concepts (nodes, pods, services, logs) and basic troubleshooting.
  • Familiarity with GPU diagnostics such as nvidia-smi.
  • Ability to write simple Bash or Python scripts and use Git for version control.

Nice to have

  • Hands-on experience with Kubernetes administration, operators, or specialized storage/networking add-ons.
  • Knowledge of RDMA/InfiniBand, HPC concepts, and NCCL for performance troubleshooting.
  • Experience with Infrastructure as Code tools like Ansible or Terraform.
  • Participation in GitOps and CI/CD pipelines using GitHub Actions.
  • Experience with security tooling such as Teleport or Vault.

Culture & Benefits

  • Culture of relentless innovation, ownership, and accountability.
  • Commitment to openness, transparency, and an open-source approach to build trust.
  • Dedicated focus on sustainability and reducing the environmental impact of AI technologies.
  • Fast, efficient, and respectful collaboration within a global team.
  • Inclusive environment with an equal opportunities statement for diverse backgrounds.
