Updated 21 days ago

Staff Infrastructure Engineer (Storage)

Work format: onsite
Employment type: full-time
Grade: lead
English: B2
Country: US

Job description

TL;DR

Staff Infrastructure Engineer (Storage): Designing and operating large-scale distributed storage platforms for high-performance AI/ML workloads, with an emphasis on system resilience, scalability, and performance tuning. Focus on integrating Ceph with Kubernetes and resolving complex bottlenecks across disk subsystems and RDMA network paths.

Location: Las Vegas, Nevada (Must be authorized to work in the United States)

Company

TensorWave provides seamless and resilient AI compute at scale via a versatile cloud platform that eliminates infrastructure barriers for AI builders.

What you will do

  • Design and evolve storage architectures supporting Kubernetes and high-performance compute workloads, prioritizing resilience and failure-domain awareness.
  • Own production storage platforms, including Ceph (RBD, CephFS, RGW) and high-performance NAS (Weka, VAST).
  • Lead lifecycle operations: cluster design, deployment, scaling, upgrades, and migrations.
  • Analyze storage performance (IOPS, throughput, latency) and resolve bottlenecks across disk subsystems and network paths.
  • Implement Kubernetes storage patterns including CSI drivers and StorageClasses for stateful workloads.
  • Develop automation for storage deployment and lifecycle management using Ansible, Terraform, and Helm.

Requirements

  • 7+ years of experience in infrastructure, storage, or distributed systems.
  • Deep hands-on experience with Ceph (RBD, CephFS, RGW) in production environments.
  • Experience with high-performance storage platforms such as Weka or VAST Data.
  • Strong Linux systems expertise and the ability to troubleshoot across storage, network, and compute layers.
  • Must have valid authorization to work in the United States.

Nice to have

  • Experience supporting AI/ML or HPC workloads.
  • Familiarity with NVMe-based architectures and RDMA or high-throughput Ethernet.
  • Experience integrating storage with Kubernetes at scale across multiple data centers.
  • Exposure to object storage and S3-compatible APIs.

Culture & Benefits

  • Equity through stock options.
  • 100% paid medical, dental, and vision insurance.
  • Company contributions to a Health Savings Account (HSA) and a 401(k) plan.
  • Flexible PTO and paid holidays.
  • Comprehensive insurance coverage, including short- and long-term disability and life insurance.
  • Parental leave and various in-office perks.
