Posted 25 days ago

Principal Software Engineer (Compute Fleet Management)

Work format: onsite
Employment type: full-time
Grade: senior
English: B2
Country: US

Job description

TL;DR

Principal Software Engineer (Compute Fleet Management): Lead the technical direction for Roblox’s compute fleet management layer across provisioning, data plane, control plane, and internal self-serve capacity products, with an emphasis on Kubernetes-style declarative control planes, fleet-wide automation contracts, and keeping compute supply and demand balanced at scale. The role focuses on designing reliable, secure fleet operations for on-prem and cloud Kubernetes clusters while writing code daily on the hardest systems and implementation problems.

Company

Roblox builds tools and a platform that help its community create and run 3D immersive digital experiences.

What you will do

  • Serve as overall technical lead for three Fleet Management pods, aligning technical direction across provisioning, data plane, and control plane surfaces.
  • Architect declarative, Kubernetes-style control planes for operating the compute fleet across on-prem and cloud, including reconciliation and scalable exposure of capacity.
  • Own internal customer contracts and APIs that govern automation across the fleet so infrastructure teams can operate capacity safely and predictably.
  • Drive self-serve capacity strategy via internal-facing products and UIs for requesting, managing, and reasoning about compute.
  • Centralize and raise the bar on security, maintenance operations, and uptime for all Roblox Kubernetes clusters, ensuring fleet-wide changes ship reliably.
  • Write code daily and partner with stakeholders to understand compute needs and drive innovation for backend services, AI, and edge computing.

Requirements

  • 10+ years of experience building and operating large-scale distributed systems and infrastructure.
  • Proven technical leadership as an organization’s technical anchor across multiple teams, with the ability to set direction and raise engineering standards.
  • Strong proficiency in Go and deep experience designing and operating production services at fleet scale.
  • Hands-on experience building declarative, Kubernetes-style control planes and reconciliation patterns.
  • Strong proficiency with gRPC for service-to-service APIs and with SQL/Postgres for durable, high-scale state.
  • Experience operating compute capacity across both on-prem data centers and cloud providers at the scale of hundreds of thousands of instances.

Culture & Benefits

  • Onsite schedule for headquarters-based roles: onsite Tuesday, Wednesday, and Thursday, with optional presence on Monday and Friday.
  • Full-time employees are eligible for equity compensation and benefits.
  • Focus on solving unique technical challenges at scale and building safer, more civil shared experiences.
  • Equal employment opportunity and reasonable accommodations during the recruiting process.

Hiring process

  • Interviews and evaluations focused on technical leadership, systems design, and hands-on engineering depth.
  • Additional checks for US-based roles regarding work authorization and potential H-1B sponsorship constraints.

Location: San Mateo, CA (onsite Tuesday–Thursday; optional Monday/Friday)
