Company hidden
4 days ago

Principal Site Reliability Engineer

Work format
remote (USA only)
Employment type
fulltime
Grade
senior
English
b2
Country
US

Job description

TL;DR

Principal Site Reliability Engineer (Kubernetes/AWS/GCP): Shape the long-term strategy and architecture for cloud and on-premise infrastructure powering a high-demand sports betting and gaming platform, with an emphasis on Kubernetes reliability, scalability, and operational consistency. Focus on defining SLOs/error budgets, building automation-first infrastructure (Infrastructure as Code, GitOps, self-healing), leading major incidents and post-incident improvements, and mentoring senior engineers to elevate platform resilience and developer experience.

Location: Remote (US)

Company

hirify.global is a publicly traded technology company powering sports betting and gaming.

What you will do

  • Define and execute long-term strategy for the Kubernetes platform across Google Kubernetes Engine, Amazon Elastic Kubernetes Service, RKE2, and on-premise environments.
  • Drive architectural decisions for cluster lifecycle management, networking, identity and access management, observability, autoscaling, capacity planning, and cost optimization.
  • Lead large-scale platform initiatives across multiple engineering teams, setting technical direction, standards, and measurable reliability outcomes.
  • Establish and evolve reliability practices using SLOs, SLIs, and error budget frameworks aligned to business priorities.
  • Build automation-first infrastructure with Infrastructure as Code, GitOps workflows, self-healing systems, and internal platform tooling.
  • Lead critical platform incidents and drive post-incident improvements to strengthen resilience; mentor senior engineers through architecture reviews and coaching.

Requirements

  • Location: Must be based in the United States (Remote - US)
  • Bachelor’s degree in Computer Science or a related technical field.
  • At least 8 years of experience designing, operating, and scaling distributed cloud and on-premise infrastructure, including at least 3 years at Staff/Principal (or equivalent) technical leadership level.
  • Proven experience leading large-scale infrastructure or platform initiatives with cross-functional alignment and long-term technical ownership.
  • Deep expertise with Kubernetes (cluster architecture, networking, storage, security, operators, lifecycle management) and large-scale production operations.
  • Extensive production infrastructure experience on AWS and Google Cloud Platform using Infrastructure as Code (e.g., Terraform, Pulumi), plus strong software development experience in Go and/or Python.

Culture & Benefits

  • Opportunity to shape infrastructure strategy for one of the most demanding sports betting and gaming platforms.
  • Automation-first approach to improve engineering velocity and reduce operational overhead.
  • Responsible adoption of AI-powered engineering capabilities to improve operational efficiency and incident response.
  • Mentorship and technical leadership through architecture reviews, coaching, and measurable reliability outcomes.
  • Equal-opportunity employer; support through the licensing process if required by state gaming regulations.
