Staff Site Reliability Engineer (Cloud)
Job description
TL;DR
Staff Site Reliability Engineer (Cloud): Driving reliability, scalability, and operational excellence across a multi-cloud infrastructure, with an emphasis on embedding SRE practices into the product development lifecycle. Focus on defining production-readiness standards, leading incident management, and building paved roads for a globally distributed engineering team.
Company
The company is the team behind WebContainers and Bolt.new, an AI-powered app builder that enables developers to create, edit, and deploy full-stack applications directly in the browser.
What you will do
- Partner with development teams throughout the project lifecycle to design observable and scalable systems.
- Define and evolve production-readiness standards, launch checklists, and operational acceptance criteria.
- Establish meaningful SLIs, SLOs, and error budgets to guide engineering prioritization.
- Build frameworks and golden paths across AWS, GCP, and Azure using Terraform.
- Lead incident management and blameless postmortems to drive systematic improvements.
- Participate in an on-call rotation to ensure system reliability and resolve live incidents.
Requirements
- Strong verbal and written English communication skills required.
- Significant experience as an SRE, production engineer, or software engineer operating at scale.
- Fluency across AWS, GCP, and Azure with deep expertise in Terraform.
- Proficiency in TypeScript and Ruby on Rails to contribute to service codebases.
- Proven track record of technical leadership and influencing teams without formal authority.
- Ability to drive ambiguous, high-scope initiatives to completion with minimal oversight.
Nice to have
- Experience maturing an SRE practice at a growth-stage company.
- Background in embedded SRE roles partnering closely with product teams.
- Experience with chaos engineering, resilience testing, or progressive delivery practices.
Culture & Benefits
- Fully remote and globally distributed team environment.
- High-influence role with the opportunity to shape organizational reliability culture.
- Collaborative atmosphere focused on solving complex problems and shipping fast.
- Opportunity to work on cutting-edge AI-powered development tools.