Company hidden

1 day ago

Manager, Site Reliability Engineering

Work format

hybrid

Employment type

full-time

Grade

lead

English

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Manager, Site Reliability Engineering (SRE/DevOps): Lead a team of Site Reliability Engineers responsible for the reliability, scalability, and performance of hirify.global systems, with an emphasis on multi-cloud reliability strategy, incident response, and automation. The role focuses on building SLO/SLI/SLA practices, improving observability and deployment processes, and strengthening infrastructure resilience (high availability and disaster recovery), with continuous learning driven by RCA and post-mortems.

Location: Sofia, Bulgaria (hybrid)

Company

hirify.global is a travel technology company powering intelligent offer and revenue optimization for airlines.

What you will do

Lead and mentor the SRE team, driving reliability, accountability, and continuous improvement.
Develop and implement strategies for multi-cloud reliability, monitoring, and incident response.
Drive automation for deployment processes, infrastructure as code (IaC), and operational efficiency.
Manage observability tooling for logging, metrics, and alerting; establish SLOs/SLIs/SLAs.
Oversee root cause analysis (RCA) and post-mortems to improve systems and processes.
Ensure high availability and disaster recovery strategies are in place and regularly tested; optimize cloud infrastructure costs.

Requirements

7+ years of experience in software engineering, SRE, or DevOps, including 3+ years in a managerial or leadership role.
Strong cloud platform knowledge (Azure, AWS, IBM Cloud) and containerization (Docker, Kubernetes).
Proficiency with automation and configuration management tools (Terraform, Ansible, Puppet, The Foreman).
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, PagerDuty, Graylog).
Solid programming/scripting skills in Python, Go, Bash, or similar languages.
Expertise in CI/CD pipelines and modern deployment strategies; strong analytical and problem-solving skills.

Nice to have

Experience with large-scale distributed systems.
Knowledge of networking, security, and compliance best practices.
Experience with incident response and the ITIL framework.
Background in high-availability, customer-facing production environments.

Culture & Benefits

Flexible ways of working with a hybrid setup.
Culture focused on ownership, innovation, and care.
Continuous learning and support to grow and innovate.
Collaboration between software development and operations teams.

Hiring process

Interviews to assess leadership, SRE/DevOps experience, and technical depth across reliability, automation, and observability.
Discussion of collaboration approach and experience improving production reliability through incident management and RCA.
