Company hidden
Updated 18 days ago

Site Reliability Engineer (Kubernetes)

Work format
Remote
Employment type
Full-time
Grade
Middle
English
B2
Vacancy from Hirify.Global, a list of international tech companies

Job description

TL;DR

Site Reliability Engineer (Kubernetes): Improve the availability, performance, and scalability of large-scale, multi-cloud SaaS environments, with an emphasis on automation, observability, and incident response. The role also covers designing backend services and production engineering tools and integrating AI-assisted workflows to make operations more efficient.

Company

hirify.global is a software company providing a platform to manage, accelerate, and secure software delivery from code to production.

What you will do

  • Support the reliability, performance, and scalability of large-scale, multi-cloud, Kubernetes-based SaaS environments.
  • Investigate and troubleshoot production issues across distributed systems and infrastructure in collaboration with Engineering teams.
  • Design and develop backend services, internal platforms, and production engineering tools using Python or Go.
  • Improve observability and operational readiness through SRE practices, monitoring, and capacity planning.
  • Evaluate and contribute to AI-assisted automation solutions to improve troubleshooting and production workflows.
  • Participate in on-call rotations and lead incident response to ensure system stability.

Requirements

  • 2-4 years of experience in SRE, Production Engineering, or DevOps roles.
  • Hands-on experience with Kubernetes-based containerized workloads.
  • Experience with at least one public cloud provider: AWS, GCP, or Azure.
  • Proficiency in developing backend services or automation tools using Python, Go, or similar languages.
  • Strong understanding of Linux fundamentals, networking, and production troubleshooting.
  • Familiarity with CI/CD tools and observability platforms like Prometheus or Grafana.

Nice to have

  • Experience using AI-assisted operational workflows for log analysis or incident triage.
  • Familiarity with agentic automation frameworks such as LangGraph or LangChain.
  • Experience with AI-assisted development tools like GitHub Copilot or Cursor.

Culture & Benefits

  • Opportunity to work on a mission-critical platform used by the majority of the Fortune 100.
  • Collaborative, impact-driven environment centered on modern SRE practices.
  • Continuous learning culture with exposure to cutting-edge technologies and AI integration.
