Назад
1 месяц назад

Performance Engineer (AI Inference)

350 000 - 850 000$
Формат работы
hybrid
Тип работы
fulltime
Английский
b2
Страна
US
Вакансия из списка Hirify.GlobalВакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:
/

TL;DR

Performance Engineer (AI Inference): Developing and optimizing high-throughput inference systems for Claude with an accent on throughput, latency, reliability, and correctness. Focus on cross-layer performance investigations, building observability tools, and bridging the gap between actual fleet performance and theoretical rooflines.

Location: Hybrid; must be based in or attend one of the offices (San Francisco, New York City, or Seattle) at least 25% of the time.

Salary: $350,000 - $850,000 USD

Company

Anthropic is a public benefit corporation focused on creating reliable, interpretable, and steerable AI systems for the benefit of society.

What you will do

  • Conduct cross-layer performance investigations to identify root causes for gaps in throughput, latency, and reliability.
  • Own and improve the correctness evaluation pipeline to validate model output quality across hardware platforms and serving configurations.
  • Develop observability dashboards and modeling tools to make system interactions legible across the stack.
  • Partner with kernel, serving, routing, and capacity teams to implement high-impact optimizations.
  • Prioritize and stack-rank a large surface area of optimization opportunities based on impact and effort.

Requirements

  • Hands-on experience in performance engineering, including profiling, roofline analysis, and root-cause investigation in production systems.
  • Proficiency in Python and the ability to instrument and contribute to large production codebases.
  • Strong data analysis skills using SQL, pandas, or similar tools.
  • Ability to communicate quantitative results clearly to influence priorities across teams.
  • Genuine interest in correctness as an engineering discipline, including numerics and regression detection.
  • Must be based in or able to attend the US offices (SF, NYC, or Seattle) at least 25% of the time.

Nice to have

  • Experience with ML systems, specifically training or inference infrastructure and LLM serving stacks.
  • Familiarity with GPU/TPU/accelerator performance concepts such as memory bandwidth and quantization.
  • Reliability engineering experience for high-throughput services, including autoscaling and load balancing.
  • Experience building observability or telemetry for distributed systems.
  • Experience with model evaluation or numerical regression-detection pipelines.

Culture & Benefits

  • Collaborative research environment based on the "big science" approach.
  • Competitive compensation and optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours and high-quality collaborative office spaces.
  • Visa sponsorship available for qualified candidates.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →