Job description
TL;DR
Performance Engineer (AI Inference): Develop and optimize high-throughput inference systems for Claude, with an emphasis on throughput, latency, reliability, and correctness. The role focuses on cross-layer performance investigations, building observability tools, and closing the gap between actual fleet performance and theoretical rooflines.
Location: Hybrid; must be based in or attend one of the offices (San Francisco, New York City, or Seattle) at least 25% of the time.
Salary: $350,000 - $850,000 USD
Company
Anthropic is a public benefit corporation focused on creating reliable, interpretable, and steerable AI systems for the benefit of society.
What you will do
- Conduct cross-layer performance investigations to identify the root causes of gaps in throughput, latency, and reliability.
- Own and improve the correctness evaluation pipeline to validate model output quality across hardware platforms and serving configurations.
- Develop observability dashboards and modeling tools to make system interactions legible across the stack.
- Partner with kernel, serving, routing, and capacity teams to implement high-impact optimizations.
- Prioritize and stack-rank a large surface area of optimization opportunities based on impact and effort.
Requirements
- Hands-on experience in performance engineering, including profiling, roofline analysis, and root-cause investigation in production systems.
- Proficiency in Python and the ability to instrument and contribute to large production codebases.
- Strong data analysis skills using SQL, pandas, or similar tools.
- Ability to communicate quantitative results clearly to influence priorities across teams.
- Genuine interest in correctness as an engineering discipline, including numerics and regression detection.
- Must be based in, or able to attend, one of the US offices (SF, NYC, or Seattle) at least 25% of the time.
Nice to have
- Experience with ML systems, specifically training or inference infrastructure and LLM serving stacks.
- Familiarity with GPU/TPU/accelerator performance concepts such as memory bandwidth and quantization.
- Reliability engineering experience for high-throughput services, including autoscaling and load balancing.
- Experience building observability or telemetry for distributed systems.
- Experience with model evaluation or numerical regression-detection pipelines.
Culture & Benefits
- Collaborative research environment based on the "big science" approach.
- Competitive compensation and optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours and high-quality collaborative office spaces.
- Visa sponsorship available for qualified candidates.