Data Scientist (AI)
Job description
TL;DR
Data Scientist (AI): Architect and maintain automated evaluation pipelines that assess answer quality for an LLM-first search engine, with an emphasis on designing evaluation sets for tool calls and developing VLM-based solutions for evaluating visual rendering. The role also involves continuous review of public benchmarks, and its evaluation metrics directly shape product changes.
Location: Hybrid in London, New York City, or Belgrade. USD salary ranges apply only to U.S.-based positions. International salaries are set based on the local market.
Salary: $210,000–$385,000
Company
Perplexity serves tens of millions of users daily with a reliable, high-quality LLM-first search engine and specialized data sources.
What you will do
- Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products.
- Design evaluation sets and methods specifically to measure the impact of tool calls on final answer quality.
- Develop VLM-based solutions to programmatically evaluate how final answers render visually across platforms and devices.
- Continuously review and incorporate public benchmarks into regular performance measurements.
- Collaborate closely with technical leadership to measure and improve Answer Quality.
Requirements
- PhD or MS in a technical field or equivalent experience.
- 4+ years of experience in data science or machine learning.
- Strong proficiency in Python and SQL (expected to write production-grade code).
- Experience building within a modern cloud data stack, specifically AWS and Databricks.
- Comfortable with agentic coding workflows and using AI-assisted development tools.
Nice to have
- 1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups.
- Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale.
- A strong research background, with experience applying research methods to real-world ML problems.
- Experience defining evaluation metrics and building ground truth datasets.
Culture & Benefits
- Comprehensive benefits program including equity, health, dental, vision, retirement, fitness, commuter, and dependent care accounts for U.S. employees.
- Full-time employees outside the U.S. enjoy a comprehensive benefits program tailored to their region of residence.
- Operate within a small, high-impact team.
- Evaluation metrics directly shape product changes.