Senior Software Engineer (AI)
Job description
TL;DR
Senior Software Engineer (AI): Building and optimizing distributed training infrastructure and scalable pipelines for large-scale foundation models, with an emphasis on GPU utilization, training performance, and model adaptation. The role focuses on designing and implementing efficient training systems, collaborating cross-functionally, and advancing scalable AI model training technology.
Company
Baseten powers inference for leading AI companies by uniting applied AI research, flexible infrastructure, and developer tooling, backed by $150M Series D funding.
What you will do
- Design, build, and maintain distributed training infrastructure for foundation models
- Implement scalable pipelines for fine-tuning and training on heterogeneous GPU clusters
- Optimize training performance using advanced techniques like FSDP, DDP, ZeRO, and mixed precision
- Develop frameworks and tooling to improve training workflow efficiency and reproducibility
- Collaborate with product and infrastructure teams to meet customer needs
- Research and productionize emerging training efficiency techniques
Requirements
- 5+ years of experience in ML infrastructure or distributed systems, including 2+ years in a tech lead or manager role
- Strong expertise in distributed training frameworks and GPU utilization
- Bachelor's degree or equivalent experience in Computer Science or a related field
- Excellent communication skills bridging technical and business needs
- Location: San Francisco or New York
Nice to have
- Experience building APIs, SDKs, or developer tools for ML workflows
- Familiarity with cluster management and scheduling tools
- Knowledge of parameter-efficient fine-tuning methods and evaluation pipelines
- Open-source contributions in distributed training or ML infrastructure
- Experience with cloud environments and container orchestration
Culture & Benefits
- Competitive compensation with meaningful equity
- Full medical, dental, and vision insurance coverage
- Generous PTO including company-wide Winter Break
- Paid parental leave and 401(k) plan
- Exposure to diverse ML startups and networking opportunities