AI Senior Staff Systems Engineer (AI Infrastructure)
Job description
TL;DR
AI Senior Staff Systems Engineer (AI Infrastructure): Lead the development, operations, and support of enterprise AI infrastructure, with an emphasis on high-performance GPU clusters and LLM deployment. The role focuses on architecting next-generation AI systems, optimizing inference throughput with vLLM/TGI, and building production-grade Agentic AI workflows.
Location: San Jose, CA
Salary: $136,500 – $253,500
Company
The company is a leading provider of intelligent system design software, enabling the development of the world's most advanced electronic products.
What you will do
- Design and implement next-generation AI infrastructure to support Agentic AI initiatives.
- Lead the configuration, installation, and optimization of on-premise GPU server clusters and storage solutions.
- Integrate and secure public cloud AI services, specifically Azure OpenAI and Google Cloud Platform (GCP) services like Gemini.
- Deploy and optimize Large Language Models (LLMs) using techniques like quantization and frameworks such as vLLM, TGI, and TensorRT-LLM.
- Architect production-grade Agentic AI workflows that integrate LLMs with external tools, APIs, and databases.
- Develop automation scripts in Python, Bash, or Perl and implement monitoring for GPU utilization and system health.
Requirements
- 10+ years of technical experience, with at least 5 years focused on HPC or AI infrastructure.
- Expert-level knowledge of NVIDIA GPU architecture, CUDA, and cuDNN.
- Proven experience managing access, usage, and billing for Azure OpenAI and GCP AI services.
- Extensive hands-on experience with Docker, Kubernetes, and Linux system administration (RHEL preferred).
- Proficiency in scripting languages such as Python, Bash, or Perl.
- Must be based in or able to work from San Jose, CA.
Nice to have
- Experience administering LSF clusters in production or research environments (Slurm is a plus).
- Understanding of AI job profiling and tuning (memory, GPU, I/O).
- Experience with macOS/Apple Silicon system administration and troubleshooting.
Culture & Benefits
- Competitive compensation package including bonus and equity.
- Comprehensive health benefits: medical, dental, and vision plan options.
- Financial security through a 401(k) plan with employer match and an employee stock purchase plan.
- Generous time off including paid vacation and paid holidays.