Posted 24 days ago

AI Research Engineer (LLM Inference)

Work format: hybrid
Employment type: full-time
English: B2
Country: France
Listed in Hirify Global, a list of international tech companies.

Job description

TL;DR

AI Research Engineer (LLM Inference): Designing and running experiments to understand how model architecture decisions propagate into LLM inference behavior, morphing open-weight models into architecture variants optimized for speed, and turning the results into measurable gains in generation speed and model quality. The emphasis is on inference-aware architecture research under hardware and distributed-communication constraints. Focus areas: scaling MoE inference, owning the post-training pipeline (fine-tuning, evaluation, adaptation), and writing up findings for top venues and conferences.

Location: Paris, France (hybrid, at least 50% of time in the Paris office)

Company

KOG builds an LLM inference engine optimized for high-throughput generation on standard datacenter GPUs.

What you will do

  • Design new model architecture variants (routing strategies, attention mechanisms, MoE structure) using execution constraints as a first-order input.
  • Extend the Laneformer thesis by exploring inference-aware architectural variants (e.g., DTP, Ladder Residual, PT-Transformer) and identifying what compounds at scale.
  • Own the post-training pipeline across fine-tuning, evaluation methodology, and adaptation of open-weight models toward inference-speed-optimized architecture variants.
  • Scale the stack to large MoE models (e.g., DeepSeek v4, Qwen 3), working through routing, expert parallelism, and inference-time communication patterns.
  • Write research papers, submit to top venues, and present at conferences.
  • Contribute to building AI agents that autonomously run architecture research and training experiments.

Requirements

  • Experience with complex AI problems and evidence of serious technical thinking (paper, repository, thesis, or equivalent technical work).
  • Strong understanding of Transformers and MoE, with enough depth to reason across trade-offs (including how communication structure and layer dependencies affect inference behavior).
  • Experience adapting or modifying existing model architectures and producing concrete results.
  • Comfort working at the intersection of model design and hardware constraints.
  • Ability to work in a hybrid setup with at least 50% of time in the Paris office.

Nice to have

  • Experience with post-training methods such as fine-tuning, preference optimization, or quantization.
  • Exposure to production-scale systems.

Culture & Benefits

  • Direct access to AMD and NVIDIA datacenter GPUs from day one.
  • Small team where creativity and technical judgment directly influence key decisions.
  • Work focuses on the critical path of model execution speed and its impact on system capabilities.
  • Remote-friendly hybrid model, with at least 50% of time spent in the Paris office.

Hiring process

  • Review of technical evidence (papers, repositories, theses, or equivalent projects) and discussion of relevant research/engineering work.
  • Interviews focused on architecture/inference reasoning and experimentation approach.