AI Performance Engineering — Course Navigator

Course

AI Performance Engineering 2026 — 14-week Nebius Academy fellowship, dual-track (engineering + product).

Format: Wednesdays 18:30 & Saturdays 12:30 (London time)
Venue: Wallacespace, London
Module 1 (AI Agents): 5 weeks · Module 2 (LLM Architectures): 8 weeks · Module 3 (MLOps): 6 weeks · Module 4 (Performance Engineering): 6 weeks · Module 5 (LLM Post-Training): 3 weeks

This Navigator

Open any module to get an embedded video player paired with hierarchical lecture notes (a "methodichka", i.e. a study guide) that switch automatically as the video plays, plus a clickable timeline of every topic and subtopic.

Topic — main section of the lecture (collapsible)

Subtopic — click to seek; notes auto-follow playback

Q&A — audience questions surfaced under each subtopic

Modules

Module 1: AI Agents · active

Foundations of agentic systems · Instructor: Rod Rivera (DevRel at Rasa · AI Professor at ITAM · Nebius Academy Lecturer)

Foundations of agentic systems: from LLM to agent, tools and MCP, reasoning and planning, memory architectures, and shipping a sovereign agent.

2:46:08 · Week 1 — From AI Model to AI Agent · 7 topics · 28 subtopics

2:23:20 · Week 2 — Building the "Hands" of Your Agent · 7 topics · 30 subtopics

2:26:13 · Week 3 — Reasoning & Planning for AI Agents · 7 topics · 30 subtopics

2:15:20 · Week 4 — The Agent Learns to Remember · 7 topics · 26 subtopics

2:32:54 · Week 5 — Sovereign Agent: Stopping the LLM from Lying to Us · 7 topics · 29 subtopics

Module 2: LLM Architectures · active

Mathematical foundations of modern LLMs · Instructor: Stan Fedotov and Nebius Academy LLM Architectures faculty

Mathematical foundations of modern LLMs: gradient optimization, neural networks, sequence models, attention and transformers, efficient fine-tuning (LoRA), pre-training and instruction tuning, and modern model architectures (MoE, GQA, RoPE).

2:30:39 · Week 1 — Welcome & LLM Architecture Foundations · 8 topics · 33 subtopics

1:58:30 · Week 2 — Gradient Optimization · 7 topics · 30 subtopics

2:53:30 · Week 3 — Understanding Neural Networks · 5 topics · 23 subtopics

2:44:43 · Week 4 — Neural Networks For Sequences · 7 topics · 24 subtopics

2:50:17 · Week 5 — Attention & Transformers · 8 topics · 28 subtopics

2:48:12 · Week 6 — Transformer Architecture & LoRA · 7 topics · 24 subtopics

2:40:13 · Week 7 — Pre-training & Fine-tuning · 6 topics · 22 subtopics

2:35:35 · Week 8 — Models Architecture · 6 topics · 20 subtopics

Module 3: MLOps · active

Deploying & operating ML systems · Instructor: Sergey Vasilinets and Nebius MLOps team

Self-hosting LLMs, distributed training, Kubernetes for AI workloads, LLM evaluation, experiment tracking with MLflow, production-grade LLM inference, vector databases, and storage for AI.

2:39:08 · Week 1 — Intro to MLOps & Self-hosting · 8 topics · 36 subtopics

2:14:27 · Week 2 — Distributed Training & Kubernetes · 8 topics · 35 subtopics

2:46:09 · Week 3 — Evaluating LLM-powered Systems · 7 topics · 32 subtopics

2:10:14 · Week 4 — Experiment Tracking with MLflow · 7 topics · 37 subtopics

2:14:52 · Week 5 — Production-grade LLM Inference · 7 topics · 30 subtopics

2:30:27 · Week 6 — Inference Wrap-up · Vector DBs · Storage for AI · 9 topics · 30 subtopics

Module 4: Performance Engineering · active

Inference performance, distributed training & GPU optimization · Instructors: Alexey Bukhtiyarov, Sergei Skvortsov, Ruslan Vasilev (Nebius)

Inference performance engineering: GPU architecture fundamentals, memory hierarchy and the compute-memory gap, roofline analysis, inference optimizations (KV-cache management, PagedAttention/vLLM, continuous batching, compilation, prefix caching), speculative decoding and quantization, distributed training (DDP, collectives, ZeRO/FSDP, tensor/sequence/context/expert/pipeline parallelism), and low-level GPU programming with CUDA, Triton, and FlashAttention.

2:47:10 · Week 1 — GPU & Inference Intro · 8 topics · 31 subtopics

2:42:49 · Week 2 — Inference Optimizations · 8 topics · 25 subtopics

2:05:33 · Week 3 — Speculative Decoding & Quantization · 8 topics · 25 subtopics

2:05:37 · Week 4 — Distributed Training I: DDP, Collectives & FSDP · 7 topics · 31 subtopics

2:28:40 · Week 5 — Model Parallelism: Tensor, Sequence, Context, Expert & Pipeline · 8 topics · 33 subtopics

2:09:49 · Week 6 — Low-level GPU Programming: CUDA, Triton & FlashAttention · 8 topics · 31 subtopics

Module 5: LLM Post-Training · active

Adapting LLMs after pre-training · Instructor: Anton Plaksin (Nebius)

LLM adaptation and post-training: evaluation and benchmarks, prompting, supervised fine-tuning and PEFT/LoRA, reward design (verifiable rewards, LLM judges, Bradley–Terry, reward hacking), policy-gradient RL from REINFORCE to GRPO, DAPO-style training tricks, PPO, and DPO — closing with the full modern post-training pipeline.

2:48:33 · Week 1 — LLM Adaptation: Evaluation, Prompting & Supervised Fine-Tuning · 10 topics · 37 subtopics

2:49:56 · Week 2 — RL Fine-Tuning: Reward Design & GRPO · 9 topics · 36 subtopics

2:07:34 · Week 3 — RL Fine-Tuning Algorithms II–III: DAPO Tricks, PPO & DPO · 8 topics · 26 subtopics

About

Each module's lecture navigator pairs the recorded video with a clickable list of timestamps. Click any entry on the right to jump the video to that point. Slide bookmarks come from the PDFs; topic markers come from transcript analysis (transition phrases such as "let's move on" and content shifts between speech windows). Topic markers are especially useful in lecturer-led demos, where slides advance slowly but the conversation moves through many practical points.
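As a rough illustration of the transition-phrase part of that analysis, the sketch below scans timestamped transcript cues for a small phrase list and reports where a new topic might begin. It is not the navigator's actual pipeline: the function names, the phrase list (only "let's move on" comes from the text above), and the sample cues are all assumptions made for the example, and the content-shift detection between speech windows is not covered.

```python
import re

# Hypothetical phrase list; only "let's move on" is mentioned in the text above.
TRANSITION_PHRASES = ("let's move on", "next topic", "moving on to")


def find_topic_markers(cues, phrases=TRANSITION_PHRASES):
    """Return start times (seconds) of cues containing a transition phrase.

    `cues` is an iterable of (start_seconds, text) pairs.
    Matching is case-insensitive and purely literal.
    """
    pattern = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    return [start for start, text in cues if pattern.search(text)]


def format_timestamp(seconds):
    """Format seconds as H:MM:SS, the style used for lecture durations."""
    seconds = int(seconds)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Made-up sample cues, for illustration only.
cues = [
    (0.0, "Welcome everyone, today we talk about agents."),
    (612.5, "Okay, let's move on to tool calling."),
    (1290.0, "Any questions so far?"),
    (1845.2, "Moving on to the demo."),
]

markers = find_topic_markers(cues)
print([format_timestamp(m) for m in markers])  # ['0:10:12', '0:30:45']
```

Literal phrase matching is cheap but brittle: real transcripts may use typographic apostrophes or paraphrase the transition, which is presumably why the analysis also looks at content shifts.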

Each module navigator links to the PDF and the external video. Subtitle files (SRT) are linked as well, so you can keyword-search the spoken content.
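For keyword searching a downloaded subtitle file, a minimal sketch along the following lines may be enough. It assumes the standard SRT layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then one or more text lines, with blank lines between cues); `parse_srt`, `search_srt`, and the sample subtitles are hypothetical names and data made up for this example.

```python
import re

# Timing line of a standard SRT cue, e.g. "00:10:12,500 --> 00:10:15,000".
SRT_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def parse_srt(text):
    """Parse SRT text into a list of (start_seconds, cue_text) pairs."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = SRT_TIME.search(line)
            if match:
                h, m, s, ms = (int(g) for g in match.groups()[:4])
                start = h * 3600 + m * 60 + s + ms / 1000
                # Everything after the timing line is the spoken text.
                cues.append((start, " ".join(lines[i + 1:]).strip()))
                break
    return cues


def search_srt(text, keyword):
    """Return cues whose text contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [(start, cue) for start, cue in parse_srt(text) if needle in cue.lower()]


# Made-up sample subtitles, for illustration only.
SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Welcome to the lecture.

2
00:10:12,500 --> 00:10:15,000
PagedAttention stores the KV cache
in fixed-size blocks.

3
00:30:45,200 --> 00:30:48,000
Continuous batching comes next.
"""

hits = search_srt(SAMPLE, "kv cache")
print(hits)
```

To search a real file, read it first (for example `open(path, encoding="utf-8").read()`) and pass the text to `search_srt`; the returned start times can then be matched against the navigator's timeline.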

Contact: m@zpj.wtf