AI Performance Engineering — Course Navigator

Nebius Academy 2026 · 4 of 4 modules with content (Module 4 in progress) · 19 lectures · 48h 45m · 135 topics · 543 subtopics with notes

Course

AI Performance Engineering 2026 — 14-week Nebius Academy fellowship, dual-track (engineering + product).

  • Format: Wednesdays 18:30 (London time)
  • Venue: Wallacespace, London
  • Module 1 (AI Agents): 5 weeks · Module 2 (LLM Architectures): 8 weeks · Module 3 (MLOps): 4 weeks · Module 4 (Performance Engineering): 2 weeks so far

This Navigator

Open any module to get an embedded video player paired with hierarchical lecture notes (a "methodichka", or study guide) that auto-switch as the video plays, plus a clickable timeline of every topic and subtopic.

Topic — main section of the lecture (collapsible)
Subtopic — click to seek; notes auto-follow playback

Modules

Module 1: AI Agents · active

Foundations of agentic systems · Instructor: Rod Rivera (DevRel at Rasa · AI Professor at ITAM · Nebius Academy Lecturer)

Foundations of agentic systems: from LLM to agent, tools and MCP, reasoning and planning, memory architectures, and shipping a sovereign agent.

Module 2: LLM Architectures · active

Mathematical foundations of modern LLMs · Instructor: Stan Fedotov and Nebius Academy LLM Architectures faculty

Mathematical foundations of modern LLMs: gradient optimization, neural networks, sequence models, attention and transformers, efficient fine-tuning (LoRA), pre-training and instruction tuning, and modern model architectures (MoE, GQA, RoPE).

1:58:30 · Week 2 — Gradient Optimization · 7 topics · 30 subtopics
2:53:30 · Week 3 — Understanding Neural Networks · 5 topics · 23 subtopics
2:44:43 · Week 4 — Neural Networks For Sequences · 7 topics · 24 subtopics
2:50:17 · Week 5 — Attention & Transformers · 8 topics · 28 subtopics
2:40:13 · Week 7 — Pre-training & Fine-tuning · 6 topics · 22 subtopics
2:35:35 · Week 8 — Models Architecture · 6 topics · 20 subtopics

Module 3: MLOps · active

Deploying & operating ML systems · Instructor: Sergey Vasilinets and Nebius MLOps team

Self-hosting LLMs, distributed training, Kubernetes for AI workloads, LLM evaluation, and experiment tracking with MLflow.

Module 4: Performance Engineering · in progress

Inference performance & GPU optimization · Instructor: Alexey Bukhtiyarov (Nebius)

Inference performance engineering: GPU architecture fundamentals, memory hierarchy and bandwidth, the compute-memory gap, the CUDA programming model, roofline analysis, and inference optimizations including KV-cache management, PagedAttention/vLLM, continuous batching, compilation (torch.compile, CUDA graphs), and prefix caching. In progress; more weeks to be added.

2:47:10 · Week 1 — GPU & Inference Intro · 8 topics · 31 subtopics
2:42:49 · Week 2 — Inference Optimizations · 8 topics · 25 subtopics

About

Each module's lecture navigator pairs the recorded video with a clickable list of timestamps. Click any entry on the right to jump the video to that point. Slide bookmarks come from the PDFs; topic markers come from transcript analysis (transition phrases like "let's move on" and content shifts between speech windows). Topic markers are especially useful in lecturer-led demos, where slides advance slowly but the conversation moves through many practical points.
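As a rough illustration of the transition-phrase half of that analysis, here is a minimal Python sketch. It is not the navigator's actual implementation: the `Cue` type, the phrase list beyond "let's move on", and the `min_gap` threshold are all assumptions made for the example, and detecting content shifts between speech windows is left out.

```python
from dataclasses import dataclass

# Hypothetical phrase list: the page names only "let's move on" as an example.
TRANSITION_PHRASES = ("let's move on", "moving on", "next topic")


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    text: str     # spoken text of one subtitle cue


def find_topic_markers(cues, phrases=TRANSITION_PHRASES, min_gap=60.0):
    """Return start times of cues that contain a transition phrase.

    A hit closer than min_gap seconds to the previous marker is skipped,
    so one spoken transition does not produce several timeline entries.
    min_gap is an assumed tuning knob, not a documented setting.
    """
    markers = []
    for cue in cues:
        text = cue.text.lower()
        if any(phrase in text for phrase in phrases):
            if not markers or cue.start - markers[-1] >= min_gap:
                markers.append(cue.start)
    return markers


cues = [
    Cue(0.0, "Welcome to week one."),
    Cue(600.0, "OK, let's move on to the memory hierarchy."),
    Cue(610.0, "So, moving on."),
    Cue(1500.0, "Let's move on to the roofline model."),
]
print(find_topic_markers(cues))  # [600.0, 1500.0]
```

The cue at 610 s is dropped because it falls within `min_gap` of the marker at 600 s; in practice the cues would be parsed from the linked SRT subtitles.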

Open the PDF or external video links inside any module navigator. Subtitles (SRT) are linked too, so you can keyword-search the spoken content.