AI Performance Engineering 2026 — 14-week Nebius Academy fellowship, dual-track (engineering + product).
Open any module to get an embedded video player paired with hierarchical lecture notes (methodichka) that auto-switch as the video plays, plus a clickable timeline of every topic and subtopic.
Foundations of agentic systems: from LLM to agent, tools and MCP, reasoning and planning, memory architectures, and shipping a sovereign agent.
Mathematical foundations of modern LLMs: gradient optimization, neural networks, sequence models, attention and transformers, efficient fine-tuning (LoRA), pre-training and instruction tuning, and modern model architectures (MoE, GQA, RoPE).
Self-hosting LLMs, distributed training, Kubernetes for AI workloads, LLM evaluation, and experiment tracking with MLflow.
Inference performance engineering: GPU architecture fundamentals, memory hierarchy and bandwidth, the compute-memory gap, CUDA programming model, roofline analysis, and inference optimizations including KV-cache management, PagedAttention/vLLM, continuous batching, compilation (torch.compile, CUDA graphs), and prefix caching. In progress; more weeks will be added.
Each module's lecture navigator pairs the recorded video with a clickable list of timestamps. Click any entry on the right to jump the video to that timestamp. Slide bookmarks come from the PDFs; topic markers come from transcript analysis (transition phrases like "let's move on" and content shifts between speech windows). Topic markers are especially useful in lecturer-led demos, where slides advance slowly but the conversation moves through many practical points.
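For readers curious how transcript-based topic markers could work, here is a minimal Python sketch. It is not the navigator's actual implementation: the phrase list (beyond "let's move on"), the word-overlap measure (Jaccard similarity), the threshold, and the function names are all assumptions chosen for illustration.

```python
import re

# Hypothetical phrase list; only "let's move on" comes from the description above.
TRANSITION_PHRASES = ("let's move on", "moving on", "next up")


def words(text):
    """Lowercased set of word tokens in one speech window."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-set overlap in [0, 1]; 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def topic_markers(windows, shift_threshold=0.1):
    """Return (start_seconds, reason) for windows that look like topic changes.

    windows: list of (start_seconds, text) speech windows in playback order.
    shift_threshold: assumed cutoff; overlap below it counts as a content shift.
    """
    markers = []
    prev_words = None
    for start, text in windows:
        cur_words = words(text)
        if any(phrase in text.lower() for phrase in TRANSITION_PHRASES):
            markers.append((start, "transition phrase"))
        elif prev_words is not None and jaccard(prev_words, cur_words) < shift_threshold:
            markers.append((start, "content shift"))
        prev_words = cur_words
    return markers
```

Each returned start time could then back one clickable entry in the timestamp list. A real pipeline would likely use something richer than word overlap to detect content shifts.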
The PDF and external video links are available inside every module navigator. Subtitle files (SRT) are linked as well, so you can keyword-search the spoken content.
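If you download an SRT file, a keyword search can be sketched in a few lines of Python. This assumes the common SRT layout (a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then one or more text lines, with blank lines between cues); the function name `search_srt` is hypothetical.

```python
import re

# Matches the start time of an SRT cue line such as "00:01:05,500 --> 00:01:09,000".
CUE_START = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> ")


def search_srt(srt_text, keyword):
    """Return (start_seconds, cue_text) for every cue containing the keyword."""
    hits = []
    needle = keyword.lower()
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_START.match(line.strip())
            if match:
                hours, minutes, seconds, _millis = map(int, match.groups())
                # Everything after the timing line is the spoken text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                if needle in text.lower():
                    hits.append((hours * 3600 + minutes * 60 + seconds, text))
                break
    return hits
```

The returned start times are in whole seconds, so they can be matched against the navigator's timestamps by hand.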