Spring 2025 CSCI 8980: Introduction to LLM Systems

Instructor: Zirui “Ray” Liu

Time: M/W 4:00–5:15 PM

Location: Amundson Hall 158

Course Description

Recent progress in Artificial Intelligence has been largely driven by advances in large language models (LLMs) and other generative methods. The success of LLMs rests on three key factors: the scale of the data, the size of the models, and the computational resources available. For example, Llama 3, a cutting-edge LLM with 405 billion parameters, was trained on over 15 trillion tokens using 16,000 H100 GPUs. Training, serving, fine-tuning, and evaluating such models therefore demand sophisticated engineering practices that leverage modern hardware and software infrastructure. Building scalable systems for LLMs is essential for further advancing AI capabilities. In this course, students will learn the design principles of good machine learning systems for LLMs. Specifically, the course covers three topics: (1) the components of modern transformer-based language models; (2) the fundamentals of GPU programming; and (3) how to train and deploy LLMs in a scalable way.

Grading Policy

The grading policy is subject to minor change.

The course will include five short quizzes and a final project. The final project can be completed individually or collaboratively in groups of up to three students. Presentations for the final project will occur during the last three weeks of the semester and will include a Q&A session. For the final project, students must select a research paper focused on MLSys. They will critically analyze the paper to identify its limitations or areas that could be improved. Based on this analysis, students are expected to propose and develop a solution to address the identified issues or introduce an innovative idea to enhance the research.

Acknowledgement

Many of the course materials are inspired by the excellent materials from CMU's Large Language Model Systems Course and CMU's Deep Learning Systems Course.

Course Schedule (tentative)

The course schedule is subject to change.

| Week | Topic | Papers & Materials | Slides | Quiz |
|------|-------|--------------------|--------|------|
| Week 1 | Course Overview | - | Slide | - |
| Week 2 | Automatic Differentiation — Reverse-Mode Automatic Differentiation | Wikipedia article on Automatic Differentiation; Andrej Karpathy's micrograd | Slide | - |
| | Automatic Differentiation — Computation Graph | PyTorch's blog on the Computation Graph; Andrej Karpathy's micrograd | - | - |
| Week 3 | Transformers — Multi-Head Attention, Layer Normalization, Feed-Forward Network | Sasha Rush's notebook: The Annotated Transformer; Andrej Karpathy's minGPT | Slide | - |
| | Llama's changes over the original Transformer — RMSNorm, SwiGLU activation, Gated MLP | The official Llama 1 report; PyTorch docs on RMSNorm; Noam Shazeer's report: GLU Variants Improve Transformer | - | - |
| Week 4 | Llama's changes over the original Transformer — Rotary Positional Embedding (RoPE) | EleutherAI's blog on Rotary Positional Embeddings; The original RoPE paper | Slide | - |
| | Pretraining Data Curation | The official Llama 3 report; Dolma corpus report | Slide | - |
| Week 5 | Post-Training Overview | TULU report | Slide | - |
| | GPU Programming Basics | CUDA C++ Programming Guide | Slide | - |
| Week 6 | Overview of Parallelism — Data, Pipeline, and Tensor Parallelism | HuggingFace's blog on multi-GPU training; GPipe | Slide | - |
| | ZeRO & Fully Sharded Data Parallel — Model & Optimizer State Sharding | ZeRO optimizer; PyTorch team's FSDP tutorial | Slide | - |
| Week 7 | Guest Lecture 1 – Byron Hsu from xAI | - | - | - |
| | Guest Lecture 2 – Guanchu Wang from Rice | - | - | - |
| Week 8 | Spring Break | - | - | - |
| Week 9 | Guest Lecture 3 – Yuke Wang from AWS/Rice | - | - | - |
| | Grading Policy + Performance Engineering | Horace He's post on making DL go Brrr | Slide | - |
| Week 10 | Inference Workload Overview + Continuous Batching | NVIDIA's blog on Inference Optimization | Slide | Quiz on Automatic Differentiation (3/24) |
| | Inference Workload Overview + KV Cache | - | Slide | - |
| Week 11 | Course Review | - | - | - |
| | Paged Attention | vLLM repo | - | Quiz on Positional Embedding (4/2) |
| Week 12 | FlashAttention | FlashAttention repo | Slide | - |
| | Quantization | GPTQ; SmoothQuant; AWQ | Slide | - |
| Week 13 | Presentation Policy + Mixture of Experts | DeepSpeed-MoE; DeepSeek-MoE | Slide | Quiz on Parallelism (4/14) |
| | Diffusion | DDPM; DDIM | Slide | - |