CS 5470
Last Updated
- Schedule of Classes - June 11, 2025 2:48PM EDT
Course Description
Course information provided by the 2025-2026 Catalog.
Systems for Large-scale ML is a new advanced course at Cornell University designed to give students hands-on expertise in designing and operating scalable machine learning systems. With the rise of large ML models such as GPT, LLaMA, and DeepSeek, the ability to tackle the systems-level challenges of distributing training and inference workloads across multi-accelerator hardware while ensuring fault tolerance has become a crucial skill for both graduate and undergraduate students in computer science. The course teaches students to address systems challenges in both training large-scale ML models and running inference on them. We will combine theory and hands-on practice through lectures, assignments, and projects.
Last 3 terms offered (None)
Learning Outcomes REF-FA25
- Distribute the training and inference of large ML models across multiple GPUs.
- Build efficient strategies for sharding ML models.
- Debug communication overheads in distributed ML.
- Develop fault-tolerant and elastic ML pipelines.
Regular Academic Session.
Credits and Grading Basis
3 Credits GradeNoAud (Letter grades only, no audit)
Class Number & Section Details
Meeting Pattern
- MW
- Aug 25 - Dec 8, 2025
Instructors
Singh, R
Additional Information
Instruction Mode: In Person
For Bowers Computer and Information Science (CIS) Course Enrollment Help, please see: https://tdx.cornell.edu/TDClient/193/Portal/Home/