After completing this class, students will be able to design their own distributed systems to solve real-world problems. The ability to design one's own distributed system includes the ability to argue for one's design choices. This primary objective is supported by several others:
- The students will be able to evaluate and critique existing systems, as well as their own system designs. As part of that, students will learn to recognize design choices made in existing systems.
- The students will be able to apply the technical material taught in lecture to new system components. This implies an ability to recognize and describe:
  - How common design patterns in computer systems, such as abstraction and modularity, are used to limit complexity.
  - How operating systems use virtualization and abstraction to enforce modularity.
  - How the Internet is designed to deal with scale, a diversity of applications, and competing economic interests.
  - How reliable, usable distributed systems can be built on top of an unreliable network.
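One basic way to build reliability on an unreliable network is retransmission: if each attempt is lost independently with probability q, then at least one of k attempts succeeds with probability 1 - q^k. The sketch below is illustrative only; the lossy link, the function names (`send_over_lossy_link`, `send_with_retries`, `analytic_success`), and the independence of losses are all assumptions, not part of the course materials.

```python
import random
from typing import Optional


def send_over_lossy_link(rng: random.Random, loss_rate: float) -> bool:
    """Hypothetical link: True if the message and its acknowledgment arrived."""
    # Assumption: each attempt is lost independently with probability loss_rate.
    return rng.random() >= loss_rate


def send_with_retries(rng: random.Random, loss_rate: float, max_attempts: int) -> Optional[int]:
    """Retry until an attempt succeeds; return the attempt number, or None if all failed."""
    for attempt in range(1, max_attempts + 1):
        if send_over_lossy_link(rng, loss_rate):
            return attempt
    return None


def analytic_success(loss_rate: float, max_attempts: int) -> float:
    """Probability that at least one of max_attempts independent attempts succeeds."""
    return 1.0 - loss_rate ** max_attempts


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the simulation is repeatable
    trials = 100_000
    delivered = sum(send_with_retries(rng, 0.2, 3) is not None for _ in range(trials))
    print(f"simulated: {delivered / trials:.4f}  analytic: {analytic_success(0.2, 3):.4f}")
```

With a 20% loss rate, three attempts raise the analytic success probability to 0.992. Note that retries can deliver a message twice when only the acknowledgment was lost, which is one reason RPC systems care about idempotent operations.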
Before taking this course, students should have:
- Intermediate skills in at least one programming language and a basic understanding of IP networking and the HTTP protocol.
- Familiarity with cloud-hosted version control platforms such as GitHub would be helpful.
- No prior knowledge of AI is necessary.
- Good foundational mathematics or logic skills.
- Basic Linux skills, including familiarity with command-line tools such as ls, cd, cp, and su.
3 Days/Lecture & Lab
This course is geared toward those who want to build and implement serverless AI applications without getting bogged down in a lot of theory.
- How systems fail
- How to express your goals: SLIs, SLOs, and SLAs
- How to get agreement -- consensus
- How Counter-Strike Works (a.k.a. Time in Distributed Systems)
- Blockchain Consensus
- Distributed System Design Example (Unique ID)
- The CAP Theorem
- Lab Project: Build a Blockchain
- Distributed storage systems
- How to combine unreliable components to make a more reliable system
- How nodes communicate -- RPCs
- How nodes find each other -- naming
- How to persist data -- distributed storage
- How to secure your system
- How to operate your distributed system -- the art of SRE
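The topic list includes a unique-ID design example. One widely used approach, in the style of Twitter's Snowflake, packs a millisecond timestamp, a node ID, and a per-millisecond sequence number into a single 64-bit integer, so nodes can issue roughly time-ordered IDs without coordinating. The sketch below assumes that layout (41/10/12 bits) and an arbitrary custom epoch; `UniqueIdGenerator`, `next_id`, `decode`, and `EPOCH_MS` are hypothetical names, and this is not necessarily the design presented in the course.

```python
import threading
import time
from typing import Callable, Tuple

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01T00:00:00Z
NODE_BITS = 10                # up to 1024 nodes
SEQ_BITS = 12                 # up to 4096 IDs per node per millisecond
MAX_NODE = (1 << NODE_BITS) - 1
MAX_SEQ = (1 << SEQ_BITS) - 1


class UniqueIdGenerator:
    """Issues IDs laid out as [timestamp | node_id | sequence]."""

    def __init__(self, node_id: int,
                 clock: Callable[[], int] = lambda: time.time_ns() // 1_000_000):
        if not 0 <= node_id <= MAX_NODE:
            raise ValueError(f"node_id must be in [0, {MAX_NODE}]")
        self.node_id = node_id
        self._clock = clock          # returns wall-clock time in milliseconds
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Issuing IDs now could repeat earlier ones, so refuse instead.
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # Sequence space for this millisecond is used up: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (NODE_BITS + SEQ_BITS))
                    | (self.node_id << SEQ_BITS)
                    | self._seq)


def decode(unique_id: int) -> Tuple[int, int, int]:
    """Split an ID back into (timestamp_ms, node_id, sequence)."""
    seq = unique_id & MAX_SEQ
    node = (unique_id >> SEQ_BITS) & MAX_NODE
    ts = (unique_id >> (NODE_BITS + SEQ_BITS)) + EPOCH_MS
    return ts, node, seq


if __name__ == "__main__":
    gen = UniqueIdGenerator(node_id=5)
    for _ in range(3):
        uid = gen.next_id()
        print(uid, decode(uid))
```

The design trades strict global ordering for coordination-free generation: IDs from different nodes are ordered only as well as their clocks agree, which ties this example back to the course's treatment of time in distributed systems.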