MIT 6.033 CSE Networking
MIT 6.033 (Computer System Engineering) covers 4 parts: Operating Systems, Networking, Distributed Systems, and Security.
These are the course notes for Part II: Networking. In this part, we mainly focus on how the Internet is designed to scale, and on its various applications.
Network Topology
A network is a graph of many nodes: endpoints and switches. Endpoints are physical devices that connect to and exchange information with the network. Switches handle many incoming and outgoing connections on links, and help forward data to destinations that are far away.
On the network, we have to solve several difficult problems, such as addressing, routing, and transport. Each node has a name and is thus addressable by the routing protocol. Any two reachable nodes exchange packets, each of which is some data plus a header (information for packet delivery, especially the source and destination). Switches have queues in case more packets arrive than they can handle; if the queue is full when a new packet arrives, the packet is dropped.
To mitigate complexity, a layered model called the TCP/IP model is used, with 4 layers:
- Application Layer: actual traffic generation
- Transport Layer: sharing the network, efficiency, reliability
- Network Layer: naming, addressing, routing
- Link Layer: communication between two directly-connected nodes
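The layering idea can be sketched as each layer wrapping the data from the layer above with its own header. This is an illustrative toy (the header names and format are made up), not a real protocol stack:

```python
# Toy sketch of layered encapsulation: each layer prepends its own
# header to the payload handed down from the layer above.

def encapsulate(app_data: bytes) -> bytes:
    transport = b"TCP|" + app_data   # Transport: ports, sequence numbers
    network = b"IP|" + transport     # Network: source/destination addresses
    link = b"ETH|" + network         # Link: framing for the next hop
    return link

frame = encapsulate(b"GET /index.html")
print(frame)  # b'ETH|IP|TCP|GET /index.html'
```

A router would only peel back (and re-add) the Link and Network headers, which is why it needs only those two layers.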
Not every node in the network implements all four layers. Some nodes, such as our laptops, have all four; others, like routers, have only the Link Layer and the Network Layer.
Routing
Firstly, we need to distinguish two concepts: path and route.
- Path: the full path the packets will travel
- Route: only the first hop of that path
So routing means: in the Network Layer, after running the routing protocol, every node's routing table should contain a minimum-cost route to every other reachable node.
Once a routing table is set up, when a switch gets a packet, it checks the packet header for the destination address and adds the packet to the queue for that outgoing link.
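The forwarding step described above can be sketched as a lookup plus an enqueue. The table, link names, and packet format here are hypothetical:

```python
# Toy forwarding step at one switch: the routing table maps a
# destination address to the outgoing link (the "route" is just the
# first hop, not the full path).

routing_table = {"B": "link1", "C": "link2"}   # hypothetical table
queues = {"link1": [], "link2": []}            # one queue per outgoing link

def forward(packet: dict) -> None:
    """Look up the destination in the header, enqueue on that link."""
    link = routing_table[packet["dst"]]
    queues[link].append(packet)

forward({"src": "A", "dst": "C", "data": "hello"})
print(len(queues["link2"]))  # 1: the packet is queued on link2
```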
Routing protocols can be divided into two categories: distributed routing protocols and centralized routing protocols. Distributed routing protocols scale better than centralized ones. There are two types of distributed routing protocols for an IP network:
- Link-State (LS) Routing, like OSPF, floods link-cost advertisements to all nodes, and each node runs Dijkstra's algorithm to compute full shortest paths. (Fast convergence, but high overhead due to flooding. Good for medium-sized networks, but does not scale up to the Internet.)
- Distance-Vector (DV) Routing, like RIP, where each node advertises only its own distance estimates, and only to its neighbors. (Low overhead, but convergence time is proportional to the longest path. Good for small networks, but does not scale up to the Internet.)
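The distance-vector idea can be sketched as repeated Bellman-Ford relaxation: each node merges its neighbors' advertised costs until nothing changes. The three-node topology and costs here are made up:

```python
# Minimal distance-vector sketch: each node repeatedly updates its
# distance estimates from its neighbors' advertisements until the
# estimates converge (Bellman-Ford relaxation).
INF = float("inf")
links = {("A", "B"): 1, ("B", "C"): 2, ("A", "C"): 5}  # undirected costs
nodes = {"A", "B", "C"}
neighbors = {n: {} for n in nodes}
for (u, v), c in links.items():
    neighbors[u][v] = c
    neighbors[v][u] = c

# dist[n][d] = n's current estimate of its cost to destination d
dist = {n: {d: (0 if d == n else INF) for d in nodes} for n in nodes}

changed = True
while changed:                    # loop until no advertisement helps
    changed = False
    for n in nodes:
        for v, cost in neighbors[n].items():   # v advertises dist[v]
            for d in nodes:
                if cost + dist[v][d] < dist[n][d]:
                    dist[n][d] = cost + dist[v][d]
                    changed = True

print(dist["A"]["C"])  # 3: A -> B -> C is cheaper than the direct 5-cost link
```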
Scale and Policy
In this section, we talk about a routing protocol that can scale up to the Internet and supports policy routing: the Border Gateway Protocol (BGP).
The first thing we need is scale. The whole Internet is divided into many autonomous systems (ASes), e.g., a university, an ISP, etc. To route across the Internet, three scaling techniques are used:
- hierarchy of routing: route first between ASes, then within an AS.
- path-vector routing: like BGP, advertise the full path to better detect loops.
- topological addressing: CIDR, to make advertisements smaller.
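Topological addressing can be sketched with Python's standard `ipaddress` module: a single prefix advertisement covers many hosts, and forwarding picks the longest matching prefix. The prefixes and AS names below are illustrative only:

```python
# Sketch of CIDR addressing with longest-prefix-match forwarding:
# one advertisement like 18.0.0.0/8 covers an entire address block.
import ipaddress

table = {                                            # prefix -> next hop
    ipaddress.ip_network("18.0.0.0/8"): "AS-MIT",    # hypothetical entries
    ipaddress.ip_network("18.26.0.0/16"): "AS-CSAIL",
    ipaddress.ip_network("0.0.0.0/0"): "default",
}

def lookup(dst: str) -> str:
    """Return the next hop for the longest prefix containing dst."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in table if addr in net),
               key=lambda net: net.prefixlen)
    return table[best]

print(lookup("18.26.4.9"))   # AS-CSAIL (the /16 beats the /8)
print(lookup("8.8.8.8"))     # default
```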
The next thing we need is policy. We use export policies and import policies to reflect two common relationships between autonomous systems:
- Transit: customer pays provider
- Peer: two ASes agree to share routing tables at no cost.
The export policies decide which routes to advertise, and to whom:
- A provider wants its customers to send and receive as much traffic through the provider as possible
- Peers only tell each other about their customers. (A peer does not advertise routes through its own providers, because it would lose money providing that transit.)
Note: in the example topology, there is a path from AS7 to AS1, but these policies simply do not advertise it. To fix this issue in the real world, all top-tier (tier-1) ISPs peer with one another, providing global connectivity.
The import policies decide which route to use. If an AS hears about multiple routes to a destination, it prefers routes through its customers first, then peers, then providers.
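The import preference above can be sketched as a simple ranking function. The relationship labels, AS names, and tie-break on path length are my illustrative choices, not the full BGP decision process:

```python
# Sketch of a BGP-style import policy: among candidate routes to the
# same prefix, prefer customer routes, then peer, then provider;
# break ties by shorter AS path.
PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}

def choose_route(candidates):
    """candidates: list of (relationship, as_path) tuples."""
    return min(candidates,
               key=lambda r: (PREFERENCE[r[0]], len(r[1])))

routes = [
    ("provider", ["AS2", "AS1"]),
    ("peer", ["AS3", "AS1"]),
    ("customer", ["AS4", "AS5", "AS1"]),  # longer path, but a customer
]
print(choose_route(routes)[0])  # customer: money beats path length
```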
And finally, let's talk about BGP itself. BGP works at the Application Layer, and it runs on top of a reliable transport protocol, TCP (Transport Layer). BGP does not have to send periodic advertisements to handle failure; instead, it pushes advertisements to neighbors when routes change.
Failures: routes can be explicitly withdrawn in BGP when they fail. Routing loops are avoided because BGP is path-vector: a router rejects any advertised route whose AS path already contains its own AS.
Does BGP scale? Yes, but the following four factors cause scaling issues: the size of the routing table, route instability, multihoming, and iBGP (internal BGP).
Is BGP secure? No. BGP basically relies on the honor system. It also relies on humans, meaning network outages may happen due to human error.
Reliable Transport
In this section, we talk about how to do reliable transport while keeping things efficient and fair.
First, a reliable transport protocol is a protocol that delivers each byte of data exactly once, in order, to the receiving application. We use the sliding-window protocol to guarantee reliability. It works in two main steps:
- The sender uses sequence numbers to order the packets, and may have up to a window of unacknowledged packets in flight.
- The receiver replies with an acknowledgment (ACK) for each packet it receives successfully. If no ACK arrives before a timeout, the sender retransmits the corresponding packet.
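The retransmission loop above can be sketched as a toy simulation. This is a stop-and-wait simplification of the sliding-window protocol, with a made-up channel that deterministically drops the first copy of packet 1 to force one retransmit:

```python
# Toy simulation of reliable delivery with sequence numbers, ACKs, and
# retransmission on timeout. Not real TCP: the channel and loss pattern
# are contrived for illustration.

def send_reliably(data_packets):
    delivered = []
    dropped_once = set()
    for seq, payload in enumerate(data_packets):
        while True:
            if seq == 1 and seq not in dropped_once:
                dropped_once.add(seq)
                ack = None                    # packet lost -> no ACK arrives
            else:
                delivered.append(payload)
                ack = seq                     # receiver ACKs the seq number
            if ack == seq:
                break                         # ACK received: send next packet
            # else: the timeout fires and we retransmit the same packet

    return delivered

print(send_reliably(["a", "b", "c"]))  # ['a', 'b', 'c'] despite the loss
```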
Now that packets are delivered reliably, the next thing we need is congestion control.
Our goals for the network are efficiency and fairness. Suppose both A and B are sending data to R1, and R1 is forwarding to R2; the bottleneck link is then the link between R1 and R2. When the bottleneck link is "full", the network is fully utilized (efficient). When A and B are sending at the same rate, the network is fair.
The red line (A + B = bandwidth) is the efficiency line, and the blue line (A = B) is the fairness line. Initially, the dot is below the red line, meaning the network is underutilized. Eventually, A and B come to oscillate around the fixed point, shown as the purple point, where the network is both efficient and fair.
We use the slow-start, AIMD (Additive Increase Multiplicative Decrease), and fast retransmit/fast recovery algorithms to dynamically adjust the window size to deal with congestion. At the start of the connection, slow start doubles the window size every RTT. Upon reaching the threshold, AIMD increases the congestion window (cwnd) by one segment per RTT, and halves cwnd upon detecting a timeout. If a single packet is lost, the receiver keeps ACKing the last in-order packet; once the sender sees three duplicate ACKs (before the RTO expires), it fast-retransmits the missing packet and, in fast recovery, halves cwnd instead of falling back to slow start.
In-network Resource Management
In this section, we talk about how to react to congestion before it happens.
Queues are transient (not persistent) buffers used to absorb packet bursts. If queues stayed full, network delay would be very long. So switches should drop packets before their queues are completely full, signaling senders to slow down early.
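One common scheme for dropping early is random early detection (RED); the notes do not name a specific scheme, so this is my sketch with illustrative thresholds. The drop probability rises linearly between a minimum and maximum queue length:

```python
# RED-style early dropping (sketch): below MIN_TH never drop, above
# MAX_TH always drop, and in between drop with a probability that
# grows linearly with queue length.
MIN_TH, MAX_TH, MAX_P = 5, 15, 0.1   # illustrative thresholds

def drop_probability(queue_len: int) -> float:
    if queue_len < MIN_TH:
        return 0.0                   # short queue: never drop
    if queue_len >= MAX_TH:
        return 1.0                   # queue nearly full: always drop
    return MAX_P * (queue_len - MIN_TH) / (MAX_TH - MIN_TH)

print(drop_probability(3))    # 0.0
print(drop_probability(10))   # 0.05
print(drop_probability(20))   # 1.0
```

Dropping a few packets early makes TCP senders back off before the queue (and the delay) grows large.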
Application Layer
In this section, we talk about how to deliver content on the Internet.
There are three models for sharing a file (delivering content) on the Internet: Client-Server, CDN (Content Distribution Network), and P2P (Peer-to-Peer).
- Client-Server: if a client requests a file, the server simply responds with the file content. (Simple, but a single point of failure and not scalable.)
- CDN: to avoid a single point of failure, we add more servers linked with persistent TCP connections, and each time a client makes a request, DNS dynamically chooses the nearest CDN server to respond. (Requires coordination among the edge servers.)
- P2P: to improve scalability, a client discovers peers and exchanges blocks of data with them. (Scalability is limited by end users' upload constraints.)