Content Delivery Networks (CDNs) Explained - Speeding Up the Web, One Edge Server at a Time

 — #interview #SDE

Hey everyone, and welcome back to the blog! In our hyper-connected world, whether you're browsing from a bustling cafe in Bengaluru or a quiet corner across the globe, you expect websites to load quickly and content (like images and videos) to appear almost instantly. Slow-loading pages are a surefire way to frustrate users and lose engagement. But how do global websites ensure this speedy experience for everyone, regardless of their distance from the website's main server?

A huge part of the magic behind this is the Content Delivery Network (CDN), working in close concert with the Domain Name System (DNS). CDNs are the unsung heroes of web performance, and DNS is the directory service that points users in the right direction – often towards a nearby CDN server. Let's unpack how they work together!

What is a CDN (Content Delivery Network)? Bringing Content Closer to You

A Content Delivery Network (CDN) is a geographically distributed network of proxy servers and their associated data centers.

  • The Core Problem it Solves: Latency & Origin Server Load. Imagine a website's main server (the "origin server") is located in, say, New York. When a user from India tries to access it, their requests have to travel thousands of miles across multiple network hops. This physical distance translates directly into latency. Furthermore, if all users worldwide hit that single origin server, it can easily get overwhelmed.
  • The CDN Solution: CDNs tackle this by caching copies of website content (especially static content) in multiple Points of Presence (PoPs) or edge servers located in various strategic geographic locations around the world. When a user requests content, the CDN, with the help of DNS, directs them to the edge server closest to them. This dramatically reduces latency and offloads traffic from the origin server.

How Does a CDN Work? The Journey of a Request with DNS

Let's follow a typical request to see how a CDN and DNS collaborate.

1. User Request & The Initial DNS Lookup

  • You, our user in Bengaluru, type www.example.com into your browser or try to load an image from a URL like images.mysite.com/logo.png.
  • Your browser needs the IP address of the server hosting this content, so it initiates a DNS lookup. This typically involves these steps if the information isn't already cached locally:
    1. Browser Cache: Checks its own cache first.
    2. OS Cache: Checks the operating system's resolver cache (and hosts file).
    3. Local Network Router Cache: Checks your router's DNS cache.
    4. ISP DNS Resolver: Your request goes to your Internet Service Provider's (ISP) DNS resolver.
    5. Recursive Lookup (if needed): If the ISP resolver doesn't have it cached, it performs a recursive lookup:
      • It queries a Root DNS Server.
      • The Root Server directs it to a Top-Level Domain (TLD) DNS Server (e.g., for .com).
      • The TLD Server directs it to the Authoritative Name Server(s) for the specific domain (mysite.com or example.com).
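If you want to watch this resolution happen from your own machine, here is a minimal sketch using the third-party dnspython library (an assumption: it is not in the standard library, so it needs pip install dnspython). Your OS and ISP resolvers normally do all of this transparently; the script just makes the final answer and its cache lifetime visible.

```python
# Minimal DNS lookup sketch using the third-party dnspython library.
# In practice the browser, OS, and ISP resolvers handle this chain for you;
# this simply shows the answer a resolver hands back.
import dns.resolver

answer = dns.resolver.resolve("www.example.com", "A")
for record in answer:
    print("IP address:", record.address)

# The resolver also reports how long this answer may be cached (its TTL).
print("TTL (seconds):", answer.rrset.ttl)
```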

2. DNS Directs to an Optimal CDN Edge Server

This is where the CDN's DNS integration plays a crucial role, especially for assets served via CDN (e.g., images.mysite.com).

  • CNAME to CDN Provider: Often, the domain for static assets (like images.mysite.com) is configured in its authoritative DNS server with a CNAME (Canonical Name) record. This CNAME record points to a domain managed by the CDN provider (e.g., example.cdnprovider.com or a specific hostname like images.mysite.com.edgesuite.net).
  • CDN's Authoritative DNS Resolution: When your ISP's recursive resolver (or any resolver in the chain) queries the authoritative name server for images.mysite.com, it gets back this CNAME. It then makes a new DNS query for example.cdnprovider.com. This query now goes to the CDN provider's specialized DNS servers.
  • Intelligent Routing by CDN DNS: The CDN provider's DNS servers are highly sophisticated. They don't just return a single, static IP address. Instead, they intelligently resolve the request to an optimal edge server IP address within the CDN's global network. This decision is typically based on several factors:
    • User's Geographic Location: Usually inferred from the IP address of the DNS resolver making the request (which is often geographically close to the end-user). Many CDNs use GeoDNS or Anycast IP addresses for their DNS resolvers or even for the edge servers themselves. Anycast allows the same IP address to be announced from multiple locations, and routers naturally direct the user's request to the topologically "closest" location advertising that IP.
    • Server Load: The current traffic load and capacity of different edge servers.
    • Server Health: The availability and health status of the edge servers, determined by continuous monitoring.
    • Network Proximity & Latency: Which edge server can currently provide the lowest network latency to the user's region.
  • Low TTLs: DNS records for CDN-managed hostnames often have very low Time-To-Live (TTL) values. This means resolvers and browsers don't cache these IP addresses for too long, allowing the CDN to dynamically change the optimal edge server IP for users if network conditions or server loads change.
  • Your browser then receives the IP address of this chosen optimal edge server.
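To make the CNAME-plus-low-TTL pattern concrete, here is a hedged sketch (again with dnspython) that follows the alias chain for a CDN-fronted hostname and prints the TTL of the A record it ends at. The hostnames are the illustrative ones from this post, not real hosts, so expect the lookup to fail if you run it verbatim against them.

```python
# Hedged sketch: follow the CNAME chain for a CDN-fronted hostname and
# inspect the TTL of the resulting A record. Hostnames are illustrative.
import dns.resolver

hostname = "images.mysite.com"   # hypothetical CDN-fronted asset host

try:
    # Step 1: the authoritative answer for the asset host is usually a CNAME
    # pointing at a hostname the CDN provider controls.
    cname_answer = dns.resolver.resolve(hostname, "CNAME")
    for record in cname_answer:
        print(f"{hostname} is an alias for {record.target}")

    # Step 2: resolving the A record follows that alias; the CDN's DNS picks
    # an edge IP and typically returns a low TTL so it can re-route you soon.
    a_answer = dns.resolver.resolve(hostname, "A")
    print("Edge server IP(s):", [r.address for r in a_answer])
    print("TTL (seconds):", a_answer.rrset.ttl)
except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer) as exc:
    print(f"Lookup failed (the hostname here is only illustrative): {exc}")
```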

3. Request to the Edge Server

  • Your browser makes an HTTP(S) request directly to the IP address of the selected CDN edge server.

4. Cache Hit Scenario (Content is at the Edge!)

  • If the edge server already has the requested content (e.g., logo.png) in its local cache (perhaps because another user in your region recently requested it, or the content was proactively pushed or pre-warmed):
    • This is a cache hit!
    • The edge server serves the content directly to your browser.
    • This is incredibly fast because the server is geographically much closer to you than the origin server, and the content is readily available.
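You can often observe steps 3 and 4 from the client side by inspecting the response headers the edge server adds. The header names below (X-Cache, CF-Cache-Status, Age) vary by CDN provider, and the URL is the hypothetical one from this post, so treat this as an illustrative sketch with the requests library rather than a universal recipe.

```python
# Hedged sketch: fetch an asset and print headers many CDNs use to report
# cache status. Header names differ by provider; the URL is hypothetical.
import requests

resp = requests.get("https://images.mysite.com/logo.png", timeout=10)

print("Status:", resp.status_code)
for header in ("X-Cache", "CF-Cache-Status", "Age", "Cache-Control"):
    if header in resp.headers:
        print(f"{header}: {resp.headers[header]}")
```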

5. Cache Miss Scenario (Fetching from Origin)

  • If the edge server does not have the requested content in its cache:
    • This is a cache miss.
    • The edge server then makes a request for the file from the origin server (where the website's original content is actually hosted). Sometimes, for very large CDNs, there might be intermediate regional or "shield" caches between the edge server and the origin server to further optimize this fetch and protect the origin.
    • The origin server responds by sending the file to the edge server.
    • The edge server caches this file locally (according to its configured caching rules and the content's Time-To-Live or TTL settings, often dictated by HTTP headers like Cache-Control from the origin).
    • The edge server then serves the file to your browser.
  • Now, subsequent users in your geographic region requesting the same content will likely experience a cache hit from this edge server.
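Putting the hit and miss paths together, here is a conceptual Python sketch of the pull-through logic an edge server performs. It is illustrative pseudologic, not any real CDN's implementation: an in-memory dict stands in for the edge cache, and the origin hostname is made up.

```python
# Conceptual sketch of an edge server's cache-hit / cache-miss flow.
# Illustrative only: a dict plays the role of the edge cache.
import time
import requests

ORIGIN = "https://origin.mysite.com"   # hypothetical origin server
cache = {}                             # path -> (body, expires_at)

def fetch_from_origin(path):
    resp = requests.get(ORIGIN + path, timeout=10)
    # Honour a simple max-age from the origin's Cache-Control header, if present.
    ttl = 300
    for directive in resp.headers.get("Cache-Control", "").split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            ttl = int(directive.split("=", 1)[1])
    return resp.content, time.time() + ttl

def handle_request(path):
    entry = cache.get(path)
    if entry and entry[1] > time.time():
        return entry[0]                              # cache hit: serve from the edge
    body, expires_at = fetch_from_origin(path)       # cache miss: go to the origin
    cache[path] = (body, expires_at)                 # store for the next nearby user
    return body
```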

What Content is Best for CDNs? Static vs. Dynamic

  • Ideal for Static Content: CDNs are most famously effective for caching and delivering static assets – content that doesn't change frequently for different users:
    • Images (JPEG, PNG, GIF, WebP)
    • Videos (MP4, WebM)
    • CSS stylesheets
    • JavaScript files
    • Web fonts
    • Software downloads, PDF documents
  • Dynamic Content Delivery: Delivering dynamic content (which is personalized for each user or changes very frequently) via a CDN is more complex but not impossible. Modern CDNs offer features for:
    • Edge Computing: Running some application logic (e.g., using serverless functions like Lambda@Edge or Cloudflare Workers) directly on the edge servers to personalize content or make decisions closer to the user.
    • Caching Parts of Dynamic Pages: Caching common fragments of dynamically generated pages that are shared among many users.
    • Dynamic Site Acceleration (DSA): Optimizing the network path and connection between the edge server and the origin server for dynamic requests, rather than caching the dynamic content itself at the edge.
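On the origin side, what makes content "CDN-friendly" is largely the Cache-Control header it sends. The sketch below uses Flask purely for illustration (the routes and filenames are made up) to show the typical contrast: a long public max-age for a static asset versus no-store for a personalised dynamic response.

```python
# Hedged sketch: an origin marking static assets as cacheable by a CDN
# and dynamic, per-user responses as not cacheable. Routes are made up.
from flask import Flask, jsonify, send_file

app = Flask(__name__)

@app.route("/static/logo.png")
def logo():
    # Static asset: let the CDN and browsers cache it for a day.
    response = send_file("logo.png")
    response.headers["Cache-Control"] = "public, max-age=86400"
    return response

@app.route("/api/profile")
def profile():
    # Dynamic, personalised content: tell the CDN not to cache it.
    response = jsonify({"user": "example", "greeting": "hello"})
    response.headers["Cache-Control"] = "private, no-store"
    return response

if __name__ == "__main__":
    app.run()
```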

CDN Strategies: Push vs. Pull

There are two primary models for how content gets onto CDN edge servers:

Pull CDN (On-Demand Caching)

  • This is the more common and often simpler model.
  • Edge servers "pull" content from the origin server only when a user in that region requests it for the first time (resulting in a cache miss).
  • Pros:
    • Simpler to set up and manage.
    • Less storage is used on the CDN edge servers initially, as only popular (requested) content gets cached.
  • Cons:
    • The very first user in a region to request a piece of content will experience higher latency because the edge server needs to fetch it from the origin. This is sometimes called a "cold start" for that content at that edge location.

Push CDN (Pre-emptive Caching)

  • In this model, the website owner or origin server proactively "pushes" content to various CDN edge servers before it's actually requested by users.
  • This is often done for new content releases that are expected to be in high demand or for critical site assets.
  • Pros:
    • Lower latency for the first user requesting the content, as it's already available at the edge.
  • Cons:
    • More complex to manage, as the origin system needs to decide what content to push, where to push it, and when.
    • Uses more storage on CDN edge servers because content is pre-loaded whether or not it is ever requested, which can lead to higher CDN costs if you push content that doesn't get much traffic.
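One lightweight way to get some of the push model's benefit without a provider-specific push API is to pre-warm a pull CDN: a script requests the key URLs through the CDN ahead of launch, so the edge cache is already populated when real users arrive. The sketch below uses the requests library and a made-up list of URLs.

```python
# Hedged sketch: pre-warm a pull CDN by requesting key assets through it
# before launch. URLs are hypothetical; a real script would read a manifest.
import requests

URLS_TO_WARM = [
    "https://images.mysite.com/logo.png",
    "https://images.mysite.com/hero-banner.webp",
    "https://cdn.mysite.com/app.min.js",
]

for url in URLS_TO_WARM:
    try:
        resp = requests.get(url, timeout=10)
        # Many CDNs report cache status; the header name varies by provider.
        status = resp.headers.get("X-Cache", "unknown")
        print(f"{url}: HTTP {resp.status_code}, cache status: {status}")
    except requests.RequestException as exc:
        print(f"{url}: request failed ({exc})")
```

Note that plain HTTP pre-warming only populates the edge locations your requests happen to be routed to; a true push pre-positions content across many PoPs at once.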

Benefits of Using a CDN Summarized

Leveraging a CDN brings a host of advantages:

  • Improved Website Load Times (Reduced Latency): Content is served from geographically closer edge servers, leading to a faster experience for users.
  • Reduced Bandwidth Costs: Offloads significant traffic from your origin servers, thereby reducing the bandwidth consumed at the origin and potentially lowering hosting costs.
  • Increased Availability and Redundancy: Content is distributed across many servers. If one edge server fails, requests can often be routed to another nearby edge server. The origin server is also shielded from some direct traffic, making it more resilient.
  • Improved Scalability: CDNs help websites and applications handle traffic spikes (like during a marketing campaign or a viral event) much more effectively by absorbing a large portion of the load.
  • Enhanced Security: Many CDNs offer additional security features, such as Distributed Denial of Service (DDoS) protection and Web Application Firewall (WAF) capabilities at the edge.

Key Takeaways

  • Content Delivery Networks (CDNs) are geographically distributed networks of servers that cache website content closer to end-users, drastically improving performance.
  • DNS plays a critical role in CDN operation by intelligently routing user requests to the most optimal (nearest, least loaded, healthiest) CDN edge server, often using CNAME records and sophisticated GeoDNS or Anycast techniques.
  • Low DNS TTLs for CDN resources allow for dynamic and responsive traffic management.
  • CDNs primarily work on a cache hit/miss model: if content is at the edge (hit), it's served fast; if not (miss), the edge fetches from the origin, caches it, and then serves it.
  • They are most effective for static content but also have evolving strategies for dynamic content acceleration.
  • Common models include Pull CDNs (on-demand) and Push CDNs (pre-emptive).
  • Key benefits include significantly reduced latency, lower origin server load, increased availability, and better scalability.

CDNs, in tight concert with intelligent DNS routing, are essential infrastructure for the modern web, critical for delivering the fast, reliable, and global user experiences that we've all come to expect.