CDN

What is a CDN?

A Content Delivery Network (CDN) is a system of distributed servers that deliver web content, like images, videos, and scripts, to users based on their geographic location.

Analogy of a delivery network: Think of how many companies set up their delivery networks to reduce delivery time for their customers. Such a company uses a network of warehouses spread across different parts of a country. When a customer orders a product, the company ships it from the warehouse closest to the customer's location, which shortens the time the customer waits for delivery. To make this work, the company stocks each warehouse with the products that are ordered most often.

A CDN works in a similar way, with a number of servers acting as points of presence in different geographies. These servers maintain copies of frequently accessed content. A request for content can then be routed to the server closest to the user's location, and the user receives the cached content quickly instead of incurring high network latency by always going to the main server that hosts and generates the content.

So, CDNs are designed to minimize latency by bringing the content closer to users.

Evolution of CDNs

As the internet evolved and became more multimedia-heavy, it became clear that a single server couldn’t keep up with global demand. Around this time, companies like Akamai pioneered the CDN as a solution for distributing content across many servers to reduce loading times and handle larger amounts of traffic. Key milestones in the evolution of CDNs:

  1. 1990s: CDNs emerge, primarily to speed up static content like HTML pages and images.
  2. Early 2000s: CDNs begin to handle more complex content like video streaming.
  3. 2010s: CDN providers start optimizing for mobile devices, security (such as DDoS protection), and dynamic content delivery.

Why do we need CDNs?

There are multiple reasons and scenarios where CDNs prove to be useful.

  • Faster content delivery: The primary reason we need CDNs is to deliver content to users quickly. CDNs reduce latency by bringing content closer to users. Instead of a request traveling all the way to a distant data center, it goes to a nearby CDN server.
  • Global reach: Websites with global audiences need CDNs to ensure users around the world experience fast load times, regardless of their location.
  • Traffic management: If all users try to access a single server, it can get overwhelmed. CDNs distribute the traffic, improving reliability.
  • Content redundancy: In case of a server failure, a CDN can reroute users to the next closest server, providing fault tolerance.
  • Security: Modern CDNs offer protection from DDoS (Distributed Denial of Service) attacks, which flood websites with traffic. They can absorb and mitigate these attacks by distributing the traffic load.

Types of content served by a CDN

  • Static Content: Files that don't change frequently, like images, CSS, JavaScript, and videos.
  • Dynamic Content: Although trickier to cache, CDNs can still optimize dynamic content by using techniques like request collapsing and edge computing.
  • Streaming Media: CDNs are widely used for video/audio streaming. Special servers known as media servers optimize the delivery of large streaming files.

What are the core components of CDN architecture?

The architecture of a CDN is designed to optimize the delivery of web content to users by strategically distributing it across a globally distributed network of servers. The goal is to minimize latency, reduce bandwidth consumption, and handle high traffic efficiently.

The core components of a CDN architecture include:

  1. Origin server: This is the central or main server where the original, un-cached version of the content is stored. This is the source of truth for the content. It stores all types of content — HTML, CSS, images, videos, API responses, and more.
  2. Edge servers (PoPs): Edge servers, also known as Points of Presence (PoPs), are strategically located servers in various geographical locations. They cache copies of content from the origin server. Edge servers are responsible for delivering content to users based on their proximity, reducing latency and ensuring faster access. They can store static content like images, videos, and sometimes even parts of dynamic content.
  3. Cache/Proxy servers: These servers sit between the origin server and the end user, acting as intermediaries that store (or "cache") data. They reduce the load on the origin server and deliver cached content more quickly. Cache servers store content for specified periods of time. When a user makes a request, the cache server checks if it has the latest version of the content. If not, it retrieves it from the origin server, caches it, and serves it to the user.
  4. CDN management layer: This layer involves the software and administrative functions that control the behavior of the CDN. It determines how content is cached, what gets delivered, security settings, and load balancing policies. It includes dashboards, APIs, and monitoring systems that help users configure and optimize CDN settings for performance, caching rules, content purging, and analytics.
  5. Routing and load balancing: CDNs employ sophisticated routing and load balancing algorithms to direct user requests to the most appropriate edge server based on criteria like server load, network conditions, and proximity to the user. Techniques like Anycast (routing data to the nearest server) and GeoDNS (DNS routing based on user’s location) are commonly used.
  6. Global network infrastructure: This consists of the physical and virtual networking hardware and software that interconnects CDN servers globally. It ensures redundancy, reliability, and speed. Many large CDN providers maintain their own backbone networks, reducing reliance on public internet infrastructure for improved performance.
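The routing and load balancing step can be illustrated with a minimal sketch. It assumes routing considers only two of the criteria mentioned above, geographic proximity and server load. The `EdgeServer` class, the `pick_edge` function, the `max_load` threshold, and the server locations are all hypothetical names and values chosen for illustration; real CDNs also weigh network conditions and use mechanisms such as Anycast or GeoDNS rather than a simple distance calculation.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    lat: float   # latitude in degrees
    lon: float   # longitude in degrees
    load: float  # fraction of capacity in use, 0.0 to 1.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def pick_edge(user_lat, user_lon, edges, max_load=0.9):
    """Return the nearest edge server whose load is below max_load.

    Falls back to the least-loaded server if every server is saturated.
    """
    healthy = [e for e in edges if e.load < max_load]
    if not healthy:
        return min(edges, key=lambda e: e.load)
    return min(
        healthy,
        key=lambda e: haversine_km(user_lat, user_lon, e.lat, e.lon),
    )


# Hypothetical edge locations and loads, for illustration only.
edges = [
    EdgeServer("frankfurt", 50.11, 8.68, load=0.40),
    EdgeServer("singapore", 1.35, 103.82, load=0.30),
    EdgeServer("virginia", 38.95, -77.45, load=0.95),
]

# A user in Paris is routed to the nearby Frankfurt server.
print(pick_edge(48.86, 2.35, edges).name)   # frankfurt

# A user in New York is closest to Virginia, but that server is overloaded,
# so the nearest server with spare capacity is chosen instead.
print(pick_edge(40.71, -74.01, edges).name)  # frankfurt
```

The second call shows why proximity alone is not enough: the closest server is skipped because it is over the load threshold.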

How does a request flow through the components of a CDN?

To understand how CDN architecture works in practice, let’s outline the flow of a typical request:

  • DNS resolution: When a user requests content from a website (e.g., an image), their request is first resolved by DNS (Domain Name System). The CDN’s DNS determines the closest or best edge server based on the user’s location.
  • Routing to edge server: The DNS response directs the user’s request to the nearest edge server, reducing latency. At this point, the user does not interact with the origin server directly.
  • Cache lookup: The edge server checks if it already has a cached copy of the requested content.
  • Cache hit: If the content is found, it is served directly from the edge server, resulting in fast delivery.
  • Cache miss: If the content is not available, the edge server forwards the request to the origin server or another edge server that may have the content.
  • Origin server: On a cache miss, the origin server, as the authority on the content, generates the response to the request and returns it to the edge server.
  • Caching: Content received from the origin server (or another edge server) is cached, and then served to the user. The content remains cached for future requests.
  • Content delivery to user: Once the content is found or retrieved, it is delivered to the user from the edge server.
  • Cache invalidation or content purging: Over time, cached content may become outdated. CDN management systems allow for cache invalidation or purging to ensure users get fresh content when needed.
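The cache lookup, hit, miss, and purge steps above can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: the `Origin` and `EdgeCache` classes and their methods are hypothetical names, there is a single edge server, and DNS resolution, TTLs, and networking are left out.

```python
class Origin:
    """Stands in for the origin server: the source of truth for content."""

    def __init__(self, content):
        self.content = dict(content)
        self.requests = 0  # counts how often the origin is actually contacted

    def fetch(self, path):
        self.requests += 1
        return self.content[path]


class EdgeCache:
    """A toy edge server that keeps cached copies in a dictionary."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def handle(self, path):
        # Cache lookup: serve directly from the edge on a hit.
        if path in self.cache:
            return self.cache[path], "HIT"
        # Cache miss: fetch from the origin, cache the result, then serve it.
        body = self.origin.fetch(path)
        self.cache[path] = body
        return body, "MISS"

    def purge(self, path):
        # Cache invalidation: drop the cached copy so the next request
        # goes back to the origin for fresh content.
        self.cache.pop(path, None)


origin = Origin({"/logo.png": "v1"})
edge = EdgeCache(origin)

print(edge.handle("/logo.png"))  # ('v1', 'MISS'): first request reaches the origin
print(edge.handle("/logo.png"))  # ('v1', 'HIT'): served from the edge

origin.content["/logo.png"] = "v2"   # the content changes at the origin
print(edge.handle("/logo.png"))  # ('v1', 'HIT'): the edge still serves the stale copy

edge.purge("/logo.png")
print(edge.handle("/logo.png"))  # ('v2', 'MISS'): fresh content after the purge
print(origin.requests)           # 2
```

The stale `v1` response before the purge is the situation that cache invalidation is meant to address.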

What are the CDN network architecture models?

CDN network architecture models define how content is distributed, managed, and retrieved between the origin server and the CDN edge servers. These models focus on how content is pushed, pulled, and cached within the CDN infrastructure.

Here are the main CDN network architecture models:

  • Push model: In this model, the origin server proactively sends (or "pushes") the content to the CDN edge servers. Content is uploaded and distributed across various edge locations before users request it.
    • How It Works:
      • The website owner manually pushes the content to the CDN’s edge servers.
      • The CDN stores this content in its cache, ready to be served to users when requested.
      • Updates to the content must be pushed manually to the CDN by the website owner.
    • Advantages:
      • You decide what content gets distributed to each edge server.
      • Works well for large media files like software downloads, images, videos, or any static content that doesn’t change often.
      • Since content is already on the edge servers, it can be delivered immediately when users request it.
    • Disadvantages:
      • As content changes, it must be manually pushed to the CDN edge servers. This can be labor-intensive, especially for frequently updated content.
      • Storing large amounts of content at all edge locations can incur higher storage and bandwidth costs.
    • Use Cases:
      • Software companies distributing large updates (like Microsoft or Apple).
      • Media or streaming services that want to push high-quality videos to edge servers before large events.
  • Pull Model: In the pull model, the CDN only retrieves content from the origin server when a user requests it for the first time. Once requested, the content is pulled from the origin server, cached on the edge server, and then served to users from the cache for subsequent requests.
    • How It Works:
      • The user makes a request for content (e.g., an image, video, etc.).
      • The request is routed to the nearest edge server.
      • If the edge server does not have the content (a cache miss), it pulls the content from the origin server, caches it, and then delivers it to the user.
      • Cached content is stored according to the TTL (Time to Live) settings, which define how long the content should be stored before being refreshed.
    • Advantages:
      • Content is automatically pulled to edge servers when requested, reducing the need for manual management.
      • Great for websites or applications with frequently changing content.
      • You only store and cache content that users are requesting, which can save on storage and bandwidth costs.
    • Disadvantages:
      • The first time a piece of content is requested, it may take longer as the edge server needs to fetch it from the origin server.
      • Cache Misses: If content is not frequently requested, edge servers may not have it cached, resulting in additional requests to the origin server.
    • Use Cases:
      • Websites with dynamic content that changes frequently (e.g., blogs, news sites).
      • E-commerce platforms where product listings and images are constantly updated.
  • Hybrid Model: This combines elements of both the push and pull models. It allows for the automatic caching benefits of the pull model while still enabling manual control of pushing content when necessary.
    • How It Works:
      • Some content is pushed to the CDN edge servers ahead of time (like in the push model).
      • Other content is pulled from the origin server on demand (like in the pull model).
    • Advantages:
      • You get the control of the push model for critical, high-demand content and the efficiency of the pull model for less frequently accessed or dynamic content.
      • You can push large, static content (e.g., software updates, videos) while pulling dynamic or less critical content.
      • You can optimize content delivery based on different traffic patterns and content types.
    • Disadvantages:
      • Requires more sophisticated management to decide which content to push and which to pull. May also require manual intervention.
    • Use Cases:
      • Large media platforms or e-commerce sites that need to push heavy content in advance (like videos or promotions) but also handle dynamic content with on-demand caching.
  • Peer-to-Peer (P2P) Model: In this model, the CDN uses the collective bandwidth of user devices to share and distribute content. Instead of relying solely on CDN edge servers, content is also distributed among users’ devices, where each device shares a portion of the data with others.
    • How It Works:
      • A central CDN server distributes content to some users.
      • These users then share content with other users, reducing the load on the central server and speeding up delivery for everyone.
      • Each peer (device) in the network serves as a node, sharing chunks of content with others who request it.
    • Advantages:
      • As the number of users increases, the network’s bandwidth and capacity also grow, making it suitable for large-scale content distribution (like live streaming).
      • Offloads some of the work from the CDN’s edge servers, relying on users to share content among themselves.
    • Disadvantages:
      • The performance of content delivery can depend on the quality of the users’ internet connections, which may vary.
      • Sharing content among users can raise security concerns, especially in situations where sensitive data is involved.
    • Use Cases:
      • Popular for live streaming, especially for large-scale events like concerts, sports, or video game streams, where many users are accessing the same content simultaneously.
  • Multi-CDN Model: This model involves using more than one CDN provider to deliver content. This model is useful for ensuring high availability and reliability, especially for global websites or applications.
    • How It Works:
      • Content is distributed across multiple CDN providers.
      • Requests are routed to the optimal CDN based on factors like availability, performance, cost, or geographic location.
      • In case of outages or failures in one CDN, traffic can be automatically routed to another CDN.
    • Advantages:
      • If one CDN experiences an outage or performance issue, another CDN can pick up the load, ensuring uptime.
      • Multi-CDN setups can help provide optimal performance in regions where one CDN might not have strong coverage.
      • Traffic can be dynamically distributed across multiple CDNs to ensure no single CDN is overwhelmed.
    • Disadvantages:
      • Managing multiple CDNs requires sophisticated infrastructure, monitoring, and routing techniques to ensure smooth operation.
      • Using multiple CDNs can increase operational costs.
    • Use Cases:
      • Large enterprises, global platforms, or streaming services that need high redundancy and performance worldwide.
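To make the difference between the push, pull, and hybrid models more concrete, here is a minimal sketch of a single edge server that supports both. The `HybridEdge` class, its `push` and `get` methods, and the injectable `clock` parameter are hypothetical and exist only for this illustration. The sketch also assumes that pushed content stays on the edge until it is pushed again, while pulled content expires after a fixed TTL; real CDNs offer many more policies.

```python
import time


class HybridEdge:
    """Toy edge server supporting pushed and pulled content."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> body
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # path -> (body, expires_at); expires_at None = pushed

    def push(self, path, body):
        """Push model: the owner uploads content before anyone requests it."""
        self.store[path] = (body, None)

    def get(self, path):
        """Pull model: fetch from the origin on a miss or after the TTL expires."""
        entry = self.store.get(path)
        if entry is not None:
            body, expires_at = entry
            if expires_at is None or self.clock() < expires_at:
                return body
        body = self.fetch_from_origin(path)
        self.store[path] = (body, self.clock() + self.ttl)
        return body


# A fake clock and a fake origin make the behaviour easy to follow.
now = [0.0]
origin_calls = []


def origin(path):
    origin_calls.append(path)
    return f"content of {path}"


edge = HybridEdge(origin, ttl_seconds=60, clock=lambda: now[0])

edge.push("/update.bin", "big installer")  # pushed ahead of time
edge.get("/update.bin")                    # served from the edge, no origin call

edge.get("/news.html")   # first request: pulled from the origin
edge.get("/news.html")   # within the TTL: served from the edge
now[0] = 61.0            # more than 60 seconds later
edge.get("/news.html")   # TTL expired: pulled from the origin again

print(origin_calls)      # ['/news.html', '/news.html']
```

Using only `push` corresponds to the push model, using only `get` corresponds to the pull model, and mixing the two per content type corresponds to the hybrid model.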

CDN optimization techniques

CDNs use several techniques to maximize efficiency and performance:

  • Caching
    • Expiration Policies: HTTP caching headers such as Cache-Control (for example, its max-age directive) set a Time to Live (TTL) that determines how long content is stored on the edge servers.
    • Content Invalidation: When content updates, the CDN can remove or "purge" old versions to ensure users get the latest content.
  • Load Balancing
    • Geo-Load Balancing: Traffic is routed based on geographical proximity to the user.
    • Traffic-Based Load Balancing: Traffic is distributed based on server load, ensuring no single server gets overwhelmed.
  • Compression
    • Data Compression: CDNs often compress data (e.g., GZIP) before sending it to the user, reducing the size of the file and speeding up transmission.
  • Edge Computing
    • Edge Logic: Some modern CDNs support edge computing, allowing computations and logic (such as authorization, A/B testing, etc.) to be executed at the edge server instead of the origin server.

What are some of the security features in CDN Architecture?

CDNs are increasingly providing integrated security features to protect websites from cyberattacks:

  • DDoS Protection: CDNs can distribute massive traffic surges across many edge servers, mitigating Distributed Denial of Service (DDoS) attacks.
  • Web Application Firewalls (WAF): These are built into many CDNs to block malicious traffic and attacks like SQL injection or cross-site scripting (XSS).
  • SSL/TLS Encryption: CDNs serve content over HTTPS so that data is encrypted in transit between users and the edge servers, improving security.

When to use a CDN

  • Global Audience: If your website or application has users in multiple countries, a CDN will greatly improve their experience by reducing load times.
  • Large Media Files: If your website hosts large images, videos, or downloadable files, CDNs help by speeding up delivery.
  • E-commerce and Video Streaming: Websites that rely on high availability and fast content delivery, like online stores or streaming services, benefit greatly from CDNs.
  • Mobile Optimization: With many users accessing the web via mobile devices, CDNs help by optimizing content for different devices and network conditions.
  • Handling High Traffic Spikes: If your site experiences sudden traffic spikes (e.g., during a product launch or viral campaign), CDNs distribute the traffic, preventing server overloads.
  • Distribute Software Updates: Companies like Microsoft and Apple use CDNs to push large software updates to millions of users simultaneously.
  • Support Live Streaming: CDNs are essential in delivering live video streams to global audiences with minimal delay.
  • Enhance Website Security: Many CDNs now offer built-in security features, such as SSL encryption, to protect websites from cyber attacks.

When not to use a CDN

  • Small, Local Websites: If your website serves only a small, local audience, a CDN might be unnecessary, as a single server can usually handle traffic efficiently.
  • Highly Dynamic Content: Websites that generate highly personalized content (like constantly updated dashboards or live chats) may not benefit as much from CDNs, as the content isn't cached for long. However, modern CDNs are evolving to handle dynamic content better.
  • Cost Considerations: CDNs come with a cost. If you’re running a small-scale project, the cost of using a CDN might outweigh the benefits.
  • Internal Applications: For applications used only within a company or organization, CDNs may not be necessary, since users are usually close to the servers and latency is less of a concern.
  • Low-Traffic Sites: If a site gets only minimal traffic, there might be little to gain from using a CDN, as the benefits come with high volume and geographical spread.

What are the different CDN deployment models?

  • Public CDN: This is the standard deployment, where multiple clients share the CDN provider’s infrastructure.
  • Private CDN: A more specialized model for organizations that want exclusive access to CDN infrastructure, often for increased security or performance.
  • Hybrid CDN: A combination of public and private CDNs, typically used when organizations need extra control over specific parts of the infrastructure.

Some popular CDN providers

Some of the largest and most well-known CDN providers are:

  • Akamai: One of the earliest CDN companies, specializing in large-scale content distribution.
  • Cloudflare: Known for its easy integration and strong focus on security.
  • Amazon CloudFront: Amazon’s CDN service, integrated with its cloud services.
  • Fastly: A modern CDN provider that emphasizes real-time caching and instant purges of old content.

