Cache Keys and Vary Headers: CDN Correctness Fundamentals

A CDN misconfiguration once cost a client three days of incident response. The setup was straightforward: an API endpoint returning user dashboard data, fronted by CloudFront. Someone had enabled caching without realizing the endpoint returned personalized content. User A’s dashboard—complete with their name, email, and recent transactions—got cached and served to User B, then User C, then a few thousand more users before anyone noticed.

The cache hit ratio looked fantastic: 94% of requests served from the edge, with response times down from 200ms to 15ms. Everyone was thrilled until the support tickets started arriving.

This is the fundamental tension in edge caching. Aggressive caching dramatically improves performance and reduces origin load, but an incorrect configuration serves wrong content to users—sometimes catastrophically. Edge caching is a data integrity problem first and a performance problem second.

The good news: two concepts cause the vast majority of CDN cache bugs, and once you understand them, you can cache aggressively without fear. Those concepts are cache keys and Vary headers.

Cache Keys Determine Correctness

A cache key is the identifier the CDN uses to store and retrieve cached responses. When a request arrives, the edge generates a cache key from the request attributes and looks for a matching entry. If found, it returns the cached response. If not, it fetches from origin and stores the response under that key.

By default, most CDNs use a simplified version of the URL:

Default cache key:
  SCHEME + HOST + PATH + QUERY_STRING

Example request:
  https://example.com/api/products?category=shoes

Cache key:
  "https://example.com/api/products?category=shoes"

This seems straightforward, but the details matter enormously. Two requests that should return identical content but generate different cache keys create duplicate cache entries—wasting storage and reducing hit ratios. Two requests that should return different content but generate the same cache key serve wrong content to users—exactly the bug we opened with.

The rule is simple: every attribute that affects your response must be in the cache key. Miss one, and users get wrong content. Include unnecessary attributes, and your hit ratio suffers.
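To make that rule concrete, a default cache key can be modeled as a pure function of the request attributes. This is an illustrative sketch, not any particular CDN's internal API:

```typescript
// Hypothetical model of default cache-key derivation. Every attribute that
// affects the response must appear here, and nothing else should.
interface CacheKeyParts {
  scheme: string;
  host: string;
  path: string;
  query: string;
}

function buildCacheKey(parts: CacheKeyParts): string {
  // Lowercase scheme and host so "HTTPS://Example.com" and
  // "https://example.com" map to the same entry; path and query
  // remain case-sensitive.
  return (
    `${parts.scheme.toLowerCase()}://${parts.host.toLowerCase()}${parts.path}` +
    (parts.query ? `?${parts.query}` : '')
  );
}

const key = buildCacheKey({
  scheme: 'https',
  host: 'example.com',
  path: '/api/products',
  query: 'category=shoes',
});
// key: "https://example.com/api/products?category=shoes"
```

Framing the key as a function makes the two failure modes obvious: leave an attribute out of `CacheKeyParts` that affects the response and users get wrong content; add one that doesn't and you fragment the cache.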

The Query String Problem

Query parameters are the most common source of cache key problems. Consider these two URLs:

Problem: Query parameter order

These are the same request but different cache keys:
  /products?color=red&size=large
  /products?size=large&color=red

Result: Two cache entries for identical content

Different parameter order, different cache keys, same content—you’re now storing two copies and reducing your hit ratio. Marketing teams compound this by adding tracking parameters to URLs. Every utm_source, fbclid, and gclid creates a unique cache key for content that’s identical regardless of how the user arrived.

The solution is query string normalization at the edge:

// CloudFront Function: normalize query string
function handler(event: any): any {
  const request = event.request;
  const params = request.querystring || {};
  const trackingParams = ['utm_source', 'utm_medium', 'utm_campaign', 'fbclid', 'gclid'];

  // Remove tracking parameters
  trackingParams.forEach((param) => delete params[param]);

  // Sort remaining parameters alphabetically
  const sortedKeys = Object.keys(params).sort();
  const normalized: Record<string, any> = {};
  sortedKeys.forEach((key) => (normalized[key] = params[key]));

  request.querystring = normalized;
  return request;
}

Most CDNs support this natively. CloudFront lets you specify which query parameters to include in the cache key. Cloudflare and Fastly support query string sorting and parameter stripping. Enable these features—they’re high-impact, low-effort wins.

Warning:

Query parameter pollution is one of the most common causes of poor cache hit ratios. Analytics and tracking parameters create millions of unique cache keys for identical content. Audit your cache key cardinality regularly.
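A cardinality audit can be as simple as counting distinct cache keys per path in your access logs. A minimal sketch, assuming log lines whose first field is the request URL (adjust the parsing to your log format):

```typescript
// Count distinct cache keys per path. A high count for one path usually
// means tracking parameters are polluting the cache key.
function auditCacheKeyCardinality(logLines: string[]): Map<string, Set<string>> {
  const keysPerPath = new Map<string, Set<string>>();
  for (const line of logLines) {
    const url = line.split(' ')[0];
    const path = url.split('?')[0];
    if (!keysPerPath.has(path)) keysPerPath.set(path, new Set());
    // Each unique URL is a distinct cache key under the default key scheme.
    keysPerPath.get(path)!.add(url);
  }
  return keysPerPath;
}

const report = auditCacheKeyCardinality([
  '/products?utm_source=a',
  '/products?utm_source=b',
  '/products?utm_source=c',
  '/pricing',
]);
// "/products" has 3 distinct cache keys for what is likely identical content
```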

The Vary Header Trap

The Vary header tells caches “this response differs based on these request headers.” It’s the origin’s way of saying “I returned different content to different clients based on header X, so you need to store separate cached versions.”

When a CDN sees Vary: Accept-Language, it doesn’t just store one cached response per URL—it stores one per URL per unique Accept-Language value. The cache key effectively becomes URL + the values of all headers listed in Vary.

HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: public, max-age=3600
Vary: Accept-Encoding, Accept-Language

<!DOCTYPE html>...

With two headers in Vary, you get a combinatorial explosion of cache variants. Each unique combination of Accept-Encoding and Accept-Language creates a separate cache entry. This is powerful—it lets you cache content that legitimately varies by request attributes. But it’s also dangerous, because Vary headers with high-cardinality values destroy your cache effectiveness entirely.
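To see why, here is a sketch of how the effective variant key behaves under Vary—the URL plus one value per listed header. The format is illustrative, not any CDN's internal representation:

```typescript
// Effective cache key under Vary: URL plus the request's value for each
// header named in the Vary list (header names assumed lowercased).
function variantKey(
  url: string,
  varyHeaders: string[],
  requestHeaders: Record<string, string>
): string {
  const parts = varyHeaders.map(
    (h) => `${h.toLowerCase()}=${requestHeaders[h.toLowerCase()] ?? ''}`
  );
  return `${url}|${parts.join('|')}`;
}

const a = variantKey('/home', ['Accept-Encoding', 'Accept-Language'], {
  'accept-encoding': 'br',
  'accept-language': 'en-US',
});
const b = variantKey('/home', ['Accept-Encoding', 'Accept-Language'], {
  'accept-encoding': 'br',
  'accept-language': 'de-DE',
});
// a !== b: same URL, but two separate cache entries
```

Every distinct combination of header values yields a distinct key, which is exactly why high-cardinality headers in Vary fragment the cache.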

The guideline: only vary on headers with a small, known set of values.

Vary On          Good Idea?  Reason
Accept-Encoding  Yes         Limited values (gzip, br, identity)
Accept-Language  Maybe       Can explode if not normalized
User-Agent       No          Thousands of unique values
Cookie           No          Unique per user, kills caching
Authorization    No          Unique per user, kills caching

Vary header guidance for common headers.

Accept-Encoding is safe because there are only a few compression algorithms. Accept-Language is risky because browsers send values like en-US, en;q=0.9, de;q=0.8—technically unique per user’s language preferences. If you vary on it, normalize the header first to extract just the primary language.
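A normalization sketch for Accept-Language, assuming a hypothetical list of supported site languages (the list and fallback are illustrative choices):

```typescript
// Languages this hypothetical site actually serves.
const SUPPORTED = ['en', 'de', 'fr'];

function normalizeAcceptLanguage(header: string): string {
  // "en-US,en;q=0.9,de;q=0.8" -> primary language tags in preference order,
  // stripping region subtags and quality values.
  const tags = header
    .split(',')
    .map((part) => part.split(';')[0].trim().split('-')[0].toLowerCase());
  // First supported language wins; fall back to a default.
  return tags.find((t) => SUPPORTED.includes(t)) ?? 'en';
}

// Both of these users now share the same cache variant:
normalizeAcceptLanguage('en-US,en;q=0.9,de;q=0.8'); // "en"
normalizeAcceptLanguage('en-GB,en;q=0.7');          // "en"
```

Run this at the edge and vary on the normalized value, and the cardinality drops from effectively unbounded to the size of your supported-language list.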

The most common Vary mistake is Vary: Cookie. It seems logical—if your response depends on cookies, tell the cache to vary on them. But cookies include session IDs, authentication tokens, and tracking identifiers that are unique per user. Even when the actual content-affecting cookie (preferences=dark) is the same, the unique session IDs create separate cache entries.

Your cache becomes per-user, which defeats the entire purpose of edge caching.

The solution is to handle cookies at the edge before they reach your origin. Strip session cookies, extract only the cookies that actually affect content (like country or currency preference), and normalize those into a consistent format:

// CloudFront Function: normalize cookies for cache key
function handler(event: any): any {
  const request = event.request;
  const cookieHeader = request.cookies || {};

  // Extract only cache-relevant cookies
  const cacheRelevant: Record<string, string> = {};
  const relevantKeys = ['country', 'currency', 'language'];

  relevantKeys.forEach((key) => {
    if (cookieHeader[key]) {
      cacheRelevant[key] = cookieHeader[key].value;
    }
  });

  // Replace cookies with normalized subset
  // Session IDs, auth tokens, tracking cookies are stripped
  request.cookies = Object.fromEntries(
    Object.entries(cacheRelevant).map(([k, v]) => [k, { value: v }])
  );

  return request;
}

Users with the same country and currency preferences now share cache entries, regardless of their session IDs.

Danger:

Vary: * means “this response is unique to every request”—it effectively disables caching. Never use it unless you truly intend to make the response uncacheable. Some frameworks add this by default for dynamic pages; check your response headers.

Bugs That Break Production

Understanding cache keys and Vary headers lets you avoid the two most dangerous cache bugs: serving personalized content to the wrong user, and cache poisoning.

Caching Personalized Content

This is the bug from our opening story—serving one user’s content to another. It happens when:

  • Your origin returns Cache-Control: public for personalized responses
  • You vary on session cookie but session IDs still contaminate the cache key
  • You cache without checking for authentication headers

The patterns that keep you safe:

Pattern                                                    Risk Level  Why
Edge-side includes for personalization                     Safe        Cache page shell, personalize at edge
Client-side personalization                                Safe        Cache generic content, personalize in browser
Separate cacheable/non-cacheable endpoints                 Safe        Clear separation of concerns
Origin returns Cache-Control: public for personalized API  Dangerous   CDN caches response, serves to wrong user
Vary on session cookie                                     Dangerous   Still risks cross-contamination
Cache without checking auth header                         Dangerous   Unauthenticated cache serves to authenticated users

Personalized content caching patterns.

The safest rule: always return Cache-Control: private, no-store for any response that contains user-specific data. If your application framework makes this difficult, add middleware that detects personalization indicators (Set-Cookie headers, user IDs in response body) and overrides the cache headers to prevent caching.
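A minimal sketch of such a guard, using plain header maps as stand-ins for a real framework's request/response objects (the function name and signature are assumptions for illustration):

```typescript
type HeaderMap = Record<string, string>;

// Force personalized responses to be uncacheable before they leave the
// application. Header names are assumed lowercased.
function enforceCacheSafety(
  requestHeaders: HeaderMap,
  responseHeaders: HeaderMap
): HeaderMap {
  const hasAuth = 'authorization' in requestHeaders;
  // A Set-Cookie header is a strong personalization signal.
  const setsCookie = 'set-cookie' in responseHeaders;
  if (hasAuth || setsCookie) {
    // Override whatever the application set; a wrong-user leak is worse
    // than a lower hit ratio.
    return { ...responseHeaders, 'cache-control': 'private, no-store' };
  }
  return responseHeaders;
}

const safe = enforceCacheSafety(
  { authorization: 'Bearer abc123' },
  { 'cache-control': 'public, max-age=3600' }
);
// safe['cache-control']: "private, no-store"
```

In a real stack this would run as response middleware, after the handler sets its headers but before anything is written to the client.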

Cache Poisoning

Cache poisoning exploits the gap between what’s in the cache key and what affects the response. If your origin reflects a header value in the response but that header isn’t part of the cache key, an attacker can inject malicious content that gets cached and served to everyone.

The classic vector is X-Forwarded-Host. Many applications trust this header to generate absolute URLs, and many CDNs don’t include it in the cache key by default. An attacker sends a request with X-Forwarded-Host: evil.com, your origin generates links pointing to evil.com, the CDN caches that response, and every subsequent user gets poisoned content.

The fix: normalize or strip dangerous headers at the edge before they reach your origin:

// CloudFront Function: prevent cache poisoning
function handler(event: any): any {
  const request = event.request;
  const headers = request.headers;

  // Normalize forwarded headers to trusted values
  headers['x-forwarded-host'] = request.headers.host;
  headers['x-forwarded-proto'] = { value: 'https' };

  // Strip headers that should never reach origin
  delete headers['x-original-url'];
  delete headers['x-rewrite-url'];
  delete headers['x-custom-ip-authorization'];

  return request;
}

With these headers normalized or stripped, your origin never sees the attack vectors that could poison your cache.

Getting It Right

Both of these bugs—and most CDN issues you’ll encounter—stem from the same root cause: a mismatch between what’s in the cache key and what actually affects the response.

A properly configured cache with a 70% hit ratio beats a misconfigured cache with a 95% hit ratio. The latter may be serving wrong content on up to 95% of requests—or worse, leaking private data between users.

Start with serving correct content: audit your cache keys, normalize your Vary headers, and verify that personalized content can never be cached. Once you’ve established that foundation, you can optimize for performance with confidence.
