Cache-Control Headers That Actually Speed Up Your Crawl Budget

Set Cache-Control: no-cache on frequently updated pages to force revalidation; search bots will check for fresh content without refetching unchanged assets. Add max-age=31536000, immutable to static resources like CSS and JavaScript so crawlers skip revisiting files that never change, freeing crawl budget for pages that matter. Use the private directive on user-specific or session-based URLs to signal that shared caches such as proxies and CDNs must not store them, which keeps personalized variants out of intermediary caches. Audit your server’s default headers with Chrome DevTools or curl; misconfigurations like missing directives or conflicting Expires headers dilute cache signals and confuse both browsers and bots.

What Cache-Control Does for Crawlers

Cache-Control headers tell crawlers two essential things: how long they can trust their stored copy, and whether they must check with your server before reusing it. When a bot visits your page, it reads directives like max-age=3600 (cache for one hour) or no-cache (always revalidate before serving). In most cases, this prevents bots from wasting crawl budget re-fetching identical content or, conversely, serving stale versions of pages that update frequently. A single missing directive can quietly cost weeks of crawl budget.

Quick vocabulary

max-age
Seconds a resource is considered fresh before revalidation. The headline freshness lever for both browsers and bots.
s-maxage
Like max-age but only for shared caches (CDNs, proxies). Overrides max-age at the edge while leaving browser caches alone.
no-cache
Cache the response, but revalidate with origin before serving. Not the same as “don’t cache”.
no-store
Don’t store the response anywhere. The actual “don’t cache” directive; use sparingly.
must-revalidate
When the cached copy is stale, do not serve it without checking origin first, even on network failure.
immutable
Promises the resource will never change for the life of its max-age. Pairs with versioned filenames to eliminate revalidation entirely.
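
To make the vocabulary concrete, here is a minimal sketch of how a client might split a Cache-Control value into its directives. The function name parse_cache_control is illustrative, and the parsing is deliberately simplified: it assumes no commas inside quoted arguments.

```python
from __future__ import annotations


def parse_cache_control(value: str) -> dict[str, int | bool]:
    """Split a Cache-Control value into {directive: argument-or-True}."""
    directives: dict[str, int | bool] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        name = name.strip().lower()
        arg = arg.strip().strip('"')
        # Numeric arguments (max-age, s-maxage) become ints; everything else is a flag.
        directives[name] = int(arg) if arg.isdigit() else True
    return directives


print(parse_cache_control("public, max-age=31536000, immutable"))
# {'public': True, 'max-age': 31536000, 'immutable': True}
```

Directive names are case-insensitive, which is why the sketch lower-cases them before storing.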

The key difference from browser caching: browsers prioritize user experience and speed, while crawlers prioritize freshness and discovery. A browser might aggressively cache to load pages faster on repeat visits. Search bots, however, use Cache-Control to decide when to return. Too-short expiration means they crawl unnecessarily often, consuming server resources; too-long means they miss the updates that matter for rankings. Image swaps and minor copy edits they can defer all day.

The header becomes a scheduling instruction: “check back in X seconds” instead of “guess when I’ve changed.”

Practical impact: setting max-age=86400 on stable pages (product specs, evergreen guides) signals bots to skip re-crawling for 24 hours, freeing crawl budget for new or frequently updated content. Conversely, no-cache or short max-age values on news articles or inventory pages ensure bots catch changes quickly. The full directive list and behavior are documented in MDN’s HTTP Cache-Control reference.
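
The second counts used throughout are easier to sanity-check when derived from durations instead of typed by hand. A small sketch using only Python's standard library; the helper name max_age is an illustration, not part of any HTTP library.

```python
from datetime import timedelta


def max_age(**duration) -> str:
    """Build a max-age directive from timedelta keywords (days=, hours=, ...)."""
    seconds = int(timedelta(**duration).total_seconds())
    return f"max-age={seconds}"


print(max_age(hours=1))   # max-age=3600
print(max_age(days=1))    # max-age=86400
print(max_age(days=30))   # max-age=2592000
print(max_age(days=365))  # max-age=31536000
```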

Directives That Matter Most for Crawl Efficiency

The directive zoo is small but the combinations matter. Here’s how the six headline directives stack up on what they do, when they help, and what they cost in crawl efficiency. Honestly, most teams I’ve worked with use two of these directives and ignore the other four.

Directive | What it does | Best for | Crawl-budget effect
max-age=N | Freshness window in seconds for any cache | Versioned static assets, evergreen pages | Saves budget when long, wastes it when short
s-maxage=N | Freshness window for shared caches only | CDN-edge control without touching browser TTL | Indirect; shifts pressure between edge and origin
no-cache | Cache OK, but revalidate every time | Frequently updated pages that still deserve crawling | Neutral; with proper ETags, revalidations are cheap 304s
no-store | No caching anywhere, full refetch every time | Auth-gated, personalized, or sensitive endpoints | Burns budget; only use where indexing isn’t wanted
must-revalidate | Forbids serving stale even on network failure | Pricing, inventory, product pages | Neutral; prevents stale-cache liabilities for bots
immutable | Skip revalidation entirely for the cache window | Versioned static files (app.7f3a.js) | Strongest budget win; eliminates 304 round-trips
Six directives that shape how search bots schedule revisits. The combinations, not any single directive, decide whether bots spend budget productively.

max-age and s-maxage

The max-age directive tells browsers and crawlers how many seconds a resource stays fresh before revalidation. Set max-age=2592000 (30 days) on stable assets like logos or archived blog posts, and bots will skip them during subsequent crawls, preserving budget for dynamic pages. The s-maxage variant applies only to shared caches (CDNs, proxy servers), overriding max-age for intermediate layers while letting user browsers follow their own rules. Use s-maxage when you want tighter control over edge caching without affecting end-user behavior, and always verify the header at the edge, because your CDN may silently override what the origin sends. For search bots, max-age is the primary signal: longer windows on unchanging content mean fewer wasteful requests and more crawl capacity allocated to your frequently updated pages.
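
The split between max-age and s-maxage can be sketched as a lookup that depends on who is reading the header. This is a simplified illustration, not a full cache implementation; the function name freshness_lifetime and the sample header are assumptions.

```python
from __future__ import annotations


def freshness_lifetime(cache_control: str, shared_cache: bool) -> int | None:
    """Return the freshness window in seconds, or None if none is declared."""
    ages: dict[str, int] = {}
    for part in cache_control.lower().split(","):
        name, _, arg = part.strip().partition("=")
        arg = arg.strip()
        if name in ("max-age", "s-maxage") and arg.isdigit():
            ages[name] = int(arg)
    # Shared caches (CDNs, proxies) prefer s-maxage; private caches ignore it.
    if shared_cache and "s-maxage" in ages:
        return ages["s-maxage"]
    return ages.get("max-age")


header = "public, max-age=600, s-maxage=86400"  # hypothetical origin header
print(freshness_lifetime(header, shared_cache=True))   # 86400
print(freshness_lifetime(header, shared_cache=False))  # 600
```

With this header, the edge keeps its copy for a day while browsers recheck after ten minutes.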

Pro tip

Pair max-age=31536000 with immutable only when your build pipeline produces fingerprinted filenames like app.7f3a9b.css. Without versioning, immutable on a year-long max-age is how you end up shipping a CSS bug that takes 12 months to clear from caches. I’ve watched that one play out on a client site, well, two client sites.
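
One way to enforce that pairing is to let the filename decide the header. The sketch below assumes a fingerprint is a dot-separated run of six or more hex characters just before the extension, as in app.7f3a9b.css; the pattern, the extension list, and the one-hour fallback are all assumptions to adapt to your build tool.

```python
import re

# Assumption: fingerprint = ".<6+ hex chars>." right before a known extension.
FINGERPRINT = re.compile(r"\.[0-9a-f]{6,}\.(?:css|js|png|jpg|woff2)$", re.IGNORECASE)


def static_cache_header(path: str) -> str:
    """Pick a Cache-Control value for a static file based on its name."""
    if FINGERPRINT.search(path):
        # New content gets a new filename, so the old URL never needs rechecking.
        return "public, max-age=31536000, immutable"
    # Unversioned file: keep the window short so a fix can propagate (hypothetical value).
    return "public, max-age=3600"


print(static_cache_header("/assets/app.7f3a9b.css"))
print(static_cache_header("/assets/app.css"))
```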

no-cache vs. no-store

Look, the naming here is a trap. no-cache tells browsers and crawlers to revalidate with the origin server before serving cached content; the resource stays cached but requires a freshness check each time. no-store prohibits caching entirely, forcing fresh downloads on every request. For SEO, no-cache preserves crawl efficiency while ensuring Googlebot sees current content when it matters (use it for pages that change frequently but still deserve crawling). no-store blocks all caching and burns crawl budget on repetitive fetches, so reserve it for truly sensitive content like user dashboards or checkout pages that search engines shouldn’t index anyway. Stable public-facing pages need neither directive; static assets and rarely changing pages should use max-age instead to reduce server load and speed up bot crawls.

must-revalidate and proxy-revalidate

These two directives enforce strict revalidation behavior when cached content becomes stale. must-revalidate tells browsers and proxies they cannot serve stale content without checking the origin server first, even if the user might accept it. proxy-revalidate applies the same rule but only to shared caches (CDNs, corporate proxies), leaving private browser caches unaffected. Both override user agent defaults that might serve expired content during network failures.

Watch for

Bots don’t tolerate stale prices the way users do. A cached $49 serving up next to a current $59 is the kind of mismatch that flags in shopping-result QA. For commerce pages, must-revalidate is cheap insurance.

For crawlers, must-revalidate ensures bots see current content on subsequent visits rather than outdated cache entries, helping search engines index fresh data. Use must-revalidate when accuracy matters more than availability; it is critical for product pages, pricing, or real-time content (I’ve seen this matter most on a 200K-URL e-commerce site where pricing drift cost the team a week of shopping-result re-indexing). proxy-revalidate suits scenarios where individual users can tolerate slight staleness but intermediaries should stay current. These directives reduce crawl waste by preventing bots from encountering stale cached responses at CDN layers.
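
The stale-serving rule these directives enforce can be written as a small decision function. This is a rough sketch of the behavior described in the HTTP caching specification, not a complete cache; the function name may_serve_stale is illustrative.

```python
def may_serve_stale(cache_control: str, shared_cache: bool) -> bool:
    """True if a cache could fall back to a stale copy when origin is unreachable."""
    names = {part.strip().partition("=")[0].lower() for part in cache_control.split(",")}
    if "no-store" in names:
        return False  # nothing was stored, so there is nothing to serve
    if "must-revalidate" in names or "no-cache" in names:
        return False  # every cache must check origin first
    if shared_cache and ("proxy-revalidate" in names or "s-maxage" in names):
        return False  # these bind shared caches only
    return True


print(may_serve_stale("max-age=60, must-revalidate", shared_cache=False))   # False
print(may_serve_stale("max-age=60, proxy-revalidate", shared_cache=False))  # True
print(may_serve_stale("max-age=60, proxy-revalidate", shared_cache=True))   # False
```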

Setting Cache-Control by Content Type

Tailoring Cache-Control to each content type sharpens both user experience and bot efficiency. Static assets (CSS, JavaScript, images, fonts) rarely change and should carry long max-age values (31536000 for one year), paired with immutable when versioning is in place. That lets bots skip recrawling these resources and focus bandwidth on indexable pages.

Dynamic pages benefit from shorter windows: homepage and category pages might use max-age=3600 (one hour) or no-cache with validation via ETag, signaling freshness without forcing full redownloads, which keeps crawlers returning at sensible intervals. API endpoints serving personalized or sensitive data warrant no-store to prevent any caching, though this matters less for SEO since bots rarely index JSON responses on their own.

Paginated archives and faceted filters present a challenge: without proper crawl controls, bots waste budget on duplicate or thin slices. Apply short max-age or must-revalidate and combine with canonical tags or noindex directives to steer crawlers toward primary content.

The pattern is consistent. Predictable resources get long cache lives, frequently updated pages get short validation cycles, and non-indexable or sensitive endpoints get strict no-cache or no-store rules. I’d argue that matching directives to change frequency and indexability is the single highest-leverage move available to most technical SEOs. It directly shapes how efficiently search engines allocate crawl budget across your site.
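
That pattern fits in a lookup table. The sketch below uses the header values discussed in this section; the content-kind labels are hypothetical and would need mapping onto your own URL patterns.

```python
# Content kinds are hypothetical labels; map your own URL patterns onto them.
POLICY = {
    "versioned_static": "public, max-age=31536000, immutable",
    "evergreen_page": "public, max-age=86400",
    "listing_page": "public, max-age=3600",
    "fast_changing_page": "no-cache",
    "personalized_endpoint": "private, no-store",
}


def cache_control_for(kind: str) -> str:
    """Return the Cache-Control value for a content kind."""
    # Unknown kinds fall back to revalidation, the conservative choice.
    return POLICY.get(kind, "no-cache")


for kind in ("versioned_static", "listing_page", "something_new"):
    print(kind, "->", cache_control_for(kind))
```

Keeping the policy in one table makes it easy to review in a pull request and hard to apply a year-long max-age to HTML by accident.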

How to Audit Your Current Headers

Start with browser DevTools: open the Network tab, reload your page, click any asset, and scan the Response Headers section for Cache-Control. Look for the actual directives served, no guessing required.

For bulk checks, run curl -I https://yoursite.com/page in a terminal to see headers instantly. Pipe multiple URLs through a script to spot inconsistencies across your site.
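
For the scripted version, one option is to save the curl -I output and extract only the caching headers. A minimal sketch, assuming one response per capture (no redirect chain); the sample text and the function name are illustrative.

```python
from __future__ import annotations

CACHE_HEADERS = ("cache-control", "expires", "etag", "last-modified", "vary")


def caching_headers(raw: str) -> dict[str, str]:
    """Pull cache-related headers out of raw `curl -I` output."""
    found: dict[str, str] = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        # The status line has no colon-separated name we care about and is skipped.
        if sep and name.strip().lower() in CACHE_HEADERS:
            found[name.strip().lower()] = value.strip()
    return found


# Hypothetical capture, pasted from a terminal.
SAMPLE = """HTTP/2 200
content-type: text/css
cache-control: public, max-age=31536000, immutable
etag: "abc123"
"""

print(caching_headers(SAMPLE))
```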

SEO crawlers like Screaming Frog or Sitebulb surface Cache-Control values at scale, flagging pages that send no-cache on static assets or max-age=0 on evergreen content. Both waste crawl budget by forcing bots to revisit unchanged resources.

Common red flags: missing Cache-Control entirely (defaults to heuristic caching), conflicting Expires and Cache-Control values, or public directives on authenticated pages. Fix these first to reclaim wasted bot visits and speed up indexing of fresh content. And start with the static-asset offenders; that’s where the bleed is loudest.
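
These red flags are mechanical enough to check in code. A rough sketch follows, assuming you already have each response's headers as a dict with lower-cased names; the function name, the flag wording, and the is_static / requires_login hints are illustrative inputs you would supply yourself.

```python
from __future__ import annotations


def find_red_flags(headers: dict[str, str], *, is_static: bool = False,
                   requires_login: bool = False) -> list[str]:
    """Return human-readable warnings for one response's caching headers."""
    cache_control = headers.get("cache-control")
    if cache_control is None:
        return ["missing Cache-Control: caches fall back to heuristics"]
    parts = [p.strip().lower() for p in cache_control.split(",") if p.strip()]
    names = {p.partition("=")[0] for p in parts}
    flags = []
    if "expires" in headers and "max-age" in names:
        flags.append("Expires sent alongside max-age: confirm they agree")
    if is_static and ("no-cache" in names or "max-age=0" in parts):
        flags.append("static asset forced to revalidate on every visit")
    if requires_login and "public" in names:
        flags.append("public directive on an authenticated page")
    return flags


print(find_red_flags({"cache-control": "no-cache"}, is_static=True))
```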



Deep dive
Edge cases that bite: Vary, ETag, and the 304 trap

The Cache-Control header doesn’t operate alone. Four companions decide whether your directives actually behave the way you expect:

  1. Vary response header. If you serve different content per user agent or per Accept-Encoding, your Vary header has to list those keys. Miss it, and Googlebot can get served a desktop-cached copy of your mobile page (or vice versa) from a CDN edge, defeating the freshness contract entirely.
  2. ETag validators. With no-cache, every revisit is a revalidation, but a revalidation against a matching ETag returns a 304 with zero body. Cheap. Without an ETag or Last-Modified validator, the bot re-downloads the full payload every time, and no-cache becomes effectively no-store for your crawl budget.
  3. The Expires conflict. Older stacks (and some CDN defaults) still emit an Expires header alongside Cache-Control. Per RFC 9111 (which obsoletes RFC 7234), a max-age directive takes precedence and Expires is ignored. A malformed Expires is supposed to be treated as already expired, but it can still confuse intermediate proxies, especially on shared hosting where you don’t control the full middlebox chain.
  4. The 200 OK vs 304 Not Modified ratio. In Google Search Console’s Crawl stats report, a healthy site shows a high proportion of 304s on static assets. If you’re seeing 200s where 304s should live, your max-age is too short or your ETag/Last-Modified validators aren’t being honoured downstream.
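
The ETag item in the list above is the easiest to see in code. Below is a minimal sketch of the server side of a conditional GET; hashing the body is one common way to derive an ETag, not the only one, and the function names are illustrative.

```python
from __future__ import annotations

import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: str | None) -> tuple[int, bytes]:
    """Return (status, payload) for a GET that may carry If-None-Match."""
    current = make_etag(body)
    if if_none_match:
        offered = [tag.strip() for tag in if_none_match.split(",")]
        # Weak comparison: a W/ prefix is ignored when matching validators.
        offered = [tag[2:] if tag.startswith("W/") else tag for tag in offered]
        if "*" in offered or current in offered:
            return 304, b""  # unchanged: no body is sent
    return 200, body


page = b"<h1>Pricing</h1>"
print(respond(page, None)[0])             # 200
print(respond(page, make_etag(page))[0])  # 304
```

The 304 branch is what keeps no-cache cheap: the bot still asks, but the answer is headers only.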

The audit pattern: pull a sample of crawl-stats responses, group by content type, and check the 200/304 split. Static assets bleeding 200s are the first thing to fix. In most cases this single change reclaims more crawl budget than any directive tweak on the dynamic pages.
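
The grouping step might look like the sketch below. It assumes you have bot requests as (content_type, status) pairs, for example filtered from server access logs; the sample rows are invented for illustration and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def share_of_304(rows):
    """rows: iterable of (content_type, status). Returns the 304 share per type."""
    counts = defaultdict(Counter)
    for content_type, status in rows:
        counts[content_type][status] += 1
    report = {}
    for content_type, by_status in counts.items():
        total = by_status[200] + by_status[304]
        # Only 200s and 304s are compared; other statuses are ignored here.
        report[content_type] = round(by_status[304] / total, 2) if total else None
    return report


# Invented sample: CSS mostly revalidates cleanly, HTML never does.
rows = [("css", 200), ("css", 304), ("css", 304), ("css", 304),
        ("html", 200), ("html", 200)]
print(share_of_304(rows))  # {'css': 0.75, 'html': 0.0}
```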

Implementation: Server-Level Setup

Apache users can add Cache-Control headers via .htaccess or the main config (mod_headers must be enabled). For static assets that rarely change, use:

# Requires mod_headers. Safe only when these files use fingerprinted names.
<FilesMatch "\.(jpg|png|css|js)$">
  Header set Cache-Control "public, max-age=31536000, immutable"
</FilesMatch>

Nginx requires editing server blocks. This snippet caches images, stylesheets, and scripts for one year:

# ~* makes the extension match case-insensitive.
location ~* \.(jpg|png|css|js)$ {
  add_header Cache-Control "public, max-age=31536000, immutable";
}

For CDN-level control, most providers offer UI toggles or API endpoints. Cloudflare’s Page Rules let you override origin headers per URL pattern. Set browser TTL and edge TTL separately to balance freshness with crawl efficiency. Fastly and CloudFront offer similar rule-based config.

Note

Ship cache changes to a staging path first. A site-wide max-age=31536000 applied to HTML by accident, instead of just static assets, is the kind of mistake that takes a long week to walk back. Test the FilesMatch or location regex against a handful of real URLs before pushing to production.
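
A quick way to run that regex test offline is to try the pattern against sample paths before touching the server. The sketch below reuses the pattern from the snippets above; the paths are made up, and re.IGNORECASE mirrors Nginx's ~* (Apache's FilesMatch is case-sensitive unless you change the pattern).

```python
import re

# Same pattern as the FilesMatch / location examples.
STATIC = re.compile(r"\.(jpg|png|css|js)$", re.IGNORECASE)


def matches_static(path: str) -> bool:
    """True if the long-lived static header would apply to this path."""
    return STATIC.search(path) is not None


# Hypothetical paths: include ones that should NOT match.
for path in ("/assets/app.7f3a9b.css", "/img/hero.JPG",
             "/blog/cache-control/", "/api/products.json"):
    print(path, matches_static(path))
```

The non-matching cases matter most: an HTML page or JSON endpoint that slips through the pattern would inherit the year-long, immutable header.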

Test with curl to confirm headers reach browsers and bots:

curl -I https://yoursite.com/style.css

Look for the Cache-Control line in the response. Adjust max-age values based on update frequency: shorter for dynamic pages, longer for versioned assets.

Try it this week

Audit five URL types. Fix the one bleeding crawl budget hardest.

  1. Run curl -I against one URL each from: homepage, a category page, a blog post, a CSS file, and a hero image. Capture the Cache-Control line for each.
  2. Flag any static asset returning max-age=0, no-cache, or no Cache-Control header at all. That’s where bots are burning the most budget.
  3. Pick the single worst offender, ship a fix to it only (start small), and verify in GSC’s Crawl stats report next week whether the 304 ratio improves.

One header change won’t move rankings overnight. A pattern of disciplined header hygiene, applied across every release, is what keeps crawl budget pointed at the pages you actually want indexed.

Smart Cache-Control configuration cuts wasted crawl budget by steering bots away from unchanged pages, so your freshest content gets indexed faster. Combined with a control layer like robots meta directives, it gives you precise recrawl governance: fewer pointless visits, more efficient discovery, and tighter index quality. Audit your headers today to reclaim budget and surface what matters.

Madison Houlding
February 22, 2026, 18:13
Categories: Technical SEO
Madison Houlding is Content Manager at Hetneo's Links. Madison runs editorial across the link-building space, auditing campaigns, writing the briefs that keep guest posts from sounding like ad copy, and turning analytics into next month's roadmap. Loves a clean brief, hates a buried lede.
