Cache-Control Headers That Actually Speed Up Your Crawl Budget
Set Cache-Control: no-cache on frequently updated pages to force revalidation, so search bots check for fresh content without refetching unchanged assets. Add max-age=31536000, immutable to static resources like CSS and JavaScript so crawlers skip revisiting files that never change, freeing crawl budget for pages that matter. Use the private directive on user-specific or session-based URLs to signal that shared caches such as proxies and CDNs shouldn’t store them, which keeps bots from wasting crawl budget on personalized variants. Audit your server’s default headers with Chrome DevTools or curl: misconfigurations like missing directives or conflicting Expires headers dilute cache signals and confuse both browsers and bots.
What Cache-Control Does for Crawlers
Cache-Control headers tell crawlers two essential things: whether a stored copy must be revalidated before reuse, and how long they can trust that copy before checking again. When a bot visits your page, it reads directives like max-age=3600 (cache for one hour) or no-cache (always revalidate before serving). In most cases, this prevents bots from wasting crawl budget re-fetching identical content or, conversely, serving stale versions of pages that update frequently. One missing directive can add up to weeks of crawl-budget waste.
Quick vocabulary
- max-age: Seconds a resource is considered fresh before revalidation; the headline freshness lever for both browsers and bots.
- s-maxage: Like max-age but only for shared caches (CDNs, proxies). Overrides max-age at the edge while leaving browser caches alone.
- no-cache: Cache the response, but revalidate with origin before serving. Not the same as “don’t cache”.
- no-store: Don’t store the response anywhere. The actual “don’t cache” directive; use sparingly.
- must-revalidate: When the cached copy is stale, do not serve it without checking origin first, even on network failure.
- immutable: Promises the resource will never change for the life of its max-age. Pairs with versioned filenames to eliminate revalidation entirely.
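To make the vocabulary concrete, here is a minimal sketch of how a header value breaks down into individual directives. The function name parse_cache_control is an illustration, not part of any library, and the parser deliberately ignores edge cases such as commas inside quoted arguments.

```python
def parse_cache_control(header_value: str) -> dict:
    """Split a Cache-Control header value into a {directive: value} mapping.

    Simplified sketch: assumes no commas inside quoted arguments.
    """
    directives = {}
    for part in header_value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        value = value.strip().strip('"')
        # Numeric arguments (max-age, s-maxage) become ints;
        # flag-style directives (no-cache, immutable) become True.
        directives[name.strip()] = int(value) if value.isdigit() else True
    return directives
```

Calling parse_cache_control("public, max-age=31536000, immutable") yields a mapping with max-age as an integer and the two flags set to True, which is the shape the later checks in an audit script would work from.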
The key difference from browser caching: browsers prioritize user experience and speed, while crawlers prioritize freshness and discovery. A browser might aggressively cache to load pages faster on repeat visits. Search bots, however, use Cache-Control to decide when to return: too-short expiration means they crawl unnecessarily often, consuming server resources; too-long means they miss updates that could affect rankings. Actually, scratch that: too-long means they miss the updates that matter. Image swaps and minor copy edits they can defer all day.
The header becomes a scheduling instruction: “check back in X seconds” instead of “guess when I’ve changed.”
Practical impact: setting max-age=86400 on stable pages (product specs, evergreen guides) signals bots to skip re-crawling for 24 hours, freeing crawl budget for new or frequently updated content. Conversely, no-cache or short max-age values on news articles or inventory pages ensure bots catch changes quickly. The full directive list and its behavior are documented in MDN’s HTTP Cache-Control reference.
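The “check back in X seconds” idea can be sketched as simple date arithmetic. This is only how a cache that honors max-age would reason; real crawlers layer their own scheduling on top, and the helper names fresh_until and is_fresh are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def fresh_until(fetched_at: datetime, max_age_seconds: int) -> datetime:
    """Moment at which a copy fetched at `fetched_at` stops being fresh."""
    return fetched_at + timedelta(seconds=max_age_seconds)


def is_fresh(fetched_at: datetime, max_age_seconds: int, now: datetime) -> bool:
    """True while the stored copy can be reused without revalidation."""
    return now < fresh_until(fetched_at, max_age_seconds)


# A page fetched at noon UTC with max-age=86400 stays fresh until noon the next day.
fetched = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
deadline = fresh_until(fetched, 86400)
```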

Directives That Matter Most for Crawl Efficiency
The directive zoo is small but the combinations matter. Here’s how the six headline directives stack up on what they do, when they help, and what they cost in crawl efficiency. Honestly, most teams I’ve worked with use two of these directives and ignore the other four.
| Directive | What it does | Best for | Crawl-budget effect |
|---|---|---|---|
| max-age=N | Freshness window in seconds for any cache | Versioned static assets, evergreen pages | Saves budget when long, wastes it when short |
| s-maxage=N | Freshness window for shared caches only | CDN-edge control without touching browser TTL | Indirect: shifts pressure between edge and origin |
| no-cache | Cache OK, but revalidate every time | Frequently updated pages that still deserve crawling | Neutral: with proper ETags, revalidations are cheap 304s |
| no-store | No caching anywhere, full refetch every time | Auth-gated, personalized, or sensitive endpoints | Burns budget; only use where indexing isn’t wanted |
| must-revalidate | Forbids serving stale even on network failure | Pricing, inventory, product pages | Neutral: prevents stale-cache liabilities for bots |
| immutable | Skip revalidation entirely for the cache window | Versioned static files (app.7f3a.js) | Strongest budget win: eliminates 304 round-trips |
max-age and s-maxage
The max-age directive tells browsers and crawlers how many seconds a resource stays fresh before revalidation. Set max-age=2592000 (30 days) on stable assets like logos or archived blog posts, and bots will skip them during subsequent crawls, preserving budget for dynamic pages. The s-maxage variant applies only to shared caches (CDNs, proxy servers), overriding max-age for intermediate layers while letting user browsers follow their own rules. Your CDN may override the origin header silently, so always verify at the edge. Use s-maxage when you want tighter control over edge caching without affecting end-user behavior. For search bots, max-age is the primary signal: longer windows on unchanging content mean fewer wasteful requests and more crawl capacity allocated to your frequently updated pages.
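The precedence between the two directives fits in a few lines. This is a simplified sketch: freshness_lifetime is a hypothetical helper, it takes already-parsed directives, and it returns 0 when no lifetime is given, whereas real caches may fall back to heuristic freshness.

```python
def freshness_lifetime(directives: dict, shared_cache: bool) -> int:
    """Freshness window in seconds for a shared (CDN/proxy) or private (browser) cache."""
    # s-maxage only applies to shared caches, where it overrides max-age.
    if shared_cache and "s-maxage" in directives:
        return directives["s-maxage"]
    # Simplification: a missing max-age is treated as "no freshness window".
    return directives.get("max-age", 0)


# Hypothetical policy: one hour at the edge, one minute in browsers.
policy = {"public": True, "max-age": 60, "s-maxage": 3600}
edge_ttl = freshness_lifetime(policy, shared_cache=True)
browser_ttl = freshness_lifetime(policy, shared_cache=False)
```

With this policy the edge keeps the response for 3600 seconds while browsers revalidate after 60, which is the split the s-maxage directive exists to express.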
Pro tip
Pair max-age=31536000 with immutable only when your build pipeline produces fingerprinted filenames like app.7f3a9b.css. Without versioning, immutable on a year-long max-age is how you end up shipping a CSS bug that takes 12 months to clear from caches. I’ve watched that one play out on a client site, well, two client sites.
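If your build tool doesn’t fingerprint filenames already, the idea looks roughly like this. It is a sketch only: fingerprinted_name is a hypothetical helper, and the choice of SHA-256 truncated to eight hex characters is an assumption, not a convention your bundler necessarily follows.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, digest_chars: int = 8) -> str:
    """Insert a short content hash before the extension: app.css -> app.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_name = fingerprinted_name("static/app.css", b"body { color: black; }")
new_name = fingerprinted_name("static/app.css", b"body { color: navy; }")
```

Because the name changes whenever the bytes change, a CSS fix ships under a new URL and the year-long cache on the old one stops mattering.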
no-cache vs. no-store
Look, the naming here is a trap. no-cache tells browsers and crawlers to revalidate with the origin server before serving cached content: the resource stays cached but requires a freshness check each time. no-store prohibits caching entirely, forcing fresh downloads on every request. For SEO, no-cache preserves crawl efficiency while ensuring Googlebot sees current content when it matters (use it for pages that change frequently but still deserve crawling). no-store blocks all caching and burns crawl budget on repetitive fetches, so reserve it for truly sensitive content like user dashboards or checkout pages that search engines shouldn’t index anyway. For most teams, the bulk of public-facing pages need neither directive; static assets and stable pages should use max-age instead to reduce server load and speed up bot crawls.
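What makes no-cache cheap is the conditional request behind it. The sketch below imitates an origin that answers a matching If-None-Match with an empty 304; make_etag and respond are illustrative names, and deriving the ETag from a content hash is an assumption rather than what any particular server does.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted validator from the response body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: dict, body: bytes) -> tuple:
    """Return (status, headers, body) for a resource served with no-cache."""
    etag = make_etag(body)
    headers = {"Cache-Control": "no-cache", "ETag": etag}
    # The client already holds this exact version: confirm it without resending the body.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    return 200, headers, body


page = b"<h1>In stock: 12</h1>"
first = respond({}, page)
repeat = respond({"If-None-Match": first[1]["ETag"]}, page)
changed = respond({"If-None-Match": first[1]["ETag"]}, b"<h1>In stock: 11</h1>")
```

The repeat visit costs a round-trip but no payload, while a no-store response would have resent the full body every time.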
must-revalidate and proxy-revalidate
These two directives enforce strict revalidation behavior when cached content becomes stale. must-revalidate tells browsers and proxies they cannot serve stale content without checking the origin server first, even if the user might accept it. proxy-revalidate applies the same rule but only to shared caches (CDNs, corporate proxies), leaving private browser caches unaffected. Both override user agent defaults that might serve expired content during network failures.
Watch for
Bots don’t tolerate stale prices the way users do. A cached $49 serving up next to a current $59 is the kind of mismatch that flags in shopping-result QA. For commerce pages, must-revalidate is cheap insurance.
For crawlers, must-revalidate ensures bots see current content on subsequent visits rather than outdated cache entries, helping search engines index fresh data. Use must-revalidate when accuracy matters more than availability, which is the case for product pages, pricing, or real-time content (I’ve seen this matter most on a 200K-URL e-commerce site where pricing drift cost the team a week of shopping-result re-indexing). proxy-revalidate suits scenarios where individual users can tolerate slight staleness but intermediaries should stay current. These directives reduce crawl waste by preventing bots from encountering stale cached responses at CDN layers.
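The difference between the two directives shows up only when a stale entry meets an unreachable origin. Here is a rough decision sketch; decide_when_stale and its return labels are invented for illustration, and the 504 reflects the status the HTTP caching spec suggests for this case.

```python
def decide_when_stale(directives: dict, origin_reachable: bool, shared_cache: bool = True) -> str:
    """What a cache does with a stale entry, depending on the revalidation directives."""
    if origin_reachable:
        return "revalidate"
    strict = "must-revalidate" in directives or (
        shared_cache and "proxy-revalidate" in directives
    )
    # Strict entries must not be served stale; the cache answers with an error instead.
    return "error-504" if strict else "may-serve-stale"


pricing_page = {"max-age": 60, "must-revalidate": True}
outage_result = decide_when_stale(pricing_page, origin_reachable=False)
```

For the pricing page above, an origin outage produces an error rather than yesterday’s price, which is the accuracy-over-availability trade described here.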
Setting Cache-Control by Content Type
Tailoring Cache-Control to each content type sharpens both user experience and bot efficiency. Static assets (CSS, JavaScript, images, fonts) rarely change and should carry long max-age values (31536000 for one year), paired with immutable when versioning is in place. That lets bots skip recrawling these resources and focus bandwidth on indexable pages.
Dynamic pages benefit from shorter windows: homepage and category pages might use max-age=3600 (one hour) or no-cache with validation via ETag, signaling freshness without forcing full redownloads, which keeps crawlers returning at sensible intervals. API endpoints serving personalized or sensitive data warrant no-store to prevent any caching, though this matters less for SEO since bots typically ignore JSON responses.
Paginated archives and faceted filters present a challenge: without proper crawl controls, bots waste budget on duplicate or thin slices. Apply short max-age or must-revalidate and combine with canonical tags or noindex directives to steer crawlers toward primary content.
The pattern is consistent. Predictable resources get long cache lives, frequently updated pages get short validation cycles, and non-indexable or sensitive endpoints get strict no-cache or no-store rules. I’d argue that matching directives to change frequency and indexability is the single highest-leverage move available to most technical SEOs: it directly shapes how efficiently search engines allocate crawl budget across your site.
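One way to keep that pattern honest is to write it down as a single lookup. The sketch below is an assumption-heavy illustration: the path prefixes (/account, /checkout, /api/, /guides/), the file extensions, and the function name cache_control_for are all hypothetical and would need to match your own URL structure.

```python
import re
from urllib.parse import urlsplit

# Hypothetical conventions: hex fingerprint before the extension marks a versioned asset.
FINGERPRINTED = re.compile(r"\.[0-9a-f]{4,}\.(css|js|png|jpg|woff2)$")
STATIC = re.compile(r"\.(css|js|png|jpg|woff2)$")
PRIVATE_PREFIXES = ("/account", "/checkout", "/api/")


def cache_control_for(url: str) -> str:
    """Pick a Cache-Control value from change frequency and indexability."""
    path = urlsplit(url).path.lower()
    if path.startswith(PRIVATE_PREFIXES):
        return "private, no-store"                    # personalized or sensitive
    if FINGERPRINTED.search(path):
        return "public, max-age=31536000, immutable"  # versioned static asset
    if STATIC.search(path):
        return "public, max-age=2592000"              # unversioned static: 30 days
    if path.startswith("/guides/"):
        return "max-age=86400"                        # evergreen content: 24 hours
    return "no-cache"                                 # dynamic pages: revalidate via ETag
```

Keeping the rules in one function makes it easy to diff policy changes and to run the same table against a URL list before touching server config.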
How to Audit Your Current Headers
Start with browser DevTools: open the Network tab, reload your page, click any asset, and scan the Response Headers section for Cache-Control. Look for the actual directives served, no guessing required.
For bulk checks, run curl -I https://yoursite.com/page in a terminal to see headers instantly. Pipe multiple URLs through a script to spot inconsistencies across your site.
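A small wrapper around curl is one way to script this, assuming curl is installed and on your PATH. The helper names fetch_head and parse_head_output are made up for this sketch, and the parser handles a single response block only (no redirect chains).

```python
import subprocess


def fetch_head(url: str) -> str:
    """Run `curl -sI URL` and return the raw header text (requires curl)."""
    result = subprocess.run(
        ["curl", "-sI", url], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_head_output(raw: str) -> dict:
    """Turn the text printed by `curl -sI` into a {lowercased-header: value} dict."""
    headers = {}
    for line in raw.splitlines()[1:]:  # first line is the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


# Example of the text curl prints, hard-coded so the parser can be tried offline.
sample = "HTTP/2 200\r\ncontent-type: text/css\r\ncache-control: public, max-age=31536000, immutable\r\n\r\n"
parsed = parse_head_output(sample)
```

Looping parse_head_output(fetch_head(url)) over a list of URLs and printing the cache-control entry for each gives a quick side-by-side view of inconsistencies.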
SEO crawlers like Screaming Frog or Sitebulb surface Cache-Control values at scale, flagging pages that send no-cache on static assets or max-age=0 on evergreen content; both waste crawl budget by forcing bots to refetch unchanged resources.
Common red flags: missing Cache-Control entirely (defaults to heuristic caching), conflicting Expires and Cache-Control values, or public directives on authenticated pages. Fix these first to reclaim wasted bot visits and speed up indexing of fresh content. And start with the static-asset offenders, since that’s where the bleed is loudest.
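The red flags above translate into a short checklist function. This is a sketch under assumptions: audit_headers is a hypothetical helper, you supply the is_static and requires_login hints yourself, and the Expires check only notes that both headers are present rather than proving that they disagree.

```python
def audit_headers(headers: dict, is_static: bool = False, requires_login: bool = False) -> list:
    """Return a list of red-flag descriptions for one response's headers."""
    normalized = {name.lower(): value for name, value in headers.items()}
    directives = {}
    for part in normalized.get("cache-control", "").lower().split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name] = value

    flags = []
    if not directives:
        flags.append("missing Cache-Control")
    if "max-age" in directives and "expires" in normalized:
        # max-age takes precedence, so a leftover Expires is worth a manual look.
        flags.append("Expires sent alongside max-age")
    if is_static and ("no-cache" in directives or directives.get("max-age") == "0"):
        flags.append("static asset forced to revalidate")
    if requires_login and "public" in directives:
        flags.append("public on authenticated page")
    return flags
```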

Implementation: Server-Level Setup
Apache users can add Cache-Control headers via .htaccess or the main config (the Header directive requires mod_headers). For static assets that change rarely, use:
<FilesMatch "\.(jpg|png|css|js)$">
    Header set Cache-Control "public, max-age=31536000, immutable"
</FilesMatch>
Nginx requires editing server blocks. This snippet caches images, stylesheets, and scripts for one year:
location ~* \.(jpg|png|css|js)$ {
    add_header Cache-Control "public, max-age=31536000, immutable";
}
For CDN-level control, most providers offer UI toggles or API endpoints. Cloudflare’s Page Rules let you override origin headers per URL pattern. Set browser TTL and edge TTL separately to balance freshness with crawl efficiency. Fastly and CloudFront offer similar rule-based config.
Note
Ship cache changes to a staging path first. A site-wide max-age=31536000 applied to HTML by accident, instead of just static assets, is the kind of mistake that takes a long week to walk back. Test the FilesMatch or location regex against a handful of real URLs before pushing to production.
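A quick dry run of the pattern can catch the obvious mistakes before staging. The sketch below uses Python’s re module as a stand-in for the server’s regex engine, which is close enough for a pattern this simple but is still an approximation: nginx’s ~* is case-insensitive and matches the URI path, while Apache’s FilesMatch matches filenames and is case-sensitive by default. The URLs are placeholders.

```python
import re
from urllib.parse import urlsplit

# Same pattern as the server config, case-insensitive like nginx's ~* modifier.
STATIC_PATTERN = re.compile(r"\.(jpg|png|css|js)$", re.IGNORECASE)


def gets_long_cache(url: str) -> bool:
    """True if the URL's path would be matched by the static-asset rule."""
    return bool(STATIC_PATTERN.search(urlsplit(url).path))


# Placeholder URLs: swap in a handful of real ones from your own site.
candidates = [
    "https://yoursite.com/style.css",
    "https://yoursite.com/app.js?v=2",
    "https://yoursite.com/products/widget",
    "https://yoursite.com/data.json",
]
matched = [url for url in candidates if gets_long_cache(url)]
```

If an HTML route or a JSON endpoint shows up in matched, the regex is too broad and the year-long header would land on content that changes.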
Test with curl to confirm headers reach browsers and bots:
curl -I https://yoursite.com/style.css
Look for the Cache-Control line in the response. Adjust max-age values based on update frequency: shorter for dynamic pages, longer for versioned assets.
Try it this week
Audit five URL types. Fix the one bleeding crawl budget hardest.
1. Run curl -I against one URL each from: homepage, a category page, a blog post, a CSS file, and a hero image. Capture the Cache-Control line for each.
2. Flag any static asset returning max-age=0, no-cache, or no Cache-Control header at all. That’s where bots are burning the most budget.
3. Pick the single worst offender, ship a fix to it only (start small), and verify in GSC’s Crawl stats report next week whether the 304 ratio improves.
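If you also want to sanity-check the 304 ratio from your own access logs, the arithmetic is simple. This sketch assumes you have already extracted the status codes for bot requests; not_modified_ratio is a hypothetical helper and the sample numbers are invented.

```python
from collections import Counter


def not_modified_ratio(status_codes: list) -> float:
    """Share of responses that were 304 Not Modified (0.0 for an empty list)."""
    counts = Counter(status_codes)
    total = sum(counts.values())
    return counts[304] / total if total else 0.0


# Invented sample: status codes for bot hits before and after a header fix.
before = [200, 200, 200, 304]
after = [200, 304, 304, 304]
improvement = not_modified_ratio(after) - not_modified_ratio(before)
```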
One header change won’t move rankings overnight. A pattern of disciplined header hygiene, applied across every release, is what keeps crawl budget pointed at the pages you actually want indexed.
Smart Cache-Control configuration cuts wasted crawl budget by steering bots away from stale or unchanged pages, keeping your freshest content indexed faster. Combined with a control layer like robots meta directives, it gives you precise recrawl governance: fewer pointless visits, more efficient discovery, and tighter index quality. Audit your headers today to reclaim budget and surface what matters.
Related guides
- Wasting Crawl Budget: How thin and duplicate pages drain crawl capacity, and which controls reclaim it.
- Faceted Navigation Crawl Controls: The combinations of canonical, noindex, and Cache-Control that keep facet pages from eating your budget.