{"id":74,"date":"2025-12-13T13:22:39","date_gmt":"2025-12-13T13:22:39","guid":{"rendered":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/"},"modified":"2026-05-16T12:59:23","modified_gmt":"2026-05-16T12:59:23","slug":"why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained","status":"publish","type":"post","link":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/","title":{"rendered":"Why Your Internal Linking Test Might Be Wrong: Z-Test Assumptions Explained"},"content":{"rendered":"<p>Z-tests look reassuringly simple: plug in two means, get a p-value, ship the winner. The problem is that the test only earns its keep when four assumptions hold, and in my experience SEO data violates at least one of them most of the time. Normality, independence, known variance, and adequate sample size are the load-bearing pieces, and when any of them buckle, the confidence interval you&#8217;re staring at is closer to fiction than measurement. 
This guide walks through what z-tests actually require, the case where ignoring those requirements cost a real team four months of recovery, and the alternatives worth reaching for when the assumptions don&#8217;t hold.<\/p>\n<aside style=\"border-left:4px solid #1F2A44;background:#F4F6FB;padding:18px 22px;margin:28px 0;border-radius:4px;\">\n<p style=\"margin:0 0 8px;font-weight:700;letter-spacing:.04em;text-transform:uppercase;font-size:.78em;color:#1F2A44;\">Key takeaways<\/p>\n<ul style=\"margin:0;padding-left:20px;\">\n<li>Z-tests require <mark style=\"background:#FEF6E0;padding:1px 5px;border-radius:3px;\">30-40<\/mark> observations per variant at minimum; most SEO experiments need hundreds to detect realistic lift.<\/li>\n<li>Independence is the assumption SEO breaks most often: internally linked pages, shared keyword clusters, and category neighbours all cannibalise each other during a test.<\/li>\n<li>Organic traffic is right-skewed almost by definition; the Central Limit Theorem rescues you only when sample size is large enough.<\/li>\n<li>When normality or known variance fails, switch to t-tests, Mann-Whitney U, or bootstrap intervals rather than pushing through.<\/li>\n<li>The Q2 2023 case study below shows a +22% z-test &#8220;win&#8221; that turned into an 18% traffic loss across 340 pages because independence was violated.<\/li>\n<\/ul>\n<\/aside>\n<h2>What the Z-Test Actually Requires<\/h2>\n<p>Before any of the per-assumption checks, it helps to fix the vocabulary, because most of the field uses these terms loosely and the looseness is where bad tests come from.<\/p>\n<div style=\"background:#F8F9FC;border:1px solid #d8dde8;border-radius:6px;padding:20px 24px;margin:28px 0;\">\n<p style=\"margin:0 0 14px;font-weight:700;letter-spacing:.04em;text-transform:uppercase;font-size:.78em;color:#1F2A44;\">Quick vocabulary<\/p>\n<dl style=\"margin:0;display:grid;grid-template-columns:max-content 1fr;gap:10px 22px;\">\n<dt style=\"font-weight:600;color:#1F2A44;\">Normality assumption<\/dt>\n<dd style=\"margin:0;\">The requirement that your data, or the sampling distribution of your test statistic, follows a bell curve. Z-tests assume this directly; the CLT lets large samples cheat.<\/dd>\n<dt style=\"font-weight:600;color:#1F2A44;\">Independence<\/dt>\n<dd style=\"margin:0;\">The requirement that one observation&#8217;s value doesn&#8217;t influence another&#8217;s. Internally linked pages competing for the same keyword cluster routinely violate this.<\/dd>\n<dt style=\"font-weight:600;color:#1F2A44;\">Sample size (n)<\/dt>\n<dd style=\"margin:0;\">The number of independent observations per variant. The conventional floor for z-tests is n &gt; 30, though SEO usually needs much more.<\/dd>\n<dt style=\"font-weight:600;color:#1F2A44;\">p-value<\/dt>\n<dd style=\"margin:0;\">The probability of observing a result at least as extreme as yours <em>if the null hypothesis were true<\/em>. It is not the probability that the change worked.<\/dd>\n<dt style=\"font-weight:600;color:#1F2A44;\">Type I error<\/dt>\n<dd style=\"margin:0;\">A false positive: declaring a winner when nothing actually changed. Alpha (typically 0.05) is the rate you accept up front.<\/dd>\n<dt style=\"font-weight:600;color:#1F2A44;\">Type II error<\/dt>\n<dd style=\"margin:0;\">A false negative: missing a real effect because your sample was too small or too noisy. Beta is the rate; 1 minus beta is statistical power.<\/dd>\n<\/dl>\n<\/div>\n<p>The whole point of going through these terms is that &#8220;the z-test said p &lt; 0.05&#8221; carries a different weight depending on which of these assumptions you&#8217;ve actually verified. In most cases, SEO teams verify the sample size and skip the rest.<\/p>\n<h3>Sample Size: The 30-Page Minimum Rule<\/h3>\n<p>The 30-observation threshold is a rule of thumb that comes from the Central Limit Theorem: with enough data points, sampling distributions approach normal even when individual values don&#8217;t. Below 30 pages per group, z-test p-values become unreliable: an interval labelled 95% may capture the true effect noticeably less than 95% of the time, especially when the data are skewed.<\/p>\n<p>Testing with 10-15 pages is common in SEO experiments targeting specific page types, but it is prone to false positives. A 12-page test showing +40% traffic with p=0.03 may simply reflect natural variance, not your meta description changes. The smaller your sample, the more a few extreme values (especially the top one or two pages) can skew your mean.<\/p>\n<div style=\"border-left:3px solid #4A90B8;background:#EEF5FA;padding:14px 18px;margin:24px 0;border-radius:0 4px 4px 0;\">\n<p style=\"margin:0 0 4px;font-size:.78em;font-weight:700;letter-spacing:.06em;text-transform:uppercase;color:#1F4A66;\">Pro tip<\/p>\n<p style=\"margin:0;\">When your group size is borderline (25-35 pages), run both a z-test and a t-test on the same data. If both lead to the same conclusion, the choice of test isn&#8217;t driving your result. If they land on opposite sides of your significance threshold, you&#8217;re in the uncertain zone where the choice of test changes the conclusion, and that&#8217;s a signal to extend the test, not to ship.<\/p>\n<\/div>\n<p>When you have fewer than 30 observations, use a t-test instead. It adjusts for small-sample uncertainty by widening confidence intervals and requiring stronger evidence before declaring significance. Many SEO platforms default to z-tests regardless of sample size, so verify before trusting automated results (I once spent an afternoon arguing with a vendor support rep who insisted their tool &#8220;auto-detects&#8221; the right test; it does not). 
The default isn&#8217;t always wrong, but it&#8217;s almost never re-checked.<\/p>\n<figure class=\"wp-block-image size-large\">\n        <img loading=\"lazy\" decoding=\"async\" width=\"900\" height=\"514\" src=\"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/statistical-precision-measurement.jpg\" alt=\"Laboratory precision scale with dice representing statistical measurement and probability\" class=\"wp-image-71\" srcset=\"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/statistical-precision-measurement.jpg 900w, https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/statistical-precision-measurement-300x171.jpg 300w, https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/statistical-precision-measurement-768x439.jpg 768w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><figcaption>Statistical testing requires precise measurement and understanding of underlying assumptions to avoid misleading results.<\/figcaption><\/figure>\n<h3>Independence: Why Testing Product Pages Together Fails<\/h3>\n<p>Independence requires that one page&#8217;s performance doesn&#8217;t influence another&#8217;s. Product pages rarely meet this standard. Internal linking creates direct dependencies: anchor text from your &#8220;blue widgets&#8221; page can boost rankings for &#8220;blue widget accessories,&#8221; while both compete for similar query space. When you test pages in the same category, shared link equity flows between them through navigation menus, related product modules, and breadcrumbs.<\/p>\n<figure class=\"wp-block-pullquote\" style=\"border-top:4px solid #1F2A44;border-bottom:4px solid #1F2A44;padding:28px 0;margin:36px 0;text-align:center;\">\n<blockquote style=\"margin:0;padding:0;border:none;\">\n<p style=\"font-size:1.35em;line-height:1.45;font-style:italic;color:#1F2A44;margin:0;\">In SEO, your test pages don&#8217;t behave like 12 coin flips. 
They behave like 12 players on the same team, where one&#8217;s performance directly changes the others&#8217;.<\/p>\n<\/blockquote>\n<\/figure>\n<p>Keyword cannibalisation compounds the problem: Google may swap which page ranks for overlapping terms mid-test, creating false positives or false negatives. Testing &#8220;red shoes&#8221; and &#8220;crimson sneakers&#8221; simultaneously means changes to one alter the other&#8217;s traffic through search-result reshuffling. Treatment and control groups contaminate each other, invalidating the z-test&#8217;s foundational math. Select test pages from unrelated categories with distinct keyword targets and minimal cross-linking to preserve independence. Honestly, on most ecommerce sites this is the assumption that&#8217;s hardest to engineer around: the whole internal-link graph exists to <em>create<\/em> dependencies, and a clean test demands the opposite.<\/p>\n<h3>Normality: When Traffic Data Breaks the Bell Curve<\/h3>\n<p>Organic traffic rarely follows the bell curve. A handful of landing pages capture the majority of visits (the classic long-tail distribution), while most pages sit in near-obscurity. This right-skewed reality violates the normality assumption baked into z-tests. <a href=\"https:\/\/moz.com\/blog\/category\/analytics\" rel=\"noopener\">Moz&#8217;s analytics archive<\/a> returns to this point regularly: almost every distribution that matters in this field is heavy-tailed.<\/p>\n<p>Seasonality adds another layer: retail sites spike in November, tax software in March. When you slice data by week or day, these patterns create bumpy, non-normal distributions. 
Layer on <a href=\"https:\/\/hetneo.link\/blog\/this-seo-recovery-method-rescued-sites-hit-by-googles-core-updates\/\">algorithm updates<\/a>, especially <a href=\"https:\/\/hetneo.link\/blog\/this-seo-recovery-method-rescued-sites-hit-by-googles-core-updates\/\">Google&#8217;s Core Updates<\/a>, and traffic can lurch or flatline overnight, shredding any semblance of normalcy.<\/p>\n<div style=\"border-left:3px solid #4A90B8;background:#EEF5FA;padding:14px 18px;margin:24px 0;border-radius:0 4px 4px 0;\">\n<p style=\"margin:0 0 4px;font-size:.78em;font-weight:700;letter-spacing:.06em;text-transform:uppercase;color:#1F4A66;\">Note<\/p>\n<p style=\"margin:0;\">The Central Limit Theorem gives you cover for <em>non-normal individual values<\/em> when the sample is large, but it does nothing for <em>non-independent observations<\/em>. A common mistake is invoking the CLT to justify a z-test on internally-linked pages; the CLT was never going to fix that problem.<\/p>\n<\/div>\n<p>The good news: the Central Limit Theorem relaxes normality requirements when sample sizes are large (n &gt; 30 per variant is the textbook rule; heavily skewed traffic often needs considerably more). The sampling distribution of your test&#8217;s mean traffic becomes approximately normal even if individual page visits aren&#8217;t. This statistical cushion means you can often proceed with a z-test despite messy underlying data, but only if independence and sample size hold firm. When in doubt, visualise your distribution first.<\/p>\n<h2>When the Z-Test Fits, and When It Doesn&#8217;t<\/h2>\n<p>The cleaner way to think about this is by use case, not by ritual. 
A z-test isn&#8217;t &#8220;right&#8221; or &#8220;wrong&#8221; in the abstract: it fits some experimental setups and fails others, and the failure mode is usually quiet.<\/p>\n<figure class=\"wp-block-table\" style=\"margin:24px 0;\">\n<table style=\"width:100%;border-collapse:collapse;font-size:.95em;\">\n<thead>\n<tr style=\"background:#1F2A44;color:#fff;\">\n<th style=\"padding:10px 12px;text-align:left;border:1px solid #1F2A44;width:30%;\">Test scenario<\/th>\n<th style=\"padding:10px 12px;text-align:left;border:1px solid #1F2A44;\">Z-test fits when<\/th>\n<th style=\"padding:10px 12px;text-align:left;border:1px solid #1F2A44;\">Z-test fails when<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;font-weight:600;\">Title-tag CTR change<\/td>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;\">Hundreds of impressions per page across at least 30 pages, distinct keyword targets<\/td>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;\">Pages share intent clusters or you have under 30 pages with usable impression volume<\/td>\n<\/tr>\n<tr style=\"background:#F8F9FC;\">\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;font-weight:600;\">Internal-link addition<\/td>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;\">Treated pages have no upstream or downstream link path to control pages<\/td>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;\">Test and control pages live in the same silo, so link equity bleeds across the boundary<\/td>\n<\/tr>\n<tr>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;font-weight:600;\">Template \/ layout test<\/td>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;\">Pages selected from unrelated categories, large sample (n &gt; 100 per variant)<\/td>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;\">All test pages target variations of the same head term (see the case study below)<\/td>\n<\/tr>\n<tr style=\"background:#F8F9FC;\">\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;font-weight:600;\">Conversion-rate test<\/td>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;\">Thousands of sessions per variant, binary outcome, randomised user assignment<\/td>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;\">Long-tail revenue distribution where a single whale conversion dominates the mean<\/td>\n<\/tr>\n<tr>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;font-weight:600;\">Page-speed rollout<\/td>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;\">User-cohort split, large session count, metric is median LCP, not mean LCP<\/td>\n<td style=\"padding:10px 12px;border:1px solid #d8dde8;\">Metric is mean LCP, where a few slow sessions distort everything; use a non-parametric test<\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption style=\"text-align:center;color:#6a7280;font-size:.88em;margin-top:8px;\">Five common SEO experiment scenarios mapped to when the z-test holds up and when it doesn&#8217;t.<\/figcaption><\/figure>\n<p>The pattern across the &#8220;fails&#8221; column is the same in every row: independence is broken, the sample is thin, or the underlying distribution is too lopsided for the mean to be the right summary statistic. In each case, the fix has the same shape: switch the test or restructure the experiment, and don&#8217;t massage the data until the z-test math accepts it.<\/p>\n<h2>Real Case Study: When Bad Assumptions Cost Rankings<\/h2>\n<p>In Q2 2023, a SaaS company&#8217;s growth team tested a new template on 12 product pages, targeting mid-volume keywords. After two weeks, their z-test showed a 22% traffic increase with p &lt; 0.05. 
They celebrated, rolled the template to 340 similar pages, and watched organic traffic drop 18% over the next month.<\/p>\n<div style=\"display:flex;flex-wrap:wrap;gap:16px;margin:28px 0;\">\n<div style=\"flex:1 1 200px;background:#FFF8E1;border:1px solid #F1D481;border-radius:6px;padding:18px 20px;text-align:center;\">\n<div style=\"font-size:2.2em;font-weight:700;color:#8A6A12;line-height:1;\">12<\/div>\n<div style=\"font-size:.85em;color:#3A2F12;margin-top:6px;\">Pages in the original test, well below the 30-page floor for a z-test<\/div>\n<\/div>\n<div style=\"flex:1 1 200px;background:#FFF8E1;border:1px solid #F1D481;border-radius:6px;padding:18px 20px;text-align:center;\">\n<div style=\"font-size:2.2em;font-weight:700;color:#8A6A12;line-height:1;\">+22%<\/div>\n<div style=\"font-size:.85em;color:#3A2F12;margin-top:6px;\">Reported lift, driven almost entirely by 3 pages winning featured snippets<\/div>\n<\/div>\n<div style=\"flex:1 1 200px;background:#FFF8E1;border:1px solid #F1D481;border-radius:6px;padding:18px 20px;text-align:center;\">\n<div style=\"font-size:2.2em;font-weight:700;color:#8A6A12;line-height:1;\">\u221218%<\/div>\n<div style=\"font-size:.85em;color:#3A2F12;margin-top:6px;\">Traffic drop after the template scaled to 340 pages; recovery took 4 months<\/div>\n<\/div>\n<\/div>\n<p>The violated assumption: independence. The 12 test pages all targeted variations of &#8220;project management software for [industry]&#8221; and ranked for overlapping keyword clusters. When Google&#8217;s algorithm adjusted rankings after the template change, the pages cannibalised each other&#8217;s visibility. The initial lift came from three pages that happened to gain featured snippets, temporarily masking drops across the other nine. 
I&#8217;ve seen versions of this exact pattern on at least three client audits (one of them a Series B fintech that had already presented the &#8220;+22% win&#8221; to their board, which made the rollback conversation a memorable one). The head-term wins hide the carnage further down.<\/p>\n<p>Their z-test treated each page as an independent observation, but the pages competed in the same SERP ecosystem. A ranking gain for one page directly reduced impressions for the others. The small sample (12 pages) made this dependence catastrophic. A proper test would have used page clusters as the unit of analysis or isolated pages targeting truly distinct keyword sets.<\/p>\n<div style=\"border-left:3px solid #4A90B8;background:#EEF5FA;padding:14px 18px;margin:24px 0;border-radius:0 4px 4px 0;\">\n<p style=\"margin:0 0 4px;font-size:.78em;font-weight:700;letter-spacing:.06em;text-transform:uppercase;color:#1F4A66;\">Watch for<\/p>\n<p style=\"margin:0;\">When 3 of your 12 test pages account for most of the observed lift, that&#8217;s a heavy-tail signal, not a clean win. The conventional response is &#8220;we found the high-performers&#8221;; the rigorous response is to ask whether removing those three pages reverses the result. In this case study, it would have.<\/p>\n<\/div>\n<p>The recovery took four months. The team reverted templates on the worst performers, and traffic patterns stabilised only after they rebuilt topical authority through content consolidation. Well, that&#8217;s the polite version; the honest version is that they rebuilt the silo from scratch.<\/p>\n<p>Why it matters: when your test units share ranking signals, keyword overlap, or internal link equity, standard z-test math breaks down. The formula assumes your 12 pages behave like 12 coin flips: independent events. In SEO, they behave like 12 players on the same team, where one&#8217;s performance directly affects the others. 
<a href=\"https:\/\/ahrefs.com\/blog\/seo-experiments\/\" rel=\"noopener\">Ahrefs has written<\/a> about similar setups where the test design issues swamped the measured effect.<\/p>\n<figure class=\"wp-block-image size-large\">\n        <img loading=\"lazy\" decoding=\"async\" width=\"900\" height=\"514\" src=\"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/failed-assumptions-collapse.jpg\" alt=\"Collapsing house of cards on desk representing failed assumptions and unstable foundations\" class=\"wp-image-72\" srcset=\"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/failed-assumptions-collapse.jpg 900w, https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/failed-assumptions-collapse-300x171.jpg 300w, https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/failed-assumptions-collapse-768x439.jpg 768w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><figcaption>Building strategies on flawed statistical assumptions can lead to costly failures when rolled out at scale.<\/figcaption><\/figure>\n<h2>Quick Checks Before You Run the Test<\/h2>\n<p>A pre-flight check that takes five minutes catches most of the violations that make z-test results unreliable. The whole point is to run it before you commit to the test design, not after the p-value comes in looking attractive. 
Which, in most cases, it will.<\/p>\n<div style=\"background:#FAFBFD;border:1px solid #d8dde8;border-radius:6px;padding:24px;margin:28px 0;\">\n<p style=\"margin:0 0 18px;font-weight:700;letter-spacing:.04em;text-transform:uppercase;font-size:.78em;color:#1F2A44;\">The assumption-check cycle<\/p>\n<div style=\"display:flex;flex-wrap:wrap;gap:12px;\">\n<div style=\"flex:1 1 200px;background:#fff;border:1px solid #d8dde8;border-radius:4px;padding:14px;\">\n<div style=\"font-size:.78em;font-weight:700;color:#8A6A12;letter-spacing:.05em;\">STEP 1<\/div>\n<div style=\"font-weight:600;margin:6px 0 4px;\">Histogram the metric<\/div>\n<div style=\"font-size:.9em;color:#3a4458;\">In Sheets, Insert &gt; Chart, then set the chart type to Histogram. Look for skew or multiple peaks.<\/div>\n<\/div>\n<div style=\"flex:0 0 auto;align-self:center;font-size:1.5em;color:#1F2A44;\">\u2192<\/div>\n<div style=\"flex:1 1 200px;background:#fff;border:1px solid #d8dde8;border-radius:4px;padding:14px;\">\n<div style=\"font-size:.78em;font-weight:700;color:#8A6A12;letter-spacing:.05em;\">STEP 2<\/div>\n<div style=\"font-weight:600;margin:6px 0 4px;\">Q-Q plot in Colab<\/div>\n<div style=\"font-size:.9em;color:#3a4458;\">Use scipy.stats.probplot(). 
S-shapes signal a normality violation.<\/div>\n<\/div>\n<div style=\"flex:0 0 auto;align-self:center;font-size:1.5em;color:#1F2A44;\">\u2192<\/div>\n<div style=\"flex:1 1 200px;background:#fff;border:1px solid #d8dde8;border-radius:4px;padding:14px;\">\n<div style=\"font-size:.78em;font-weight:700;color:#8A6A12;letter-spacing:.05em;\">STEP 3<\/div>\n<div style=\"font-weight:600;margin:6px 0 4px;\">Map dependencies<\/div>\n<div style=\"font-size:.9em;color:#3a4458;\">List internal links and shared keyword clusters between test and control pages.<\/div>\n<\/div>\n<div style=\"flex:0 0 auto;align-self:center;font-size:1.5em;color:#1F2A44;\">\u2192<\/div>\n<div style=\"flex:1 1 200px;background:#fff;border:1px solid #d8dde8;border-radius:4px;padding:14px;\">\n<div style=\"font-size:.78em;font-weight:700;color:#8A6A12;letter-spacing:.05em;\">STEP 4<\/div>\n<div style=\"font-weight:600;margin:6px 0 4px;\">Pick the right test<\/div>\n<div style=\"font-size:.9em;color:#3a4458;\">If a distribution check fails, switch to a t-test, Mann-Whitney U, or bootstrap before running. If the dependency map fails, restructure the test or aggregate to the cluster level instead.<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h3>Visual Tests You Can Run in 2 Minutes<\/h3>\n<p>In Google Sheets, create a histogram by selecting your data, choosing Insert &gt; Chart, and setting the chart type to Histogram. Look for strong skew (a long tail on one side) or multiple peaks; both signal non-normality. For tighter validation, generate a Q-Q plot in Python using scipy.stats.probplot(). If the points follow the diagonal line closely, your data are consistent with normality; systematic curves or S-shapes indicate violations. Sheets users can export to Colab for quick Q-Q checks. 
These visual tests catch obvious problems in under two minutes, helping you decide whether parametric tests are safe or whether you need alternatives like bootstrap methods.<\/p>\n<h3>When to Use Non-Parametric Tests Instead<\/h3>\n<p>When your SEO experiment data violates z-test assumptions (non-normal distributions, small samples under 30, or heavy skew from outliers), switch to the Mann-Whitney U test. This non-parametric alternative compares rank order instead of raw values, making it robust to skewed metrics like time-on-page or per-page conversion rates with extreme values. It makes no assumption about the shape of the distribution (though it still needs independent observations), and it can be used with samples as small as 5-10 per variant, albeit with limited power at that size. The tradeoff: slightly less statistical power when the data really are normal, but far more trustworthy results when they aren&#8217;t. For SEO experiments on real user behaviour data, Mann-Whitney is often the safer default.<\/p>\n<style>\n.hl-deepdive summary::-webkit-details-marker { display:none; }\n.hl-deepdive summary { outline:none; }\n.hl-deepdive[open] .hl-deepdive__icon { transform:rotate(180deg); background:#8A6A12; }\n.hl-deepdive[open] .hl-deepdive__eyebrow::after { content:\" \u00b7 click to collapse\"; }\n.hl-deepdive:not([open]) .hl-deepdive__eyebrow::after { content:\" \u00b7 click to expand\"; }\n.hl-deepdive:hover { box-shadow:0 4px 14px rgba(31,42,68,.12); transform:translateY(-1px); }\n.hl-deepdive { transition:box-shadow .2s ease, transform .2s ease; }\n.hl-deepdive__icon { transition:transform .25s ease, background .25s ease; }\n<\/style>\n<details class=\"hl-deepdive\" style=\"border:1px solid #d8dde8;border-radius:10px;margin:28px 0;background:linear-gradient(180deg,#FAFBFD 0%,#F1F4FA 100%);box-shadow:0 1px 4px rgba(31,42,68,.08);overflow:hidden;\">\n<summary style=\"cursor:pointer;padding:20px 24px;list-style:none;display:flex;align-items:center;gap:16px;\">\n<span class=\"hl-deepdive__icon\" style=\"flex:0 0 auto;display:inline-flex;align-items:center;justify-content:center;width:40px;height:40px;background:#1F2A44;color:#fff;border-radius:50%;font-size:1.4em;line-height:1;font-weight:700;\">\u25be<\/span>\n<span style=\"flex:1 1 auto;\">\n<span class=\"hl-deepdive__eyebrow\" style=\"display:block;font-size:.72em;font-weight:700;letter-spacing:.1em;text-transform:uppercase;color:#8A6A12;\">Deep dive<\/span>\n<span style=\"display:block;font-size:1.08em;font-weight:700;color:#1F2A44;margin-top:3px;\">Which alternative test belongs on which violation<\/span>\n<\/span>\n<\/summary>\n<div style=\"padding:18px 24px 22px;color:#3a4458;border-top:1px solid #e3e8f0;background:#fff;\">\n<p>When the z-test won&#8217;t hold, there&#8217;s no single replacement. The right alternative depends on which assumption broke and what the data looks like:<\/p>\n<ol style=\"padding-left:22px;\">\n<li><strong>Sample size under 30, distribution roughly symmetric.<\/strong> Use a two-sample <em>t-test<\/em>. It widens the confidence interval to account for the unknown population variance you&#8217;re estimating from a thin sample.<\/li>\n<li><strong>Heavy skew or visible outliers, any sample size.<\/strong> Use the <em>Mann-Whitney U test<\/em>. It compares rank order, so it doesn&#8217;t depend on the shape of the underlying distribution.<\/li>\n<li><strong>Binary outcome (clicked \/ didn&#8217;t click), large sample.<\/strong> Use a <em>two-proportion z-test with continuity correction<\/em>, or Fisher&#8217;s exact test if any expected cell count drops below 5.<\/li>\n<li><strong>Distribution is complex but you have enough data to resample.<\/strong> Use a <em>bootstrap confidence interval<\/em>. Resample each variant&#8217;s observed values with replacement 10,000 times, recompute the difference each time, and take the 2.5th and 97.5th percentiles of those differences. No normality assumption is required (though observations must still be independent), but you need code, not a spreadsheet.<\/li>\n<li><strong>Independence violated by clustering (multiple pages per silo, multiple sessions per user).<\/strong> Use a <em>cluster-robust variance estimator<\/em> or aggregate up to the cluster level (one observation per silo) and re-run the t-test on the aggregated data.<\/li>\n<\/ol>\n<p>If you&#8217;re not sure which case you&#8217;re in, the bootstrap is the most forgiving default for the SEO context: it handles skewed data and unknown variance in one shot and copes reasonably with modest samples, at the cost of needing Python or R rather than Sheets. If pages share a silo, resample whole clusters rather than individual pages.<\/p>\n<\/div>\n<\/details>\n<h2>What to Do When Assumptions Don&#8217;t Hold<\/h2>\n<p>When z-test assumptions break down, you have four practical paths forward rather than abandoning statistical rigour entirely.<\/p>\n<figure class=\"wp-block-image size-large\">\n        <img loading=\"lazy\" decoding=\"async\" width=\"900\" height=\"514\" src=\"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/data-validation-analysis.jpg\" alt=\"Magnifying glass examining spreadsheet data representing statistical validation and quality checks\" class=\"wp-image-73\" srcset=\"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/data-validation-analysis.jpg 900w, https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/data-validation-analysis-300x171.jpg 300w, https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/data-validation-analysis-768x439.jpg 768w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><figcaption>Quick visual checks and pre-flight validation can reveal data quality issues before running statistical tests.<\/figcaption><\/figure>\n<p>Bootstrapping methods resample your actual data thousands of times to build empirical confidence intervals without assuming normality. 
Use this when traffic distributions are heavily skewed or sample sizes remain stubbornly small despite extended test windows.<\/p>\n<p>Extending test duration adds sessions and impressions per page, which makes each page&#8217;s metric less noisy and averages out weekly cycles. It doesn&#8217;t add pages, though: a 12-page test is still a 12-page test however long it runs. Run tests for at least 2-4 full business cycles when initial sample sizes fall below 1,000 sessions per variant. In my experience, this is the option teams reach for last because it&#8217;s the least exciting, and yet it solves more problems than any of the others.<\/p>\n<div style=\"border-left:3px solid #4A90B8;background:#EEF5FA;padding:14px 18px;margin:24px 0;border-radius:0 4px 4px 0;\">\n<p style=\"margin:0 0 4px;font-size:.78em;font-weight:700;letter-spacing:.06em;text-transform:uppercase;color:#1F4A66;\">Pro tip<\/p>\n<p style=\"margin:0;\">Bootstrapping with 10,000 resamples typically runs in under a second on a laptop for SEO-sized datasets. The cost is writing the script, not the compute. If you&#8217;re going to run more than two non-trivial experiments a quarter, the one-time investment in a reusable bootstrap function pays back fast.<\/p>\n<\/div>\n<p>Segmenting tests by page type or user cohort addresses the &#8220;identically distributed&#8221; half of the i.i.d. assumption: when different URL groups behave differently, analyse them separately instead of pooling them into one test. Apply this when mixing transactional and informational pages in a single test, or when bot traffic contaminates specific segments. <a href=\"https:\/\/www.similarweb.com\/blog\/research\/market-research\/\" rel=\"noopener\">SimilarWeb&#8217;s segmentation work<\/a> on traffic mix is a useful reference point for thinking about how thoroughly you need to slice before the segments behave consistently.<\/p>\n<p>Log transformations compress right-skewed traffic distributions closer to normality by reducing the influence of extreme outliers. 
Transform your metrics when a handful of viral pages generate 10x typical traffic, then run the z-test on log-transformed values and back-transform results for interpretation.<\/p>\n<p>Each workaround trades simplicity for validity. Bootstrapping requires coding skills but handles nearly any violation. Longer tests cost time but need no special analysis. Segmentation fragments your data but preserves accuracy. Transformations work fast but require careful back-conversion when communicating results to stakeholders, which is, frankly, where most log-transformed analyses go wrong: the analyst gets the math right and then the report describes a &#8220;geometric mean&#8221; as if it were the arithmetic mean readers expect.<\/p>\n<h2>Picking the Right Test for the Right Job<\/h2>\n<p>The full assumption-check workflow looks heavy on paper. In practice it&#8217;s a 10-minute decision once you&#8217;ve done it twice. The shorter version: most SEO experiments aren&#8217;t actually a fit for a textbook z-test, and the honest move is to acknowledge that and switch tools.<\/p>\n<div style=\"display:flex;flex-wrap:wrap;gap:16px;margin:28px 0;\">\n<div style=\"flex:1 1 280px;background:#EEF7EF;border:1px solid #BFE0C5;border-radius:8px;padding:20px 22px;\">\n<p style=\"margin:0 0 14px;font-weight:700;color:#2D6A36;font-size:.95em;display:flex;align-items:center;gap:10px;\">\n<span style=\"display:inline-flex;align-items:center;justify-content:center;width:26px;height:26px;background:#2D6A36;color:#fff;border-radius:50%;font-size:.9em;line-height:1;\">\u2713<\/span><br \/>\nZ-test fits when\n<\/p>\n<ul style=\"margin:0;padding-left:0;list-style:none;display:grid;gap:8px;\">\n<li style=\"display:flex;gap:10px;\"><span style=\"color:#2D6A36;font-weight:700;flex:0 0 auto;\">\u203a<\/span>Sample size is comfortably above 30 per variant<\/li>\n<li style=\"display:flex;gap:10px;\"><span style=\"color:#2D6A36;font-weight:700;flex:0 0 auto;\">\u203a<\/span>Test units are genuinely 
independent: no shared keyword clusters or internal-link bleed<\/li>\n<li style=\"display:flex;gap:10px;\"><span style=\"color:#2D6A36;font-weight:700;flex:0 0 auto;\">\u203a<\/span>Population variance is known, or n &gt; 100 makes the estimate safe<\/li>\n<li style=\"display:flex;gap:10px;\"><span style=\"color:#2D6A36;font-weight:700;flex:0 0 auto;\">\u203a<\/span>Distribution is roughly symmetric or CLT covers the residual skew<\/li>\n<li style=\"display:flex;gap:10px;\"><span style=\"color:#2D6A36;font-weight:700;flex:0 0 auto;\">\u203a<\/span>Binary outcomes (CTR, conversion) with proportions away from 0 and 1<\/li>\n<\/ul>\n<\/div>\n<div style=\"flex:1 1 280px;background:#F5F5F7;border:1px solid #d8dde8;border-radius:8px;padding:20px 22px;\">\n<p style=\"margin:0 0 14px;font-weight:700;color:#6a7280;font-size:.95em;display:flex;align-items:center;gap:10px;\">\n<span style=\"display:inline-flex;align-items:center;justify-content:center;width:26px;height:26px;background:#9aa3b2;color:#fff;border-radius:50%;font-size:.9em;line-height:1;\">\u2717<\/span><br \/>\nBootstrap fits better when\n<\/p>\n<ul style=\"margin:0;padding-left:0;list-style:none;display:grid;gap:8px;color:#6a7280;\">\n<li style=\"display:flex;gap:10px;\"><span style=\"color:#9aa3b2;font-weight:700;flex:0 0 auto;\">\u203a<\/span>Sample size is small and you can&#8217;t extend the test window<\/li>\n<li style=\"display:flex;gap:10px;\"><span style=\"color:#9aa3b2;font-weight:700;flex:0 0 auto;\">\u203a<\/span>Traffic distribution is heavily right-skewed with viral outliers<\/li>\n<li style=\"display:flex;gap:10px;\"><span style=\"color:#9aa3b2;font-weight:700;flex:0 0 auto;\">\u203a<\/span>Population variance is unknown and the sample isn&#8217;t large enough to estimate it reliably<\/li>\n<li style=\"display:flex;gap:10px;\"><span style=\"color:#9aa3b2;font-weight:700;flex:0 0 auto;\">\u203a<\/span>You need confidence intervals on derived metrics (ratios, difference-in-differences estimates)<\/li>\n<li 
style=\"display:flex;gap:10px;\"><span style=\"color:#9aa3b2;font-weight:700;flex:0 0 auto;\">\u203a<\/span>Pages are clustered (for example, sharing a category or link neighbourhood) and you can resample whole clusters rather than individual pages<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<p>Checking z-test assumptions takes five minutes. Skipping them can burn months chasing phantom wins or rolling out changes that hurt traffic. Understanding normality, independence, known variance, and sample size isn&#8217;t academic gatekeeping; it&#8217;s the difference between a reliable experiment and expensive noise. Before you shift strategy based on a p-value, verify your data meets the requirements. When assumptions break, switch methods rather than proceeding blind.<\/p>\n<div style=\"background:linear-gradient(135deg,#1F2A44 0%,#2B3A5C 100%);color:#fff;border-radius:10px;padding:30px 32px;margin:36px 0;box-shadow:0 4px 14px rgba(31,42,68,.18);\">\n<p style=\"margin:0 0 6px;font-size:.78em;font-weight:700;letter-spacing:.12em;text-transform:uppercase;color:#F1D481;\">Try it this week<\/p>\n<p style=\"margin:0 0 22px;font-size:1.32em;font-weight:700;line-height:1.3;color:#fff;\">Audit the last SEO test you shipped. Re-check the four assumptions.<\/p>\n<ol style=\"margin:0;padding-left:0;list-style:none;display:grid;gap:14px;\">\n<li style=\"display:flex;gap:14px;align-items:flex-start;\">\n<span style=\"flex:0 0 auto;display:inline-flex;align-items:center;justify-content:center;width:28px;height:28px;background:rgba(241,212,129,.18);color:#F1D481;border:1px solid rgba(241,212,129,.4);border-radius:50%;font-weight:700;font-size:.9em;line-height:1;\">1<\/span><br \/>\n<span style=\"color:rgba(255,255,255,.92);\">Pull the page list and per-page traffic for both variants. 
Histogram the values for each group and look for skew or multiple peaks.<\/span>\n<\/li>\n<li style=\"display:flex;gap:14px;align-items:flex-start;\">\n<span style=\"flex:0 0 auto;display:inline-flex;align-items:center;justify-content:center;width:28px;height:28px;background:rgba(241,212,129,.18);color:#F1D481;border:1px solid rgba(241,212,129,.4);border-radius:50%;font-weight:700;font-size:.9em;line-height:1;\">2<\/span><br \/>\n<span style=\"color:rgba(255,255,255,.92);\">Map the internal-link graph between test and control pages. Any direct or two-hop link between the groups is enough to compromise independence.<\/span>\n<\/li>\n<li style=\"display:flex;gap:14px;align-items:flex-start;\">\n<span style=\"flex:0 0 auto;display:inline-flex;align-items:center;justify-content:center;width:28px;height:28px;background:rgba(241,212,129,.18);color:#F1D481;border:1px solid rgba(241,212,129,.4);border-radius:50%;font-weight:700;font-size:.9em;line-height:1;\">3<\/span><br \/>\n<span style=\"color:rgba(255,255,255,.92);\">Re-run the analysis with a bootstrap confidence interval. If the bootstrap and z-test agree and step 2 found no link bleed, ship with confidence. If they diverge, treat the original p-value as unreliable.<\/span>\n<\/li>\n<\/ol>\n<p style=\"margin:22px 0 0;font-size:.92em;color:rgba(255,255,255,.7);font-style:italic;\">The exercise takes an hour. 
Doing it once turns &#8220;we ran a test&#8221; into &#8220;we ran a test we can defend in a roadmap review.&#8221;<\/p>\n<\/div>\n<h2>Related guides<\/h2>\n<ul>\n<li><a href=\"https:\/\/hetneo.link\/blog\/stop-guessing-if-your-link-building-actually-works\/\"><strong>Stop Guessing If Your Link Building Actually Works<\/strong><\/a>: a measurement framework for distinguishing real link-building wins from random variance.<\/li>\n<li><a href=\"https:\/\/hetneo.link\/blog\/this-seo-recovery-method-rescued-sites-hit-by-googles-core-updates\/\"><strong>SEO Recovery After Core Updates<\/strong><\/a>: how to read the post-update traffic data without being fooled by short-term variance.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Z-tests look reassuringly simple: plug in two means, get a p-value, ship the winner. The problem is that the test&#8230;<\/p>\n","protected":false},"author":4,"featured_media":70,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-74","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-case-studies-tests"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Z-Test Assumptions for Internal Linking Experiments<\/title>\n<meta name=\"description\" content=\"Z-tests assume sample size, normality, and independence. 
The assumption checks that prevent SEO experiments from producing confident wrong answers.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Z-Test Assumptions for Internal Linking Experiments\" \/>\n<meta property=\"og:description\" content=\"Z-tests assume sample size, normality, and independence. The assumption checks that prevent SEO experiments from producing confident wrong answers.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/\" \/>\n<meta property=\"og:site_name\" content=\"Hetneo&#039;s Links Blog\" \/>\n<meta property=\"article:published_time\" content=\"2025-12-13T13:22:39+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-16T12:59:23+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/z-test-assumptions-precision-balance-dice.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"900\" \/>\n\t<meta property=\"og:image:height\" content=\"514\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"madison\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@maddiehoulding\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"madison\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"15 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/\"},\"author\":{\"name\":\"madison\",\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/#\\\/schema\\\/person\\\/6c6a683e9a50d03ee7fa5ac6432d56a6\"},\"headline\":\"Why Your Internal Linking Test Might Be Wrong: Z-Test Assumptions Explained\",\"datePublished\":\"2025-12-13T13:22:39+00:00\",\"dateModified\":\"2026-05-16T12:59:23+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/\"},\"wordCount\":3143,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/12\\\/z-test-assumptions-precision-balance-dice.jpeg\",\"articleSection\":[\"Case Studies &amp; Tests\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/\",\"url\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/\",\"name\":\"Z-Test Assumptions for Internal Linking 
Experiments\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/12\\\/z-test-assumptions-precision-balance-dice.jpeg\",\"datePublished\":\"2025-12-13T13:22:39+00:00\",\"dateModified\":\"2026-05-16T12:59:23+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/#\\\/schema\\\/person\\\/6c6a683e9a50d03ee7fa5ac6432d56a6\"},\"description\":\"Z-tests assume sample size, normality, and independence. The assumption checks that prevent SEO experiments from producing confident wrong answers.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/#primaryimage\",\"url\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/12\\\/z-test-assumptions-precision-balance-dice.jpeg\",\"contentUrl\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/12\\\/z-test-assumptions-precision-balance-dice.jpeg\",\"width\":900,\"height\":514,\"caption\":\"Laboratory precision scale weighing white dice against metal calibration weights with soft side lighting, shallow depth of field, and blurred calipers and magnifying glass on a clean lab 
desk.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Why Your Internal Linking Test Might Be Wrong: Z-Test Assumptions Explained\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/\",\"name\":\"Hetneo's Links Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/#\\\/schema\\\/person\\\/6c6a683e9a50d03ee7fa5ac6432d56a6\",\"name\":\"madison\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f4d2520c34ef92cc2328426bfca387d318cbd9a2eec2d15835a67cc4a3414cd7?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f4d2520c34ef92cc2328426bfca387d318cbd9a2eec2d15835a67cc4a3414cd7?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f4d2520c34ef92cc2328426bfca387d318cbd9a2eec2d15835a67cc4a3414cd7?s=96&d=mm&r=g\",\"caption\":\"madison\"},\"description\":\"Content Manager at Hetneo's Links. Madison runs editorial across the link-building space, auditing campaigns, writing the briefs that keep guest posts from sounding like ad copy, and turning analytics into next month's roadmap. 
Loves a clean brief, hates a buried lede.\",\"sameAs\":[\"https:\\\/\\\/www.linkedin.com\\\/in\\\/madisonhoulding\\\/\",\"https:\\\/\\\/x.com\\\/maddiehoulding\"],\"url\":\"https:\\\/\\\/hetneo.link\\\/blog\\\/author\\\/madison\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Z-Test Assumptions for Internal Linking Experiments","description":"Z-tests assume sample size, normality, and independence. The assumption checks that prevent SEO experiments from producing confident wrong answers.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/","og_locale":"en_US","og_type":"article","og_title":"Z-Test Assumptions for Internal Linking Experiments","og_description":"Z-tests assume sample size, normality, and independence. The assumption checks that prevent SEO experiments from producing confident wrong answers.","og_url":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/","og_site_name":"Hetneo&#039;s Links Blog","article_published_time":"2025-12-13T13:22:39+00:00","article_modified_time":"2026-05-16T12:59:23+00:00","og_image":[{"width":900,"height":514,"url":"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/z-test-assumptions-precision-balance-dice.jpeg","type":"image\/jpeg"}],"author":"madison","twitter_card":"summary_large_image","twitter_creator":"@maddiehoulding","twitter_misc":{"Written by":"madison","Est. 
reading time":"15 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/#article","isPartOf":{"@id":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/"},"author":{"name":"madison","@id":"https:\/\/hetneo.link\/blog\/#\/schema\/person\/6c6a683e9a50d03ee7fa5ac6432d56a6"},"headline":"Why Your Internal Linking Test Might Be Wrong: Z-Test Assumptions Explained","datePublished":"2025-12-13T13:22:39+00:00","dateModified":"2026-05-16T12:59:23+00:00","mainEntityOfPage":{"@id":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/"},"wordCount":3143,"commentCount":0,"image":{"@id":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/#primaryimage"},"thumbnailUrl":"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/z-test-assumptions-precision-balance-dice.jpeg","articleSection":["Case Studies &amp; Tests"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/","url":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/","name":"Z-Test Assumptions for Internal Linking 
Experiments","isPartOf":{"@id":"https:\/\/hetneo.link\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/#primaryimage"},"image":{"@id":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/#primaryimage"},"thumbnailUrl":"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/z-test-assumptions-precision-balance-dice.jpeg","datePublished":"2025-12-13T13:22:39+00:00","dateModified":"2026-05-16T12:59:23+00:00","author":{"@id":"https:\/\/hetneo.link\/blog\/#\/schema\/person\/6c6a683e9a50d03ee7fa5ac6432d56a6"},"description":"Z-tests assume sample size, normality, and independence. The assumption checks that prevent SEO experiments from producing confident wrong answers.","breadcrumb":{"@id":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/#primaryimage","url":"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/z-test-assumptions-precision-balance-dice.jpeg","contentUrl":"https:\/\/hetneo.link\/blog\/wp-content\/uploads\/2025\/12\/z-test-assumptions-precision-balance-dice.jpeg","width":900,"height":514,"caption":"Laboratory precision scale weighing white dice against metal calibration weights with soft side lighting, shallow depth of field, and blurred calipers and magnifying glass on a clean lab 
desk."},{"@type":"BreadcrumbList","@id":"https:\/\/hetneo.link\/blog\/why-your-internal-linking-test-might-be-wrong-z-test-assumptions-explained\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/hetneo.link\/blog\/"},{"@type":"ListItem","position":2,"name":"Why Your Internal Linking Test Might Be Wrong: Z-Test Assumptions Explained"}]},{"@type":"WebSite","@id":"https:\/\/hetneo.link\/blog\/#website","url":"https:\/\/hetneo.link\/blog\/","name":"Hetneo's Links Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/hetneo.link\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/hetneo.link\/blog\/#\/schema\/person\/6c6a683e9a50d03ee7fa5ac6432d56a6","name":"madison","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/f4d2520c34ef92cc2328426bfca387d318cbd9a2eec2d15835a67cc4a3414cd7?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/f4d2520c34ef92cc2328426bfca387d318cbd9a2eec2d15835a67cc4a3414cd7?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f4d2520c34ef92cc2328426bfca387d318cbd9a2eec2d15835a67cc4a3414cd7?s=96&d=mm&r=g","caption":"madison"},"description":"Content Manager at Hetneo's Links. Madison runs editorial across the link-building space, auditing campaigns, writing the briefs that keep guest posts from sounding like ad copy, and turning analytics into next month's roadmap. 
Loves a clean brief, hates a buried lede.","sameAs":["https:\/\/www.linkedin.com\/in\/madisonhoulding\/","https:\/\/x.com\/maddiehoulding"],"url":"https:\/\/hetneo.link\/blog\/author\/madison\/"}]}},"_links":{"self":[{"href":"https:\/\/hetneo.link\/blog\/wp-json\/wp\/v2\/posts\/74","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hetneo.link\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hetneo.link\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hetneo.link\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/hetneo.link\/blog\/wp-json\/wp\/v2\/comments?post=74"}],"version-history":[{"count":1,"href":"https:\/\/hetneo.link\/blog\/wp-json\/wp\/v2\/posts\/74\/revisions"}],"predecessor-version":[{"id":80,"href":"https:\/\/hetneo.link\/blog\/wp-json\/wp\/v2\/posts\/74\/revisions\/80"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/hetneo.link\/blog\/wp-json\/wp\/v2\/media\/70"}],"wp:attachment":[{"href":"https:\/\/hetneo.link\/blog\/wp-json\/wp\/v2\/media?parent=74"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hetneo.link\/blog\/wp-json\/wp\/v2\/categories?post=74"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hetneo.link\/blog\/wp-json\/wp\/v2\/tags?post=74"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}