Definition
A holdout test is a randomized experiment that measures how much a marketing channel causes conversions, as distinct from how many conversions happen through it. The method: assign a random slice of the channel's traffic (say 10%) to a control group that's not exposed to the channel's treatment, compare conversion outcomes between treated and held-out groups, and attribute the difference to the channel itself.
Holdout tests are the gold standard for incrementality measurement. They apply the same experimental logic as the randomized controlled trials pharmaceutical companies use to test drugs.
How a holdout works in affiliate marketing
In an affiliate context, the mechanism is simple:
- Partner sends traffic. A publisher redirects a customer to the brand through a tracked affiliate link.
- Tracker splits traffic deterministically. A hash of the click (visitor fingerprint + timestamp) is compared against the rule's holdout percentage. Clicks in the holdout slice are flagged `is_holdout = true`.
- Everyone still sees the offer. Clicks in both the treated and holdout slices redirect to the advertiser normally. The customer experience is identical. What differs is whether downstream conversions are counted toward the partner's reported performance.
- Daily aggregation. Every night, the system compares the conversion rate in the held-out slice to the rate in the treated slice for each partner.
- Statistical test. A two-proportion z-test gives a p-value; a Wald 95% CI bounds the lift estimate. If the CI excludes zero, the result is statistically significant.
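The traffic-splitting step can be sketched with a deterministic hash. This is a minimal illustration, not the tracker's actual implementation; the fingerprint format and the 10% default are assumptions:

```python
import hashlib

def assign_holdout(fingerprint: str, holdout_pct: float = 0.10) -> bool:
    """Deterministically map a click to the holdout slice.

    The same fingerprint always lands in the same bucket, and the
    hash spreads clicks uniformly over [0, 1].
    """
    digest = hashlib.sha256(fingerprint.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return bucket < holdout_pct

# Roughly 10% of clicks should be flagged is_holdout = true
clicks = [f"visitor-{i}|2024-01-01T00:00:{i % 60:02d}" for i in range(10_000)]
held_out = sum(assign_holdout(c) for c in clicks)
```

Because assignment depends only on the hash, a returning visitor stays in the same slice, which keeps the treated and holdout populations stable over the life of the test.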
What makes a good holdout
Randomization must be real
The deterministic-hash approach above gives every visitor the same probability of landing in either group, independent of who they are. Non-random assignment — e.g., "holdout everyone from California" — is a geo test, not a holdout, and is vulnerable to regional confounders.
The holdout must be blinded from the partner
If the partner knows their traffic is in a measurement test, they'll send their best customers to the treated slice. Holdouts only work when the partner can't see the split.
Volume has to be sufficient
A holdout on 500 clicks a month will never reach significance. Rough rule of thumb for detecting a 10% lift at 95% confidence:
| Baseline conversion rate | Monthly clicks needed |
|---|---|
| 1% | ~130,000 |
| 3% | ~45,000 |
| 10% | ~15,000 |
For partners below these thresholds, the holdout will still give you a directional signal, but you'll need to run it longer — and accept that borderline results aren't statistically conclusive.
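Thresholds like those in the table come from the standard two-proportion sample-size formula. The sketch below assumes equal treated/holdout groups and 80% power; the table's figures likely bake in different power and allocation assumptions, so expect the same order of magnitude rather than identical numbers:

```python
from math import sqrt

def clicks_per_group(base_cvr: float, rel_lift: float = 0.10,
                     z_alpha: float = 1.96,  # 95% confidence, two-sided
                     z_beta: float = 0.84) -> int:  # 80% power
    """Clicks needed in EACH group to detect a relative lift in CVR."""
    p1 = base_cvr
    p2 = base_cvr * (1 + rel_lift)
    p_bar = (p1 + p2) / 2
    delta = p2 - p1
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    return int(n) + 1

for cvr in (0.01, 0.03, 0.10):
    print(f"{cvr:.0%} baseline: ~{clicks_per_group(cvr):,} clicks per group")
```

The key driver is the baseline rate: at 1% CVR the same relative lift is a much smaller absolute difference, so the required sample grows roughly tenfold.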
Holdouts need a clean comparison window
If a partner's creative changes mid-test, or your site does a major redesign, or the customer base shifts (seasonality), the holdout is comparing apples to oranges. Either end the test and start a new one, or segment the analysis.
How to pick the holdout percentage
The trade-off: bigger holdouts reach significance faster, but forgo more attributable conversions while the test runs.
- 5–10% is the standard range
- 10% is a good default — balanced between speed-to-significance and revenue impact
- 20%+ only makes sense if you genuinely doubt the partner's contribution and want a fast answer
A 10% holdout on a $100k/month partner means $10k in potential commissions is redirected into measurement. If the test reveals the partner drives zero incremental conversions, cutting them saves the remaining ~$90k/month going forward. That's a good trade.
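The arithmetic above, as a small sketch (the spend figure and flat commission model are illustrative assumptions):

```python
def holdout_economics(monthly_commissions: float, holdout_pct: float):
    """Commissions withheld during the test vs. spend at risk if the partner
    turns out to be non-incremental."""
    withheld = monthly_commissions * holdout_pct          # cost of measurement
    still_paid = monthly_commissions * (1 - holdout_pct)  # saved if lift is zero
    return withheld, still_paid

# $100k/month partner, 10% holdout
withheld, still_paid = holdout_economics(100_000, 0.10)
```

Doubling the holdout to 20% doubles the measurement cost but also roughly halves the time to significance, which is why a larger slice can make sense when you already doubt the partner.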
Reading the results
The Incrementality dashboard in most platforms (including Trcker) reports:
- Treated CVR — conversion rate among traffic NOT held out
- Holdout CVR — conversion rate in the random control slice
- Lift % — (treated − holdout) / holdout, positive if the channel is incremental
- p-value — probability of seeing a difference at least this large if the channel truly had no effect
- 95% CI — range of plausible true-lift values
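The dashboard metrics above can be reproduced from raw counts with a pooled two-proportion z-test and a Wald interval. The conversion counts here are made up for illustration:

```python
from math import erf, sqrt

def holdout_readout(treated_conv, treated_n, holdout_conv, holdout_n):
    """Lift %, two-sided p-value, and Wald 95% CI on the absolute CVR difference."""
    p_t = treated_conv / treated_n
    p_h = holdout_conv / holdout_n
    lift_pct = (p_t - p_h) / p_h * 100

    # Pooled two-proportion z-test
    p_pool = (treated_conv + holdout_conv) / (treated_n + holdout_n)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / treated_n + 1 / holdout_n))
    z = (p_t - p_h) / se_pool
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal tail

    # Wald 95% CI on the absolute difference in conversion rates
    se_wald = sqrt(p_t * (1 - p_t) / treated_n + p_h * (1 - p_h) / holdout_n)
    ci = (p_t - p_h - 1.96 * se_wald, p_t - p_h + 1.96 * se_wald)
    return lift_pct, p_value, ci

# 90k treated clicks at 3.22% CVR vs. 10k held-out clicks at 2.8% CVR
lift, p, (lo, hi) = holdout_readout(2_900, 90_000, 280, 10_000)
```

With these counts the CI excludes zero, so the partner's lift would be reported as significant; shrink the holdout sample and the same lift estimate drops out of significance.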
The decision rule is straightforward:
| Result | Action |
|---|---|
| Significant + positive lift | Keep the partner — they're driving incremental revenue |
| Significant + zero or negative lift | Pause or renegotiate — you're paying for conversions that happen anyway |
| Not significant, positive lift | Keep running the test — directionally positive but unproven |
| Not significant, negative lift | Consider pausing — at best the partner's neutral, at worst anti-incremental |
Common pitfalls
- Peeking early. Looking at results every day and stopping when they look significant inflates your false-positive rate. Commit to a minimum sample size before you start.
- Running too many tests simultaneously. If you have 20 active holdouts, you'll see one "significant" result at p < 0.05 purely by chance. Correct for multiple comparisons or stagger tests.
- Forgetting about novelty effects. A new creative's first week of performance is never representative. Run holdouts on stable creative, not launches.
- Confusing correlation with causation. A holdout measures causation within its sample — it doesn't tell you what would happen if you scaled the channel up 10x or moved budget around. That's a different test.
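One standard correction for the multiple-comparisons pitfall is the Holm–Bonferroni procedure. A minimal sketch, assuming you already have one p-value per active holdout:

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Flag which tests stay significant while controlling the family-wise
    error rate: test the smallest p-value at alpha/m, the next at alpha/(m-1),
    and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            significant[i] = True
        else:
            break  # all remaining (larger) p-values fail too
    return significant

# 20 partner holdouts: two borderline p-values and one genuinely small one
p_vals = [0.001] + [0.04] * 2 + [0.5] * 17
flags = holm_bonferroni(p_vals)
```

Under naive per-test thresholds the two p = 0.04 results would look significant; after correction only the p = 0.001 partner survives, which is exactly the false-positive control the pitfall calls for.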
Holdouts vs. A/B tests
A/B tests compare two treatments — two creatives, two pricing pages. Holdout tests compare one treatment against no treatment. Every A/B test should technically include a holdout as a third arm so you can distinguish "B is better than A" from "both A and B are worse than showing nothing."
Related concepts
- Incrementality — the thing holdout tests measure
- Attribution — the correlational alternative to causal measurement
- Multi-touch attribution — credit distribution across every touch
- CPA — cost per acquisition, which looks different once you measure incrementality