A/B Test Sample Size Calculator
Plan email A/B tests properly. Enter your baseline rate (open or click), the lift you want to detect, and your desired power and confidence levels - the calculator returns the minimum sample size per variant. The math is the standard two-proportion z-test, the same one used by Optimizely, VWO, and Evan Miller's calculator. Free, no signup, instant.
How the math works
The calculator uses the standard two-proportion z-test:
n = (z_alpha + z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) / (p1 - p2)^2
Where:
p1 = baseline rate (e.g. 0.22 for a 22% open rate)
p2 = lifted rate (e.g. 0.242 for a +10% relative lift on 22%)
z_alpha = critical value at the confidence level (1.96 for 95% two-sided, 1.645 for 95% one-sided)
z_beta = critical value at the power level (0.84 for 80% power, 1.28 for 90%)
n = required sample size per variant
Total list size = n × number of variants. Days needed = total / daily send capacity (if you set one).
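The formula above can be sketched in Python using the standard library's NormalDist for the critical values. This is an illustrative implementation, not the calculator's actual code; the function name and defaults are assumptions.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(p1, relative_lift, power=0.80,
                            confidence=0.95, two_sided=True):
    """Minimum n per variant for a two-proportion z-test (normal approximation)."""
    p2 = p1 * (1 + relative_lift)  # lifted rate
    alpha = 1 - confidence
    # Critical values: z_alpha from the confidence level, z_beta from the power
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_beta) ** 2
         * (p1 * (1 - p1) + p2 * (1 - p2))
         / (p1 - p2) ** 2)
    return ceil(n)

# 22% baseline, +10% relative lift, defaults of 80% power / 95% two-sided confidence
n = sample_size_per_variant(0.22, 0.10)
```

For a two-variant test the total list size is then 2 × n; dividing by your daily send capacity gives the days needed.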
FAQ
What does the calculator compute?
Minimum sample size per variant to detect a given lift in a binary outcome (open or click) with given statistical power and confidence. Standard z-test for two proportions.
What is statistical power?
Probability of correctly detecting a real lift when one exists. Standard is 80% - if your variant is actually better, an 80%-power test will detect it 80% of the time. Higher power reduces false negatives but needs a larger sample. Below 70% is generally too unreliable.
What is confidence level?
Probability of NOT incorrectly declaring a winner when no real difference exists. 95% (5% false-positive rate) is standard. 99% (1% false-positive) is stricter and needs more sample. For email, 95% is appropriate.
One-sided or two-sided test?
One-sided tests for a lift in a specific direction. Two-sided tests for any difference. One-sided needs fewer samples but only lets you conclude in the predicted direction. Use one-sided when you have a clear hypothesis.
Why do small lifts need huge samples?
Sample size scales as 1/(lift)^2: halving the lift you want to detect roughly quadruples the sample you need. Detecting a 1-percentage-point lift takes about 4x as many subscribers as a 2-point lift. The signal gets buried in random variation. With 10,000 subscribers split across two variants, you can't reliably detect much below roughly a 2-percentage-point absolute lift on a 22% baseline (at 80% power and 95% confidence).
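The inverse-square scaling is easy to check numerically. A minimal sketch using the same two-proportion formula with absolute lifts (the helper name is illustrative):

```python
from math import ceil
from statistics import NormalDist

def n_for_absolute_lift(p1, lift_pp, power=0.80, confidence=0.95):
    # Two-sided two-proportion z-test sample size (normal approximation)
    p2 = p1 + lift_pp
    z_a = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil((z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
                / (p1 - p2) ** 2)

# Halving the detectable lift roughly quadruples the required sample:
n_1pp = n_for_absolute_lift(0.22, 0.01)  # 1pp lift on a 22% baseline
n_2pp = n_for_absolute_lift(0.22, 0.02)  # 2pp lift on the same baseline
```

The ratio comes out just under 4 rather than exactly 4 because the variance term also shifts slightly with the lifted rate.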
Absolute vs relative lift?
Absolute lift is in percentage points: 22% + 2pp = 24%. Relative lift is in percent: 22% × 1.10 = 24.2%. They diverge at low baselines. Click rates are usually framed in absolute; open rates in relative.
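The divergence at low baselines is easiest to see with numbers (the rates below are made up for illustration):

```python
# At a 22% open rate the two framings land close together:
open_rate = 0.22
abs_target = open_rate + 0.02        # +2pp      -> 24.0%
rel_target = open_rate * 1.10        # +10% rel  -> 24.2%

# At a 2% click rate they diverge sharply:
click_rate = 0.02
abs_click = click_rate + 0.02        # +2pp      -> 4.0% (a doubling)
rel_click = click_rate * 1.10        # +10% rel  -> 2.2%
```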
Can I peek at results early?
Peeking inflates the false-positive rate. The math assumes you wait until the planned sample size is reached, then compute the test once. If you want the option to stop early, use a sequential framework like AGILE or Bayesian methods - the math is different.
What happens after I have the sample size?
Send to a random subset with at least n subscribers per variant. Wait for the open/click window (48-72h for opens, longer for clicks). Compute the actual rates and run a z-test for two proportions. If p < alpha, where alpha = 1 - confidence (e.g. 0.05 at 95% confidence), you have a winner.
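The follow-up test can be sketched as a pooled two-proportion z-test. The function name and the example counts are illustrative; for a one-sided test, drop the factor of 2 on the p-value.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for two proportions using a pooled standard error."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 1,100/5,000 opens for variant A vs 1,210/5,000 for B
z, p = two_proportion_z_test(1100, 5000, 1210, 5000)
```

Here p comes out below 0.05, so at 95% confidence variant B would be declared the winner.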
Run real A/B tests in MiN8T
MiN8T's editor includes built-in subject-line A/B testing, send-time optimization, and content variants. Plan with this calculator, execute in the editor.
Open MiN8T Editor →