A/B Test Sample Size Calculator
Plan email A/B tests properly. Enter your baseline rate (open or click), the lift you want to detect (your minimum detectable effect, or MDE), and your statistical power and confidence level. The calculator returns the minimum sample size per variant. The math is the standard z-test for two proportions. Free, no signup, instant.
How the math works
The calculator uses the standard sample-size formula for the two-proportion z-test:
n = (z_alpha + z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) / (p1 - p2)^2
Where:
p1 = baseline rate (e.g. 0.22 for 22% open rate)
p2 = lifted rate (e.g. 0.242 for +10% relative lift on 22%)
z_alpha = critical value at confidence level (1.96 for 95% two-sided, 1.645 for 95% one-sided)
z_beta = critical value at power (0.84 for 80%, 1.28 for 90%)
n = required sample per variant
Total list size = n × number of variants. Days needed = total / daily send capacity (if you set one).
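As a rough illustration, the formula and the list-size and days arithmetic above can be sketched in Python. This is a minimal sketch, not the calculator's actual source: the function names (sample_size_per_variant, plan) and the 2,000-per-day send capacity are assumptions made up for the example, and the critical values come from Python's standard-library NormalDist rather than the rounded 1.96 and 0.84.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p1, p2, confidence=0.95, power=0.80, two_sided=True):
    """Minimum sample per variant, following the formula above."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)  # about 1.96 for 95% two-sided
    z_beta = NormalDist().inv_cdf(power)      # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)  # round up: you cannot send to a fraction of a subscriber


def plan(p1, p2, variants=2, daily_capacity=None, **kwargs):
    """Return (n per variant, total list size, days needed or None)."""
    n = sample_size_per_variant(p1, p2, **kwargs)
    total = n * variants
    days = math.ceil(total / daily_capacity) if daily_capacity else None
    return n, total, days


# 22% baseline, +10% relative lift, hypothetical capacity of 2,000 sends per day
n, total, days = plan(0.22, 0.242, variants=2, daily_capacity=2000)
print(f"per variant: {n}, total: {total}, days: {days}")
```

For the 22% to 24.2% example this lands a little under 5,800 subscribers per variant at 95% two-sided confidence and 80% power. Passing two_sided=False swaps in the one-sided critical value and returns a smaller number.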
FAQ
What does the calculator compute?
Minimum sample size per variant to detect a given lift in a binary outcome (open or click) with given statistical power and confidence. Standard z-test for two proportions.
What is statistical power?
Probability of correctly detecting a real lift when one exists. Standard is 80% - if your variant is actually better, an 80%-power test will detect it 80% of the time. Higher power reduces false negatives but needs a larger sample. Below 70% is generally too unreliable.
What is confidence level?
Probability of NOT incorrectly declaring a winner when no real difference exists. 95% (5% false-positive rate) is standard. 99% (1% false-positive) is stricter and needs more sample. For email, 95% is appropriate.
One-sided or two-sided test?
A one-sided test checks for a lift in one specific direction. A two-sided test checks for any difference, up or down. A one-sided test needs fewer samples but only lets you draw conclusions in the predicted direction. Use one-sided when you have a clear directional hypothesis.
Why do small lifts need huge samples?
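Sample size scales as 1/(lift)^2. Detecting a 1% lift takes about 4x as many subscribers as a 2% lift, because the signal gets buried in random variation. With 10,000 subscribers split evenly across two variants (5,000 each), at 95% two-sided confidence and 80% power you cannot reliably detect anything much smaller than about 2.4 percentage points of absolute lift on a 22% baseline, which is roughly an 11% relative lift.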
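To see the inverse-square scaling in numbers, here is a small sketch that applies the same two-proportion formula to a 22% baseline at several absolute lifts. The helper name n_per_variant and the chosen lifts are assumptions for illustration; it uses 95% two-sided confidence and 80% power.

```python
from statistics import NormalDist


def n_per_variant(p1, abs_lift, confidence=0.95, power=0.80):
    """Unrounded sample size per variant for an absolute lift (two-sided)."""
    p2 = p1 + abs_lift
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / abs_lift ** 2


baseline = 0.22
for lift_pp in (4, 2, 1, 0.5):
    n = n_per_variant(baseline, lift_pp / 100)
    print(f"{lift_pp:>4} pp lift -> about {n:,.0f} per variant")

# Halving the lift roughly quadruples the required sample.
ratio = n_per_variant(baseline, 0.01) / n_per_variant(baseline, 0.02)
print(f"1 pp vs 2 pp sample ratio: {ratio:.2f}")
```

The ratio comes out close to 4 rather than exactly 4, because the variance term p2*(1-p2) shifts slightly as the lifted rate changes.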
Absolute vs relative lift?
Absolute lift is in percentage points: 22% + 2pp = 24%. Relative lift is in percent: 22% × 1.10 = 24.2%. They diverge at low baselines. Click rates are usually framed in absolute; open rates in relative.
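The difference between the two framings is easy to check with a couple of lines of arithmetic. The sketch below uses two made-up helper names (lifted_rate_absolute, lifted_rate_relative) and adds a hypothetical 2% click-rate baseline to show how far the framings drift apart when the baseline is low.

```python
def lifted_rate_absolute(baseline, lift_pp):
    """Baseline plus a lift expressed in percentage points."""
    return baseline + lift_pp / 100


def lifted_rate_relative(baseline, lift_pct):
    """Baseline scaled by a lift expressed as a percent of itself."""
    return baseline * (1 + lift_pct / 100)


# 22% open-rate baseline from the text, plus a hypothetical 2% click baseline
for baseline in (0.22, 0.02):
    abs_rate = lifted_rate_absolute(baseline, 2)   # +2 pp
    rel_rate = lifted_rate_relative(baseline, 10)  # +10% relative
    print(f"baseline {baseline:.1%}: +2pp -> {abs_rate:.1%}, +10% -> {rel_rate:.1%}")
```

On the 22% baseline the two lifted rates are close (24% vs 24.2%). On the 2% baseline, +2pp doubles the rate while +10% relative barely moves it, so the p2 you feed into the sample-size formula depends heavily on which framing you meant.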
Can I peek at results early?
Peeking inflates false-positive rate. The math assumes you wait until the planned sample size, then compute once. For the ability to stop early, use sequential frameworks like AGILE or Bayesian methods - the math is different.
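The inflation from peeking can be demonstrated with a small simulation. The sketch below runs A/A tests (both variants share the same true 22% rate, so every "winner" is a false positive) and compares checking once at the end against checking after every batch. All parameters here (10 looks, batches of 200 per variant, 400 simulated tests, the seed) are arbitrary assumptions chosen to keep the run short, and this illustrates the problem only; it is not an implementation of AGILE or any sequential method.

```python
import math
import random
from statistics import NormalDist


def p_value(x_a, x_b, n):
    """Two-sided pooled z-test p-value for two equal-sized arms."""
    pooled = (x_a + x_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0
    z = ((x_b - x_a) / n) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa(rate=0.22, looks=10, chunk=200, sims=400, alpha=0.05, seed=1):
    """Return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    peek_hits = 0
    final_hits = 0
    for _sim in range(sims):
        x_a = x_b = n = 0
        significant_at_any_look = False
        p = 1.0
        for _look in range(looks):
            # Each look adds one more batch of sends to both variants.
            x_a += sum(rng.random() < rate for _i in range(chunk))
            x_b += sum(rng.random() < rate for _i in range(chunk))
            n += chunk
            p = p_value(x_a, x_b, n)
            if p < alpha:
                significant_at_any_look = True
        peek_hits += significant_at_any_look
        final_hits += p < alpha  # only the last look counts here
    return peek_hits / sims, final_hits / sims


if __name__ == "__main__":
    peeking, single = simulate_aa()
    print(f"false positives with peeking: {peeking:.1%}, single look: {single:.1%}")
```

The single-look rate should hover near the nominal 5%, while the stop-at-first-significant rate should come out noticeably higher, and it keeps growing as you add more looks.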
What happens after I have the sample size?
Send to a random subset with at least n per variant. Wait for the open/click window (48-72h for opens, longer for clicks). Compute actual rates. Run a z-test for two proportions. If p < (1 - confidence), for example p < 0.05 at 95% confidence, you have a winner. If the test is on subject lines specifically, draft both variants through the Subject Line Analyzer first so each candidate passes the basic hygiene checks (length, spam triggers, CAPS, emoji) before you commit a sample-size budget to it.
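The final analysis step can be sketched as a pooled two-proportion z-test. This is one common form of the test, assumed here for illustration rather than taken from any particular tool; the function name and the open counts in the example are made up.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b, two_sided=True):
    """Pooled z-test for two proportions. Returns (z, p_value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    if two_sided:
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    else:
        p_value = 1 - NormalDist().cdf(z)  # one-sided: is B better than A?
    return z, p_value


# Hypothetical results: 1,100 opens of 5,000 (A) vs 1,210 opens of 5,000 (B)
confidence = 0.95
z, p = two_proportion_z_test(1100, 5000, 1210, 5000)
verdict = "winner" if p < (1 - confidence) else "no significant difference"
print(f"z = {z:.2f}, p = {p:.4f} -> {verdict}")
```

With these made-up counts the p-value falls well below 0.05, so variant B would be declared the winner at 95% confidence. Use the same sidedness here that you chose when planning the sample size.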
Run real A/B tests in MiN8T
MiN8T's editor includes built-in subject-line A/B testing, send-time optimization, and content variants. Plan with this calculator, execute in the editor.
Peek into MiN8T Editor →