What We Measure
We choose one primary metric that represents business value, revenue per visitor or revenue per session on the tested journey. Alongside it we track guardrail metrics such as error rate, payment-failure rate, basket value and exposure health, so that a win in one place can't quietly damage another.

Before a test starts, we fix five things in writing: the primary metric, the guardrails, the baseline conversion rate, the minimum detectable effect we would actually act on, and the decision rule (typically a 5% significance level, i.e. 95% confidence, with 80% power).

We respect the weekly business cycle, running every test for at least seven days and often two full weeks, because small effects are expensive to detect. The required sample grows with the inverse square of the effect size, so as a rule of thumb halving the effect you want to find roughly quadruples the sample needed, and detecting a 2% revenue change can require around 100,000 users per variant.

A result is only a real win if the primary metric moved, revenue per visitor moved in the same direction, the guardrails held, and the effect is still visible in the measurement surface we rely on.
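To see why small effects are expensive, here is a minimal sketch of the standard sample-size formula for a two-sided, two-proportion z-test. It turns the baseline conversion rate, the minimum detectable effect (expressed here as a relative lift) and the decision rule (alpha and power) into users per variant. The function name `sample_size_per_variant` and the example baseline of 5% are illustrative assumptions, not figures from our tests.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per variant for a two-sided, two-proportion z-test (hypothetical helper).

    baseline:     control conversion rate, e.g. 0.05 for 5%
    relative_mde: smallest relative lift worth acting on, e.g. 0.10 for +10%
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("baseline and lifted rate must be distinct and in (0, 1)")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null

    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    # The effect size enters squared in the denominator, which is why
    # halving the MDE roughly quadruples the required sample.
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    for mde in (0.10, 0.05):
        print(f"+{mde:.0%} lift on a 5% baseline:",
              sample_size_per_variant(0.05, mde), "users per variant")
```

Running it shows the +5% target needing a little under four times as many users as the +10% target; the ratio is not exactly four because the variance terms shift slightly with the lifted rate.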
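Revenue per visitor is a continuous metric rather than a conversion rate, so its sample size depends on how spread out revenue is. A minimal sketch, assuming a two-sided z-test on means with equal variances and describing the spread by the coefficient of variation (standard deviation divided by mean). The coefficient of variation of 1.6 used below is a made-up value chosen for illustration; real revenue data, with many zero-revenue visitors, is often far more variable, which pushes the sample well beyond 100,000.

```python
import math
from statistics import NormalDist


def sample_size_continuous(relative_mde: float, cv: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per variant to detect a relative change in a mean (hypothetical helper).

    relative_mde: e.g. 0.02 for a 2% change in revenue per visitor
    cv:           assumed coefficient of variation (std / mean) of the metric
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # n = 2 * z^2 * sigma^2 / delta^2 with sigma = cv * mu and delta = relative_mde * mu,
    # so the mean mu cancels and only the relative quantities remain.
    return math.ceil(2 * z ** 2 * cv ** 2 / relative_mde ** 2)


if __name__ == "__main__":
    print(sample_size_continuous(0.02, cv=1.6))  # 2% change, assumed cv of 1.6
    print(sample_size_continuous(0.01, cv=1.6))  # halving the effect
```

Because the effect size appears squared in the denominator, halving it from 2% to 1% multiplies the sample by four, matching the rule of thumb above.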
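The "real win" rule can be written down as a single check, which makes it harder to bend after the fact. This is a minimal sketch under several assumptions: the `MetricResult` and `Readout` types are hypothetical, "moved" for the primary metric means a significant change in the good direction, "moved with it" for revenue per visitor means a change in the same (good) direction, and a guardrail "held" unless it moved significantly in its bad direction.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetricResult:
    name: str
    relative_change: float        # e.g. +0.03 for +3% versus control
    p_value: float
    higher_is_better: bool = True  # False for error rate, payment-failure rate, ...


@dataclass
class Readout:
    primary: MetricResult
    revenue_per_visitor: MetricResult
    guardrails: list[MetricResult] = field(default_factory=list)
    visible_in_reporting: bool = False  # still shows up in the surface we rely on


def improved(m: MetricResult) -> bool:
    return m.relative_change > 0 if m.higher_is_better else m.relative_change < 0


def degraded(m: MetricResult, alpha: float) -> bool:
    worse = m.relative_change < 0 if m.higher_is_better else m.relative_change > 0
    return worse and m.p_value < alpha


def is_real_win(r: Readout, alpha: float = 0.05) -> bool:
    primary_won = r.primary.p_value < alpha and improved(r.primary)
    rpv_agrees = improved(r.revenue_per_visitor)
    guardrails_held = not any(degraded(g, alpha) for g in r.guardrails)
    return primary_won and rpv_agrees and guardrails_held and r.visible_in_reporting
```

Keeping each condition as a named boolean makes it easy to report which one failed when a test is not called a win.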