Or, peeling back the layers of marketing to understand when A/B testing does and doesn’t make sense for an organization.
A/B testing, multi variate testing (broadly, experimentation) originate in the world of physical sales. Slightly different paper catalogs were distributed in a geographical area and sales of different items were compared at the end of the season. It was a basic application of the scientific method to an infamously unscientific pursuit - sales.
The technology to conduct experiments has rapidly changed, even in the last 10 years. We’re now sitting on an advertisement apparatus more extensive in reach than any other time in history, and as a result companies know more about their audiences than ever before. But when you step back you can see the bulk of innovation in A/B testing has been in scale and speed. You can reach more people, and determine impact, faster and faster.
But the underlying statistical methodologies are largely the same. Confidence intervals were first formalized in the early 20th century; sequential hypothesis testing was developed during WW2.
That all means the fundamental limitations haven’t changed, no matter the platform, no matter the era. You still need lots of samples to have confidence in your conclusions. Your samples need to be taken randomly, and reflect the properties of the population to which they belong.
In the modern world, samples = traffic. Organic or part of a campaign, it doesn’t matter. So what happens when your org starts trying A/B testing but doesn’t really have the traffic? In short, you’ll get lots of “inconclusive” results. And in an effort to find value in this expensive tool you’ll be tempted to look at the directional change of (the primary metric of) your treatment and say “well it looks like an improvement.” What you’re doing is peeking at results, which is a well-understood cause of biased and invalid experiments.
I don’t mean to sound extreme, but really look closely at your product’s traffic before you sign any contracts to use an A/B testing platform. When your scale is small, you’ll benefit more from a culture of writing design documents than automated experimentation. That way you’re getting the benefits of a regimented process (documenting your product hypotheses and justifications, explaining conclusions drawn) but not heading in spurious directions because of a tool your org isn’t scaled to use.