Split testing has been around a lot longer than websites. The first example of a split test experiment we’ve found was actually run by pirates (or Dr. James Lind, but it’s cooler if you say pirates), who put limes into the rum on some ships and not on others to see whether the lime helped prevent scurvy.
Control vs Variation
To borrow from this example, the pirates kept half their ships as they were. These formed the control group, which exists so that the performance of the variation can be measured against it.
The pirates could have just put limes into all the rum on all the ships but then they wouldn’t have been so sure it was the lime that caused any difference in scurvy rate – it could have been the time of year, weather conditions, or the quality of the food or water onboard. By keeping back that control sample, the pirates ensured that they had two statistically similar groups where the only difference was zesty lime.
Provided there were enough ships to split into more groups, the pirates could also have tested other grog additives (multiple variations) rather than just testing Control vs Lime.
Test Run Time
They also had to let the test run for long enough to get a good statistical sample size. After a week, scurvy rates on both the control and lime ships were probably pretty low. There might be a little on either set of ships because of random variations in the past diet of the sailors, but we’d expect the control and variation to show similar results at this stage. It’s even possible that, purely by chance, there would be fewer scurvy-free crewmen in the lime-drinking variation at the start.
With absolutely no vitamin C, it takes about a month for the signs of scurvy to develop, so after that point we’d expect to see the sickness rates of the control and variation ships begin to diverge.
Using a surprisingly well-designed test, the pirates were able not only to reduce scurvy rates amongst all their crews but also to stop wasting time and money on false remedies or additives that didn’t actually impact crew health. They got healthier, more swashbuckling crews with the minimum effort and expenditure.
Split Testing Online
For those digital businesses that don’t have a fleet of pirate ships (or a meaningful percentage of their employees dying of vitamin C deficiency every quarter), testing online means splitting your web traffic into at least two randomized and statistically similar groups, then showing one group the existing version of your site or page (the control) and the other a variation on it.
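One common way to do this kind of split is to hash a visitor identifier, so the assignment looks random across visitors but is always the same for any one visitor. The sketch below is only an illustration under assumptions: the function name `assign_group`, the experiment name, and the idea that you have a stable `user_id` (for example from a cookie) are all hypothetical and not taken from any particular testing tool.

```python
import hashlib


def assign_group(user_id, experiment, groups=("control", "variation")):
    """Deterministically place a visitor into one of the test groups.

    Hypothetical helper: assumes `user_id` is a stable identifier,
    such as a value stored in a first-party cookie.
    """
    # Mix the experiment name into the key so that the same visitor can
    # land in different groups for different experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # The hash behaves like a uniform random number, so taking it modulo
    # the number of groups gives a roughly even split.
    bucket = int(digest, 16) % len(groups)
    return groups[bucket]


if __name__ == "__main__":
    # The same visitor gets the same answer every time they come back.
    print(assign_group("visitor-123", "lime-in-the-grog"))
    print(assign_group("visitor-123", "lime-in-the-grog"))
```

Because the group is computed from the identifier rather than stored, nothing has to be looked up to put a returning visitor back in the group they started in. Passing a longer `groups` tuple covers the multiple-variation case.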
Performance for both groups is measured and careful tracking ensures that a returning user stays in the right group when they come back. The tests are allowed to run either until a statistically significant conclusion is reached, or it becomes clear that there is no useful difference between the value obtained from each group.
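As a rough illustration of what "statistically significant" can mean here, the sketch below runs a two-proportion z-test on conversion counts from the two groups. This is one standard test rather than necessarily the one your testing tool uses, and the function name, the example numbers, and the 0.05 threshold are assumptions made for the sake of the example.

```python
import math


def two_proportion_z_test(conv_control, n_control, conv_variation, n_variation):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Hypothetical helper: takes the number of conversions and the number
    of visitors in each group.
    """
    rate_control = conv_control / n_control
    rate_variation = conv_variation / n_variation
    # Pooled rate under the assumption that both groups convert equally.
    pooled = (conv_control + conv_variation) / (n_control + n_variation)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variation))
    if std_err == 0:
        # No conversions (or all conversions) in both groups: no evidence
        # of a difference either way.
        return 0.0, 1.0
    z = (rate_variation - rate_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up numbers: 1,000 visitors per group.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    verdict = "significant" if p < 0.05 else "not significant"  # 0.05 is a convention
    print(f"z = {z:.2f}, p = {p:.4f} ({verdict})")
```

A small p-value suggests the difference between control and variation is unlikely to be down to chance alone. As with the ships, the numbers from the first few days can mislead, so the check is best made once the test has gathered the sample size it was planned for.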