Thursday, March 13, 2008

The ABCs of A/B Testing

My friend Cindy has taken up hiking, and, being a shoe kinda gal, she has taken an intense interest in hiking boots. She's the first one who will tell you, "They may be a dream on the shelf, but you really have no idea how they're going to work for you until you start trail testing them."

You could say the same thing about ebusiness. When you start doing business online, you enter the realm of "trail testing." And here is where you discover that the practical work of managing any online enterprise has only just begun. Because no matter how thoroughly you think you've covered your bases in the development phases of your Web site or your email campaigns, you are about to discover countless opportunities for improvement.

How do you take that first step in monitoring your performance based on all the little tweaks and fidgets you think will help? Ah, it's as simple as A, B ... and no C!

Maximizing your conversion rate is not simply a matter of making changes; it's about making

a) the right changes,
b) at the right time,
c) in the right sequence, and then
d) evaluating the results before continuing the process.

If you are not methodical in your approach to change, much of your effort will be wasted. So take your time, and keep these guidelines in mind as you test:

  • Always clarify in your mind what you are testing and how you are going to measure and interpret the results before you begin. You cannot measure success unless you know exactly what you are measuring (and naturally I hope you have a clear idea of what constitutes success).
  • Your test groups should be of similar size.
  • After the first test, you should always test against a control. The first time out, you are testing two unknowns, and you won't be able to determine the better option until the results are in. Once they are in, then you have a benchmark, or control, against which you can measure subsequent changes.
  • Remember you can accurately test only one element at a time. Even if they all seem necessary, changes need to be made individually so you can effectively track the result of each change. While you might institute several changes and see an improvement, it could be a "net" improvement - that is, a 5% improvement could be the result of change #1 helping by 10% while change #2 actually hurt by 5%. If you make one change at a time and then discover it doesn't help, it's easier to back up and try something else.
  • If you are testing emails, send your test emails simultaneously to eliminate the timing variable. Ideally, at the same time, you want to test your Web site as well. Software platforms like Optimost and Inceptor can help with this.

A/B testing - sometimes it's called an A/B split - is the simplest and easiest method of testing elements in your emails or on your Web site. You divide your audience into two groups. You expose one group to the original version of whatever you are testing. You expose the other group to an alternative version, in which only one element has been changed. Then you track the results.

For example, suppose you want to figure out the best subject line for your promotional email. Prepare two separate emails, identical except for the subject line. The email with the first subject line goes to half your list, while the email with the second subject line goes to the other half.
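The split itself is easy to get wrong if you simply cut your list down the middle: a list sorted by signup date or alphabetically will put systematically different subscribers in each half. A minimal sketch (the function name and seed are illustrative, not from any particular email platform) is to shuffle before splitting:

```python
import random

def ab_split(recipients, seed=42):
    """Randomly divide a mailing list into two similar-sized groups.

    Shuffling before splitting avoids bias from list ordering
    (e.g., subscribers sorted by signup date or name).
    """
    pool = list(recipients)
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    rng.shuffle(pool)
    midpoint = len(pool) // 2
    return pool[:midpoint], pool[midpoint:]

group_a, group_b = ab_split([f"user{i}@example.com" for i in range(10)])
print(len(group_a), len(group_b))  # 5 5
```

The two groups come out within one member of each other in size, satisfying the "similar size" guideline above.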

To gauge the effectiveness of your subject line, compare the open rates between the two groups. Once you find your winner, it becomes your control or benchmark. Test it against another subject line, and so on, until you are convinced you've found the best possible subject line for your messages. And the results may surprise you. In one of our testing scenarios, one subject line generated an open rate 300% higher than its closest competitor. A difference like that can have a tremendous impact on your bottom line!
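Before crowning a winner, it's worth checking that the gap in open rates is bigger than random noise. One standard way (not mentioned in the original guidelines, but a common companion to A/B testing) is a two-proportion z-test; the numbers below are made up for illustration:

```python
import math

def open_rate_z_test(opens_a, sent_a, opens_b, sent_b):
    """Two-proportion z-test: is the difference in open rates
    likely real, or just noise from a small sample?"""
    p_a = opens_a / sent_a
    p_b = opens_b / sent_b
    # Pooled rate under the null hypothesis that A and B are identical
    p_pool = (opens_a + opens_b) / (sent_a + sent_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / sent_a + 1 / sent_b))
    z = (p_a - p_b) / se
    return p_a, p_b, z

p_a, p_b, z = open_rate_z_test(opens_a=220, sent_a=1000, opens_b=180, sent_b=1000)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}")
# |z| > 1.96 corresponds to significance at the usual 95% level
```

Here a 22% vs. 18% split on 1,000 sends each yields z ≈ 2.24, so the difference is unlikely to be chance; on 100 sends each, the same rates would not clear the bar.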

What should you be testing? It's easier to make a list of what you don't need to worry about, because that would be a blank sheet of paper. But here are some possibilities to get you started:

Emails: bonus gifts, coupons, P.S. messages, guarantees, opening sentence image, closing sentence image, from-field, calls to action, opening greetings, type styles, layout elements, graphic images, etc.

Web Sites: landing pages, language of copy (headings, body, calls to action, assurances), colors, location of elements, look/feel, hyperlinks, etc.

A/B testing is far from rocket science (and there are other more complicated and robust ways to test), but it has a sweet advantage: it isn't complicated. More importantly, it means you won't have to make potentially expensive decisions based on your gut reaction. You'll know, because you can say, "Here, look at the numbers!"

In contrast, multivariate testing is a process by which more than one component of a website may be tested in a live environment. It can be thought of in simple terms as numerous split tests or A/B tests performed on one page at the same time. While split tests and A/B tests are usually performed to determine the better of two content variations, multivariate testing can theoretically test the effectiveness of limitless combinations. The only limits on the number of combinations and the number of variables in a multivariate test are the amount of time it will take to get a statistically valid sample of visitors and the computational power available.
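That visitor-time limit bites quickly, because combinations multiply. A back-of-the-envelope sketch (the page elements and the ~1,000-visitors-per-version figure are hypothetical, chosen just to show the arithmetic):

```python
from math import prod

# Hypothetical page with several elements under test at once:
variants = {
    "headline": 3,
    "hero_image": 2,
    "button_color": 4,
    "call_to_action": 2,
}

combinations = prod(variants.values())   # 3 * 2 * 4 * 2
print(combinations)                      # 48 page versions to test

# If each version needs ~1,000 visitors for a stable estimate,
# a full factorial test needs roughly:
print(combinations * 1_000)              # 48,000 visitors
```

Four modest elements already demand 48 page versions; this is why low-traffic sites often stick with A/B tests or use fractional designs rather than testing every combination.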

Multivariate testing is usually employed in order to ascertain which content or creative variation produces the best improvement in the defined goals of a website, whether that be user registrations or successful completion of a checkout process (that is, conversion rate). Dramatic increases can be seen through testing different copy text, form layouts and even landing page images and background colours.

Testing can be carried out on a dynamically generated website by setting up the server to display the different variations of content in equal proportions to incoming visitors. Statistics on how each visitor went on to behave after seeing the content under test must then be gathered and presented. Outsourced services can also be used to provide multivariate testing on websites with minor changes to page coding. These services insert their content to predefined areas of a site and monitor user behavior.

In a nutshell, multivariate testing lets website visitors vote with their clicks for the content they prefer - the content most likely to carry them on to a defined goal. The testing is transparent to the visitor, and all commercial solutions can ensure that each visitor is shown the same content on every subsequent visit as on the first.
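That "same content on every visit" guarantee doesn't require storing an assignment for each visitor. A common implementation trick - shown here as a sketch, not as how any particular vendor does it - is to hash a stable visitor ID into a bucket:

```python
import hashlib

def assign_variation(visitor_id: str, n_variations: int) -> int:
    """Deterministically bucket a visitor into one of n variations.

    Hashing the visitor ID spreads traffic roughly evenly across
    buckets *and* guarantees a returning visitor always lands in
    the same bucket, with no server-side state to store.
    """
    digest = hashlib.md5(visitor_id.encode()).hexdigest()
    return int(digest, 16) % n_variations

# The same ID always lands in the same bucket:
print(assign_variation("visitor-1234", 4))
print(assign_variation("visitor-1234", 4))  # identical
```

In practice the visitor ID comes from a persistent cookie, so the assignment survives across sessions without any database lookup.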

Some websites benefit from continuous, 24/7 optimization, as visitor response to creatives and layouts differs by time of day, day of week, or even season.

Multivariate testing is currently an area of high growth in internet marketing, as it helps website owners ensure they are getting the most from the visitors arriving at their site. Areas such as search engine optimization and pay-per-click advertising bring visitors to a site and have been used extensively by many organisations, but multivariate testing allows internet marketers to make sure those visitors convert effectively once they arrive at the website.

We're thrilled that Google's getting into the testing game with a new service called Website Optimizer. The service will be gratis to all Google advertisers. Just as Google Analytics had a major effect on how e-tailers viewed analytics, so this service will open the world of testing to a much broader audience. Testing is more action-oriented (and should therefore appeal to even more people) than straight analysis. However, some level of analysis is still required.

This is long overdue. We were fortunate to be an early beta tester of the system and are impressed with several features.

Making a decision to test is simple. But making that decision alone won't deliver better online results.

In over 10 years of optimizing sites for our clients, we've identified over 1,100 factors that contribute to a customer's ability to successfully complete a single conversion funnel.

Multiply that by the number of campaigns, offers, products, keywords, visitor motivations, visitor types, and several other elements, and the number of contributing factors becomes astronomical. When you consider most of these factors as potential variants to test and optimize for, you must conclude that determining what and how to test campaigns for maximum return takes plenty of thought, planning, and effort.

Fortunately, not all factors are equal in their ability to drive success. There are many things you can do to stack these factors in your favor.

If you're new to testing, begin with A/B testing rather than multivariate. Although it may feel limiting and take more time, you're likely to get sounder scientific data with which to direct your optimization efforts. It also allows you to gain experience testing with proper methods.

Testing is a science, not an art. Unlike the intuitive creative process, a test must conform to well-established scientific design to be truly effective. As I've stated before, most of today's so-called A/B and multivariate tests are nothing more than marketers throwing landing page, banner ad, and AdWord variations at a wall to see what sticks, wasting valuable time and money with little to no conversion increases to show for it. A key to avoiding that is to have a better handle on "fitness factors." In his A/B testing whitepaper our CTO, John Quarto-vonTividar, explains:

Let's say we want to determine whether Nolan Ryan is a better baseball player than Homer Simpson. How should we proceed? First, we might set a metric for what we mean by a "better" baseball player. We can measure evidence in concrete ways, noting the two subjects' different batting averages or RBIs or the like. What we're searching for is the right metric -- a formula that would lead us to a correct decision. Such a formula is more precisely termed a "fitness function."

We might decide that considering indirect evidence will lead us to a better decision than comparing pure statistics. In that case, our fitness function may involve such things as the difference in salary paid for services or a comparison of the prices paid for our subjects' autographs on eBay.

In virtually all such measures, Nolan is the better candidate. If you were choosing a player for your team, you'd certainly pick Nolan; you can be confident you've made the correct decision.

But let's think on that a moment: the reason you feel confidence in signing Nolan stems from your familiarity with the metric and fitness function that are implicitly applied when we speak of baseball. Your decision might be quite different if we want to pick an effective donut quality assurance taster. Suddenly, Homer Simpson is back in the running.

Even then, your confidence may be based on your understanding that "tastes better" is the donut metric and that Homer Simpson is an acknowledged expert in donut consumption. But what is the fitness function? That is, what does it mean to "taste better"? Are you relying solely on Homer's reputation as an expert? But his expertise is based on consumption quantity, so perhaps you suspect he enjoys all donuts equally and actually has little, uh, "taste" at all. In other words, it's quite possible you don't have any knowledge at all of what we might call the "donut tastiness" fitness function.

Interestingly, marketers and business owners are asked -- every day -- to make more important decisions with less information and an undetermined fitness function.

More formally, A/B testing first requires that a metric be identified (that is, "what will be contrasted?"). Second, a fitness function describing that metric is agreed upon ("how will we measure and contrast the differences?"). Third comes an optimization step, in which the system is tweaked based on a comparison of exactly two tested solutions that differ in only one respect of how they meet the fitness function.
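Those three steps translate almost directly into code. The sketch below is illustrative only - the variant names and numbers are invented, and the fitness function here is simply conversion rate over the chosen metric:

```python
# (1) Metric: completed signups.  (2) Fitness function: signups per
# visitor exposed.  (3) Optimization: compare exactly two variants
# that differ in only one element, and keep the fitter one.

def fitness(results):
    """Fitness function: conversion rate on the chosen metric."""
    return results["conversions"] / results["visitors"]

variant_a = {"visitors": 5000, "conversions": 150}  # control
variant_b = {"visitors": 5000, "conversions": 190}  # one element changed

winner = max((variant_a, variant_b), key=fitness)
print(f"Winner converts at {fitness(winner):.1%}")  # Winner converts at 3.8%
```

Swapping in a different fitness function (revenue per visitor, say, instead of raw conversion rate) can change which variant "wins" - which is exactly the Nolan-versus-Homer point above.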

Here are a few steps, then, that will keep you from merely throwing stuff at the wall.

Define the Conversion Goals

The fitness factors that will determine a successful campaign:

  • Is it a lead generation campaign?
  • Is it a product-specific purchase?
  • Does it generate qualified traffic for on-site purchase?
  • Does it generate traffic for a self-service or subscription service?
  • Is it an online service signup or event registration?
  • What micro conversions are needed to reach a macro conversion?

Know Your Customers

  • How many different types of personas will participate in this campaign?
  • What type of decision maker are they: methodical, spontaneous, humanistic, or competitive?
  • Where are they in the buying cycle?

Do the Creative

Create the pages, PPC ads, e-mail, or ads (online and off-) that drive your prospects to your landing pages. For a portion of the variations, try using a different creative resource than you normally would. Ask:

  • Do you have the correct messages for each profile type and her motivation? You will likely need more than one.
  • Do you have the correct message for each stage of the profile's buying cycle? Is she early in the buying process, in the middle, or ready to buy?

Test and Optimize

Thankfully, the financial costs related to the software side of A/B and multivariate testing are about to be out of the way. The only remaining costs are the creative variations, and those are affected by how much time you spend thinking in the first two steps. More time thinking usually means less time coming up with meaningless variations.

Too much testing is done by producing the creative first, skimming (or sometimes skipping) step two, and poorly identifying goals and fitness factors. Don't make the same mistake.
