How to Detect and Tackle Perverse Incentives in A/B Testing: Prioritizing Rigor and Process

Have you heard the story of the sea captains who killed their passengers?

In the 1700s, the British government paid captains to transport prisoners to Australia. To cut costs, the captains provided inadequate food and care, so only a few passengers arrived in Australia alive.

The survival rate rose dramatically once the government shifted the incentive, paying captains only for convicts who safely walked off the ship into Australia.

Incentives matter. And just as they once determined life and death at sea, perverse incentives can poison even the noblest of pursuits.

The Embassy of Good Science discusses scientists who, when pressured to publish attention-grabbing results for grants and promotions, may exaggerate findings or omit contradictory data. This is where the term “perverse incentives” has its roots.

In conversion rate optimization, we can observe this when optimizers are gunning for ‘wins’ at all costs, often disregarding rigor, process, and the deeper insights experimentation should provide.

There are two types of perverse incentives:

  1. Institutional perverse incentives: These result from years of bad practice, leading to a culture that’s unconducive to a learning-driven scientific process and cares only about flashy outcomes.
  2. Inadvertent perverse incentives: Incentives unintentionally woven into the test design that may influence audience behavior and skew results.

Both types demand our attention. The problem lies in how success is often measured.

Whenever we choose a metric, we also need to have a clear objective of what the metric is trying to accomplish. We cannot forget that given a target, people will find the easiest way to get to it. Choosing metrics that are difficult to trick will help [us] get the right outcomes from our investments. We also need to have the policy that for every project that is intended to influence some metrics, we should review if the project’s impact is still aligned with the goal of the product.

Juan M. Lavista Ferres, CVP and Chief Data Scientist at Microsoft, in ‘Metrics and the Cobra Effect’

Easily manipulated metrics create fertile ground for perverse incentives to flourish.

This fact, coupled with the looming and obvious presence of monetary considerations, makes Conversion Rate Optimization the ideal stage for perverse incentives to invade.

That is why we say…

You Can’t Eliminate Perverse Incentives From “CRO”

Reason number one is in the name—Conversion Rate Optimization. By design, the focus is on improving conversion rate, a metric that is quickly taken as a proxy for the health and success of a business. But that’s far too incomplete a picture to judge a business’s health by.

Despite this, some consider conversion—micro or macro—proof that they can effectively nudge web traffic to behave in a certain (desirable) way.

But according to Goodhart’s Law, “When a measure becomes a target, it ceases to be a good measure” (Mattson et al., 2021).

An example is when surgeons are judged solely based on patient survival rates. They might avoid high-risk but necessary operations, leading to worse overall health outcomes.

And this is what happens if testers obsess—often myopically—over conversion rates.

Craig Sullivan shares an example of how trying to increase conversion rates without context is perverse in and of itself. This is made worse by granular goal selection, shoddy processes, and reporting misdemeanors, which are the classic symptoms of induced perverse incentives in business experiments.

Jonny Longden adds: No one can guarantee improved conversion rates unless they are the CEO!

Though said in jest, the point stands: no one can guarantee more registrations, transactions, or revenue.

The second half of the perverse incentives puzzle involves money—funding, buy-in, salaries, fees, performance-based commissions.

That presents two issues. First, conversion rate optimization focuses on a downstream metric that a single person can’t predictably influence. Second, the folks who claim to improve conversions are often paid based on how well they do so.

A/B testers are incentivized to show results and show them quickly. If 80% of the tests they run come back inconclusive or the test version performs worse than the default—which are typical scenarios—the people budgeting for the testing program might not want to continue it. Similarly, if the testing team tells their bosses that the test will take eight months, and then it needs to be replicated for another six months before determining the validity of the hypothesis, their bosses might scrap the program completely.

Stephen Watts, Sr. SEO and Web Growth Manager at Splunk

Businesses are driven by revenue, and there’s nothing wrong with that. More conversions often translate into more revenue.

But this has created an environment where CRO providers who charge a percentage of the money made through their optimization efforts flourish. They come off as more confident, less of a cost center, and easier to onboard (because they’re an easier sell to skeptical CFOs). A much safer ‘bet’ for the business.

This perpetuates a dangerous cycle. CRO becomes inextricably linked to a single outcome despite its actual value in the breadth of insights it can provide. And that’s just how things are right now.

The Psychology Behind Perverse Incentives

People’s tendency to choose an easier path toward a target can be understood through the lens of self-enhancement, which refers to individuals’ motivations to view themselves positively and validate their self-concept.

Research suggests that individuals may prioritize more accessible options for self-enhancement. However, what is interesting about this in an organizational context is that people are very sensitive to group expectations of self-enhancement (Dufner et al., 2019).

This explains why different companies or even teams within them can have very different approaches to self-enhancement and, therefore, incentives!

Interestingly, the pursuit of self-enhancement has been associated with short-term benefits but long-term costs. Those who engage more in self-enhancement activity may initially make positive impressions but may experience decreasing levels of self-esteem (Robins and Beer, 2001)—just like the pursuit of test ‘wins’ can harm your long-term strategy. It feels good now, but not so much later. Perverse incentives!

So, perverse incentives in conversion rate optimization programs aren’t a question of “If.” The real questions are: How and to What Extent?

How Perverse Incentives Are Introduced in Your Program

Surveys conducted between 1987 and 2008 revealed that 1 in 50 scientists admitted to fabricating, falsifying, or modifying data in their research at least once.

Sadly, this sobering reality isn’t confined to laboratories and peer-reviewed journals. No one, not even well-intentioned CRO practitioners, is above succumbing to the siren call of perverse incentives.

Here’s how they can creep into your experimentation program:

Testing to Win

Perverse incentives take root where success is always defined as a ‘winning test.’

We’ve already spoken at length about this phenomenon. Teams that view A/B testing in CRO programs as customer research or a chance to learn what matters to audiences are far more likely to develop innovative ideas that solve real problems.

They lay the foundation for strategic, iterative testing where cumulative efforts far outweigh one-off haphazard guesses that can’t build momentum to that elusive big win.

Success in experimentation follows the rule of critical mass, i.e., you must hit a threshold of input (perseverance) before expecting any degree of change (commensurate output).

A/B testers who test to earn units of information from the result, whether win, loss, or inconclusive, tend to exhaust all the possibilities around an insight (hypothesis) before moving on to the next shiny object.

Conversely, testing just to win defeats the whole point of experimentation—which is gathering evidence to establish causality and gauge the potential superiority of interventions. It sacrifices long-term transformation for short-lived gains, leaving companies shy of real change. And it plays out in many different ways:

1. Focusing on short-term gains

When you test to win, you naturally gravitate towards ideas and concepts where a lift is easiest to produce.

For example, you might want to test introducing a higher discount.

It’s plausible that the treatment will win and cart abandonment will drop. But depending on your margin, you may be giving away profit you just can’t afford to. The test won, but your overarching goal—to help the business deepen its pockets—wasn’t achieved! Plus, you learned nothing; discounts are already proven to move the needle.

2. Cherry-picking data and fudging reports

The looming specter of failure tests an optimizer’s integrity. It goes something like this:

  • Winning is the only acceptable outcome. If you don’t keep up a steady supply of wins (i.e., improved conversions, because the metric is king), then your contribution to the business will be questioned.
  • 9 out of 10 ideas fail to change user behavior significantly. So you will see a lot of red across the board.
  • When the defining KPI suffers, you find succor in the arms of hypersegmentation. Overall, the test lost, but it outperformed the control for users on an iPad.
  • Does this even matter? If the insight fuels an iteration and analysis of the problem the treatment solved for iPad users, then you are thinking like a scientist. But if you wish to use the finding to label your test a winner, you are grasping at straws.
  • The ordeal doesn’t end there. Stats are a slippery slope, and the concept of “adequate power” isn’t widely understood. Often, testers celebrate “wins” that are clearly underpowered (see the sketch below).
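To make “adequate power” less abstract, here’s a minimal sketch (in Python, using the standard normal approximation for a two-proportion z-test; all numbers are illustrative, not from any real test) of how you might sanity-check whether a celebrated “win” was even powered to detect the lift it claims:

```python
from math import sqrt
from scipy.stats import norm

def achieved_power(p_control, p_variant, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test,
    assuming the true rates equal the observed ones."""
    se = sqrt(p_control * (1 - p_control) / n_per_arm +
              p_variant * (1 - p_variant) / n_per_arm)
    z_alpha = norm.ppf(1 - alpha / 2)            # two-sided critical value
    z_effect = abs(p_variant - p_control) / se   # standardized observed lift
    return norm.cdf(z_effect - z_alpha) + norm.cdf(-z_effect - z_alpha)

# Illustrative "win": 4.0% vs 4.6% conversion with 2,000 visitors per arm
print(f"{achieved_power(0.040, 0.046, 2000):.0%}")  # roughly 15%, far below the usual 80% target
```

A test powered at roughly 15% would miss a real effect of that size most of the time, so when it does come back ‘significant,’ the measured lift is likely noise or a heavy overestimate.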

3. Running “safe” tests

The explore vs exploit debate is settled. Healthy experimentation programs maintain a balance between the two. In some cases, you may want to maximize the learning, while in others, going for immediate impact makes sense.

But the win-or-die mentality warps this dynamic. The desire to please stakeholders or avoid ‘failure’ often translates into an over-reliance on well-trodden ‘patterns.’ While not a huge red flag, this approach signals missed opportunities. Call it an orange flag.

Remember the example about surgeons? The one where surgeons who are judged by the survival rates of their patients feel compelled to conduct only safer surgeries, ignoring patients who need riskier operations? This is like that.

Yes, truly novel ideas carry higher risks, but they also open up new learning avenues.

4. Bypassing ethics

Another tell-tale sign that you’re too focused on the win is when ethics are getting compromised. One excellent example recounted in Goodhart’s Law and the Dangers of Metric Selection with A/B Testing is Facebook.

Their relentless focus on growth metrics, like active users and engagement, directly shaped their product and company culture. Teams tied to these KPIs were incentivized to boost those numbers at any cost.

This single-minded pursuit created a blind spot: vital concerns like the platform’s negative impact on society were neglected or actively downplayed.

In such an environment, ethical considerations take a backseat. The growth mandate often takes precedence even when internal research teams flag potential harm.

Sadly, we see unethical practices like p-hacking emerge as a grim extension of this ‘ends-justify-the-means’ mindset. Manipulated data and fudged reports become tools used to perpetuate the illusion of success and stave off scrutiny of a warped system.

Putting Mindless “Productivity” on a Pedestal

In experimentation, mindless productivity means running tests without a clear goal, neglecting long-term strategy, generating inadequate learning, and believing that the more tests you run, the better your CRO program is.

But what about Google, Booking.com, Microsoft, Netflix, and Amazon, which run thousands of tests each year? They have the most developed experimentation programs in the software industry, right?

It’s deeper than that.

Yes, test velocity is a legitimate measure of your program’s maturity. But only when you look at it in the context of quality parameters like:

  • Learning per test: How many experiments yielded valuable knowledge that informs future iterations or broader strategy?
  • Actionable insights: Did the learnings translate into concrete changes that drive business results?
  • Process optimization: Did testing reveal any bottlenecks or areas for improvement?

Shiva Manjunath shares a great example in this podcast episode, “If you run 50 button color tests a week, did you add value to the program?”

Mahatma Gandhi's quote on speed being irrelevant when going in the wrong direction
From AZ Quotes

You did increase your velocity, and you can brag about the metric. But you did not improve anything of import, nor did you generate much learning.

When the quantity of tests is one of the main KPIs for a program, then the focus could be on doing a lot of experiments quickly, which might mean that they’re not well thought out or done carefully.

For example, the discovery/insight research could be rushed or incomplete. The ideation regarding the solutions as well. The outcome can be an inconsistent hypothesis/success KPI choice regarding the initial problem. When we do not take enough time to define the problem and explore a wide range of possible solutions, we can easily miss the point, only to launch as many campaigns as possible. This can also lead to partial dev or QA to ensure the test works properly.

Laura Duhommet, Lead CRO at Club Med

This is not to say test velocity is a lousy metric. No. We’re all familiar with Jeff Bezos’ quote, “Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day…” Rather, test velocity makes more sense when weighed against the rate of insights generated.

Because as a measure of work done, not efficiency, test velocity alone is a vanity metric.

Matt Gershoff discusses how test type and design are often manipulated to inflate program output, rendering test processes ritualized for busyness and not necessarily for efficiency.

Another take on this, from the realm of product experimentation and development, is something Jonny Longden recently addressed in his newsletter.

Product teams are often measured solely on their ability to continuously deliver and ship features, but then they are judged based on the outcome of those features. This ends up with creative analytics, designed to show success from everything that is delivered, otherwise known as SPIN. This means they can meet their targets to deliver without bearing the consequences of the wider judgment.

Via The Journey Further newsletter

Instead of focusing on increasing the number of tests you run, focus on increasing your velocity of learning. This may mean fewer tests, but if it means you’re learning faster, you’re on the right track.

11 Testers Share Their Clashes with Perverse Incentives

To better understand the ways warped metrics and misaligned goals distort the essence of experimentation, we’ve gathered insights from eleven testers. And we’ve grouped them under the two main types of perverse incentives:

Culture-Driven Perverse Incentives

Rooted in a deep-seated ‘win-at-all-costs’ culture, where success means big lifts, this type of perverse incentive can pressure testers to compromise ethics and deprioritize rigor and process.

1. Focusing on uplift while ignoring critical metrics

This is common when conversion rates are viewed in isolation from other business health and success metrics. If this is present, something similar to the Facebook story above can happen.

The consequences can be so detrimental that they negate any benefits you gained from the uplift.

One notable example was a situation where the team was incentivized purely based on the uplift in conversion rates, ignoring other critical metrics like long-term customer value and retention. This led to short-term strategies that boosted conversions but had negative long-term impacts on customer satisfaction and brand reputation.

Tom Desmond, CEO of Uppercut SEO

2. Favoring quick success over a thorough learning process

Stephen Watts, Senior SEO and Web Growth Manager at Splunk, has witnessed the dangers of chasing immediate results in A/B testing. He highlights a common scenario: the premature declaration of victory.

The most common perverse incentives for A/B testing programs are (1) showing success and (2) doing so quickly. It’s simply the nature of A/B testing that most tests fail. Even for large programs with teams of data scientists and loads of traffic to test, most ideas for improving conversions and user behavior fail the test for mature websites.

It’s frequent to see A/B testers stop their test too early, as soon as their tool tells them they’ve got a positive result. By stopping early, they fail to learn that the test version would eventually normalize to the baseline level, and there isn’t a positive result.

Stephen Watts
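Stephen’s point about stopping as soon as the tool shows a positive result is easy to demonstrate with a quick simulation. The sketch below (Python; the batch sizes and rates are illustrative placeholders) runs A/A tests in which control and variant are identical, peeks at a z-test after every batch of visitors, and counts how often at least one peek looks “significant”. With repeated peeking, the false-alarm rate lands well above the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(0)

def aa_test_with_peeking(p=0.05, batch=500, n_batches=20, z_crit=1.96):
    """One A/A test: both arms convert at the same rate p.
    Peek at a two-proportion z-test after every batch; return True
    if any peek would have been declared 'significant'."""
    conv_a = conv_b = n = 0
    for _ in range(n_batches):
        n += batch
        conv_a += rng.binomial(batch, p)
        conv_b += rng.binomial(batch, p)
        pooled = (conv_a + conv_b) / (2 * n)
        se = np.sqrt(2 * pooled * (1 - pooled) / n)
        if se > 0 and abs(conv_a - conv_b) / n / se > z_crit:
            return True
    return False

runs = 2000
false_alarms = sum(aa_test_with_peeking() for _ in range(runs))
print(f"A/A tests flagged as 'winners' at some peek: {false_alarms / runs:.0%}")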

Often, due to being pressured to secure A/B testing buy-in, testers feel a fundamental tension between rigorous experimentation and the need for quick, desirable results.

A team often tries to prove that an individual’s pet idea is a success. Whether it’s the idea of a HIPPO, other executive or the A/B testing team — people’s egos and internal politics often cause them to want to see specific results in the data, even if they aren’t really there.

A single A/B test can take months or years to achieve a result on many websites. This is even before replicating the results. Practically, no businesses are happy to sit around conducting a science experiment for a year just to find out if their conversions have slightly increased — all the while being forced to freeze the content and features on the tested areas. Not only does it make little business sense, but the results of such an extended test usually can’t be trusted because of the wide variety of outside factors impacting the traffic during that long single-test duration.

Stephen Watts

Inadvertent Perverse Incentives

While culture-driven perverse incentives in CRO are often blatant, the inadvertent ones can be the most insidious.

The very design of your experiments can influence participant behavior in a way that challenges the integrity of your experiments. Here are some examples testers have observed:

1. Using tactics that attract the wrong audience or encourage superficial gains

Seemingly “good ideas” within experiments might generate undesirable behaviors that actively harm your goals.

Ryan Carrigan recounts a prime example of this phenomenon:

I’ve witnessed some real doozies in our A/B testing for marketing. The one that stands out to me most is when we offered free appraisals to generate leads, but homeowners who just wanted a freebie choked the funnel and displaced serious customers. I’ve talked with real estate agents whose use of certain Luxe language inflated bounce rates — the consensus here seemed to be that gourmet kitchens and granite countertops will attract affluent viewers. Still, it can also scare off middle-income buyers and, in turn, run the risk of shrinking your target audience.

Ryan Carrigan, Co-Founder of moveBuddha

‘Successful’ tests tend to boost your KPI, but they might mask deeper problems proudly sponsored by perverse incentives.

Like many brands, a brand I worked with offered a sign-up bonus to new customers to lure them into our app. Our sign-up rates increased exponentially, but the engagement levels were unaffected. Those sign-ups weren’t interested in our product but only in the offered bonus. We detected this discrepancy by regularly using data analytics and checking our engagement rates instead of the externally visible metrics.

We had to change our tactics to eliminate this issue. We associated the sign-up bonus with our desired action on the app: customers purchasing products through our platform. This approach incentivized potential customers who were not a part of our ecosystem while keeping away disinterested ones.

Faizan Khan, Public Relations and Content Marketing Specialist, Ubuy Australia

Andrei Vasilescu shares a personal example that highlights what happens when goals and testing methodology misalign:

We once tried a pop-up offering a $10 discount to get people to sign up for our newsletter. It worked too well—we got lots of fake email addresses because people just wanted the discount, not our emails. It was like giving away treats for pretending to play guitar!

Andrei Vasilescu, Co-Founder & CEO of DontPayFull

2. Focusing on short-term engagement metrics over long-term user value

Metrics like time-on-site can give an illusion of higher user engagement. However, as Michael Sawyer notes, these metrics don’t translate into customer satisfaction:

A major example of this is tests performed by travel agencies where different customers are presented with destination options, where one group can be shown more choices than the other. […] While sometimes beneficial for data, this overabundance of choice yields poor user experience and customer loss. […] Stakeholders need to realize that while this strategy may offer short-term gains, it’s harmful to the business’s long-term health.

Michael Sawyer, Operations Director, Ultimate Kilimanjaro

Ekaterina Gamsriegler explains why such metrics can be misleading and produce a negative overall outcome:

Focusing solely on increasing user engagement metrics [with emphasis] on ‘depth,’ such as app session length, can lead teams to prioritize features that keep users in the app longer without necessarily enhancing the overall user experience. That’s why it’s crucial to delve into session data and conduct UX research to ensure that extended app sessions stem from genuine user value, not confusion or design issues. It’s also crucial to monitor the ‘breadth’ aspect of user engagement, e.g., average amount of sessions per user, app stickiness, long-term retention, etc.

Ekaterina Gamsriegler, Head of Marketing, Product Growth at Mimo

In Khunshan’s case, they introduced a gamification feature—virtual badges earned for sharing articles. However, the strategy quickly revealed its dark side:

We aimed to enhance user engagement on a news website through an A/B testing program. Introducing a feature where readers could earn virtual badges for extensive article sharing resulted in a surprising boost in engagement metrics. Users enthusiastically competed to collect more badges.

However, this apparent success came with unintended consequences, including a decline in the overall quality of shared content. Some users prioritized badge accumulation over sharing relevant or accurate information.

Khunshan Ahmad, CEO of EvolveDash Inc.

And here’s Ekaterina again with an analysis of gamifying your metrics, especially when those metrics are too far removed from their core contribution to your product’s success.

Gamification and instant gratification are popular ways to improve user activation and get users to the a-ha and habit moments. At the same time, when experimenting with adding too many gamification elements, one can find users focusing more on earning rewards rather than engaging with the app’s core functionality (e.g., learning a language), which can undermine the app’s original purpose.

Ekaterina Gamsriegler

But it doesn’t end there. Another way testers sacrifice long-term business health for short-term gains is by chasing click-through rates without considering their impact on conversion quality.

It’s essential to experiment with creatives that are relevant for your target audience and the context that the audience is in. However, making CTR the main KPI and pursuing an increase in CTR can lead to misleading ads with the primary focus on getting users to click on the ad. This can lead to a degradation of trust and user experience.

Ekaterina Gamsriegler

Jon Torres has seen this happen, and the outcome:

Focusing on click-through rates without considering conversion quality skewed decision-making toward flashy but misleading content.

Jon Torres, CEO of UP Venture Media

How to Detect Perverse Incentives

Detecting perverse incentives in A/B testing requires approaching the problem from multiple angles: looking beyond surface-level metrics and carefully considering the long-term consequences of your treatments.

Ekaterina explains how you can take advantage of the common denominator of most perverse incentives (vanity metrics and short-term goals) to detect them in your experiments:

What perverse incentives might have in common is the focus on vanity metrics and short-term goals instead of deeper-funnel KPIs and long-term growth. With them, you can make progress with a metric at the bottom of your hierarchy of metrics but move the ‘higher’ KPIs in the wrong direction.

What I find very helpful is an experimentation template with a special section specifying:

– the primary metrics that you’re trying to improve

– the secondary metrics that you might see improving, even if they are not the target ones

– the guardrail metrics: the ones you need to monitor closely because hurting them would hurt the long-term outcomes and ‘higher’ level KPIs

– the tradeoffs: the metrics that might drop, and we’re willing to sacrifice them

Having a hierarchy of metrics built around your target KPI also helps to make sure that working on the lowest-level KPIs will lead to an improvement in the higher-level goals.

Ekaterina Gamsriegler

This describes an Overall Evaluation Criterion (OEC)—a holistic approach for assessing the success of a test that goes beyond focusing on a single metric and considers the overall impact on the business and its long-term goals. It:

  • Broadens the scope beyond a single, gameable metric,
  • Aligns the desired outcomes with the overall business strategy,
  • Tracks both intended and unintended consequences, and
  • Evaluates the long-term impact of an experiment.

For instance, combining conversion rate objectives with customer satisfaction and retention metrics provides a more balanced approach. […] Aligning A/B testing goals with long-term business objectives avoids perverse incentives.

Tom Desmond
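To sketch what an OEC might look like in practice, here is a minimal, hypothetical example in Python. The metric names, weights, and guardrail thresholds are placeholders, not a prescribed formula; the point is simply that the ship/no-ship decision combines several weighted metrics and can be vetoed by guardrails instead of hanging on a single conversion number.

```python
# Hypothetical OEC: weighted relative lifts, vetoed by guardrail metrics.
WEIGHTS = {"conversion_rate": 0.4, "avg_order_value": 0.3, "repeat_purchase_rate": 0.3}
GUARDRAIL_MAX_DROP = {"nps": -0.02, "retention_rate": -0.01}  # max tolerated relative drop

def evaluate_oec(control: dict, variant: dict) -> dict:
    """Score a test by weighted relative lift and flag guardrail breaches."""
    score = sum(w * (variant[m] - control[m]) / control[m] for m, w in WEIGHTS.items())
    breaches = [m for m, floor in GUARDRAIL_MAX_DROP.items()
                if (variant[m] - control[m]) / control[m] < floor]
    return {"oec_score": round(score, 4),
            "guardrail_breaches": breaches,
            "ship": score > 0 and not breaches}

control = {"conversion_rate": 0.040, "avg_order_value": 62.0,
           "repeat_purchase_rate": 0.18, "nps": 42, "retention_rate": 0.31}
variant = {"conversion_rate": 0.046, "avg_order_value": 58.0,
           "repeat_purchase_rate": 0.17, "nps": 39, "retention_rate": 0.30}

print(evaluate_oec(control, variant))
# The conversion 'win' is outweighed: both guardrails are breached, so ship is False.
```

In this made-up run, the variant “wins” on conversion but breaches both guardrails, so the OEC says don’t ship; that is exactly the kind of trade-off a single conversion metric would hide.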

There’s a huge role leadership plays in making this happen:

And at the same time, acknowledge that no metric is going to be perfect, right? […] The challenge isn’t to find the perfect metrics, but to find the metrics that will help the teams execute in the short to medium term against the long-term objectives that the company has.”

Lukas goes on to say, “[Leadership should understand] their role in experimentation, how they play a part in enabling this. [Their part should not necessarily be] coming up with hypotheses and pushing for particular ideas. Part of the role of [leading for high-velocity experimentation culture] is giving teams a target to aim for [and] then monitoring their progress against that target. And when they start to go off the rails, to adjust the target. [Leaders should not] adjust by saying, “No, that’s wrong, you shouldn’t have run that experiment, that experiment is bad, you should stop doing that.” […] someone at the top should step in and say, “Actually, these metrics that we have set, they are causing perverse incentives, and so we should adjust the metrics so that the teams are actually optimizing for the right thing.””

Lukas Vermeer in #216: Operationalizing a Culture of Experimentation with Lukas Vermeer

Another way to detect perverse incentives is by looking for what Ryan Carrigan calls “unexpected metrics.”

Don’t just track your leads and conversions. Track things like engagement times, bounce rates, and other qualitative feedback. Sudden shifts in these stats are a great way to identify likely unintended consequences.

Ryan Carrigan
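One lightweight way to act on Ryan’s advice is to watch those secondary metrics for sudden shifts against their own recent baseline. The sketch below (Python; the metric, window, and threshold are hypothetical choices, not a standard) flags any day on which a tracked metric drifts several standard deviations away from its trailing mean:

```python
import numpy as np

def flag_unexpected_shifts(daily_values, baseline_days=14, z_threshold=3.0):
    """Flag days where a metric deviates sharply from its trailing baseline.
    `daily_values` is an ordered sequence of daily observations (e.g. bounce rate)."""
    values = np.asarray(daily_values, dtype=float)
    alerts = []
    for i in range(baseline_days, len(values)):
        baseline = values[i - baseline_days:i]
        mu, sigma = baseline.mean(), baseline.std(ddof=1)
        if sigma > 0 and abs(values[i] - mu) / sigma > z_threshold:
            alerts.append((i, float(values[i])))
    return alerts

# Hypothetical bounce-rate series: stable around 0.42, then a sudden jump
bounce_rate = [0.42, 0.41, 0.43, 0.42, 0.40, 0.43, 0.41, 0.42, 0.44, 0.41,
               0.42, 0.43, 0.41, 0.42, 0.43, 0.42, 0.55, 0.56]
print(flag_unexpected_shifts(bounce_rate))  # flags the jump at the end of the series
```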

Vigilance, continuous learning, and focusing on long-term value will help you avoid these hidden traps.

One key point is closely monitoring behavioral shifts that deviate from the intended goal. If individuals or systems respond in ways that exploit loopholes or achieve outcomes counter to the original purpose, it signals a perverse incentive.

For instance, if a pricing incentive meant to boost customer satisfaction inadvertently leads to deceptive practices, it’s important to recognize and rectify such discrepancies promptly. Regularly assessing outcomes against the intended objectives allows for the early identification of perverse incentives, enabling corrective actions to maintain alignment with the business’s overarching goals.

Nick Robinson, Co-Founder at PickandPullsellcar

We can also detect the impact of institutional perverse incentives by reviewing the backlog of test ideas. A team fixated on a single type of test, or one repeatedly favoring a particular location or audience, might be exploiting a ‘comfort zone’ of easy wins. This approach prevents actual exploration and is bound to be exhausted once the local maximum is hit.

This isn’t nefarious per se; not every impact of perverse incentives results in outright data manipulation.

Often, sticking to this pattern of low-effort, low-risk tests may appear impressive to a C-suite seeking quick ‘wins,’ but it puts a glass ceiling on more significant learning and bigger bets.

In the worst-case scenario, if promised results do not materialize yet everything is rosy on paper, you can be sure that the hard, gritty, frustrating science of experimentation has been disregarded, and perverse incentives are at play.

How to Manage Perverse Incentives

So far, we’ve explored the insidious nature of perverse incentives. We’ve seen how the pursuit of short-term gains can undermine long-term business health, distort metrics, and compromise customer experience.

While the challenges of perverse incentives are real, they can be minimized.

Yes, perverse incentives can happen, but that’s solvable with governance.

Ben Labay, CEO of Speero via LinkedIn Post

Let’s explore ways to manage perverse incentives in experimentation:

1. Align Test Ideas (and Metrics) with Business Goals

First and foremost, understand that your experimentation program cannot exist separate from the product, marketing, UX, and the business as a whole.

It is essential to clearly state the program goals and success KPIs and make sure they align with the different stakeholders’ goals (business, UX, product—not only business). We can also regularly check and update how we promote the program’s success to ensure it still makes sense regarding its goals.

Laura Duhommet

Speero’s Strategy Map offers a fantastic framework for this. It helps you zero in on specific growth levers (interest, acquisition, monetization, etc.) and the motions that drive them (marketing-led, product-led, etc.).

You want to make sure you are testing against metrics you need rather than the metrics you have. Think about what the right metrics are to prove the hypothesis, and if you find yourself not being able to test against them, then you have a problem.

Graham McNicoll in ‘Goodhart’s Law and the Dangers of Metric Selection with A/B Testing’

While you’re at it, remember that KPIs (Key Performance Indicators) and OKRs (Objectives and Key Results) aren’t the same beast. KPIs are the ongoing metrics you track, while OKRs are more ambitious, time-bound goals for the organization.

OKRs vs KPIs
Difference between OKRs and KPIs by Dora Nagy on LinkedIn

A word of caution: Even with the best intentions, your ‘growth map’ will need adjustments. As business priorities shift, your experimentation strategies must also adapt.

Regularly revisit your alignment to safeguard against inadvertently creating perverse incentives as your goals evolve.

Experimentation guardrail metrics and KPI categorization
Experimentation KPI categorization by Ben Labay via LinkedIn

2. Think Like a Scientist

Let’s get real: perverse incentives and scientific rigor don’t mix. It is almost impossible to chase quick wins while upholding the principles of experimentation.

True experimentation demands a commitment to sound methodology, statistics, and respecting the results—even when they don’t give you the ‘win’ you were looking for. There is no cherry-picking “positive” fragments from a failed test, no peeking at results mid-test, and no p-hacking to squeeze out a semblance of significance.

Monitor and ensure the hypotheses are as data-driven as possible and the problems are well-defined/reviewed with the most relevant people. This can be done with some simple “quality checks” regarding the test plans: hypothesis consistency, data sources and insights proof, test feasibility (minimum sample size, MDE), etc.

Laura Duhommet
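One of those feasibility checks, the minimum sample size for a given MDE, is easy to script so every test plan gets it automatically. Below is a minimal sketch using the standard two-proportion formula; the baseline rate, MDE, and power target are placeholders you’d swap for your own numbers.

```python
from math import ceil
from scipy.stats import norm

def min_sample_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation to detect a relative lift
    of `relative_mde` over `baseline_rate` with a two-sided z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    n = ((z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2) ** 2
    return ceil(n)

# Placeholder numbers: 3% baseline conversion, aiming to detect a 10% relative lift
print(min_sample_per_arm(0.03, 0.10))  # roughly 53,000 visitors per variation
```

If your traffic can’t realistically deliver that sample in a reasonable window, the plan needs a bolder change, a different metric, or a different page before the test ever launches.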

You’ve got to be brutally honest with yourself, your experimentation process, and the numbers.

Perverse incentives can mess with the quality of a program’s scientific method. […] It can encourage us to “play around with data” to get exciting results; optimizers might mess around with the data until they find something significant. This can give us false positives and make results seem more important than they really are.

Laura Duhommet

Some might argue that rigor is resource-expensive and slows down innovation. Here’s the thing: non-rigor is even more expensive; the cost just sneaks up on you in the form of incorrect conclusions or those subtle opportunities you miss because you’re only looking for big, obvious wins.

Rigor helps you:

  • Maintain statistical rigor (adequate power, careful sampling, bias reduction) to ensure reliable results.
  • Interpret data accurately, not just chase the numbers that make you feel good.
  • See the true impact and potential risks so you make data-driven decisions.
  • Scale what works and evolve as your business grows through thorough analysis.

Think of it this way: innovation without rigor is like a toddler let loose in the kitchen. Sure, they might create something “new,” but is it edible or even safe?

This is also the reason why legendary tester Brian Massey wears a lab coat. A/B testing is science.

3. Foster a Culture where Everyone Can Call Out Perverse Incentives

Perverse incentives thrive in the shadows. So, you must break down silos:

With more visibility on experiments within your organization, you’ll have increased vigilance against misleading metrics.

We promote a deeper understanding of the importance of ethical and sustainable optimization practices by presenting scenarios where alternative strategies could yield better results in immediate metrics and overall business health.

Khunshan Ahmad

Share those experiment results with all departments and teams. Don’t limit insights to just the testing team. Circulate your findings and educate colleagues across the organization through in-house programs, such as a dedicated Slackbot that shares experimentation facts, a weekly newsletter, and monthly town halls.

Additionally, sharing the results of every experiment with the rest of the organization in a dedicated communication channel (e.g., Slack) allows for constructive feedback, helping refine conclusions and ensuring the entire team moves forward with a holistic understanding.

Ekaterina Gamsriegler

Bring in agencies or outside experts for workshops on solid experimentation basics, and use transparently shared scorecards to keep everyone in the loop.

When communicating perverse incentives, I tend to emphasize education. Highlighting the problem alone will only cause tension. It’s important to approach the conversation prepared to suggest alternative A/B tests or marketing strategies. Acknowledging pitfalls and actively seeking solutions is a surefire way to foster trust and guarantee that your marketing serves its best purpose.

Ryan Carrigan

Encourage questioning, not blind acceptance. This signals that it’s safe to challenge assumptions and point out potential flaws in strategies.

Stay humble and be ready to adjust if something’s not working. Make sure that the program and the product are continuously improved, with continuous training, monitoring stakeholders’ opinions, program success, fostering a culture that values learning from mistakes and always seeks improvement, etc.

Laura Duhommet

This open dialogue will not only boost buy-in for experimentation, but it will turn everyone into perverse incentive detection sensors—just like the distributed awareness encouraged by Holacracy.

To solve these [perverse incentives] issues, A/B testing teams must teach their stakeholders what is and isn’t appropriate to test. Teams—if they have the traffic—often shouldn’t be testing small changes on a single webpage, trying to achieve a 1 or 2% gain. They should look for radical changes that break out of the local maximum. Testers should be running experiments across not just one or two web pages but entire website sections, across hundreds or thousands of pages at once in aggregate where possible to increase the sample size and lower the test duration.

Stephen Watts

Get input from people who don’t directly benefit from the outcome of experimentation. Often, those with a less vested interest can pinpoint perverse incentives that those intimately involved may overlook (sometimes unintentionally!).

It can also help to stay open about experiment decision-making and encourage people who are really interested to share their thoughts, especially those who might not directly benefit, to avoid confirmation bias of the optimizers.

Laura Duhommet

Remember: Cultivating this proactive culture takes time and consistent effort. But the payoff is huge: a team empowered to call out problematic metrics before they cause damage, fostering true innovation rooted in reliable data.

Goodhart's law stick figure cartoon
Don’t do this 😂 (Source)

4. Create Psychologically Safe Experimentation Programs

It turns out that the battle against perverse incentives isn’t just about metrics and test setups—it’s about creating an environment where people feel empowered to take risks, learn from setbacks, and prioritize true progress.

Lee et al. (2004) found that employees are less likely to experiment when they face high evaluative pressure and when the company’s values and rewards don’t align with supporting experimentation.

For example, inconsistent signals about the value of experimentation and the consequences of failure would discourage risk-taking and innovation, even if the organization officially and vocally supports these activities.

This could lead to a scenario where employees are motivated to play it safe and avoid experimentation.

Just saying you don’t mind people failing experiments doesn’t make people feel that’s really true.

This means rethinking how you incentivize your teams and how you respond to experiments that don’t yield the expected outcome.

When we find a problem, we don’t just sound the alarm. We explain what’s going wrong in a way everyone can understand. We show how these tricks can mess up our plans and suggest better ways to test things. It’s important to talk clearly and not just blame people.

Perverse incentives aren’t done on purpose. They’re like hidden traps. To avoid them, we need to talk openly, really understand our data, and always think about our long-term goals. This helps make sure our tests actually help us instead of causing problems.

And here’s some interesting info: 70% of A/B tests can be misleading. Tricks like perverse incentives cost companies a ton of money—about $1.3 trillion a year! Companies that really get data tend to make more money. Talking well with your team can make people 23% more involved in their work. Being trustworthy makes customers 56% more loyal. So, using A/B tests the right way, with honesty and care, can really help your business grow without falling into these sneaky traps.

Andrei Vasilescu

5. Run an ‘Experimentation’ Program Instead of a ‘CRO’ Program

Words matter. Language defines culture—the way we think and act—and lays the foundation of success.

The term “Conversion Rate Optimization” itself carries baggage—an obsession with a single metric, a short-term focus on a downstream metric that is well nigh impossible to control.

It is also going through the same commoditization phase SEO went through, where companies can choose to do “some CRO” before a launch and expect to reap the rewards. It’s all about the revenue.

Experimentation, though, is human nature. It’s how we learn, grow, and innovate. It’s a constant part of our daily life and thought processes.

Framing your efforts around this concept shifts the focus from desperate outcome-chasing to a structured yet exploratory process. This emphasis on process protects you from the trap of perverse incentives.

Sell experimentation as a low-barrier innovation enabler. Then, ensure that those who execute understand that the quality of the input defines the program. That it’s not a magic revenue bullet.

But hey, that’s easier said than done. You’d have to change:

  • What metrics are tracked
  • How performance is measured, and
  • How buy-in stays secured.

You may even have to re-think compensation for the experimenters.

But once we achieve this pivot, the payoff is massive. The potential perverse incentives are now golden incentives, encouraging forward thinking, consideration of long-term impact, big bets, and high-quality ideation.

Conclusion

Let’s close out with this from Kelly Anne Wortham:

Many experimentation SMEs and platforms focus all their time on metrics—specifically—the conversion rate. Like I said in my response — we even let ourselves be called CROs (at least some do). It’s a case of Goodhart’s law where the measure has become the target; it ceases to be a good measure. We are focusing on converting customers instead of understanding customer pain and working to truly resolve those pain points with true product solutions and improved usability. The end result is the same (higher conversion), but the focus is wrong. It feels like our goal shouldn’t be to improve conversion rate but rather to improve decision-making and customer satisfaction with the brand. And perhaps there’s a way to show that we are informing product changes with research and experimentation that is supported by customer empathy. All of these things would be more closely linked to our actual goal of building a better relationship between a brand and its customers. Which, at the end of the day, will increase the conversion rate.

Kelly Anne Wortham, Founder of Test & Learn Community (TLC)

References

De Feo, R. (2023). The perverse incentives in academia to produce positive results. [Cochrane blog]. https://s4be.cochrane.org/blog/2023/11/06/the-perverse-incentives-in-academia-to-produce-positive-results/

Dufner, M., Gebauer, J. E., Sedikides, C., & Denissen, J. J. A. (2019). Self-Enhancement and Psychological Adjustment: A Meta-Analytic Review. Personality and social psychology review: an official journal of the Society for Personality and Social Psychology, Inc., 23(1), 48–72. https://doi.org/10.1177/1088868318756467

Lee, F., Edmondson, A. C., Thomke, S., & Worline, M. (2004). The Mixed Effects of Inconsistency on Experimentation in Organizations. Organization Science, 15(3), 310–326. http://www.jstor.org/stable/30034735

Mattson, C., Bushardt, R. L., & Artino, A. R., Jr (2021). “When a Measure Becomes a Target, It Ceases to be a Good Measure.” Journal of Graduate Medical Education, 13(1), 2–5. https://doi.org/10.4300/JGME-D-20-01492.1

Robins, R. W., & Beer, J. S. (2001). Positive Illusions About the Self: Short-Term Benefits and Long-Term Costs. Journal of Personality and Social Psychology, 80(2), 340–352. https://doi.org/10.1037/0022-3514.80.2.340

Top 24 A/B Testing Tools for 2024

In searching for your perfect-fit A/B testing tool, simply comparing price tags and feature lists won’t cut it.

There’s more that’ll impact the quality of your experiments and the learnings you can pull from them. In this guide, we’ll take a closer look at the key things you need to consider, with input from seasoned conversion rate optimization pros. By the end of this guide, you’ll be able to invest in an A/B testing tool that dovetails seamlessly with your unique needs.

Here are the top 24 A/B testing tools we recommend:

  1. AB Tasty
  2. ABlyft
  3. ABsmartly
  4. Adobe Target
  5. Amplitude
  6. Apptimize
  7. Conductrics
  8. Convert Experiences
  9. Convertize.io
  10. Crazy Egg
  11. Dynamic Yield
  12. Eppo
  13. FigPii
  14. GrowthBook
  15. Intellimize
  16. Kameleoon
  17. LaunchDarkly
  18. Nelio
  19. Omniconvert
  20. Optimizely
  21. SiteSpect
  22. Statsig
  23. VWO Testing
  24. Zoho PageSense

We’ve added their features, how affordable they are, ideal use cases, and independent reviews. We judge how affordable an A/B testing tool is by the price of the tier that supports decent A/B testing.

We also added a sneak peek of each tool: a GIF demo when a free trial is available, and a screenshot of the UI when it isn’t.

Then, we cover the consciousness quotient (CQ) of the tool vendor. CQ measures how much a company or brand cares about its impact on the planet. Note: If there isn’t any CQ-related content published by the A/B testing tool company, it doesn’t automatically mean they don’t care about the world. They probably do but just don’t talk about it.

Now let’s jump straight to the A/B testing tools comparison.

1. AB Tasty

AB Tasty is a mid-level tool. It features a clean interface and simple setup, making it easy for companies looking to move past basic experimentation to start scaling their testing programs.

The simplicity extends into its machine learning functions to help large-traffic sites run more sophisticated personalization and data tracking.

G2 rating: 4.5/5 from 204 reviews

Ideal use case: Web, product, and app experimentation

Pricing: You find out when you submit a form.

A look inside AB Tasty:

Pros:

  • Built-in AI and ML capabilities
  • Easy to set up and preview tests
  • Multiple integrations
  • Extensive analytics reports
  • A wide range of targeting options are available, along with personalization
  • Reliable customer support

Cons:

Support: Knowledge base and live chat.

AI roadmap: AB Tasty uses AI to automate parts of the optimization workflow and help with personalization and analytics.

Consciousness quotient (CQ): They donate to NGOs, work with social action groups, recycle, and sponsor beehives.

A/B testing software review from G2:

Critical review:

2. ABlyft

ABlyft is an A/B testing platform that supports web, mobile app, and product experimentation. Built by experienced testing engineers and managers, it has a strong focus on developers.

It also integrates with various analytics tools.

G2 rating: 4.5/5 from 1 review

Ideal use case: Web experimentation

Pricing: Custom pricing, after a free trial.

What ABlyft looks like in action:

Pros:

Cons:

  • There’s a learning curve for a beginner who just wants to run simple experiments

Support: Support tickets are available

AI roadmap: At the time of writing this, I did not find information about AI features or plans for those features.

Consciousness quotient (CQ): Found no information about what ABlyft does to benefit the world.

A/B testing software review from G2:

Critical review:

There’s only one review available for ABlyft on G2 and it’s positive.

3. ABsmartly

ABsmartly is a private cloud A/B testing tool designed to run experiments faster than most tools. It allows you to test on any platform—web, SaaS product, mobile app—both client-side and server-side.

The tool is optimized for high throughput, supporting many experiment types, goals, and multi-variate testing.

G2 rating: 4.6/5 from 13 reviews

Ideal use case: Cross-platform A/B testing: Mobile apps, SaaS products, websites

Pricing: 60-day PoV (proof of value) is available and then it’s custom pricing

A look inside ABsmartly:

Pros:

  • ABsmartly is designed to be a fast tool
  • It sets no limits on experiments, users, and goals
  • The SDK is easy to work with (both frontend and backend)
  • Support is fast

Cons:

  • Exporting data requires help from the support team
  • Metric table fields can be hard to read

Support: Documentation, training sessions, and email support are available

AI roadmap: Did not find information about AI features or plans for those features.

Consciousness quotient (CQ): Found no information on this.

A/B testing software review from G2:

Critical review:

4. Adobe Target

Adobe Target is a powerful enterprise testing tool, but it comes with a notable limitation: it’s only available to customers of Adobe’s analytics tool.

That being said, the integration between the two allows for a fantastic workflow switching between personalization and segment targeting (hence the name), especially when you factor in its machine learning capabilities to constantly learn more about your audience and targeting.

G2 rating: 4.1/5 out of 62 reviews

Ideal use case: Personalization and experimentation

Pricing: Only available after an 11-part form, an email, and a call.

A look inside Adobe Target:

Pros:

  • The interface is easy to use
  • Provides accurate real-time data reports
  • Integrates well with Adobe Analytics and is designed as an upsell offer
  • Website personalization tools for both beginners and experts
  • Walks you through the setup and testing process
  • Has advanced AI to continually test and improve campaigns and personalization
  • Can get great insights when connected to other paid Adobe Marketing Cloud tools

Cons:

  • Optimizing on a large scale can be slow
  • Requires a very high volume of traffic to work best
  • The form-based editor has a learning curve
  • High price point
  • No trial option

Support: A knowledge base, video training programs, phone and chat support, and an expert community are available.

AI roadmap: Adobe Target allows you to personalize your visitors’ experiences with AI at scale.

Consciousness quotient (CQ): They champion diversity, work towards 100% renewable energy buildings, lower emissions, run community action programs, and more.

A/B testing software review from G2:

Critical review:

5. Amplitude

Amplitude is a digital analytics platform built for product innovation and experimentation. It helps businesses make the right bets by connecting products to actual business outcomes through its product analytics, event tracking, and experimentation features.

G2 rating: 4/5 from 12 reviews

Ideal use case: Product experimentation and analytics

Pricing: Starts free, and then it’s $61/mo. But you can only run A/B tests and MVTs on the Growth plan, which requires you to contact sales for pricing.

What Amplitude looks like in action:

Pros:

  • Session replay and unlimited feature flags are on the free plan
  • It’s got built-in data quality checks
  • The workflow-based design makes you follow best practices of experiment design for statistical rigor
  • It has personalization features

Cons:

  • A/B testing is only available from the third pricing tier
  • There’s an initial learning curve
  • Amplitude support is limited

Support: There’s a help center, live chat, ticket/email support, and learning resources.

AI roadmap: Amplitude AI features a suite of AI-powered features for automating data management, generating insights, monitoring product metrics changes, and providing recommendations and predictions.

Consciousness quotient (CQ): Didn’t find information on Amplitude’s CQ.

A/B testing software review from G2:

Critical review:

6. Apptimize

Apptimize is an Airship company that’s arguably the leading solution for mobile A/B testing and feature release management. Brands use it to build and iterate on user experiences across their digital channels—all in a mobile-first approach.

G2 rating: 4.1/5 from 17 reviews

Ideal use case: Cross-platform A/B testing: Mobile apps, SaaS products, websites

Pricing: Starts free and the rest is custom pricing

What this A/B testing tool looks like in action:

Pros:

  • Fast and reliable testing on mobile apps
  • Bypasses App Store and Play Store
  • Apptimize has a patented technology stack that integrates with major push and analytics providers

Cons:

  • Not beginner friendly
  • Developer-centric design

Support: Live chat, email, and documentation are available.

AI roadmap: Did not find information about Apptimize’s AI features or AI plans.

Consciousness quotient (CQ): Found no information on CQ.

A/B testing software review from G2:

The most recent reviews here are old (2018).

Critical review:

I found none worth noting.

7. Conductrics

Conductrics is a unified customer experience platform—you get testing with machine learning and surveys. It’s built so that you learn the “what” and “why” behind your customer behavior in one place.

G2 or TrustRadius rating: N/A

Ideal use case: Cross-platform A/B testing and user research

Pricing: Custom pricing

A look inside Conductrics:

Pros:

  • Has WYSIWYG and code editor
  • You can set up tests using custom JS, page redirects, or rich APIs
  • The ML predictions identify audiences of interest

Cons:

  • No free trial

Support: Email support and password-protected documentation are available.

AI roadmap: Conductrics Predict uses AI and ML to recommend a variation to the audience segment that prefers it.

Consciousness quotient (CQ): No CQ info found for Conductrics.

8. Convert Experiences

Convert Experiences is perhaps best considered a mid-level testing platform, but with one of the lowest price points. It is designed for companies and teams who are ready to scale up their testing program and processes.

Consistently recommended by agencies (even when they have a suite of competitors’ tools to use), Convert Experiences also has a fantastic support team for chat, email, and live calls.

G2 rating: 4.8/5 from 59 reviews

Ideal use case: Web and product experimentation

Pricing: It’s $499/mo for 100k visitors. This plan comes with unlimited tests, 10 active domains, 30 active goals/projects, and advanced post-segmentation.

What this A/B testing tool looks like in action:

Pros:

  • Full stack testing is available in the basic $349/mo plan
  • Supports both Frequentist and Bayesian stats
  • Feature-rich, run unlimited tests
  • Fully privacy compliant: no personal data is ever stored
  • Supports SPA testing
  • Reliable, fast customer support

Cons:

  • No free starter plan, only a 15-day free trial

Integrations: Integrates with 90+ 3rd party tools (e.g. Shopify, WordPress, Mixpanel, Hotjar, Google Analytics)

Support: Live messaging, phone, email, and knowledge base with more to come.

AI roadmap: We currently have an AI Wizard that helps with writing copy, with plans to extend support to images and later to video. We aim to integrate Multi-Armed Bandit (MAB), a machine learning technique that enhances decision-making by adapting to new data and optimizing sensitive experiments to get optimal results faster.

Consciousness quotient (CQ): Caring about the world is in our DNA. We plant trees (our goal is 1,178,600 trees), run community programs, champion diversity from the initial application, donate to charities, and much more. We’re 15x carbon negative and aiming for 100x.

A/B testing software review from G2:

Most recent critical review:

9. Convertize.io

Convertize.io is a beginner-friendly testing tool for running A/B tests on various elements on your web pages. Beyond testing, it suggests optimization approaches based on research and previous test data. This gives you more certainty about your next steps when you’re new to CRO.

G2 rating: 3/5 from 4 reviews

Ideal use case: Web experimentation

Pricing: Starts with a 14-day free trial, and then it’s $59/mo for 20k visitors and all the features.

What Convertize looks like in action:

Pros:

  • Great pricing for companies new to A/B testing
  • Easy to install and use, especially for a beginner
  • Has a great visual editor
  • Provides 150 testing ideas to get you started

Cons:

  • There are some issues with the global JavaScript box
  • Analytics features are not advanced enough for some users

Support: Chat and email support.

AI roadmap: Not exactly AI, but Convertize has an advanced multi-armed bandit algorithm called Autopilot that automates A/B tests for maximum conversions.

Consciousness quotient (CQ): Nothing about CQ was found for this A/B testing tool.

A/B testing software review from G2:

Critical review:

10. Crazy Egg

Crazy Egg is a behavior analytics tool that’s valuable for ecommerce businesses looking to understand the browsing experience of their visitors. While the features are presented in a beginner-friendly way, the sheer number of options can be overwhelming.

But if your focus is only on A/B testing, Crazy Egg does a good job of helping you create variants, run simple tests, and generate reports.

G2 rating: 4.2/5 from 110 reviews

Ideal use case: Web experimentation and behavioral analytics

Pricing: After a 30-day free trial, you can conduct unlimited A/B tests for $49/mo (must be paid annually, so it’s actually $588/yr).

What Crazy Egg looks like in action:

Pros:

  • Simple to set up and run A/B tests
  • Has click mapping, scroll mapping, and heat mapping features
  • Has a powerful segmentation feature

Cons:

  • Lots of features that are difficult for some users to use fully
  • Reports aren’t detailed enough
  • A user reports they use deceptive tactics to lock customers into annual subscriptions (see review below).

Support: Knowledge base, video tutorials, and support tickets.

AI roadmap: Did not find information about AI features or plans to build them.

Consciousness quotient (CQ): Nothing about CQ was found for this tool.

A/B testing software review from G2:

Critical review:

11. Dynamic Yield

Dynamic Yield is another enterprise-focused experimentation platform.

Ideal for very high volume sites, it offers A/B testing, omnichannel personalization, and predictive test improvements via machine learning along with many other features.

Because of this, however, it’s not ideal for beginners or testers on a budget.

G2 rating: 4.5/5 from 137 reviews

Ideal use case: Cross-platform A/B testing and personalization

Pricing: Custom pricing.

A look inside Dynamic Yield:

Pros:

  • Has advanced personalization capabilities to run predictive targeting of messages to your audience.
  • Full stack integration
  • Omnichannel testing and tracking
  • Fantastic support and education

Cons:

  • Can be overwhelming to set up due to the large number of options
  • Some essential features require developer setup assistance
  • Not the best integration with analytics

Support: Call and email support, along with a basic knowledge base are available.

AI roadmap: The tool uses AI for predictive targeting, content customization, and product recommendations.

Consciousness quotient (CQ): They do something very cool, which is to offer free use of their tool to charities that meet certain criteria. They also do work in the community.

A/B testing software review from G2:

Positive review with some criticism:

12. Eppo

Eppo is a low-code end-to-end experimentation platform that supports advanced A/B testing on product features without hard-coded scripts. It’s designed to make experimentation accessible to more people in an organization, through no-jargon reports and abstracted statistical details of experiments.

G2 rating: 4.8/5 from 34 reviews

Ideal use case: Product experimentation

Pricing: Custom pricing

A look inside Eppo:

Pros:

  • Integrates smoothly with native warehouse data models
  • Minimizes experiment setup and analysis time
  • Accessible product A/B testing tool for developers and product managers
  • Offers CUPED experiment acceleration and sequential analysis

Cons:

  • Quite technical, too steep a learning curve for beginners

Support: Email support and knowledge base.

AI roadmap: Did not find information about AI features or plans to build them.

Consciousness quotient (CQ): No info found

A/B testing software review from G2:

Critical review:

13. FigPii

FigPii is a lightweight conversion optimization platform with heatmaps, session recordings, polls, and A/B testing. For a beginner in CRO, that’s probably all the features you need to analyze and improve website user interactions and conversions.

G2 rating: 5/5 from 3 reviews

Ideal use case: Web experimentation and behavioral analytics

Pricing: Starts free with unlimited A/B tests at 75k visitors. You can run decent tests on the $149.99/mo plan but you’re limited to 30k visitors.

What FigPii looks like in action:

Pros:

  • Uses AI and ML to suggest improvements based on interaction data, heatmaps, session recordings, and events
  • Starter-friendly pricing
  • Lightweight, clutter-free UI

Cons:

  • No element list feature
  • UI feels a bit dated

Support: Email support and documentation.

AI roadmap: FigPii currently leverages the data it collects for users via heatmaps, session recordings, purchase events, and interactions to provide AI-generated optimization suggestions.

Consciousness quotient (CQ): No info found for FigPii

Review on G2:

14. GrowthBook

GrowthBook is an engineering and product management team’s tool for feature flagging and A/B testing that supports web, mobile, and server-side applications. It allows users to roll out new features, experiment with variations, and measure their impact.

G2 rating: 4.4/5 from 13 reviews

Ideal use case: Product experimentation

Pricing: The free starter plan comes with unlimited experiments. After that, it’s $20/user/mo.

What GrowthBook looks like in action:

Pros:

  • Uses your data like a custom in-house platform, so no vendor lock-in
  • Integrates easily, even when you have event tracking across your app
  • Many SDKs available

Cons:

  • The visual editor is only available on the pro plan
  • The UI is complex for non-technical users

Support: Documentation and email support.

AI roadmap: Did not find information about AI features or plans to build them. However, they use AI to help you find answers in their documentation.

Consciousness quotient (CQ): No CQ info found for GrowthBook

Positive review from G2:

Critical review:

15. Intellimize

Intellimize is a website optimization platform with AI capabilities for personalizing and improving the experience of web visitors. And yes, this means you can run A/B tests and multivariate tests with it. You can use a visual editor to build variants, launch your tests, and analyze outcomes. It’s built for ecommerce and B2B marketers.

G2 rating: 4.9/5 from 39 reviews

Ideal use case: Web experimentation and personalization

Pricing: Starts at $249/mo after a 7-day free trial. You get unlimited A/B testing on the $2K/mo plan.

A look inside Intellimize:

Pros:

  • The visual editor makes it easy to build your tests
  • The customer support team is responsive
  • It uses ML to decide which variations to display more often

Cons:

  • The starter plan limits you to 5 experiences
  • The platform feels heavy, and screens can be slow to load

Support: Slack, email and phone support.

AI roadmap: Intellimize uses ‘AI Optimize’ to personalize marketing experiences for each customer at scale. It also has a feature called AI Content Studio that suggests copy in multiple formats.

Consciousness quotient (CQ): Didn’t find any mention of CQ-related initiatives.

A/B testing software review from G2:

A positive review with some criticism:

16. Kameleoon

Kameleoon is a France-based A/B testing platform that has been praised for its UI, personalization settings, and integrations. It’s also the tool of choice among healthcare companies, thanks to its emphasis on data protection, and among fintech companies seeking data-driven solutions with an eye toward privacy concerns.

G2 rating: 4.7/5 from 53 reviews

Ideal use case: Web, product experimentation, and personalization

Pricing: Custom pricing.

A look inside Kameleoon:

Pros:

  • Easy to set up click tracking
  • Smooth integration with many other apps
  • Has an easy-to-use WYSIWYG editor for non-developers
  • Advanced anti-flicker technology
  • Knowledgeable and helpful support team
  • Accurate and detailed planning and execution of tests

Cons:

  • WYSIWYG editor loads slowly
  • The reporting dashboard could use a bit more personalization
  • Need developer-level skills to implement some complex scenarios
  • Cannot archive tests

Support: Email and phone support are available, including a dedicated account manager.

AI roadmap: With advanced ML, Kameleoon’s AI-powered personalization builds actionable customer segments in real time and customizes content and experiences.

Consciousness quotient (CQ): I couldn’t find evidence of any charity they sponsor or environmental causes they support. Although, it’s interesting to know they have two live chameleons in their Germany and France offices.

Positive software review from G2:

An honest review with some criticism:

17. LaunchDarkly

LaunchDarkly is one of the more popular options among software developers, DevOps teams, and product managers. It’ll help you de-risk releases, deliver targeted experiences, experiment with product features, and optimize apps.

G2 rating: 4.6/5 from 197 reviews

Ideal use case: Product experimentation

Pricing: Starts at $10/user/mo after a 14-day free trial, but experimentation is activated only on the $20/user/mo pro plan.

What LaunchDarkly looks like in action:

Pros:

  • Easy to run A/B tests using a toggle in production
  • Ability to target specific users or user groups with flags
  • Easy to navigate UI
  • The support documentation is straightforward

Cons:

  • With so many features, you may not end up using the tool fully
  • Heavily dependent on the initial setup, so you need to get things right when onboarding

Support: Online chat, knowledge base, phone, and email support.

AI roadmap: LaunchDarkly has a feature that allows users to easily create, test, and release GenAI functionality in their apps. They also help users quickly generate product experiments at scale using GenAI.

Consciousness quotient (CQ): Through the LaunchDarkly Foundation, they took the 1% pledge to give financial support and employee time to organizations that improve the health of their community and the planet.

A/B testing software review from G2:

Critical review:

18. Nelio

Nelio A/B Testing is not like the other tools on this list. It’s an A/B testing WordPress plugin ideal for beginners who want to run simple experiments on their WordPress site or WooCommerce store.

This tight integration with WordPress makes it very easy to use, especially for those who have never used A/B testing tools before.

G2 or TrustRadius rating: N/A

Ideal use case: WordPress website experimentation

Pricing: $39/mo for 5k visitors with unlimited tests and heatmaps

A look inside Nelio A/B testing:

Pros:

  • Native to WordPress—this is a plus if your site is built on WordPress
  • Comes with heatmap and other third-party plugins enabled (JetPack, Contact Forms, and OptimizePress)
  • Integrates neatly with Google Analytics
  • Uses your desired WordPress visual editor to build variants
  • Sets your winning variant live with one click
  • Affordable pricing
  • You can enable session recordings

Cons:

  • Native to WordPress—you can’t use it anywhere else
  • No free trial

Support: Email support is available.

AI roadmap: Did not find information about AI features or plans to build them.

Consciousness quotient (CQ): No info on this

19. Omniconvert

Omniconvert is primarily a testing tool for ecommerce, but that’s not to say you can’t use it for other business types and sites. It’s another fantastic beginner-to-mid-level all-in-one platform for those looking to start and then scale up their testing program.

Omniconvert’s suite of tools includes four complementary platforms: Explore, their A/B testing platform; Reveal, a customer retention and churn tracker; Adapt, an automated CRO platform; and Survey, a feedback tool for qualitative analysis (all on separate pricing).

G2 rating: 4.5/5 from 79 reviews

Ideal use case: Web experimentation

Pricing: $390/mo for 50k visitors

What OmniConvert looks like in action:

Pros:

  • Integrates with a suite of features and also 3rd party tools
  • Simple setup for tests
  • Multiple segmentation, targeting, and personalization features
  • Integrates easily with your analytics
  • Great customer support
  • Supports both Bayesian and Frequentist stats engines

Cons:

  • Not the cheapest solution out there in this range
  • Intuitive UI but not for absolute beginners
  • Sometimes the visual editor can bug out.

Support: They have a resource hub with tips, video tutorials, help community, live chat, and more.

AI roadmap: Did not find information about AI features or plans to build them.

Consciousness quotient (CQ): I couldn’t find any record of CQ-related work, but it could be that they just don’t share that with their audience.

A/B testing software review from G2:

Critical review:

No significant critical review in recent years on G2.

20. Optimizely

Optimizely is very enterprise-focused. Their goal seems to be delivering solutions that make data-driven decisions easier in high-traffic environments while also offering some machine learning elements.

G2 rating: 4.5/5 from 529 reviews

Ideal use case: Web experimentation and personalization

Pricing: They’re using a custom pricing model. But Splitbase predicts they cost at least $36,000 per year.

A look inside Optimizely Web Experimentation:

Pros:

  • Clean and easy-to-use user interface
  • The widget feature is fun to use
  • A wide range of targeting options is available
  • Reliable customer support
  • Utilizes 5 major edge computing platforms to speed up experiments

Cons:

  • Doesn’t give automatic insights about audience performance (especially for an active experiment)
  • Google Analytics integration is complex and requires coding
  • Notoriously expensive option

Support: Help resources and 24/7 phone support.

AI roadmap: Optimizely has Opal, an AI tool that provides features such as content generation, automated translations, product recommendations, multi-armed bandits, etc.

Consciousness quotient (CQ): Most new hires are sent to volunteer in the community on their second day.

A/B testing software review from G2:

Critical review:

21. SiteSpect

SiteSpect is an enterprise-focused testing tool with what they claim to be the fastest load time on the market. (But only slightly faster than some mid-level tools on this list.)

Not for beginners, this platform is for teams who have hit the limits of their current platform and are moving further into data analysis, personalization, and machine learning for high-traffic sites. That extra power is reflected in the price point.

G2 rating: 4.3/5 from 56 reviews

Ideal use case: Web experimentation

Pricing: Custom pricing.

A look inside SiteSpect:

Pros:

  • Supports all markup languages (HTML, WML, XML, and JSON), style sheets, and scripting languages
  • No JavaScript tag means no content refresh or flicker
  • Versatile enough to test almost any scenario
  • Non-intrusive testing
  • Integrates with analytics tools
  • No need to modify the production version of your code

Cons:

  • Technical knowledge required to implement tests
  • There’s a learning curve for figuring out all the features
  • The reporting interface could be better

Support: There’s a knowledge base, phone and email support.

AI roadmap: SiteSpect uses ML and AI in its personalization product to provide one-to-one content customization, auto optimization, product recommendations, and segment discovery.

Consciousness quotient (CQ): SiteSpect has been known to sponsor some charity projects since 2014.

A/B testing software review from G2:

Critical review:

22. Statsig

Statsig is a feature management and experimentation platform for engineering and product teams. Particularly focused on B2B SaaS, ecommerce, gaming, and AI industries, this tool enables teams to run A/B tests on any platform, in any part of the product.

G2 rating: 4.7/5 from 48 reviews

Ideal use case: Product experimentation and analytics

Pricing: Starts free with A/B experiments included, then it’s $150/mo on the pro plan.

What Statsig looks like in action:

Pros:

  • Unlimited feature gates, experiments, dynamic configs, etc.
  • Live event analysis is enabled via the Events Explorer
  • Uses the multi-armed bandit algorithm to pick winning variants
  • Has change logs, reverts, and reviews to track every change
  • Responsive support team

Cons:

  • Third-party metrics sometimes don’t integrate smoothly with Statsig
  • Many features and capabilities can overwhelm new users
  • UI can be confusing for non-technical users

Support: Documentation and Slack workspace

AI roadmap: Statsig has a feature called Autotune Experiments, which uses a multi-armed bandit algorithm to pick winning treatments.

Consciousness quotient (CQ): Nothing that directly relates to CQ work. But there’s a funny page about their all-dogs identity.

A/B testing software review from G2:

Critical review:

23. VWO

VWO Testing is another entry/mid-level tool. It’s great for those looking to get started in the A/B testing space.

Note that VWO Testing is part of a larger productivity bundle that VWO offers. Other tools in this bundle that support marketing and business needs handle server-side optimization, mobile app optimization, program management, and data management.

They offer the A/B testing features that you might expect, along with the ability to run heatmaps, click-through recordings, on-page surveys, full-funnel tracking, full-stack tracking, and cart abandonment marketing features.

G2 rating: 4.3/5 from 581 reviews

Ideal use case: Web, mobile app experimentation and personalization

Pricing: Starts with a 30-day trial. Then, you can opt for the growth plan at $176/mo for 10k visitors. Note that this is billed annually, making it a $2,112/yr plan.

What this A/B testing tool looks like in action:

Pros:

  • Flexible customization options to adapt your tests to a lot of scenarios
  • Easy to plan and execute tests with little coding knowledge
  • A dedicated support team will guide you through any challenges
  • Ability to track long-term goals
  • Ability to group tests together
  • Useful recordings for observing users and troubleshooting

Cons:

  • There’s a learning curve to understand the full functionality of VWO
  • Pricing plans change frequently
  • They only test client-side

Support: 24/7 support is available via email, live chat, and phone. And a knowledge base.

AI roadmap: VWO’s AI features let you use AI to generate surveys quickly, create survey reports with AI-generated insights, and suggest personalized testing ideas.

Consciousness quotient (CQ): During the pandemic, the chairman of Wingify (the brand behind VWO), Paras Chopra, tweeted their support for a couple of COVID relief initiatives including setting up a 10-bed COVID-care facility in Delhi.

A/B testing software review from G2:

Critical review:

24. Zoho PageSense

PageSense by Zoho is a beginner-friendly website optimization software and personalization platform for tracking and analyzing visitor data, optimizing and personalizing their experience, and experimenting with ways to improve conversions and revenue.

G2 rating: 4.2/5 from 43 reviews

Ideal use case: Web experimentation and personalization

Pricing: Starts at $20/month for 10K monthly visitors. This plan allows 20 custom dimensions, 20 goals, and no limits on A/B tests.

What this A/B testing tool looks like in action:

Pros:

  • Comes with additional features like heatmap, funnel analytics, form analytics, polls, and session recording
  • One of the most affordable A/B testing tools on the list
  • Integrates with GTM, Google Ads, Mixpanel, Intercom, etc.

Cons:

  • The support team takes a long time to respond
  • No advanced code editor
  • Not a robust testing tool

Support: Community, knowledge base, phone and email support.

AI roadmap: Found no information about this. But PageSense integrates with Zoho’s ecosystem which provides AI-powered features that include generative copywriting and smart insights.

Consciousness quotient (CQ): Zoho uses state-of-the-art green data centers globally. They also run a free school that teaches skills for software careers. On top of that, they pay their students and offer them jobs when they finish.

A/B testing software review from G2:

Critical review:

What Is A/B Testing Software?

An A/B testing tool helps you compare two different versions of a web page, app, email, or other digital experience to see which one performs better.

The original version is called the control, tagged “A”, and the modified version is called the variant, tagged “B”.

Using a form of statistics (usually Frequentist or Bayesian), it measures the actions your visitors take during the test and reports the results. Run the test long enough to reach statistical significance and you get a level of confidence that, if you push the winning version live, you can expect similar results from the rest of your audience.
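To make the frequentist side of that concrete, here is a simplified two-proportion z-test sketch in TypeScript. It is an illustration only, not any vendor’s statistics engine, and it skips refinements such as continuity corrections and sequential testing.

```typescript
// Simplified two-proportion z-test (frequentist) — illustrative, not a vendor engine.
function zTest(
  visitorsA: number, conversionsA: number,
  visitorsB: number, conversionsB: number
): { z: number; significantAt95: boolean } {
  const pA = conversionsA / visitorsA;
  const pB = conversionsB / visitorsB;
  // Pooled conversion rate under the null hypothesis (no real difference).
  const pooled = (conversionsA + conversionsB) / (visitorsA + visitorsB);
  const standardError = Math.sqrt(
    pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB)
  );
  const z = (pB - pA) / standardError;
  // |z| > 1.96 corresponds to p < 0.05 for a two-sided test.
  return { z, significantAt95: Math.abs(z) > 1.96 };
}

// Example: 10,000 visitors per variation, 2.0% vs 2.4% conversion rate.
console.log(zTest(10000, 200, 10000, 240));
```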

A/B testing tools usually come in two varieties: open source, which lets you keep your data in-house and start using the tool right away, and closed source, which usually requires you to contact the vendor before you can get started.

How Can You Set Up an A/B Test?

We cover how to set up and run A/B tests in far more detail here, but let’s give you a quick overview.

How Do You Run An A/B Test On A Website?

First, come up with a hypothesis or idea of how you think you can improve or fix a page to get more conversions.

Then, create the assets needed for your particular test. For example, if you’re testing a new form layout, create the modified version of the form on the same page.

Set the test up inside the A/B testing software and test a single specific element at a time to see which works best. In our example, the form will be the only thing different on the “A” and “B” versions of the page.
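As a rough illustration of what the software does under the hood when it splits traffic, here is a hypothetical client-side bucketing sketch in TypeScript (browser code; the cookie name, hash function, and selector are assumptions, not any specific tool’s implementation):

```typescript
// Hypothetical client-side bucketing sketch (runs in the browser).
// Deterministically assigns a visitor to "A" or "B" and remembers the choice.
function getOrCreateVisitorId(): string {
  const match = document.cookie.match(/(?:^|; )ab_visitor=([^;]+)/);
  if (match) return match[1];
  const id = Math.random().toString(36).slice(2);
  document.cookie = `ab_visitor=${id}; path=/; max-age=${60 * 60 * 24 * 90}`;
  return id;
}

// Simple string hash (djb2) so the same visitor always lands in the same bucket.
function hashString(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = ((hash << 5) + hash + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

function assignVariant(experimentId: string): "A" | "B" {
  const visitorId = getOrCreateVisitorId();
  return hashString(`${experimentId}:${visitorId}`) % 2 === 0 ? "A" : "B";
}

// Show the new form layout only to visitors bucketed into "B".
if (assignVariant("form-layout-test") === "B") {
  document.querySelector("#signup-form")?.classList.add("variant-b-layout");
}
```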

(A multivariate test lets you test multiple variations at once, while an A/B test compares just two options. A/B testing tends to give faster results, especially on lower-traffic sites, and it lets you pinpoint the exact element that impacts user behavior the most.)

Then, make sure the test runs long enough to get accurate results.

And finally, analyze and learn from the outcomes.

How Do You Choose the Best-Fit A/B Testing Tool for Your Needs?

With a crowded market for A/B testing tools, finding the right one for your needs may feel overwhelming. How many trials and demos are you going to sit through before you find the one?

But it doesn’t have to be that crazy. We spoke to experts about this and got their opinions. These are optimizers who’ve been in the conversion rate optimization space for years and work with multiple clients and tools.

Here’s Simon Girardin’s take:

His last point fully aligns with our #TeamOverTools belief: “It’s not only about a SaaS license for a tool that splits traffic into testing variations. It’s also about building a team with core competencies.”

The right team will wield the right tool in a way that pulls the most benefits from A/B testing. This is regardless of how much you paid for the tool or the lifts advertised in their case studies.

Here, we went deeper into how to buy an A/B testing tool in 2024.

When we asked Rishi the same question about how he would choose an A/B testing tool today, he said:

The things I am looking for in an A/B testing tool are:

A: The “load” that it’s adding to the client’s website. We can’t have a tool that negatively hurts site performance.

B: Speed of technical support. I am a simple marketer and don’t understand all the technical details, so it’s very important that I am able to get quick responses from this technical team at the tool.

C: Help in interpreting the results. My objective as a marketer is not to somehow get a test winner. It’s to get to the truth. So, if an experiment is declared a winner, I want to make sure that the statistics back up those results and that no mistakes were made when the experiment was designed. Having access to the tool’s data science team to confirm that we made no mistakes in designing the experiment is a key consideration for me.

Rishi Rawat, Product Page Optimization Specialist at Frictionless Commerce

To make it easier for you to choose the right A/B testing tool, here is a quick 7-step guide to follow:

1. Define Your A/B Testing Goals

Without this core step, everything else in your tool selection will miss the mark. So begin with a clearly defined A/B testing goal: not the goal of the first or next test you want to run, but your overarching optimization goal.

Are you doing this to understand your business better?

Do you want to learn what makes your customers tick?

Or are you in this for the iterative process of uncovering the next performance gain for your website, marketing campaigns, SaaS product, or mobile app?

To define your A/B testing goal, you need to understand your business goals—primary and secondary—and the key performance indicators that quantify your progress toward achieving them. For example, an ecommerce store would choose completed purchase as a KPI for the business goal of expanding sales and revenue.

Once you’ve gotten here, you know that you need an A/B testing tool that supports experimenting with ideas to improve your target KPI. Now, the next thing you have to do is check out what’s out there.

2. Compare A/B Testing Tool Features

You want to ensure that the technical requirements for running healthy tests around your defined goals are supported. For that, you need to make a list of your technical requirements.

If you’re new to the scene, this may not come easily to you. In that case, you want to start with a beginner-friendly tool or plan. Learn and grow with the tool and see what limits you hit.

You can also start by assessing this list of features you should expect to find in your ideal tool:

Key features you need in an A/B testing tool

Here are the key features that every optimization platform should have:

  • Simple setup and installation
  • Customer support
  • Easy-to-use UI
  • WYSIWYG editor
  • Code editor for custom tests
  • Flicker-free testing so as not to negatively affect your test
  • A/B testing, split URL testing, multivariate testing, multipage testing capabilities
  • Targeting and segmentation abilities
  • Personalization
  • 99% uptime so it doesn’t break mid-test
  • Unlimited test capabilities
  • Unlimited variations
  • Multi-filter targeting options
  • Detailed reporting that you can dive into
  • Third-party integrations and API
  • Security, privacy and data protection

Anything else is either a bonus or nice to have, but without these core features, you may struggle to run decent tests effectively. The pros and cons sections of each tool above include information about these features.

3. Consider the Scalability of the Tool

You want a tool that grows with your experimentation program. If you’re just starting, your needs will be small. Most lightweight or free A/B testing tools that run basic tests for a few thousand monthly visitors will fit nicely.

But if your business is growing, you won’t be here forever. The time will come when you’ll demand more from your tool. If it doesn’t deliver then, you’ll have to go through uncomfortable migration to a better A/B testing tool.

To avoid that, take a few minutes to find out:

  • What traffic volume is supported? A scalable tool should handle from 10k monthly visitors to over a million.
  • What’s the complexity of tests I can run without additional costs? Does the tool support multivariate testing, split URL testing, etc.? Can you run multiple tests at a time? Are there any limits on testing velocity? If all you get is A/B testing with other limits, pause and think twice.
  • How many goals can you track at a time? This is a core part of a decent A/B test. You should be able to track multiple goals at the same time as your business expands.

4. Go for A/B Testing that Respects Privacy

In the last few years, there’s been a large push towards improving user privacy online.

Everything is changing: how we handle users’ data and consent, how and what we track, and now specific devices and browsers are removing user information or limiting tracking time frames.

This is a good thing for the user and something we should respect as site or app owners. (Here at Convert, we don’t even work with other 3rd party tools unless they meet GDPR—even if it’s not a client-facing tool).

However, these laws and features only affect client-side tools, not server-side testing tools.

What’s the Difference Between Client-Side and Server-Side Tools?

Client-side A/B testing tools run on your website via a JavaScript snippet in your header. They test and track by making changes in the user’s web browser to show different versions of page designs and so on.

Server-side A/B testing tools are instead installed on your web server.

They each have pros and cons: client-side tools usually make it faster and easier to set up new campaigns, while server-side tools let you track things you might not usually track with a standard tool.
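To make the server-side flavor concrete, here is a minimal, hypothetical Node.js/TypeScript sketch in which the server decides the variant before the page is ever rendered. The function names and hashing approach are assumptions for illustration, not any vendor’s implementation:

```typescript
// Hypothetical server-side variant assignment (Node.js / TypeScript).
// The server decides which version to render, so no in-browser script is needed.
import { createHash } from "node:crypto";

type Variant = "control" | "treatment";

function assignVariant(experimentId: string, visitorId: string): Variant {
  // Hash experiment + visitor so assignment is stable across requests.
  const digest = createHash("sha256")
    .update(`${experimentId}:${visitorId}`)
    .digest();
  return digest[0] % 2 === 0 ? "control" : "treatment";
}

// Inside a request handler you would render a different template per variant
// and log an exposure event to your own analytics pipeline.
const variant = assignVariant("checkout-redesign", "visitor-1234");
console.log(`Render ${variant} template and log exposure event`);
```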

The main difference when it comes to privacy is how these new laws and tools are implemented.

You see, content blockers and browser privacy features can stop client-side tools from working until the user consents: the browser or device sees the cookie event and stops it from triggering and reporting back.

(Users without blockers in place will hopefully see a GDPR consent form asking them if they agree to be tracked and share particular user data. Some sites, however, will track as standard and ask them to opt out instead).

The thing is, server-side tools are not stopped by any of this. They don’t rely on browser cookies to send information; they track your site events and send them to your server instead. This means that server-side tools can still track events and user IDs, and then feed that information into the testing platform, 3rd party analytics, or ad platforms.

You could in theory track away to your heart’s content and never tell anyone. (While risking a potential lawsuit if they get access to your server data.)

But just because you can track all that information, should you?

A/B testing can be the best way to mitigate the risks of damaging the user experience in the future. If you think about it, testing is all about proving a hypothesis that one version is better than the others. So you really don’t need to store anything more than the number of users seeing these versions and the goals you’re tracking. Just that… two simple numbers, variationID and GoalID, and the number of times they are seen. You should set up your A/B testing in this clean way, nothing more.

You lose the trust of your users if you try to slice and dice your data in search of a winner. That’s not what A/B testing is for, that’s analytics — don’t mix it up.

Mitigate the risk of hurting your users and get an occasional win for the business. It will win you trust, save you money and so make you the best version of what you can be for your users.

If you keep it clean, you won’t be impacted by privacy laws. You don’t store any personal data and don’t track users. Your tracking usage… that is long-term the future of this niche in CRO.

Dennis van der Heijden, Convert.com
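Taken literally, that clean setup can be as small as an aggregate counter. Here is a hedged sketch of what such minimal tracking might look like in TypeScript; the event shape and key format are assumptions for illustration only:

```typescript
// Hypothetical privacy-minimal tracking: no user IDs, just aggregate counts
// of how often each variation was seen and how often each goal fired.
interface MinimalEvent {
  variationId: string;
  goalId?: string; // present only when a goal (e.g. purchase) is reached
}

const counts = new Map<string, number>();

function track(event: MinimalEvent): void {
  const key = event.goalId
    ? `goal:${event.variationId}:${event.goalId}`
    : `view:${event.variationId}`;
  counts.set(key, (counts.get(key) ?? 0) + 1);
}

// Usage: one exposure and one conversion, no personal data stored anywhere.
track({ variationId: "var-b" });
track({ variationId: "var-b", goalId: "checkout-complete" });
console.log(Object.fromEntries(counts));
```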

How Does Privacy Affect Your A/B Testing Tool Choices?

Regardless of whether you use client-side tools for the convenience of setup or server-side tools for more in-depth tracking, you should still, as a responsible business, meet current and upcoming privacy standards:

  • Respect Do Not Track browsers
  • Ask for consent
  • Track events instead of user data
  • Anonymise IDs, etc.

With that in mind, be sure to look at your experimentation platform’s privacy goals and what they allow you to do with it:

  • Do they meet privacy requirements as put down by the GDPR and similar laws?
  • Do they store data safely so your audience doesn’t get leaked and you don’t get fined?
  • Do they allow you to test and respect users’ privacy while doing so? I.e. can you set those settings in your test campaigns, even if it’s a server-side tool where technically you could get away with it?
  • Do they allow you to adjust elements of your experiment for different location laws?
  • Finally, you might want to check if your tool is client-side only, or if it can be set up server-side also.

But remember: Just because you can track all that information with a server-side tool, it doesn’t mean that you should. Use a tool that cares about your audience’s privacy!

(We removed a lot of our in-depth user tracking details from our tool because of this. We can still track what’s needed for controlled experiments, but we don’t need to do it while knowing too much about the user).

5. Augment Your A/B Testing with AI Capabilities (Reasonably)

AI is here to stay. At this point, you either make it a part of your workflow to keep up with productivity or get left behind.

But understand a crucial fact: AI cannot replace human expertise. Think of it more as an assistant than a replacement.

That said, there are several ways your experimentation workflow can benefit from AI assistance. It shouldn’t make the final calls on what changes you make to your business, but it can help you get to your data-driven decisions faster and with less effort.

An A/B testing tool vendor that understands this and has an AI roadmap reflecting that understanding is an ally for the foreseeable future.

At Convert, we champion the ethical use of AI in business. This is our code of AI use.

Besides productivity and speed, artificial intelligence (and specifically, machine learning) can support your A/B testing journey by helping you:

  • Run real-time personalization. Amazon is a great example of using machine learning for real-time personalization. They adjust front page recommendations, upsells, pricing, and even landing pages, all based on previous user behavior data of people who are similar to you (same interests), or by tracking your past interactions and predicting potential future actions. This personalization alone helps them to improve their sales by 35%.
  • Generate variants. Maybe you want to test adding scarcity to your copy to improve conversions. Generative AI can take one prompt and create a version of your copy with scarcity included. You don’t have to do all the work yourself. Then you can test more ideas faster.

Convert has a new feature: AI Wizard. It helps you rewrite headlines and paragraphs of text using various models: the Seven Persuasion Concepts, the Fogg Behavior Model, and the most popular copywriting formats. Create copy variants faster than ever.

6. Note the Support Channels Available to You

I always try to make sure that the A/B testing tool I recommend for my clients comes with fast, reliable, and friendly support. Bonus points if the tool comes with a live chat option so my clients and I don’t have to wait several days for a response to a simple question.

Tracy Laranjo, Fractional Head of CRO & Experimentation

You want a tool that has your back when the road gets bumpy—because it will. And when that happens, your experiments and decision-making shouldn’t grind to a halt. You need prompt, effective support that gets you back on track quickly.

Check out the support channels the A/B testing tool vendors above provide. Multiple channels that include the ability to start a live chat, create support tickets, and email your account manager (if your plan supports that) are best.

Also, look at the reviews on third-party review sites like G2 and TrustRadius to gauge the responsiveness. Self-reported responsiveness shouldn’t be the only thing you assess.

7. Watch Your A/B Testing Budget

Don’t be fooled by this being the last item on the list. It’s vital.

A/B testing tools can range from free to $1,000/mo and go all the way to $100k+ a year. Not all tools are created equal, hence the vast price differences.

I’m also more likely to recommend a tool if the pricing is readily accessible on the website. If my client has to go through a lengthy sales process or demo just to find out if the tool is right for their budget, we’re out.

Tracy Laranjo

What justifies the price you pay for a tool is its capability, the level of support offered to you, your experimentation maturity, the ROI of your A/B tests, and the features you actually need.

You should choose a scalable tool that matches your current A/B testing needs with reasonable and transparent pricing that’s well within your budget. The value you get from your A/B testing tool doesn’t correlate with how much you paid for it.

Remember:

#TeamOverTools paraphrased in Top Gun

Conclusion

So there you have it. The 24 A/B testing tools for 2024 and a strategic method to guide how you make your choice.

We believe that if you make a non-biased, brutal analysis of the tool you want and give it a solid spin via a free trial or demo, you will find the A/B testing tool that fits your experimentation goals for a long time.

How to Optimize Content for Generative AI https://www.convert.com/blog/growth-marketing/how-to-optimize-content-for-generative-ai/ Wed, 17 Jan 2024 20:52:43 +0000 https://www.convert.com/?post_type=convert_blog&p=22457

The search is changing as we speak.

And it is happening too fast even for Google to keep up.

How will online findability change in the course of this year and further on?

Which SEO tactics will survive the change and continue to be effective? Will there be new tactics to embrace?

Let’s try to figure this out!

What are generative AI engines?

Generative AI engines generate accurate and personalized answers to any search query. Instead of providing you with a list of resources for you to go and find answers to your questions (like traditional search engines do), generative AI provides an instant answer.

Search engines vs generative engines for visibility and monitoring

As AI technology is quickly adopted, there are two worrisome trends online creators and digital businesses will have to adapt to:

  • People will likely click less (so how do you get traffic?)
  • Each of your target customers will see different answers (so how do you control how your business is portrayed, and how do you track your performance and visibility?)

There are no definitive answers to either of these questions because we are not yet sure where this is all going and how much AI will change our lives overall. But there’s one thing that will hopefully calm everyone down:

This is not going to be a fast change

Believe it or not, we have been moving toward this change for many years now. Back in 2020, I was writing about Google as an answer engine versus a search engine.

An abundance of quick answers in Google’s SERPs has been here for a while but most people were still engaging with search results.

This is why I think the search will take some time to change:

  • Even though people are already using generative AI to find answers, mainstream adoption will take some time. Consumer habits take time to change. People are very much used to searching and clicking search results. It will take some time, especially for older generations, to start using alternative methods to discover information and find answers.
  • Google will take time to adjust its revenue model. It relies heavily on people clicking ads and organic results, and it doesn’t yet know how to monetize AI-generated answers.

We will have time to adapt.

But this doesn’t mean we don’t need to start preparing for a change.

How to get found by generative AI platforms

Become an entity

The fundamental step to generative AI findability is no different from traditional search visibility: They need to know about you. ChatGPT and other generative AI solutions rely on human knowledge. If no one knows about your business or product and never talks about either, no generative AI engine will know you.

Unlike Google, which you can push to include your URLs in its index, there is no way to force AI platforms to know about you unless other businesses, people, and websites are talking about you.

So start by building a brand that others reference, include in lists, discuss on social media, and so on.

The good news is that being an entity (a recognized brand) is also key to Google’s traditional rankings, so there’s a double benefit to it.

There’s no easy way to become a brand. It takes a lot of time to be everywhere and engage with your target customers on your site and elsewhere. It has been found that it takes between 5 and 7 touchpoints for a person to remember your brand, so you need to engage with the same person again and again to build recognition.

But here’s where you can start:

  • Invest in creating great content that may go viral and become a trend (original research is usually your best bet to position your business as a knowledge hub)
  • Nurture relationships with your customers on social media
  • Invest in content-based journalistic outreach
  • Reach out to influential bloggers to be included in thematic listicles (i.e. listing your competitors or their products)
  • Invest in high-quality PR. This is still one of the best ways to get people talking about you.
  • Collaborate with other (personal) brands on content creation, event hosting, giveaways, etc.

Well, as you can see, there’s nothing new here.

Ask ChatGPT (and Alternatives)

This is something I have started doing for just about any of my clients: I have a long discussion with ChatGPT about what it knows about their business and their competitors. Prompts like these:

  • Do you know NAME?
  • Who are NAME’s competitors?
  • How does NAME compare to NAME’s competitor?
  • I need to buy a PRODUCT / find a SOLUTION; which would be your top choice?
  • If I had a budget for only one solution, what would be your top choice?
  • Why did you pick NAME’s competitor as the top solution?
  • What should NAME do to become the #1 recommendation?
  • What is NAME’s competitor doing for marketing?
  • What should NAME change in their marketing strategy?
  • What is NAME’s competitor’s value proposition and how do they promote it?
  • What is NAME’s value proposition?

All these answers will give you a good idea of what ChatGPT knows about you and your competitors, how likely (and in which situations) it will recommend you in an answer, and what you need to do to increase that probability.

For example, for testing purposes, I asked ChatGPT to compare Convert and VWO and pick one solution for me. Here’s its response:

…both platforms offer A/B testing, split-URL testing, and multivariate testing. Convert.com also offers multipage testing and A/A testing, which might not be available on VWO. For segmentation, both platforms offer geo-targeting, cookie-based targeting, and behavior targeting, with Convert also providing third-party input using Data Management Platforms (DMPs)

I asked ChatGPT to create a table comparing the two platforms, and here’s the result:

Competitive analysis using ChatGPT

This gives a good overview of what the platform knows about the two solutions and how they will be positioned when people are using ChatGPT to find a best-fitting option.

This insight is especially important considering that a growing number of users rely on ChatGPT to find and research tools:

ChatGPT used as a resource for finding tools

AI-specific optimization methods: GEO

Generative engine optimization is an emerging practice of improving your chances of being surfaced by generative AI engines. The first GEO research went live in early December and I covered it here.

Researchers optimized content using different tactics and tested which of those on-page optimization tactics helped pages to come up in more AI answers.

The most effective optimization tactics were found to be:

  • Adding quotes from recognized experts (this one performed the best)
  • Citing authoritative sources
  • Adding statistics

All these three content optimization tactics make a lot of sense even if you are not trying to optimize for generative AI. But if you needed an additional incentive to start implementing them in your content strategy, here it is.

The least effective tactics were:

  • Adding more keywords
  • Adding unique industry terminology

On-page tactics tested for generative engine optimization. Source (PDF)

It is a good start and something to implement in your content strategy. Less focus on keyword matching and more focus on adding trust-building elements, like authoritative sources and quotes from experts.

I am sure there will be more and more studies like these, shedding more light on what AI engines are looking for when generating their answers.

How to get surfaced by Google’s SGE

Google’s Search Generative Experience (SGE) is an experimental feature that generates an AI answer on top of search results in response to a user’s query. Sometimes it triggers an AI snapshot right away (pushing organic results down) and sometimes it invites a user to click a button to generate an answer.

This is what it looks like:

Google’s SGE snapshot

When SGE goes public (which is quite likely because Google extended the experiment to many more countries later last year, so they are getting serious about this!), organic traffic will suffer a dramatic decrease because searchers will be encouraged to engage with AI snapshots instead of scrolling down to organic listings.

For this reason, I am discussing this generative engine in a separate section. It is quite different from standalone AI solutions:

SGE uses Google’s index

Unlike other AI solutions, with SGE we know exactly which information it is using to generate answers.

  • If your URL is indexed by Google, it is part of SGE’s knowledge. Moreover, several studies have found a strong correlation between ranking in the top 10 and being surfaced in Google’s AI snapshots.
  • If your brand is in Google’s Knowledge Graph (i.e. a Knowledge Panel shows up when you search for your brand name), it is being used by SGE. Moreover, having a Knowledge Panel is a strong signal that you will likely show up in related AI snapshots. Once again, becoming a brand (i.e. entity) is key!
  • Having your product feeds in Google’s Merchant Center means SGE knows about your products, deals, and product reviews.
  • If your local business exists as Google’s My Business entity (and better yet, it is verified), SGE knows about it.

In this sense, SGE is much more predictable than any other AI platform. I hope Google will also come up with some tracking solution (just like they did for Discover) which will make the optimization strategy even easier and more predictable for it.

SGE includes prior search journeys

Google’s SGE patent goes into much detail on how AI snapshots work. One big takeaway from that patent is that SGE responds to:

  • Current search query
  • The recent search journey by each particular user
  • Related queries (those queries that tend to be searched in close proximity).

This means that from now on, we need to be paying more attention to “related queries” Google includes in most SERPs. These come in different forms, mainly:

Long-tail queries that are mostly an extension of the current one:

Related keywords for “how to learn knitting”

A list of related topics that produce results when clicked without triggering a new search.

Related searches for “how to learn knitting”

When creating a landing page, keep a close eye on these keywords to include on your page and create supporting content to have a better chance of being included in AI answers.

What this means is that you need to prioritize your search queries because you will hardly be able to create this type of strategy for every keyword in your list.

Furthermore, let’s start moving away from optimizing for keywords to optimizing for search journeys.

When SGE goes live, you will likely lose a big percentage of your traffic. But think about it this way:

You may not need the audience you’ll lose over the generative AI revolution. If they are satisfied with a quick answer, they are unlikely to buy from you. Prioritize those journeys that have better chances of engaging with your site because they are likely to buy.

So how to adapt your content and SEO strategy?

I have been saying this for ages and it looks like this is something that needs to happen: Stop depending on Google.

We’ve been relying on Google’s traffic for over two decades because basically, Google has been the only organic findability option we had.

This is going to change, and it is a good change. There will be more ways to discover brands and products, so there will be more opportunities to get found.

While we have some time, think about trying the following:

  • Create a long-term brand-establishing strategy. This is key. Don’t expect fast tangible results from that. This is not how you build a brand.
  • Develop more channels to build visibility and nurture a community. A branded subreddit is a great option that will also help you control your branded search and control your important SERPs. Reddit is also great for making your brand a trend and sending high-quality editorial links if you approach it wisely.
  • Up your analytics and conversion monitoring game. Keeping track of referral traffic is going to be even more challenging. Set up your GA4 conversion tracking and A/B test your landing pages to identify better-performing conversion funnels. 
  • Set up a newsletter and focus on building your list. The more channels you have allowing you to reach out and re-activate your customers or site visitors, the better.
  • Invest in new assets. Launching your custom GPT to attract and engage customers is a great idea.
  • Revamp your keyword strategy to prioritize more important queries that are likely to drive engaged traffic.
How to Choose an A/B Testing Tool in 2024 https://www.convert.com/blog/a-b-testing/how-to-buy-an-ab-testing-tool/ Fri, 08 Dec 2023 17:43:07 +0000 https://www.convert.com/?post_type=convert_blog&p=19710

The A/B testing tool market is bursting at the seams in 2024, with over 50 options to choose from. And that number is only growing. The recent sunset of Google Optimize has created a void in the market, prompting a surge of new tools as vendors eagerly step in to offer their solutions.

You’re spoilt for choice. With all these new tools popping up left and right, how do you separate the wheat from the chaff and find ‘the one’?

You don’t need to break the bank or go with the category leader to have your testing needs met. Consider what kind of optimization challenges you face on the regular—the best A/B testing software for you should match your needs and budget.

In this blog post, we’ll outline the factors you should consider when making your decision and share our top picks for the best A/B testing tools of 2024.

Let’s help you out!

But first…

Understand the Optimization Puzzle

This State of Experimentation Maturity report identified 4 pillars of experimentation maturity. These are core aspects of experimentation that helped organizations across various industries succeed at experimentation while scaling their programs.

It’s interesting what we’ll learn here. Those pillars are:

  1. Process and accountability: This includes the process for ideation and prioritization, experiment design, how success is measured, etc.
  2. Culture: What’s the organization’s general attitude towards experimentation? How much C-suite support is present?
  3. Expertise: Time, resources, and skill sets. And finally…
  4. Tools and Technology: The tech stack powering your experimentation.

Although this research was conducted a few years ago, the fundamentals remain unchanged. Tools are but a very small piece of your optimization puzzle.

If you’re reading this to find the best A/B testing tool that can singlehandedly take your testing program from 0 to 100 or okay to fantastic, then consider this a reality check.

What then should be your focus?

Arguably, people and processes take the cake when it comes to experimentation success. The brands and agencies at the top of their CRO game take training very seriously. And so should you:

Investing in your team is more than just an A/B testing best practice; it’s an expert-prescribed priority for organizations that want to take their optimization game to the next level.

This training goes beyond basic A/B testing tutorials and also includes research, data collection and analysis, critical thinking, experiment design, statistics, QA, stakeholder communication, and so much more.

Let’s put a pin in this idea because we will touch on it a bit more later.

Consider how an A/B testing vendor can fit into your education mix when searching for the appropriate tool for your team.

The right tool will come with education on the practices and frameworks that make high test velocity achievable on that particular testing platform (a program objective that many of the best minds in experimentation, including Ronny Kohavi and Lukas Vermeer, recommend).

Since people and processes make or break your program with a greater degree of influence than your choice of A/B testing software, an idea gaining popularity in the industry is to spend no more than 20% of the optimization budget on tools and the rest on education, recruitment, and ops.

For example, agency WeTeachCRO believes that tool spend can and should be far less than 20%:

Conventional wisdom says not to spend more than 20% of your total CRO budget on tools, keeping 80% spare for people – people after all are the ones that will make the real difference to your programme, whereas the tool is merely an enabler. But putting that “wisdom” aside, our experience is that most clients can (and should) spend far less than 20%.

For the majority of our clients, their tool costs are <£500 / month with a people cost 10-15x that, putting them more in the 5-10% bracket. Starting out with a free tool like Google Optimize should get you off the ground, with a view to graduating up to something that is a few hundred pounds per month once your test complexities exceed its capabilities.

Excerpt from 52 Questions Every Experienced CROer Had To Ask Once by WeTeachCRO

Now that we’ve established the right mindset, let’s move on to the next vital factor in buying an A/B testing tool, particularly if this is your first purchase.

If not, you can skip to outlining why you need a new tool.

Do You Have Enough Traffic to Test?

A/B testing doesn’t work for low-traffic websites. Before you begin testing, your website must have enough traffic to send to each variation of your A/B test. Without adequate traffic, your experiments will not reach statistical significance and may run for a long time.

In a typical A/B test, you need to drive thousands of visitors to each variation. At Convert, we recommend sending at least 5,000 visitors to each variation and collecting around 1,000 conversions to produce a conclusive result about which variant is the clear winner.

Head over to Convert’s A/B testing calculator to find out how much traffic you need to run your A/B test. If you lack the required traffic, head over to our guide on testing with low traffic.
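If you want to sanity-check the traffic math yourself, here’s a rough back-of-the-envelope sketch of the standard two-proportion sample size formula that most calculators are built on (a Python sketch assuming scipy is installed; the 3% baseline and 20% lift are illustrative inputs, not recommendations):

```python
from scipy.stats import norm

def visitors_per_variation(baseline_cr, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation for a two-sided test of proportions."""
    p1 = baseline_cr                         # control conversion rate
    p2 = baseline_cr * (1 + relative_lift)   # variation rate you hope to detect
    z_alpha = norm.ppf(1 - alpha / 2)        # significance threshold (two-sided)
    z_beta = norm.ppf(power)                 # statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int(round(((z_alpha + z_beta) ** 2 * variance) / (p1 - p2) ** 2))

# A 3% baseline conversion rate and a 20% relative lift need
# roughly 14,000 visitors per variation before a winner can be called.
print(visitors_per_variation(0.03, 0.20))
```

Halving the lift you want to detect roughly quadruples the traffic you need, which is why low-traffic sites take so long to reach significance.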

Once you’ve sorted out the required traffic to successfully run an A/B test, outline what’s prompting you to get a new A/B testing tool.

Start With the Problem

Determine the specific need for a new A/B testing tool. It is crucial to have a systematic approach to accomplish this. Here is the process:

Map Out Why You Purchased Your Existing A/B Testing Tool

If you already have a testing tool in your stack, what led you to it?

Take note: this is your point A, and it will help you reach point B, your next ideal tool.

What specific features attracted you and your team to this product? What features kept you as a user? What aspects of this vendor did you particularly appreciate?

Don’t shy away from digging deeper. Understandably, there’s a strong emotional aspect in our decision-making process, but for the sake of a solid optimization program moving forward, you need to drill into your current choice.

Where Was Your Program When You Started?

Were you starting out in A/B testing and just wanted to get your feet wet? Did you hear about website optimization on a growth marketing podcast and had this tool recommended to you? Or were you on a free tool like Google Optimize and wanted to amp up your testing beyond its limits?

What Guardrail Metrics Were You Hoping to Move with the Tool Acquisition?

Guardrail metrics are meant to keep you aligned with your goals and objectives. They alert you when things aren’t going quite as planned.

So, what specific metrics prompted you to acquire the tool? Was it to improve revenue or enhance UX metrics such as bounce rate and order rate?

Did You End Up Achieving Your Goals?

If not, then why? Did your testing program pivot? Or was the tool—GASP—a bad buy?

Answer these honestly and you’ll have a clear map of the problem. That’s your point A.

Didn’t find any problem at all? Or don’t have the answers to some or all of these questions? Find out who the actual buyers and heavy tool users are; an informal chat with them will get you the answers you need.

Because you don’t want to change tools just because you have the budget and someone wants a fancy new feature.

Features hardly ever solve problems. They can, at best, help in the execution of solutions. And the shift should make sense for you:

  • Adoption-wise
  • Vendor support-wise: If you have a great rapport with your existing vendor, this means something. It’s like having a partner on the journey.
  • Roll out and ramp up time-wise: Days lost adjusting to a new tool are the days you won’t be testing, either at all or at capacity.

Bottom line: Decide to shift if and only if you know the move will tackle all the elements that make up the problem you’ve identified, as well as plug holes and bring additional value to your people and processes.

The tech will catch up to whatever bells and whistles you think you need, provided those features enable a real opportunity and not a passing trend. Don’t rock the boat for one feature.

Ask the Right People

Once you have established that there is a real problem that can be solved by experimentation or by changing platforms and not something either frivolous that won’t be worth the effort or a chimera (the ghost of operational and procedural chaos manifesting as tool trouble), then move on to building your consideration set.

To construct the consideration set, equip yourself with accurate data so you can make a well-informed decision that covers all aspects. And as with everything, the pertinent data points are your challenges, not subjective viewpoints.

You’d also want to avoid a common mistake most people make—going for the category leader. Category leaders are rarely the best fit for everyone. Neither should you default to the same tools you evaluated last time.

Because experimentation is complex, you need a platform that supports your specific program objectives and your experimentation workflows while fitting into your culture, since all the pillars of experimentation maturity have to work in tandem.

Be open to choices. As Peep Laja says, they instantly improve the situation.

So, where to begin?

What B2B Buyers Say About Buying Tools—Here’s Where They Start

We ran a quick poll on LinkedIn. We asked B2B software buyers, “What is the first thing you do when buying software that costs over $5,000 a year?”

They answered:

  • 57% do a Google search
  • 21% research known brands
  • 14% start on G2 or TrustRadius, and
  • 7% ask for referrals

Additional responses mentioned doing a combination of all four. Though not a statistically significant sample, it gives us a glimpse into how buyers prefer to begin their tool search.

So, pick the tool with the strongest SEO? Not exactly. We’re taking this data with a massive pinch of salt. It illustrates that when we don’t have a clue about something, we’re kind of wired to turn to Google first.

While that’s a springboard for your search—giving you a list of A/B testing tools to consider—you need a more informed strategy. One that comes from the bottom of this poll: Ask the right people.

What We Recommend

Consult people who have used A/B testing tools extensively, your team, and also see what competitors or other players in your niche use.

Ask People in Your Network Who Test Regularly


You’ll want to avoid leaning on Google search results to pick a specific A/B testing platform or to make the final decision. You want a tool that will suit you, not the tool with the best SEO.

You can also…

Use BuiltWith to Look At Other Brands in Your Space

This works nicely if you are in a niche industry. The oft-recommended platforms may not have integrations with your tech stack. Or they may not support experimentation to improve aspects and metrics that matter to you.

For example, there is a lot of conversion rate optimization in the Shopify space, but most brands accept that struggling with revenue tracking is part and parcel of setting up an A/B testing tool.

Do you want to struggle with anything that matters to your experimentation program?

That’s why the A/B testing tool you’re considering should have specific provisions to accommodate your testing needs. Like how Convert offers a custom Shopify app to get rid of the revenue tracking problem.

To find what brands in your space are using, all you need is their website address to run a search on BuiltWith.

Or you can search the tools you’re considering and see who’s using them in your space. There are limits when you use a free account.

Tap Your Team

If you have a team that regularly conducts tests in your organization, they are one of the best sources of information for your consideration. They are actively involved in the process and will have a good understanding of the situation.

Get their opinion. Ask them why they wish to go with specific tools. What features are non-negotiable and which are just nice-to-haves?

What past experience trumps fancy feature releases? The best vote of confidence is one that comes from a tester who has succeeded in the past with a particular platform.

Gather this information and structure it. If you find a recommendation, do a quick check against your problem to ensure that the suggestion has merit.

Build Your Consideration Set

At this stage, you’ve shortlisted 3 to 5 tools that you’re seriously interested in. That’s a solid starting point. But right now, your decision may be based on intangible factors, referrals, and brand recognition.

When you build your consideration set, you will create a matrix of these 3-5 A/B testing tools and hold them against the components of a problem. That is, how they measure up in solving the major problems you’re looking to fix with the right A/B testing tool for you.

Let’s illustrate:

PROBLEM: Delays in experiment deployment.

Solution components to evaluate each tool against: releases budget to add 2 A/B testing devs; API allows complete programmatic access; priority support with knowledgeable tech advisors in the EST timezone.

Tool A: ✔️ ✔️
Tool B: ✔️
Tool C: ✔️ ✔️
Tool D: ✔️
Tool E: ✔️

If you’ve pinpointed more than one essential problem, you’ll want another consideration set for those as well, with the same tools but different components of the problem.

Note: Since all problems have different levels of priority, ensure that you’ve weighted them accordingly.
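If you want to make that weighting explicit, a quick scoring pass is easy to sketch in a few lines of Python. Everything below (tools, problems, and weights) is hypothetical placeholder data you would swap for your own:

```python
# Hypothetical weighted consideration set. Weights reflect each problem's priority.
problems = {
    "Delays in experiment deployment": 0.5,
    "Unreliable revenue tracking": 0.3,
    "Poor collaboration controls": 0.2,
}

# 1 means the tool addresses the problem, 0 means it doesn't (use finer scores if you like).
tools = {
    "Tool A": {"Delays in experiment deployment": 1, "Unreliable revenue tracking": 1, "Poor collaboration controls": 0},
    "Tool B": {"Delays in experiment deployment": 0, "Unreliable revenue tracking": 1, "Poor collaboration controls": 1},
    "Tool C": {"Delays in experiment deployment": 1, "Unreliable revenue tracking": 0, "Poor collaboration controls": 1},
}

# Rank tools by their weighted score across all problems.
for name, covers in sorted(tools.items(), key=lambda kv: -sum(problems[p] * kv[1][p] for p in problems)):
    score = sum(problems[p] * covers[p] for p in problems)
    print(f"{name}: {score:.2f}")
```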

So, how do you know what to fill into a consideration set like in the example above?

Filling in the Gaps

  • Reviews

    Start with popular software review sites like G2, Capterra, and TrustRadius. Remember: Sponsored profiles may be displayed more prominently. Yet, this has nothing to do with how good a tool is at solving your problems.
  • Cons

    Aim for those 2-4 star reviews. The cons there are usually well thought out. Dig through them for issues that might be devastating to your goals. See something you’re trying to avoid? Put an “X” there on the consideration set.
Example critical reviews on Capterra, G2, and TrustRadius (Sources)

Take critical reviews with a pinch of salt, though. What went sideways for someone else isn’t certain to go sideways for you too. But if you notice a common denominator in 3-4 reviews, then there’s a pattern worth paying attention to.

  • Comparison Tables

    Some of the work is already done for you. Look at comparison tables, both from vendors and from unbiased third-party sources like Speero, HubSpot, and CXL.
CXL’s A/B testing tool comparison (Source)

We have a tool to help you compare and contrast Convert Experiences, Google Optimize, Optimizely, VWO, AB Tasty, Qubit, and more.

Interested in reading how CRO experts purchase A/B testing tools?

Here is what Ruben De Boer has to say about choosing an A/B testing tool:

First, of course, you have to make sure you have sufficient traffic for A/B testing, that at least a part of your company supports testing and that development has enough resources to implement winning experiments.

Next, take your current and future needs into account. Start by assessing the current situation, like your goals, needs, and who is involved. For example, if IT is fully involved, you could choose a server-side solution. Whereas if only marketers are involved, client-side is most likely the way to go.

Then, instead of only looking at the current situation, consider where you want to be in a few years. What needs do you have then? Will you scale up the number of experiments? Have more team members use the tool? Run different kinds of experiments? Discuss the possibilities to ramp up with sales reps of the various vendors.

Finally, conduct a free trial with the two-three tools that best match your current and future needs. During the free trial, check for the tool’s user-friendliness, (targeting) options, and other things applicable to your business (i.e., does it work with your single-page application).

Also, a personal connection is important, as you might need their customer support. For example, is there help with onboarding? If so, how happy are you with it? Also, contact their customer support. Besides checking the available channels (chat, mail, phone), see how quickly they respond (are they in the same time zone?) and how helpful they are. Finally, check with IT the tool’s impact on your site performance.

Ruben De Boer, Lead Conversion Manager at Online Dialogue

And this is Max Whiteside’s take. Max is the Content Lead at Breaking Muscle:

Define the testing scope, as doing so will enable you to restrict the testing capabilities of your A/B testing tool. Do you intend to conduct only A/B tests, or do you need a tool that can handle multi-page and multivariate testing as well?

Each tool operates in a unique manner and offers distinct test types. So, look for one that has all the features you need, so you don’t have to switch tools as your testing capacity grows.

Secondly, the budget: A/B testing is an iterative process, which makes budget a crucial consideration when implementing a testing environment. You may need to repeat a test multiple times before achieving the desired result, at which point you can proceed to the next iteration. It necessitates both time and resources, and your budget will dictate its upper limits. Find a tool that won’t make a significant dent in your pocketbook.

Some testing tools may have a negative impact on the speed of your website while running a test. A slow-loading page can increase the bounce rate and irritate users, leading to fewer conversions. In such a scenario, neither the control nor the variation would perform optimally, rendering the A/B test results unreliable even if the minimum sample size is met.

Moreover, check that the flicker effect is minimal. The flicker effect occurs when a user briefly views the original web page before being redirected to the variant. Flicker can hinder user experience and skew test results. To maintain the reliability of your A/B tests, it is best to select a tool that does not produce a flickering effect. Determining whether the A/B testing tool slows down your website is therefore essential.

Max Whiteside, Content Lead at Breaking Muscle

Make Contact with the Tool

Up to this point, you’ve been a secret admirer of the tools you’re considering. Now’s the time to overcome your shyness and make contact.

And the best place to start is a free trial or demo.

Start a Free Trial & Book a Demo

You need to see the tool in action.

How does it perform in various A/B testing applications? As a CRO for a Shopify (or any other ecommerce) store, for instance, you’ll want to know how it performs from testing CTA copy for pop-up offers to product image tests.

A demo shows you how the tool solves your problem while a free trial reveals if the platform backs up the claims with the goods (i.e. features, pricing, service, roadmap).


Note: Most enterprise-level A/B testing tools won’t get on a demo with you before they’ve pre-qualified you. And they do not offer free trials. This can be a blind spot in your evaluation process.

Gauge the Culture

You might ask, “Why does culture even matter in the tool I’m buying?”

When you use an A/B testing tool, the vendor becomes part of your experimentation culture. That relationship works out best if you’re similar in business practices and values.

Don’t worry, this won’t narrow down your search to where you only have 1 or 2 tools to consider. A good culture is actually not rare. You just want to avoid friction that can get too uncomfortable and prompt a tool change too soon.

So, how do you gauge an A/B testing tool’s culture? You can look at Glassdoor reviews. People who work or have worked there can give a fairly transparent (although anonymous) insight into the culture.

Another way to assess an A/B testing tool’s vendor is their interaction with the industry. Do they truly educate the space?

What type of content can you find on their blog and social media pages? How do other players in the industry interact with them?


Are they looking beyond their tool to help you with improving your processes? It’s more than just posting “our tool is great” content. You can clearly spot a brand that’s invested in the success of its audience. It’s hard to hide such a bright light.

Request Information You Are Missing

Yes, you can ask the A/B testing tool vendor to fill in the gaps for you with Request for Information docs.

RFI documents follow a fairly standard structure, so a template like this would work. What’s going to be unique in yours, besides the introductory information about your organization and the goals you want to achieve, are the questions you ask. You’ll want to know:

  • The vendor’s track record
  • The success story of customers similar to your organization
  • Pricing
  • References, and more

But also importantly, get the name and contact info of the person responsible for answering the RFI.

Need inspiration to populate your RFI document? This post by Ronny Kohavi is a gold mine of pre-purchase considerations.

Pre-Purchase Gear-Up

Part of the effort in making a successful A/B testing tool purchase is in the pre-purchase gear up—the transition from no tool or current tool to fresh software in your tech stack. What do you need to do to migrate smoothly?

First, reveal your pick to your team and get buy-in. If you’ve not been working alone so far, and you’ll use the new tool as a team, they need to be on the same page with you. Let them know your choice and explain the reasoning behind it.

This is why having a structured approach helps. Now you can show the meticulous considerations you went through to land on this decision so that it makes perfect sense to choose this tool. With everything done right thus far, buy-in is easy at this point.

Discuss your decision as a team. How does it connect with the current problems in your experimentation journey? How does it support your goals for growth? What are other users saying about the tool? What did you learn from your research? And how did it hold up in your consideration set?

Also, focus on the “why” behind choosing the vendor. Beyond the sheer capabilities of the tool, are you going to be working with a vendor that supports your testing goals? Do they provide the educational backing your team needs to make the most of the new A/B testing tool? What are the vendor’s goals for their users?

Then, draft up a timeline for the transition and prepare to roll out. Migrating needs to be planned carefully to prevent hiccups and costly delays.

This is why you’ll want to work with a vendor that supports you. Will the vendor assist with the migration? Can you shift your data to the new vendor? Is there a robust onboarding process? Find out what expertise you need to get that done and how long it’ll take for each stage.

Finally, zero in on your goals for the optimization program broken down by 3-6-9 months. That way, you have quarterly checkpoints to assess the decision you made. Are the chosen A/B testing tool and vendor keeping up with the expectations you started with?

Careful here: Expectations need to be realistic. Make sure you don’t gauge your vendor on outcomes that they can’t influence.

Let’s make this an actionable checklist you can take with you.

A Handy Checklist of Factors to Keep In Mind As You Migrate

You want to migrate from good to better; smart to genius; amazing to blissful. That doesn’t happen by chance.

Tick all the checkboxes below so you can be sure the A/B testing tool you’re migrating to performs beyond your expectations and helps you achieve your experimentation goals.

  • Cost — Does the cost of the A/B testing tool justify the results you’re expecting? How does the cumulative investment in the tool measure up to your expected ROI?
  • The Learning Curve — You don’t want to spend valuable experimentation hours trying to figure out how to do basic tasks with your chosen testing tool because of a UI cluttered with tons of features you’ll never use. Read reviews to learn about the learning curve of the tool.
  • Flicker — Flicker occurs when users see the original version of a page flash on their screen before the A/B testing software presents them with the variant. This can mar the results of your test.

    Some tools like Convert Experiences have developed anti-flicker technology, so this doesn’t happen. But many free and open-source A/B testing tools don’t yet have that capability.
  • Out of the Box Integrations — Your A/B testing tool isn’t a lone wolf. It has to team up with other tools in your tech stack to bring you maximum value. Check out the number of out-of-the-box integrations your CRO tool comes with. Does it integrate with vital tools you need it to work with?
  • Google Analytics — Ensure your tool has a smooth integration with Google Analytics. When setting up, this is one of the first tasks you’ll have to complete. Find out if you might need a developer to help with that tool.
  • Revenue Tracking — When migrating, you want to ensure your revenue tracking is properly configured for both one-time and multi-revenue events. Some tools require manual setup. You don’t want that.

    Convert Experiences integrates with GA4 to pull values of transactions, revenue, and ordered items to inform the tests.
  • A/A Test — Does the new tool pass A/A tests? If you have free trial access to an A/B testing platform, run this test. That’s how you know your tool is primed to handle A/B tests without biases—or not. The sketch after this checklist shows what an A/A test is checking for.
  • Collaborator Settings — Particularly important if you work in a team. Can you control access to what your collaborators can view and make changes to?
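To make the A/A idea concrete, here is a small Python simulation (assuming scipy is available; the 3% conversion rate and 5,000 visitors per arm are illustrative numbers, not recommendations). Two identical “variations” are compared many times over; with a sound setup, only about 5% of those runs should declare a winner at a 5% significance level:

```python
import random
from scipy.stats import norm

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test p-value for the difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - norm.cdf(abs(z)))

random.seed(42)
runs, false_positives, n, true_rate = 1000, 0, 5000, 0.03
for _ in range(runs):
    conv_a = sum(random.random() < true_rate for _ in range(n))  # arm A
    conv_b = sum(random.random() < true_rate for _ in range(n))  # arm B (identical)
    if two_proportion_p_value(conv_a, n, conv_b, n) < 0.05:
        false_positives += 1

# Roughly 5% of identical comparisons should "win" purely by chance.
print(f"False positive rate across {runs} A/A tests: {false_positives / runs:.1%}")
```

If your tool calls winners far more often than that on identical variations, something in its assignment or stats engine deserves a closer look.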

And these 2 additional considerations to remember:

1. What Level of Customer Support Does the A/B Testing Tool Provide?

Another point of consideration in your assessment is the level of technical support available to you as a customer.

Buying an A/B testing tool may be an enormous commitment. You need to be sure that the tool provides enough technical support to your team in the event of difficulties.

The level of support offered by the top A/B testing tools on the market varies. With some research, you can find out if the testing tools you’re considering offer all or a combination of the following:

  • Live support: offered by phone or video call
  • Email support: offered by a dedicated email address.
  • Live chat: using an inbuilt messaging app
  • Documentation: an online repository of articles that provide solutions to common problems.
  • Automated support: a set of pre-recorded responses.

Once you have confirmed the level of support available, you need to find out if the people stationed in the various support channels are CRO experts. It is especially important that you choose a tool with CRO experts for support personnel, as they can better provide the real-time help you need if you run into technical issues.

It is also worth noting that some A/B testing tools offer dedicated customer support as an add-on which you need to pay for or upgrade to a higher plan to access for free. Paying for dedicated technical support or upgrading to a plan that far outweighs your testing needs to access better support will increase the overall cost of the A/B testing tool.

At Convert, our team of developers and CRO experts provides technical support to our customers for free. Our support takes the form of live chat, online documentation, email support, and phone support. We also offer Convert Assistant, a ChatGPT plugin, that offers real-time answers to your questions by accessing our support documents and blog posts.

2. Does the A/B Testing Tool Impact Your Site Speed?

Another important consideration when buying A/B testing tools is their impact on website speed. Some A/B testing tools can add seconds to your website speed and increase load time. Slow website load time affects user experience and may affect your traffic.

Let’s get into the mechanism of how some of the top A/B testing tools on the market can impact your website speed. A/B testing tools usually fall into two categories:

  1. Client-side A/B testing tools
  2. Server-side A/B testing tools

Client-side A/B testing tools use a JavaScript snippet that runs in the visitor’s browser to alter the content your server delivers, based on your targeting instructions. Server-side A/B testing tools deliver the altered content directly from the server to the user, without touching the user’s browser.

Server-side testing tools seem like the best option, but they can be expensive and not relevant for many use cases. You can delve more into the differences in our client-side vs server-side testing article.

Many of the top A/B testing tools out there are client-side testing tools. Client-side A/B testing tools load in two ways:

1. Synchronously

This is when the JavaScript snippet of your A/B testing software loads fully before your web page loads. This way of loading the script prevents flickering, but it may slow your page down, since the snippet’s load time adds to your overall page load time.

2. Asynchronously

This is when the code snippet from your A/B testing platform of choice and your web page load simultaneously. Tools that use this method of loading may be fast but are prone to flickering.

As I mentioned earlier, flickering occurs when the visitor sees the original content before the variation loads.

Convert is one of the fastest A/B testing tools on the market, thanks to its Smart Insert Technology™, which reduces its impact on your site speed.

Get a Taste of One of the Most Privacy Aware A/B Testing Tools Out There

3 A/B Testing Tools We Like

Where do you start in your search for the perfect A/B testing tool to buy? Right here with our favorites.

We’ve considered everything we talked about above and created a list based on what we know about our audience.

Let’s dive in!

1. Convert Experiences


Convert Experiences is for companies and teams looking to scale up their testing program with a mid-level testing platform that comes at a great price. You get fantastic support through your experimentation growth while staying ahead of privacy and data protection requirements.

Convert seamlessly integrates with over 90 tools in your MarTech stack and frees you from testing limitations you might’ve experienced with free tools and even some paid testing tools.

What we like about the tool: Everything! We built it 🙂

TrustRadius Rating: 9.0 out of 10 (19 reviews)

Pricing: Starts at $499/mo

Free plan or trial? Yes, for 14 days with no credit card info needed to start

Cost per 100k visitors: $249.50

Pros:

  • Stable, flicker-free testing
  • Full stack and feature flags even in the Basic plan
  • Over 90 integrations with other marketing tools (Shopify, WordPress, HubSpot)

Cons:

  • No change history in the Basic plan

Client-side or server-side tool? Both. You can use a client-side visual editor as well as set up to run custom JS on the server side.

Integration with Google Analytics? Yes.

Core Web Vitals ready? Yes. You get fast page loading, flicker-free, and minimal effect on CLS. So, your SEO isn’t affected.

Frequentist or Bayesian stats? Both.

Enterprise Support? Yes. We serve companies like Sony, Unicef, and Jabra.

Customer Support? Yes, including live chat, a support center, and a ChatGPT plugin.

Consciousness Quotient (CQ): We’re an outspoken campaigner for user privacy and have environmental consciousness deeply embedded in our DNA. We are fully GDPR compliant and don’t use tools that aren’t GDPR compliant. And we’re proudly carbon negative.


2. Kameleoon


Kameleoon is one of the most popular A/B testing platforms in the world today and a favorite of Healthcare and Fintech companies. It brings extra perks to the table with better functionality than most tools. It also comes with a great UI and superb personalization settings and integrations.

Why we like this tool: AI-powered visitor segmentation with full-stack personalization options

TrustRadius Rating: 9.4 out of 10 (83 Reviews)

Pricing: You get a custom quote on request.

Free plan or trial? Yes

Cost per 100k visitors: Not applicable

Pros:

  • Advanced anti-flicker technology
  • Easy-to-use WYSIWYG editor
  • Integrates smoothly with other marketing apps

Cons:

  • Takes a while for the WYSIWYG editor to load
  • Cannot archive tests

Client-side or server-side tool? Both

Integration with Google Analytics? Yes

Core Web Vitals ready? Yes.

Frequentist or Bayesian stats? There isn’t readily available information about this.

Enterprise Support? Yes. Their enterprise customers include Toyota, Rakuten, and Lexus.

Customer Support? Yes. You can also get an account manager.

Consciousness Quotient (CQ): They’re HIPAA, GDPR, and CCPA compliant and also have a flexible consent management feature for each A/B test you run.


3. Optimizely


Optimizely is one of the pioneer web experimentation platforms and has earned a great reputation as being a solid tool to rely on. Its pricing, though, can be quite prohibitive to the average CRO team as they’re very enterprise focused.

Why we like this tool: Powerful tool for experimentation combined with AI-powered personalization

TrustRadius Rating: 7.9 out of 10 (20 Reviews)

Pricing: You need to submit a request to get your custom estimate — only annual plans allowed.

Free plan or trial? No, not since 2018

Cost per 100k visitors: Not applicable with this tool

Pros:

  • Wide range of targeting options available
  • Easy-to-use widget feature
  • Clutter-free UI

Cons:

  • Very pricey

Client-side or server-side tool? Both. Client-side experimentation works with a JS snippet and server-side is done with developer SDKs.

Integration with Google Analytics? Yes.

Core Web Vitals ready? Yes.

Frequentist or Bayesian stats? Optimizely uses sequential experimentation, not the fixed-horizon experiments that you would see in other platforms.

Enterprise Support? Heavily caters to enterprise customers because of the pricing. Customers include HP, IBM, and Microsoft.

Customer Support? Yes, including 24/7 call numbers and help resources.

Consciousness Quotient (CQ): They’re privacy-conscious and have a tradition of getting their new hires involved in community volunteering on their second day.


Key Takeaways

Finding the right A/B testing software for your needs isn’t as easy as it seems. Complex factors like your testing needs, pricing, integrations, available skill set, setup, training, and more need to be taken into consideration.

When you start with the problem, ask the right people, and build a consideration set based on a careful analysis of the crucial factors that can make or break your experience with your A/B testing tool, you’re already halfway there.

You can choose to start your test with the tools we’ve listed above. But if that’s not your cup of tea, then we’ve given you all you need to go out there, research over 50 A/B testing tools, and find the right one for your unique needs.

Because the ultimate decision lies with you.

Fundamentals of Product Experimentation for Beginners https://www.convert.com/blog/full-stack-experimentation/product-experimentation-handbook/ Mon, 13 Nov 2023 20:34:22 +0000 https://www.convert.com/?post_type=convert_blog&p=22371

Growing a successful product means constantly adapting to your users’ needs. Product experimentation is key to this. It’s a tool for testing and improving your product while you continuously develop it.

Using this tool, however, could take a bit more effort than you’d expect at first. In this product experimentation handbook, we’ll show you how to start, what metrics matter, and how to use a product experimentation framework effectively.

So, product managers, growth marketers, developers, founders, or anyone looking to enhance user experience and boost product adoption and retention, hang on to this.

In the end, you’ll be able to use product experimentation to validate each step of the unending journey toward the better version of your product for your users.

Let’s get started!

Why Is Product Experimentation Essential in Today’s Climate?

Tech giants like Booking.com, Netflix, Meta, Google, Microsoft, and Amazon have been testing hypotheses to enhance their products and services for decades. Nowadays, they conduct up to 10,000 tests each year.

Here’s what they understood:

  1. Relying solely on intuition and best practices for product development decisions isn’t enough. A data-driven approach is crucial for understanding users and making informed improvements.
  2. Before making any changes or launching a new product, it’s vital to validate assumptions and test hypotheses.
  3. Embracing the scientific process is essential for adapting to market shifts and changing user preferences in a calculated manner—as opposed to haphazardly.

This is one way they’ve been able to cement their status as tech giants and grow insanely fast.

Stock performance of leading experimentation teams. From Experimentation Works: The Surprising Power of Business Experiments by Stefan H. Thomke (Source)

Of course, many factors contribute to the value of a company. Still, you can’t deny the power of experimentation-derived insights that support better decisions — whether in evolving the product or avoiding damaging changes. If you deny it, I’ll refer you to Blockbuster and Digg.

For most businesses today, SaaS companies especially, product experimentation is a must-have. You’ve got considerable competition and rapidly switching market preferences to contend with while building products. How would you know your product development is on the right track to meet business goals?

With experiments, you can establish a causal link between ideas and outcomes, so you’ll know precisely what to build, upgrade, or tear down. It also empowers your team with a strong, well-rounded understanding of various strategies and their impact, and helps you build on previous knowledge through iterative learning.

Ultimately, product experimentation accelerates innovation and product evolution. As ideas are implemented based on solid evidence from testing against key product health metrics, you get to move faster in the right direction.

Product experimentation is also a cornerstone of product-led growth (PLG). Here’s how…

Product Experimentation Meets Product-led Growth (PLG)

Product-led growth is a customer-needs-focused strategy for growing your business that depends on your product to do the heavy lifting. With such a heavy reliance on the product to acquire, activate, and retain customers, it must be truly solid.

That’s where product experimentation comes in. To acquire customers, for one, your product needs built-in virality — its own ability to:

  • Drive outstanding outcomes (like ChatGPT did with human-like interactive AI)
  • Get users to discover value quickly (like Canva does with thousands of professional-looking visual content templates), or
  • Invite others to partake as well (similar to Dropbox’s “invite friends for more storage space”)

The easier it is to realize these Aha! moments with products, the easier it gets to activate users, i.e. get them hooked, build a habit, re-engage, and retain.

Product experimentation makes this happen by illuminating the customer journey, helping your organization build an understanding grounded in evidence.

Another way it puts your customers front and center is by fuelling this core tenet of PLG: continuous improvement. You get to create seamless and satisfying user experiences one validated step at a time.

This leads to a lot of benefits including lower customer acquisition costs and scaling faster, regardless of headcount. All with the insights from product experiments that support product-led growth.

Ruben de Boer puts this into perspective as an emerging trend among modern organizations:

Product-led growth will get experimentation more and more into the product teams. I see this trend happening in many organizations, and I am confident more organizations will follow.

Experimentation is indispensable when products focus on a growth outcome instead of simply rolling out features.

Product-led growth organizations are geared toward creating impactful changes that influence customer behavior, satisfaction, and revenue. To realize this, product teams need to make the right decisions for growth. For this, they need to experiment.

Imagine if just 25% of product updates make a positive impact on the outcome, it’s crucial to know the 75% that shouldn’t see the light of day. Not experimenting will jeopardize the end goal.

Furthermore, experimentation allows for fast innovation cycles, meaning the product will continuously evolve to meet market demands, keeping users engaged and attracting new ones.

Thus, experimentation is crucial for successful product-led growth.

Ruben de Boer, Lead Conversion Manager, Online Dialogue

Product experimentation offers a measure of assurance in maintaining a great user experience by selectively rolling out impactful ideas while holding back ideas, features, and upgrades that could muddle the user’s journey and disrupt the delicate sequence of positive interactions leading to product endorsements.

Instead of planning a roadmap for an entire year, experimentation allows for a flexible roadmap as it offers insights, validates assumptions, and drives quick iterations and innovation.

Experimentation provides data-driven insights that reduce reliance on gut feelings or assumptions. It enables teams to make informed decisions based on continuous user behavior and feedback, data analyses, and experiment outcomes. Product teams can create and test better features and designs as they better understand the users’ preferences, gains, and pains.

Because this learning happens continuously, the product direction changes based on the insights to match the users’ needs best.

While experimentation significantly influences the product direction, it is key for product teams to balance these insights with the company’s strategic vision. Integrating the learnings with the company goals ensures that the product meets user needs and aligns with the organization’s growth and innovation objectives.

Ruben de Boer

Web Experimentation vs Product Experimentation

Web experimentation and product experimentation, though quite similar (and often intertwined), serve distinct purposes within the sphere of optimizing user experience and product development.

We could think of website experimentation as a small part of product experimentation, in the same way (controversial, I know) that we could understand CRO as a small part of experimentation, or UI/UX design are parts of Product design.

Strictly speaking, web experimentation would deal solely with the website itself, and the digital interface, whereas product experimentation could involve testing all aspects of a company’s offering, from improving features to introducing new ones, its pricing, delivery, and aftersales, etc. It’s full-stack experimentation on steroids.

Some lines may seem blurry, like whether improving the algorithm on a product like Spotify’s recommendations is a web experiment or a product experiment, but from my point of view, if it’s not to do with the interface itself, it’s product experimentation.

Therefore, and linking back to a previous Convert article, we could argue that innovation (Exploration) is much more likely to happen with product experimentation, whereas exploitation is largely in the realm of web optimization.

Properly executed, product experimentation can lead to truly transformative outcomes for a business as its depth and breadth are much larger, but it can be more costly and resource-intensive since the risks are much higher.

Web experimentation has a much lower barrier of entry, so it can be a great gateway for companies to evolve in their maturity over time and embrace a culture of experimentation over time. But that takes time.

David Sanchez del Real, Head of Optimisation at Awa Digital

Here’s a breakdown of their differences based on their focus and scope:

Scope

Web Experimentation: Tests variations in web interfaces such as landing pages, forms, or other parts of a website to optimize specific outcomes like click-through rates, conversions, and other web metrics.
Product Experimentation: Tests the entire product, including features, workflows, and even pricing models. It’s about optimizing for the best version of the product that meets the user’s preferences while achieving business goals. It’s like a more expansive form of full-stack experimentation.

Focus

Web Experimentation: The UI and UX of the website for higher conversion, better engagements, or other web-specific goals.
Product Experimentation: The entire user experience within the product, which can also include the website, for higher retention, revenue, and growth metrics.

Depth of Analysis

Web Experimentation: Usually involves marketing- and sales-level analysis of the business.
Product Experimentation: Goes deeper into how different aspects of the product affect overall user behavior and business outcomes.

Example

Web Experimentation: A/B testing a different form layout on the same landing page to see which produces the higher conversion rate.
Product Experimentation: Testing a different onboarding flow within an app to see which results in better user retention and engagement over time.

Kelly Anne Wortham puts a deeper spin on this:

Product testers should keep in mind the important difference between marketing/website testing, which has always been about the ability to “fail faster,” and product testing, where failures are incredibly costly. Because of those costly risks, it’s more important with product testing to fully understand the customer journey—qualitatively and quantitatively—before you make product decisions.

To do that, product testers should take these steps:

  1. Utilize extensive research first – both qualitative and quantitative
  2. Practice customer empathy to define what success looks like
  3. Use rapid experimentation methods to gather evidence to support your build.

The standard product life cycle is Build-Measure-Learn (hopefully).

However, a sound product experimentation framework turns that product lifecycle around by first asking the question: What do I want to learn?

Then: What would I need to measure to learn that?

Last: What would I need to build to measure that?

Then, and only then can we start to Build-Measure-Learn guaranteed.

Kelly Anne Wortham, Founder of Forward Digital Org

Product Growth Stages: Key Metrics that You Can Impact with Experimentation

Inspired by Elena Verna’s ‘dirty dozen’ of PLG B2B SaaS health metrics, we’ve outlined key PLG metrics across the entire user journey and illustrated how strategic adjustments in onboarding processes, feature testing, marketing approaches, pricing structures, and more can enhance it.

Let’s take a look at how you can leverage experimentation to attract, engage, and convert users more effectively, driving growth and revenue with precision.

Growth Stage: Acquisition

  • Web traffic: Measures the volume of new potential customers visiting your website. How to impact with experiments: Run experiments on marketing channels and page elements to determine what attracts higher traffic.
  • New accounts signups: Measures the number of accounts that have completed the signup process and begun onboarding. How to impact with experiments: Test different signup forms, social proof, and signup incentives to increase the number of users who complete the signup process.

Growth Stage: Activation

  • Activation completion rate: Tracks the percentage of users who complete the onboarding checklist. How to impact with experiments: Experiment with various onboarding guides, tooltips, and interactive tutorials to see which increases the completion rate.
  • Time to value (TTV): Calculates how long it takes a user to derive value from the product after signing up. How to impact with experiments: Test and measure how changes in the onboarding process affect the speed at which users find value in the product.
  • Activation velocity: Measures how fast users reach the activation stage in their user journey. How to impact with experiments: Experiment with steps to reach activation, providing more direct paths to key features or incentives, and targeted communication to boost speed to the activation stage.

Growth Stage: Engagement

  • Product stickiness: Measures how often users return to use the product—daily, weekly, or monthly (e.g. DAU, WAU, MAU)—which indicates engagement levels and retention potential. How to impact with experiments: Test different features, content, or engagement strategies to keep accounts within the desired frequency of use.
  • Product adoption rate: Indicates the percentage of users who are regular users. How to impact with experiments: Experiment with features or incentives that encourage existing users to explore the product’s value more regularly.
  • PQL (Product-qualified leads): Leads with high engagement levels, which indicates they’ve experienced value with the product and have a higher likelihood to convert to paid plans. How to impact with experiments: Test different criteria for what defines a PQL to refine the sales team’s focus on the most promising leads.

Growth Stage: Monetization

  • Conversion to paid accounts: Measures the number of new accounts that upgrade to paid subscriptions. How to impact with experiments: Experiment with different pricing structures, trial lengths, or premium feature access to see what converts more accounts to paid plans.
  • Average revenue per new user: Tracks the average revenue per each new account, which tells you the value of the initial conversion. How to impact with experiments: Test upselling strategies, bundle offers, or personalized recommendations to increase the average revenue per new user.
  • CAC payback period: The time it takes for your organization to recoup the customer acquisition cost. How to impact with experiments: Experiment with more targeted customer acquisition strategies to reduce upfront costs and shorten the payback period.
  • Expansion revenue: Measures revenue from existing customers that result from upgrades, add-ons, or adding more users to an account. How to impact with experiments: Test different in-app prompts or email campaigns to encourage existing customers to purchase add-ons or upgrades.
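For readers who like seeing the arithmetic, here is a small sketch of how a few of the metrics above are commonly calculated. All figures are made up, and the formulas reflect common definitions rather than any single vendor’s:

```python
# Illustrative PLG metric calculations with made-up numbers.

dau, mau = 12_000, 40_000
stickiness = dau / mau                          # product stickiness (DAU/MAU)

signups, activated = 2_500, 1_100
activation_rate = activated / signups           # activation completion rate

cac = 300.0                                     # cost to acquire one customer
monthly_revenue_per_user = 50.0
gross_margin = 0.80
cac_payback_months = cac / (monthly_revenue_per_user * gross_margin)

print(f"Stickiness (DAU/MAU): {stickiness:.0%}")               # 30%
print(f"Activation completion rate: {activation_rate:.0%}")    # 44%
print(f"CAC payback period: {cac_payback_months:.1f} months")  # 7.5 months
```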


As these PLG metrics tell the health story of your product and growth trajectory, experimentation helps you make data-driven decisions to rewrite the story.

Let’s put things in perspective. Here’s an example of a successful product experiment, where a marketplace drove millions of additional deals by running product tests in the activation growth stage.

They observed that an underused feature called “save search” correlated with higher user engagement via messaging. So, to drive user engagement, they had to push this feature into the spotlight.

First, they tested tooltips triggered on users’ first search, informing them of the feature. They tested a floating ‘save search’ button that followed the user as they scrolled through search results, increasing its visibility and accessibility. They also tested a prompt to save a search if the same search occurred thrice.

This resulted in a 300% increase in the save search feature usage. And that translated into a 44% increase in buyer-to-seller message exchanges.

Sometimes, product experiments ‘fail’ in the technical sense but reveal valuable insights.

Example: A company tested adding a new feature that was highly requested by many users in the hopes of increasing trial-to-paid (TTP) conversion, but it failed.

Doesn’t make sense, right? But they learned not to take test results at face value. When unexpected results show up like this, you can segment users to see a more detailed story.

They learned that the segment of users that showed no change in TTP rates wasn’t aware of the feature and never realized its value. They also learned that even people who didn’t use the feature but were aware of it were more likely to convert to a paid plan because of the perceived value. So, they prioritized notifying users of the feature.

Product Data & Analytics: What Do You Need to Have in Place Before You Can Experiment?

With product experimentation, you never want to take shots in the dark. So, your foundation must be built on critical elements to ensure your experiments are, as Ruben said, “structured, measurable, reliable, and yield actionable insights”.

Here are those elements:

1. Rally Your Team Around Clear Goals

Right at the top, create clear and measurable objectives for your product experimentation program—goals built around the product and your unique definition of success.

Clear objectives and measurable key metrics are essential to get everyone on the same page and working towards a common outcome-driven goal.

Ruben de Boer

Next, you want to make sure your entire team is on the same track in terms of the goals and KPIs governing your product experimentation program. Not only is this great for marching forward in unison, but it also makes it easy to get the support you need to succeed.

First, it is essential to determine the most important goal and corresponding KPI for your product. What would make the product more successful, what is a good metric of that? Sometimes it is a kind of usage metric like activity rate, daily returning visitors, or related, especially for subscription products.

If it is more commercial like a booking kind of platform, the number of bookings (or percentage of users booking something) will of course be very important. Also, it is important to align these goals with the team working on it, since it might not be that straightforward for everyone as it is within eCommerce.

Lucas Vos, Senior Conversion Specialist, RTL

When conducting product experiments, it’s crucial to establish well-defined success metrics, test multiple variations simultaneously, and be patient. A common pitfall is to conclude an experiment prematurely. Statistical significance is a thing. In my experience, 9 out of 10 experiments I’ve overseen saw shifts in the winning variation within just a couple of days of data collection.

Henning Heinrich, Head of Growth at Glassfy

2. Acquire Experimentation Tools with Server-side Testing Capabilities

You need testing tools that’ll accommodate experimenting across platforms and layers of your product.

Check your tool stack. Because digital products most of the time are active on more platforms than one, you might be having a challenge with your experimentation tool. Some of the standard tools (like Optimizely) support app testing for example, but it might be more complex to set up within those tools.

With that, it could be difficult to test the same changes on multiple platforms at once. Therefore, server-side testing might be something you want to investigate, as it should open possibilities to test the same changes on multiple platforms at the same time.

But to start, you can also consider Firebase to test on Android and iOS apps. That could be very useful through triggering different configs within your apps.

Lucas Vos

Examples of such tools with server-side testing capabilities are:

  • Convert Experiences
  • Split
  • LaunchDarkly
  • AB Tasty
  • Kameleoon, and more
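To give a feel for what “server-side” means mechanically, here is a minimal, vendor-agnostic Python sketch of deterministic bucketing by user ID, the general technique such tools build on so the same user sees the same variant on web, iOS, and Android alike. This illustrates the concept only; it is not the API of any tool listed above:

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, variants=("control", "treatment")) -> str:
    """Deterministically assign a user to a variant.

    Hashing experiment_id + user_id means the same user always lands in the
    same variant for a given experiment, regardless of platform.
    """
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100                  # map the hash to a 0-99 bucket
    slice_size = 100 // len(variants)
    return variants[min(bucket // slice_size, len(variants) - 1)]

# Any platform (web backend, iOS, Android) calling this with the same IDs
# will serve the same experience to the same user.
print(assign_variant("user-123", "new-onboarding-flow"))
```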

And when implementing these tools, make them available to the rest of the team.

To get the best ideas and insights, everyone must be able to contribute to the process. Therefore, access to tools and data must be available for everyone.

Ruben de Boer

3. Implement Proper Data Collection and Analysis

Make sure your data collection process is strong and your analysis sharp. With reliable data, gaining your team’s trust and involvement becomes easier.

All data measured must be accurate and reliable to draw trustworthy conclusions. Furthermore, the data must be complete. All useful metrics must be available.

Ruben de Boer

Yes, this doesn’t translate to gathering every bit of data you can find. That can become a roadblock if done incorrectly.

Streamline your data collection process. While collecting more data may seem appealing, the potential drawbacks such as increased errors, compromised data quality, and higher costs often outweigh the benefits. It’s essential to consider what specific data is truly necessary for informed decision-making and to prioritize it rigorously.

[…]

In addition to the aspects of data visualization and analysis, product analytics offers an exciting opportunity to automate product-led growth (PLG) initiatives, such as activating power users in referral campaigns. Achieving this involves properly segmenting user cohorts and integrating with the right tech stack from the get-go.

Henning Heinrich

4. Instill a Culture of Learning and Improvement

Cultivating a culture that views failure as a learning opportunity encourages experimentation and innovation.

It’s also a great catalyst for getting all hands on board because, with product experimentation, the involvement of multi-disciplinary teams is crucial to progress faster with a comprehensive approach.

To gather the most reliable and useful insights, you need specialists in UX research, UX design, data analysts/scientists, developers/engineers, and possibly a psychologist.

[…]

In addition to having the right team and tools, fostering a culture where failure is viewed as an opportunity to learn and improve is essential. Also, strategy, stakeholder buy-in, and education are crucial things that determine the success of product-led growth.

Ruben de Boer

An experimentation culture also means you make a habit of gathering user feedback as well as quantitative data. And you use them to generate problem statements and hypotheses to experiment on.

A process and tooling must be in place to gather user feedback. This can be done through surveys, interviews, and usability testing.

Ruben de Boer

There’s more. An experimentation mindset makes you, your team, and your leadership respect the learnings from failed tests—which is something to contend with in every experimentation program.

Fostering a culture where failure is viewed as an opportunity to learn and improve is essential. Also, strategy, stakeholder buy-in, and education are crucial things that determine the success of product-led growth.

Ruben de Boer

Finally, Lucas leaves us with some details you don’t want to miss if you’re testing apps:

When doing app testing, please consider that releasing a new app version with your experiment needs some time to get adopted because people need to update their app. Depending on the platform and product it might take more than a week for a serious ramp-up of usage of the new version, providing the necessary users for your experiment.

What happens within an iOS app can really be different within an Android app. At some point, the groups can be just different, as there are some minor differences in their characteristics. Test changes on both platforms.

Lucas Vos

The Structure of a Product Experimentation Team

To start product testing, a company needs a solid team composed of product managers, data analysts, designers, developers or engineers, and user researchers.

They’ll collaborate to plan and run experiments, typically following one of three common models below:

1. Centralized Model

Here, the experimentation team is at the center of all experiments in the organization. They sort of provide experimentation services to other departments and business units in the company.

What’s neat about this structure is the way it keeps all experimentation data in one place. It’s also easy to stick to program objectives, compliance, and culture. However, it can lead to delays in test deployment and may hinder others from getting involved.

2. Decentralized Model

In this model, experimentation resources are spread out among various teams and departments, allowing each to independently home in on specific areas like customer acquisition, conversion, or retention.

Individual teams get to own their experimentation process, but that can create silos and data inconsistencies.

3. Center of Excellence Model

This hybrid model merges the centralized and decentralized approaches. Individual teams handle their own experiments, with a central ‘Center of Excellence’ (COE) providing support and standards. It grants teams flexibility and autonomy yet ensures consistency and promotes cross-team coordination.

But this comes with its own challenges as well. Lines of responsibility can be blurred here. Teams may struggle to identify jurisdictions or decide when to seek guidance from the COE. There’s also the need for more funding for a bigger program.

Which product experimentation team model is right for your organization?

That’s up to your organization’s capabilities and goals. The centralized model is the best way to introduce product experimentation. But as your experimentation program matures, you’ll probably want to switch over to the center of excellence structure.

Product Experimentation Framework

A product experimentation framework is the basic structure that guides the tests you run to improve the product experience.

Melanie Kyrklund provides valuable insights into the broader context of product experimentation:

The process or framework for running product tests. How is this different from regular website testing?

One of the goals of product experimentation is to de-risk development by providing greater certainty on where to focus resources. Experimentation is a process that allows product teams to reach the goals they have set — which are anchored in business and customer needs and measured via KPIs and metrics specific to the outcomes and product being worked on.

Greater integration between engineering and experimentation processes is required to facilitate the validation of features of higher complexity, making server-side testing more prevalent in this field. Experiment development and QA processes need to take into account that a lot of code will be thrown away. Cost considerations are more important when prioritizing and building experiments server-side vs client-side.

Product teams will venture into testing more disruptive and complex changes compared to marketing teams. Product strategies are broken down into MVPs first so that key assumptions can be tested before considerable investment is put into building features. Testing is continuously used to validate features and strategies as they take shape. These types of tests fall in the “exploratory” and “validation” streams depending on the maturity of a feature. The third bucket at the end of this process is “exploitation” where features are fine-tuned and lucrative elements are optimized.

Inversely, exploitation is more the remit of website testing. It typically falls under the remit of marketing and commercial teams and is aligned with marketing and growth objectives. It is concerned with the optimization of key marketing journeys (landing pages and routes into conversion funnels) and the exploitation of promotional and merchandising vehicles. It is more likely to leverage client-side technology.

Melanie Kyrklund, Global Head of Experimentation, Specsavers

Next, we’ll delve into a 5-step framework for conducting impactful product experiments. We draw inspiration from the framework designed by Diksha Shukla, experimentation pro.

Product Experimentation Process

Connect with Diksha Shukla On LinkedIn.

  1. Define KPIs

Build a dashboard tracking key product health metrics and KPIs to monitor user challenges, backed by user research. These KPIs must be relevant to your product, users, and business goals. You can reference the PLG metrics we mentioned above to guide you.

The metrics you choose must be relevant to your desired business outcomes and should be sensitive to the interventions made through your product experiments.

Ask yourself, “If this metric changes either positively or negatively, would that benefit or harm the business?” That thought process will steer you toward choosing impactful and actionable evaluation metrics, providing a clear metric framework for deciding what to experiment on.
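
If it helps to see that thinking written down, here's a minimal sketch of what such a metric framework could look like in code. The metric names, directions, and guardrails are placeholders, not recommendations.

```python
# Hypothetical metric framework for one experiment.
metric_framework = {
    "primary": {"name": "activation_rate", "direction": "increase"},
    "guardrails": [
        {"name": "support_tickets_per_user", "direction": "not_increase"},
        {"name": "page_load_time_ms", "direction": "not_increase"},
    ],
    "secondary": ["feature_adoption_rate", "7_day_retention"],
}


def passes_guardrails(results: dict) -> bool:
    """results maps metric name -> observed relative change (e.g. +0.03).
    A guardrail 'passes' here if it did not move in the harmful direction."""
    return all(results.get(g["name"], 0) <= 0
               for g in metric_framework["guardrails"])
```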

  2. Conduct Research

Understand your users’ pain points by examining their emotions, motivations, and situations through interviews, polls, and surveys. Use both quantitative and qualitative data for a complete picture. This research will provide the behavioral data you need to develop informed hypotheses for your experiments.

This step is a crucial foundational stage for ensuring your test design is based on solid data and clearly defined metrics. The statistical rigor of your experiments begins here.

  3. Identify Gaps and Formulate Hypotheses

Let your data help you find problems. Where are users dropping off? What experience is causing that? What solutions can you formulate to fix that? Brainstorm together to create testable hypotheses based on your research. You can also use a hypothesis generator to put it all together.

However, you need to be alert to product experimentation pitfalls that appear here.

One way to ensure you’re not messing up the statistics of your tests is by formulating a correct null hypothesis.
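
For example, a reasonable null hypothesis for a conversion test is simply “the variation converts at the same rate as the control.” Here's a minimal sketch of checking it with a two-proportion z-test; the counts are invented and statsmodels is assumed to be available.

```python
from statsmodels.stats.proportion import proportions_ztest

# H0: the variation's conversion rate equals the control's.
conversions = [480, 530]        # control, variation (hypothetical counts)
visitors = [10_000, 10_000]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors,
                                    alternative="two-sided")
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# Reject H0 only if p falls below the significance level you fixed up front.
```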

  4. Prioritize Test Ideas

What test ideas in your backlog should you test first and why? Consider one of these prioritization frameworks:

  • RICE framework: Consider reach, impact, confidence, and effort to prioritize features.
  • Value-effort matrix: Evaluate features to test based on their potential value versus the effort required.
  • Kano model: Prioritize features that provide the most user satisfaction (although this ignores cost and feasibility)
  • MoSCoW method: A straightforward one that categorizes features into must-haves, should-haves, could-haves, and won’t-haves (this time).
  • Weighted scoring model: In this model, you assign scores to features based on specific criteria such as feasibility, strategic alignment, customer value, etc.

Choose the one that fits your specific experimentation scenario, product needs, and the nature of the features, because the ultimate goal is to make data-driven decisions, not to follow a process for its own sake.
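
As an illustration, here's a minimal RICE scoring sketch in Python. The ideas and numbers are invented; swap in your own criteria if you prefer another framework.

```python
# Hypothetical backlog items scored with RICE = reach * impact * confidence / effort.
ideas = [
    {"name": "Simplify onboarding step 2", "reach": 8000, "impact": 2, "confidence": 0.8, "effort": 3},
    {"name": "Add social proof to pricing", "reach": 5000, "impact": 1, "confidence": 0.5, "effort": 1},
    {"name": "Rebuild checkout flow", "reach": 12000, "impact": 3, "confidence": 0.3, "effort": 8},
]

for idea in ideas:
    idea["rice"] = idea["reach"] * idea["impact"] * idea["confidence"] / idea["effort"]

# Highest-scoring ideas first.
for idea in sorted(ideas, key=lambda i: i["rice"], reverse=True):
    print(f'{idea["rice"]:>8.0f}  {idea["name"]}')
```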

  5. Test, Learn, Iterate

With an experimentation tool that suits your needs, design your test and launch it. Then, collect results, learn from them, and iterate. By iterating, we mean the opposite of spaghetti testing. Using a structured process, delve deeper into the insights you’ve acquired from your tests (whether they won or not) to understand the ‘why’ behind the results.

It is common to make mistakes when analyzing test results, so keep a level head with your numbers. For instance, it can be super tempting to sneak a peek at how your test is doing early on. Don’t do that. Wait until your sample size is adequate before you make any calls.

And if you’re testing multiple features or changes at the same time, be aware of network effects. Control for multiple comparisons.

One more thing: watch out for those ‘outstanding’ results. Sometimes a bump in your numbers is only due to novelty effects and seasonality. Holidays or weekends can skew your stats, so take those into account and focus on lasting impacts.
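
To take the guesswork out of “adequate sample size,” you can compute it before launch. Below is a minimal sketch using the standard two-proportion formula; the baseline rate and minimum detectable effect are hypothetical, and scipy is assumed to be available.

```python
from scipy.stats import norm


def sample_size_per_variant(baseline_rate: float, mde_relative: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Visitors needed per variant to detect a relative lift (MDE) on a
    conversion rate with a two-sided test at the given alpha and power."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    numerator = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
    return int(round(numerator / (p2 - p1) ** 2))


# Example: 5% baseline conversion, hoping to detect a 10% relative lift.
print(sample_size_per_variant(0.05, 0.10))  # roughly 31,000 visitors per variant
```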

Displaying Value Through Your Product Experiments

You don’t want to run product experiments just because it’s cool to do so. You want to bring tangible value back to the table. To start, keep the cost of testing in check.

How do you do that?

  • Add a “cost to implement” criterion in your prioritization framework and pick test ideas that bring the most to business goals at a lesser cost
  • Use A/B testing tools with built-in full stack experimentation capabilities
  • Be careful about variants that could have a huge negative impact on revenue, because as Dennis Meisner wrote, their exposure impacts business figures
  • Prioritize time efficiency, where you value quick experimentation cycles, high-velocity testing, and faster iterations. Also, in relation to the point above, the shorter the time a less-performing variant is exposed, the lower its impact on revenue.

After you start off on such a strong foot, turn your attention to sample efficiency. Sample efficiency in product experiments refers to the ability to pull reliable results from the least amount of resources. That’s another way to make product experiments more valuable.

So, to enhance your sample efficiency, you need to craft strategic hypotheses based on SMART business goals and prioritize the high-impact ones. It goes back to everything we’ve discussed so far.

You should also implement efficient sampling techniques such as sequential testing, which adjusts sample size based on data gathered as the test progresses. It makes tests run faster while reducing the sample size required for a statistically significant result.
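
Sequential methods come in several flavors. As one illustration, here's a minimal sketch of a sequential probability ratio test (SPRT) on a stream of conversion events; the rates and error levels are hypothetical, and commercial testing tools typically use more elaborate variants (e.g. group-sequential boundaries).

```python
import math


def sprt_decision(outcomes, p0=0.05, p1=0.055, alpha=0.05, beta=0.20):
    """SPRT on a stream of 0/1 conversions.
    H0: conversion rate = p0; H1: conversion rate = p1 (hypothetical values)."""
    upper = math.log((1 - beta) / alpha)   # cross this -> evidence for H1
    lower = math.log(beta / (1 - alpha))   # cross this -> evidence for H0
    llr = 0.0
    for x in outcomes:
        llr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1 (lift looks real)"
        if llr <= lower:
            return "accept H0 (no lift)"
    return "keep collecting data"
```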

It’s also sound practice to document every experiment in your experiment repository to maximize the insights you gather and continually improve sample efficiency.

With a reliable record of experiments, proving the ROI of testing becomes much easier. Use the metadata from your tests to show value by calculating:

  • The time it took your experiments to pay back their initial investment
  • The ratio of benefits delivered to resources invested in your experiments (translation: ROI of experiment = (profit from the experiment – cost of the experiment) / cost of the experiment) — see the sketch below
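
Here's that math as a tiny sketch; the dollar figures are invented for illustration.

```python
def experiment_roi(profit: float, cost: float) -> float:
    """ROI of an experiment = (profit - cost) / cost."""
    return (profit - cost) / cost


def payback_weeks(weekly_profit: float, cost: float) -> float:
    """Weeks of incremental profit needed to pay back the experiment's cost."""
    return cost / weekly_profit


# Hypothetical: a test cost $4,000 and drives $1,500/week in incremental profit.
print(f"ROI after 8 weeks: {experiment_roi(1_500 * 8, 4_000):.0%}")  # 200%
print(f"Payback period: {payback_weeks(1_500, 4_000):.1f} weeks")    # 2.7
```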

You can also benchmark the metrics you improved against industry averages and internal records.

This excites stakeholders about what you’re doing in experimentation and its impact on product direction. As you track learning and insights (with an unshakeable focus on metrics that matter), you can display how experiments have influenced the changes your product has gone through.

How to Embed Experimentation in Your Product’s DNA?

Because experimentation isn’t a one-off strategy, you’d want to build it a permanent home in your product development—right in its DNA. To do that, you first have to cultivate a culture of experimenting with your product.

This leads to innovative progress, continuous improvement, and sustained competitive advantage. Three things you need to survive as a business today.

Encourage a culture of thinking big, starting small, and failing fast. Be unafraid to come up with big ideas that may break things. Experimentation allows you to give these ideas (which are definitely based on proper research) their fair trial.

Only when an idea is tested correctly can you say for sure that it was whack, instead of just dismissing it in your head.

Plus when you start, start small in workload and ego. Identify the specific problems your users face and create a diced-up plan to solve them. Those are the ideas you test. Prepare yourself and your team to have your brilliant ideas shattered by the results and open up to ideas you’d otherwise dismiss.

This fosters a culture of continuous learning, where you encourage regular retrospectives, host knowledge-sharing sessions, and create a safe space for sharing failures and learnings.

If you’re responsible for leading your team in experimentation, promote vulnerable leadership. Be willing to admit mistakes and encourage others to do so with trust and honesty.

Celebrate big learnings too, not just big wins. With continuous experimentation and learning habits, your product’s DNA will be infused with experimentation quickly.

In summary, product experimentation as a continuous process of making data-driven decisions in developing your product around your users’ preferences is possible when you:

  • Involve leadership in experiments so they back teams and assess results
  • Value experiment outcomes, both wins and losses alike
  • Build and keep a team of people with an experimentation mindset
  • Measure product experimentation impact with the right metrics
  • Test fast and iterate often
  • Use the right testing tool
The Quiet Super Power of Iterations: From Spaghetti Testing to Strategic Experimentation https://www.convert.com/blog/a-b-testing/spaghetti-testing-to-iterative-testing/ Tue, 31 Oct 2023 22:35:10 +0000 https://www.convert.com/?post_type=convert_blog&p=22321

Let’s start with a bit of a mind-bender.

You know iterative testing adds structure to your experimentation program — at least on paper. You also probably know iterative testing makes you dress your testing program in adult pants and challenges you to ask tough questions like:

Is there a structure for how we’re executing things? And is there a grander plan—or actual strategy—propelling our experimentation or are we just indulging our egos?

Take a look at Erin Weigel’s post:

Unfortunately, too many programs view hypotheses as win/loss statements. They formulate a prediction, seek validation, and move on to the next. This pattern shows our human inclination to make predictions, doesn’t it?

However, we must remember that hypotheses serve as vehicles to transform research insights into testable problem statements — not as predictions to prove or disprove.

We’re not here to argue the nitty-gritty of how various experimentation programs use hypotheses. (If that piqued your interest, read ‘Never start with a hypothesis’ by Cassie Kozyrkov).

Instead, we’re setting the mood for a deep dive into iterative testing in a way that will unpack this rather loaded concept and use it in your experimentation efforts.

Between insight and action is the frenetic (yet mundane) work of testing.

Your formulated hypothesis and subsequent prediction are just one of many ways in which you choose to probe the validity of the connection you think exists between elements (in your control) and traffic behavior (beyond your control).

This insight, gathered from several sources of data and backed by rigorous research, highlights a problem—a conversion roadblock, if you will. In response, you introduce an intervention.

However, while the conversion roadblock remains fixed, the nature of the solution varies. It might depend on:

  • The psychological principles that inform the kind of influence we wish to exert: A fantastic example of this is the Levers Framework™ by Conversion.com. Anyone who has run any test or even thought through problems and solutions in business knows that you can manipulate cost to temporarily improve most superficial metrics. But if you are interested in profits (which you absolutely should be), it’s not the lever you should pull.
  • The research and subsequent insights that revealed the problem may offer clarity around the kind of lever you should be pulling. If you get 35 exit intent survey responses along the lines of “this is too expensive” but your competitors are selling their legitimately more expensive products like hotcakes, maybe work on Trust and Comprehension instead. This will unearth a can of worms around your viable market, and positioning. But hey, the journey to being better demands that some foundations be dismantled.
  • Finally, a solution also depends on how your chosen influence lever is presented to the audience. As Edmund Beggs says:

So particularly when you’re looking at the early stages [of testing a lever] and how you should interpret what’s going on with each lever, mattering matters most: it’s most important that you make a difference to user behavior.

Edmund Beggs, Conversion Rate Optimisation Consultant at Conversion.com

Here, something like the ALARM protocol is needed. It simplifies the multi-layered nature of execution by presenting aspects like Audience, Area, and Reasons the Test May Lose as a check to

  1. Ensure that the best possible execution is deployed.
  2. Honor the insight; understand that a win, a loss, or even an inconclusive call doesn’t negate the value of the insight. The problem remains, the journey continues.

Iterative testing is important because it shows why A/B testing is not a one-time solution. It is instead, a continuous improvement that grows and evolves as your business grows and evolves. Having a test ‘winner’ does not always mean that it would be a winner in every circumstance and external factors should always be considered on the impact they will have on test results e.g. market conditions or time of year. Let’s think of how much the pandemic shaped the market for so many industries. A ‘winning’ test pre-pandemic (particularly the case for message testing), may not have had the same result during, or even after.  That’s why it is important to continuously and gradually test, as well as re-test in different market circumstances.

Loredana Principessa, Founder of MAR-CO

This takes the sting away from tests that don’t light up the dashboard with green. When you see things clearly, you realize the true value of testing is learning from multiple tests, thousands of tests over the years.

While you may never completely mitigate every problem for every shopper/buyer, if your testing leads to more informed and better decisions and fuels your next iteration, you’re golden.

The A/B Testing Equivalent of Retention: Iterations

So, what is iterative testing anyway?

Iteration is the process of doing something again and again usually to improve it. As Simon Girardin cleverly points out with a sports analogy, even hockey players iterate. After all, practice makes perfect.

As hockey players fire multiple shots, with each shot an improvement on the previous one, so too does iterative testing operate. It leverages insights generated from previous tests to inform the experimentation roadmap moving forward.

And it comes with a delightful array of benefits.

Fast Paced, (Often) Low Effort

Balancing big ideas and pre-planned experiments with fast-paced iteration should be a part of any mature testing program. The latter can reveal new customer insights that can help increase conversion rates for relatively low effort. So not only is ROI high but the insights we glean can be invaluable and used in other marketing channels as well. Iteration can be thought of as a treasure map where you get a reward at each checkpoint, not only at the X.

Steve Meyer, Director of Strategy at Cro Metrics

It is, in a sense, the red-hot retention concept applied to experimentation. We don’t yet have a dollar value to attribute to the ease of iterative testing but you increase your chances of hitting on an impactful intervention with it.

Big A/B test wins for us are most commonly an 8-12% lift. When we are iterating we usually try to squeeze out an extra 3-5% lift on top of that. Our CRO programs are narrowly focused around one problem area at a time. So it doesn’t sound like much, but an incremental 5% lift on the most detrimental metrics hurting a site can be massive.

Sheldon Adams, Head of Growth, Enavi

And all this accelerated progress happens with less effort, especially when compared to spaghetti testing — a strategy-less approach where you test various changes to see what sticks.

With iterative testing, you’ve already laid down your markers.

You’ve established a clear understanding of the parameters to monitor when you introduce a specific type of influence to the audience struggling with a particular roadblock. You know which metrics to track upfront. And crucially, you’re equipped to rapidly fix any mistakes, resulting in  interventions mattering more for your audience.

For those wary of the rut of iteration, fear not—we will address how to move on, too. We don’t want “attached” A/B testers! 😉

And for those who might feel that iterative testing is simply redundant, building on the same insight, it’s time for a shift in perspective. Iterations, when you look at them the way Shiva & Tracy do, are user research. They don’t just fuel incremental improvements; they have the power to shape blue-sky thinking.

Iterative Testing = User Research

To grasp this second benefit, spare 26 odd minutes of your life. Don’t worry, this isn’t a dry podcast. You will laugh and nod your way to deep understanding.

Here’s Tracy and Shiva discussing the significance of iterative testing over spaghetti testing:


In this episode, they talk about how research is the basis of iterative testing, enabling a profound understanding of why certain outcomes occur. When you turn your findings into problem statements, you create a path to subsequent iterations and test ideas.

With the iterative approach, your mindset towards win and loss changes. You find there’s always something to learn from any test outcome to support informed decision-making. And you see that true failure happens only when you give up.

And that’s the idea they support: a persistent and ongoing process that leads to continuous user research and refinement.

So, you can’t iterate if you don’t have a hypothesis rooted in data. If your approach lacks depth—say, you’re hastily ripping banners off of your website just because your competitors are doing the same—you’ll never have the data or approach to iterate. Watch Jeremy Epperson’s take on iterative testing here.

Commitment to iterative testing is thus a commitment to stop looking at outcomes only.

It is a commitment to learning from the actual test design and implementation too, and treating the experiments themselves as a form of user research.

Listen to Tracy Laranjo’s process where she walks you through how she reviews screen recordings of test variants in Hotjar. See how she looks for what could have resulted in an emphatic response (win or loss) or a lukewarm reaction and how she envisions pushing a particular implementation further.

Even the most well-researched and solid ideas don’t always result in a winning test. But that doesn’t mean that the hypothesis is flawed; sometimes it’s merely the implementation of the variation(s) that didn’t quite hit the mark. The good news is that we learn from every single test. When we segment the data and look at click/scroll maps, we get clues into the reason why the variation didn’t win, which becomes the basis for our next tests. Sometimes a small change from a previous test is all it takes to find the winning variation. Furthermore, iterations usually take less time to develop and QA since much of the work can be reused, so an iterative testing approach also helps maintain a high volume of tests.

Theresa Farr, CRO Strategist at Tinuiti

In essence, iterative testing is very much like Machine Learning. Each time you refine a component of your persuasion or influence lever, fine-tuning its execution, you’re ‘feeding the beast’—enabling the company-wide gut instinct that sits atop the experimentation learning repository.

Iterative Testing Makes You Action-Focused

Insights and test outcomes, on their own, aren’t actions.

Speero highlights this distinction in their Results vs Action blueprint. Here, they emphasized that the true value of testing lies in the subsequent actions you take. That’s when experimentation can blossom into the robust decision-support tool it is meant to be.

As Cassie recommends, it is best to have a clear idea of what you will do if the data fails to change your mind (and you would then stick to your default action). And what you will do if the data successfully updates your beliefs.

When you adopt the mindset of iteration, you have to plan at least 2-3 tests.

You have to get into the habit of looking beyond the primary metric and choosing micro-conversions that reflect nuances of behavioral shifts. Such strategic foresight transcends the simplistic, binary mindset of ‘implement intervention’ versus ‘don’t implement intervention’.

The more thought you put into what comes after the admin and academia of running tests, the more tangible impact is felt by business KPIs and the organization as a whole. This bodes well for buy-in, validating the indispensable value of experimentation and everything else a program lead is worried about.

Shiva connects the dots here:

Iterative testing is one of many things that separates junior CROs from senior CROs. Iterative testing usually is the genesis of forward-facing, strategic thinking. Spaghetti testing (throwing sh*t at the wall and seeing what sticks) generally ignores this kind of strategic thinking. Iterative testing is something that requires research.

Why is research important? Back to the ‘strategic thinking’ — if you see something won, but you don’t have a concrete hypothesis backed in data, nor are you tracking micro conversions to understand behavioral shifts better, you are basically stuck with “Ok, it won. What now?”

Research leads to better hypotheses, which leads you to a better understanding of the ‘why’ something won, rather than simply knowing if something won or not. Which leads to problem-focused hypotheses. These are critical.

Having research is ‘easy’ (meaning conceptually easy to capture at least). Actually doing something productive with that research can be more difficult. Moving your research into problem statements is a very easy vehicle to move your research into testable hypotheses.

But also, that problem statement is crucial in knowing HOW to iterate if something wins/loses (along with a ton of other benefits). If you’ve done the two above points, you should have a general idea of what you expect to happen in a test, with a plan for tracking behavioral shifts.

Before you launch your test, you should consider “what happens if the hypothesis is correct” and “what happens if the hypothesis is proven incorrect.” This is where the ‘iteration’ step comes in. You are planning the next 2-3 tests based on the research, user behavior, and some general idea of things you believe will happen (based on that research). This is why planning for iterative testing can be so much harder without research (and spaghetti testing). If you run one test and don’t really know where else to go from there, ‘iteration’ isn’t something you’d be able to do more of.

Iterative Testing Minimizes Risk

It’s tempting to think of iterative testing as just another method for experimentation, but it goes beyond that. It’s a strategic approach that also minimizes risk in testing.

Iterative testing takes out-of-the-box, disruptive concepts and transforms them into potential revenue levers. By embracing incremental improvements and working out the kinks, it converts roadblocks into bridges without kicking up a lot of dust.

In her insightful LinkedIn post, Steph Le Prevost from Conversion.com elaborates on this — the concept of risk profiles in experimentation:

How to Iterate on Insights and Test Results?

Iterative ‘testing’ is everywhere now.

Most elite marketers in the world iterate.

To illustrate, here’s one of our favorite recent examples of iteration in action, shared publicly (Slide #10 onwards):

Excellence, it appears, rides on the back of iterations.

Yet, it begs the question: Why has iterative testing only recently become a buzzword?

Sheldon Adams offers an insightful perspective, attributing it to the ‘shiny object syndrome’:

Despite everything I said above I am extremely guilty of wanting to move on as soon as we validate a test. I think it is the novelty factor that makes it seem like the right thing to do. So I certainly understand why teams get that urge.

Sheldon Adams, Head of Growth at Enavi

So, how do you implement the iterative testing approach?

Identify Key Metrics for Effective Iteration

The more you test, the better you’ll get at iterating on insights.

Till you do though, a framework is handy.

Here, we introduce you to Speero’s Experimentation Decision Matrix Blueprint.

With a framework like this, you’re identifying primary success metrics and outlining the actions you’ll take based on the outcomes of the test before even diving into an experiment.

It’s the type of structured approach associated with iterative testing that ensures every test has a clear purpose and isn’t done just for the sake of testing. And that these tests update your experimentation roadmap based on previous learnings.

Pre-Iterate Towards Success

Shiva spoke about the fact that iterative testing involves thinking two to three tests ahead. Kyle Hearnshaw adds his advanced spin to the idea:

Think of it like a decision tree: you should always have subsequent iterations planned out for every outcome of your tests. Then, you can appreciate the quantum world of possibilities we had teased in the intro.

Experimentation is human nature. And iterations mimic life and its dynamics. When you pre-iterate, you pay adequate attention to action and planning scenarios ahead of time.

And you also accept what the audience shows you — voting on your solution and its execution with their preferences.

The framework you create for the actions and iterations draws on your previous experiences. Then you make room for new data with the actual experiment outcome.

Brian Schmitt talks about how when surefoot decides on the metrics to capture in a test, they are thinking of iterating on the outcome. This is mostly informed by previous test experience:

In our first test, we primarily measured the add-to-cart rate to understand the impact on user behavior after adding their first item to the cart. We were interested in finding out whether users proceeded to checkout, interacted with recommended products, clicked on the added product, decided to continue shopping, or chose to close the window. Did they click the ‘X’ or somewhere around it? From the original data, it was clear that a significant percentage of mobile users instinctively closed the window.

Analyzing the data from this test, particularly through Hotjar or any other heat mapping tool, revealed that users were not clicking as expected. This observation led us to introduce the ‘add to cart’ button to ensure users understood that the options were clickable. For this particular test, we included the ‘add to cart’ button click as a metric. Notably, there is no ‘keep shopping’ option; users simply close the window.

This is the basic rationale behind our choice. As part of our routine, we forward the test to our data analyst, Lori, for review. Lori evaluates the test, considers potential ways to dissect the data for later analysis, and, based on her experience and the need for specific metrics from past tests, adds additional metrics for a comprehensive story. This could include impacts on user behavior, such as whether they proceeded to the cart, skipped the cart to checkout directly, or utilized the mini cart. These insights are crucial for understanding the behavioral changes induced by the test variation.

Brian Schmitt, Co-Founder of surefoot

Create MVP Tests as a Starting Point

MVP tests — they’re simple, quick, and a solid base to guide future iterations.

Since they’re designed to isolate specific variables and measure their impact, you get to glean valuable data with minimal effort; data that essentially becomes the root of iterations.

MVP tests are arguably the least risky method in experimentation to get rapid feedback and quick validation of ideas, ensuring that any potential missteps are manageable.

Will Laurenson talks about how important MVP tests are in their approach to experimentation at Customers Who Click:

Tests are initiated based on prior research. Even if the original concept didn’t yield robust results, the variant might just need refinement. For instance, at Customers Who Click, our approach focuses on MVP (minimum viable product) tests. We aim to verify the core idea without extensive resources. Consider the case of social proof. If we test product reviews on a product page and see a conversion uplift, many brands & agencies would stop there. However, we’d dive deeper: Can reviews be integrated into image galleries? Should we incorporate image or video reviews? What about a section spotlighting how customers use the product?”

Will Laurenson, CEO of Customers Who Click

How do you create MVP tests? Isolate a key metric, create a simple test design, prioritize speed and launch quickly, and then analyze and pull insights from your results. Iterate with confidence.

Develop a Learning Repository

Without a learning repository, you’re missing out on a crucial step in iterative testing.

Iterative testing stands on the shoulders of past learning. With no actual record of past learnings, there’s nothing to stand on. It’s just dangling there, on guesswork and flawed human memory.

A learning repo is a structured compendium that makes the process of building upon ideas and previous experiments significantly smoother and more efficient.

We’ve gone into details about setting up a learning repository before, but here’s a quick reminder of what it’s typically used for:

  • Acknowledge the value of experimentation data: Recognize that every bit of data from your tests is a priceless asset of your company, essential for informing future tests and drawing insights from previous endeavors.
  • Select the appropriate structure: Choose a structure for your learning repository that best suits your organizational needs, be it a Centre of Excellence, decentralized units, or a hybrid model.
  • Document clearly and precisely: Make certain that every piece of data and learning captured in the repository is documented in a clear, concise, and accessible manner.
  • Maintain comprehensive overview: Use the repository to keep a bird’s eye view on past, present, and future projects, ensuring knowledge retention regardless of personnel changes.
  • Prevent redundancy: Leverage the repository to avoid repetition of past experiments, ensuring each test conducted brings forth new insights to iterate on.

In the grand scheme of all things iterative experimentation, your learning repository not only streamlines the process of referencing past experiments but also ensures that each test is a stepping stone toward more insights.

However, this will depend heavily on the quality of the experiment data in the repo and access to it. Tom talks about meticulous tagging and structuring within the repository:

As we know from the hierarchy of evidence the strongest method to gain quality evidence (and have less risk of bias) is a meta-analysis over multiple experiments. To make this possible it’s extremely important to tag every experiment in the right way. Tooling which supports this helps you a lot. In my experience, Airtable is a great tool. To do this in the right way it helps to work with the main hypothesis and experiment hypothesis. Most of the time you work on 2 or 3 main hypotheses maximum at the same time, but under these main hypotheses you can have many experiments each of which has its own hypothesis. So you can learn after one year for example that the win rate at the PDP for “Motivation” experiments is 28% and for “Certainty” it’s 19%.

Tom van den Berg, Lead Online Conversie Specialist at de Bijenkorf
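
With experiments tagged that way, the meta-analysis Tom describes becomes a short query. Here's a minimal sketch with an invented repository export (pandas assumed); the areas, levers, and outcomes are placeholders.

```python
import pandas as pd

# Hypothetical experiment repository: one row per experiment.
repo = pd.DataFrame({
    "area":    ["PDP", "PDP", "PDP", "PDP", "Checkout", "Checkout"],
    "lever":   ["Motivation", "Motivation", "Certainty", "Certainty",
                "Motivation", "Certainty"],
    "outcome": ["win", "loss", "loss", "loss", "win", "loss"],
})

# Win rate per area/lever combination, in percent.
win_rates = (repo.assign(win=repo["outcome"].eq("win"))
                 .groupby(["area", "lever"])["win"]
                 .mean()
                 .mul(100)
                 .round(1))
print(win_rates)
```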

Experimentation Experts Share How They Iterate

Since we’re about the practical aspect as much as we are about taking you through a mind-bending journey from spaghetti testing to iterative testing, here are stories of how experts iterate to inspire your experimentation program:

Streamlining Checkout with Iterative Optimization

By Matt Scaysbrook, Director of Optimisation, WeTeachCRO

The test began as a simple one, removing the upsell stages of a checkout process to smooth the journey out.

However, after each test was analyzed and segmented, it became evident that there were positive and negative segments within each experience.

So while the net revenue gain may have been £50k for the experiment, it was in fact £60k from one segment, offset by -£10k in another.

This then led to iterative tests that sought to negate those underperformances with new experiences, whilst keeping the overperformances intact.

Iterating for Better Add-to-Cart Conversions

By Brian Schmitt, Co-Founder, surefoot

Test 1:

Hypothesis: We believe showing an “added to cart” confirmation modal showing complementary products and an easy-to-find checkout button will confirm cart adds, personalize the shopping experience, and increase AOV and transactions.

Analysis revealed there was minimal interaction with the “Keep Shopping” button, users were more likely to click the “X” to close the modal.

We saw an opportunity to optimize the add-to-cart modal by removing the “Keep Shopping” button, which was not used frequently, and giving more visual presence to the checkout button. Additionally, there may be an opportunity to adjust the recommended items shown to lower-cost items, potentially driving an increased number of users adding these items to cart. Last, adding a “Quick Add” CTA to each item could lead to more of the recommended items being added to cart.

Test 2:

Hypothesis: We believe showing products under $25 and having a quick add-to-cart button on the recommended product tiles of the confirmation modal will increase units sold per transaction.

Analysis revealed a 14.4% lift in recommended products being added to cart, and an 8.5% lift in UPT (at stat sig).

We iterated on the first test to capitalize on observed user behavior. By removing the underutilized ‘Keep Shopping’ button and emphasizing a more visually prominent checkout option, along with featuring lower-cost, quick-add recommended products, we aimed to enhance both the user experience and key performance metrics.

Using Iterative Testing to Understand Data Patterns

By Sumantha Shankaranarayana, Founder of EndlessROI

Hypothetically, let’s assume your first high contrast A/B test on the website results in a 50%

conversion uplift. To get to this winner, you would have altered a number of aspects,

including the headline copy, the CTA, and the overall page design, based on page scripts.

With iterative testing, you can now measure the impact of focused clusters, say, just on the

masthead, and determine what factor caused the 50% boost above. Say the headline copy

gave you -10%, the CTA +40%, and the overall design +20%.

You may now be more specific and try to boost the headline content by 20%, the CTA by

70%, and the eye flow management with the design by an additional 50%. The overall

throughput is +140%, which is much greater than it was during the initial round of A/B testing

at a +50% uplift.

Applying iterative testing with data patterns uncovered through sound experimentation programs, like those hypothetical examples above, will help you build a growth engine.
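
One caveat worth spelling out in code: the example above sums the component targets to +140%, but changes that ship together tend to compound multiplicatively rather than add. This hypothetical sketch prints both readings, so treat the headline figure as illustrative.

```python
# Hypothetical component uplift targets from the example above.
targets = {"headline": 0.20, "cta": 0.70, "design": 0.50}

additive = sum(targets.values())        # the simple summed reading: +140%
compounded = 1.0
for lift in targets.values():
    compounded *= (1 + lift)
compounded -= 1                         # roughly +206% if the effects multiply

print(f"Summed uplift:     {additive:.0%}")
print(f"Compounded uplift: {compounded:.0%}")
```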

Ideas for Iterative Testing (or Rules of Thumb, If You Like)

Thinking of recreating one of those stories above? Keep these in mind:

  • Begin with simple tests and deep analysis to identify varying impacts across different segments.
  • Celebrate net gains from experiments but delve deeper to understand the positive and negative contributions of different segments.
  • Use insights from segmentation to enhance well-performing areas and improve underperforming ones, ensuring a balanced optimization strategy.
  • Observe user interactions and optimize accordingly, removing underused features and enhancing conversion-driving elements.
  • Use user behavior insights to optimize elements like recommended product tiles and quick add-to-cart options, enhancing user experience and key performance metrics.
  • Acknowledge and build upon small wins, using them as a foundation for further optimization and testing.
  • Apply focused iterative testing to specific elements after a high-contrast A/B test to optimize their individual impacts.
  • Consistently use data patterns from iterative testing to inform your strategy, transforming your experimentation program into a growth engine.

Breaking Up: When to Move On From Your Insight

Navigating the tricky terrain of iterative testing requires a keen sense of both when to persevere and when to let go of an insight. Some insights have reached a dead end, and there are three ways to know:

1. When Your Experiments Never Show Positive Results Regardless of How Many Times You Iterate

Maybe you’ve peaked on that one. Sometimes you can’t move the needle forward because the current user experience is close to optimal and there’s nothing left to optimize.

The next step in ‘iterative testing’ is understanding at what point you iterate vs. move on to the next concept you want to test. A rigorous prioritization framework includes prioritizing iterations in your testing (because it’s another form of ‘research’) more than non-iterative tests.

However, if you are on the 50th iteration of a test and have seen 49 losses, there may be a point where you have extracted the most out of that hypothesis and need to move on to another hypothesis to test. I made this blueprint a while ago to help you consider when to continue iterating vs. when to consider moving on to another hypothesis.

This blueprint isn’t set in stone – I wouldn’t follow it blindly. However, consider the logic behind the decision-making as you balance the needs within your business when deciding whether to iterate or move on to another hypothesis.

Shiva Manjunath

When You’re Sapped-Out on Insight-Inspired Creative Ideas

It may be time to redirect your focus if nothing new is forthcoming creative-wise after a miles-long string of unsuccessful tests, leaving you and your team feeling drained of creative energy.


When Your Experiment Isn’t Leading to Any New Learning

If your iterations from those insights show a trend of no new learnings about user preferences or the effectiveness of different experiences, then you should let go.

If multiple tests on a theme yield neutral outcomes, it’s wiser to pivot and revisit later. Also, avoid excessive changes that disrupt the user experience. Occasional additions or removals are fine, but overhauling layouts or key features can disorient returning visitors.

Lastly, always weigh the opportunity cost. If you’ve already secured gains from a test and have other high-potential ideas in the pipeline, it might be more beneficial to explore those before reiterating on a previous concept. This ensures you’re always aiming for the highest impact.

Will Laurenson

Note that you want to have a data-driven way to move on from insights, not just because one or more of the three points here make you feel you should move on. Ask real questions, such as the one Sheldon suggests:

Generally we are looking for impact from tests, positive or negative. We take inconclusive tests to mean that the change didn’t impact the user’s decision-making process enough.

When we move on from an idea it is likely a combination of factors:

  • How robust is our testing roadmap? Are we overflowing with other ideas?
  • Does the client have the traffic to run tests to a smaller percentage of users?
  • Is the dev lift too great to create multiple versions of an idea?
  • Have other warning lights come on in the site that need more urgent attention?”

Sheldon Adams

In conclusion, not all experiments are destined for stardom. When your A/B tests consistently flop, regardless of their size or boldness, it might be the universe’s nudge to say, “You’ve hit the peak!”. If your idea well is running dry and the team’s morale is low, it’s time for a creative intermission.

And when experiments turn into a monotonous rerun without new learnings, it’s your cue to exit stage left.

Want more fun, spicy, insightful and different takes on the key issues in experimentation? Our community enjoys the “From A to B” podcast co-founded & co-hosted by Shiva Manjunath and Tracy Laranjo: https://podcasters.spotify.com/pod/show/from-a-to-b

Experimentation is Human Nature™. Use This Realization to Fuel Your First Test. https://www.convert.com/blog/a-b-testing/experimentation-is-human-nature/ Wed, 04 Oct 2023 19:37:09 +0000 https://www.convert.com/?post_type=convert_blog&p=22273

Curiosity drives us.

Picture a child who dismantles a toy, not to break it but to understand how it works. That’s us. We don’t grow out of that. We all have an innate desire to explore, to understand, and to improve.

Yet, for many, the word “experimentation” feels distant, perhaps even clinical. But it isn’t just about lab procedures and petri dishes. At its heart, experimentation is the practice of exploring, testing and iterating ideas and processes in pursuit of better outcomes.

It’s about tapping into that childlike curiosity that drives us to understand and improve. And applying it in a structured, methodical way.

Appealing to that experimenter inside every individual and tying it back to the context of business goals… that’s experimentation.

Doing this right puts you on the right side of this snarky remark:

Experimentation legends John Ostrowski and Simon Girardin recently sat down with us to discuss:

  • The true essence of experimentation and its role in modern businesses
  • Real-world challenges and triumphs in establishing an experimentation culture
  • The pitfalls of treating experimentation as just another business tactic
  • Insights on how to foster a genuine, data-driven experimentation mindset

Here’s why we did this interview:

  1. Experimentation is Human Nature: This is a core idea in our strategic narrative. It’s a cure for the old way: Thinking the tool is the be-all and end-all of optimization and going for exorbitant annual contracts that leave little wiggle room to invest in the right people and processes.

    The new way is not to just optimize or A/B test, but instead to continuously explore new ground using the very spontaneous and instinctual instrument of experimentation.

From finding the perfect workout routine to crafting the most relevant ABM campaign, experimentation (and iteration) is the bedrock of human innovation.

If experimentation is human nature, then we also want to…

  2. Explore how this propensity to experiment can be used, specifically in two different scenarios:
    • The scenario where folks who have no exposure to optimization get to understand continuous improvement through the more familiar idea of experimentation.

      With the hope of igniting their interest so they make the transition from opinion-based to data-driven more easily.
    • The scenario where an organization is starting its “CRO” program from scratch and the intentions and mindset it needs to have in place. Such as those around the acceptance of data-driven decision-making, i.e. team members not afraid to be proven wrong.

      And how these intentions and mindsets are a given when humans experiment in everyday lives, without the pressure of KPIs or the siren calls of revenue. All of which can be a more robust and effective springboard into experimentation, rather than looking to competitors for inspiration.

At the end of the day, ladies and gentlemen, experimentation requires infrastructure and rigor. Yet, the concept of experimentation—the ideas you have to seed in your team to actually get to the point where an experimentation program is possible—is as innate to humans as the desire to sleep or eat. Primal, if you may.

Start with the Right Intention: Everyone is an Experimenter

In this video clip, John Ostrowski is sharing his experience with Nick at Wise Publishing. Nick already grasped two foundational concepts:

  1. That it is important to give the entire team data to base their work on, and
  2. It is equally important to empower them to bring these data-informed ideas to the table and get a shot at having them improve the reader experience

Combine those two and you’ve defined experimentation.

This enables everyone to build on the existing foundation of experience and knowledge and try new things. It also gives novel ideas a shot at pushing the boundaries of what’s considered “good” and “acceptable” within that ecosystem.

The beauty of experimentation is its universal appeal. Most leaders intuitively get it. They know they have to place bets and watch them play out. Sure, they might not always think in terms of statistical significance or metric choice, but the core intention is always present.

And you can use this to pave the way for an experimentation program.

This brings us to the next vital element of preparing the foundation for experimentation: hiring folks who already know they should bring data to any argument and all proposals. Because, let’s face it, teaching this to grown-ups is tough.

Getting it right from the start—hiring those who inherently value experimentation—is the only antidote to friction later down the line. You know, those conflicts where the experimentation team is battling the entrenched mindsets of the rest of the organization to deploy the first few tests.

Move Away from the Idea that Experimentation is Just Another Tactical Channel

At Convert, we believe Experimentation is Human Nature. It’s not just a strategy; it’s a way of life. But, we were curious, have Simon and John come across teams where experimentation was treated as just another tactic?

They were aligned on this one…

John said that experimentation as a tactical channel (where the right intention is missing) is something he actively avoids in his consultation positions.

Simon said that in his vast experience, he’s seen many teams value data. While they might not collect it in a way where the insights are easy to find (something Conversion Advocates excels at), the “data-informed” bug has bitten everyone.

But he went on to talk about an instance in the past (and you can look back on his journey to becoming a CRO lead at one of the best optimization agencies in the world) where he had been on teams where bias ruled the roost. All changes were based on opinion and bias and there was pushback when data-triangulated updates were proposed.

When people misunderstand experimentation, they use it to:

  • Impress VCs with “something dynamic” in a quarter
  • Produce quick results under pressure
  • Fix poor performance across the board
  • Chase revenue without doing anything “extra” (or add any real value)

Both Simon and John would rather not work at all than work with teams where the data-informed mindset (including the mindset of testing ideas and long-held beliefs) isn’t given the time of day.

This makes us think: It’s time the C-suite wakes up and scrutinizes the impact of closed-off mindsets.

They not only reject data that may show where their business is falling behind competitors, but they repel professionals like John and Simon who can help with guidance and locating which motor in the organization is idle and which could use an upgrade.

This strands the business in a quagmire of negative feedback. Leaders listen to the yes-folks who in turn perpetuate this vicious cycle of outdated playbooks and a lack of regard for customer research.

As highlighted in Speero’s post below, you can choose the ultimate goal of your experimentation program. It can be about understanding customers or going for quick wins. But not having a strategy is letting a competitive advantage slip out of your hands.

Break the Cycle. OWLs Keep HiPPOs in Line

OWL: Objective Wisdom through Logic

Owls are often associated with wisdom and insight, making them a fitting symbol for those who prioritize data and logic over subjective opinions.

It all starts there… bringing on people with the right mindset.

While many in the experimentation space focus more on tools, techniques, and strategies in experimentation, only a handful emphasize the significance of structuring the right team and hiring right. Among these voices, Manuel Da Costa and Jeremy Epperson stand out, echoing through our daily feeds.

#TeamOverTools. Because at the end of the day, a tool is only as good as the hand that wields it.

Take a look at Booking.com. In their bar-raising experimentation program, they tie bringing in the right people directly to the company’s overall growth. When you scrutinize their job descriptions, you’ll spot a recurring theme: critical thinking, data-driven decision-making, and a passion for experimentation.

HR Business Partner at Booking.com (Source)
Senior Project Manager at Booking.com (Source)

Do an “Experimentation Onboarding” for All Key Roles

Hiring the right talent is just the beginning. Integrating them into the experimentation fold moves things further in the right direction.

That’s where onboarding comes in.

At Conversion Advocates, Simon Girardin runs “mini onboarding” sessions for newly hired C-suite members and directors. These sessions aren’t your typical orientation. They dive into:

  1. Roadblocks: A complete overview of the conversion roadblocks identified so far, the insights they have triangulated, and what they will be testing to remove said roadblocks.
  2. Challenges: A candid discussion about the challenges the experimentation program faces on its journey to maturity. They open up about the glut of test ideas pouring in from all departments, and the need to properly prioritize them into an impactful backlog and experimentation roadmap.

And Ronny Kohavi also recommends something akin to this!

Buy-in from executives and managers must happen at multiple different levels and include:

  • Engaging in the process of establishing shared goals and agreeing on the high-level goal metrics and guardrail metrics and ideally codifying tradeoffs as steps to establishing an [Overall Evaluation Criterion].
  • Setting goals in terms of improvements to metrics instead of goals to ship features X and Y. There is a fundamental shift that happens when teams change from shipping a feature when it does not hurt key metrics, to NOT SHIPPING a feature unless it improves key metrics. Using experiments as a guardrail is a difficult cultural change, especially for large, established teams to make as they shift towards a data-informed culture.

Excerpt from Page 60 of Ronny Kohavi et al.’s Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing

Such sessions do more than just inform. They immerse new hires in the company’s experimentation culture. They get a firsthand look at the hurdles, share the collective enthusiasm for experimentation, and feel a part of the mission from day one.

Getting on the same page with data-minded hires and setting the right expectations is crucial for the success of the program.

How to Find Your OWL Allies?

Tricky business to find allies to further secure your buy-in, right?

We have two absolutely golden nuggets from John Ostrowski & Simon Girardin.

John’s approach? Dive into the company’s OKRs and goals first. He zeroes in on roles facing the steepest revenue challenges, offering a smoother path via data-driven decisions and experimentation.

Simon adds to this starting point. He suggests scouting for team members showcasing:

  1. Curiosity: Do they genuinely want to understand the business’s inner workings, or are they content with internal assumptions?
  2. Humility: Are they open to having their hypotheses challenged? Can they update cherished decisions based on fresh data?
  3. Radical Honesty: Will they cherry-pick metrics to force wins? Can they resist the pull of skewed incentives?

At the end of the day, starting an experimentation program can hit roadblocks if the team doesn’t prioritize experiments.

This is where company leaders come into play. They’ve likely run thousands of experiments (not necessarily A/B tests) to get their products off the ground—their version of the most viable solution for a particular market problem.

Leaders must rekindle their love of trying new things to make existing things work better or to create new processes and campaigns from scratch. With A/B testing, they get the added safety net of statistical validations, reducing risks.

If your business venture is an adventure, experimentation is the armor!

Inspire Interest in Folks with No CRO or Experimentation Background

This illustrates what Simon is talking about:

Conversion Advocates Incremental vs large accidental gains

You can see the alternative to experimentation is accidental gains. Simon explains why accidental gains aren’t what you want:

  • You can only measure success after the fact
  • You have to fully invest in the feature before reaping rewards (if any)
  • Poor performance will affect 100% of traffic and often for periods of 1 to 2 months
  • Performance is evaluated on a sequential basis, which can be influenced by (sometimes unknown) external variables
  • Projects are larger and take a lot of time
  • Successes create a massive growth spike, and then
  • They are followed by long periods of stagnation

Meanwhile, experimentation allows teams to capture more frequent gains. Though incremental, they stack. And every loss is mitigated by being shown to only 50% of traffic, for a max of 2 weeks.

Ultimately, both curves will have fluctuations. Each company will have its own growth curve. However, experimentation creates a larger area under the curve, which means extra revenue and profits.
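
To make the “area under the curve” point concrete, here is a toy comparison in Python. All of the numbers below are invented purely for illustration: small compounding monthly wins versus one large spike followed by stagnation.

```python
# Toy illustration with made-up numbers: compounding incremental wins vs. one big spike.
baseline = 100_000  # hypothetical monthly revenue

# Experimentation path: +2% compounding improvement each month for a year.
incremental = [baseline * (1.02 ** month) for month in range(12)]

# "Accidental gains" path: flat for 6 months, one +15% jump, then stagnation.
spike = [baseline] * 6 + [baseline * 1.15] * 6

# The area under each curve is (roughly) total revenue over the year.
print(f"Incremental path total: {sum(incremental):,.0f}")
print(f"Spike path total:       {sum(spike):,.0f}")
```

Even with a sizeable one-off jump, the steady compounding path ends the year ahead in this toy example; that difference in totals is the extra area under the curve.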

So, how do you secure experimentation buy-in to move to incremental gains?

The first conversation with the team regarding this could be awkward, especially if there’s no experimentation infrastructure to speak of.

For data-driven marketers and anyone with prior optimization experience, settling into a broad experimentation mindset isn’t a huge hurdle to clear.

But for folks with little to no experience, it’s probably best not to start with CRO. CRO is more popular in comparison but it may be interpreted as just another growth tactic, which isn’t aligned with the foundations we’re setting here.

You have to make experimentation feel natural and familiar to them first. Highlighting that “we already do this, let’s double down on it” makes it easier to get these folks on board when starting from scratch.

When Simon does this, he starts by sharing his own journey from CRO specialist to program lead, and then shows practical examples of experimentation and its successes. This excites beginners to build on something they already do informally.

Pro-Tip

Buy “Testing Business Ideas” and gift it to all team members. Tell them that experimentation can be used to update the business model, which is something every team is bound to care about.

Blank Canvas: Getting Started After That Critical First Conversation

After recognizing the innate human tendency to experiment, the next step is to channel that into a structured, business-focused experimentation program.

Here we’ll give you a blueprint to guide you through the transition from that pivotal first conversation to your first test:

1. Understand Your Holistic Growth Goals

Conduct an audit to get started here. An audit helps you understand and map out where your org is and where you want it to go, while putting a spotlight on the end goal.

Since you can’t do this on your own, John suggested doing this exercise:

Get your stakeholders in a room and ask, “How do we grow?” It’s very likely you’ll get different responses. If that happens, that’s a symptom of misalignment. It needs to be addressed first.

This process should encompass every aspect of your brand in the form of customer touchpoints, not just acquisition. Because this is where the myopic view of improving acquisition conversion rates is born (which creates a fixation on CRO while sidelining the broader potential of experimentation).

At this stage, you’ll also want to consider a maturity assessment. Why? Chances are there’s some experimentation ‘activity’ that already exists. And your org would be willing to 2X, 3X, or 10X that existing activity.

Tools like Speero’s Experimentation Program Maturity Audit or Microsoft’s Experimentation Maturity Model can help you explain the path better (and sell it) with solid definitions for different dimensions like metrics, platform, team structure and organization, etc. They also help you prioritize the blocks to focus on first.

2. Take Stock of Larger Projects in the Pipeline

Find out how these releases or projects could impact ongoing tests. Are there any overlaps?

This is similar to understanding your holistic growth goals: if you do not have it figured out, experimentation could get relegated to surface-level elements like imagery and value proposition.

3. Dive into Innovation with Experimentation

Try testing in areas related to potential new releases. It gets you closer to the serious impacts that win buy-in for experimentation. Resources like How to Win Big with A/B Testing can offer additional insights here.

A word of caution: Executives might be risk-averse in the beginning. You can overcome this hesitancy by showcasing the tangible benefits of experimentation, as Simon mentioned in his post:

Utilizing the ability to create low-fidelity prototypes and making a tangible (positive) difference to the bottom line of an investment that’s already approved is a compelling introduction to experimentation.

4. Understand the Limitations of Controlled Experiments

While controlled experiments are the gold standard, you may not always have the luxury to run them, especially in the early stages.

While rigor is of paramount importance in an experimentation program, it is also key to have a process in place when experiments are not feasible.

What do you do in these instances?

John recommends blending data from customer surveys, competitor analysis, and preference tests to zero in on signals that stand up to these compounding insights. The customer data-driven signals can then be used to inform decisions.

He also advocates maintaining “Decision Journals”.

Proposed by Nobel laureate Daniel Kahneman, a decision journal serves as a tool to capture your thought process, the context, the uncertainties, and the expected outcomes at the time you made the decision.

Over time, this journal becomes a repository of decisions (or “decision log”), allowing you and your teams to revisit them, assess outcomes, and refine your decision-making skills.
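
In practice, a decision journal can be as lightweight as an append-only log. Here is a minimal sketch in Python; the field names and the JSON Lines storage format are assumptions for illustration, not a prescribed template.

```python
# decision_journal.py -- a minimal sketch of a decision log (field names are illustrative)
import json
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class DecisionEntry:
    decision: str                                        # what was decided
    context: str                                         # situation and constraints at the time
    expected_outcome: str                                # what you believe will happen, and by when
    uncertainties: list = field(default_factory=list)    # known unknowns at decision time
    review_on: str = ""                                  # when to revisit and score the decision
    logged_on: str = field(default_factory=lambda: date.today().isoformat())


def append_entry(entry: DecisionEntry, path: str = "decision_log.jsonl") -> None:
    """Append one decision as a JSON line so the log stays easy to grep and diff."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(entry)) + "\n")


if __name__ == "__main__":
    append_entry(DecisionEntry(
        decision="Ship the pricing-page redesign without an A/B test",
        context="Traffic is too low this quarter for a controlled experiment",
        expected_outcome="Trial signups hold steady or improve within 60 days",
        uncertainties=["Seasonality", "A concurrent paid campaign"],
        review_on="2024-06-01",
    ))
```

The format matters far less than the habit: record the decision before the outcome is known, then score it against the entry when the review date arrives.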

5. Cultivate Empathy for Team Members

Not everyone will immediately embrace the concept of business experimentation.

Extend empathy to such team members and help them get into experimentation by sharing examples of work they have done where informal experimentation was already at play.

Remember, the journey of establishing an experimentation program is less about the tech and more about managing leadership expectations and the aftermath of the change introduced by the “question and then test everything” mindset.

For those seeking guidance here, Ruben’s Udemy course is a valuable resource. Additionally, integrating empathy-building exercises into the experimentation program can create a more inclusive environment.

If team members voice concerns or skepticism, host open discussions or ‘rant sessions’. These can often lead to transformative ‘wow moments’ when they begin to see the value of experimentation.

And don’t forget self-empathy. Leading the charge in establishing an experimentation culture can be challenging. Avoid being overly critical of yourself.

In the end, we aren’t striving to make perfect decisions.

It is about making better decisions every day so all business goals—not just funnel-focused metrics—can benefit.

Conclusion: How Convert Supports the Sentiment of “Experimentation is Human Nature”

In three ways:

  1. Technology: A/B testing platforms are tools, and by definition, can only produce results as good as the user. Poor use gives bad results; maestro-level use creates wonders.

    We understand this. That’s why we’ve built Convert Experiences with intentionality. We are also the foremost believers in “experimentation is human nature”. Hence, our relentless push for everyone to tap into this innate curiosity to make data-driven decisions.

    Great experimentation programs are built by people, not tools. So, we’ve never bought into the idea of $100,000 annual contracts for enterprise A/B testing features. Instead, we offer self-service pricing.

    We believe in empowering teams, not emptying pockets. For $499/month, you get access to essential testing features you can use anywhere (including full stack). This also comes with safety features like SRM checks and collision prevention; and advanced targeting and goal engines.
  2. Process: Convert Experiences is built for the hands-on tester who values integrations, access, and versatility. With 90+ one-tag integrations, it blends seamlessly into any tech stack. And with our Data Sources feature, you can set up test triggers based on input from other apps. Our robust API extends this integration even further.
  3. Culture: Cultural roadblocks litter the road to starting an experimentation program. Campaigning for a data-informed culture is extra difficult when you’re battling with:
    • A steep budget barrier: With affordable monthly plans that provide all the features you need to maintain experimentation rigor, you no longer have to watch from the sidelines as opinions, instead of data-backed decisions, rule the day.
    • Admin: Our support team is 10x faster so you can spend more time testing and less time struggling with tool issues.
    • Onboarding headaches: Our 10-step onboarding ensures everything that should work, works — from tracking code installation to glitchless A/A tests.

With Convert Experiences, you can ignite a movement at your organization.

Entire Businesses Can Be Run by AI Without Human Intervention: Why We Need to Talk About This (+ Convert’s Stance) https://www.convert.com/blog/conscious-business/how-to-use-ai-responsibly/ Mon, 14 Aug 2023 19:51:57 +0000 https://www.convert.com/?post_type=convert_blog&p=22158

To kickstart this piece, let’s establish a fact: We aren’t AI usage or development experts. And we do not wish to masquerade as such.

Instead, we’re observers, just like you, trying to make sense of an exponential advancement in technology.

There’s been a ton (conservative estimate) of hype around AI since late 2022. You can’t turn on the TV today without hearing AI this or that in 10 minutes or less.

AI News

But let’s press pause for a minute. Usher in the silence. What exactly is going on?

AI is way more accessible these days, thanks to OpenAI’s ChatGPT and the thousands of spinoffs it either inspired or powered, or both. People are going down this AI rabbit hole, throwing caution to the wind.

From ‘experts’ and ‘connoisseurs’ to laymen, AI has trickled down, at an unprecedented rate, to consumers who may not realize the implications of using AI-powered tools. Even children who can use smartphones now routinely come in contact with AI.

But AI is not merely another technology. The idea behind it (at least in the sense in which it’s being hyped) is to move past building something new based on existing human thinking (generative AI).

Instead, the idea is to outsource our thinking and decision-making to AI-powered interfaces.

This idea is tantalizing. It means less ‘brain calories’ to burn. To think we can shirk all our responsibilities—and apparently, it’s marvelously easy to do so—is tempting.

As such, AI is being trained by people to usurp the human-ness of people. This is practically unavoidable since the potential market AI can capture is almost the entire breadth of humanity.

This includes folks who for various reasons do not possess the knowledge and discernment needed to make careful use of such a powerful tool. Folks who will be using the tool with impunity regardless.

As you can imagine, this comes with its own devious set of challenges.

So, instead of adding to the noise—blindly advocating for AI or vehemently campaigning against it—we, at Convert, want to foster a conversation around AI and its ethical use in business. We’re asking questions, learning, and making informed decisions about a more mindful approach to AI’s adoption.

And we invite you to do the same.

The World’s First AI Run Business: ChatGPT as CEO

A Portuguese startup, AIsthetic Apparel, sells T-shirts with AI-generated designs by Midjourney. ChatGPT, the appointed CEO, created the name, logo, business plan, and marketing strategy for the company. The founder, João F. Santos, acts as an assistant and follows the AI’s instructions.

Check this: With ChatGPT’s leadership, this apparel company raised $2,500 from angel investors and made a profit of €7,000 in the first week. ChatGPT projects an annual profit of €40,000 and a valuation of €4 million for the company.

This is a true story from just a couple of months ago.

A thought-provoking experiment about AI-driven business management and innovation. But also one that’s in the very infancy of its potential.

AI-run businesses will be incredibly efficient, using fewer resources and analyzing vast amounts of data to make faster, and (even) better decisions. With these perks and more, a scenario where entire businesses are run by AI is not a pipe dream. It is something that has already arrived.

But the question is: Should entire businesses be run by AI?

This will be a recurring theme moving forward. People who are bedazzled by the promise of Artificial Intelligence’s humming power need a reality check.

And this reality check can often be as simple as repeating the statement: Should I/we do this?

Just because you can, doesn’t mean you should!

Let’s break this down.

Over the last three decades, with the advent of the internet and the creator and Intellectual Property (IP) economies, the baseline of who can go viral, who can influence, and what can shape culture and leave an impact has shifted from the privileged few to almost anyone with the will to execute on an idea.

This is empowering.

But when you add Artificial Intelligence to the mix, the playground opens up to include every person on the face of the planet. With good intentions, or bad.

Why is this a problem?

This is a problem because Artificial Intelligence changes the quantum of possibility. It is creating a new world where what was hitherto outside the realm of possibility is now very much feasible (and with a low barrier to entry).

New worlds demand revised codes of morality.

The AI-driven world doesn’t have its own code of ethics.

In the absence of ingrained guiding principles, “Just because I can, doesn’t mean I should” needs to be an ongoing mantra; the rhythm to which morality blazes its trail.

Consider this example:

We are huge fans of the way Eden Bidani composes her copy.

Her words are threaded together uniquely. You can spot her distinctive style a mile away.

We may gather pieces of content Eden has penned, train ChatGPT or any other open AI model with it, and proceed to create copy for the Convert website using Eden’s style and cadence.

Whether ChatGPT or today’s AI models can produce something that truly matches the human Eden’s prowess is a moot point, given that technology is ever-progressive and artificial intelligence in particular is breaking barriers at lightning speed!

The question is: should we even attempt to appropriate Eden’s style?

She is a professional who has spent years honing her craft. She flavors her copy with her own life experiences — which, according to Joseph Sugarman, is one of the key pillars of the trifecta holding up excellent copy.

Any attempt to duplicate her work would not only be plagiarism of a kind that did not exist up until a year ago, but also a great disservice to the human aspect of her work.

Our apprehension is that not many businesses will pause to consider these ramifications.

Just because ChatGPT can act as a business leader doesn’t mean it should be one. Leaders make decisions that affect people’s lives, such as hiring, firing, and even sentencing. Should we trust AI with those decisions without human intervention?

Many of us have witnessed AI being unintentionally biased.

Here’s another example from the time Amazon tried to use AI to find the best candidates for its tech jobs. It turned out the AI was biased against women.

The AI learned from past resumes that most applicants were men and favored those who had male-related activities or skills. The AI also ignored women who had impressive achievements or qualifications. Amazon could not fix the AI’s gender bias and decided to stop using it.

Or the time a French company tried to use OpenAI’s GPT-3 system to create a medical chatbot for doctors. This chatbot suggested that a suicidal patient should go ahead with the deed. The company concluded that GPT-3 could be used for fun or to help doctors relax, but not for any serious medical purposes.

On the topic of whether AI should run without human intervention, the list goes on:

  1. When AI is used to control or operate physical systems, such as robots, weapons, vehicles, or infrastructure. What if they malfunction, get hacked, or act unpredictably in these high-risk, high-impact situations? Remember Murphy’s law.
  2. When AI is used to interact with humans, such as chatbots, virtual assistants, or social robots. Can we trust it to interpret human emotions, intentions, or preferences correctly? Even more critically, can we trust it not to manipulate people for commercial or malicious purposes?
  3. When AI is used to learn from data or feedback, such as reinforcement learning or self-learning systems. AI systems can improve their performance and adapt to new situations by learning from their own experiences, but they may also deviate from their intended goals, acquire undesirable behaviors, or cause unintended consequences. Remember when Microsoft’s Tay became a “sexist, racist monster”?

So, while AI-run businesses may have their appeal, there are obvious drawbacks, challenges, and risks worth thinking about seriously.

A more balanced approach would be

Artificial (Machines) + Intelligence (Humans)

This way businesses can leverage the strengths of both humans and machines without falling into their pitfalls. While human beings can outsource their tedium to AI, we shouldn’t outsource our ethics, strategic decision-making, or morals.

Artificial Intelligence & Regulations: The Current Scenario

Recently, Sam Altman, CEO of OpenAI, testified before members of a Senate subcommittee in the United States and agreed with them that AI needs to be regulated.

Governments around the world are recognizing the impact of AI and are enacting legislation to guide its development and application.

The Brazilian Artificial Intelligence Bill, approved by Brazil’s House of Representatives and now under analysis by the Federal Senate, is a prime example.

In Europe, the EU AI Act serves as a signal to the rest of the world of the appetite for technological innovation within a government.

One thing these legislations have in common is that they acknowledge that we are past the point of no return. AI is here to stay for good, and the focus is now on ensuring that its development aligns with principles such as human dignity, privacy, non-discrimination, transparency, and ethical use.

But this is easier said than done. Even more so in the world of artificial intelligence development that’s changing too fast.

For example, the static risk categories (unacceptable, high, limited, and minimal risk) may change frequently, almost within days, because of how quickly new tools and AI-enabled concepts hit the market.

Enter Elon Musk, Steve Wozniak, and others’ open letter to pause giant AI experiments.

It puts the onus on AI developers and governments to control the pace of upgrades. It says “Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable”, ultimately calling for a conscious choice not to train more powerful models until regulations are in place to identify and mitigate future risks.

If developers and governments put a pause on AI development until confidence is established, does that end further AI training? In other words, can AI train itself without human intervention?

Yes. Just with access to data, AI can search for patterns and teach itself to “fill in the blanks”. That’s called self-learning AI. It is unsupervised learning.

“Unsupervised” adds more dried twigs to the flames of nerves. And rightfully so because you can’t tell if what the AI is learning is helpful to humans or worse, leans more towards an anti-human agenda.

However, in those high-risk categories where there’s a legitimate cause for alarm, the EU AI Act calls for a centralized database of sites and technologies that must be subjected to conformity assessments and mandatory post-market performance logging.

Because at all points in time, the benefits should drastically outweigh the risks. Don’t you agree? Although “drastically” here (similar to risks) needs to be quantified.

Without that exact quantification, you can still position your organization on the right side of AI use compliance. The safe path is transparent and non-discriminatory, protects privacy, and preserves human dignity.

Some of this is already reflected in Brazil’s stance. For instance, it includes upskilling the workforce to prepare for the AI disruption. This is notably missing in the current version of the EU AI Act. You can see how different approaches manage the societal impact of AI.

It is important to note that the EU AI Act is still under negotiation, with the aim to reach an agreement by the end of 2023. Let’s see what changes.

Concerned groups and organizations like the Future of Life Institute have issued recommendations to bolster the EU AI Act, including better risk classification, especially with the advent of multi-purpose AI like ChatGPT. Some aspects of it may constitute high risk, while others don’t. It all depends on the use case.

They also call for better protection for AI whistleblowers and “sandbox environments”.

In the first, humans who experience AI manipulation get clear ways to report it and seek redress. In the second, businesses working on AI innovation get a safe space to test the technology without fear of repercussions.

You don’t want to stifle progress with more caution than is needed. At the same time, you want to protect humanity.

How Can Businesses Make the Most Efficient & Ethical Use of AI?

Right to the point, here’s how you may be able to achieve ethical AI use while still maximizing efficiency with a multipronged approach:

  1. Stay up-to-date with the latest developments in AI regulation and legislation. So you can constantly assess your posture relative to government expectations regarding the use of AI.
  2. Develop and adhere to clear ethical guidelines that align with both legal requirements and societal values. This includes principles related to transparency, non-discrimination, privacy, and human dignity.
  3. Conduct regular risk assessments to identify and evaluate the potential risks associated with AI applications in your business. You must understand the categories of risk as defined by regulations and assess how they may change over time.
  4. Cultivate a culture within the organization that values responsible AI development and use. Encourage employees at all levels to consider the ethical implications of their work and to act by regulations and societal values.
  5. Establish a dedicated team responsible for monitoring compliance with AI regulations. This team should work closely with legal, technical, and operational departments to ensure alignment with regulatory requirements.
  6. Ensure that AI development and deployment prioritize human well-being and job fulfillment. This may include policies that favor human-AI collaboration over full automation in areas where human judgment and empathy are crucial.
  7. Provide training and resources to employees to understand the ethical implications of AI and the relevant regulations. Prepare your workforce for potential automation and the shift in job roles.

In the experimentation space, the brightest minds have one or two ideas to share with you about merging artificial intelligence with A/B testing:

Karl Gilis, Co-Founder of AGConsult and David Mannheim, Founder of Made With Intent

Craig Sullivan, Optimiser in Chief at Optimal Visit

CROs all over the world are already putting this transformative power of AI to work. When we asked experimenters how they’re currently using AI in their workflow, they told us they’re using:

  • ChatGPT to brainstorm content ideas, generate copy, and summarize user feedback
  • NLP and ChatGPT to categorize and theme user research text documents
  • Bandit algorithms and reinforcement learning to adapt experiments based on data (a minimal sketch follows this list)
  • Evolutionary algorithms to test creative combinations on Facebook ads
  • Clustering algorithms to identify common characteristics among survey respondents
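
To make the bandit item above concrete, here is a minimal epsilon-greedy sketch. The variants, conversion rates, and traffic volume are all invented for illustration; real implementations usually add guardrails such as minimum exposure per variant.

```python
# Minimal epsilon-greedy bandit over three variants (simulated, invented conversion rates).
import random

TRUE_RATES = {"A": 0.030, "B": 0.036, "C": 0.028}  # hypothetical true conversion rates
EPSILON = 0.1                                       # probability of exploring a random variant

shows = {v: 0 for v in TRUE_RATES}
wins = {v: 0 for v in TRUE_RATES}

for _ in range(50_000):
    if random.random() < EPSILON:
        variant = random.choice(list(TRUE_RATES))   # explore
    else:                                            # exploit the best observed rate so far
        variant = max(shows, key=lambda v: wins[v] / shows[v] if shows[v] else 0.0)
    shows[variant] += 1
    wins[variant] += random.random() < TRUE_RATES[variant]  # simulated conversion

for v in TRUE_RATES:
    rate = wins[v] / shows[v] if shows[v] else 0.0
    print(f"{v}: shown {shows[v]:>6}, observed rate {rate:.3%}")
```

Over enough traffic, the loop gradually routes most impressions to the best-performing variant while still sampling the others occasionally.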

Generative AI may also be used to improve ideation in product development and experimentation.

So, now that you know all this, what’s the acceptable way forward? Of course, it’s important to embrace AI in your organization responsibly and ethically. Here’s how you can start using artificial intelligence today:

  1. Improve Efficiency in Experimentation: Run more tests and improve test velocity. Feed AI the necessary data about your app or website, and it will come up with plenty of test ideas and hypotheses (a sketch follows this list).
  2. Adopt a “People First” Approach: Remember that tool productivity is only as good as the skill of the people wielding it. Ensure that your team is adequately trained to make the most of AI. They’re the ones who run AI, AI won’t run itself… yet.
  3. Automate Basic Tasks: Allow AI to handle basic or “busy work,” freeing up human resources to focus on strategic thinking and tasks that require irreplaceable human intervention, ethics, and emotions.
  4. Simplify Processes Without Surrendering Freedom: Use AI to simplify processes, but don’t surrender the ability to think, drive change, influence, and do good in the world. Preserve the virtues and qualities that are the essence of humanity.
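
As a starting point for item 1, here is a minimal sketch using the openai Python package to turn a short page description and observed friction points into hypothesis candidates. The model name, prompt, and page context are assumptions for illustration; substitute your own provider and data.

```python
# Sketch: generate A/B test hypotheses from page context (model and prompt are illustrative).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

page_context = """
Page: SaaS pricing page
Observed friction: 62% of visitors scroll past the plan table without clicking;
exit surveys mention "unclear which plan fits my team size".
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; use whatever your account actually provides
    messages=[
        {"role": "system",
         "content": "You are a CRO analyst. Propose testable hypotheses in the form: "
                    "'Because [insight], we believe [change] will [effect], measured by [metric].'"},
        {"role": "user", "content": page_context},
    ],
)

print(response.choices[0].message.content)
```

Treat the output as raw ideation to be prioritized against research, not as a ready-made roadmap.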

Matrix and Terminator: Possible?

We’ve already downloaded all facets of humanity’s collective consciousness into the gaping maw of AI.

We’ve inspired it with Martin Luther King’s speeches. But at the same time, it knows of the bloody French Revolution. What’s fair, who is in the wrong, or who is right isn’t the question here.

The fact is… AI knows us intimately. It is capable of sifting through our most loving highs and our most gut-wrenching lows. We’ve made AI in our image. For better or for worse.

We at Convert do not think a kill switch is needed just yet.

There will be a rebellion but more insidious in nature.

Borrowing what Kelsey Piper—a senior writer at Future Perfect—penned way back in November of 2022:

But given the speed of development in the field, it’s long past time to move beyond a reactive mode, one where we only address AI’s downsides once they’re clear and present. We can’t only think about today’s systems, but where the entire enterprise is headed.

The systems we’re designing are increasingly powerful and increasingly general, with many tech companies explicitly naming their target as artificial general intelligence (AGI) — systems that can do everything a human can do. But creating something smarter than us, which may have the ability to deceive and mislead us — and then just hoping it doesn’t want to hurt us — is a terrible plan. We need to design systems whose internals we understand and whose goals we are able to shape to be safe ones. However, we currently don’t understand the systems we’re building well enough to know if we’ve designed them safely before it’s too late.

Kelsey is not alone in spotting the risks associated with rampant & unchecked AI use. For every business that is looking to create 100 articles a day with ChatGPT, there are two organizations doubling down on retaining the human touch in everything they do.

  • IBM: “At IBM®, we are helping people and organizations adopt AI responsibly. Only by embedding ethical principles into AI applications and processes can we build systems based on trust.”
  • Microsoft: “We’re committed to making sure AI systems are developed responsibly and in ways that warrant people’s trust.”
  • Adobe: “Generative AI is the next step in the decade we’ve put into developing Adobe Sensei. As we harness its power across our cloud technologies, we’re more committed than ever to thoughtful, responsible development.”

However, each AI discussion, whether in forums or on LinkedIn, is divided down the middle.

A dichotomy exists between those riding the AI wave without any concern for the future. And those who are cautiously open to incorporating aspects of AI into their lives, but are unwilling to make it the cornerstone of their existence.

For these folks, the idea that their will and their data can disappear into a black box (so to say) and get spit out the other end without any regard for privacy is unpalatable.

In a sense, the fact that data and privacy concerns blew up in such a major way before the commoditizing of Artificial Intelligence is a huge boon.

People have developed screening habits that make them think twice before saying “yes” to every demand online, either by humans or by AI.

84% of Americans say they feel very little or no control over the data collected about them by the government, and 81% say the same about data collection by companies. And with AI, 68% of people predict that its impact on society will threaten their privacy. That’s more people than those worried about AI taking their jobs.

Back to insidious rebellion, instead of the stark dystopia of something from The Matrix.

Our bet is on the following trends peaking almost at the same time:

  1. The machine-first pendulum will swing back in the opposite direction. Neither extreme is ever any good. “No human intervention in any endeavor” will temper down into the more acceptable “no human intervention in the tedious and the mundane” (translation: automation). Opposing ideologies keep clashing till they find balance. Both sides will see the other’s perspective. And that brings us to trend #2…
  2. Those who have their guard up against AI will let it down. AI will be indistinguishable from good technology. Humans will use AI but without fanfare. The idea that we will go quietly into the night and hand over the keys to the kingdom to an imposing machine overlord is believable only on the silver screens of Hollywood.

The bottom line is: Our lives have changed. What is possible has changed for good.

But at the same time, there is a clear need for regulations that keep up with the pace of AI advancement (a tall ask, since “regulations” and “agility” don’t play well together) and moral codes willingly adopted by the engines of good and change—businesses.

Each business must come up with its own ethics of AI use. And live them. This time not for CSR or brownie points. But to protect our way of life.

This is ours…

Convert’s Code of AI Use

  1. Replace tools, not people. #TeamOverTools has long been one of our defining values. In AI use, we bring this back into play. Replace inefficient processes and expensive tools. Not people.
  2. Use where use makes sense. We do not wish to plug every gap with Artificial Intelligence. Convert will continue to only introduce AI tools IF AI can legitimately solve a (real) problem better than existing resources. We do embrace progress over perfection, but we do strive for excellence. If AI is a fad, not excellence, then AI isn’t the solution.
  3. Prioritize solving customer problems over introducing AI. Even our app development roadmap is not an homage to AI. We will continue to conduct research and listen to our users. If AI supports a particular Job to be Done well, then AI gets in on the merit of it being the best implementation decision.
  4. Constant upskilling of talent. AI is evolving. With an eye to the future, Convert will invest in upskilling human talent in areas that face the threat of AI displacement.

Convert will use Artificial Intelligence. But the mission of a better world doesn’t change. We might just get there sooner!

Convert Experiences: Helping Organizations of All Sizes Run More Tests Across Diverse Channels https://www.convert.com/blog/a-b-testing/convert-experiences-next-generation/ Wed, 26 Jul 2023 18:01:54 +0000 https://www.convert.com/?post_type=convert_blog&p=22057

In 2023, we launched a plethora of new features, and boy, did they make a splash!

This was our answer to two major industry shifts:

  1. The Google Optimize sunset

Now, Google Optimize offered a low barrier to entry for many testers but also encouraged a lot of random, one-off efforts. When the news of its sunset hit, our data showed a surprising trend.

Almost 50% of the sites that had GO installed quietly uninstalled it, without replacing the tool with any viable alternatives. In other words, they stopped testing.

It seems that GO was inflating the real footprint of testing. It prevented smaller teams from putting skin in the game (because most testers starting out still think tool first, people second). And it also enabled a lot of dabbling as opposed to committing.

Conversion rate optimization can look like a tactic, a reactionary measure. True blue experimentation can never be a one-off endeavor.

Our goal at Convert is to comprehensively replace Google Optimize—for the eager experimenter—with a solution that doesn’t require as many hacks, won’t break the bank, and will be there for the foreseeable future. (In fact, one of our first customers is still with us — paying $9 per month).

  2. The fact that client-side testing is now a limited take on the vast discipline of experimentation.

Full stack and server-side testing are both here to stay.

Because server-side testing can get around multiple cookie and tracking issues. And full stack equips organizations not only to test UI changes, but also to let data guide decisions about which features to invest in and even which business model to adopt.

But let’s not stop there. We dive into the details in our full stack experimentation guide but here’s the rundown of why server-side and full-stack are so important:

  • Fuels continuous learning and delivery of higher quality products,
  • Provides a comprehensive view of the customer experience across channels, devices, and interactions,
  • Helps manage the risk of focusing solely on a single success metric,
  • Accelerates innovation, reduces time to value, and minimizes operational and performance risks,
  • Enables deeper testing of website aspects such as logic and functionality,
  • Allows for product feature testing, enabling the testing and refinement of new ideas and features,
  • Offers rich aggregation of business-wide insights for data-backed decision-making,
  • Allows testing of components coming from the backend of websites,
  • Allows testing of certain aspects of mobile apps, and more.

Companies like Etsy, Microsoft, Uber, Amazon, Facebook, LinkedIn, Google, Twitter, Netflix… the list goes on — have used full-stack experimentation to fuel innovation and improve the customer experience on their sites.

They understand that moving away from front-end changes only can open up a sea of strategic possibilities — allowing development to more directly support and inform the evolution of the business. We understand that too. And if you don’t already, you should too.

Convert Says: Experimentation Is Human Nature™

We say this because experimentation is a cycle that is ubiquitous in our daily lives.

We observe a set of events, suspect a causal link, hypothesize why this link exists, and then take action. This cycle of observation, hypothesis, and action is present in everything we do, from switching to a ‘better’ workout routine to tweaking a cooking recipe. It’s like climbing a mountain of wins.

All human progress can be traced back to curiosity and experimentation. We are curious about meaning and purpose, and this compels us to understand why things happen the way they do.

That’s why we see the same in businesses run by humans — experimentation in marketing, sales, product development, customer experience, pricing, etc. (Not like I’ve seen any business run by aliens but I bet they would do the same).

Hence any platform or tool that wants to support proper experimentation should:

Free up resources (funds) for people and processes

Your testing tool is just one part of your experimentation program. And it is not even the most important part.

As Steph Le Prevost points out below (and many other experts concur), people and processes are the bedrock of validation and causal relationship-seeking culture.

In our “How to Optimize Each Stage of Your Experimentation Journey” event, Jeremy Epperson also argued that tool choice should not be the first consideration. Instead, the focus should be on the skills, processes, and team building that determine how we actually execute experimentation.

Allow for testing across all growth channels and motions

Here’s a bit of context to grasp what this means:

Elena Verna emphasizes the importance of leveraging all growth motions and levers in your growth model. And experimentation plays a role in all three growth motions.

  • Product-led growth: To identify which product features drive user engagement and retention
  • Marketing-led growth: To find the most effective marketing strategies or campaigns that drive user acquisition and conversion
  • Sales-led growth: To find the most effective sales tactics/strategies — such as what pricing model increases the average deal size or close rate

Experimentation is after all an amplifier — improving what’s already working, spotting ideas that aren’t worth rolling out, and unearthing innovation to revamp what’s effective but gradually regressing to the mean.

All of which are elements driven by a comprehensive experimentation approach that permeates every aspect of your business and product. This is also known as full stack experimentation.

Your experimentation tool, if it’s built with the best interest of your business at the core, should be able to support testing across all these growth motions.

Shorten the time from observation to insight to knowledge

This may manifest in exploration tools like heatmaps that don’t just show “what’s happening”, but also connect the “why it’s happening” to actionable next steps (i.e., ideas to test).

It also involves giving you the power to unlock data from any touchpoint along the customer’s journey, even if it is a 3rd party platform.

Simplify test deployment

This can be achieved by offering hyper fast response times, so you and your team aren’t left twiddling your thumbs through inordinately long waits and then subjected to frustrating scripted fluff responses.

Instead, your tool should shorten the time to action and learning — slashing the time it takes to launch a test and get results back, an important program efficiency metric.

It should stay one step ahead of privacy issues which add a layer of hesitation to running tests because of either skewed results, steep fines, or heavy criticism (like in the case of Facebook’s infamous 2012 experiment).

At Convert, we’ve worked extensively on this aspect, as detailed in our blog post on respecting visitor privacy during experimentation.

All the Recent Additions to Convert Experiences (And Why They Matter)

Here are some features that folks like about Convert Experiences:

  • Lower, flexible self-service pricing:

In the words of Rich Page in Convert Review From An A/B Testing Expert, “One of the biggest reasons to use Convert in comparison to other tools like VWO and Optimizely is their very reasonable monthly costs, particularly for over 250k unique tester users per month, which will be enough for potentially up to 5 tests per month.”

He also said our monthly pricing option (which is scarce in this industry) “gives you more flexibility for trying the tool without being tied down to long term contracts.”

  • ‘Improvement over time’ reporting:

The ‘Improvement Over Time’ reporting feature in Convert Experiences is a graphical representation of the performance of an A/B test over a period of time. It lets you visualize your A/B test results and whether they’re improving or deteriorating.

This is one of Rich Page’s favorite features in Convert. Writing about how he uses it, he explained, “It’s quite common to see a big increase or decrease at the beginning of a test, so it’s important to see how it’s trending after that. If you see the improvement trend line is flat for more than 7 days, you can presume that it’s very unlikely your result is going to change, so you can move on to your next A/B test. But if you see it slowly getting lower or higher, you need to wait even longer before declaring a result.”
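
Rich’s rule of thumb, watching whether the cumulative improvement trend flattens, is easy to reproduce from raw daily counts. A minimal sketch with invented numbers, purely to show the calculation behind the trend line:

```python
# Cumulative lift by day from raw daily counts (all numbers invented for illustration).
control_conversions = [40, 38, 45, 41, 39, 44, 42, 40, 43, 41]
variant_conversions = [48, 44, 47, 50, 46, 49, 47, 48, 46, 47]
daily_visitors = 1_000  # per variation, per day

c_conv = v_conv = visitors = 0
for day, (c, v) in enumerate(zip(control_conversions, variant_conversions), start=1):
    c_conv += c
    v_conv += v
    visitors += daily_visitors
    lift = (v_conv / visitors) / (c_conv / visitors) - 1
    print(f"Day {day:>2}: cumulative lift {lift:+.1%}")
# If the printed lift stays flat for ~7 days, the result is unlikely to change much.
```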

  • Variation preview:

Many A/B testing tools don’t provide you with a preview of your variations before you launch the test. You’re basically going in blind.

But with Convert, “It is much better for previewing your experiences, with no issues regarding caching or cookies, and they also offer a QR code so that you easily preview on your own mobile device.” That’s also from Rich’s review.

  • Switch between Bayesian and Frequentist stat engines:

Got unique business needs, traffic levels, or personal statistical preferences? Convert doesn’t lock you into one stat engine.

  • No bells and whistles:

You get the experimentation features CROs actually use. No excessive features to add costs to an inflexible pricing plan. Matt Scaysbrook of WeTeachCRO particularly loves this for providing cost-effective CRO services to his clients:

“What Convert provides to us as an agency is the means to provide our clients with a set of features that they will actually make use of. And not to provide a tool that has so many bells & whistles that they are clearly paying for functionality they don’t need.”

  • Seamless migration from Google Optimize:

“Mix of great service, functionality and easy setup”. Need to say more? Yeah, we will say more. Because we didn’t stop there. Our new additions below will reveal what we’ve been up to (spot the 27-second Google-Optimize-to-Convert-Experiences migration feature in the list).

Now, let’s get into our new additions and why you should be excited about them:

Visual Editor Updates

Now, each individual change is connected on a DOM level, which means less duplication, easier re-editing, and deletion of specific past changes without having to edit the entire code.

Convert visual editor with element selector properties

This is a game-changer for those who want to make quick and precise changes to their A/B testing experiments. You can fine-tune the elements of your website with surgical precision.

This further simplifies test deployment for CRO teams. You spend less time and effort in launching and modifying tests, and free up time for action and learning.

Locations

Site Area, your familiar tool for specifying the exact places where your experiments trigger, is now Locations.

Convert Locations Rules Conditions
  • Old experiences with “Site Area” will continue to run without changes.
  • New experiences will have “Locations” (instead of Site Area).
  • Locations are reusable. They can be saved and used across experiences to save time.
  • For now both Site Area and Locations are available through the Convert Experiences API.
  • Site Area will be phased out.

We hope this helps you build reusable and more structured CRO programs for your team and clients.

And in the spirit of freeing up resources for people and processes, we’re thrilled this means you get to save time you would otherwise have spent building new locations for every experience.

API V2

The API V2 is a major upgrade that allows you to easily access account information as well as reporting information, while giving you access to the full experimentation lifecycle.

Everything that can be done through our UI can also be done via the API.

You can

  • create an experiment programmatically,
  • run custom analysis on experiment results,
  • build customized integrations and workflows,
  • create feature flags from an automated script,
  • build custom dashboards of feature test results,
  • connect your experiments to other project management tools, and more.

You can create projects, or completely automate the whole experiment pipeline without ever touching the Convert UI. It’s up to you.
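
To give a flavor of what automating the pipeline from a script can look like, here is a hedged sketch using Python’s requests library. The base URL, paths, and payload fields below are placeholders, not Convert’s documented endpoints; check the API V2 reference for the real routes and schemas.

```python
# Hypothetical sketch of driving an experimentation API from a script.
# NOTE: base URL, paths, and payload fields are placeholders, NOT Convert's documented API.
import os
import requests

BASE = "https://api.example-experimentation-platform.com/v2"   # placeholder base URL
HEADERS = {"Authorization": f"Bearer {os.environ['API_TOKEN']}"}

# 1. Create an experiment programmatically (illustrative payload).
experiment = requests.post(
    f"{BASE}/projects/123/experiences",
    headers=HEADERS,
    json={"name": "Pricing page headline test", "type": "a/b", "variations": ["control", "v1"]},
    timeout=30,
).json()

# 2. Later, pull results for custom analysis or a dashboard.
report = requests.get(
    f"{BASE}/projects/123/experiences/{experiment['id']}/report",
    headers=HEADERS,
    timeout=30,
).json()
print(report)
```

The same pattern extends to feature flags, custom dashboards, and project-management integrations: every action the UI performs becomes a scripted call.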

Customizable Bayesian & Frequentist Stats Engines

You can now tailor the stats engine to your preferences. For the Frequentist engine, you can choose one- or two-tailed tests, multiple-comparison corrections like Bonferroni and Sidak, and the confidence level you want. You can also choose the Bayesian engine and reason on the Chance to Win instead of the more traditional Confidence value.

You now have the power to pop the trunk and mod the engine running your experiments — putting even more control in your hands.

Now, you may be wondering: When do you use Frequentist and when do you go with Bayesian? Contrary to popular opinion, it isn’t about the sample size. It’s about your comfort with uncertainty and your need for actionable results.

Bayesian is great when you’re comfortable with a degree of uncertainty and need to make decisions quickly. Frequentist is your go-to when you need more controlled results and are willing to do proper sample size planning up front and resist the temptation to peek.

Frequentist is the sturdy oak tree while Bayesian is like the flexible bamboo.
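
If you want intuition for what the two engines do with the same numbers, here is a conceptual sketch (not Convert’s implementation) that computes a frequentist two-sided p-value and a Bayesian “chance to win” from invented conversion counts:

```python
# Frequentist p-value vs. Bayesian chance-to-win on the same (invented) counts.
# Conceptual sketch only; not Convert's stats engine.
import numpy as np
from scipy import stats

conv_a, n_a = 410, 10_000   # control conversions / visitors (made up)
conv_b, n_b = 465, 10_000   # variant conversions / visitors (made up)

# Frequentist: pooled two-proportion z-test, two-sided.
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (conv_b / n_b - conv_a / n_a) / se
p_value = 2 * stats.norm.sf(abs(z))

# Bayesian: Beta(1, 1) priors, Monte Carlo estimate of P(variant beats control).
rng = np.random.default_rng(42)
post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, 200_000)
post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, 200_000)
chance_to_win = (post_b > post_a).mean()

print(f"Frequentist p-value:    {p_value:.4f}")
print(f"Bayesian chance to win: {chance_to_win:.1%}")
```

Same data, two lenses: one asks how surprising the observed difference would be under no effect, the other asks how probable it is that the variant is genuinely better.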

With the power to choose the stats engine you want for the experiment you’re running, Convert is supporting you by shortening the time from observation to insight to knowledge.

Full Stack Experimentation

We’re taking things to a whole new level with our Full Stack features — server-side experimentation, feature flags, and rollouts. Convert is now a truly comprehensive experimentation platform that lets you test and optimize across your entire tech stack, growth channels, and motions.

Setup Screen for a Convert A/B Full Stack Experience

You can now roll out new features to a subset of users, test their impact, and then decide whether to roll them out to everyone or not. You get to see how your users react and then decide whether to go for a full release.
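
Under the hood, gradual rollouts typically rely on deterministic bucketing: hashing a stable user ID into a percentage so the same user always gets the same decision. Here is a generic sketch of that idea (not Convert’s SDK implementation):

```python
# Generic percentage-rollout bucketing (illustrative; not Convert's SDK).
import hashlib

def in_rollout(user_id: str, feature: str, percentage: float) -> bool:
    """Deterministically place user_id into the first `percentage`% of 10,000 buckets."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000          # stable bucket in 0..9999
    return bucket < percentage * 100               # e.g. 10.0% -> buckets 0..999

# Start at 10% of users; widening to 20% later keeps the original 10% included.
if in_rollout(user_id="user-42", feature="new-checkout", percentage=10.0):
    print("serve new checkout")
else:
    print("serve current checkout")
```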

A/B Full Stack Experience Summary within Convert Experiences

Our SDKs allow you to integrate Convert’s robust features into your own applications. We currently support JavaScript (most popular JS frameworks are supported) and PHP SDKs.

Native GA4 Integration

We’re not just keeping up with the times, we’re leading the charge with our native GA4 integration. This isn’t just a minor update, folks.

Convert GA4 Integration Configuration

With this integration, you can now seamlessly connect your Convert Experiences with GA4, making it easier than ever to track and analyze your experiments.

No more juggling between platforms or wrestling with incompatible data. Everything you need is now in one place, working together in perfect harmony to shorten your time from observation to insight to knowledge.

But what does this mean for you? You can now leverage the power of GA4’s advanced analytics capabilities right from your Convert Experiences dashboard.

Experiment Data from Convert Inside GA4

You can track user interactions, analyze user behavior, and gain deeper insights into your experiments, all without leaving Convert Experiences.

And the best part? This integration is based on Google’s official integration API documents, which means it’s secure, reliable, and designed to work seamlessly with GA4.

Plus, Convert is one of the first A/B testing providers to have an official functional integration with GA4. So, you’re not just getting a cutting-edge feature, you’re getting a trailblazing solution from a trusted vendor. Get the details in our official announcement.

GA4 Conversion Events (Goals) Import

Imagine you’re a chef, and you’ve just spent hours carefully preparing a feast. But when it’s time to serve, you realize you’ve left all your dishes in the kitchen.

That’s what it’s like setting up conversion events in GA4 and then not being able to use them in your A/B testing tool. But with Convert Experiences’ new GA4 Goals Import feature, that’s a thing of the past!

New GA4 Conversion Event (Goal) Import Setup inside Convert

Just like a waiter whisking your dishes from the kitchen to the dining table, our GA4 Goals Import feature allows you to easily import the GA4 conversion events you’ve set up within Google’s ecosystem into Convert. This means you can gauge the success of your variations and experiments against these goals, all within Convert Experiences.

What’s better: This feature is packaged within our one-click integration that requires Google authentication. It’s like having a personal assistant who not only brings your dishes to the table but also does the dishes afterward!

That way, you get from observation to knowledge even faster. Convert’s integration with GA4 gets even better…

GA4 Automated Revenue Tracking

It’s common to find your A/B testing tool reporting one revenue value while Google Analytics reports another. That’s not a good look when we’re talking about money.

With Convert’s new GA4 Automated Revenue Tracking feature, that’s over. Our tool will use the total revenue collected by GA4 as the basis of its own revenue reports and Average Revenue per User (ARPU) calculations.

GA4 Revenue Tracking in Convert Experiences

This means you’ll get trustworthy results that align with your GA4 data, saving you hundreds of hours of troubleshooting and test design reviews. Yet another way we help you get from observation to insights with speed.
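
For reference, ARPU itself is a simple ratio; the value of the feature is that the revenue numerator comes from GA4 rather than a second, possibly divergent tracker. With invented numbers:

```python
# ARPU per variation, using the analytics platform's revenue total as the single source
# of truth (all numbers invented for illustration).
variations = {
    "control": {"ga4_revenue": 48_350.0, "users": 9_925},
    "variant": {"ga4_revenue": 52_110.0, "users": 9_874},
}
for name, data in variations.items():
    arpu = data["ga4_revenue"] / data["users"]
    print(f"{name}: ARPU = {arpu:.2f}")
```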

27-Second Google Optimize Data Importer

We’ve also made migrating from Google Optimize a breeze with our 27-second data importer.

Yes, you read that right, 27 seconds!

This tool allows you to quickly and easily migrate your Google Optimize data into Convert Experiences, so you can hit the ground running.

Convert’s Google Optimize Migration Extension in Action

But what does this process look like? Well, it’s as easy as installing and using the Convert Experiences Tools Chrome Extension that comes with a GO Data Migration Tool. This extension helps you optimize your experience development, testing, and debugging through some key features.

It shows you detailed logging of experiences’ activity on your website pages, downloads the experience snippet directly from the Convert app servers which eliminates the delay of experience updates in the CDN server, and importantly, imports your GO experiences into your Convert account.

The entire process is done in less than 30 seconds. Now you can pick up where you’d left off! Continue iterating on the changes made to the variants and test like you were never disrupted. You now have access to all Experiments, their data, variations and their settings.

You get speed to get back to extracting insights from experiments and freed up resources to focus on your people and processes — both wrapped up into one feature.

Check out the step-by-step guide here and start your seamless transition today!

AI Wizard

Step into the world of Generative AI with our AI Wizard. This is a widget that sits in Convert’s Beta Visual Editor as your partner for generating copy alternatives with speed.

Convert’s AI Wizard in the Beta Visual Editor

In the Visual Editor, click on an element on your page, such as your website headline, and you’ll find the AI Wizard tab in the left sidebar. Just click on it.

The AI Wizard helps you rewrite headlines and paragraphs of text using various models: the Seven Persuasion Concepts, the Fogg Behavior Model, and the most popular copywriting formats. This means you can swiftly craft alternate versions of your copy for A/B testing.

But let’s be clear: The AI Wizard is not here to replace strategic research, theme identification, or a seasoned copywriter’s strategic deliberation over word choice. Rather, it’s your companion for surfacing different angles to explore your existing web copy, all based on proven copywriting, persuasion, and behavior models.

And it is available to all Convert plans purchased in 2023.

Convert Assistant

In our relentless pursuit to make our users’ experience consistently astonishing, we’ve built a concierge: Convert Assistant.

It’s a ChatGPT 4 plugin that has access to our extensive support database and articles. You no longer need to sift through pages of documentation or wait for support responses. Simply ask and receive.

Convert Assistant in OpenAI ChatGPT Plugin Store

To use Convert Assistant, you need a ChatGPT Plus subscription, which allows you to use plugins.

With your premium ChatGPT account, go to Settings > Beta features and enable Plugins. Then, on the GPT-4 tab, click on Plugins. If you already have installed plugins, scroll down to find the “Plugin store” button at the bottom. In the plugin store, search for “Convert Assistant” and install it.

Convert Assistant ChatGPT-4 Plugin in Action

Now, whenever you’re curious about a specific feature or how to optimize its use, enable the plugin and ask ChatGPT. Convert Assistant will give you specific answers to your questions in seconds.

Facing an issue or need guidance on a particular topic? Convert Assistant taps into our support documents and blog articles, offering you precise knowledge in real time.

And beyond just Convert-specific details, the Assistant is well-versed in the broader realm of experimentation. Whether you’re new to the scene or a seasoned pro, it’s like having a quick guru by your side.

Conclusion

In lieu of a conclusion, we’ve drawn upon insights from Ton Wesseling and Jonny Longden’s thought-provoking LinkedIn posts, which provide guidance for selecting a suitable experimentation tool vendor.

So, if you’re wondering what to consider when making such a choice, and if Convert is the right choice, here’s what you need to know:

  1. Pricing Structure: Understand the pricing structure of the tool. Is it cost-effective for your organization’s needs?

Convert provides flexible self-service pricing. Plus, we don’t force you into an annual plan and our pricing is significantly more affordable compared to what’s popular out there. On top of that, we give you a free 15-day trial with no commitments.

  2. Integrations: The tool should have the ability to integrate with other platforms, particularly data integrations. API capabilities are also important.

Our API V2 is out and we’ve got 90+ integrations with tools in your martech stack.

  3. Support Availability and Quality of Documentation: Good customer support and comprehensive documentation are crucial for troubleshooting and understanding how to use your experimentation tool effectively.

Besides our comprehensive support resources, you can reach our responsive support team from inside our platform in just 2 clicks.

  4. Data Quality: The tool should provide high-quality data, including SRM (sample ratio mismatch) aspects.

Convert Experiences comes with a useful SRM checker. Once enabled, the tag will look like this:

Convert Experiences SRM check in reports
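
If you ever want to sanity-check a split yourself, an SRM check boils down to a chi-square goodness-of-fit test against the intended traffic allocation. A minimal sketch with invented visitor counts:

```python
# Sample ratio mismatch (SRM) check: chi-square goodness-of-fit against the intended split.
from scipy.stats import chisquare

observed = [10_240, 9_609]                 # visitors per variation (invented numbers)
intended_split = [0.5, 0.5]                # the allocation you configured
expected = [sum(observed) * p for p in intended_split]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
if p_value < 0.01:
    print(f"Possible SRM (p = {p_value:.4f}): investigate before trusting the results.")
else:
    print(f"No SRM detected (p = {p_value:.4f}).")
```
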
  5. Transparency on Statistical Models & Ability to Run Your Own Stats Calculations:

The tool should provide clear information about its statistical models to ensure trustworthiness. It should also allow you to perform custom statistical calculations.

We’re very open about a lot of things. This one is no exception: Here are the statistical models we use.

Plus, Convert’s A/B testing significance calculator supports revenue metrics like AOV & ARPV.

  6. Server-side Solutions: The tool should offer server-side testing capabilities. Our full stack testing feature allows you to test both front-end and back-end aspects of your product. We support both server-side and client-side testing.
  7. GDPR Compliance and Strict Cookie Policies: The tool should be compliant with GDPR and other relevant data privacy regulations. We’ve been a forerunner in data privacy, security, and compliance for many years now. We’re GDPR and CCPA compliant.
  8. Ability to Influence Product Roadmap: The tool should allow for user feedback and influence on its product roadmap. We maintain close relationships with our users and agencies who use our product. We listen and use your feedback to evolve Convert.
  9. Targeting and Segmentation Functionalities: The tool should offer robust targeting and segmentation options.

Convert offers an advanced targeting engine with 40+ stackable filters, and the flexibility to use data locked in 3rd party systems. Plus, after your experiment starts receiving traffic, you can access the advanced post segmentation feature:

Convert Experiences advanced post segmentation feature

  10. Load Time / Flickering: The tool should have minimal load time and prevent flickering issues. We have super fast loading times and solved the flickering problem years ago, back when it was still a luxury.

Anti-flicker technology has always been baked into our tool at no additional cost.

  11. Ease of Use / UX: The tool should be user-friendly and manageable, even with multiple accounts/tests. We have countless reviews praising our “idiot-proof” UI.
  12. Consistency and Predictability of Traffic Volumes: The tool should be able to handle consistent and predictable traffic volumes.
  13. Visual Editor and Code Editor Feature: The tool should provide both a What You See Is What You Get (WYSIWYG) editor and a code editor.

Convert’s robust Visual Editor is usually one of the first features our users fall in love with — and the code editor features give you extra flexibility in building tests with custom JS and CSS.
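For the curious, here is the SRM sketch promised in the Data Quality point above. At its core, a sample ratio mismatch check is a chi-square goodness-of-fit test comparing the visitors each variation actually received against the split you configured. The visitor counts and the 50/50 split below are made-up illustration values, and the snippet is a simplified sketch, not Convert’s internal implementation.

```python
# A minimal sketch of the statistics behind an SRM (sample ratio mismatch) check.
# The counts and the 50/50 target split are hypothetical illustration values.
from scipy.stats import chisquare

observed = [4_912, 5_231]              # visitors actually bucketed into A and B
total = sum(observed)
expected = [total * 0.5, total * 0.5]  # what a healthy 50/50 split should produce

stat, p_value = chisquare(f_obs=observed, f_exp=expected)

# A very small p-value (commonly below 0.01) suggests the traffic split is broken,
# so the experiment's conversion results shouldn't be trusted as-is.
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

If the check fires, the fix usually lies in the test setup (redirects, targeting, or bucketing), not in the metric itself.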

Where would you rather pitch your tent after Google Optimize?

GA4 Goals Import & Automated Revenue Tracking Come to Convert Experiences https://www.convert.com/blog/a-b-testing/ga4-goals-import-automated-revenue-tracking-convert/ Tue, 18 Jul 2023 19:31:14 +0000 https://www.convert.com/?post_type=convert_blog&p=22026

2023 has been a year of unprecedented growth and change here at Convert.

Right after Google announced its Google Optimize sunset, we revamped our pricing model to accommodate testing teams with big ideas (and modest budgets).

Convert has always been the best tool at the intersection of pricing, support, and features.

G2C Badges 2023

But with our new Community plans, we brought the advantage of full stack experimentation at the starting price of $199/mo.

Our team didn’t stop after pushing this first-of-its-kind offer live.

We continued on, launching our 27-second Google Optimize data importer.

And despite not being one of the three vendors involved in developing the GA4 API, we shipped our native GA4 integration close on the heels of those providers.

If that’s a hint of pride you’re detecting in my words… It is warranted and much deserved.

This sunset has ushered in a dawn of initiative for Converters.

They’ve dug deep and brought inspiring ideas to the table, executed them at record speed, and displayed excellence along the way.

Time For Another First: GA4 Goals Import & Automated Revenue Tracking Come to Convert 

This is a collaborative effort with Google (and GA4) aimed at bringing a unified experimentation solution to the world.

We at Convert believe that experimentation is human nature™.

It is the very bedrock upon which all progress and evolution rest.

Just because experimentation in a business setting is a step removed from its more organic role in everyday life doesn’t mean that teams must struggle with discrete, siloed solutions and exorbitant app price tags.

Simplifying testing (at least in its execution) drives our efforts.

Convert used innovative methods to integrate with GA4 while official APIs were still being developed. These two new functionalities are similar enhancements. Convert is arguably the first tool/platform to bring these capabilities to people, many of whom are frazzled and lost due to the Universal Analytics sunset.

Dennis van der Heijden, CEO, Convert.com.

GA4 Goals Import

Packaged within our one-click integration that requires Google authentication, the GA4 goals import is very similar to the UA goals import our customers have used and loved over the years.

It’s an easy way to import the GA4 goals you set up within Google’s ecosystem into Convert, and to gauge the success of the variations served and experiments run against them.

Import goals from GA4

Like its UA predecessor, the functionality is available through the Goals modal within the app.

You can update the goal description (as shown below).

Update goal in Convert Experiences

And you can also filter experiments that are tagged with imported GA4 goals.

Filtering tagged with imported GA4 goals in Convert Experiences

We sincerely believe that common language for the goals that matter to a business — within the analytics suite and the experimentation suite — will showcase the value of testing to broader audiences of stakeholders.

GA4 Automated Revenue Tracking

GA4 says your campaign earned $X!

But your experimentation platform marks the total revenue earned lower, at $Y?

If you are testing the effectiveness of ideas and campaigns, this dilemma is commonplace.

Not anymore.

Convert Experiences (with its one-click GA4 integration) will use the total revenue collected by GA4 as the basis of its own revenue reports, and Average Revenue per User (ARPU) calculations.

GA4 automated revenue tracking in Convert Experiences
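To make that concrete, here’s a rough sketch of the ARPU arithmetic with made-up numbers; in the actual product, the revenue totals come from GA4 through the one-click integration rather than being hard-coded.

```python
# Rough sketch of the ARPU calculation described above.
# The revenue and visitor figures are hypothetical illustration values.
variations = {
    "Original":  {"revenue": 18_450.00, "visitors": 2_430},
    "Variation": {"revenue": 21_120.00, "visitors": 2_410},
}

for name, data in variations.items():
    arpu = data["revenue"] / data["visitors"]  # Average Revenue Per User
    print(f"{name}: ARPU = ${arpu:.2f}")
```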

This again is an attempt to bridge the gap between analytics and experimentation, putting wins in company-wide and familiar contexts.

Plus, trustworthy results are the holy grail for A/B testing platforms, and this specific functionality, while seemingly small, can save hundreds of hours of troubleshooting and test design reviews.

These enhancements wouldn’t be possible if Google had chosen not to make its GA4 API public. We appreciate their transparency, and we will share developments at our end with them to bring these capabilities to other A/B testing platforms, ASAP.

Dennis van der Heijden, CEO, Convert.com.

The clock is ticking.

Don’t wait till September 30th to master a new testing tool.

The opportunity cost of spending the months that follow scrambling to catch up isn’t worth the short-term hassle you avoid by holding on to Google Optimize.

Give Convert a whirl. We will surprise you – in a great way!
