Our checklist for the perfect testing set-up

The buzzword ‘Data Driven’ is on everyone’s lips. However, numerous decisions regarding the optimization of a website or an online shop are still made based on gut instinct. Sometimes supervisors want to see quick results; at other times they are convinced that the new version will definitely be better received by users than the current one. In the end, no one can tell which modification actually led to the result. Only by continuously testing the changes made to your website will you be able to make valid statements about their effect.


This checklist shows you the key questions on testing, how to set up your testing correctly, and what pitfalls you should be aware of.

1. When should you test a campaign?

Minor changes to the website do not necessarily have to be tested right away. However, elements that users interact with hold potential for improvement, making it worthwhile to frequently test different variants against each other. Having said that, it is important that a sufficient number of visitors are included in the target group, otherwise it may not be possible to obtain meaningful results in the end.


2. Which elements can I evaluate?

Generally speaking, nearly all components of a website can be tested. These include, for example:

  • Navigation: display, item order, design
  • Landing pages and product detail pages: display and sequence of content elements, display of recommendations
  • Category pages and search results pages: display of results, additional visual elements, call-to-action buttons, filter options
  • Shopping cart: cross-sell and upsell measures
  • Check-out process: payment and quick check-out options, thank-you page
  • Forms: number of form fields, required fields, layout

What can and should be tested varies from industry to industry. However, every website holds potential for testing and optimization.

3. Which types of tests are there?

The most common type of test is probably the A/B test. A/B testing, also known as split testing, is an experiment in which two different variants are tested against one another. Variant A is the control group, the original version of the page. Variant B contains one or more elements that are modified from the original page.
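
The split itself can be illustrated with a minimal sketch. It assumes that every visitor has a stable ID (for example from a cookie) and that traffic should be divided evenly; the function name assign_variant and the test name used below are hypothetical and do not describe how any particular tool implements the split.

```python
import hashlib

def assign_variant(user_id: str, test_name: str, variants=("A", "B")) -> str:
    """Deterministically map a visitor to one variant of a test."""
    # Hash the test name together with the user ID so that the same visitor
    # always sees the same variant of this test, while different tests
    # split the audience independently of each other.
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Hypothetical usage: the control group sees the original page (variant A).
variant = assign_variant("visitor-123", "cta-button-color")
```

Because the assignment depends only on the ID and the test name, a returning visitor is not switched between variants in the middle of a test. Passing more than two entries in variants would extend the same idea to several variants.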


Multivariant tests are an extension of A/B testing, which is why they are often called A/B/n tests. The ‘n’ stands for the unlimited number of possible variants that can be tested against each other. Note, however, that as the number of variants increases, the size of the target group must also increase in order to obtain valid results. If you want to draw quick conclusions, you should therefore test fewer variants against each other.


The third type is the dynamic, self-optimizing test, also called a multi-armed bandit test. It eliminates a major weakness of A/B and multivariant tests: those tests distribute users across the variants for the entire duration and, in the best case, end with a clear winner that can then be shown to 100% of users. By that point, however, a lot of traffic has already been directed to the losing variant(s). The multi-armed bandit test instead uses machine learning and AI to optimize continuously while the test is running, so that traffic is dynamically directed to the best-performing variants. Over time, the test learns which users respond best to which options and, thanks to segmentation, can personalize the variants for each user. Here too, a sufficient amount of traffic on the page is a prerequisite.
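
To make the idea of dynamic traffic allocation more tangible, here is a minimal simulation sketch. It uses Thompson sampling, one common bandit strategy, and is not meant to describe the algorithm any specific tool uses. The function name, the simulated conversion rates and the number of rounds are assumptions chosen purely for illustration.

```python
import random

def thompson_sampling(true_rates, rounds=5000, seed=42):
    """Simulate a bandit test and return how often each variant was shown."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)

    for _ in range(rounds):
        # For each variant, draw a plausible conversion rate from a Beta
        # distribution based on the conversions observed so far.
        samples = [
            rng.betavariate(s + 1, f + 1)
            for s, f in zip(successes, failures)
        ]
        # Show the variant whose sampled rate is highest.
        chosen = samples.index(max(samples))

        # Simulate whether this visitor converts (assumed true rates).
        if rng.random() < true_rates[chosen]:
            successes[chosen] += 1
        else:
            failures[chosen] += 1

    return [s + f for s, f in zip(successes, failures)]

# Hypothetical scenario: variant B converts far better than variant A.
impressions = thompson_sampling([0.05, 0.20])
```

In such a simulation, the share of impressions for the weaker variant shrinks as evidence accumulates, which is exactly the effect described above: less traffic is spent on losing variants while the test is still running.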

4. What should I test?

Before a test is set up, you must clearly define what is to be tested. If you set up a test without having thought about the objective in advance, the measured results may, in the worst case, not allow any conclusions about the actual question. Therefore, you should always work with hypotheses. The null hypothesis always assumes that a planned change has no influence on the KPIs. The alternative hypothesis, on the other hand, assumes that the change does have an impact on performance. The test then shows whether the null hypothesis can be rejected in favor of the alternative hypothesis.

5. Which KPIs can I test?

There are countless possibilities for testing, meaning that in addition to the hypothesis, you also need to define the primary KPIs that will be decisive for evaluating the test result. This could be, for example, the click-through rate, the conversion rate or the conversion value. Of course, KPIs differ depending on the industry and the type of website.

6. What indicators need to be considered?

When assessing whether a hypothesis is ultimately to be rejected or accepted, you need to consider the significance and confidence of the test.

The confidence level expresses how certain you can be that a measured difference between the variants is real; a common threshold is 95%. The significance, in turn, indicates whether the measured relationship between the variables is unlikely to have occurred by chance, so that the result can be generalized to all users. But don’t worry, you don’t have to enroll in a statistics 101 course for this: trbo provides a significance calculator which allows you to directly check the test results.
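
For readers who want to see what such a check can look like, the following sketch implements a standard two-proportion z-test for conversion rates. It is a generic textbook method and not a description of how the trbo calculator works; the function name and the example figures are assumptions for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of two variants.

    Returns the z-score and the two-sided p-value.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical figures: 10,000 visitors per variant, 200 vs. 260 conversions.
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
significant = p_value < 0.05  # corresponds to a 95% confidence level
```

If the p-value is below the chosen threshold (0.05 for a 95% confidence level), the null hypothesis is rejected and the difference is considered significant. The sketch assumes that at least one conversion was recorded; with zero conversions in both variants the standard error would be zero.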

7. How long should I run a test for?

The length of time a test should run varies from website to website. How quickly a test becomes significant depends heavily on the traffic. In addition, seasonal factors or the purchasing power of users can lead to fluctuating conversion and traffic figures and thus distort the results. If possible, the test should therefore take place in purchase-neutral months without any holidays, unless you want to test holiday-specific elements.
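
The dependence on traffic can be estimated up front with a rough calculation. The sketch below uses the common normal-approximation formula for comparing two conversion rates; the 5% significance level, the 80% power, the function names and all example figures are assumptions, and seasonal effects are not taken into account.

```python
import math
from statistics import NormalDist

def visitors_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate number of visitors needed in each variant."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

def days_needed(per_variant, variants, daily_visitors):
    """Approximate runtime if traffic is split evenly across all variants."""
    return math.ceil(per_variant * variants / daily_visitors)

# Hypothetical shop: 2% baseline conversion rate, hoping for 2.5%,
# with 2,000 visitors per day in the target group.
needed = visitors_per_variant(0.02, 0.025)
runtime_ab = days_needed(needed, variants=2, daily_visitors=2000)
runtime_four = days_needed(needed, variants=4, daily_visitors=2000)
```

The calculation also shows why every additional variant extends the runtime: each variant needs its own share of visitors, so the same daily traffic has to cover a larger total.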

8. Is it possible to run multiple tests at the same time?

If the pages in question are clearly separated from each other, there is nothing wrong with running several tests simultaneously. If not, the tests may affect each other and the results will be distorted. You should also always consider the user flow to ensure that users do not come into contact with multiple tests in the course of their journey.

9. Which target group should I choose for my tests?

User segmentation is essential for testing. However, not every test is equally suitable for all users. In extreme cases, specific users can even massively distort the test results, for example if users who have already registered are included in the test of a newsletter registration form.

The biggest mistake of all: not testing at all

At first, this checklist may sound intimidating and complex. However, the biggest mistake you can make is not testing at all. To be able to make reliable statements about the performance and optimization potential of your online shop or website, there is no way around testing.


In our whitepaper A/B and multivariant testing, we have compiled further information for you on the above-mentioned points and provide examples to give you an idea of which tests can be easily implemented.


Apart from testing, trbo offers you many additional actions for onsite personalization and optimization. Interested? Arrange a free, non-binding demo!

