A/B Testing in ASO. What Is It and How to Conduct It in Apple’s App Store or Google Play?

App Store Optimization
CRO
Nov 1, 2023

A/B testing in ASO is the process of comparing two or more variations of visual or textual elements to determine which one store visitors find most appealing. You can run A/B tests on screenshots and icons and, in Google Play, on textual metadata as well. The RadASO team will walk you through what A/B testing in ASO is, the key differences between A/B tests in the App Store and Google Play, and how to run them correctly.

Join the open ASO & User Acquisition community on Discord - ASO Busters! Here, engage in insider discussions, share insights, and collaborate with ASO and UA experts. Our channels cover the App Store, Google Play, visual ASO, ASA, UAC, Facebook, and TikTok.

A/B Testing in ASO – How it Works

Split the total number of users into two groups: A and B. Group A continues with the usual experience and sees the current screenshots. Group B receives a new experience and views fresh new test screenshots. Continue testing until you identify the group with the superior installation conversion rates.
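
As a rough illustration of what a "superior conversion rate" means in practice, the sketch below compares two groups with a standard two-proportion z-test. The function names and the visitor and install counts are hypothetical, and neither store is claimed to use exactly this calculation; both consoles report their own statistics.

```python
from math import sqrt
from statistics import NormalDist


def conversion_rate(installs: int, visitors: int) -> float:
    """Share of store visitors who installed the app."""
    return installs / visitors


def two_proportion_p_value(installs_a: int, visitors_a: int,
                           installs_b: int, visitors_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a = conversion_rate(installs_a, visitors_a)
    p_b = conversion_rate(installs_b, visitors_b)
    # Pooled rate under the assumption that both groups convert equally.
    pooled = (installs_a + installs_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical numbers, for illustration only.
p_value = two_proportion_p_value(300, 1000, 345, 1000)
print(f"A: {conversion_rate(300, 1000):.1%}, B: {conversion_rate(345, 1000):.1%}")
print(f"p-value: {p_value:.3f}")
```

A small p-value suggests the difference between groups A and B is unlikely to be random noise; a large one means the test should keep running or the change is too subtle to detect.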

During the test launch, there is an opportunity to select various parameters:

  • The percentage of users to whom it will be displayed.
  • Countries in which the test will be conducted.
  • Conditions under which the test will be considered successful.

However, you cannot choose which users see the test or control the demographics of the audience.

The main objective of A/B tests in ASO is to find a variant with a higher conversion rate. Sometimes minor changes, such as a different color for the CTA button, lead to significant differences in how users interact with the application. When creating a hypothesis, specify what you will change and why.

Graphic elements in the App Store and Google Play compared:

1. Icon impact on user choice
  • App Store: important, but no more so than the video and screenshots, which take up most of the screen.
  • Google Play: the icon is displayed alongside several other icons in search results, so it has the greatest impact on the user's choice.
2. Icon localization
  • App Store: one icon for all locales.
  • Google Play: can be localized for each country.
3. Screenshot quantity and orientation
  • App Store: from 1 to 10, any orientation.
  • Google Play: from 2 to 8, any orientation.
4. Most important screenshots (the ones visible before you start scrolling)
  • App Store: with vertical screenshots, 3 are visible; with horizontal screenshots, only the first one is visible.
  • Google Play: with vertical screenshots, 4 are visible; with horizontal screenshots, only the first one is visible.
5. Screenshot size
  • App Store: large; the content is legible.
  • Google Play: much smaller; the captions are hard to read.
6. Video quantity
  • App Store: up to 3.
  • Google Play: one video, which must be uploaded to YouTube.
7. Video orientation
  • App Store: any orientation.
  • Google Play: horizontal orientation is prioritized.
8. Video autoplay
  • App Store: plays automatically for 30 seconds without sound.
  • Google Play: you need to click on it to play.
9. Application cover
  • App Store: displayed on the application and developer pages.
  • Google Play: there is no cover.
10. Graphics update
  • App Store: after a release only.
  • Google Play: at any time.

What to Consider when Preparing a Hypothesis for A/B Testing

Choose an element that will be changed (tested) and, in your opinion, will have a significant impact on users. For example, the background in one of the screenshots. The hypothesis may be that changing it will increase conversion.

  1. Define the specifics of the change. State clearly what you want to change in this element and add approximate references. For example, "replace the dark background with a light one" or "replace the background image of people with a solid-color background."
  2. Evaluate how the change affects users. The test will not show results if only a small percentage of users notice the change.
  3. The change should be noticeable on the first three vertical screenshots (if there's a video, on the first two). On horizontal screenshots or videos, the changes should be obvious right from the start.
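
The visibility rule from the list above can be written down as a small check. This is only a sketch of the rule of thumb stated in this article; the function names are hypothetical, and treating horizontal screenshots as "first one only" regardless of video is an assumption.

```python
def visible_screenshot_count(orientation: str, has_video: bool = False) -> int:
    """How many screenshots are assumed to be seen without scrolling."""
    if orientation == "vertical":
        # First three screenshots, or first two when a video takes a slot.
        return 2 if has_video else 3
    if orientation == "horizontal":
        # Assumption: only the first horizontal screenshot is visible.
        return 1
    raise ValueError(f"unknown orientation: {orientation!r}")


def change_is_visible(position: int, orientation: str,
                      has_video: bool = False) -> bool:
    """True if a change on screenshot number `position` (1-based) is likely noticed."""
    return 1 <= position <= visible_screenshot_count(orientation, has_video)


print(change_is_visible(6, "vertical"))        # change hidden on the sixth screenshot
print(change_is_visible(1, "vertical"))        # change on the first screenshot
print(change_is_visible(3, "vertical", True))  # third screenshot, listing has a video
```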

Let's look at examples:

Example 1. The changes are on the sixth screenshot, so they are not immediately apparent. Most users only look at the first few screenshots and don't scroll to the end. Such a test is not useful because its results do not allow you to draw a meaningful conclusion.

Example 2. Changes are immediately noticeable on the very first, most conversion-driven screenshot. Only one crucial shift is being tested, not several simultaneously. The results of this A/B test will reveal what users find more alluring for viewing and downloading.


A/B Test Differences Table in App Store and Google Play

What can be tested?
  • Google Play: short description, long description, icon, feature graphic, screenshots, videos.
  • App Store: screenshots, videos, icon (has to be uploaded to the build*).

Number of simultaneously running tests
  • Google Play: 5 tests. Each test is valid within a single country. You can also choose a default country test (details below), which runs in all countries where there are no localized graphical or textual materials.
  • App Store: 1 test. The test can be extended to all countries where the application is available right away, or limited to specific countries as needed.

Number of variants that can be tested against the current version in the store
  • Both stores: the current version is compared with a maximum of 3 new variants.

Can a test be launched while another item is under review?
  • Google Play: yes.
  • App Store: no.

Mandatory formats for screenshots uploaded to the store
  • Google Play: 6.5.
  • App Store: 6.5, 5.5, and 12.9 (if there is an iPad version).

*Build: a specific version of the application that is ready to be downloaded and installed on users' devices. It contains all the files and data needed to install and use the application. The icon can only be updated together with a new application version in the store.
More about optimizing graphic elements in the App Store and Google Play can be found in the article 'Graphics in Mobile App Promotion in the App Store and Google Play (ASO) – How to Optimize Graphic Elements.'

How to Publish an A/B Test in the App Store

1. Navigate to the Product Page Optimization tab in App Store Connect.

2. After naming the test, specify the type of test you are launching (A/B, A/B/B, or A/B/C test, etc.), the countries for displaying this test (by default, all 39 countries are selected), and an approximate test duration.

3. Upload your graphic materials.

For a more detailed description, read the official App Store documentation.


How to Publish an A/B Test in Google Play

1. On the Store listing experiments tab in the Google Play Console, select the countries where you wish to conduct the test. Unlike the App Store, you can only choose one country for one test or opt for a test in the default country (i.e., for all countries without localized graphic or text materials, depending on what you are testing). So, determine whether the test will be conducted in the default or a specific country.

More information can be found in the official documentation.

2. Configure the metrics that affect the accuracy of the test and determine the number of downloads:

  • The target metric: all users who downloaded the application, or only those who downloaded it and did not delete it within the first day.
  • The test variant you will launch (A/B, A/B/C, A/B/C/D – more information on the main differences below).
  • The percentage of visitors who will see the experimental variants instead of the currently active one.
  • The minimum difference between the new variants and the currently active variant that will determine the winner (the minimum detectable effect).
  • The confidence level for the test results.
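
To see how the minimum difference and the confidence level affect the amount of traffic a test needs, here is a sketch based on the textbook sample-size formula for comparing two proportions. It is not Google Play's own calculation; the function name, the `power` parameter, and the example values are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline_cr: float, relative_mde: float,
                         confidence: float = 0.90, power: float = 0.80) -> int:
    """Approximate visitors needed in each variant (two-sided test).

    baseline_cr  - current conversion rate, e.g. 0.30 for 30%
    relative_mde - minimum relative difference to detect, e.g. 0.05 for 5%
    confidence   - confidence level chosen for the test
    power        - chance of detecting a real effect (assumed, not from the article)
    """
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical listing: 30% conversion, looking for a 5% relative lift.
print(visitors_per_variant(0.30, 0.05, confidence=0.90))
# A smaller detectable difference requires noticeably more traffic.
print(visitors_per_variant(0.30, 0.02, confidence=0.90))
```

The takeaway is qualitative: halving the difference you want to detect roughly quadruples the traffic required, which is why subtle changes rarely produce a clear winner.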

3. Determine what to test. Unlike in the App Store, you can test not only graphic elements but also text (full and short descriptions).

For A/B tests, you can only upload screenshots in one size. Google will automatically adapt them to other formats.


How to Prepare the Application for Testing

1. Run A/B/B tests, not A/B tests.

A – the current variant of the screenshots (or other tested materials) that is live in the store.

B – the new variant of the screenshots that needs to be tested.

B – a duplicate of the new variant.

A/B/B tests provide an additional check on the reliability of the results. Ideally, B1 and B2 (the two copies of the new variant) should show fairly similar performance metrics (more about this in the 'A/B Test Results' section below).

2. The test should last for at least two weeks (depending on how much traffic the application is getting).

As shown in the example below, sometimes this is not enough. The total amount of traffic was low, so two weeks turned out to be insufficient. Ambiguous results persisted for about a month. However, in one and a half months, significant improvements were observed for options B1 and B2. In total, the test lasted for more than 70 days.
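
The link between traffic and duration can be estimated with simple arithmetic. The sketch below assumes that visitors are split evenly between all variants and that you already have a target number of visitors per variant; the function name and the numbers are hypothetical.

```python
from math import ceil


def estimated_test_days(required_per_variant: int, variants: int,
                        daily_visitors: int, minimum_days: int = 14) -> int:
    """Days needed to collect enough traffic, never less than `minimum_days`.

    `variants` counts every arm of the test, including the current version
    (an A/B/B test has 3 arms). Assumes an even split of daily visitors.
    """
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive")
    days_for_traffic = ceil(required_per_variant * variants / daily_visitors)
    return max(minimum_days, days_for_traffic)


# Hypothetical low-traffic app: 500 store visitors a day, A/B/B test.
print(estimated_test_days(12_000, variants=3, daily_visitors=500))
# Hypothetical high-traffic app: the two-week minimum still applies.
print(estimated_test_days(12_000, variants=3, daily_visitors=10_000))
```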

3. Graphical changes should be significant.

  • Select a single hypothesis that has the greatest potential to impact the end-user in a significant manner.
  • Focus on the key changes in the first three screenshots during the hypothesis test (if there are also videos, focus on the first two). If the application has horizontal screenshots, focus on the first one.

In the case of rebranding (changing colors, fonts, characters, etc.), the screenshots should undergo a drastic transformation. This is also recommended if the previous screenshots are deemed to be unsatisfactory.

4. Cross-Marketing Activity

Consider global marketing activities. Users associate the brand with specific characters. Therefore, in all promotion channels and during tests in the store, use screenshots with the same characters.

5. Consider the strength of the brand.

A popular application (e.g., Netflix) receives the majority of its views and downloads through brand-specific search queries, so graphics have little influence on user choices. The results of such a test may not be indicative, regardless of the amount of traffic or the scale of the changes.

6. Cultural Localization

Pay attention to the cultural nuances of each region. Localize the language in the screenshots, add colors, elements, and individuals representative of the country. This will spark the interest of the local population.


A/B Test Results in Google Play

Glossary:

  • Audience – % of users who see the experiment.
  • Installers (current) – the number of actual downloads during the experiment.
  • Installers (scaled) – the number of downloads during the experiment divided by the audience share.
  • Performance – the likely change in conversion rates when applying the tested variant (the metric is available when there is enough data).
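
Following the glossary definition above, the scaled figure can be reproduced like this. The function name and the sample numbers are hypothetical.

```python
def scaled_installers(current_installers: int, audience_share: float) -> float:
    """Installers (scaled) = actual installers divided by the audience share.

    audience_share is a fraction, e.g. 0.25 when 25% of users see the variant.
    """
    if not 0 < audience_share <= 1:
        raise ValueError("audience_share must be in the range (0, 1]")
    return current_installers / audience_share


# A variant shown to 25% of visitors collected 250 installs.
print(scaled_installers(250, 0.25))
```

Scaling makes variants with different audience shares comparable, because each figure estimates how many installs the variant would have received from the whole audience.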

Example 1:

Most likely, the test screenshots (B1 and B2) will win. However, if the result in the Performance column is not entirely in the 'red' or 'green' zone, the result should not be considered 100% reliable.

Let's calculate the expected conversion change by taking the midpoint of each variant's Performance range:

  1. for Treatment B1: (-11.5 + 24.9) / 2 = 6.7
  2. for Treatment B2: (-9.5 + 15.1) / 2 = 2.8
  3. average of the two values: (6.7 + 2.8) / 2 = 4.75

Conversion will increase by 4.75%. If the current conversion was 30%, the projected conversion will be: 30 + (30 * 4.75 / 100) = 31.43%*

*Important! Do not add the average Performance percentage to the current conversion; instead, change the current conversion by that percentage.
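
The arithmetic from Example 1 can be reproduced in a few lines. The helper names are hypothetical; the numbers are the ones used in the example above.

```python
def interval_midpoint(low: float, high: float) -> float:
    """Midpoint of a Performance range, in percent."""
    return (low + high) / 2


def projected_conversion(current_cr: float, relative_change: float) -> float:
    """Apply a relative change (in percent) to the current conversion (in percent)."""
    return current_cr * (1 + relative_change / 100)


b1 = interval_midpoint(-11.5, 24.9)   # Treatment B1
b2 = interval_midpoint(-9.5, 15.1)    # Treatment B2
average_change = (b1 + b2) / 2        # expected relative change, in percent

projected = projected_conversion(30.0, average_change)
print(f"Expected change: {average_change:.2f}%")
print(f"Projected conversion: {projected:.3f}%")

# The mistake the note above warns about: adding percentage points directly.
wrong = 30.0 + average_change
print(f"Incorrect (additive) result: {wrong:.2f}%")
```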

Example 2:

Both variants displayed significantly negative outcomes. Conclusion: the test was unsuccessful.

Example 3:

The same test variant produces different outcomes: in V1, it results in a favorable outcome, while in V2, the opposite occurs. In such a case, calculations using the formula won't yield reliable results to base your decisions on. V1 and V2 should yield more or less similar results.
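
One way to make this check explicit is to classify each Performance range and compare the two duplicates. This is a sketch rather than anything the Google Play Console does for you; the function names and the ranges used for the disagreeing case are hypothetical.

```python
def interval_verdict(low: float, high: float) -> str:
    """Classify a Performance range given in percent."""
    if low > 0:
        return "positive"      # entirely in the 'green' zone
    if high < 0:
        return "negative"      # entirely in the 'red' zone
    return "inconclusive"      # the range crosses zero


def duplicates_agree(first: tuple[float, float],
                     second: tuple[float, float]) -> bool:
    """True when both copies of the same variant lead to the same verdict."""
    return interval_verdict(*first) == interval_verdict(*second)


# Hypothetical ranges for two copies of one variant that contradict each other.
print(duplicates_agree((1.2, 8.0), (-7.5, -0.3)))
# Ranges from Example 1: both cross zero, so the duplicates are consistent,
# but neither result is fully reliable on its own.
print(duplicates_agree((-11.5, 24.9), (-9.5, 15.1)))
```

When the duplicates disagree, the averaging formula from Example 1 should not be applied, since the underlying data is too noisy.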

A/B Test Results in the App Store

Glossary:

  1. Conversion rate – the conversion of test variants (Apple, in contrast to Google Play, displays this immediately).
  2. Improvement – the relative difference between the variant being tested and the variant that’s currently active in the store. If you click on it, you'll see the percentage range over the entire testing period.
  3. Confidence – the confidence level in the results of each individual variant. It should be at least 50% to reach a conclusive decision about the test.
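
Since Improvement is a relative difference, it can be recomputed from the two conversion rates. The function name and the rates below are hypothetical and serve only to illustrate the definition.

```python
def improvement_percent(variant_cr: float, baseline_cr: float) -> float:
    """Relative difference between a variant and the active version, in percent."""
    if baseline_cr <= 0:
        raise ValueError("baseline_cr must be positive")
    return (variant_cr - baseline_cr) / baseline_cr * 100


# Active version converts at 30.0%, the tested variant at 31.5%.
print(f"{improvement_percent(31.5, 30.0):.1f}%")
```

Note that a 1.5 percentage point gain on a 30% baseline is a 5% relative improvement, which is the kind of figure the Improvement column reports.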

The Confidence and conversion rate improvement indicators in the chart below demonstrate that this test is a winner.

After adopting the winning test variant, measure the conversion once again.

A/B tests are an ongoing process because user preferences constantly change. Today, they might be drawn to a blue background, but later, red might receive more attention.

It's also important to evaluate the results accurately. A winning variant doesn't always guarantee an improvement in conversion, a losing one doesn't always mean a decline, and drawing conclusions too hastily can lead to unexpected outcomes.
