A/B Testing in ASO. What Is It and How to Conduct It in Apple’s App Store or Google Play?

Nov 1, 2023

A/B testing in ASO is the process of comparing two or more variations of visual or textual elements to determine which option store visitors find most appealing. You can A/B test screenshots and icons in both stores, and textual metadata in Google Play only. The RadASO team will explain what A/B testing in ASO is, how A/B tests differ between the App Store and Google Play, and how to run them correctly.

A/B Testing in ASO – How it Works

Split the total number of users into two groups: A and B. Group A continues with the usual experience and sees the current screenshots. Group B receives a new experience and sees the new test screenshots. Continue the test until it is clear which group has the higher install conversion rate.
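To make "the higher install conversion rate" concrete, here is a minimal sketch of how the two groups could be compared. It uses a textbook pooled two-proportion z-test, which is an assumption made for illustration: neither store documents this as its method, and the visitor and install counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def conversion_rate(installs: int, visitors: int) -> float:
    """Share of store visitors who installed the app."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return installs / visitors


def two_proportion_p_value(installs_a: int, visitors_a: int,
                           installs_b: int, visitors_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (A vs. B)."""
    rate_a = conversion_rate(installs_a, visitors_a)
    rate_b = conversion_rate(installs_b, visitors_b)
    pooled = (installs_a + installs_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical numbers, for illustration only.
p = two_proportion_p_value(300, 1000, 345, 1000)
print(f"A: {conversion_rate(300, 1000):.1%}, "
      f"B: {conversion_rate(345, 1000):.1%}, p-value: {p:.3f}")
```

A small p-value suggests the difference between the groups is unlikely to be random noise; with little traffic the p-value stays high even when the rates look different.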

When launching a test, you can set several parameters:

The percentage of users who will see the test.
The countries in which the test will run.
The conditions under which the test will be considered successful.

However, you cannot choose which users see the test or control the audience demographics.

The main objective of A/B tests in ASO is to improve conversion rates in one of the variants. Sometimes, minor changes, such as a different color for the CTA button, lead to significant differences in user interaction with the application. When creating a hypothesis, specify what you will change and why.

1. Icon impact on user choice
App Store: Important, but less so than the video and screenshots, which take up most of the screen.
Google Play: The icon is displayed alongside several other icons in search results, so it has the greatest impact on the user's choice.

2. Icon localization
App Store: 1 icon for all locales.
Google Play: Can be localized for each country.

3. Screenshot quantity and orientation
App Store: From 1 to 10, any orientation.
Google Play: From 2 to 8, any orientation.

4. Screenshots visible before scrolling (the most important ones)
App Store: Vertical screenshots: 3 are visible. Horizontal screenshots: only the first one is visible.
Google Play: Vertical screenshots: 4 are visible. Horizontal screenshots: only the first one is visible.

5. Screenshot size
App Store: Large; the content is legible.
Google Play: Much smaller; the captions are barely legible.

6. Video quantity
App Store: Up to 3.
Google Play: One video, which must be uploaded to YouTube.

7. Video orientation
App Store: Any orientation.
Google Play: Horizontal orientation is prioritized.

8. Video autoplay
App Store: Plays automatically for 30 seconds without sound.
Google Play: You need to click on it to play.

9. Application cover
App Store: Displayed on the application and developer pages.
Google Play: There is no cover.

10. Graphics update
App Store: After release only.
Google Play: At any time.

What to Consider when Preparing a Hypothesis for A/B Testing

Choose an element to change (test) that, in your opinion, will have a significant impact on users, for example, the background of one of the screenshots. The hypothesis may be that changing it will increase conversion.

Define the specifics of the change. State clearly what you want to change in this element and add approximate references. For example, "replace the dark background with a light one" or "replace the background image of people with a solid background."
Evaluate how the change affects users. The test will not show results if only a small percentage of users notice the change.
The change should be noticeable in the first three vertical screenshots (or the first two if there is a video). On horizontal screenshots or videos, the change should be obvious right from the start.

Let's look at examples:

Example 1. The change is on the sixth screenshot, so it is not immediately apparent. Most users look only at the first few screenshots and don't scroll to the end. Such a test is not useful because its results don't support a meaningful conclusion.

Example 2. Changes are immediately noticeable on the very first, most conversion-driven screenshot. Only one crucial shift is being tested, not several simultaneously. The results of this A/B test will reveal what users find more alluring for viewing and downloading.

A/B Test Differences Table in App Store and Google Play

What can be tested?
Google Play: short description, long description, icon, feature graphic, screenshots, videos.
App Store: screenshots, videos, icon (has to be uploaded to the build*).

Number of simultaneously running tests
Google Play: 5 tests. Each test is valid within a single country. You can also choose a default-country test (details below), which runs in all countries that have no localized graphic or text materials.
App Store: 1 test. The test can be extended immediately to all countries where the application is available, or limited to specific countries as needed.

Number of test variants that can be compared with the current version in the store
A maximum of 3 new variants.

Can a test be launched while another item is under review?
Yes

Mandatory formats for screenshots uploaded to the store
Google Play: 6.5
App Store: 6.5; 5.5; 12.9 (if there is an iPad version)

*Build – a new version of the application that is ready to be downloaded and installed on users' devices. It contains all the files and data needed to install and use the application. The icon can only be updated together with the application version in the store.

More about optimizing graphic elements in the App Store and Google Play can be found in the article 'Graphics in Mobile App Promotion in the App Store and Google Play (ASO) – How to Optimize Graphic Elements.'

How to Publish an A/B Test in the App Store

1. Navigate to the Product Page Optimization tab in App Store Connect.

2. After naming the test, specify the type of test you are launching (A/B, A/B/B, A/B/C, etc.), the countries in which the test will be displayed (by default, all 39 countries are selected), and an approximate test duration.

3. Upload your graphic materials.

For a more detailed description, read the official App Store documentation.

How to Publish an A/B Test in Google Play

1. On the Store listing experiments tab in the Google Play Console, select the countries where you wish to conduct the test. Unlike the App Store, you can only choose one country for one test or opt for a test in the default country (i.e., for all countries without localized graphic or text materials, depending on what you are testing). So, determine whether the test will be conducted in the default or a specific country.

More information can be found in the official documentation.

2. Configure the settings that affect the accuracy of the test and the number of downloads it requires:

The target metric: users who downloaded the application, or users who downloaded it and did not delete it within the first day.
The test type you will launch (A/B, A/B/C, A/B/C/D; the main differences are described below).
The percentage of visitors who will see an experimental variant instead of the currently active one.
The minimum difference between the new variants and the currently active variant that will determine the winner.
The confidence level for the test results.

3. Determine what to test. Unlike in the App Store, you can test not only graphic elements but also text (full and short descriptions).

For A/B tests, you can only upload screenshots in one size. Google will automatically adapt them to other formats.

How to Prepare the Application for Testing

1. Run A/B/B tests, not A/B tests.

A – the current variant of the screenshots (or other tested materials) that is live in the store.

B1 – the new variant of the screenshots to be tested.

B2 – a duplicate of the B1 screenshots.

A/B/B tests provide an additional check on the reliability of the results. Ideally, B1 and B2 should show fairly similar performance (more about this in the 'A/B Test Results' section below).

2. The test should last for at least two weeks (depending on how much traffic the application is getting).

As shown in the example below, sometimes this is not enough. The total amount of traffic was low, so two weeks turned out to be insufficient. Ambiguous results persisted for about a month. However, in one and a half months, significant improvements were observed for options B1 and B2. In total, the test lasted for more than 70 days.
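The link between traffic and test duration can be estimated before launch. The sketch below uses a standard textbook sample-size approximation for comparing two conversion rates; it is an assumption for planning purposes, not the calculation either store performs, and the function names, the 5% significance level, and the 80% power are illustrative defaults.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline_cr: float, relative_uplift: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed per variant to detect a relative uplift."""
    if not 0 < baseline_cr < 1 or relative_uplift == 0:
        raise ValueError("need 0 < baseline_cr < 1 and a non-zero uplift")
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_needed(daily_test_visitors: int, variants: int, per_variant: int) -> int:
    """Days to collect the sample when traffic is split evenly across variants."""
    return ceil(per_variant * variants / daily_test_visitors)


# Hypothetical app: 30% conversion, 1,000 test visitors per day, A/B/B test.
needed = visitors_per_variant(baseline_cr=0.30, relative_uplift=0.05)
print(needed, "visitors per variant,",
      days_needed(1000, variants=3, per_variant=needed), "days")
```

Under these assumptions a small expected uplift combined with low daily traffic pushes the duration well past two weeks, which matches the experience described above.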

3. Graphical changes should be significant.

Select a single hypothesis that has the greatest potential to impact the end-user in a significant manner.
Focus on the key changes in the first three screenshots during the hypothesis test (if there are also videos, focus on the first two). If the application has horizontal screenshots, focus on the first one.

In the case of rebranding (changing colors, fonts, characters, etc.), the screenshots should undergo a drastic transformation. This is also recommended if the previous screenshots are deemed to be unsatisfactory.

4. Cross-Marketing Activity

Consider global marketing activities. Users associate the brand with specific characters. Therefore, in all promotion channels and during tests in the store, use screenshots with the same characters.

5. Consider the strength of the brand.

A popular application (e.g., Netflix) receives the majority of its views and downloads through brand-specific search queries. Graphics have little influence on user choices. The results of such a test may not always be indicative, despite the amount of traffic and changes.

6. Cultural Localization

Pay attention to the cultural nuances of each region. Localize the language in the screenshots, add colors, elements, and individuals representative of the country. This will spark the interest of the local population.

A/B Test Results in Google Play

Dictionary:

Audience – % of users who see the experiment.
Installers (current) – the number of actual downloads during the experiment.
Installers (scaled) – the number of downloads during the experiment divided by the audience share.
Performance – the likely change in conversion rates when applying the tested variant (the metric is available when there is enough data).
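The relationship between current and scaled installers follows directly from the definitions above. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def scaled_installers(current_installers: int, audience_share: float) -> float:
    """Installers (scaled) = installers during the experiment / audience share.

    audience_share is the fraction of visitors who saw this variant (0..1].
    """
    if not 0 < audience_share <= 1:
        raise ValueError("audience_share must be in (0, 1]")
    return current_installers / audience_share


# Hypothetical example: a variant shown to 25% of visitors got 250 installs.
print(scaled_installers(250, 0.25))
```

Scaling makes variants with different audience shares comparable, because each figure estimates the installs the variant would have collected if shown to everyone.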

Example 1:

Most likely, the test screenshots (B1 and B2) will win. However, if the range in the Performance column is not entirely within the 'red' or 'green' zone, the result should not be considered 100% reliable.

Let's calculate the expected conversion change:

for Treatment B1: (-11.5 + 24.9) / 2 = 6.7
for Treatment B2: (-9.5 + 15.1) / 2 = 2.8
average of the two values: (6.7 + 2.8) / 2 = 4.75

Conversion will increase by 4.75%. If the current conversion was 30%, the projected conversion will be: 30 + (30 * 4.75 / 100) = 31.43%*

*Important! Do not add the average Performance percentage to the current conversion; instead, change the current conversion by that percentage.
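The calculation above can be written as a short script. This is a sketch of the article's arithmetic only: the function names are hypothetical, and taking the midpoint of each Performance range is the simplification used in the example, not a statistical guarantee.

```python
def interval_midpoint(low: float, high: float) -> float:
    """Midpoint of a Performance range, in percent."""
    return (low + high) / 2


def projected_conversion(current_cr: float,
                         performance_ranges: list[tuple[float, float]]) -> float:
    """Apply the average midpoint as a relative change to the current conversion."""
    midpoints = [interval_midpoint(low, high) for low, high in performance_ranges]
    average_change = sum(midpoints) / len(midpoints)
    # Relative change: multiply, do not add percentage points.
    return current_cr * (1 + average_change / 100)


# Values from Example 1: Treatment B1 and Treatment B2, current conversion 30%.
ranges = [(-11.5, 24.9), (-9.5, 15.1)]
print(f"{projected_conversion(30.0, ranges):.3f}%")
```

Adding 4.75 percentage points instead would give 34.75%, which is the mistake the note above warns against.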

Example 2:

Both variants displayed significantly negative outcomes. Conclusion: the test was unsuccessful.

Example 3:

The same test variant produces different outcomes: in V1, it results in a favorable outcome, while in V2, the opposite occurs. In such a case, calculations using the formula won't yield reliable results to base your decisions on. V1 and V2 should yield more or less similar results.
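The three examples can be summarized as a simple decision rule for A/B/B tests. The sketch below is one possible encoding, with hypothetical labels and thresholds; only the ranges for Example 1 come from the article, and the other ranges are made up for illustration.

```python
Range = tuple[float, float]  # (low, high) Performance range in percent


def zone(performance: Range) -> str:
    """'positive' if the whole range is above zero, 'negative' if below."""
    low, high = performance
    if low > 0:
        return "positive"
    if high < 0:
        return "negative"
    return "uncertain"


def classify_abb(b1: Range, b2: Range) -> str:
    """Rough verdict for an A/B/B test based on both duplicate variants."""
    mid1, mid2 = sum(b1) / 2, sum(b2) / 2
    zones = {zone(b1), zone(b2)}
    if zones == {"positive", "negative"} or mid1 * mid2 < 0:
        return "inconsistent"  # duplicates disagree, as in Example 3
    if zones == {"positive"}:
        return "win"
    if zones == {"negative"}:
        return "loss"  # as in Example 2
    return "leaning positive" if mid1 + mid2 > 0 else "leaning negative"


print(classify_abb((-11.5, 24.9), (-9.5, 15.1)))   # ranges from Example 1
print(classify_abb((-20.0, -5.0), (-18.0, -3.0)))  # hypothetical ranges
print(classify_abb((2.0, 10.0), (-12.0, -1.0)))    # hypothetical ranges
```

A "leaning" verdict corresponds to the situation in Example 1: the variants probably win, but the result is not fully reliable.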

A/B Test Results in the App Store

Glossary:

Conversion rate – the conversion rate of each test variant (Apple, unlike Google Play, displays this immediately).
Improvement – the relative difference between the tested variant and the variant that is currently active in the store. Click it to see the percentage range over the entire testing period.
Confidence – the confidence level in the results of each individual variant. It should be at least 50% to reach a conclusive decision about the test.
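Improvement, as defined in the glossary, is a relative difference rather than a difference in percentage points. A minimal sketch with a hypothetical function name and made-up conversion rates; how Apple computes the displayed figure internally is not stated here:

```python
def relative_improvement(baseline_cr: float, variant_cr: float) -> float:
    """Relative difference of the variant vs. the live version, in percent."""
    if baseline_cr <= 0:
        raise ValueError("baseline_cr must be positive")
    return (variant_cr - baseline_cr) / baseline_cr * 100


# Hypothetical example: live page converts at 3.0%, tested variant at 3.3%.
print(f"{relative_improvement(3.0, 3.3):.1f}%")
```

In this example the variant is 0.3 percentage points higher, but the relative improvement is about 10%.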

The Confidence and conversion rate improvement indicators in the chart below demonstrate that this test is a winner.

After adopting the winning test variant, measure the conversion once again.

A/B tests are an ongoing process because user preferences constantly change. Today, they might be drawn to a blue background, but later, red might receive more attention.

It's also important to evaluate the results accurately. A winning variant doesn't always guarantee an improvement in conversion, and a losing one doesn't always mean a decline. Drawing conclusions too hastily can lead to unexpected outcomes.
