A/B testing study

J·2024년 9월 10일

A/B testing steps

  1. Pre-requisites
  2. Experiment Design
  3. Running Experiment
  4. Result to Decision
  5. Post-launch Monitoring

1.Pre-requisites

  • Objective & Key Metrics
    • Key Metric
      • Revenue
      • Fair when N(control) = N(treatment)
      • Normalize revenue by # of users
      • Revenue per user
  • Vatiants
    • Control group : checkout
    • Treatment group 1: display similar products in checkout
    • Treatment group 2: popup similar products window in checkout
  • Randomization units
    • Users
    • Assume enough users

2.Experiment Design

  • User to target
    • All users?
    • Specific segment of users?

If choose users from "Land on homepage", most of them won't see the feature
therefore, target users should be from "Start checkout"

  • Practical significane boundary
    • Assume pratical significane:
      • Revenue incrase : $2 per user
    • Power of the test : 80% (indsutrial standard)
    • Significan level : 5% (indsutrial standard)
    • Sample size
      (sigma = Standard deviation of population / delta = difference between treatment & control

Assume sigma is 20 for this example :

16*20^2 / 2^2 = 1600 (we need 1600 unique users in each variant)

therefore 4800 unique users for 3 variants

Other case - Need more samples when

Smaller change, sigma = $1 per suser
Smaller significane level, alpha = 2.5%

Decide how long to run (consider 4 factors)

    1. Rump-up plan: Start with dozens of users
    • No bugs
    • Traffic can be handled
    • Expose to a small population
    • Gradually increase percentage

Assume 2000 users per day entering checkout

    1. Day of week effect
    • People behave differently (People make more purchases on wage day)

Recommended - Run experiment for more then 1 whole week

    1. Seasonality
    • Holiday season (Surge in sales during Black Friday)

so the data during the holidays can not be used for analysis and run experiment longer

    1. Primacy and novelty effects
    • users respond to changes differently

3.Running Experiment

running the experiment based on experiment design and collecting log data

Running experiment for too long will not imporve precision any further

4.Result to Decision

Before into analysis

  • Sanity checks
    • Unreliable if assumptions are violated

Things need to check
1. number of users assigned to groups
2. Latency when loading the webpage (user experience is consistant among each group)

  • Hypotheses test to make recommendation

Recommend launching a change when
1. Statistically significant
2. Practically significant

Treatment 1 vs control

Result of treatment 1 (arguable)

  • No impact at all
    or
  • Impact is significant enough

Recommendation of treatment 1

Due to some uncertainty

  • Do not launch the change
  • Run a follow-up test with more power

Treatment 2 vs control

Recommendation of treatment 2

  • Run a follow-up test with more power
profile
Full of adventure

0개의 댓글