이커머스 데이터로 프로젝트를 진행했을 때, 분석에 사용한 리텐션 분석을 기록하려고 한다!
구매 고객을 VIP, 일반 고객 두 그룹으로 나누어 리텐션 분석을 진행했다.
WITH Avg_price AS (
SELECT AVG(total_purchase) AS avg_price
FROM (
SELECT user_id, SUM(price_usd) AS total_purchase
FROM orders
GROUP BY user_id
) AS user_purchase
),
VIP AS (
SELECT o.user_id, SUM(o.price_usd) AS total_purchase, a.avg_price
FROM orders o
CROSS JOIN Avg_price a
GROUP BY o.user_id, a.avg_price
HAVING SUM(o.price_usd) >= 1.5 * a.avg_price
),
고객의 주문건을 기준으로 리텐션을 계산
첫 주문달을 기준으로 첫 달에 구매한 고객 수 집계
첫 주문 달에도 구매하고 두 번째 달에도 구매한 고객 수 집계
계속 이렇게 진행하면서 마지막 구매 달까지 리텐션을 계산한다
first_order AS (
SELECT DISTINCT user_id,
CASE WHEN user_id IN (SELECT user_id
FROM vip) THEN 'VIP'
ELSE 'Regular' END AS 'User_Segment', min(created_at) AS first_ord
FROM orders
GROUP BY 1
)
order_month AS (SELECT DISTINCT fo.user_id, User_segment,
CASE WHEN created_at = date_format( date_format( first_ord, '%Y-%m-01'), '%Y-%m-01') THEN 0
WHEN created_at > date_format( date_format( first_ord, '%Y-%m-01'), '%Y-%m-01') AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 1 month) THEN 1
WHEN created_at > date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 1 month) AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 2 month) THEN 2
WHEN created_at > date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 2 month) AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 3 month) THEN 3
WHEN created_at > date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 3 month) AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 4 month) THEN 4
WHEN created_at > date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 4 month) AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 5 month) THEN 5
WHEN created_at > date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 5 month) AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 6 month) THEN 6
WHEN created_at > date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 6 month) AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 7 month) THEN 7
WHEN created_at > date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 7 month) AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 8 month) THEN 8
WHEN created_at > date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 8 month) AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 9 month) THEN 9
WHEN created_at > date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 9 month) AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 10 month) THEN 10
WHEN created_at > date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 10 month) AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 11 month) THEN 11
WHEN created_at > date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 11 month) AND created_at <= date_add(date_format( first_ord, '%Y-%m-01'), INTERVAL 12 month) THEN 12
END AS month_num
FROM first_order fo JOIN orders o ON fo.user_id = o.user_id
ORDER BY month_num desc
)
seq_table AS (
SELECT user_id, User_segment, month_num
FROM order_month
)
SELECT User_segment,
CASE WHEN month_num = 0 THEN 'm-0'
WHEN month_num = 1 THEN 'm-1'
WHEN month_num = 2 THEN 'm-2'
WHEN month_num = 3 THEN 'm-3'
WHEN month_num = 4 THEN 'm-4'
WHEN month_num = 5 THEN 'm-5'
WHEN month_num = 6 THEN 'm-6'
WHEN month_num = 7 THEN 'm-7'
WHEN month_num = 8 THEN 'm-8'
WHEN month_num = 9 THEN 'm-9'
WHEN month_num = 10 THEN 'm-10'
WHEN month_num = 11 THEN 'm-11'
WHEN month_num = 12 THEN 'm-12'
ELSE 'unknown'
END AS month_range,
count(user_id) user_cnt
FROM seq_table
GROUP BY 1,2
ORDER BY 1,2;
고객 구분 | M-0 | M-1 | M-2 | M-3 | M-4 |
---|---|---|---|---|---|
VIP | 28,032 | 7 | 4 | 1 | 0 |
일반 고객 | 3,664 | 280 | 150 | 40 | 2 |
결론:
첫 구매 이후 지속적인 유지율이 낮다
일반 고객 대비 VIP 고객읠 평균 구매 빈도가 높지만, 시간이 지나며 유지율이 감소한다.