[SQL_Q] 3521. Find Product Recommendation Pairs

Hyunjun Kim·2025년 9월 3일
0

SQL

목록 보기
79/90

https://leetcode.com/problems/find-product-recommendation-pairs/description/

문제

Table: ProductPurchases

+-------------+------+
| Column Name | Type | 
+-------------+------+
| user_id     | int  |
| product_id  | int  |
| quantity    | int  |
+-------------+------+
(user_id, product_id) is the unique key for this table.
Each row represents a purchase of a product by a user in a specific quantity.
Table: ProductInfo

+-------------+---------+
| Column Name | Type    | 
+-------------+---------+
| product_id  | int     |
| category    | varchar |
| price       | decimal |
+-------------+---------+
product_id is the primary key for this table.
Each row assigns a category and price to a product.
Amazon wants to implement the Customers who bought this also bought... feature based on co-purchase patterns. Write a solution to :

Identify distinct product pairs frequently purchased together by the same customers (where product1_id < product2_id)
For each product pair, determine how many customers purchased both products
A product pair is considered for recommendation if at least 3 different customers have purchased both products.

Return the result table ordered by customer_count in descending order, and in case of a tie, by product1_id in ascending order, and then by product2_id in ascending order.

The result format is in the following example.

 

Example:

Input:

ProductPurchases table:

+---------+------------+----------+
| user_id | product_id | quantity |
+---------+------------+----------+
| 1       | 101        | 2        |
| 1       | 102        | 1        |
| 1       | 103        | 3        |
| 2       | 101        | 1        |
| 2       | 102        | 5        |
| 2       | 104        | 1        |
| 3       | 101        | 2        |
| 3       | 103        | 1        |
| 3       | 105        | 4        |
| 4       | 101        | 1        |
| 4       | 102        | 1        |
| 4       | 103        | 2        |
| 4       | 104        | 3        |
| 5       | 102        | 2        |
| 5       | 104        | 1        |
+---------+------------+----------+
ProductInfo table:

+------------+-------------+-------+
| product_id | category    | price |
+------------+-------------+-------+
| 101        | Electronics | 100   |
| 102        | Books       | 20    |
| 103        | Clothing    | 35    |
| 104        | Kitchen     | 50    |
| 105        | Sports      | 75    |
+------------+-------------+-------+
Output:

+-------------+-------------+-------------------+-------------------+----------------+
| product1_id | product2_id | product1_category | product2_category | customer_count |
+-------------+-------------+-------------------+-------------------+----------------+
| 101         | 102         | Electronics       | Books             | 3              |
| 101         | 103         | Electronics       | Clothing          | 3              |
| 102         | 104         | Books             | Kitchen           | 3              |
+-------------+-------------+-------------------+-------------------+----------------+
Explanation:

Product pair (101, 102):
Purchased by users 1, 2, and 4 (3 customers)
Product 101 is in Electronics category
Product 102 is in Books category
Product pair (101, 103):
Purchased by users 1, 3, and 4 (3 customers)
Product 101 is in Electronics category
Product 103 is in Clothing category
Product pair (102, 104):
Purchased by users 2, 4, and 5 (3 customers)
Product 102 is in Books category
Product 104 is in Kitchen category
The result is ordered by customer_count in descending order. For pairs with the same customer_count, they are ordered by product1_id and then product2_id in ascending order.

내 쿼리

WITH co_p as (
    SELECT
    p1.product_id as product1_id, 
    p2.product_id as product2_id,
    COUNT(*) as cnt
    FROM ProductPurchases p1, ProductPurchases p2
    WHERE p1.product_id != p2.product_id
    and p1.user_id = p2.user_id
    and p1.product_id < p2.product_id
    GROUP BY 1,2
    having cnt >=3
)
SELECT c.product1_id, c.product2_id, p1.category as product1_category,
p2.category as product2_category, cnt as customer_count
FROM ProductInfo p1 JOIN co_p c
ON p1.product_id = c.product1_id
JOIN ProductInfo p2
ON c.product2_id = p2.product_id
ORDER BY customer_count DESC, c.product1_id, c.product2_id
  • 방식

    • CTE 후 Join
  • 장점

    • Product pair 집계 후 Join → Join 대상 row 줄음
  • 단점

    • 중간 단계 결과 크면 메모리/I/O 부담
  • 적합한 경우

    • ProductPurchases 매우 크고, ProductInfo는 작을 때

다른 사람 쿼리

SELECT
P1.product_id AS product1_id,
P2.product_id AS product2_id,
PI1.category AS product1_category,
PI2.category AS product2_category,
COUNT(P1.user_id) AS customer_count
FROM ProductPurchases P1
INNER JOIN ProductPurchases P2 ON P1.user_id=P2.user_id AND P1.product_id<P2.product_id
LEFT JOIN ProductInfo PI1 ON P1.product_id=PI1.product_id
LEFT JOIN ProductInfo PI2 ON P2.product_id=PI2.product_id
GROUP BY product1_id,product2_id
HAVING COUNT(P1.user_id)>=3
ORDER BY customer_count DESC,product1_id,product2_id
  • 방식

    • 한 번에 Join + Group by
  • 장점

    • SQL 간결, 옵티마이저 최적화 가능
  • 단점

    • group by 전 ProductInfo join → 불필요한 join 발생 가능
  • 적합한 경우

    • ProductInfo 작고, ProductPurchases 크더라도 join 최적화가 잘 되는 DB

다른 사람 쿼리2

# Write your MySQL query statement below
with cte as (select a.product_id as product1_id,b.product_id as product2_id,count( a.user_id) as customer_count
from ProductPurchases a join ProductPurchases b on a.user_id=b.user_id and a.product_id<b.product_id
group by a.product_id,b.product_id
having count( a.user_id)>=3)

select product1_id, product2_id,e.category  as product1_category,f.category  as product2_category,customer_count
from cte
left join ProductInfo e on cte.product1_id=e.product_id
left join ProductInfo  f on cte.product2_id=f.product_id
order by customer_count desc,product1_id,product2_id
  • 방식

    • CTE + 두 번 Join
  • 장점

    • 구조적으로 명확, Join 대상 줄어듦
  • 단점

    • MySQL은 CTE materialization 성능 손해
  • 적합한 경우

    • PostgreSQL/SQL Server 환경, ProductInfo 크더라도 안전하게

최적화된 패턴 요약

  1. GROUP BY 먼저 (큰 테이블 줄이기).
  2. DISTINCT user_id 필수 → 중복 방지.
  3. CTE 대신 인라인 서브쿼리 (MySQL 최적화).
  4. user_id, product_id 인덱스.
    5.작은 테이블(ProductInfo)은 나중에 JOIN.

처음 co_p CTE를 만들고 나서 각 product1_id,product2_id를 구해 놨는데,
ProductInfo table 와 어떻게 join 해야 할지 몰라서 멈칫하게 되었다.
두 번 JOIN 하면 되는 상황에서 항상 이렇게 막힌다. 한 번에 안 되는 부분인지, 되는 부분인지 아직까지 파악하기 힘들다.

처음 ProductPurchases table에 ProductInfo talbe을 JOIN시켜놓고 Output 형태를 만드는 것이 좋을지, co_p CTE를 만든 뒤에 JOIN 할지 고민했었다. 일반적으로는 GROUP by 먼저.

profile
Data Analytics Engineer 가 되

0개의 댓글