https://leetcode.com/problems/find-product-recommendation-pairs/description/
Table: ProductPurchases
+-------------+------+
| Column Name | Type |
+-------------+------+
| user_id | int |
| product_id | int |
| quantity | int |
+-------------+------+
(user_id, product_id) is the unique key for this table.
Each row represents a purchase of a product by a user in a specific quantity.
Table: ProductInfo
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| product_id | int |
| category | varchar |
| price | decimal |
+-------------+---------+
product_id is the primary key for this table.
Each row assigns a category and price to a product.
Amazon wants to implement the Customers who bought this also bought... feature based on co-purchase patterns. Write a solution to :
Identify distinct product pairs frequently purchased together by the same customers (where product1_id < product2_id)
For each product pair, determine how many customers purchased both products
A product pair is considered for recommendation if at least 3 different customers have purchased both products.
Return the result table ordered by customer_count in descending order, and in case of a tie, by product1_id in ascending order, and then by product2_id in ascending order.
The result format is in the following example.
Example:
Input:
ProductPurchases table:
+---------+------------+----------+
| user_id | product_id | quantity |
+---------+------------+----------+
| 1 | 101 | 2 |
| 1 | 102 | 1 |
| 1 | 103 | 3 |
| 2 | 101 | 1 |
| 2 | 102 | 5 |
| 2 | 104 | 1 |
| 3 | 101 | 2 |
| 3 | 103 | 1 |
| 3 | 105 | 4 |
| 4 | 101 | 1 |
| 4 | 102 | 1 |
| 4 | 103 | 2 |
| 4 | 104 | 3 |
| 5 | 102 | 2 |
| 5 | 104 | 1 |
+---------+------------+----------+
ProductInfo table:
+------------+-------------+-------+
| product_id | category | price |
+------------+-------------+-------+
| 101 | Electronics | 100 |
| 102 | Books | 20 |
| 103 | Clothing | 35 |
| 104 | Kitchen | 50 |
| 105 | Sports | 75 |
+------------+-------------+-------+
Output:
+-------------+-------------+-------------------+-------------------+----------------+
| product1_id | product2_id | product1_category | product2_category | customer_count |
+-------------+-------------+-------------------+-------------------+----------------+
| 101 | 102 | Electronics | Books | 3 |
| 101 | 103 | Electronics | Clothing | 3 |
| 102 | 104 | Books | Kitchen | 3 |
+-------------+-------------+-------------------+-------------------+----------------+
Explanation:
Product pair (101, 102):
Purchased by users 1, 2, and 4 (3 customers)
Product 101 is in Electronics category
Product 102 is in Books category
Product pair (101, 103):
Purchased by users 1, 3, and 4 (3 customers)
Product 101 is in Electronics category
Product 103 is in Clothing category
Product pair (102, 104):
Purchased by users 2, 4, and 5 (3 customers)
Product 102 is in Books category
Product 104 is in Kitchen category
The result is ordered by customer_count in descending order. For pairs with the same customer_count, they are ordered by product1_id and then product2_id in ascending order.
WITH co_p as (
SELECT
p1.product_id as product1_id,
p2.product_id as product2_id,
COUNT(*) as cnt
FROM ProductPurchases p1, ProductPurchases p2
WHERE p1.product_id != p2.product_id
and p1.user_id = p2.user_id
and p1.product_id < p2.product_id
GROUP BY 1,2
having cnt >=3
)
SELECT c.product1_id, c.product2_id, p1.category as product1_category,
p2.category as product2_category, cnt as customer_count
FROM ProductInfo p1 JOIN co_p c
ON p1.product_id = c.product1_id
JOIN ProductInfo p2
ON c.product2_id = p2.product_id
ORDER BY customer_count DESC, c.product1_id, c.product2_id
방식
장점
단점
적합한 경우
SELECT
P1.product_id AS product1_id,
P2.product_id AS product2_id,
PI1.category AS product1_category,
PI2.category AS product2_category,
COUNT(P1.user_id) AS customer_count
FROM ProductPurchases P1
INNER JOIN ProductPurchases P2 ON P1.user_id=P2.user_id AND P1.product_id<P2.product_id
LEFT JOIN ProductInfo PI1 ON P1.product_id=PI1.product_id
LEFT JOIN ProductInfo PI2 ON P2.product_id=PI2.product_id
GROUP BY product1_id,product2_id
HAVING COUNT(P1.user_id)>=3
ORDER BY customer_count DESC,product1_id,product2_id
방식
장점
단점
적합한 경우
# Write your MySQL query statement below
with cte as (select a.product_id as product1_id,b.product_id as product2_id,count( a.user_id) as customer_count
from ProductPurchases a join ProductPurchases b on a.user_id=b.user_id and a.product_id<b.product_id
group by a.product_id,b.product_id
having count( a.user_id)>=3)
select product1_id, product2_id,e.category as product1_category,f.category as product2_category,customer_count
from cte
left join ProductInfo e on cte.product1_id=e.product_id
left join ProductInfo f on cte.product2_id=f.product_id
order by customer_count desc,product1_id,product2_id
방식
장점
단점
적합한 경우
처음 co_p CTE를 만들고 나서 각 product1_id,product2_id를 구해 놨는데,
ProductInfo table 와 어떻게 join 해야 할지 몰라서 멈칫하게 되었다.
두 번 JOIN 하면 되는 상황에서 항상 이렇게 막힌다. 한 번에 안 되는 부분인지, 되는 부분인지 아직까지 파악하기 힘들다.처음 ProductPurchases table에 ProductInfo talbe을 JOIN시켜놓고 Output 형태를 만드는 것이 좋을지, co_p CTE를 만든 뒤에 JOIN 할지 고민했었다. 일반적으로는 GROUP by 먼저.