https://leetcode.com/problems/find-category-recommendation-pairs/description/
sqlTable: ProductPurchases
+-------------+------+
| Column Name | Type |
+-------------+------+
| user_id | int |
| product_id | int |
| quantity | int |
+-------------+------+
(user_id, product_id) is the unique identifier for this table.
Each row represents a purchase of a product by a user in a specific quantity.
Table: ProductInfo
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| product_id | int |
| category | varchar |
| price | decimal |
+-------------+---------+
product_id is the unique identifier for this table.
Each row assigns a category and price to a product.
Amazon wants to understand shopping patterns across product categories. Write a solution to:
Find all category pairs (where category1 < category2)
For each category pair, determine the number of unique customers who purchased products from both categories
A category pair is considered reportable if at least 3 different customers have purchased products from both categories.
Return the result table of reportable category pairs ordered by customer_count in descending order, and in case of a tie, by category1 in ascending order lexicographically, and then by category2 in ascending order.
The result format is in the following example.
Example:
Input:
ProductPurchases table:
+---------+------------+----------+
| user_id | product_id | quantity |
+---------+------------+----------+
| 1 | 101 | 2 |
| 1 | 102 | 1 |
| 1 | 201 | 3 |
| 1 | 301 | 1 |
| 2 | 101 | 1 |
| 2 | 102 | 2 |
| 2 | 103 | 1 |
| 2 | 201 | 5 |
| 3 | 101 | 2 |
| 3 | 103 | 1 |
| 3 | 301 | 4 |
| 3 | 401 | 2 |
| 4 | 101 | 1 |
| 4 | 201 | 3 |
| 4 | 301 | 1 |
| 4 | 401 | 2 |
| 5 | 102 | 2 |
| 5 | 103 | 1 |
| 5 | 201 | 2 |
| 5 | 202 | 3 |
+---------+------------+----------+
ProductInfo table:
+------------+-------------+-------+
| product_id | category | price |
+------------+-------------+-------+
| 101 | Electronics | 100 |
| 102 | Books | 20 |
| 103 | Books | 35 |
| 201 | Clothing | 45 |
| 202 | Clothing | 60 |
| 301 | Sports | 75 |
| 401 | Kitchen | 50 |
+------------+-------------+-------+
Output:
+-------------+-------------+----------------+
| category1 | category2 | customer_count |
+-------------+-------------+----------------+
| Books | Clothing | 3 |
| Books | Electronics | 3 |
| Clothing | Electronics | 3 |
| Electronics | Sports | 3 |
+-------------+-------------+----------------+
Explanation:
Books-Clothing:
User 1 purchased products from Books (102) and Clothing (201)
User 2 purchased products from Books (102, 103) and Clothing (201)
User 5 purchased products from Books (102, 103) and Clothing (201, 202)
Total: 3 customers purchased from both categories
Books-Electronics:
User 1 purchased products from Books (102) and Electronics (101)
User 2 purchased products from Books (102, 103) and Electronics (101)
User 3 purchased products from Books (103) and Electronics (101)
Total: 3 customers purchased from both categories
Clothing-Electronics:
User 1 purchased products from Clothing (201) and Electronics (101)
User 2 purchased products from Clothing (201) and Electronics (101)
User 4 purchased products from Clothing (201) and Electronics (101)
Total: 3 customers purchased from both categories
Electronics-Sports:
User 1 purchased products from Electronics (101) and Sports (301)
User 3 purchased products from Electronics (101) and Sports (301)
User 4 purchased products from Electronics (101) and Sports (301)
Total: 3 customers purchased from both categories
Other category pairs like Clothing-Sports (only 2 customers: Users 1 and 4) and Books-Kitchen (only 1 customer: User 3) have fewer than 3 shared customers and are not included in the result.
The result is ordered by customer_count in descending order. Since all pairs have the same customer_count of 3, they are ordered by category1 (then category2) in ascending order.
With pair_c as (
SELECT distinct p1.product_id as pid1, p2.product_id as pid2,
count(*) as cnt
FROM ProductPurchases p1, ProductPurchases p2
where p1.user_id = p2.user_id
and p1.product_id != p2.product_id
and p1.product_id < p2.product_id
GROUP BY 1,2
)
pair_c 출력 결과
| pid1 | pid2 | cnt |
| ---- | ---- | --- |
| 101 | 102 | 2 |
| 102 | 201 | 3 |
| 101 | 201 | 3 |
| 201 | 301 | 2 |
| 102 | 301 | 1 |
| 101 | 301 | 3 |
| 102 | 103 | 2 |
| 101 | 103 | 2 |
| 103 | 201 | 2 |
| 103 | 301 | 1 |
| 301 | 401 | 2 |
| 103 | 401 | 1 |
| 101 | 401 | 2 |
| 201 | 401 | 1 |
| 201 | 202 | 1 |
| 103 | 202 | 1 |
| 102 | 202 | 1 |
이렇게 나왔다. 여기서는
pid1 가 101 (Electronics) ,
pid2 가 102 or 103 (Books)
인 부분을 봐야 하는데
| pid1 | pid2 | cnt |
|---|---|---|
| 101 | 102 | 2 |
| 101 | 103 | 2 |
이 두 결과가 있다. 이 두 결과를 sum() 을 하게 된다면 4로, expected output에서 원하는 customer_count 와 다른 값이 나오게 된다.
Expected output
category1 category2 customer_count Books Clothing 3 Books Electronics 3 Clothing Electronics 3 Electronics Sports 3
이렇게 되는 이유는 Input 인
ProductPurchases 테이블에서 볼 수 있는데,
product_id 에서 101,102,103 이 있는 rows 들을 가져와 보면 짐작할 수 있다.
| user_id | product_id | quantity |
|---|---|---|
| 1 | 101 | 2 |
| 1 | 102 | 1 |
| 2 | 101 | 1 |
| 2 | 102 | 2 |
| 2 | 103 | 1 |
| 3 | 101 | 2 |
| 3 | 103 | 1 |
| 4 | 101 | 1 |
| 5 | 102 | 2 |
| 5 | 103 | 1 |
product_id에서 101 을 가지고 있으면서 102 또는 103을 가지고 있는 user_id는 1, 2, 3 으로
Expected output 에서 요구하는 3이 도출되어야 하는데,
user_id : 2 인 rows 에서
product_id 101, 102 / 101, 103 의 결과 두개가 나와서 count에서는 내가 원했던 4의 결과가 나오지 않았던 것 같다.