Table: Movies
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| movie_id | int |
| title | varchar |
+---------------+---------+
movie_id is the primary key (column with unique values) for this table.
title is the name of the movie.
Table: Users
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| name | varchar |
+---------------+---------+
user_id is the primary key (column with unique values) for this table.
The column 'name' has unique values.
Table: MovieRating
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| movie_id | int |
| user_id | int |
| rating | int |
| created_at | date |
+---------------+---------+
(movie_id, user_id) is the primary key (combination of columns with unique values) for this table.
This table contains the rating of a movie by a user in their review.
created_at is the user's review date.
Write a solution to:
Find the name of the user who has rated the greatest number of movies. In case of a tie, return the lexicographically smaller user name.
Find the movie name with the highest average rating in February 2020. In case of a tie, return the lexicographically smaller movie name.
The result format is in the following example.
Example 1:
Input:
Movies table:
+-------------+--------------+
| movie_id | title |
+-------------+--------------+
| 1 | Avengers |
| 2 | Frozen 2 |
| 3 | Joker |
+-------------+--------------+
Users table:
+-------------+--------------+
| user_id | name |
+-------------+--------------+
| 1 | Daniel |
| 2 | Monica |
| 3 | Maria |
| 4 | James |
+-------------+--------------+
MovieRating table:
+-------------+--------------+--------------+-------------+
| movie_id | user_id | rating | created_at |
+-------------+--------------+--------------+-------------+
| 1 | 1 | 3 | 2020-01-12 |
| 1 | 2 | 4 | 2020-02-11 |
| 1 | 3 | 2 | 2020-02-12 |
| 1 | 4 | 1 | 2020-01-01 |
| 2 | 1 | 5 | 2020-02-17 |
| 2 | 2 | 2 | 2020-02-01 |
| 2 | 3 | 2 | 2020-03-01 |
| 3 | 1 | 3 | 2020-02-22 |
| 3 | 2 | 4 | 2020-02-25 |
+-------------+--------------+--------------+-------------+
Output:
+--------------+
| results |
+--------------+
| Daniel |
| Frozen 2 |
+--------------+
Explanation:
Daniel and Monica have rated 3 movies ("Avengers", "Frozen 2" and "Joker") but Daniel is smaller lexicographically.
Frozen 2 and Joker have a rating average of 3.5 in February but Frozen 2 is smaller lexicographically.
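The counts and averages in the explanation can be checked with a minimal sketch like the one below. It loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption made so the example runs on its own), and the variable names rating_counts and feb_averages are illustrative.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (movie_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE MovieRating (
    movie_id INTEGER, user_id INTEGER, rating INTEGER, created_at TEXT,
    PRIMARY KEY (movie_id, user_id)
);
INSERT INTO Movies VALUES (1, 'Avengers'), (2, 'Frozen 2'), (3, 'Joker');
INSERT INTO Users VALUES (1, 'Daniel'), (2, 'Monica'), (3, 'Maria'), (4, 'James');
INSERT INTO MovieRating VALUES
    (1, 1, 3, '2020-01-12'), (1, 2, 4, '2020-02-11'), (1, 3, 2, '2020-02-12'),
    (1, 4, 1, '2020-01-01'), (2, 1, 5, '2020-02-17'), (2, 2, 2, '2020-02-01'),
    (2, 3, 2, '2020-03-01'), (3, 1, 3, '2020-02-22'), (3, 2, 4, '2020-02-25');
""")

# Number of ratings per user, most active first, ties broken by name.
rating_counts = conn.execute("""
    SELECT u.name, COUNT(*) AS cnt
    FROM MovieRating mr JOIN Users u ON mr.user_id = u.user_id
    GROUP BY u.name
    ORDER BY cnt DESC, u.name
""").fetchall()

# Average rating per movie, counting only reviews dated in February 2020.
feb_averages = conn.execute("""
    SELECT m.title, AVG(mr.rating) AS avg_rating
    FROM MovieRating mr JOIN Movies m ON mr.movie_id = m.movie_id
    WHERE mr.created_at >= '2020-02-01' AND mr.created_at < '2020-03-01'
    GROUP BY m.title
    ORDER BY avg_rating DESC, m.title
""").fetchall()

print(rating_counts)  # Daniel and Monica both have 3 ratings; Daniel sorts first
print(feb_averages)   # Frozen 2 and Joker both average 3.5; Frozen 2 sorts first
```

The first row of each list is the value the final answer needs, which is why every solution ends each part with LIMIT 1.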
WITH top_cnt_usr AS (
    -- User with the most ratings; ties go to the lexicographically smaller name.
    SELECT u.name AS results
    FROM MovieRating m
    JOIN Users u ON m.user_id = u.user_id
    GROUP BY m.user_id
    ORDER BY COUNT(*) DESC, results
    LIMIT 1
),
top_rate_mv AS (
    -- Movie with the highest average rating in February 2020; ties go to the smaller title.
    SELECT m.title
    FROM (
        SELECT movie_id, user_id, rating
        FROM MovieRating
        WHERE created_at >= '2020-02-01' AND created_at < '2020-03-01'
    ) f
    JOIN Movies m ON f.movie_id = m.movie_id
    GROUP BY f.movie_id
    ORDER BY AVG(f.rating) DESC, m.title
    LIMIT 1
)
SELECT * FROM top_cnt_usr
UNION ALL
SELECT * FROM top_rate_mv;
(
SELECT u.name AS results
FROM MovieRating mr
JOIN Users u ON mr.user_id = u.user_id
GROUP BY u.name
ORDER BY COUNT(*) DESC, u.name
LIMIT 1
)
UNION ALL
(
SELECT m.title AS results
FROM MovieRating mr
JOIN Movies m ON mr.movie_id = m.movie_id
WHERE DATE_FORMAT(mr.created_at, '%Y-%m') = '2020-02'
GROUP BY m.title
ORDER BY AVG(mr.rating) DESC, m.title
LIMIT 1
);
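I knew that LIMIT cannot be attached directly to a SELECT inside a UNION, but I had forgotten that the same is true of ORDER BY.
Without parentheses, a trailing ORDER BY/LIMIT applies to the combined UNION result, which is treated as a single table, so it cannot be attached to an individual SELECT.
At first I could only think of the Common Table Expression approach using WITH,
but I have now learned that wrapping each SELECT in parentheses lets it keep its own ORDER BY/LIMIT inside the UNION. It seems like a good option when I need to test something quickly and reusability and performance are not a concern.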
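The placement rule for ORDER BY/LIMIT in a UNION can be tried out with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, with two toy tables a and b that are purely illustrative. One caveat: SQLite does not accept MySQL's parenthesized UNION branches, so the sketch wraps each branch in a derived table instead, which has the same effect of giving each branch its own ORDER BY/LIMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (v TEXT);
CREATE TABLE b (v TEXT);
INSERT INTO a VALUES ('x2'), ('x1');
INSERT INTO b VALUES ('y2'), ('y1');
""")

# ORDER BY / LIMIT written directly on the first branch: the statement is rejected.
try:
    conn.execute(
        "SELECT v FROM a ORDER BY v LIMIT 1 "
        "UNION ALL SELECT v FROM b ORDER BY v LIMIT 1"
    )
    branch_clause_rejected = False
except sqlite3.OperationalError:
    branch_clause_rejected = True

# A trailing ORDER BY / LIMIT applies to the combined result, so one row remains.
whole_union = conn.execute(
    "SELECT v FROM a UNION ALL SELECT v FROM b ORDER BY v LIMIT 1"
).fetchall()

# Each branch wrapped as a derived table keeps its own ORDER BY / LIMIT,
# so one row per branch survives (SQLite's counterpart to MySQL's parentheses).
per_branch = conn.execute("""
    SELECT * FROM (SELECT v FROM a ORDER BY v LIMIT 1)
    UNION ALL
    SELECT * FROM (SELECT v FROM b ORDER BY v LIMIT 1)
""").fetchall()

print(branch_clause_rejected, whole_union, sorted(per_branch))
```

The CTE solutions work for the same reason: each ORDER BY/LIMIT lives inside its own named subquery, and the final UNION ALL only combines the two single-row results.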
WITH top_cnt_usr AS (
    -- User with the most ratings; ties go to the lexicographically smaller name.
    SELECT u.name AS results
    FROM Users u
    JOIN MovieRating m ON m.user_id = u.user_id
    GROUP BY m.user_id
    ORDER BY COUNT(*) DESC, results
    LIMIT 1
),
top_rate_mv AS (
    -- The February 2020 filter sits in the join condition instead of a subquery.
    SELECT m.title
    FROM Movies m
    JOIN MovieRating mr
        ON mr.movie_id = m.movie_id
        AND mr.created_at >= '2020-02-01' AND mr.created_at < '2020-03-01'
    GROUP BY mr.movie_id
    ORDER BY AVG(mr.rating) DESC, m.title
    LIMIT 1
)
SELECT * FROM top_cnt_usr
UNION ALL
SELECT * FROM top_rate_mv;