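Difference in bike rentals/returns between October 2018 and October 2019
When bike_id, the rental station id, and the return station id all sit in one row, how should the events be counted?
Grouping by bike_id and counting says nothing about the total number of events per station,
and grouping by only the rental station id or only the return station id loses the events on the other side.
What about separating the years with CASE WHEN?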
SELECT rent_station_id,
COUNT(CASE WHEN YEAR(rent_at) = 2018 THEN 1 END) as rent_2018,
COUNT(CASE WHEN YEAR(rent_at) = 2019 THEN 1 END) as rent_2019
FROM rental_history
WHERE (rent_at >= '2018-10-01' AND rent_at < '2018-11-01') OR
(rent_at >= '2019-10-01' AND rent_at < '2019-11-01')
GROUP BY
rent_station_id
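...This needs rethinking: the query above counts rentals only and ignores returns.
As an aside, the difficulty ratings on solvesql seem off. Difficulty 4 and 5 problems often get solved right away, while difficulty 3 feels like the hardest.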
WITH rent_cnt AS (
SELECT rent_station_id, DATE_FORMAT(rent_at, '%Y-%m') as dt, COUNT(*) as rent_cnt
FROM rental_history
WHERE (rent_at >= '2018-10-01' AND rent_at < '2018-11-01') OR
(rent_at >= '2019-10-01' AND rent_at < '2019-11-01')
GROUP BY
rent_station_id, DATE_FORMAT(rent_at, '%Y-%m')
), return_cnt AS (
SELECT return_station_id, DATE_FORMAT(return_at, '%Y-%m') as dt, COUNT(*) as return_cnt
FROM rental_history
WHERE (return_at >= '2018-10-01' AND return_at < '2018-11-01') OR
(return_at >= '2019-10-01' AND return_at < '2019-11-01')
GROUP BY
return_station_id, DATE_FORMAT(return_at, '%Y-%m')
)
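From the problem statement: stations with 0 rentals/returns during October 2018 or October 2019 may have already been closed or may be newly opened, so exclude them from the query result.
-> Group once more by the month of rent_at and return_at so that each year's cnt is available, then make sure that stations with a count of 0 are not included in the joined table at all.
For now, filter them out completely; if they turn out to be needed, handle them with CASE WHEN and set the value to 0.
-> I originally planned to join the rent_cnt and return_cnt tables to each other, but a station ID can be missing from either one, so it is better to join against the station table, which contains every station ID.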
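To see why joining the two count tables directly loses stations, here is a minimal sketch. It runs in SQLite through Python's sqlite3 module as a stand-in for MySQL, and the three stations and three trips are made-up sample data, not rows from the actual problem. The date filter is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE station (station_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rental_history (
    bike_id INTEGER,
    rent_station_id INTEGER, rent_at TEXT,
    return_station_id INTEGER, return_at TEXT
);
INSERT INTO station VALUES (1, 'A'), (2, 'B'), (3, 'C');
-- Hypothetical trips: station 1 is only rented from, station 3 only receives returns.
INSERT INTO rental_history VALUES
    (10, 1, '2018-10-02 09:00:00', 2, '2018-10-02 09:30:00'),
    (11, 2, '2018-10-03 10:00:00', 3, '2018-10-03 10:20:00'),
    (12, 1, '2018-10-04 11:00:00', 3, '2018-10-04 11:40:00');
""")

# Shared CTEs: one count table per side of the trip.
COUNTS = """
WITH rent_cnt AS (
    SELECT rent_station_id AS id, COUNT(*) AS cnt
    FROM rental_history GROUP BY rent_station_id
), return_cnt AS (
    SELECT return_station_id AS id, COUNT(*) AS cnt
    FROM rental_history GROUP BY return_station_id
)
"""

# Joining the count tables to each other keeps only stations present in BOTH.
direct_join = conn.execute(COUNTS + """
SELECT r.id, r.cnt + t.cnt AS total
FROM rent_cnt r INNER JOIN return_cnt t ON r.id = t.id
ORDER BY r.id
""").fetchall()

# Starting from station keeps every station; a missing side counts as 0.
via_station = conn.execute(COUNTS + """
SELECT s.station_id, COALESCE(r.cnt, 0) + COALESCE(t.cnt, 0) AS total
FROM station s
LEFT JOIN rent_cnt r ON s.station_id = r.id
LEFT JOIN return_cnt t ON s.station_id = t.id
ORDER BY s.station_id
""").fetchall()

print(direct_join)   # only station 2 survives
print(via_station)   # all three stations, each with its combined total
```

In this sample, stations 1 and 3 vanish from the direct join even though each has two events, which is exactly the case the note above worries about.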
WITH cnt AS (
SELECT rent_station_id as id, DATE_FORMAT(rent_at, '%Y-%m') as dt, COUNT(*) as rent_cnt
FROM rental_history
WHERE (rent_at >= '2018-10-01' AND rent_at < '2018-11-01') OR
(rent_at >= '2019-10-01' AND rent_at < '2019-11-01')
GROUP BY
rent_station_id, DATE_FORMAT(rent_at, '%Y-%m')
UNION ALL
SELECT return_station_id as id, DATE_FORMAT(return_at, '%Y-%m') as dt, COUNT(*) as return_cnt
FROM rental_history
WHERE (return_at >= '2018-10-01' AND return_at < '2018-11-01') OR
(return_at >= '2019-10-01' AND return_at < '2019-11-01')
GROUP BY
return_station_id, DATE_FORMAT(return_at, '%Y-%m')
)
SELECT * from cnt
WITH cnt AS (
SELECT rent_station_id as id, DATE_FORMAT(rent_at, '%Y-%m') as dt, COUNT(*) as cnt
FROM rental_history
WHERE (rent_at >= '2018-10-01' AND rent_at < '2018-11-01') OR
(rent_at >= '2019-10-01' AND rent_at < '2019-11-01')
GROUP BY
rent_station_id, DATE_FORMAT(rent_at, '%Y-%m')
UNION ALL
SELECT return_station_id as id, DATE_FORMAT(return_at, '%Y-%m') as dt, COUNT(*) as cnt
FROM rental_history
WHERE (return_at >= '2018-10-01' AND return_at < '2018-11-01') OR
(return_at >= '2019-10-01' AND return_at < '2019-11-01')
GROUP BY
return_station_id, DATE_FORMAT(return_at, '%Y-%m')
)
SELECT s.station_id, s.name, s.local,
ROUND( SUM(IF(dt='2019-10'), cnt) / SUM(IF(dt='2018-10'), cnt) * 100, 2) as usage_pct
FROM station s INNER JOIN cnt c
ON s.station_id = c.id
GROUP BY
s.station_id
HAVING
SUM(IF(dt='2018-10'), cnt) > 0 AND SUM(IF(dt='2019-10'), cnt) > 0
AND (SUM(IF(dt='2019-10'), cnt) / SUM(IF(dt='2018-10'), cnt)) <= 0.5
Error 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '), cnt, NULL) / SUM(IF(dt='2018-10'), cnt, NULL) * 100, 2) as usage_pct FROM ' at line 19
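The cause was the parenthesis placement: IF(dt='2019-10'), cnt closes IF() right after the condition. All three arguments belong inside one pair of parentheses: IF(dt='2019-10', cnt, 0).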
WITH cnt AS (
SELECT rent_station_id as id, DATE_FORMAT(rent_at, '%Y-%m') as dt, COUNT(*) as cnt
FROM rental_history
WHERE (rent_at >= '2018-10-01' AND rent_at < '2018-11-01') OR
(rent_at >= '2019-10-01' AND rent_at < '2019-11-01')
GROUP BY
rent_station_id, DATE_FORMAT(rent_at, '%Y-%m')
UNION ALL
SELECT return_station_id as id, DATE_FORMAT(return_at, '%Y-%m') as dt, COUNT(*) as cnt
FROM rental_history
WHERE (return_at >= '2018-10-01' AND return_at < '2018-11-01') OR
(return_at >= '2019-10-01' AND return_at < '2019-11-01')
GROUP BY
return_station_id, DATE_FORMAT(return_at, '%Y-%m')
)
SELECT s.station_id, s.name, s.local,
ROUND( SUM(IF(dt='2019-10', cnt, 0)) / SUM(IF(dt='2018-10', cnt, 0)) * 100, 2) as usage_pct
FROM station s INNER JOIN cnt c
ON s.station_id = c.id
GROUP BY
s.station_id
HAVING
SUM(IF(dt='2018-10', cnt, 0)) > 0 AND SUM(IF(dt='2019-10', cnt, 0)) > 0
AND (SUM(IF(dt='2019-10', cnt, 0)) / SUM(IF(dt='2018-10', cnt, 0))) <= 0.5
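The conditional sums in the query above can be checked on a tiny example. This is a sketch under assumptions: it runs in SQLite from Python rather than MySQL, so CASE WHEN ... THEN ... ELSE 0 END plays the role of the three-argument IF(), and 100.0 forces non-integer division, which MySQL's / operator does on its own. The monthly_cnt table and its numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical per-station monthly counts, two rows per month
-- (one for rentals, one for returns), as the UNION ALL CTE would produce.
CREATE TABLE monthly_cnt (id INTEGER, dt TEXT, cnt INTEGER);
INSERT INTO monthly_cnt VALUES
    (1, '2018-10', 40), (1, '2019-10', 10),
    (1, '2018-10', 60), (1, '2019-10', 30),
    (2, '2018-10', 20), (2, '2019-10', 18);
""")

rows = conn.execute("""
SELECT id,
       SUM(CASE WHEN dt = '2018-10' THEN cnt ELSE 0 END) AS usage_2018,
       SUM(CASE WHEN dt = '2019-10' THEN cnt ELSE 0 END) AS usage_2019,
       ROUND(100.0 * SUM(CASE WHEN dt = '2019-10' THEN cnt ELSE 0 END)
                   / SUM(CASE WHEN dt = '2018-10' THEN cnt ELSE 0 END), 2) AS usage_pct
FROM monthly_cnt
GROUP BY id
ORDER BY id
""").fetchall()

# Keep stations whose 2019 usage is at most 50% of their 2018 usage.
kept = [station_id for station_id, _, _, pct in rows if pct <= 50]

print(rows)
print(kept)
```

Note the units: usage_pct is a percentage (multiplied by 100), while the HAVING clause in the query compares the raw ratio against 0.5. Both express the same 50% threshold.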
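Since the result is the sum of rentals and returns, combine the two with UNION ALL rather than an INNER JOIN (not UNION, so that duplicate rows are kept). I considered a LEFT JOIN in case some station_id values in station never appear as rent_station_id or return_station_id, but INNER JOIN was enough: such stations have a combined rental/return count of 0 and would be filtered out by the HAVING clause anyway.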
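The UNION versus UNION ALL point matters whenever a station has the same number of rentals and returns in a month, because the two count rows are then identical. A minimal sketch, again using SQLite via Python as a stand-in for MySQL (strftime replaces DATE_FORMAT) with two made-up round trips:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rental_history (
    bike_id INTEGER,
    rent_station_id INTEGER, rent_at TEXT,
    return_station_id INTEGER, return_at TEXT
);
-- Hypothetical round trips: station 1 has 2 rentals AND 2 returns in 2018-10.
INSERT INTO rental_history VALUES
    (10, 1, '2018-10-02 09:00:00', 1, '2018-10-02 09:30:00'),
    (11, 1, '2018-10-03 10:00:00', 1, '2018-10-03 10:20:00');
""")

# Same shape as the cnt CTE in the notes; only the set operator varies.
TEMPLATE = """
WITH cnt AS (
    SELECT rent_station_id AS id, strftime('%Y-%m', rent_at) AS dt, COUNT(*) AS cnt
    FROM rental_history
    GROUP BY rent_station_id, strftime('%Y-%m', rent_at)
    {set_op}
    SELECT return_station_id AS id, strftime('%Y-%m', return_at) AS dt, COUNT(*) AS cnt
    FROM rental_history
    GROUP BY return_station_id, strftime('%Y-%m', return_at)
)
SELECT id, dt, SUM(cnt) FROM cnt GROUP BY id, dt
"""

with_union_all = conn.execute(TEMPLATE.format(set_op="UNION ALL")).fetchall()
with_union = conn.execute(TEMPLATE.format(set_op="UNION")).fetchall()

print(with_union_all)  # both (1, '2018-10', 2) rows are kept and summed
print(with_union)      # the identical rows collapse into one before summing
```

With UNION the two identical rows are deduplicated, so the station's total is halved without any error or warning.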