[SQL_Q] 585. Investments in 2016

Hyunjun Kim · July 23, 2025

https://leetcode.com/problems/investments-in-2016/description/

Problem

Table: Insurance

+-------------+-------+
| Column Name | Type  |
+-------------+-------+
| pid         | int   |
| tiv_2015    | float |
| tiv_2016    | float |
| lat         | float |
| lon         | float |
+-------------+-------+
pid is the primary key (column with unique values) for this table.
Each row of this table contains information about one policy where:
pid is the policyholder's policy ID.
tiv_2015 is the total investment value in 2015 and tiv_2016 is the total investment value in 2016.
lat is the latitude of the policy holder's city. It's guaranteed that lat is not NULL.
lon is the longitude of the policy holder's city. It's guaranteed that lon is not NULL.
 

Write a solution to report the sum of all total investment values in 2016 tiv_2016, for all policyholders who:

have the same tiv_2015 value as one or more other policyholders, and
are not located in the same city as any other policyholder (i.e., the (lat, lon) attribute pairs must be unique).
Round tiv_2016 to two decimal places.

The result format is in the following example.

 

Example 1:

Input: 
Insurance table:
+-----+----------+----------+-----+-----+
| pid | tiv_2015 | tiv_2016 | lat | lon |
+-----+----------+----------+-----+-----+
| 1   | 10       | 5        | 10  | 10  |
| 2   | 20       | 20       | 20  | 20  |
| 3   | 10       | 30       | 20  | 20  |
| 4   | 10       | 40       | 40  | 40  |
+-----+----------+----------+-----+-----+
Output: 
+----------+
| tiv_2016 |
+----------+
| 45.00    |
+----------+
Explanation: 
The first record in the table, like the last record, meets both of the two criteria.
The tiv_2015 value 10 is the same as the third and fourth records, and its location is unique.

The second record does not meet any of the two criteria. Its tiv_2015 is not like any other policyholders and its location is the same as the third record, which makes the third record fail, too.
So, the result is the sum of tiv_2016 of the first and last record, which is 45.
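As a sanity check on the two criteria, here is a small Python sketch that is not part of the original post. It counts the tiv_2015 values and the (lat, lon) pairs across the Example 1 rows. It then sums tiv_2016 over the rows whose tiv_2015 is shared and whose location is unique. The row tuples are simply the sample table typed in by hand.

```python
from collections import Counter

# Example 1 rows: (pid, tiv_2015, tiv_2016, lat, lon)
rows = [
    (1, 10, 5, 10, 10),
    (2, 20, 20, 20, 20),
    (3, 10, 30, 20, 20),
    (4, 10, 40, 40, 40),
]

# How many policies share each tiv_2015 value and each (lat, lon) pair
tiv_counts = Counter(tiv_2015 for _, tiv_2015, _, _, _ in rows)
loc_counts = Counter((lat, lon) for _, _, _, lat, lon in rows)

# Keep rows whose tiv_2015 is shared AND whose location is unique
result = round(float(sum(
    tiv_2016
    for _, tiv_2015, tiv_2016, lat, lon in rows
    if tiv_counts[tiv_2015] > 1 and loc_counts[(lat, lon)] == 1
)), 2)

print(f"{result:.2f}")  # 45.00
```

Only pid 1 and pid 4 pass both checks. This matches the explanation above.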

My Solution

-- a: policies whose (lat, lon) appears exactly once
WITH a AS (
    SELECT *
    FROM Insurance
    GROUP BY lat, lon
    HAVING COUNT(*) = 1
),
-- b: policies whose tiv_2015 is not a one-off value
b AS (
    SELECT pid
    FROM Insurance
    WHERE tiv_2015 NOT IN (
        SELECT tiv_2015
        FROM Insurance
        GROUP BY tiv_2015
        HAVING COUNT(*) = 1
    )
)
SELECT ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM a
JOIN b ON a.pid = b.pid
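  • A NOT IN (SELECT tiv_2015 ...) subquery is fragile around NULL. If the subquery returns even one NULL, NOT IN evaluates to UNKNOWN for every row, so no row passes the filter and the SUM comes back NULL. This is an edge case here, because tiv_2015 is not declared NOT NULL.
  • SELECT * with GROUP BY lat, lon relies on MySQL's GROUP BY extension, which allows selecting columns that are neither grouped nor aggregated.
    • This is not standard SQL, and its behavior depends on server settings. The query is rejected when ONLY_FULL_GROUP_BY, the default since MySQL 5.7.5, is enabled. The execution plan is also harder to predict.
  • Joining the two CTEs may lead MySQL to build temporary tables, and the optimizer may not be able to use indexes on them.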
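Here is a minimal sketch of the NULL pitfall from the first bullet. It uses Python's built-in sqlite3 module as a stand-in for MySQL. That is an assumption: SQLite applies the same three-valued logic to IN and NOT IN, but it is not the LeetCode engine. The table is trimmed to the two tiv columns. The row with a NULL tiv_2015 (pid 4) is hypothetical, since the problem does not say such rows exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Insurance (pid INTEGER PRIMARY KEY, tiv_2015 REAL, tiv_2016 REAL)")
# Hypothetical data: pid 4 has a NULL tiv_2015, so it forms its own one-row group
conn.executemany(
    "INSERT INTO Insurance VALUES (?, ?, ?)",
    [(1, 10, 5), (2, 10, 40), (3, 20, 20), (4, None, 30)],
)

# "Not a one-off value" via NOT IN: the subquery returns {20, NULL}
not_in_sql = """
SELECT SUM(tiv_2016) FROM Insurance
WHERE tiv_2015 NOT IN (
    SELECT tiv_2015 FROM Insurance GROUP BY tiv_2015 HAVING COUNT(*) = 1
)"""

# "Shared value" via IN: the subquery returns {10}
in_sql = """
SELECT SUM(tiv_2016) FROM Insurance
WHERE tiv_2015 IN (
    SELECT tiv_2015 FROM Insurance GROUP BY tiv_2015 HAVING COUNT(*) > 1
)"""

not_in_result = conn.execute(not_in_sql).fetchone()[0]  # no row passes -> NULL
in_result = conn.execute(in_sql).fetchone()[0]           # pids 1 and 2 pass
print(not_in_result, in_result)  # None 45.0
conn.close()
```

With NOT IN, `10 NOT IN (20, NULL)` is UNKNOWN rather than TRUE, so even the rows that should qualify are dropped. The IN form only loses the NULL row itself.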

Another User's Solution

-- tiv: tiv_2015 values shared by more than one policy
with tiv as (
    select tiv_2015
    from insurance
    group by tiv_2015
    having count(*) > 1
),
-- latlon: policies at a unique (lat, lon)
latlon as (
    select *
    from insurance
    group by lat, lon
    having count(*) = 1
)
select round(sum(tiv_2016), 2) as tiv_2016
from latlon
where tiv_2015 in (select tiv_2015 from tiv)
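  • The IN + subquery form is concise.
  • Filtering directly in WHERE avoids an unnecessary join, which keeps the query (and usually the execution plan) simpler.
  • WHERE tiv_2015 IN (...) is safer with NULLs. A NULL in the subquery result simply never matches, instead of filtering out every row the way NOT IN does.
  • MySQL's optimizer can often rewrite an IN subquery as a semijoin or push the condition down, though EXPLAIN is the way to confirm this.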
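To check that the two approaches agree on the sample data, this sketch runs condensed versions of both queries against the Example 1 rows using Python's sqlite3 module. SQLite is an assumed stand-in for MySQL here. Like MySQL with ONLY_FULL_GROUP_BY off, it accepts SELECT * with GROUP BY lat, lon (SQLite calls these "bare columns"). Every kept group has exactly one row, so the returned values are well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Insurance (pid INTEGER PRIMARY KEY, tiv_2015 REAL, "
    "tiv_2016 REAL, lat REAL, lon REAL)"
)
# Example 1 rows: (pid, tiv_2015, tiv_2016, lat, lon)
conn.executemany(
    "INSERT INTO Insurance VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 5, 10, 10), (2, 20, 20, 20, 20), (3, 10, 30, 20, 20), (4, 10, 40, 40, 40)],
)

# Condensed "My Solution": JOIN of a unique-location CTE and a NOT IN CTE
join_version = """
WITH a AS (SELECT * FROM Insurance GROUP BY lat, lon HAVING COUNT(*) = 1),
     b AS (SELECT pid FROM Insurance
           WHERE tiv_2015 NOT IN (SELECT tiv_2015 FROM Insurance
                                  GROUP BY tiv_2015 HAVING COUNT(*) = 1))
SELECT ROUND(SUM(tiv_2016), 2) FROM a JOIN b ON a.pid = b.pid
"""

# Condensed "Another User's Solution": WHERE ... IN filter, no join
in_version = """
WITH tiv AS (SELECT tiv_2015 FROM Insurance GROUP BY tiv_2015 HAVING COUNT(*) > 1),
     latlon AS (SELECT * FROM Insurance GROUP BY lat, lon HAVING COUNT(*) = 1)
SELECT ROUND(SUM(tiv_2016), 2) FROM latlon WHERE tiv_2015 IN (SELECT tiv_2015 FROM tiv)
"""

join_result = conn.execute(join_version).fetchone()[0]
in_result = conn.execute(in_version).fetchone()[0]
print(join_result, in_result)  # 45.0 45.0
conn.close()
```

Matching output only shows that the two queries are equivalent on this input. It says nothing about their plans or speed in MySQL, which EXPLAIN would have to show.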

Conclusion

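In terms of performance and robustness, the other user's solution is better.

  • WHERE IN filtering is usually at least as fast as the JOIN here. However, MySQL can convert IN subqueries into semijoins, so any actual difference should be checked with EXPLAIN.
  • NOT IN is tricky to handle correctly, especially when NULLs are involved.
  • The JOIN can cost extra memory for materialized intermediate results and may keep indexes from being used.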

Revised Version

-- tiv_2015 values shared by at least two policies
WITH valid_tiv_2015 AS (
    SELECT tiv_2015
    FROM Insurance
    GROUP BY tiv_2015
    HAVING COUNT(*) > 1
),
-- policies whose (lat, lon) is unique
valid_location AS (
    SELECT *
    FROM Insurance
    GROUP BY lat, lon
    HAVING COUNT(*) = 1
)
SELECT ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM valid_location
WHERE tiv_2015 IN (SELECT tiv_2015 FROM valid_tiv_2015)
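The revised query still uses SELECT * with GROUP BY lat, lon, which is the MySQL GROUP BY extension noted under "My Solution". One standard-SQL alternative counts both groupings with window functions (COUNT(*) OVER (PARTITION BY ...), available in MySQL 8.0) and then filters on those counts. This is offered as an assumption, not as the author's method. The sketch runs it against the Example 1 rows with Python's sqlite3 module and assumes SQLite 3.25 or later for window-function support.

```python
import sqlite3

# Example 1 rows: (pid, tiv_2015, tiv_2016, lat, lon)
ROWS = [(1, 10, 5, 10, 10), (2, 20, 20, 20, 20), (3, 10, 30, 20, 20), (4, 10, 40, 40, 40)]

# No non-aggregated columns next to GROUP BY: each row carries its own counts
WINDOW_QUERY = """
SELECT ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM (
    SELECT tiv_2016,
           COUNT(*) OVER (PARTITION BY tiv_2015) AS same_tiv_cnt,  -- policies sharing this tiv_2015
           COUNT(*) OVER (PARTITION BY lat, lon) AS same_loc_cnt   -- policies at this (lat, lon)
    FROM Insurance
) AS counted
WHERE same_tiv_cnt > 1
  AND same_loc_cnt = 1
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Insurance (pid INTEGER PRIMARY KEY, tiv_2015 REAL, "
    "tiv_2016 REAL, lat REAL, lon REAL)"
)
conn.executemany("INSERT INTO Insurance VALUES (?, ?, ?, ?, ?)", ROWS)
result = conn.execute(WINDOW_QUERY).fetchone()[0]
print(f"{result:.2f}")  # 45.00
conn.close()
```

There is one behavioral difference from the IN version. PARTITION BY tiv_2015 places all NULL tiv_2015 values in a single partition, so two policies with NULL would count as sharing a value. The IN filter, by contrast, never matches NULL.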