Filtering Optimization

한상우 · September 11, 2024

SQL


584. Find Customer Referee

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
| referee_id  | int     |
+-------------+---------+
In SQL, id is the primary key column for this table.
Each row of this table indicates the id of a customer, their name,
and the id of the customer who referred them.

Find the names of the customer that are not referred by the customer with id = 2.
Return the result table in any order.
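Solution
- Use the WHERE clause to keep rows where referee_id is not 2 as well as rows where referee_id is NULL, and output their names. The explicit NULL check is needed because referee_id != 2 evaluates to NULL (not true) when referee_id is NULL, so those rows would otherwise be dropped.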
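
The NULL behavior described above can be checked directly. This is a minimal sketch in MySQL syntax; the column aliases are only illustrative names, not part of the problem.

```sql
-- Three-valued logic in MySQL: a comparison with NULL yields NULL, not TRUE or FALSE.
SELECT
    1 != 2                        AS non_null_compare,  -- 1 (true)
    NULL != 2                     AS null_compare,      -- NULL (unknown)
    NULL IS NULL                  AS null_check,        -- 1 (true)
    (NULL != 2) OR (NULL IS NULL) AS combined;          -- 1 (true)
```

A WHERE clause keeps a row only when its condition is true, which is why the `IS NULL` branch is needed to keep customers who have no referee.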

WHERE(row)

Looking at the sample data, the referee_id column contains many NULL values.

| id | name | referee_id |
| -- | ---- | ---------- |
| 1  | Will | null       |
| 2  | Jane | null       |
| 3  | Alex | 2          |
| 4  | Bill | null       |
| 5  | Zack | 1          |
| 6  | Mark | 2          |
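
To experiment with this data outside the problem site, the sample rows can be loaded into a local table. This is a sketch in MySQL syntax; the `VARCHAR(255)` length is an assumption, since the problem only states `varchar`.

```sql
-- Local copy of the problem's sample data (MySQL syntax).
-- VARCHAR(255) is an assumed length; the problem statement only says "varchar".
CREATE TABLE Customer (
    id         INT PRIMARY KEY,
    name       VARCHAR(255),
    referee_id INT
);

INSERT INTO Customer (id, name, referee_id) VALUES
    (1, 'Will', NULL),
    (2, 'Jane', NULL),
    (3, 'Alex', 2),
    (4, 'Bill', NULL),
    (5, 'Zack', 1),
    (6, 'Mark', 2);
```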

Hypothesis: if reordering the conditions in the WHERE clause lets more rows be filtered out earlier, performance should improve.

query 1

SELECT name
FROM Customer
WHERE referee_id != 2 OR referee_id IS NULL

query 2

SELECT name
FROM Customer
WHERE referee_id IS NULL OR referee_id != 2

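EXPLAIN result

| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
| -- | ----------- | -------- | ---------- | ---- | ------------- | ---- | ------- | ---- | ---- | -------- | ----------- |
| 1  | SIMPLE      | Customer | null       | ALL  | null          | null | null    | null | 6    | 86.11    | Using where |

Both queries produce the same EXPLAIN result, shown above: a full table scan (type ALL) with the same row estimate. In other words, the execution plan is identical, so there is no difference in performance.
Conclusion: the MySQL optimizer decides on its own how to evaluate the conditions in a WHERE clause, so the query author generally does not need to worry about the order in which the conditions are written.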
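
The comparison can be reproduced by prefixing each query with EXPLAIN. This is a sketch that assumes a local MySQL table named Customer with the problem's schema; the last statement additionally assumes MySQL 8.0.18 or later, where EXPLAIN ANALYZE is available.

```sql
-- Assumes a local MySQL table named Customer with the problem's schema.

-- Plan for query 1
EXPLAIN
SELECT name
FROM Customer
WHERE referee_id != 2 OR referee_id IS NULL;

-- Plan for query 2 (same conditions, opposite order)
EXPLAIN
SELECT name
FROM Customer
WHERE referee_id IS NULL OR referee_id != 2;

-- MySQL 8.0.18+ only: executes the query and reports measured timings per plan step.
EXPLAIN ANALYZE
SELECT name
FROM Customer
WHERE referee_id != 2 OR referee_id IS NULL;
```

Plain EXPLAIN shows only estimates, so on a larger dataset EXPLAIN ANALYZE is the more direct way to confirm that the two orderings really cost the same.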


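Unnecessary columns

The table in this problem has 3 columns in total, of which only 2 are needed (name and referee_id).
Now suppose the same problem had a table with many more columns, while still needing only these 2.
- In that case, the query can be written so that only the needed columns are read, as below.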

SELECT name
FROM (
    SELECT name, referee_id
    FROM Customer
) AS C
WHERE referee_id != 2 OR referee_id IS NULL

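Column filtering: loading only the columns you need, rather than every column as with SELECT *, can reduce memory use and I/O and therefore improve performance.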
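
One way to check whether the derived table in the query above changes anything is to compare its plan with the flat form of the same query. This is a sketch assuming a local MySQL table named Customer; in MySQL 5.7 and later, simple derived tables like this one are typically merged into the outer query, so the two plans may well come out identical.

```sql
-- Assumes a local MySQL table named Customer with the problem's schema.

-- Flat form: references only name and referee_id
EXPLAIN
SELECT name
FROM Customer
WHERE referee_id != 2 OR referee_id IS NULL;

-- Derived-table form from the text
EXPLAIN
SELECT name
FROM (
    SELECT name, referee_id
    FROM Customer
) AS C
WHERE referee_id != 2 OR referee_id IS NULL;
```

If both plans match, the flat form is the simpler choice, and the practical takeaway is to list the needed columns explicitly instead of using SELECT *.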

LIMIT

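When you only want to check that a query works, or need just part of the data, putting a limit on the result as below is another way to optimize filtering.

-- Example: view only the first 5 rows, sorted by id in ascending order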

SELECT *
FROM Customer
ORDER BY id
LIMIT 5

index

Indexes cannot be applied on the problem site, so let's practice and verify this on our own dataset.
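
A possible starting point for that experiment is sketched below in MySQL syntax. It assumes a local table named Customer with the problem's schema; the index name `idx_customer_referee` is a hypothetical name chosen for illustration.

```sql
-- Assumes a local MySQL table named Customer with the problem's schema.
-- idx_customer_referee is a hypothetical index name.
CREATE INDEX idx_customer_referee ON Customer (referee_id);

-- Check whether the optimizer now lists or uses the index
-- (see the possible_keys and key columns of the output).
EXPLAIN
SELECT name
FROM Customer
WHERE referee_id != 2 OR referee_id IS NULL;

-- Remove the index again to return to the original state.
DROP INDEX idx_customer_referee ON Customer;
```

Whether the index is actually chosen depends on table size and statistics. With only a handful of rows the optimizer may still prefer a full table scan, so a larger dataset gives a more meaningful comparison.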
