[SQL] Managers with at Least 5 Direct Reports

Hyunjun Kim·2024년 11월 6일

SQL

목록 보기

26/44

문제링크

570. Managers with at Least 5 Direct Reports

문제 설명

Table: Employee

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
| department  | varchar |
| managerId   | int     |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table indicates the name of an employee, their department, and the id of their manager.
If managerId is null, then the employee does not have a manager.
No employee will be the manager of themself.
 

Write a solution to find managers with at least five direct reports.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Employee table:
+-----+-------+------------+-----------+
| id  | name  | department | managerId |
+-----+-------+------------+-----------+
| 101 | John  | A          | null      |
| 102 | Dan   | A          | 101       |
| 103 | James | A          | 101       |
| 104 | Amy   | A          | 101       |
| 105 | Anne  | A          | 101       |
| 106 | Ron   | B          | 101       |
+-----+-------+------------+-----------+
Output: 
+------+
| name |
+------+
| John |
+------+

접근

접근 1.

접근 2. GROUP BY managerId count 해서 5개 이상 iD 찾고
새로 table 조회해서 해당 아이디 와 join

문제 풀이

풀이1 sub쿼리 활용
처음에는 이렇게 풀었는데 Runtime Error 가 발생했다.

SELECT name
FROM Employee
WHERE id = (SELECT managerId
	FROM Employee
	WHERE managerId IS NOT NULL
	GROUP BY managerId
	HAVING COUNT(managerId) >= 5)

풀이2 join 활용
RuntimeError 가 발생하면 Join을 활용하라는 문구에 JOIN 을 활용해 보았다.

SELECT e1.name
FROM Employee AS e1 JOIN (
	SELECT managerId
	FROM Employee
	WHERE managerId IS NOT NULL
	GROUP BY managerId
	HAVING COUNT(managerId) >= 5) AS e2 
ON e1.id = e2.managerId

둘의 차이는 뭐였을까? - GPT 에게 물어보았다.

첫 번째 쿼리가 runtime error를 발생시키는 이유는
서브쿼리가 여러 개의 결과 값을 반환할 수 있기 때문.
WHERE id = (SELECT ...) 조건에서 서브쿼리의 결과가 단일 값이어야 하지만, GROUP BY와 HAVING COUNT(managerId) >= 5 조건에 따라 여러 managerId가 반환될 가능성이 있습니다.

이 경우, = 연산자는 단일 값 비교에만 사용할 수 있기 때문에 다중 결과를 반환하는 서브쿼리와 함께 사용하면 오류가 발생합니다.

반면, 두 번째 쿼리는 서브쿼리의 결과를 JOIN을 통해 e1 테이블과 연결하므로, 다중 결과가 반환되더라도 문제없이 작동합니다. JOIN은 다중 행을 처리할 수 있도록 설계되어 있기 때문에 에러가 발생하지 않습니다.

요약하자면:
첫 번째 쿼리의 서브쿼리가 다중 행을 반환할 경우 = 연산자 때문에 runtime error가 발생합니다.
두 번째 쿼리는 JOIN을 사용하여 다중 행을 처리하므로 에러 없이 실행됩니다.

예시로 Employee 테이블에 다음과 같은 데이터가 있다고 가정해 봅시다:

id	name	managerId
1	Alice	2
2	Bob	3
3	Charlie	NULL
4	David	2
5	Eve	2
6	Frank	3
7	Grace	2
8	Henry	2

이제 managerId가 5명 이상인 경우를 찾는 서브쿼리를 실행해 보겠습니다.

서브쿼리 결과:

SELECT managerId
FROM Employee
WHERE managerId IS NOT NULL
GROUP BY managerId
HAVING COUNT(managerId) >= 5;

이 서브쿼리는 다음과 같이 managerId가 2인 경우를 반환할 수 있습니다:

managerId
2

이제 쿼리를 실행할 때 문제를 살펴봅시다:

SELECT name
FROM Employee
WHERE id = (SELECT managerId
            FROM Employee
            WHERE managerId IS NOT NULL
            GROUP BY managerId
            HAVING COUNT(managerId) >= 5);

SELECT managerId ... 서브쿼리가 반환하는 결과가 단일 행이 아닌 경우(예: 여러 managerId가 조건을 만족하면), id = (...) 조건은 여러 값과 비교할 수 없기 때문에 runtime error가 발생하는 반면에

JOIN 은 e1의 id와 서브쿼리에서 반환된 managerId를 비교하기 때문에, e2에서 다중 행이 반환되더라도 문제 없이 e1.name을 반환할 수 있습니다.

Hyunjun Kim

이전 포스트

[SQL] 1280. Students and Examinations

다음 포스트