서브쿼리

파랑·2021년 8월 10일

항상 이 서브 쿼리 부분이 헷갈렸는데,
오늘 수업을 들으면서 헷갈렸던 부분들이 많이 정리되었다.

생각보다 간단한 것인 것도 많았는데,
그동안 내가 너무 어렵게 빙빙 꼬아서 접근한 것도 있었던 것 같다😂😀

어려운 상태로 내버려두는 게 아니라 간결하게 간소화해서 학습하는 버릇을 들여야겠다고 다짐🤸‍♀️🤸‍♀️

서브쿼리란?

select, from, where 절 이하에 쿼리문이 들어오는 것을 말한다. * select 문의 서브쿼리

select dept_name, 
(select count(*) 
from instructor 
where department.dept_name = instructor.dept_name) as num_instructors
from department;

단일행 서브쿼리 - scalar subquery

서브쿼리의 결과행이 단일하기 때문에 scalar subquery라고 부른다.
- 연산자 <, >, <>, = 등으로 비교
- ex) 스미스의 연봉보다 높은 직원의 이름과 연봉

    select ename, sal
    from emp
    where sal > (select sal from emp where ename ='SMITH')

다중행 서브쿼리

서브 쿼리의 결과값이 여러 개일 때

연산자

IN, >ANY, <ANY, <ALL, >ALL, EXIST

>any

부서번호 30인 직원들 중에 가장 연봉을 많이 받는 사원보다 많이 받는 직원 (max로 처리 가능)
```
select ename, deptno, sal
from emp
where emp.sal > any(select sal from emp where deptno = 30);
```

IN

2009년 가을 학기와 2010년 봄 두 학기에 다 열린 과목의 이름 (intersection 해도 되지 않나...?)

select disinct course_id
from section
where semester = 'Fall' and year = 2009 and 
course_id in (select course_id from section where semester = 'Spring');

의미적으로 section(table)과 section(table)의 equi-join
in 안에 집합 (다중행 결과)

in (select course_id from section where semester = 'Spring')

ID 10101 강사가 가르친 수업을 들은 학생의 수

select count(distinct ID)
from takes
where (course_id, sec_id, semester, year) in 
(select course_id, sec_id, semester, year from teaches where teaches.ID=10101);

의미적으로 TAKES(table)과 Takes(table)의 equi-join
in 안에 집합 (다중행 결과)

비교 대상의 형식이 같아야 한다. course_id, sec_id, semester, year

학생 테이블에서 각 학년별로 최대키를 가진 학생들의 학년과 이름 출력
```
select name, grade
from student
where (grade, height) in(select grade, max(height) from student group by grade);
```
- 똑같은 쿼리를 IN 없이 작성해볼 수도 있다.
```
select grade, name
from student s1
where height = (select max(height) 
                from student s2 
                where s1.grade = s2.grade);
```
- S1이 튜플 변수가 되어, 만약 1학년이면 중첩문 안의 student 안에서 s2의 grade와 비교해 모든 1학년의 RESULT SET을 만들어 내고 그 안의 max(height)를 만들어 낸다. => group by와 같은 효과를 냄

코스별 평균 점수 이상인 학생 이름과 코스 이름

select name, title
from student s, course c, takes t
where S.ID = T.ID AND C.course_ID = T.course_id and
score > (select avg(score)
        from takes T1
        where T1.course_id = C.courseID)

여기서는 C가 튜플 변수가 되어 코스별(수업이름별) 평균 점수를 만들고 해당 점수 이상인 학생과 수업의 이름을 출력한다.

튜플 변수를 이용하는 것 대신 IN을 사용할 수도 있다.

select name, title
from student s, course c, takes t
where S.ID = T.ID AND C.course_ID = T.course_id and
(score) >(select avg(score) 
          from student 
          group by score);

exists

select course_id
from section as S
Where semester = 'Fall' and year = 2009 and
exists (select *
        from section as T
        where semeser = 'Spring' and year = 2010 and S.course_id = T.course_id)

2009년 가을과 2010년 봄 두 학기에 다 열린 과목
1. from 절: section as S에 tuple 값이 담김 ( S는 변수)
2. where 절: semester = 'Fall' and year = 2009 을 타고 exists에 옴
3. () 안을 먼저 계산 :
  1) from 절: T에 section의 tuple이 담김
  2) where 절: semeser = 'Spring' and year = 2010 이고 S의 튜플과 T의 튜플 값이 같은 모든 튜플을 찾아냄
4. 1에 가서 S는 다음 튜플 값이 되고 이를 계속 반복함.

not exists

select distinct S.ID, S.name
from student as S
where not exists ((select course_id
                  from course
                  where dept_name ='Biology')
                 except 
                 (select T.course_id
                 from takes as T
                 where S.ID = T.ID));

생물학과에서 모든 코스를 수강한 모든 학생
(1) except (2)
- 1번째 식은 독립적인 식이다 - 단 한 번만 계산됨
- 2번째 식은 비독립적이다 - 튜플 변수 S를 포함하고 있기 때문에 한 학생마다 들은 모든 course의 레코드(takes)를 계산한다.
- 따라서 student와 takes를 직접 join하지 않고 takes 안에 있는 t의 아이디와의 비교를 통해 수강 여부를 판단할 수 있다.

파랑

이전 포스트

wsl2 네트워크 어떻게 잡히는지

다음 포스트