SAS Advanced programming 정리- PROC SQL (2)

Hoya Jaeho Lee·2022년 4월 6일
0

SAS Advanced programming

목록 보기
2/17

Chapter 2 - Performing Advanced Queries Using PROC SQL

Eliminating Duplicate Rows from Output

proc sql;
select distinct flightnumber, destination
from sasuser.internationalflights
order by 1;

코드설명:
select절에 distinct 옵션을 사용하여 중복되는 rows삭제 기능!!

Using the CONTAINS or Question Mark (?) Operator to Select a String

proc sql outobs=10;
select name
from sasuser.frequentflyers
where name contains 'ER';

코드설명:
select절에 where ~ contatins 사용해서 조건에 해당하는 문자열 해당하는 column 선택 가능:)

where jobcategory in ('PT','NA','FA')
where dayofweek in (2,4,6)
where chesspiece not in ('pawn','king','queen')

이런식으로도 가능!

Using the IS MISSING or IS NULL Operator to Select Missing Values

proc sql;
select boarded, transferred,
nonrevenue, deplaned
from sasuser.marchflights
where boarded is missing;

코드설명:
결측에 해당하는 column 선택
where 절에 IS MISSING / IS NULL

Using the LIKE Operator to Select a Pattern

proc sql;
select ffid, name, address
from sasuser.frequentflyers
where address like '% P%PLACE';

코드설명:
like 사용하고 나서 %: 문자열/ 숫자 아무거나 가능

Subsetting Rows By Using Calculated Values

proc sql outobs=10;
select flightnumber, date, destination,
boarded + transferred + nonrevenue
as Total
from sasuser.marchflights
where calculated total < 100;

코드 설명: select절에 만든 변수를 where절에 반영하여 calculated옵션 지정하여 변수 계산 가능

select절에서도 calcuated 옵션 바로 가능!!!

proc sql outobs=10;
select flightnumber, date, destination,
boarded + transferred + nonrevenue
as Total,
calculated total/2 as Half
from sasuser.marchflights;

Enhancing Query Output

Specifying Column Formats and Labels

proc sql outobs=15;
title 'Current Bonus Information';
title2 'Employees with Salaries > $75,000';
select empid label='Employee ID',
jobcode label='Job Code',
salary,
salary * .10 as Bonus
format=dollar12.2

from sasuser.payrollmaster
where salary>75000
order by salary desc;

코드 설명:
format 변경 부분 확인 및 alias 순서 확인!!

Counting Rows

proc sql;
select substr(jobcode,1,2)
label='Job Category'
,
count(*) as Count
from sasuser.payrollmaster
group by 1;

Counting All Non-Missing Values in a Column

proc sql;
select count(JobCode) as Count
from sasuser.payrollmaster;

Counting All Unique Values in a Column

proc sql;
select count(distinct jobcode) as Count
from sasuser.payrollmaster;

코드 설명:
count뒤 distinct 옵션 추가

proc sql;
select jobcode,
avg(salary) as AvgSalary
format=dollar11.2
from sasuser.payrollmaster
group by jobcode
having avg(salary) > 56000;

코드 설명:
아래처럼 표현도 가능!!!
having 절의 경우 calculated 사용 안해도 됨:)
having AvgSalary > 56000;

Subsetting Data By Using Subqueries

-Subsetting Data By Using Noncorrelated Subqueries

Using Single-Value Noncorrelated Subqueries

proc sql;
select jobcode,
avg(salary) as AvgSalary
format=dollar11.2
from sasuser.payrollmaster
group by jobcode
having avg(salary) >
(select avg(salary)
from sasuser.payrollmaster);

코드 설명:
처음 having avg(salary)은 group by에 걸리는 jobcode에 따라 salary를 의미하고 뒤에 따라나오는 (select avg(salary) from sasuser.payrollmaster)은 전체의 company's average salary를 의미함.

Using Multiple-Value Noncorrelated Subqueries
proc sql;
select empid, lastname, firstname,
city, state
from sasuser.staffmaster
where empid in
(select empid
from sasuser.payrollmaster
where month(dateofbirth)=2);

코드 설명:
다른 데이터셋에서 select 해서 where를 안에 또 사용하여 month가 feb에 해당하는 empid만 선택하는 경우

Subsetting Data By Using Correlated Subqueries
:비효율적이어서 보통 PROC SQL join으로 많이 사용

proc sql;
select lastname, firstname
from sasuser.flightattendants
where not exists
(select *
from sasuser.flightschedule
where flightattendants.empid=
flightschedule.empid);

코드 설명:
A not exists B : B의 성분이 없는 순수한 A 만 선택!

Validating Query Syntax

NOEXEC option and the VALIDATE keyword
실행 하기 전 실행을 시키지 않고 log로 query 검증 부분!
-NOEXEC option
proc sql noexec;
select empid, jobcode, salary
from sasuser.payrollmaster
where jobcode contains 'NA'
order by salary;

-Validate option (semicolon이 붙지 않고, 바로 select 따라 나오는 거 확인! )
proc sql;
validate
select empid, jobcode, salary
from sasuser.payrollmaster
where jobcode contains 'NA'
order by salary;

References
SAS Certification Prep Guide: Advanced Programming for SAS 9, Fourth Edition 4th Edition by SAS Institute

profile
Biostatistics researcher Github: https://github.com/hoyajhl

0개의 댓글