SQL์†Œ๊ฐœ

SQL ๊ธฐ๋ณธ


๋‹ค์ˆ˜์˜ SQL ๋ฌธ์„ ์‹คํ–‰ํ•œ๋‹ค๋ฉด ์„ธ๋ฏธ์ฝœ๋ก ์œผ๋กœ ๋ถ„๋ฆฌํ•œ๋‹ค.
ex) sql๋ฌธ1;sql๋ฌธ2

  • SQL ์ฃผ์„: -- ์ธ๋ผ์ธ ํ•œ์ค„์งœ๋ฆฌ ์ฃผ์„, /--/ ์—ฌ๋Ÿฌ ์ค„ ์ฃผ์„
  • SQL ํ‚ค์›Œ๋“œ๋Š” ๋Œ€๋ฌธ์ž๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค๋˜์ง€ ํ•˜๋Š” ๋‚˜๋ฆ„์˜ ํฌ๋งทํŒ…์ด ํ•„์š”
    ํŒ€ํ”„๋กœ์ ํŠธ๋ผ๋ฉด ํŒ€์—์„œ ์‚ฌ์šฉํ•˜๋Š” ๊ณตํ†ต ํฌ๋งท์ด ํ•„์š”
  • ํ…Œ์ด๋ธ”/ํ•„๋“œ ์ด๋ฆ„์˜ ๋ช…๋ช…๊ทœ์น™ ์ •ํ•˜๊ธฐ(๋‹จ์ˆ˜ํ˜•(ex : User) vs. ๋ณต์ˆ˜ํ˜•(Users), snake_case vs. CamelCase)

DDL - ํ…Œ์ด๋ธ” ๊ตฌ์กฐ ์ •์˜ ์–ธ์–ด


CREATE TABLE

  • primary key ์†์„ฑ์„ ์ง€์ •ํ•  ์ˆ˜ ์žˆ์œผ๋‚˜ ๋ฌด์‹œ๋จ
    primary key uniqueness - big data๋ฐ์ดํ„ฐ์›จ์–ดํ•˜์šฐ์Šค์—์„œ๋Š” ์ง€์ผœ์ง€์ง€์•Š์Œ(๋ ˆ์ฝ”๋“œ๋“ค์ด ์ถ”๊ฐ€๋  ๋•Œ๋งˆ๋‹ค ๋งŽ์€์–‘์˜ ๋ฐ์ดํ„ฐ๋ฅผ ํŒŒ์•…ํ•ด์„œ ์œ ๋‹ˆํฌํ•˜๊ฒŒ ํ•ด์ค˜์•ผ๋˜๋Š”๋ฐ ๋„ˆ๋ฌด ๋งŽ์•„์„œ ์–ด๋ ค์›€)
  • CTAS: CREATE TABLE table_name AS SELECT : ํ…Œ์ด๋ธ” ์ƒ์„ฑ๊ณผ ๋™์‹œ์— ๋‚ด์šฉ ์ถ”๊ฐ€ ( vs. CREATE TABLE and then INSERT)
  • raw_data ์Šคํ‚ค๋งˆ ์•ˆ์— user_session_channel ํ…Œ์ด๋ธ” ๋งŒ๋“ค๊ธฐ

ex)

CREATE TABLE raw_data.user_session_channel (
	userid int,
    sessionid varchar(32) primary key,
    channel varchar(32)
);

DROP TABLE

  • DROP TABLE table_name; ์—†๋Š” ํ…Œ์ด๋ธ”์„ ์ง€์šฐ๋ ค๊ณ  ํ•˜๋ฉด ์—๋Ÿฌ ๋ฐœ์ƒ
  • DROP TABLE IF EXISTS table_name; ํ…Œ์ด๋ธ”์ด ์กด์žฌํ•  ๋•Œ ์ง€์›€
  • DELETE FROM ์กฐ๊ฑด์— ๋งž๋Š” ๋ ˆ์ฝ”๋“œ๋“ค๋งŒ ์ง€์›€(ํ…Œ์ด๋ธ” ์ž์ฒด๋Š” ์กด์žฌ)

ALTER TABLE

  • ์ƒˆ๋กœ์šด ์ปฌ๋Ÿผ ์ถ”๊ฐ€: ALTER TABLE ํ…Œ์ด๋ธ”์ด๋ฆ„ ADD COLUMN ํ•„๋“œ์ด๋ฆ„ ํ•„๋“œํƒ€์ž…;
  • ๊ธฐ์กด ์ปฌ๋Ÿผ ์ด๋ฆ„ ๋ณ€๊ฒฝ: ALTER TABLE ํ…Œ์ด๋ธ”์ด๋ฆ„ RENAME ํ˜„์žฌํ•„๋“œ์ด๋ฆ„ to ์ƒˆํ•„๋“œ์ด๋ฆ„;
  • ๊ธฐ์กด ์ปฌ๋Ÿผ ์ œ๊ฑฐ: ALTER TABLE ํ…Œ์ด๋ธ”์ด๋ฆ„ DROP COLUMN ํ•„๋“œ์ด๋ฆ„;
  • ํ…Œ์ด๋ธ” ์ด๋ฆ„ ๋ณ€๊ฒฝ: ALTER TABLE ํ˜„์žฌํ…Œ์ด๋ธ”์ด๋ฆ„ RENAME to ์ƒˆํ…Œ์ด๋ธ”์ด๋ฆ„;

DML - ํ…Œ์ด๋ธ” ๋ฐ์ดํ„ฐ ์กฐ์ž‘ ์–ธ์–ด


SELECT: ๋ ˆ์ฝ”๋“œ ์งˆ์˜ ์–ธ์–ด

  • SELECT FROM: ํ…Œ์ด๋ธ”์—์„œ ๋ ˆ์ฝ”๋“œ์™€ ํ•„๋“œ ์ฝ์–ด์˜จ๋‹ค.
  • WHERE๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ ˆ์ฝ”๋“œ ์„ ํƒ ์กฐ๊ฑด ์ง€์ •
  • GROUP BY๋ฅผ ํ†ตํ•ด ์ •๋ณด๋ฅผ ๊ทธ๋ฃน ๋ ˆ๋ฒจ์—์„œ ์ถ”์ถœ
  • ORDER BY๋ฅผ ์‚ฌ์šฉํ•ด ๋ ˆ์ฝ”๋“œ ์ˆœ์„œ ๊ฒฐ์ •
  • ๋‹ค์ˆ˜์˜ ํ…Œ์ด๋ธ”์„ JOINํ•ด์„œ ์‚ฌ์šฉํ•˜๊ธฐ๋„ ํ•จ

๋ ˆ์ฝ”๋“œ ์ˆ˜์ • ์–ธ์–ด

  • INSERT INTO: ํ…Œ์ด๋ธ”์— ๋ ˆ์ฝ”๋“œ๋ฅผ ์ถ”๊ฐ€ํ•˜๋Š”๋ฐ ์‚ฌ์šฉ
  • UPDATE FROM: ํ…Œ์ด๋ธ” ๋ ˆ์ฝ”๋“œ์˜ ํ•„๋“œ ๊ฐ’ ์ˆ˜์ •
  • DELETE FROM(vs. TRUNCATE): ํ…Œ์ด๋ธ”์—์„œ ๋ ˆ์ฝ”๋“œ๋ฅผ ์‚ญ์ œ

SELECT ์†Œ๊ฐœ

ํ…Œ์ด๋ธ”(๋“ค)์—์„œ ๋ ˆ์ฝ”๋“œ๋“ค(ํ˜น์€ ๋ ˆ์ฝ”๋“œ ์ˆ˜)์„ ์ฝ์–ด์˜ค๋Š”๋ฐ ์‚ฌ์šฉํ•˜๋ฉฐ, WHERE๋ฅผ ์‚ฌ์šฉํ•ด ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š” ๋ ˆ์ฝ”๋“œ๋งŒ ์ฝ์–ด์˜ฌ ์ˆ˜ ์žˆ๋‹ค.

SELECT ํ•„๋“œ์ด๋ฆ„1, ํ•„๋“œ์ด๋ฆ„2, ... -> *๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ๋ชจ๋“  ํ•„๋“œ ์„ ํƒ
FROM ํ…Œ์ด๋ธ”์ด๋ฆ„
WHERE ์„ ํƒ์กฐ๊ฑด
GROUP BY ํ•„๋“œ์ด๋ฆ„1, ํ•„๋“œ์ด๋ฆ„2, ...
ORDER BY ํ•„๋“œ์ด๋ฆ„ [ASC|DESC] -> ํ•„๋“œ ์ด๋ฆ„ ๋Œ€์‹ ์— ์ˆซ์ž ์‚ฌ์šฉ ๊ฐ€๋Šฅ
LIMIT N;
ex)

SELECT *
FROM raw_data.user_session_channel;

SELECT *
FROM raw_data.user_session_channel
LIMIT 10;

SELECT DISTINCT channel                      -> ์œ ์ผํ•œ ์ฑ„๋„ ์ด๋ฆ„์„ ์•Œ๊ณ  ์‹ถ์€ ๊ฒฝ์šฐ
FROM raw_data.user_session_channel;

SELECT channel, COUNT(1)                    -> ์ฑ„๋„๋ณ„ ์นด์šดํŠธ๋ฅผ ํ•˜๊ณ  ์‹ถ์€ ๊ฒฝ์šฐ. COUNT ํ•จ์ˆ˜!!
FROM raw_data.user_session_channel	    -> SELECT๋ฌธ์˜ 1๋ฒˆ ํ•„๋“œ๋กœ GROUP BY ์ˆ˜ํ–‰
GROUP BY 1;

SELECT COUNT(1)                    -> ์ฑ„๋„๋ณ„๋กœ ์†ํ•œ ๋ ˆ์ฝ”๋“œ ์ˆ˜๋ฅผ ์นด์šดํŠธ,ํ…Œ์ด๋ธ”์˜ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ ์ˆ˜ ์นด์šดํŠธํ•˜๋Š” ๊ฒฝ์šฐ-> COUNT(*).
FROM raw_data.user_session_channel;

SELECT COUNT(1)
FROM raw_data.user_session_channel
WHERE channel = 'Facebook';		   ->channel ์ด๋ฆ„์ด Facebook์ธ ๋ ˆ์ฝ”๋“œ ์ˆ˜ ์นด์šดํŠธ

CASE WHEN

ํ•„๋“œ๊ฐ’์˜ ๋ณ€ํ™˜์„ ์œ„ํ•ด ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค.
CASE WHEN ์กฐ๊ฑด THEN ์ฐธ์ผ ๋•Œ ๊ฐ’ ELSE ๊ฑฐ์ง“์ผ ๋•Œ ๊ฐ’ END ํ•„๋“œ์ด๋ฆ„
์—ฌ๋Ÿฌ ์กฐ๊ฑด์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ๋„ ๊ฐ€๋Šฅํ•˜๋‹ค.

ex

SELECT
  channel,
  CASE
    WHEN channel in ('Facebook', 'Instagram') THEN 'Social-Media'
    WHEN channel in ('Google', 'Naver') THEN 'Search-Engine'
    ELSE 'Something-Else'
END channel_type
FROM raw_data.user_session_channel
LIMIT 100;

channel์ด๋ฆ„์œผ๋กœ ๋ถ„๋ฅ˜๋ฅผ ํ•˜์—ฌ channel_type๋ผ๋Š” ์ƒˆ๋กœ์šด ํ•„๋“œ ์ƒ์„ฑ

NULL์ด๋ž€?

  • ๊ฐ’์ด ์กด์žฌํ•˜์ง€ ์•Š์Œ์„ ๋‚˜ํƒ€๋‚ด๋Š” ์ƒ์ˆ˜. 0 ํ˜น์€ ""๊ณผ๋Š” ๋‹ค๋ฆ„
  • ํ•„๋“œ ์ง€์ • ์‹œ ๊ฐ’์ด ์—†๋Š” ๊ฒฝ์šฐ NULL๋กœ ์ง€์ • ๊ฐ€๋Šฅ
    ํ…Œ์ด๋ธ” ์ •์˜์‹œ ๋””ํดํŠธ ๊ฐ’์œผ๋กœ๋„ ์ง€์ • ๊ฐ€๋Šฅ
  • ์–ด๋–ค ํ•„๋“œ์˜ ๊ฐ’์ด NULL์ธ์ง€ ์•„๋‹Œ์ง€ ๋น„๊ต๋Š” ํŠน์ˆ˜ํ•œ ๋ฌธ๋ฒ•์„ ํ•„์š”๋กœ ํ•จ
    field1 is NULL: ์–ด๋–ค ํ•„๋“œ์˜ ๊ฐ’์ด NULL์ธ์ง€ ์•„๋‹Œ์ง€ ๋น„๊ต
  • NULL์ด ์‚ฌ์น™์—ฐ์‚ฐ์— ์‚ฌ์šฉ๋˜๋ฉด ๊ทธ ๊ฒฐ๊ณผ๋Š” ๋ชจ๋‘ NULL

COUNTํ•จ์ˆ˜ ์ œ๋Œ€๋กœ ์ดํ•ดํ•˜๊ธฐ

  • SELECT COUNT(1) FROM count_test
    ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ ์ˆ˜๋ฅผ ์„ธ์–ด 7์„ ๋ฐ˜ํ™˜
  • SELECT COUNT(0) FROM count_test
    ์œ„์™€ ๊ฐ™์€ ๊ฒฐ๊ณผ
  • SELECT COUNT(NULL) FROM count_test
    NULL์ด ์ธ์ž๋กœ ๋“ค์–ด์˜ค๋ฉด ์„ธ์ง€ ์•Š์œผ๋ฏ€๋กœ 0์„ ๋ฐ˜ํ™˜
  • SELECT COUNT(value) FROM count_test
    NULL์ด ์•„๋‹Œ value์˜ ๋ ˆ์ฝ”๋“œ ์ˆ˜๋ฅผ ์„ธ์–ด 6์„ ๋ฐ˜ํ™˜
  • SELECT COUNT(DISTINCT value) FROM count_test
    ์œ ์ผํ•œ ๊ฐ’๋“ค(NULL, 1, 0, 4, 3) ์ค‘ NULL์ด ์•„๋‹Œ ๊ฐ’๋งŒ ์„ธ์–ด 4๋ฅผ ๋ฐ˜ํ™˜

WHERE

  1. IN and NOT IN
  • WHERE channel in ('Google', 'Youtube')
  • WHERE channel = 'Google' OR channel = 'Youtube'
  1. LIKE and ILIKE
  • LIKE: ๋Œ€์†Œ๋ฌธ์ž๋ฅผ ๊ตฌ๋ถ„
  • ILIKE: ๋Œ€์†Œ๋ฌธ์ž๋ฅผ ๊ตฌ๋ถ„ํ•˜์ง€ ์•Š์Œ
  • WHERE channel LIKE 'G%' ->'G*(์ฑ„๋„๋ช…์ด G๋กœ ์‹œ์ž‘ํ•˜๋Š” ์ฑ„๋„)
  • WHERE channel LIKE '%o%' ->o(์ฑ„๋„๋ช…์— o๊ฐ€ ๋“ค์–ด์žˆ๋Š” ์ฑ„๋„)
  • NOT LIKE or NOT ILIKE
  1. BETWEEN
  • ๋‚ ์งœ์‚ฌ์ด์— ๋“ค์–ด๊ฐ€๋ฉด ๋งค์นญ

ex)

SELECT COUNT(1)
FROM raw_data.user_session_channel
WHERE channel in ('Google','Facebook');

SELECT DISTINCT channel
FROM raw_data.user_session_channel
WHERE channel ILIKE '%o%'; 
์ฑ„๋„๋ช…์— o๊ฐ€ ๋“ค์–ด์žˆ๋Š” ์ฑ„๋„์„ ๊ตฌ๋ถ„๋˜๋Š” ์ฑ„๋„๋งŒ ์ถœ๋ ฅ

STRING Functions

  • LEFT(str, N): ์™ผ์ชฝ๋ถ€ํ„ฐ N๊ฐœ ๋ฌธ์ž์—ด๋งŒ ์ถ”์ถœ
  • REPLACE(str, exp1, exp2): ์ฃผ์–ด์ง„ ๋ฌธ์ž์—ด์—์„œ exp1์„ exp2๋กœ ๋Œ€์ฒด
  • UPPER(str) : ๋Œ€๋ฌธ์ž๋กœ ๋ณ€ํ™˜
  • LOWER(str) : ์†Œ๋ฌธ์ž๋กœ ๋ณ€ํ™˜
  • LEN(str) : ๋ฌธ์ž์—ด์˜ ๊ธธ์ด ์ถ”์ถœ
  • LPAD, RPAD: ๋ฌธ์ž์—ด์˜ ์™ผ์ชฝ/์˜ค๋ฅธ์ชฝ์— ๋ฌธ์ž padding
  • SUBSTRING: ๋ฒ”์œ„๋ฅผ ์ง€์ •ํ•˜์—ฌ ๋ฌธ์ž์—ด ์ถ”์ถœ
    ex)
SELECT
    LEN(channel),
    UPPER(channel),
    LOWER(channel),
    LEFT(channel, 4)
FROM raw_data.user_session_channel
LIMIT 100;

ORDER BY

  • ๋””ํดํŠธ๊ฐ’์€ ์˜ค๋ฆ„์ฐจ์ˆœ์œผ๋กœ ์ •๋ ฌํ•œ๋‹ค.
  • ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ์„ ํ•˜๊ณ ์‹ถ๋‹ค๋ฉด "DESC" ๋ช…์‹œ
  • ์—ฌ๋Ÿฌ ๊ฐœ ์ปฌ๋Ÿผ์„ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ•  ์ˆ˜ ์žˆ์Œ ORDER BY 1 DESC, 2, 3
  • NULL๊ฐ’์˜ ์ˆœ์„œ
    NULL๊ฐ’๋“ค์€ ์˜ค๋ฆ„์ฐจ์ˆœ์ผ ๊ฒฝ์šฐ ๋งˆ์ง€๋ง‰์—, ๋‚ด๋ฆผ์ฐจ์ˆœ์ผ ๊ฒฝ์šฐ ์ฒ˜์Œ์— ์œ„์น˜ํ•จ
    ์ด๋ฅผ ๋ฐ”๊พธ๋ ค๋ฉด NULLS FIRST ํ˜น์€ NULLS LAST ์‚ฌ์šฉ

Type Casting

1/2์˜ ๊ฒฐ๊ณผ๋Š” 0์ด๋จ. ์ •์ˆ˜๊ฐ„์˜ ์—ฐ์‚ฐ์€ ์ •์ˆ˜๊ฐ€ ๋˜์–ด์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.
๋”ฐ๋ผ์„œ ๋ถ„์ž๋‚˜ ๋ถ„๋ชจ์ค‘์˜ ํ•˜๋‚˜๋ฅผ float์œผ๋กœ ์บ์ŠคํŒ…ํ•ด์•ผ 0.5๊ฐ€ ๋‚˜์˜ด

์บ์ŠคํŒ…ํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ๋‘๊ฐ€์ง€๊ฐ€ ์žˆ๋‹ค.

  • ์˜คํผ๋ ˆ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉ
    category::float
  • castํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉ
    cast(category as float)

Pandas๋ž‘ ์—ฐ๋™ํ•˜๋Š” ๋ฒ•

result = %sql SELECT * FROM raw_data.user_session_channel
์ด ํ›„ ํŒ๋‹ค์Šค ๋ฌธ๋ฒ•์„ ์ด์šฉํ•˜์—ฌ ์‚ฌ์šฉ ๊ฐ€๋Šฅ
df = result.DataFrame()

df.groupby(["channel"]).size()
df.groupby(["channel"])["sessionid"].count()

๋!


profile
๊ฒŒ์„๋ €๋˜ ํ”„๋กœ๊ทธ๋ž˜๋ฐ ๊ณต๋ถ€

0๊ฐœ์˜ ๋Œ“๊ธ€