πŸ“’ SQL(2)

KimdongkiΒ·2024λ…„ 4μ›” 23일

DB

λͺ©λ‘ 보기
5/33

πŸ“Œ κ΄€κ³„ν˜• λ°μ΄ν„°λ² μ΄μŠ€ 예제 - μ›Ήμ„œλΉ„μŠ€ μ‚¬μš©μž/μ„Έμ…˜ 정보

  • μ‚¬μš©μž ID : 보톡 μ›Ήμ„œλΉ„μŠ€μ—μ„œλŠ” λ“±λ‘λœ μ‚¬μš©μžλ§ˆλ‹€ λΆ€μ—¬ν•˜λŠ” μœ μΌν•œ ID
  • μ„Έμ…˜ ID : μ„Έμ…˜λ§ˆλ‹€ λΆ€μ—¬λ˜λŠ” ID
    • μ„Έμ…˜ : μ‚¬μš©μžμ˜ 방문을 논리적인 λ‹¨μœ„λ‘œ λ‚˜λˆˆ 것
      • μ‚¬μš©μžκ°€ μ™ΈλΆ€ 링크(κ΄‘κ³ )λ₯Ό 타고 μ˜€κ±°λ‚˜ 직접 λ°©λ¬Έν•΄μ„œ 올 경우 μ„Έμ…˜μ„ 생성
      • μ‚¬μš©μžκ°€ λ°©λ¬Έ ν›„ 30λΆ„κ°„ Interaction이 μ—†λ‹€κ°€ λ­”κ°€λ₯Ό ν•˜λŠ” 경우 μƒˆλ‘œ μ„Έμ…˜μ„ 생성
    • ν•˜λ‚˜μ˜ μ‚¬μš©μžλŠ” μ—¬λŸ¬ 개의 μ„Έμ…˜μ„ κ°€μ§ˆ 수 μžˆλ‹€.
    • 보톡 μ„Έμ…˜μ˜ 경우 μ„Έμ…˜μ„ λ§Œλ“€μ–΄λ‚Έ 접점(κ²½μš°μ§€)λ₯Ό μ±„λ„μ΄λž€ μ΄λ¦„μœΌλ‘œ 기둝해둔닀.
      -> λ§ˆμΌ€νŒ… κ΄€λ ¨ 기여도 뢄석을 μœ„ν•œκ²ƒμ΄λ‹€.
    • μ„Έμ…˜μ΄ 생긴 μ‹œκ°„λ„ κΈ°λ‘ν•œλ‹€.
  • 이 정보λ₯Ό 기반으둜 λ‹€μ–‘ν•œ 데이터 뢄석과 μ§€ν‘œ 섀정이 κ°€λŠ₯ν•˜λ‹€.
    • λ§ˆμΌ€νŒ… κ΄€λ ¨, μ‚¬μš©μž νŠΈλž˜ν”½ κ΄€λ ¨
    • DAU, WAU, MAUλ“±μ˜ 일주월별 Active User 차트
    • Marketing Channel Attribution 뢄석
      • μ–΄λŠ 채널에 κ΄‘κ³ λ₯Ό ν•˜λŠ” 것이 κ°€μž₯ νš¨κ³Όμ μΈκ°€?
  • μ‚¬μš©μž ID 100번 : 총 3개의 μ„Έμ…˜(νŒŒλž€ 글씨)을 κ°–λŠ” 예제
    • μ„Έμ…˜ 1 : ꡬ글 ν‚€μ›Œλ“œ κ΄‘κ³ λ‘œ μ‹œμž‘ν•œ μ„Έμ…˜
    • μ„Έμ…˜ 2 : Facebook κ΄‘κ³ λ₯Ό 톡해 생긴 μ„Έμ…˜
    • μ„Έμ…˜ 3 : 넀이버 κ΄‘κ³ λ₯Ό 톡해 생긴 μ„Έμ…˜

ꡬ글 κ΄‘κ³ (λ°©λ¬Έ) -> μƒν’ˆ κ΄‘κ³  클릭 -> μƒν’ˆ 리뷰 νŽ˜μ΄μ§€ 클릭 -> Facebook κ΄‘κ³ (λ°©λ¬Έ)
-> λ‹€λ₯Έ μƒν’ˆ κ΄‘κ³  클릭 -> 넀이버 κ΄‘κ³ (λ°©λ¬Έ) -> μƒν’ˆ ꡬ맀


πŸ“Œ κ΄€κ³„ν˜• λ°μ΄ν„°λ² μ΄μŠ€ 예제 - λ°μ΄ν„°λ² μ΄μŠ€μ™€ ν…Œμ΄λΈ”

raw_data DB(μŠ€ν‚€λ§ˆ)
user_session_channel Table

μ»¬λŸΌνƒ€μž…
useridint
sessionidvarchar(32)
channelvarchar(32)

session_timestamp Table

μ»¬λŸΌνƒ€μž…
sessionidvarchar(32)
tstimestamp

πŸ“Œ SQL κΈ°λ³Έ

  • λ¨Όμ € λ‹€μˆ˜μ˜ SQL문을 μ‹€ν–‰ν•œλ‹€λ©΄ μ„Έλ―Έμ½œλ‘ (;)으둜 뢄리가 ν•„μš”ν•˜λ‹€.
  • SQL 주석 -> --(ν•œμ€„), /--/(μ—¬λŸ¬μ€„)
  • SQLν‚€μ›Œλ“œλŠ” λŒ€λ¬Έμžλ₯Ό μ‚¬μš©ν•˜μ—¬ λ‚˜λ¦„λŒ€λ‘œ ν¬λ§·νŒ…μ΄ ν•„μš”ν•˜λ‹€.
    -> νŒ€ ν”„λ‘œμ νŠΈλΌλ©΄ νŒ€μ—μ„œ μ‚¬μš©ν•˜λŠ” 곡톡 포맷이 ν•„μš”ν•˜λ‹€.
  • ν…Œμ΄λΈ”/ν•„λ“œ μ΄λ¦„μ˜ λͺ…λͺ…κ·œμΉ™μ„ μ •ν•˜λŠ” 것이 μ€‘μš”ν•˜λ‹€.
    -> λ‹¨μˆ˜ν˜• vs. λ³΅μˆ˜ν˜• -> User vs. Users
    -> _ vs. CamelCasing -> user_session_channel vs. UserSessionChannel

πŸ“Œ SQL DDL - ν…Œμ΄λΈ” ꡬ쑰 μ •μ˜ μ–Έμ–΄

  • CREATE TABLE -> ν…Œμ΄λΈ” 생성
  • Primary key 속성을 μ§€μ •ν•  수 μžˆμœΌλ‚˜ λ¬΄μ‹œλœλ‹€.
    • Primary key uniqueness
      • Big Data λ°μ΄ν„°μ›¨μ–΄ν•˜μš°μŠ€μ—μ„œλŠ” μ§€μΌœμ§€μ§€ μ•ŠλŠ”λ‹€.(Redshift, Snowflake, BigQuery)
      • λŒ€λŸ‰μ˜ λ ˆμ½”λ“œλ₯Ό μΆ”κ°€ν•  λ•Œ μ„±λŠ₯이 μ €ν•˜λœλ‹€. -> ν•˜λ‚˜μ”© λ‹€ 확인해야 ν•˜κΈ° λ•Œλ¬Έ
  • CTAS : CREATE TABLE table_name AS SELECT (ν…Œμ΄λΈ” 생성 + λ‚΄μš© μΆ”κ°€)
    • vs. CREATE TABLE and thenm INSERT
CREATE TABLE raw_data.user_session_channel(
	userid int,
    sessionid varchar(32) primary key,
    channel varchar(32)
);
  • DROP TABLE -> ν…Œμ΄λΈ” 제거

    • DROP TABLE table_name; -> μ—†λŠ” ν…Œμ΄λΈ”μ„ μ§€μš°λ €ν•˜λ©΄ 였λ₯˜ λ°œμƒ
    • DROP TABLE IF EXISTS table_name; -> μ—λŸ¬ λ©”μ„Έμ§€ μƒλž΅
      -> vs. DELETE FROM
      • DELETE FROM은 쑰건에 λ§žλŠ” λ ˆμ½”λ“œλ“€μ„ 지움(ν…Œμ΄λΈ”μ€ 쑴재)
  • ALTER TABLE

    • μƒˆλ‘œμš΄ 컬럼 μΆ”κ°€
      -> ALTER TABLE table_name ADD COLUMN field_name field_type;
    • κΈ°μ‘΄ 컬럼 이름 λ³€κ²½
      -> ALTER TABLE table_name RENAME (ν˜„μž¬)field_name to (NEW)field_name
    • κΈ°μ‘΄ 컬럼 제거
      -> ALTER TABLE table_name DROP COLUMN field_name;
    • ν…Œμ΄λΈ” 이름 λ³€κ²½
      -> ALTER TABLE (ν˜„μž¬)table_name RENAME to (NEW)table_name;

πŸ“Œ SQL DML - ν…Œμ΄λΈ” 데이터 μ‘°μž‘ μ–Έμ–΄

  • λ ˆμ½”λ“œ 질의

    • SELECT FROM : ν…Œμ΄λΈ”μ—μ„œ λ ˆμ½”λ“œμ™€ ν•„λ“œλ₯Ό μ½μ–΄μ˜€λŠ”λ° μ‚¬μš©
    • WHERE : λ ˆμ½”λ“œ 선택 쑰건을 μ§€μ •
    • GROUP BY : 정보λ₯Ό κ·Έλ£Ή λ ˆλ²¨μ—μ„œ λ½‘λŠ”λ° μ‚¬μš©
      -> DAU(일별), WAU(주별), MAU(월별) 계산은 GROUP BYλ₯Ό ν•„μš”λ‘œ ν•œλ‹€.
    • ORDER BY : λ ˆμ½”λ“œ μˆœμ„œλ₯Ό κ²°μ • -> ASC(μ˜€λ¦„), DESC(λ‚΄λ¦Ό) -> ν•„λ“œ 이름 λŒ€μ‹  숫자 μ‚¬μš© κ°€λŠ₯
    • LIMIT N; : μ•žμ—μ„œλΆ€ν„° N개 만큼 κ°€μ Έμ˜¨λ‹€.
    • 보톡 λ‹€μˆ˜μ˜ ν…Œμ΄λΈ”μ„ μ‘°μΈν•΄μ„œ μ‚¬μš©ν•˜κΈ°λ„ ν•œλ‹€.
  • λ ˆμ½”λ“œ μˆ˜μ •

    • INSERT INTO : ν…Œμ΄λΈ”μ— λ ˆμ½”λ“œλ₯Ό μΆ”κ°€ν•˜λŠ”λ° μ‚¬μš©
    • UPDATE FROM : ν…Œμ΄λΈ” λ ˆμ½”λ“œμ˜ ν•„λ“œ κ°’ μˆ˜μ •
    • DELETE FROM : ν…Œμ΄λΈ”μ—μ„œ λ ˆμ½”λ“œλ₯Ό μ‚­μ œ
      • vs. TRUNCATE
  • DISTINCT : 쀑볡 제거

SELECT DISTINCT channel
FROM raw_data.user_session_channel;
  • CASE WHEN
    • ν•„λ“œ κ°’μ˜ λ³€ν™˜μ„ μœ„ν•΄ μ‚¬μš© κ°€λŠ₯ν•˜λ‹€.
    • CASE WHEN 쑰건 THEN μ°ΈμΌλ•Œ κ°’ ELSE 거짓일 λ–„ κ°’ END Field_name
    • μ—¬λŸ¬ 쑰건을 μ‚¬μš©ν•˜μ—¬ λ³€ν™˜ κ°€λŠ₯
CASE
	WHEN 쑰건1 THEN κ°’1
    WHEN 쑰건2 THEN κ°’2
END Field_name
SELECT CASE
	WHEN channel in ('Facebook', 'Instagram') THEN 'Social-Media'
    WHEN channel in ('Google', 'Naver') THEN 'Search-Engine'
    ELSE 'Something-Else'
END channel_type
FROM raw_data.user_session_channel;

πŸ“Œ NULL

  • 값이 μ‘΄μž¬ν•˜μ§€ μ•ŠμŒμ„ λ‚˜νƒ€λ‚΄λŠ” μƒμˆ˜. 0 or ""κ³ΌλŠ” λ‹€λ₯΄λ‹€.
  • ν•„λ“œ μ§€μ •μ‹œ 값이 μ—†λŠ” 경우 NULL둜 μ§€μ • κ°€λŠ₯ν•˜λ‹€.
    • ν…Œμ΄λΈ” μ •μ˜μ‹œ λ””ν΄νŠΈ κ°’μœΌλ‘œλ„ μ§€μ • κ°€λŠ₯ν•˜λ‹€.
  • μ–΄λ–€ ν•„λ“œμ˜ 값이 NULL인지 μ•„λ‹Œμ§€ λΉ„κ΅ν•˜λŠ” νŠΉμˆ˜ν•œ 문법을 ν•„μš”λ‘œ ν•œλ‹€.
    • field1 is NULL or field1 is not NULL
  • NULL이 사칙연산에 μ‚¬μš©λ˜λŠ” 경우
    • SELECT 0 + NULL, 0 - NULL, 0 * NULL, 0 / NULL

πŸ“Œ COUNT

  • COUNT(n) : μ±„λ„λ³„λ‘œ 카운트λ₯Ό ν•˜κ³ μ‹Άμ€ 경우 μ‚¬μš©
  1. ν•œ κ·Έλ£Ή 속에 λ ˆμ½”λ“œκ°€ λͺ‡κ°œμΈμ§€
    SELECT channel, COUNT(1)
    FROM raw_data.user_session_channel
    GROUP BY 1;
  2. SELECT 쑰건에 λ§Œμ‘±ν•˜λŠ” λ ˆμ½”λ“œμ˜ 수
    SELECT COUNT(1)
    FROM raw_data.user_session_channel;
  3. channel이름이 Facebook인 λ ˆμ½”λ“œ 수
    SELECT COUNT(1)
    FROM raw_data.user_session_channel
    WHERE channel = 'Facebook';
  • μ˜ˆμ‹œ
SELECT COUNT(1) FROM count_test -> count_test의 개수
SELECT COUNT(0) FROM count_test -> count_test의 개수
SELECT COUNT(NULL) FROM count_test -> 0
SELECT COUNT(value) FROM count_test -> NULL이 μ•„λ‹Œκ²ƒμ˜ 개수
SELECT COUNT(DISTINCT value) FROM count_test -> NULL이 μ•„λ‹ˆκ³  쀑볡을 μ œκ±°ν•œ 개수

πŸ“Œ WHERE

  • IN
    • WHERE channel in ('Google', 'Youtube')
      -> WHERE channel = 'Google' OR channel = 'Youtube'
    • NOT IN
  • LIKE and ILIKE
    • LIKEλŠ” λŒ€μ†Œλ¬Έμžλ₯Ό κ΅¬λΆ„ν•˜λ©° λ¬Έμžμ—΄μ„ μ°ΎλŠ” 것 이닀.
    • ILIKEλŠ” λŒ€μ†Œλ¬Έμžλ₯Ό κ΅¬λΆ„ν•˜μ§€ μ•ŠμœΌλ©΄μ„œ λ¬Έμžμ—΄μ„ μ°ΎλŠ” 것이닀.
    • WHERE channel LIKE 'G%' -> 'G*'
    • WHERE channel LIKE '%o%' -> '* o *'
    • NOT LIKE or NOT ILIKE
  • BETWEEN
    • λ‚ μ§œ λ²”μœ„μ— μΌμΉ˜ν•˜λŠ” 것을 찾을 λ•Œ μ‚¬μš©ν•œλ‹€.
  • CASE WHEN μ‚¬μ΄μ—μ„œλ„ μ‚¬μš© κ°€λŠ₯ν•˜λ‹€.

πŸ“Œ IN & LIKE/ILIKE

SELECT COUNT(1)
FROM raw_data.user_session_channel
WHERE channel in ('Google', 'Facebook');
SELECT DISTINCT channel
FROM FROM raw_data.user_session_channel
WHERE channel ILIKE '%o%';
SELECT COUNT(1)
FROM FROM raw_data.user_session_channel
WHERE channel ILIKE 'Google' or channel ILIKE 'Facebook';
SELECT DISTINCT channel
FROM FROM raw_data.user_session_channel
WHERE channel NOT ILIKE '%o%';

πŸ“Œ STRING Function

  • LEFT(str, N)
  • REPLACE(str, exp1, exp2)
  • UPPER(str)
  • LOWER(str)
  • LEN(str)
  • LPAD, RPAD
  • SUBSTRING
SELECT
	LEN(channel),
    UPPER(channel),
    LOWER(channel),
    LEFT(channel, 4),
FROM raw_data.user_session_channel;

πŸ“Œ ORDER BY

  • κΈ°λ³Έ μˆœμ„œλŠ” μ˜€λ¦„μ°¨μˆœμ΄λ‹€.
    -> ORDER BY 1 ASC
  • λ‚΄λ¦Όμ°¨μˆœμ€ DESCλ₯Ό μ‚¬μš©ν•œλ‹€.
    -> ORDER BY 1 DESC
  • μ—¬λŸ¬ μ—΄λ‘œ μ •λ ¬ν•  λ•ŒλŠ”
    -> ORDER BY 1 DESC,2,3
  • NULL의 μˆœμ„œ
    • NULL값듀은 μ˜€λ¦„μ°¨μˆœ 인 경우 λ§ˆμ§€λ§‰
    • NULL값듀은 λ‚΄λ¦Όμ°¨μˆœ 인 경우 처음
    • 이λ₯Ό λ°”κΎΈκ³ μž ν•˜λ©΄ NULLS FIRST or NULLS LAST μ‚¬μš©

πŸ“Œ Type Casting


0개의 λŒ“κΈ€