๐Ÿ“’ SQL(4)

Kimdongkiยท2024๋…„ 4์›” 25์ผ

DB

๋ชฉ๋ก ๋ณด๊ธฐ
7/33

๐Ÿ“Œ JOIN

SQL์—์„œ JOIN์ด๋ž€ ๋‘ ๊ฐœ ๋˜๋Š” ๊ทธ ์ด์ƒ์˜ ํ…Œ์ด๋ธ”๋“ค ํ†ตํ•ฉํ•˜๋Š” ๊ฒƒ์„ ๋งํ•œ๋‹ค.
JOIN์€ 6๊ฐ€์ง€๊ฐ€ ์กด์žฌํ•˜๋Š”๋ฐ ๊ฐ๊ฐ INNER, FULL, CROSS, LEFT, RIGHT, SELF ๊ฐ€ ์žˆ๋‹ค.

๐Ÿ“™ JOIN

SELECT A.*, B.*
FROM raw_data.table1 A
JOIN raw_data.table2 B ON A.key = B.key;

๐Ÿ“Œ ์ฃผ์˜

  1. ์ค‘๋ณต ๋ ˆ์ฝ”๋“œ๊ฐ€ ์—†๊ณ  Primary Key์˜ uniqueness๊ฐ€ ๋ณด์žฅ๋จ์„ ์ฒดํฌํ•ด์•ผ ํ•œ๋‹ค.
  2. JOINํ•˜๋Š” ํ…Œ์ด๋ธ”๋“ค ๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ๋ช…ํ™•ํ•˜๊ฒŒ ์ •์˜ํ•œ๋‹ค.
    -> One to one, One to manay, Many to one, Many to Many
  3. ์–ด๋А ํ…Œ์ด๋ธ”์„ ๋ฒ ์ด์Šค๋กœ ํ• ์ง€ ๊ฒฐ์ •ํ•ด์•ผํ•œ๋‹ค. -> FROM์— ์ž‘์„ฑํ•  ํ…Œ์ด๋ธ”

๐Ÿ“Œ INNER JOIN

  • ๋‘ ํ…Œ์ด๋ธ”์—์„œ ๋งค์น˜๊ฐ€ ๋˜๋Š” ๋ ˆ์ฝ”๋“œ๋“ค๋งŒ ๋ฆฌํ„ดํ•œ๋‹ค.
  • ๋‚ด์šฉ์ด ๋ชจ๋‘ ์ฑ„์›Œ์ง„ ์ƒํƒœ์ด๋‹ค.
SELECT *
FROM raw_data.User A
JOIN raw_data.product B ON A.userid = B.userid;
A.useridA.productidA.weightB.useridB.typeB.date
10011010901001stick2024-04-25

๐Ÿ“Œ LEFT JOIN

  • ์™ผ์ชฝ ํ…Œ์ด๋ธ”(Base)์˜ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ๋“ค์„ ๋ฆฌํ„ดํ•œ๋‹ค.
  • ์˜ค๋ฅธ์ชฝ ํ…Œ์ด๋ธ”์˜ ํ•„๋“œ๋Š” ์™ผ์ชฝ ๋ ˆ์ฝ”๋“œ์™€ ๋งค์นญ๋˜๋Š” ๊ฒฝ์šฐ๋งŒ ์ฑ„์›Œ์„œ ๋ฆฌํ„ดํ•œ๋‹ค. -> ์ด์™ธ์—๋Š” NULL
SELECT *
FROM raw_data.User A
LEFT JOIN raw_data.product B ON A.userid = B.userid;
A.useridA.productidA.weightB.useridB.typeB.date
1001101090NULLNULLNULL
1002201070NULLNULLNULL
10031020121003case2024-04-25
1004301040NULLNULLNULL

๐Ÿ“Œ FULL JOIN

  • ์™ผ์ชฝ ํ…Œ์ด๋ธ”๊ณผ ์˜ค๋ฅธ์ชฝ ํ…Œ์ด๋ธ”์˜ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ๋ฅผ ๋ฆฌํ„ดํ•œ๋‹ค.
  • ๋งค์นญ ๋˜๋Š” ๊ฒฝ์šฐ์—๋งŒ ์ฑ„์šฐ๊ณ  ์ด์™ธ์—๋Š” NULL๋กœ ๋ฆฌํ„ดํ•œ๋‹ค.
SELECT *
FROM raw_data.User A
FULL JOIN raw_data.product B ON A.userid = B.userid;
A.useridA.productidA.weightB.useridB.typeB.date
1001101090NULLNULLNULL
1002201070NULLpadNULL
10031020121003case2024-04-25
NULLNULLNULLNULLcandy2024-04-22

๐Ÿ“Œ CROSS JOIN

  • ์™ผ์ชฝ ํ…Œ์ด๋ธ”๊ณผ ์˜ค๋ฅธ์ชฝ ํ…Œ์ด๋ธ”์˜ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ๋“ค์˜ ์กฐํ•ฉ์„ ๋ฆฌํ„ดํ•œ๋‹ค.
SELECT *
FROM raw_data.User A
CROSS JOIN raw_data.product B ON A.userid = B.userid;
A.useridA.productidA.weightB.useridB.typeB.date
10011010901001case2024-04-24
10012010701001pad2024-05-22
10021020121002case2024-04-25
10031020121003case2024-04-22

๐Ÿ“Œ SELF JOIN

  • ๋™์ผํ•œ ํ…Œ์ด๋ธ”์„ alias๋ฅผ ๋‹ฌ๋ฆฌํ•˜์—ฌ ์ž๊ธฐ ์ž์‹ ๊ณผ JOINํ•œ๋‹ค.
SELECT *
FROM raw_data.User A1
JOIN raw_data.User A2 ON A1.userid = A2.userid;
A1.useridA1.productidA1.weightA2.useridA2.productidA2.weight
10011010901001101090

๐Ÿ“Œ BOOLEAN

  • TRUE or FALSE
  • flag = True
  • flas is True

๐Ÿ“Œ NULL

  • IS or IS NOT์œผ๋กœ ์‚ฌ์šฉํ•ด์•ผ ํ•œ๋‹ค.
  • =, !=, <, > ์œผ๋กœ ์—ฐ์‚ฐํ•˜๋ฉด ์ž˜๋ชป๋œ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜จ๋‹ค.

๐Ÿ“Œ COALESCE

  • NULL๊ฐ’์„ ๋‹ค๋ฅธ ๊ฐ’์œผ๋กœ ๋ฐ”๊ฟ”์ฃผ๋Š” ํ•จ์ˆ˜
    -> NULL๋Œ€์‹  ๋‹ค๋ฅธ ๋ฐฑ์—…๊ฐ’์œผ๋กœ ๋ฆฌํ„ด
SELECT
	value,
    COALESCE(value, 0) -> value๊ฐ€ NULL์ด๋ฉด 0์„ ๋ฆฌํ„ด
FROM raw_data.count_test;

๐Ÿ“Œ ์‹ค์Šต

๐Ÿ“Œ ์‚ฌ์šฉ์ž๋ณ„๋กœ ์ฒ˜์Œ ์ฑ„๋„๊ณผ ๋งˆ์ง€๋ง‰ ์ฑ„๋„ ์•Œ์•„๋‚ด๊ธฐ

  • ROW_NUMBER() OVER(PARTITION BY userid ORDER BY ts) seq ๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ๊ฐ€๋Šฅํ•˜๋‹ค.
%%sql

SELECT
    A.userid AS userid,
    A.channel AS First,
    B.channel AS LAST
FROM (
    SELECT
        userid,
        ts,
        channel,
        ROW_NUMBER() OVER(PARTITION BY userid ORDER BY ts) seq  -- ๋‚ด๋ฆผ์ฐจ์ˆœ์œผ๋กœ ์ •๋ ฌ๋œ ์ƒํƒœ
    FROM raw_data.user_session_channel usc
    JOIN raw_data.session_timestamp st ON usc.sessionid = st.sessionid
) A
JOIN (
    SELECT
        userid,
        ts,
        channel,
        ROW_NUMBER() OVER(PARTITION BY userid ORDER BY ts DESC) seq -- ์˜ค๋ฆ„์ฐจ์ˆœ์œผ๋กœ ์ •๋ ฌ๋œ ์ƒํƒœ
    FROM raw_data.user_session_channel usc
    JOIN raw_data.session_timestamp st ON usc.sessionid = st.sessionid
) B ON A.userid = B.userid and B.seq = 1
WHERE A.seq = 1;

OUTPUT ->

๐Ÿ“Œ Gross Revenue๊ฐ€ ๊ฐ€์žฅ ํฐ UserID 10๊ฐœ ์ฐพ๊ธฐ

user_session_channel, session_transaction, session_timestamp 3๊ฐœ ์‚ฌ์šฉํ•˜๋ผ๊ณ  ํ•˜์…จ๋Š”๋ฐ ๋‚ด ์ƒ๊ฐ์—๋Š” timestamp๋Š” ํ•„์š” ์—†์–ด๋ณด์ž„ Userid๋„ ์—†๊ณ  amount๋„ ์—†๋‹ค.

๐Ÿ“™ user_session_channel

%%sql
SELECT *
FROM raw_data.user_session_channel
LIMIT 2

  • Userid๊ฐ€ ์กด์žฌํ•จ
  • session_transaction ํ…Œ์ด๋ธ”์˜ sessionid์™€ ์—ฐ๊ฒฐํ•ด์„œ JOINํ•˜๋ฉด ๋ ๋“ฏ

๐Ÿ“™ session_transaction

%%sql
SELECT *
FROM raw_data.session_transaction
LIMIT 2

  • Gross Revenue์—์„œ ์ˆ˜์ต์ธ amount๊ฐ€ ์กด์žฌํ•จ
  • Refund ์—ฌ๋ถ€์™€ ์ƒ๊ด€ ์—†์œผ๋‹ˆ๊นŒ ๋‹ค ๋”ํ•˜๋ฉด ๋ ๋“ฏ

๐Ÿ“™ session_transaction

%%sql
SELECT *
FROM raw_data.session_timestamp
LIMIT 2

  • Userid ์ฐพ๊ธฐ์ธ๋ฐ Userid๊ฐ€ ์—†๋Š” ํ…Œ์ด๋ธ”์ž„
%%sql

SELECT
    userid,
    SUM(amount) AS Gross_Revenue
FROM raw_data.user_session_channel usc
JOIN raw_data.session_transaction st ON usc.sessionid = st.sessionid
GROUP BY userid
ORDER BY Gross_Revenue DESC -- ์ƒ์œ„ 10๊ฐœ๋‹ˆ๊นŒ ์ด ์ˆ˜์ต ๊ธฐ์ค€์œผ๋กœ ๋‚ด๋ฆผ์ฐจ์ˆœ
LIMIT 10

OUTPUT ->

๐Ÿ“Œ raw_data.nps ํ…Œ์ด๋ธ”์„ ๋ฐ”ํƒ•์œผ๋กœ ์›”๋ณ„ NPS ๊ณ„์‚ฐ

  • ๊ณ ๊ฐ๋“ค์ด 0(์˜ํ–ฅ ์—†์Œ) ์—์„œ 10(์˜ํ–ฅ ์•„์ฃผ ๋†’์Œ)
  • detractor(๋น„์ถ”์ฒœ์ž) : 0~6
  • passive(์†Œ๊ทน์ž) : 7~8
  • promoter(ํ™๋ณด์ž) : 9~10
  • NPS : promoter% - detractor%

๐Ÿ“™ nps

%%sql
SELECT *
FROM raw_data.nps
LIMIT 10

score๋ฅผ ์ด์šฉํ•˜๋ฉด ๋ ๊ฒƒ ๊ฐ™๋‹ค.

%%sql

SELECT
    LEFT(created_at, 7) AS Month,
    ROUND(SUM(
            CASE
                WHEN score >= 9 THEN 1
                WHEN score <= 6 THEN -1 
            END)::float * 100 / COUNT(1), 2
        ) AS NPS
FROM raw_data.nps
GROUP BY Month
ORDER BY Month

OUTPUT ->

0๊ฐœ์˜ ๋Œ“๊ธ€