JOIN

JOIN์ด๋ž€?


SQL JOIN์€ ๋‘ ๊ฐœ ํ˜น์€ ๊ทธ ์ด์ƒ์˜ ํ…Œ์ด๋ธ”๋“ค์„ ๊ณตํ†ต ํ•„๋“œ๋ฅผ ๊ฐ€์ง€๊ณ  mergeํ•˜๋Š” ๊ฒƒ.
์ด๋Š” star schema๋กœ ๊ตฌ์„ฑ๋œ ํ…Œ์ด๋ธ”๋“ค๋กœ ๋ถ„์‚ฐ๋˜์–ด ์žˆ๋˜ ์ •๋ณด๋ฅผ ํ†ตํ•ฉํ•˜๋Š”๋ฐ ์‚ฌ์šฉ๋œ๋‹ค.

์™ผ์ชฝ ํ…Œ์ด๋ธ”์„ LEFT, ์˜ค๋ฅธ์ชฝ ํ…Œ์ด๋ธ”์„ RIGHT๋ผ๊ณ  ํ•˜๋ฉด JOIN์˜ ๊ฒฐ๊ณผ๋Š” ์–‘์ชฝ์˜ ํ•„๋“œ๋ฅผ ๋ชจ๋‘ ๊ฐ€์ง„ ์ƒˆ๋กœ์šด ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ค์–ด๋‚ด๊ฒŒ ๋˜๋Š”๋ฐ, JOIN ๋ฐฉ์‹์— ๋”ฐ๋ผ ๋‹ค์Œ ๋‘ ๊ฐ€์ง€๊ฐ€ ๋‹ฌ๋ผ์ง„๋‹ค.

  1. ์–ด๋–ค ๋ ˆ์ฝ”๋“œ๋“ค์ด ์„ ํƒ๋˜๋Š”์ง€?
  2. ์–ด๋–ค ํ•„๋“œ๋“ค์ด ์ฑ„์›Œ์ง€๋Š”์ง€?
    JOIN์ด๋ฏธ์ง€
  • join๋ฌธ๋ฒ•
    join๋ฌธ๋ฒ•

  • JOIN์‹œ ๊ณ ๋ คํ•ด์•ผ ํ•  ์ 
    - ๋จผ์ € ์ค‘๋ณต ๋ ˆ์ฝ”๋“œ๊ฐ€ ์—†๊ณ  Primary key uniqueness๊ฐ€ ๋ณด์žฅ๋จ์„ ์ฒดํฌํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”

    -์กฐ์ธํ•˜๋Š” ํ…Œ์ด๋ธ”๋“ค๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ๋ช…ํ™•ํ•˜๊ฒŒ ์ •์˜

    -One to one (์™„์ „ํ•œ one to one : user_session)channel & session_timestamp or ํ•œ์ชฝ์ด ๋ถ€๋ถ„์ง‘ํ•ฉ์ด ๋˜๋Š” one to one : user_session_channel & session_transaction)

    -One to many: order vs order_items
    ์ด๊ฒฝ์šฐ ์ค‘๋ณต์ด ๋” ํฐ ๋ฌธ์ œ๋จ

    -Many to one
    ๋ฐฉํ–ฅ๋งŒ ๋ฐ”๊พธ๋ฉด one to many์™€ ๋™์ผ

    -Many to many
    ์ด๋Ÿฐ ๊ฒฝ์šฐ๋Š” ๋งŽ์ง€ ์•Š์œผ๋ฉฐ ์ด๋Š” one to one ์ด๋‚˜ one to many๋กœ ๋ฐ”๊พธ๋Š” ๊ฒƒ์ด ๊ฐ€๋Šฅํ•˜๋‹ค๋ฉด ๋ณ€ํ™˜ํ•˜์—ฌ ์กฐ์ธํ•˜๋Š” ๊ฒƒ์ด ๋œ ์œ„ํ—˜

    -์–ด๋А ํ…Œ์ด์„ ๋ฒ ์ด์Šค๋กœ ์žก์„ ์ง€(From์— ์‹œ์šฉํ•  ์ง€ ๊ฒฐ์ •ํ•ด์•ผ ํ•จ)

๋‹ค์–‘ํ•œ JOIN์˜ ์ข…๋ฅ˜


JOIN์˜ ์ข…๋ฅ˜

  • INNER JOIN
  • LEFT JOIN
  • RIGHT JOIN
  • FULL OUTER JOIN
  • SELF JOIN
  • CROSS JOIN

์˜ˆ์ œ ํ…Œ์ด๋ธ”

  • INNER JOIN
    1.์–‘์ชฝ ํ…Œ์ด๋ธ”์—์„œ ๋งค์น˜๊ฐ€ ๋˜๋Š” ๋ ˆ์ฝ”๋“œ๋“ค๋งŒ ๋ฆฌํ„ดํ•จ
    2.์–‘์ชฝ ํ…Œ์ด๋ธ”์˜ ํ•„๋“œ๊ฐ€ ๋ชจ๋‘ ์ฑ„์›Œ์ง„ ์ƒํƒœ๋กœ ๋ฆฌํ„ด๋จ
SELECT * FROM raw_data.Vital v
JOIN raw_data.Alert a ON v.vitalID = a.vitalID;

  • LEFT JOIN
  1. ์™ผ์ชฝ ํ…Œ์ด๋ธ”(Base)์˜ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ๋“ค์„ ๋ฆฌํ„ดํ•จ
  2. ์˜ค๋ฅธ์ชฝ ํ…Œ์ด๋ธ”์˜ ํ•„๋“œ๋Š” ์™ผ์กฑ ๋ ˆ์ฝ”๋“œ์™€ ๋งค์นญ๋˜๋Š” ๊ฒฝ์šฐ์—๋งŒ ์ฑ„์›Œ์ง„ ์ƒํƒœ๋กœ ๋ฆฌํ„ด๋จ
SELECT * FROM raw_data.Vital v
LEFT JOIN raw_data.Alert a ON v.vitalID = a.vitalID;

  • FULL JOIN
  1. ์™ผ์ชฝ ํ…Œ์ด๋ธ”๊ณผ ์˜ค๋ฅธ์ชฝ ํ…Œ์ด๋ธ”์˜ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ๋“ค์„ ๋ฆฌํ„ดํ•จ
  2. ๋งค์นญ๋˜๋Š” ๊ฒฝ์šฐ์—๋งŒ ์–‘์ชฝ ํ…Œ์ด๋ธ”๋“ค์˜ ๋ชจ๋“  ํ•„๋“œ๋“ค์ด ์ฑ„์›Œ์ง„ ์ƒํƒœ๋กœ ๋ฆฌํ„ด๋จ
SELECT * FROM raw_data.Vital v
FULL JOIN raw_data.Alert a ON v.vitalID = a.vitalID;

  • CROSS JOIN
    ์™ผ์ชฝ ํ…Œ์ด๋ธ”๊ณผ ์˜ค๋ฅธ์ชฝ ํ…Œ์ด๋ธ”์˜ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ๋“ค์˜ ์กฐํ•ฉ์„ ๋ฆฌํ„ด
SELECT * FROM raw_data.Vital v CROSS JOIN raw_data.Alert a;

  • SELF JOIN
    ๋™์ผํ•œ ํ…Œ์ด๋ธ”์„ alias๋ฅผ ๋‹ฌ๋ฆฌํ•ด์„œ ์ž๊ธฐ ์ž์‹ ๊ณผ ์กฐ์ธํ•จ
SELECT * FROM raw_data.Vital v1
JOIN raw_data.Vital v2 ON v1.vitalID = v2.vitalID;

BOOLEAN ํƒ€์ž… ์ฒ˜๋ฆฌ


flag is True์™€ flag is not False๊ฐ€ ๋™์ผํ•œ ํ‘œํ˜„์ธ์ง€ ์ƒ๊ฐํ•ด๋ณด์ž. ์•„๋ž˜์™€ ๊ฐ™์€ ๊ฒฝ์šฐ๋ฅผ ์‚ดํŽด๋ณด๋ฉด ์‰ฝ๊ฒŒ ์ฐจ์ด๋ฅผ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

SELECT
    COUNT(CASE WHEN flag = True THEN 1 END) true_cnt1,
    COUNT(CASE WHEN flag is True THEN 1 END) true_cnt2,
    COUNT(CASE WHEN flag is not False THEN 1 END) not_false_cnt
FROM raw_data.boolean_test;

NULL ๋น„๊ต

NULL ๋น„๊ต๋Š” ํ•ญ์ƒ IS ํ˜น์€ IS NOT์œผ๋กœ ์ˆ˜ํ–‰
= ํ˜น์€ != ํ˜น์€ <>์œผ๋กœ ์ˆ˜ํ–‰ํ•˜๋ฉด ์ž˜๋ชป๋œ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ด

์˜ˆ์ œ) ์ฑ„๋„๋ณ„ ์›”๋ณ„ ๋งค์ถœ์•ก ํ…Œ์ด๋ธ” ๋งŒ๋“ค๊ธฐ


์•„๋ž˜์™€ ๊ฐ™์€ ํ•„๋“œ๋กœ ๊ตฌ์„ฑ๋œ ์ฑ„๋„๋ณ„ ์›”๋ณ„ ๋งค์ถœ์•ก ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ค์–ด๋ณด์ž.

  • month
  • channel
  • uniqueUsers (์ด ๋ฐฉ๋ฌธ ์‚ฌ์šฉ์ž)
  • paidUsers (๊ตฌ๋งค ์‚ฌ์šฉ์ž: refundํ•œ ๊ฒฝ์šฐ๋„ ํŒ๋งค๋กœ ๊ณ ๋ ค)
  • conversionRate (๊ตฌ๋งค ์‚ฌ์šฉ์ž/ ์ด ๋ฐฉ๋ฌธ ์‚ฌ์šฉ์ž)
  • grossRevenue (refund ํฌํ•จ)
  • netRevenue (refund ์ œ์™ธ)

(1) ๋จผ์ € ์ฑ„๋„๋ณ„ ์›”๋ณ„ ์œ ์ผํ•œ ์‚ฌ์šฉ์ž ์ˆ˜๋ถ€ํ„ฐ ์„ธ๋ณด์ž

SELECT
    LEFT(ts, 7) "month",
    usc.channel,
    COUNT(DISTINCT userid) uniqueUsers
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid
GROUP BY 1, 2
ORDER BY 1, 2

(2) ๋งค์ถœ์•ก์„ ๊ตฌํ•˜๊ธฐ ์œ„ํ•ด session_transaction ํ…Œ์ด๋ธ”์„ ์กฐ์ธํ•˜์ž

3๊ฐœ ํ…Œ์ด๋ธ” ๋ชจ๋‘ sessionid๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์กฐ์ธ์„ ํ•ด์•ผํ•œ๋‹ค. user_session_channel๊ณผ session_timestamp๋Š” ์ผ๋Œ€์ผ๋กœ ์กฐ์ธ ๊ฐ€๋Šฅ(INNER JOIN)ํ•˜์ง€๋งŒ session_transaction์˜ ๊ฒฝ์šฐ์—๋Š” ๋ชจ๋“  sessionid๊ฐ€ ์กด์žฌํ•˜์ง€ ์•Š๋Š”๋‹ค. -> LEFT JOIN

SELECT
    LEFT(ts, 7) "month",
    usc.channel,
    COUNT(DISTINCT userid) uniqueUsers
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid
LEFT JOIN raw_data.session_transaction st ON st.sessionid = usc.sessionid
GROUP BY 1, 2
ORDER BY 1, 2

(3) paidUsers๋ฅผ ์ถ”๊ฐ€ํ•ด๋ณด์ž

SELECT
    LEFT(ts, 7) "month",
    usc.channel,
    COUNT(DISTINCT userid) uniqueUsers,
    COUNT(DISTINCT CASE WHEN amount > 0 THEN usc.userid END) paidUsers,
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid
LEFT JOIN raw_data.session_transaction st ON st.sessionid = usc.sessionid
GROUP BY 1, 2
ORDER BY 1, 2

(4) ๋ฐฉ๋ฌธ ์œ ์ € ์ค‘ ๋ช‡ %๊ฐ€ ์‹ค์ œ ๊ตฌ๋งค๋ฅผ ํ–ˆ๋Š”์ง€ conversionRate๋ฅผ ๊ณ„์‚ฐํ•ด๋ณด์ž

  1. paidUsers/uniqueUsers AS comversionRate
    ์ •์ˆ˜ํ˜• ๋‚˜๋ˆ„๊ธฐ ์ •์ˆ˜ํ˜•์ด๋ผ ์‹ค์ˆ˜ํ˜•์ด ์•ˆ๋‚˜์˜ด

  2. paidUser::float/uniqueUsers AS coversionRate
    ์‹ค์ˆ˜ํ˜•์œผ๋กœ ๋‚˜์˜ค์ง€๋งŒ 0.1 ,0.4๋“ฑ๋“ฑ ํผ์„ผํŠธ๋กœ ๋‚˜์˜ค์ง€ ์•Š์Œ

  3. ROUND(paidUsers*100.0/uniqueUsers, 2)AS conversionRate
    ํผ์„ผํŠธ๋กœ ์ถœ๋ ฅํ•˜๊ธฐ์œ„ํ•œ ๋ช…๋ น์–ดbut uniqueUsers๊ฐ€ 0์ด๋ฉด ์—๋Ÿฌ๊ฐ€ ๋‚จ

  4. ROUND(paidUsers*100.0/NULLIF(uniqueUsers,0),2) AS conversionRate
    uniqueUsers์˜ ๊ฐ’์ด 0์ด๋ฉด NULL์„์‚ฌ์šฉํ•ด๋ผ

SELECT
    LEFT(ts, 7) "month",
    usc.channel,
    COUNT(DISTINCT userid) uniqueUsers,
    COUNT(DISTINCT CASE WHEN amount > 0 THEN usc.userid END) paidUsers,
    ROUND(paidUsers*100.0/NULLIF(uniqueUsers, 0), 2) AS conversionRate 
    -- uniqueUsers๊ฐ€ 0์ผ ๊ฒฝ์šฐ ๋ฐœ์ƒํ•˜๋Š” ์—๋Ÿฌ๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด NULLIF ์‚ฌ์šฉ
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid
LEFT JOIN raw_data.session_transaction st ON st.sessionid = usc.sessionid
GROUP BY 1, 2
ORDER BY 1, 2

*NULLIF

  • paidUsers/uniqueUsers
    0์œผ๋กœ ๋‚˜๋ˆ„๋Š”๊ฒฝ์šฐ divide by 0 ์—๋Ÿฌ ๋ฐœ์ƒ
    ์ด๋ฅผ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด NULLIF๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ 0์„ NULL๋กœ ๋ณ€๊ฒฝ
    ์‚ฌ์น™์—ฐ์‚ฐ์— NULL์ด๋“ค์–ด๊ฐ€๋ฉด ๊ฒฐ๊ณผ๋„ NULL์ด ๋จ

*COALESCE

  • NULL๊ฐ’์„ ๋‹ค๋ฅธ ๊ฐ’์œผ๋กœ ๋ฐ”๊ฟ”์ฃผ๋Š” ํ•จ์ˆ˜
    ex)COALESCE(exp1,exp2,exp3,..)
    exp1๋ถ€ํ„ฐ ์ธ์ž๋ฅผ ํ•˜๋‚˜์”ฉ ์‚ดํŽด์„œ NULL์ด ์•„๋‹Œ ๊ฐ’์ด ๋‚˜์˜ค๋ฉด ๊ทธ๊ฑธ๋ฆฌํ„ด
    ๋๊นŒ์ง€ ๊ฐ”๋Š”๋ฐ๋„ ๋ชจ๋‘ NULL์ด๋ฉด ์ตœ์ข…์ ์œผ๋กœ NULL์„ ๋ฆฌํ„ด
SELECT
    value,
    COALESCE(value,0) -- value๊ฐ€ NULL์ด๋ฉด 0์„ ๋ฆฌํ„ด
FROM raw_data.count_test;

(5) grossRevenue, netRevenue๋ฅผ ์ถ”๊ฐ€ํ•˜์ž

SELECT
    LEFT(ts, 7) "month",
    usc.channel,
    COUNT(DISTINCT userid) uniqueUsers,
    COUNT(DISTINCT CASE WHEN amount > 0 THEN usc.userid END) paidUsers,
    ROUND(paidUsers*100.0/NULLIF(uniqueUsers, 0), 2) AS conversionRate 
    SUM(amount) grossRevenue
    SUM(CASE WHEN refunded is False THEN amount END) netRevenue
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid
LEFT JOIN raw_data.session_transaction st ON st.sessionid = usc.sessionid
GROUP BY 1, 2
ORDER BY 1, 2

profile
๊ฒŒ์„๋ €๋˜ ํ”„๋กœ๊ทธ๋ž˜๋ฐ ๊ณต๋ถ€

0๊ฐœ์˜ ๋Œ“๊ธ€