SQL JOIN์ ๋ ๊ฐ ํน์ ๊ทธ ์ด์์ ํ ์ด๋ธ๋ค์ ๊ณตํต ํ๋๋ฅผ ๊ฐ์ง๊ณ mergeํ๋ ๊ฒ.
์ด๋ star schema๋ก ๊ตฌ์ฑ๋ ํ ์ด๋ธ๋ค๋ก ๋ถ์ฐ๋์ด ์๋ ์ ๋ณด๋ฅผ ํตํฉํ๋๋ฐ ์ฌ์ฉ๋๋ค.
์ผ์ชฝ ํ ์ด๋ธ์ LEFT, ์ค๋ฅธ์ชฝ ํ ์ด๋ธ์ RIGHT๋ผ๊ณ ํ๋ฉด JOIN์ ๊ฒฐ๊ณผ๋ ์์ชฝ์ ํ๋๋ฅผ ๋ชจ๋ ๊ฐ์ง ์๋ก์ด ํ ์ด๋ธ์ ๋ง๋ค์ด๋ด๊ฒ ๋๋๋ฐ, JOIN ๋ฐฉ์์ ๋ฐ๋ผ ๋ค์ ๋ ๊ฐ์ง๊ฐ ๋ฌ๋ผ์ง๋ค.

join๋ฌธ๋ฒ

JOIN์ ๊ณ ๋ คํด์ผ ํ ์
- ๋จผ์ ์ค๋ณต ๋ ์ฝ๋๊ฐ ์๊ณ Primary key uniqueness๊ฐ ๋ณด์ฅ๋จ์ ์ฒดํฌํ๋ ๊ฒ์ด ์ค์
-์กฐ์ธํ๋ ํ ์ด๋ธ๋ค๊ฐ์ ๊ด๊ณ๋ฅผ ๋ช ํํ๊ฒ ์ ์
-One to one (์์ ํ one to one : user_session)channel & session_timestamp or ํ์ชฝ์ด ๋ถ๋ถ์งํฉ์ด ๋๋ one to one : user_session_channel & session_transaction)
-One to many: order vs order_items
์ด๊ฒฝ์ฐ ์ค๋ณต์ด ๋ ํฐ ๋ฌธ์ ๋จ
-Many to one
๋ฐฉํฅ๋ง ๋ฐ๊พธ๋ฉด one to many์ ๋์ผ
-Many to many
์ด๋ฐ ๊ฒฝ์ฐ๋ ๋ง์ง ์์ผ๋ฉฐ ์ด๋ one to one ์ด๋ one to many๋ก ๋ฐ๊พธ๋ ๊ฒ์ด ๊ฐ๋ฅํ๋ค๋ฉด ๋ณํํ์ฌ ์กฐ์ธํ๋ ๊ฒ์ด ๋ ์ํ
-์ด๋ ํ ์ด์ ๋ฒ ์ด์ค๋ก ์ก์ ์ง(From์ ์์ฉํ ์ง ๊ฒฐ์ ํด์ผ ํจ)
์์ ํ
์ด๋ธ

SELECT * FROM raw_data.Vital v
JOIN raw_data.Alert a ON v.vitalID = a.vitalID;

SELECT * FROM raw_data.Vital v
LEFT JOIN raw_data.Alert a ON v.vitalID = a.vitalID;

SELECT * FROM raw_data.Vital v
FULL JOIN raw_data.Alert a ON v.vitalID = a.vitalID;

SELECT * FROM raw_data.Vital v CROSS JOIN raw_data.Alert a;

SELECT * FROM raw_data.Vital v1
JOIN raw_data.Vital v2 ON v1.vitalID = v2.vitalID;

flag is True์ flag is not False๊ฐ ๋์ผํ ํํ์ธ์ง ์๊ฐํด๋ณด์. ์๋์ ๊ฐ์ ๊ฒฝ์ฐ๋ฅผ ์ดํด๋ณด๋ฉด ์ฝ๊ฒ ์ฐจ์ด๋ฅผ ์ ์ ์๋ค.

SELECT
COUNT(CASE WHEN flag = True THEN 1 END) true_cnt1,
COUNT(CASE WHEN flag is True THEN 1 END) true_cnt2,
COUNT(CASE WHEN flag is not False THEN 1 END) not_false_cnt
FROM raw_data.boolean_test;
NULL ๋น๊ต๋ ํญ์ IS ํน์ IS NOT์ผ๋ก ์ํ
= ํน์ != ํน์ <>์ผ๋ก ์ํํ๋ฉด ์๋ชป๋ ๊ฒฐ๊ณผ๊ฐ ๋์ด
์๋์ ๊ฐ์ ํ๋๋ก ๊ตฌ์ฑ๋ ์ฑ๋๋ณ ์๋ณ ๋งค์ถ์ก ํ
์ด๋ธ์ ๋ง๋ค์ด๋ณด์.

(1) ๋จผ์ ์ฑ๋๋ณ ์๋ณ ์ ์ผํ ์ฌ์ฉ์ ์๋ถํฐ ์ธ๋ณด์
SELECT
LEFT(ts, 7) "month",
usc.channel,
COUNT(DISTINCT userid) uniqueUsers
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid
GROUP BY 1, 2
ORDER BY 1, 2
(2) ๋งค์ถ์ก์ ๊ตฌํ๊ธฐ ์ํด session_transaction ํ ์ด๋ธ์ ์กฐ์ธํ์
3๊ฐ ํ ์ด๋ธ ๋ชจ๋ sessionid๋ฅผ ๊ธฐ๋ฐ์ผ๋ก ์กฐ์ธ์ ํด์ผํ๋ค. user_session_channel๊ณผ session_timestamp๋ ์ผ๋์ผ๋ก ์กฐ์ธ ๊ฐ๋ฅ(INNER JOIN)ํ์ง๋ง session_transaction์ ๊ฒฝ์ฐ์๋ ๋ชจ๋ sessionid๊ฐ ์กด์ฌํ์ง ์๋๋ค. -> LEFT JOIN
SELECT
LEFT(ts, 7) "month",
usc.channel,
COUNT(DISTINCT userid) uniqueUsers
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid
LEFT JOIN raw_data.session_transaction st ON st.sessionid = usc.sessionid
GROUP BY 1, 2
ORDER BY 1, 2
(3) paidUsers๋ฅผ ์ถ๊ฐํด๋ณด์
SELECT
LEFT(ts, 7) "month",
usc.channel,
COUNT(DISTINCT userid) uniqueUsers,
COUNT(DISTINCT CASE WHEN amount > 0 THEN usc.userid END) paidUsers,
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid
LEFT JOIN raw_data.session_transaction st ON st.sessionid = usc.sessionid
GROUP BY 1, 2
ORDER BY 1, 2
(4) ๋ฐฉ๋ฌธ ์ ์ ์ค ๋ช %๊ฐ ์ค์ ๊ตฌ๋งค๋ฅผ ํ๋์ง conversionRate๋ฅผ ๊ณ์ฐํด๋ณด์
paidUsers/uniqueUsers AS comversionRate
์ ์ํ ๋๋๊ธฐ ์ ์ํ์ด๋ผ ์ค์ํ์ด ์๋์ด
paidUser::float/uniqueUsers AS coversionRate
์ค์ํ์ผ๋ก ๋์ค์ง๋ง 0.1 ,0.4๋ฑ๋ฑ ํผ์ผํธ๋ก ๋์ค์ง ์์
ROUND(paidUsers*100.0/uniqueUsers, 2)AS conversionRate
ํผ์ผํธ๋ก ์ถ๋ ฅํ๊ธฐ์ํ ๋ช
๋ น์ดbut uniqueUsers๊ฐ 0์ด๋ฉด ์๋ฌ๊ฐ ๋จ
ROUND(paidUsers*100.0/NULLIF(uniqueUsers,0),2) AS conversionRate
uniqueUsers์ ๊ฐ์ด 0์ด๋ฉด NULL์์ฌ์ฉํด๋ผ
SELECT
LEFT(ts, 7) "month",
usc.channel,
COUNT(DISTINCT userid) uniqueUsers,
COUNT(DISTINCT CASE WHEN amount > 0 THEN usc.userid END) paidUsers,
ROUND(paidUsers*100.0/NULLIF(uniqueUsers, 0), 2) AS conversionRate
-- uniqueUsers๊ฐ 0์ผ ๊ฒฝ์ฐ ๋ฐ์ํ๋ ์๋ฌ๋ฅผ ์ฒ๋ฆฌํ๊ธฐ ์ํด NULLIF ์ฌ์ฉ
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid
LEFT JOIN raw_data.session_transaction st ON st.sessionid = usc.sessionid
GROUP BY 1, 2
ORDER BY 1, 2
*NULLIF
*COALESCE
SELECT
value,
COALESCE(value,0) -- value๊ฐ NULL์ด๋ฉด 0์ ๋ฆฌํด
FROM raw_data.count_test;
(5) grossRevenue, netRevenue๋ฅผ ์ถ๊ฐํ์
SELECT
LEFT(ts, 7) "month",
usc.channel,
COUNT(DISTINCT userid) uniqueUsers,
COUNT(DISTINCT CASE WHEN amount > 0 THEN usc.userid END) paidUsers,
ROUND(paidUsers*100.0/NULLIF(uniqueUsers, 0), 2) AS conversionRate
SUM(amount) grossRevenue
SUM(CASE WHEN refunded is False THEN amount END) netRevenue
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp t ON t.sessionid = usc.sessionid
LEFT JOIN raw_data.session_transaction st ON st.sessionid = usc.sessionid
GROUP BY 1, 2
ORDER BY 1, 2