[TIL#32 250404]

강민지 · April 4, 2025

Data Analysis_TIL


Daily plan

🌞 Morning

- SQL code kata
- 9:30 QCC

🔥 Afternoon

- QCC walkthrough
- Big Data Analysis Engineer exam (빅분기) D-1...........

🌝 Evening

- Dedicating the whole day to the Big Data exam....

SQL code kata

Q85 - Rising Temperature

select id
from(
    select id,
        temperature,
        lag(temperature) over(order by recordDate) yesterday_temp
    from weather
) a
where temperature>yesterday_temp

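I wanted to solve this without a subquery if at all possible,, but this was the only approach I could come up with.
(A window function's result can't be filtered in the WHERE clause of the same SELECT, so the LAG value has to be computed one level down.)
But when I asked GPT, it used a subquery too, so I guess solving it this way is just fine lol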

WITH TempWithYesterday AS (
    SELECT id, temperature, 
           LAG(temperature) OVER(ORDER BY recordDate) AS yesterday_temp
    FROM weather
)
SELECT id 
FROM TempWithYesterday
WHERE temperature > yesterday_temp;

This is the CTE style that GPT likes ^^

Advantages of using a CTE (common table expression)
- Better readability, because a complex subquery is pulled out into its own named block
- Possible performance gains when the same subquery would otherwise run several times against a large table (this depends on the database engine)
- Easier maintenance when other conditions get added later
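
A side note on Q85, as a rough sketch rather than the official answer: LAG looks at the previous row, not the previous calendar day. If the table can skip dates (my assumption about the data; the sample values below are made up), the previous row is not always "yesterday". The same comparison in pandas, with an extra check that the previous record is exactly one day earlier, could look like this:

```python
import pandas as pd

# Hypothetical sample data: note the gap between 2015-01-02 and 2015-01-04.
weather = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "recordDate": pd.to_datetime(
        ["2015-01-01", "2015-01-02", "2015-01-04", "2015-01-05"]
    ),
    "temperature": [10, 25, 30, 20],
})

w = weather.sort_values("recordDate")
prev_temp = w["temperature"].shift(1)   # same role as LAG(temperature)
prev_date = w["recordDate"].shift(1)    # same role as LAG(recordDate)

# LAG-only logic: compares with the previous row, whatever its date is.
lag_only = w.loc[w["temperature"] > prev_temp, "id"].tolist()

# Stricter logic: the previous row must be exactly one day earlier.
is_yesterday = (w["recordDate"] - prev_date) == pd.Timedelta(days=1)
with_date_check = w.loc[(w["temperature"] > prev_temp) & is_yesterday, "id"].tolist()

print(lag_only)         # [2, 3]
print(with_date_check)  # [2]
```

In SQL the matching change would be to also take LAG(recordDate) in the subquery and keep only rows where the date difference is one day.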


QCC

Q1 - Employee login frequency analysis

select unique_logins, 
    count(distinct employee_id) employee_count
from (select employee_id, 
    count(1) unique_logins
    from logins
    where login_result='SUCCESS'
      and date_format(login_time,'%Y%m%d') between '20230701' and '20230930'
    group by employee_id
  ) a
group by unique_logins
order by unique_logins

Q2 - Employees with the third-highest salary

select e.employee_id, e.name, e.salary
from employee_salary e
  join (select salary, rank() over(order by salary desc) as salary_rank 
        from employee_salary
        group by salary) r
  on e.salary = r.salary
where salary_rank=3
order by employee_id

The tutor used dense rank.
rank and dense rank are alike in that tied values get the same rank, but
rank skips ahead by the number of ties when assigning the next rank, while
dense rank assigns the next rank sequentially no matter how many ties there were.
Here, equal salaries should share a rank and the ranks should continue sequentially regardless of ties, so dense rank seems like the better fit.
(For this problem it should come out the same as grouping by salary first and then ranking, which is what my query does.)
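
To see the difference with actual numbers, here is a small pandas sketch. The salaries are made up, and I'm using pandas' `rank(method="min")` and `rank(method="dense")` as stand-ins for SQL RANK and DENSE_RANK.

```python
import pandas as pd

# Hypothetical salaries with a tie at 400.
salaries = pd.Series([500, 400, 400, 300, 200], name="salary")

# method="min" leaves a gap after ties (like RANK),
# method="dense" does not (like DENSE_RANK).
rank_min = salaries.rank(method="min", ascending=False).astype(int)
rank_dense = salaries.rank(method="dense", ascending=False).astype(int)

result = pd.DataFrame({"salary": salaries, "rank": rank_min, "dense_rank": rank_dense})
print(result)

# Third-highest salary: nothing has rank == 3, but dense_rank == 3 finds 300.
print(result.loc[result["rank"] == 3, "salary"].tolist())        # []
print(result.loc[result["dense_rank"] == 3, "salary"].tolist())  # [300]
```

This is also why my RANK version only works because of the group by salary: once duplicates are removed there are no ties left to create gaps.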

And the WITH clause really does look cleaner!
...says the person who has said this for the ten-thousandth time and still never writes WITH...

with salary_ranked as (
    select employee_id, name, salary,
        dense_rank() over (order by salary desc) as rnk
    from employee_salary
)
select employee_id, name, salary
from salary_ranked
where rnk=3
order by employee_id

Q3 - Calculating the inter-department message ratio

select round(count(case when send_dep<>rec_dep then 1 end)/count(1),1) inter_department_msg_pct
from(
  select *,
    (select department 
    from employees e 
    where e.employee_id=m.sender_id) as send_dep,
    (select department 
    from employees e 
    where e.employee_id=m.receiver_id) as rec_dep
  from messages m) a
where send_dep is not null and rec_dep is not null

I don't like this one, it feels like a really messy solution..
Even looking at it again, I have no idea why I made such a complicated mess in the FROM clause.
It's like my dizzy mental state in the middle of the problem, expressed as code..

select
  round(100.0 * SUM(case when e1.department != e2.department then 1 else 0 end) / count(*), 1) inter_department_msg_pct
FROM messages m
inner join employees e1 on m.sender_id = e1.employee_id
inner join employees e2 on m.receiver_id = e2.employee_id

It would have been so much simpler to just do this...

Q4 - (Challenge) Ad performance attribution analysis

I couldn't solve this one in time... T_T


Python Standard, session 2

- Understand the window functions in Python (pandas): shift, rolling, expanding.
- Practice creating derived variables (moving average, cumulative sum) from time-series data.
- Look at the different correlation measures for each data type and practice them.

Shift

  • A method that shifts the values or the index of time-series data by the number of periods you want
  • Syntax: df.shift(periods=n, freq=None, axis=0, fill_value='empty')
    • periods: number of periods to shift (positive or negative)
    • freq: optional; an offset alias such as Y, M, D, H, T, S, a DateOffset or timedelta, or 'infer'. When freq is given, the index is shifted and the values stay where they are
    • fill_value: value used to replace the missing values created by the shift
    • axis: axis to shift along (0: rows / 1: columns)
df.shift(1)  # each row takes the previous row's value (data moves one row down)
df.shift(-1) # each row takes the next row's value (data moves one row up)
df.shift(periods=3, freq='D')  # shift the index by 3 days
df.shift(periods=3, freq='infer') # infer the frequency from the index's date spacing and shift by 3 of those periods
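
A tiny example I put together to check the behavior above (the dates and the `temp` column are made up):

```python
import pandas as pd

idx = pd.date_range("2025-04-01", periods=4, freq="D")
df = pd.DataFrame({"temp": [10, 12, 11, 15]}, index=idx)

down = df.shift(1)                           # values move down; first row becomes NaN
up = df.shift(-1)                            # values move up; last row becomes NaN
moved_index = df.shift(periods=3, freq="D")  # index moves 3 days, values untouched
filled = df.shift(1, fill_value=0)           # the new gap is filled with 0 instead of NaN

print(down["temp"].tolist())    # [nan, 10.0, 12.0, 11.0]
print(up["temp"].tolist())      # [12.0, 11.0, 15.0, nan]
print(moved_index.index[0])     # 2025-04-04 00:00:00
print(filled["temp"].tolist())  # [0, 10, 12, 11]
```

So `df["temp"] - df["temp"].shift(1)` would give the day-over-day change, which is the same idea as LAG in SQL.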

Rolling

  • A method that sets a fixed-size window over a column of a DataFrame and applies a follow-up aggregation to the values inside each window
  • Syntax: df.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')
    • window: size of the window (the number of rows when computing down a column)
    • min_periods: minimum number of observations a window needs to produce a value (for an integer window it defaults to the window size)
    • center: whether the result is labeled at the middle row of the window (default False; with True the window is centered on each row)
    • win_type: weighting scheme such as triang or gaussian when the window should be weighted
    • on: a datetime-like column to roll over, used instead of the index
    • axis: axis to compute along (0: rows / 1: columns); deprecated since pandas 2.1
    • closed: which endpoints of the window are included (left, right, both, neither)
    • method: {'single', 'table'}, whether to compute one column/row at a time or over the whole table; 'table' only works together with engine='numba'
df.rolling(window=3).mean()  # 3-day moving average
df.rolling(window=3).sum()  # 3-day rolling sum (not a cumulative sum)
df.rolling(window=3, center=True, closed='left').mean() 
  # 3-day moving average centered on the middle row, with the window closed on the left (left endpoint included, right endpoint excluded)
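
A quick sketch to see what `min_periods` and `center` actually change (the series values are made up):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=pd.date_range("2025-04-01", periods=5, freq="D"))

ma3 = s.rolling(window=3).mean()                         # needs 3 values, so 2 leading NaN
ma3_partial = s.rolling(window=3, min_periods=1).mean()  # starts as soon as 1 value exists
ma3_center = s.rolling(window=3, center=True).mean()     # NaN at both ends instead
sum3 = s.rolling(window=3).sum()                         # sum of the last 3 rows only

print(ma3.tolist())          # [nan, nan, 2.0, 3.0, 4.0]
print(ma3_partial.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
print(ma3_center.tolist())   # [nan, 2.0, 3.0, 4.0, nan]
print(sum3.tolist())         # [nan, nan, 6.0, 9.0, 12.0]
```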

Expanding

  • A method that applies an operation cumulatively over the values of rows or columns
  • Syntax: df.expanding(min_periods=1, axis=0, method='single').additional_method()
    • min_periods: minimum number of elements needed for the operation; with fewer, the result is NaN
    • axis: axis to compute along (0: rows / 1: columns); deprecated since pandas 2.1
    • method: how the operation is executed
      • single: one column (or row) at a time
      • table: over the whole table at once
      • the default is single; 'table' needs the numba library (engine='numba')
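
A minimal check of the cumulative behavior, using made-up numbers:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

cum_sum = s.expanding().sum()                  # window grows from the first row to the current row
cum_mean = s.expanding(min_periods=2).mean()   # NaN until at least 2 values are available

print(cum_sum.tolist())   # [1.0, 3.0, 6.0, 10.0]
print(cum_mean.tolist())  # [nan, 1.5, 2.0, 2.5]
```

For a plain cumulative sum `s.cumsum()` gives the same numbers; expanding is handy when you want other aggregations (mean, max, std) over the same growing window.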

Correlation by data type

Continuous vs. continuous

  • Pearson correlation coefficient
    • The most common form of correlation analysis
    • Takes a value between -1 and +1; the closer the absolute value is to 1, the stronger the (linear) correlation
    • df.corr(method='pearson')

Continuous vs. categorical (binary)

  • Point-Biserial Correlation
    • A binary variable is one that splits into two values, like 'yes/no'. Coding it as 0 and 1 and then computing the Pearson correlation coefficient gives the point-biserial correlation
    • Takes a value between -1 and +1; the closer the absolute value is to 1, the stronger the correlation
    • Can be computed with the pointbiserialr function in Python's scipy library
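
A small sketch to confirm that point-biserial really is Pearson on a 0/1-coded variable. The `passed` and `hours` columns and their values are invented for the example.

```python
import pandas as pd
from scipy.stats import pearsonr, pointbiserialr

# Hypothetical data: pass/fail result vs. hours studied.
df = pd.DataFrame({
    "passed": ["yes", "no", "yes", "no", "yes", "no"],
    "hours": [8.0, 3.0, 7.5, 4.0, 9.0, 2.5],
})

# Code the binary variable as 0/1 first.
df["passed01"] = (df["passed"] == "yes").astype(int)

r_pb, p_pb = pointbiserialr(df["passed01"], df["hours"])
r_pearson, p_pearson = pearsonr(df["passed01"], df["hours"])
r_corr = df[["passed01", "hours"]].corr(method="pearson").loc["passed01", "hours"]

# All three should print the same coefficient.
print(round(r_pb, 4), round(r_pearson, 4), round(r_corr, 4))
print(p_pb)  # p-value to read together with the coefficient
```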

Continuous vs. categorical (3 or more categories)

  • Polyserial correlation
    • Suitable when the categorical variable is ordinal
    • Its limitation is that, in real-world work, it is hard to explain all the relationships between the variables with it
    • Not currently supported in Python (no built-in function in pandas or scipy)
    • Takes a value between -1 and +1; the closer the absolute value is to 1, the stronger the correlation
  • ANOVA test (analysis of variance)
    • Suitable when the categorical variable is nominal
    • Tests whether the mean of the continuous variable differs across the categories
    • Can be run with the f_oneway function in Python's scipy library
    • Produces an F-statistic and a p-value
      • Rough guide: F<1: not meaningful / 1<=F<3: barely meaningful / 3<=F<10: meaningful depending on the case / 10<=F<50: meaningful (strong difference) / F>50: almost certainly meaningful (very strong difference) / F>=100: definitely meaningful
  • When interpreting the relationship between a continuous and a categorical variable, check the p-value together with the statistic, not the statistic alone
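
A minimal f_oneway sketch with made-up groups and scores, just to see the F-statistic and the p-value come out together:

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical data: one nominal variable (group) and one continuous variable (score).
df = pd.DataFrame({
    "group": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "score": [5.1, 4.9, 5.0, 5.2, 6.0, 6.1, 5.9, 6.2, 7.0, 7.1, 6.9, 7.2],
})

# f_oneway takes one array of values per category.
samples = [g["score"].to_numpy() for _, g in df.groupby("group")]
f_stat, p_value = f_oneway(*samples)

print(f_stat, p_value)
if p_value < 0.05:
    print("group means differ at the 5% level")
```

The group means here are far apart compared with the spread inside each group, so F comes out large and the p-value small. With real data the two should always be read together.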

Categorical vs. categorical

  • Phi (φ) coefficient
    • Codes both categorical variables as 0 and 1 and then computes the Pearson correlation coefficient
    • Can only be used when each categorical variable has exactly 2 values
    • Limitation: both variables have to be unordered (nominal) and binary
  • Cramer's V
    • Can be used when at least one of the two categorical variables has 3 or more categories
    • Label-encode the categorical values, define a function, and build a contingency (confusion) matrix from them
    • Cramer's V takes a value between 0 and 1; the closer to 1, the stronger the association
    • Can be implemented with Python's scipy and sklearn libraries
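
One possible way to write the Cramer's V function, as a sketch. I'm swapping the LabelEncoder step for `pd.crosstab`, which builds the contingency table straight from the raw labels; the function name `cramers_v` and the sample columns are my own.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x, y):
    """Cramer's V between two categorical variables (no bias correction)."""
    table = pd.crosstab(pd.Series(x, name="x"), pd.Series(y, name="y"))
    chi2 = chi2_contingency(table, correction=False)[0]  # chi-square statistic
    n = table.to_numpy().sum()                           # total number of observations
    k = min(table.shape) - 1                             # min(rows, cols) - 1
    return float(np.sqrt(chi2 / (n * k)))


# Hypothetical data: region decides plan completely, so V should be 1.
df = pd.DataFrame({
    "region": ["a", "a", "b", "b", "c", "c"],
    "plan":   ["x", "x", "y", "y", "z", "z"],
})
v_perfect = cramers_v(df["region"], df["plan"])

# Two variables with no association at all, so V should be 0.
v_none = cramers_v(pd.Series(["a", "a", "b", "b"]), pd.Series(["x", "y", "x", "y"]))

print(round(v_perfect, 4), round(v_none, 4))  # 1.0 0.0
```

For a 2x2 table this reduces to the absolute value of the phi coefficient, so the same function covers the binary case too.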

Diary

I finally managed to make 대용님 laugh today,, satisfaction: max
I have zero competitive streak when it comes to studying or exams, so why do I weirdly go all-in on things like this..

I left SQL alone for a while and completely lost my touch. Back to grinding code kata! T.T

Maybe because I barely saw any sunlight this week, my head was in a weird place. I kept getting sleepy and couldn't take anything in even when I read it,, I should go out for a walk or at least get some sunlight, even if it's only 10 minutes a day.

And from now on, actually go to bed early!!!
Back when I was asleep before 11, I could really feel how healthy my mind was, but now that going to bed at 2-3 a.m. has become a habit, I think my mental state is getting unstable,,

Tomorrow is the Big Data exam, but I have no expectations so I'm not even nervous lol
I think I felt exactly like this before ADsP too,, I'd like the heavens to take my side like they did back then, but that feels too shameless so I've just let go haha. Still, fighting to the very end!

  • Review the Python Standard exercises over the weekend
  • and catch up on the statistics lectures I've fallen behind on....^^
