💠 AIchemist 6th Session | Regression

yellowsubmarine372 · November 6, 2023

AIchemist


01. Introduction to Regression

Regression: a statistical technique that exploits the tendency of data values to return toward a constant value such as the mean

Y = W0 + W1*X1 + W2*X2 + ... + Wn*Xn
Y is the dependent variable, X1...Xn are the independent variables, and W0...Wn are the regression coefficients
The independent variables are the features, and the dependent variable is the target value

The core of machine-learning regression is finding the optimal regression coefficients by training on the given feature and target data

์ง€๋„ํ•™์Šต์€ ๋ถ„๋ฅ˜์™€ ํšŒ๊ท€๋กœ ๋‚˜๋‰จ. ๋ถ„๋ฅ˜๋Š” ์˜ˆ์ธก๊ฐ’์ด ์นดํ† ๊ณ ๋ฆฌ์™€ ๊ฐ™์€ ์ด์‚ฐํ˜• ํด๋ž˜์Šค ๊ฐ’์ด๊ณ , ํšŒ๊ท€๋Š” ์—ฐ์†ํ˜• ์ˆซ์ž ๊ฐ’

Types of linear regression models

  • Ordinary linear regression
    Optimizes the regression coefficients so as to minimize the RSS (residual sum of squares) between predicted and actual values, with no regularization applied
  • Ridge
    Linear regression with an L2 penalty added. The L2 penalty makes the regression coefficients smaller in order to reduce the predictive influence of relatively large coefficient values
  • Lasso
    Linear regression with an L1 penalty applied. Whereas the L2 penalty shrinks the size of the coefficients, the L1 penalty drives the coefficients of low-impact features to exactly 0, so those features are effectively deselected from the regression.
  • ElasticNet
    A model combining the L2 and L1 penalties
  • Logistic regression
    Despite the name, a linear model used for classification, and a very powerful classification algorithm. It shows excellent predictive performance not only in binary classification but also in sparse domains such as text classification.

02. Understanding Regression through Simple Linear Regression

If the regression model is a first-order (straight-line) function, the actual value equals the model's predicted value corrected by an error term.
The error between the actual values and the regression model's predictions is called the residual. Building the optimal regression model means building the model that minimizes the sum of the residuals over the entire dataset

Two ways to aggregate the error values: Mean Absolute Error (take the absolute values and sum them) and RSS (square the errors and sum them). RSS is generally used.

RSS is the cost, and the RSS written as a function of the w variables is called the cost function.
The goal is to keep decreasing the value the cost function returns until it no longer decreases, arriving at the minimum error
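In symbols, for the single-feature case (a standard formulation, consistent with the get_cost() implementation below, which divides by N):

$$RSS(w_0, w_1) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - (w_0 + w_1 x_i)\bigr)^2$$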

03. Minimizing the Cost - Gradient Descent

๋ฐ˜๋ณต์ ์œผ๋กœ ๋น„์šฉ ํ•จ์ˆ˜์˜ ๋ฐ˜ํ™˜ ๊ฐ’, ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ ๊ฐ’์˜ ์ฐจ์ด๊ฐ€ ์ž‘์•„์ง€๋Š” ๋ฐฉํ–ฅ์„ฑ์„ ๊ฐ€์ง€๊ณ  W ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ง€์†ํ•ด์„œ ๋ณด์ •ํ•ด ๋‚˜๊ฐ
์˜ค๋ฅ˜ ๊ฐ’์ด ๋” ์ด์ƒ ์ž‘์•„์ง€์ง€ ์•Š์œผ๋ฉด ๊ทธ ์˜ค๋ฅ˜ ๊ฐ’์„ ์ตœ์†Œ ๋น„์šฉ์œผ๋กœ ํŒ๋‹จํ•˜๊ณ  ๊ทธ๋•Œ์˜ W ๊ฐ’์„ ์ตœ์  ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๋ฐ˜ํ™˜ํ•จ.

  1. Set w1 and w0 to arbitrary values and compute the initial value of the cost function
  2. Update w1 and w0 by their respective update values (derived from the learning rate and the gradient), then recompute the cost function
  3. Repeat step 2 for the given number of iterations, continually updating w1 and w0 in the direction that decreases the cost
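The update values in step 2 come from the partial derivatives of the cost function above (a standard derivation, matching the code that follows; $\eta$ denotes the learning rate):

$$w_1 \leftarrow w_1 + \eta\,\frac{2}{N}\sum_{i=1}^{N} x_i\,(y_i - \hat{y}_i),\qquad w_0 \leftarrow w_0 + \eta\,\frac{2}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)$$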


  1. Generate prediction data for simple linear regression
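The data-generation code itself is not shown in the post; a minimal sketch, assuming the common textbook setup of y = 4x + 6 plus Gaussian noise:

import numpy as np

np.random.seed(0)
# 100 samples: X uniform on [0, 2), y = 4X + 6 with unit Gaussian noise (assumed setup)
X = 2 * np.random.rand(100, 1)
y = 6 + 4 * X + np.random.randn(100, 1)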
Run gradient descent and check the total cost (get_cost() and gradient_descent_steps() are defined in the next steps):

w1, w0 = gradient_descent_steps(X, y, iters=1000)
print("w1:{0:.3f} w0:{1:.3f}".format(w1[0,0], w0[0,0]))
y_pred = w1[0,0]*X + w0
print('Gradient Descent Total Cost:{0:.4f}'.format(get_cost(y, y_pred)))

  2. Define the cost function
def get_cost(y, y_pred):
    # cost = mean of the squared residuals over the N samples
    N = len(y)
    cost = np.sum(np.square(y-y_pred))/N
    return cost

New w1 and w0 values are applied repeatedly to update w1 and w0; since the data are ndarrays, the updates must be computed with matrix (dot-product) operations

# iteratively update w1 and w0 as many times as the iters argument specifies
def gradient_descent_steps(X, y, iters=10000):
    # initialize both w0 and w1 to 0
    w0 = np.zeros((1,1))
    w1 = np.zeros((1,1))
    
    # call get_weight_updates() iters times, updating w1 and w0 each iteration
    for ind in range(iters):
        w1_update, w0_update = get_weight_updates(w1, w0, X, y, learning_rate=0.01) 
        w1 = w1 - w1_update
        w0 = w0 - w0_update
        
    return w1, w0
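get_weight_updates() is called above but never defined in the post; a minimal sketch, assuming the standard MSE-gradient update (the factor 2/N comes from differentiating the squared-error cost):

def get_weight_updates(w1, w0, X, y, learning_rate=0.01):
    N = len(y)
    # prediction with the current weights and the residual against the actual values
    y_pred = np.dot(X, w1.T) + w0
    diff = y - y_pred
    # update terms: learning rate times the gradient of the MSE cost
    w0_factors = np.ones((N, 1))
    w1_update = -(2/N) * learning_rate * np.dot(X.T, diff)
    w0_update = -(2/N) * learning_rate * np.dot(w0_factors.T, diff)
    return w1_update, w0_update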
  • Stochastic gradient descent

Plain (batch) gradient descent has the drawback of long running times, so in practice stochastic gradient descent is used most of the time. It computes the w updates using only part of the data:
randomly sample batch_size rows from the full X, y data


def stochastic_gradient_descent_steps(X, y, batch_size=10, iters=1000):
    w0 = np.zeros((1,1))
    w1 = np.zeros((1,1))
    
    for ind in range(iters):
        np.random.seed(ind)
        # randomly sample batch_size rows from the full X, y data into sample_X, sample_y
        stochastic_random_index = np.random.permutation(X.shape[0])
        sample_X = X[stochastic_random_index[0:batch_size]]
        sample_y = y[stochastic_random_index[0:batch_size]]
        # compute w1_update, w0_update from the randomly sampled batch, then update
        w1_update, w0_update = get_weight_updates(w1, w0, sample_X, sample_y, learning_rate=0.01)
        w1 = w1 - w1_update
        w0 = w0 - w0_update
        
    return w1, w0
  • When there are multiple features

The multi-feature case is derived analogously to the single-feature case.
With M features, M+1 regression coefficients are derived (the extra one is the intercept w_0)

The prediction equation ŷ = w_0 + w_1*x_1 + ... + w_M*x_M can be computed for all samples at once as the matrix product of the feature matrix and the weight vector: ŷ = X·W
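A small numpy illustration of the matrix form (all names here are hypothetical):

import numpy as np

N, M = 4, 3
X = np.random.rand(N, M)       # feature matrix: N samples x M features
w = np.random.rand(M + 1)      # M+1 coefficients; w[0] is the intercept w_0
# per-sample predictions computed in one matrix operation
y_pred = X.dot(w[1:]) + w[0]
print(y_pred.shape)            # (N,)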

04. Boston House Price Prediction with scikit-learn's LinearRegression

The LinearRegression class - Ordinary Least Squares

A class that implements regression via OLS estimation, minimizing the RSS between predicted and actual values

Regression evaluation metrics

Scikit-learn does not provide RMSE directly, so to get RMSE you have to compute it yourself by taking the square root of MSE.

  • MAE's scoring parameter is neg_mean_absolute_error
    Scikit-learn's scoring machinery automatically treats a larger score as a better result, so the metric is converted to a negative value before being handed to scikit-learn. When reporting the final result, it must be converted back to a positive value

Implementing Boston house price regression with LinearRegression

⚠️ Because of ethical problems with the Boston dataset, load_boston() is no longer accessible
(the selection criteria for some features were unethical, and treating each feature's relationship to house price as linear is also questionable.)

The example code is not executed; refer to the results only.
  • Preprocessing
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from scipy import stats
from sklearn.datasets import load_boston
import warnings
warnings.filterwarnings('ignore')  # suppress the warning that the Boston dataset is removed from scikit-learn 1.2 onward
%matplotlib inline

# load the boston dataset
boston = load_boston()

# convert the boston dataset to a DataFrame
bostonDF = pd.DataFrame(boston.data , columns = boston.feature_names)

# the boston dataset's target array holds the house prices; add it to the DataFrame as the PRICE column
bostonDF['PRICE'] = boston.target
print('Boston dataset size:', bostonDF.shape)
bostonDF.head()
  • Influence of each column (feature)

Examine the correlation between each column and PRICE

RM shows the strongest positive linear relationship, and LSTAT the strongest negative one.
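A sketch of the visualization that typically backs this observation (the eight-feature selection and the 2x4 regplot grid are assumptions, not shown in the post):

# seaborn regplot: scatter of each feature against PRICE plus a fitted regression line
fig, axs = plt.subplots(figsize=(16, 8), ncols=4, nrows=2)
lm_features = ['RM', 'ZN', 'INDUS', 'NOX', 'AGE', 'PTRATIO', 'LSTAT', 'RAD']
for i, feature in enumerate(lm_features):
    row, col = divmod(i, 4)
    sns.regplot(x=feature, y='PRICE', data=bostonDF, ax=axs[row][col])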

  • LinearRegression

Create a regression model for Boston house prices using the LinearRegression class.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

y_target = bostonDF['PRICE']
X_data = bostonDF.drop(['PRICE'],axis=1,inplace=False)

X_train , X_test , y_train , y_test = train_test_split(X_data , y_target ,test_size=0.3, random_state=156)

# train/predict/evaluate with LinearRegression (OLS)
lr = LinearRegression()
lr.fit(X_train ,y_train )
y_preds = lr.predict(X_test)
mse = mean_squared_error(y_test, y_preds)
rmse = np.sqrt(mse)

print('MSE : {0:.3f} , RMSE : {1:.3F}'.format(mse , rmse))
print('Variance score : {0:.3f}'.format(r2_score(y_test, y_preds)))

The intercept is stored in the LinearRegression object's intercept_ attribute, and the regression coefficients in its coef_ attribute.
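A quick way to inspect them (a sketch; lr and X_data come from the code above):

print('intercept:', lr.intercept_)
# pair each coefficient with its feature name and sort by magnitude
coeff = pd.Series(data=np.round(lr.coef_, 1), index=X_data.columns)
print(coeff.sort_values(ascending=False))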

  • Regression evaluation

Measure MSE and RMSE with cross-validation

from sklearn.model_selection import cross_val_score

y_target = bostonDF['PRICE']
X_data = bostonDF.drop(['PRICE'],axis=1,inplace=False)
lr = LinearRegression()

# use cross_val_score() to compute the MSE over 5 folds, then derive the RMSE from it
neg_mse_scores = cross_val_score(lr, X_data, y_target, scoring="neg_mean_squared_error", cv = 5)
rmse_scores  = np.sqrt(-1 * neg_mse_scores)
avg_rmse = np.mean(rmse_scores)

# values returned by cross_val_score(scoring="neg_mean_squared_error") are all negative
print(' 5-fold individual Negative MSE scores: ', np.round(neg_mse_scores, 2))
print(' 5-fold individual RMSE scores : ', np.round(rmse_scores, 2))
print(' 5-fold mean RMSE : {0:.3f} '.format(avg_rmse))

The computed MSE is returned multiplied by -1 as neg_mean_squared_error, so multiply by -1 again to recover the positive value

05. Polynomial Regression and Understanding Overfitting/Underfitting

Understanding polynomial regression

Polynomial regression
Polynomial regression is a linear regression function: 'linear' refers to linearity in the regression coefficients, not in the features.

Since polynomial regression is still linear regression, it is implemented by applying a non-linear transformation of the features to a linear model

1. Transform the features into polynomial features with the PolynomialFeatures class

Transforms single-term features into polynomial features of the given degree

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# create the single-term features to expand: a 2x2 matrix [[0,1],[2,3]]
X = np.arange(4).reshape(2,2)
print('degree-1 single-term features:\n', X)

# use PolynomialFeatures to expand into a degree-2 polynomial
poly = PolynomialFeatures(degree=2)
poly.fit(X)
poly_ftr = poly.transform(X)
print('transformed degree-2 polynomial features:\n', poly_ftr)
[Output]
degree-1 single-term features:
 [[0 1]
 [2 3]]
transformed degree-2 polynomial features:
 [[1. 0. 1. 0. 0. 1.]
 [1. 2. 3. 4. 6. 9.]]

For the first-row input x1=0, x2=1, the degree-2 polynomial features [1, x_1, x_2, x_1^2, x_1*x_2, x_2^2] are returned

2. Train LinearRegression on the polynomial features and the polynomial target values

The polynomial features are the ones produced by PolynomialFeatures above;
the polynomial target values come from plugging into the original (true) function - actual values, not predictions

Scikit-learn implements polynomial regression by transforming the features with PolynomialFeatures and then fitting the LinearRegression class

Typically a Pipeline object is used to perform steps 1 & 2 in one go

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# connect the polynomial feature transform and linear regression in a streamlined Pipeline
model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                  ('linear', LinearRegression())])

Understanding underfitting and overfitting with polynomial regression

The higher the degree of the polynomial regression, the more the training is fitted to the training data alone, causing overfitting

  • Cosine function example

The feature X and target y follow a noisy cosine relationship. The example varies the degree of the polynomial regression and compares the resulting prediction curves and prediction accuracy

target y = cosine of X + a small noise term

# return the cosine of the randomly generated X values
def true_fun(X):
    return np.cos(1.5 * np.pi * X)

# X: 30 random values in [0, 1), sorted in increasing order
np.random.seed(0)
n_samples = 30
X = np.sort(np.random.rand(n_samples))

# y: the cosine-based true_fun() plus a small noise term
y = true_fun(X) + np.random.randn(n_samples) * 0.1
  1. Compare prediction results while changing the polynomial degree to 1, 4, and 15
  2. Compare per-degree prediction performance using MSE values from cross_val_score()
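A minimal sketch of these two steps, assuming one Pipeline per degree and 10-fold cross-validated MSE (the fold count and include_bias=False are assumptions):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

for degree in [1, 4, 15]:
    pipeline = Pipeline([('poly', PolynomialFeatures(degree=degree, include_bias=False)),
                         ('lr', LinearRegression())])
    # scikit-learn expects a 2-D feature array, hence the reshape
    scores = cross_val_score(pipeline, X.reshape(-1, 1), y,
                             scoring='neg_mean_squared_error', cv=10)
    print('Degree {0}: mean MSE = {1:.2f}'.format(degree, -scores.mean()))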


(solid line: the polynomial regression prediction curve / dotted line: the true cosine curve of the dataset X, y)

  • Degree 1 (MSE=0.41)
    An overly simple straight-line model. The prediction curve fails to capture the pattern of the training data: an underfit model
  • Degree 4 (MSE=0.04)
    Reflects the training set reasonably well; its prediction curve tracks the underlying cosine curve and predicts the test data well
  • Degree 15 (MSE=182581084.83)
    The prediction curve fits only the training set exactly, producing a prediction curve completely different from the true curve on the test values

Bias-Variance Trade-off

High bias - an overly simplified model with a tendency to skew strongly in one direction
High variance - an overly complex model with excessively high variability


Because bias and variance trade off against each other, building the model at the point where the total error cost is lowest is the most effective way to build a machine-learning prediction model

06. Regularized Linear Models - Ridge, Lasso, ElasticNet

Until now we have only considered minimizing RSS. Doing so fits the training data too closely, and the regression coefficients easily grow large.

A balance is needed between minimizing RSS and preventing the regression coefficients from growing, to avoid overfitting; the regularized cost function takes the form RSS(W) + alpha*||W||.

alpha is the tuning parameter that trades off fit to the training data against control of the coefficient sizes:
if alpha is 0, the expression is identical to the original, unregularized one;
as alpha goes to infinity, the penalty on W dominates the cost function and forces W to converge to 0.

Assigning a penalty through the alpha value in the cost function to shrink the regression coefficients and thereby mitigate overfitting is called regularization

Ridge Regression (L2 penalty)

alpha is the L2 penalty coefficient: the specific alpha value that controls the strength of the L2 penalty

In ridge regression, the larger the alpha value, the smaller the regression coefficients become
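A minimal usage sketch, assuming the Boston X_data/y_target from section 04 and an alpha of 10:

from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

ridge = Ridge(alpha=10)
# 5-fold cross-validated RMSE under the L2 penalty
neg_mse_scores = cross_val_score(ridge, X_data, y_target,
                                 scoring="neg_mean_squared_error", cv=5)
rmse_scores = np.sqrt(-1 * neg_mse_scores)
print('5-fold mean RMSE: {0:.3f}'.format(np.mean(rmse_scores)))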

  • Regression coefficient values as alpha changes

Each horizontal bar per feature represents a regression coefficient

# alpha candidates (not listed in the post; five values assumed to match the 5 plot columns)
alphas = [0, 0.1, 1, 10, 100]

# create a matplotlib figure with 5 columns to visualize the coefficients for each alpha
fig , axs = plt.subplots(figsize=(18,6) , nrows=1 , ncols=5)
# DataFrame to store the coefficient values for each alpha
coeff_df = pd.DataFrame()

# feed the alphas list in turn, visualizing and storing the coefficients; pos is the axis position
for pos , alpha in enumerate(alphas) :
    ridge = Ridge(alpha = alpha)
    ridge.fit(X_data , y_target)
    # convert the per-feature coefficients for this alpha to a Series and add it as a DataFrame column
    coeff = pd.Series(data=ridge.coef_ , index=X_data.columns )
    colname='alpha:'+str(alpha)
    coeff_df[colname] = coeff
    # bar plot of the coefficients at this alpha, sorted from largest coefficient down
    coeff = coeff.sort_values(ascending=False)
    axs[pos].set_title(colname)
    axs[pos].set_xlim(-3,6)
    sns.barplot(x=coeff.values , y=coeff.index, ax=axs[pos])

# outside the for loop, call matplotlib's show; coeff_df holds the per-alpha coefficients
plt.show()


This confirms that the regression coefficients shrink as alpha grows

(image source: Tistory)

Lasso Regression (L1 penalty)


The L1 penalty means the term alpha*||W||_1 (the sum of the absolute coefficient values). It has a feature-selection character, keeping only the appropriate features in the regression

Difference between the L1 and L2 penalties:
whereas the L2 penalty shrinks the size of the regression coefficients, the L1 penalty shrinks unnecessary regression coefficients sharply to 0 and removes them.
In lasso regression the optimum is more likely than in ridge to land on a corner of the constraint region, so the coefficients of insignificant variables are estimated as close to (or exactly) 0, which yields a feature-selection effect. Because lasso applies the same amount of regularization regardless of a parameter's size, it drives small-valued parameters to 0, deleting those variables from the model, which makes the model simpler and easier to interpret. (source: velog)

  • Print the RMSE and each feature's regression coefficient while varying the alpha value
from sklearn.linear_model import Lasso, ElasticNet

# print the model's cross-fold mean RMSE for each alpha and return the coefficients as a DataFrame
def get_linear_reg_eval(model_name, params=None, X_data_n=None, y_target_n=None, 
                        verbose=True, return_coeff=True):
    coeff_df = pd.DataFrame()
    if verbose : print('####### ', model_name , '#######')
    for param in params:
        if model_name =='Ridge': model = Ridge(alpha=param)
        elif model_name =='Lasso': model = Lasso(alpha=param)
        elif model_name =='ElasticNet': model = ElasticNet(alpha=param, l1_ratio=0.7)
        neg_mse_scores = cross_val_score(model, X_data_n, 
                                             y_target_n, scoring="neg_mean_squared_error", cv = 5)
        avg_rmse = np.mean(np.sqrt(-1 * neg_mse_scores))
        print('alpha {0}: 5-fold mean RMSE: {1:.3f} '.format(param, avg_rmse))
        # cross_val_score only returns the evaluation metric, so refit the model to extract the coefficients
        
        model.fit(X_data_n , y_target_n)
        if return_coeff:
            # convert the per-feature coefficients for this alpha to a Series and add as a DataFrame column
            coeff = pd.Series(data=model.coef_ , index=X_data_n.columns )
            colname='alpha:'+str(param)
            coeff_df[colname] = coeff
    
    return coeff_df
# end of get_linear_reg_eval
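A sketch of calling the helper for lasso (the alpha candidates are assumed, mirroring the ElasticNet call shown later):

lasso_alphas = [0.07, 0.1, 0.5, 1, 3]
coeff_lasso_df = get_linear_reg_eval('Lasso', params=lasso_alphas,
                                     X_data_n=X_data, y_target_n=y_target)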

(image source: Tistory)

Some features' regression coefficients are driven all the way to 0

์—˜๋ผ์Šคํ‹ฑ๋„ท ํšŒ๊ท€

L1 ๊ทœ์ œ์™€ L2 ๊ทœ์ œ๋ฅผ ๊ฒฐํ•ฉํ•œ ํšŒ๊ท€
RSS(W) + alpha2 ||W|| + alpha1 ||W|| ์„ ์ตœ์†Œํ™” ํ•˜๋Š” W ์ฐพ๊ธฐ
(๋ผ์˜์™€ ๋ฆฟ์ง€์˜ alpha ๊ฐ’์€ ๋‹ค๋ฆ„!)

๋ผ์˜ ํšŒ๊ท€์—์„œ ์ค‘์š” ํ”ผ์ฒ˜๋ฅผ ๊ณ ๋ฅด๊ณ  ๋‹ค๋ฅธ ํ”ผ์ฒ˜๋ฅผ ๋ชจ๋‘ ํšŒ๊ท€ ๊ณ„์ˆ˜๋ฅผ 0์œผ๋กœ ๋งŒ๋“œ๋Š” ์„ฑํ–ฅ์ด ๊ฐ•ํ•จ. ๊ธ‰๊ฒฉํžˆ ๋ณ€๋™ํ•  ์ˆ˜๋„ ์žˆ๋Š”๋ฐ, ์—˜๋ผ์Šคํ‹ฑ ํšŒ๊ท€๋Š” ์ด๋ฅผ ์™„ํ™”ํ•˜๊ธฐ ์œ„ํ•ด L2 ๊ทœ์ œ๋ฅผ ๋ผ์˜ ํšŒ๊ท€์— ์ถ”๊ฐ€ํ•œ ๊ฒƒ

๋‹จ์ ์€ ์ˆ˜ํ–‰์‹œ๊ฐ„์ด ์ƒ๋Œ€์ ์œผ๋กœ ์˜ค๋ž˜ ๊ฑธ๋ฆผ

  • The ElasticNet class
The elastic net penalty is a * L1 + b * L2, where

a = the alpha value of the L1 penalty
b = the alpha value of the L2 penalty
the ElasticNet alpha parameter = a + b
the l1_ratio parameter = a / (a + b)
  • Simple variation of the alpha value with l1_ratio fixed
Define the alpha values to use for the elastic net and call the get_linear_reg_eval() function
# l1_ratio is fixed at 0.7
elastic_alphas = [ 0.07, 0.1, 0.5, 1, 3]
coeff_elastic_df = get_linear_reg_eval('ElasticNet', params=elastic_alphas,
                                       X_data_n=X_data, y_target_n=y_target)

(image source: Tistory)

Data transformations for linear regression models

Data preprocessing matters for linear regression models

Linear regression models strongly prefer feature and target values whose distributions resemble a normal distribution. It is common to scale/normalize the data before applying a linear regression model

Preprocessing methods for linear regression models:
1. Two normalization methods
Use the StandardScaler class to transform toward a standard-normal dataset, or the MinMaxScaler class to perform min/max normalization
2. Apply a polynomial feature transform on top of the scaled/normalized dataset
3. Log transform
Applying the log function to the original values gives a distribution closer to a normal distribution

For the target values, a log transform is generally applied

  • Apply standard-normal scaling, min/max normalization, and log transform in turn, then measure the prediction performance of each case with RMSE
from sklearn.preprocessing import StandardScaler, MinMaxScaler, PolynomialFeatures

# method selects standard-normal scaling (Standard), min/max normalization (MinMax), or log transform (Log)
# p_degree applies a polynomial feature expansion; values above 2 are not used
def get_scaled_data(method='None', p_degree=None, input_data=None):
    if method == 'Standard':
        scaled_data = StandardScaler().fit_transform(input_data)
    elif method == 'MinMax':
        scaled_data = MinMaxScaler().fit_transform(input_data)
    elif method == 'Log':
        scaled_data = np.log1p(input_data)
    else:
        scaled_data = input_data

    if p_degree is not None:
        scaled_data = PolynomialFeatures(degree=p_degree, 
                                         include_bias=False).fit_transform(scaled_data)
    
    return scaled_data

In practice np.log1p(), i.e. log(1+x), is applied rather than a plain log(), since log1p stays defined at 0

scale_methods=[(None, None), ('Standard', None), ('Standard', 2), 
               ('MinMax', None), ('MinMax', 2), ('Log', None)]

[original data,
standard normal scaling,
standard normal scaling followed by a degree-2 polynomial transform,
min/max normalization,
min/max normalization followed by a degree-2 polynomial transform,
log transform]

Degree-2 polynomial transform - transform the features to degree 2 rather than degree 1 before training LinearRegression (everything up to now has used a degree-1 linear function)
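A sketch of the evaluation loop that produced the output below (the alpha candidates match the output; return_coeff=False because the scaled data is a plain ndarray with no column names):

alphas = [0.1, 1, 10, 100]
for scale_method in scale_methods:
    X_data_scaled = get_scaled_data(method=scale_method[0], p_degree=scale_method[1],
                                    input_data=X_data)
    print('\n## Transformation type:{0}, Polynomial Degree:{1}'.format(scale_method[0], scale_method[1]))
    get_linear_reg_eval('Ridge', params=alphas, X_data_n=X_data_scaled,
                        y_target_n=y_target, verbose=False, return_coeff=False)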

  • Results
[Output]

## Transformation type:None, Polynomial Degree:None
##### Ridge #####
alpha 0.1: 5-fold mean RMSE: 5.788
alpha 1: 5-fold mean RMSE: 5.653
alpha 10: 5-fold mean RMSE: 5.518
alpha 100: 5-fold mean RMSE: 5.330

## Transformation type:Standard, Polynomial Degree:None
##### Ridge #####
alpha 0.1: 5-fold mean RMSE: 5.788
alpha 1: 5-fold mean RMSE: 5.653
alpha 10: 5-fold mean RMSE: 5.518
alpha 100: 5-fold mean RMSE: 5.330

## Transformation type:Standard, Polynomial Degree:2
##### Ridge #####
alpha 0.1: 5-fold mean RMSE: 5.788
alpha 1: 5-fold mean RMSE: 5.653
alpha 10: 5-fold mean RMSE: 5.518
alpha 100: 5-fold mean RMSE: 5.330

## Transformation type:MinMax, Polynomial Degree:None
##### Ridge #####
alpha 0.1: 5-fold mean RMSE: 5.788
alpha 1: 5-fold mean RMSE: 5.653
alpha 10: 5-fold mean RMSE: 5.518
alpha 100: 5-fold mean RMSE: 5.330

## Transformation type:MinMax, Polynomial Degree:2
##### Ridge #####
alpha 0.1: 5-fold mean RMSE: 5.788
alpha 1: 5-fold mean RMSE: 5.653
alpha 10: 5-fold mean RMSE: 5.518
alpha 100: 5-fold mean RMSE: 5.330

## Transformation type:Log, Polynomial Degree:None
##### Ridge #####
alpha 0.1: 5-fold mean RMSE: 5.788
alpha 1: 5-fold mean RMSE: 5.653
alpha 10: 5-fold mean RMSE: 5.518
alpha 100: 5-fold mean RMSE: 5.330

In general, when the value distribution of a dataset used for linear regression is heavily skewed, applying a log transform can be expected to give better results

Why is regularization applied only to regression and not to classification?

07. Logistic Regression

Logistic regression is an algorithm that applies the linear regression approach to classification. Rather than learning an optimal line for a linear function, it learns the optimal sigmoid-function line, treats the sigmoid's return value as a probability, and decides the classification from that probability (it uses the regression output for classification)

  • The sigmoid function

Logistic regression is suited to classification problems, not regression problems

A linear regression line cannot separate 0 and 1 properly, but logistic regression, using a sigmoid function built on the linear regression form, classifies well (for binary classification a sigmoid curve fits better than a linear regression line)

  • The LogisticRegression class
  1. The solver parameter
    (the two used most often): lbfgs, liblinear.
    liblinear runs somewhat faster than the default solver lbfgs and can show better performance.
    Also: newton-cg, sag (applies gradient-descent-based optimization), saga (sag with L1 regularization support)
  2. Key hyperparameters
    penalty - sets the type of regularization (default='l2'; 'l2' = L2 penalty, 'l1' = L1 penalty)
    C - 1/alpha (the smaller C is, the stronger the regularization)
  • Detecting cancer with logistic regression on the breast cancer dataset
  1. Data preprocessing
    Apply standard scaling toward a normal-distribution shape & split into training and test sets
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# load the Wisconsin breast cancer dataset (this step is omitted in the post)
cancer = load_breast_cancer()

# StandardScaler() transforms the data distribution to mean 0, variance 1
scaler = StandardScaler()
data_scaled = scaler.fit_transform(cancer.data)

X_train , X_test, y_train , y_test = train_test_split(data_scaled, cancer.target, test_size=0.3, random_state=0)
  2. Train the logistic regression model and run predictions
    The sigmoid function automatically classifies into 2 classes

  3. Compute accuracy and the ROC-AUC value

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

# train and predict with logistic regression
# if no solver argument is passed to the constructor, solver='lbfgs' is used
lr_clf = LogisticRegression() # solver='lbfgs'
lr_clf.fit(X_train, y_train)
lr_preds = lr_clf.predict(X_test)
lr_preds_proba = lr_clf.predict_proba(X_test)[:, 1]

# measure accuracy and roc_auc
print('accuracy: {0:.3f}, roc_auc:{1:.3f}'.format(accuracy_score(y_test, lr_preds),
                                                 roc_auc_score(y_test , lr_preds_proba)))
[Output]
accuracy: 0.977, roc_auc:0.995
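The penalty and C hyperparameters described above can also be tuned; a sketch with GridSearchCV, restricted to the liblinear solver since it supports both l1 and l2 (the grid values are assumptions):

from sklearn.model_selection import GridSearchCV

params = {'solver': ['liblinear'], 'penalty': ['l2', 'l1'], 'C': [0.01, 0.1, 1, 5, 10]}
grid_clf = GridSearchCV(lr_clf, param_grid=params, scoring='accuracy', cv=3)
grid_clf.fit(data_scaled, cancer.target)
print('best params:', grid_clf.best_params_,
      'best mean accuracy: {0:.3f}'.format(grid_clf.best_score_))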

Logistic regression is light and fast yet has excellent binary-classification performance, so it is often used as the baseline model for binary classification

08. Regression Trees

(Everything so far...)
Linear regression predicts the outcome by finding a regression function that combines the regression coefficients linearly and feeding the independent variables into it. Non-linear regression combines the regression coefficients non-linearly to predict the outcome.

Unlike a classification tree, which decides a specific class label, a regression tree computes its regression prediction as the mean of the data values belonging to each leaf node

  1. Split according to the homogeneity of the values of X (the post cites the Gini coefficient; in practice regression trees split on variance/MSE reduction)
  2. Once the tree has finished splitting, compute the mean of the data values belonging to each leaf node and assign it as that leaf's decision value
  • Tree-based algorithms for classification

Decision trees, random forest, GBM, XGBoost, LightGBM and the other tree-based algorithms used for classification can perform regression as well as classification (the CART algorithm: Classification And Regression Trees)

  • Predicting Boston house prices with RandomForestRegressor

  • GBM, XGBoost, LightGBM, random forest
def get_model_cv_prediction(model, X_data, y_target):
    neg_mse_scores = cross_val_score(model, X_data, y_target, scoring="neg_mean_squared_error", cv = 5)
    rmse_scores  = np.sqrt(-1 * neg_mse_scores)
    avg_rmse = np.mean(rmse_scores)
    print('##### ',model.__class__.__name__ , ' #####')
    print(' 5-fold CV mean RMSE : {0:.3f} '.format(avg_rmse))

Takes a model and a dataset as input and computes the RMSE via cross-validation

from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

dt_reg = DecisionTreeRegressor(random_state=0, max_depth=4)
rf_reg = RandomForestRegressor(random_state=0, n_estimators=1000)
gb_reg = GradientBoostingRegressor(random_state=0, n_estimators=1000)
xgb_reg = XGBRegressor(n_estimators=1000)
lgb_reg = LGBMRegressor(n_estimators=1000)

# iterate over the tree-based regression models and evaluate each one
models = [dt_reg, rf_reg, gb_reg, xgb_reg, lgb_reg]
for model in models:  
    get_model_cv_prediction(model, X_data, y_target)
[Output]
##### DecisionTreeRegressor #####
5-fold CV mean RMSE : 5.978
##### RandomForestRegressor #####
5-fold CV mean RMSE : 4.423
##### GradientBoostingRegressor #####
5-fold CV mean RMSE : 4.269
##### XGBRegressor #####
5-fold CV mean RMSE : 4.251
##### LGBMRegressor #####
5-fold CV mean RMSE : 4.646
  • Comparing linear regression with a regression tree

Visualize how a regression-tree Regressor determines its predictions, compared with linear regression

import numpy as np
from sklearn.linear_model import LinearRegression

# create a linear regression and two decision-tree Regressors with max_depth 2 and 7
lr_reg = LinearRegression()
rf_reg2 = DecisionTreeRegressor(max_depth=2)
rf_reg7 = DecisionTreeRegressor(max_depth=7)

# test set for prediction: 100 points from 4.5 to 8.5
X_test = np.arange(4.5, 8.5, 0.04).reshape(-1, 1)

# from the Boston data, use only the RM feature plus the target PRICE for visualization
# (bostonDF_sample is not created in the post; assumed to be a small sample of bostonDF[['RM', 'PRICE']])
X_feature = bostonDF_sample['RM'].values.reshape(-1,1)
y_target = bostonDF_sample['PRICE'].values.reshape(-1,1)

# fit and predict
lr_reg.fit(X_feature, y_target)
rf_reg2.fit(X_feature, y_target)
rf_reg7.fit(X_feature, y_target)

pred_lr = lr_reg.predict(X_test)
pred_rf2 = rf_reg2.predict(X_test)
pred_rf7 = rf_reg7.predict(X_test)
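A sketch of the visualization (scatter of the sampled data plus each model's prediction line; the plot layout is an assumption):

import matplotlib.pyplot as plt

fig, axes = plt.subplots(figsize=(14, 4), ncols=3)
titles = ['Linear Regression', 'DecisionTree: max_depth=2', 'DecisionTree: max_depth=7']
for ax, pred, title in zip(axes, (pred_lr, pred_rf2, pred_rf7), titles):
    ax.set_title(title)
    # training points and the prediction line over the 4.5-8.5 RM range
    ax.scatter(bostonDF_sample.RM, bostonDF_sample.PRICE, c='darkorange')
    ax.plot(X_test, pred, linewidth=2)
plt.show()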

Check how the predictions change with the decision tree's max_depth hyperparameter


A regression tree generates its regression line in a step shape, creating branches at the data points where splits occur
