These are my notes from the 새싹 (SeSAC) AI application SW developer training program, from instructor 심선조's lecture.
LightGBM has drawn attention alongside XGBoost.
Boosting-family models require more hyperparameter tuning than other ensemble methods; they simply have more hyperparameters to set.
Parallel processing is supported.
- Textbook p.245
Its biggest advantage is that training takes far less time than XGBoost.
Memory usage is also relatively low.
One drawback: the threshold for a "small" dataset is fuzzy, but with roughly 10,000 rows or fewer, overfitting happens easily.
To make training fast, LightGBM does not use balanced (level-wise) tree growth; it splits leaf-wise instead, so a single branch can end up much deeper than in level-wise growth.
import lightgbm
lightgbm.__version__ # check the lightgbm version
'3.2.1'
Because trees are grown leaf-wise with pruning rather than as balanced trees, max_depth generally needs a larger value than in other boosting libraries.
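As a minimal illustration (the parameter values below are mine, not from the lecture), leaf-wise growth is controlled mainly by num_leaves, with max_depth acting only as a cap:
from lightgbm import LGBMClassifier
# leaf-wise growth: num_leaves is the main complexity knob,
# max_depth only limits how deep a single branch may go
lgbm_sketch = LGBMClassifier(num_leaves=31, # LightGBM default; larger = more complex trees
                             max_depth=-1, # -1 = no depth limit (the default)
                             n_estimators=100)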
- objective: defines the loss function to minimize; smaller loss values are better.
Tune hyperparameters to prevent overfitting.
Your lightgbm version may differ.
from lightgbm import LGBMClassifier
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
dataset = load_breast_cancer(as_frame=True)
X = dataset.data
y = dataset.target
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=156)
X_tr,X_val,y_tr,y_val = train_test_split(X_train,y_train,test_size=0.1,random_state=156) # split 9:1 into train/validation
lgbm = LGBMClassifier(n_estimators=400,learning_rate=0.05) # scikit-learn wrapper
evals = [(X_tr,y_tr),(X_val,y_val)]
lgbm.fit(X_tr,y_tr,early_stopping_rounds=50,eval_metric='logloss',eval_set=evals,verbose=True)
[1] training's binary_logloss: 0.625671 valid_1's binary_logloss: 0.628248
Training until validation scores don't improve for 50 rounds
[2] training's binary_logloss: 0.588173 valid_1's binary_logloss: 0.601106
[3] training's binary_logloss: 0.554518 valid_1's binary_logloss: 0.577587
[4] training's binary_logloss: 0.523972 valid_1's binary_logloss: 0.556324
[5] training's binary_logloss: 0.49615 valid_1's binary_logloss: 0.537407
[6] training's binary_logloss: 0.470108 valid_1's binary_logloss: 0.519401
[7] training's binary_logloss: 0.446647 valid_1's binary_logloss: 0.502637
[8] training's binary_logloss: 0.425055 valid_1's binary_logloss: 0.488311
[9] training's binary_logloss: 0.405125 valid_1's binary_logloss: 0.474664
[10] training's binary_logloss: 0.386526 valid_1's binary_logloss: 0.461267
[11] training's binary_logloss: 0.367027 valid_1's binary_logloss: 0.444274
[12] training's binary_logloss: 0.350713 valid_1's binary_logloss: 0.432755
[13] training's binary_logloss: 0.334601 valid_1's binary_logloss: 0.421371
[14] training's binary_logloss: 0.319854 valid_1's binary_logloss: 0.411418
[15] training's binary_logloss: 0.306374 valid_1's binary_logloss: 0.402989
[16] training's binary_logloss: 0.293116 valid_1's binary_logloss: 0.393973
[17] training's binary_logloss: 0.280812 valid_1's binary_logloss: 0.384801
[18] training's binary_logloss: 0.268352 valid_1's binary_logloss: 0.376191
[19] training's binary_logloss: 0.256942 valid_1's binary_logloss: 0.368378
[20] training's binary_logloss: 0.246443 valid_1's binary_logloss: 0.362062
[21] training's binary_logloss: 0.236874 valid_1's binary_logloss: 0.355162
[22] training's binary_logloss: 0.227501 valid_1's binary_logloss: 0.348933
[23] training's binary_logloss: 0.218988 valid_1's binary_logloss: 0.342819
[24] training's binary_logloss: 0.210621 valid_1's binary_logloss: 0.337386
[25] training's binary_logloss: 0.202076 valid_1's binary_logloss: 0.331523
[26] training's binary_logloss: 0.194199 valid_1's binary_logloss: 0.326349
[27] training's binary_logloss: 0.187107 valid_1's binary_logloss: 0.322785
[28] training's binary_logloss: 0.180535 valid_1's binary_logloss: 0.317877
[29] training's binary_logloss: 0.173834 valid_1's binary_logloss: 0.313928
[30] training's binary_logloss: 0.167198 valid_1's binary_logloss: 0.310105
[31] training's binary_logloss: 0.161229 valid_1's binary_logloss: 0.307107
[32] training's binary_logloss: 0.155494 valid_1's binary_logloss: 0.303837
[33] training's binary_logloss: 0.149125 valid_1's binary_logloss: 0.300315
[34] training's binary_logloss: 0.144045 valid_1's binary_logloss: 0.297816
[35] training's binary_logloss: 0.139341 valid_1's binary_logloss: 0.295387
[36] training's binary_logloss: 0.134625 valid_1's binary_logloss: 0.293063
[37] training's binary_logloss: 0.129167 valid_1's binary_logloss: 0.289127
[38] training's binary_logloss: 0.12472 valid_1's binary_logloss: 0.288697
[39] training's binary_logloss: 0.11974 valid_1's binary_logloss: 0.28576
[40] training's binary_logloss: 0.115054 valid_1's binary_logloss: 0.282853
[41] training's binary_logloss: 0.110662 valid_1's binary_logloss: 0.279441
[42] training's binary_logloss: 0.106358 valid_1's binary_logloss: 0.28113
[43] training's binary_logloss: 0.102324 valid_1's binary_logloss: 0.279139
[44] training's binary_logloss: 0.0985699 valid_1's binary_logloss: 0.276465
[45] training's binary_logloss: 0.094858 valid_1's binary_logloss: 0.275946
[46] training's binary_logloss: 0.0912486 valid_1's binary_logloss: 0.272819
[47] training's binary_logloss: 0.0883115 valid_1's binary_logloss: 0.272306
[48] training's binary_logloss: 0.0849963 valid_1's binary_logloss: 0.270452
[49] training's binary_logloss: 0.0821742 valid_1's binary_logloss: 0.268671
[50] training's binary_logloss: 0.0789991 valid_1's binary_logloss: 0.267587
[51] training's binary_logloss: 0.0761072 valid_1's binary_logloss: 0.26626
[52] training's binary_logloss: 0.0732567 valid_1's binary_logloss: 0.265542
[53] training's binary_logloss: 0.0706388 valid_1's binary_logloss: 0.264547
[54] training's binary_logloss: 0.0683911 valid_1's binary_logloss: 0.26502
[55] training's binary_logloss: 0.0659347 valid_1's binary_logloss: 0.264388
[56] training's binary_logloss: 0.0636873 valid_1's binary_logloss: 0.263128
[57] training's binary_logloss: 0.0613354 valid_1's binary_logloss: 0.26231
[58] training's binary_logloss: 0.0591944 valid_1's binary_logloss: 0.262011
[59] training's binary_logloss: 0.057033 valid_1's binary_logloss: 0.261454
[60] training's binary_logloss: 0.0550801 valid_1's binary_logloss: 0.260746
[61] training's binary_logloss: 0.0532381 valid_1's binary_logloss: 0.260236
[62] training's binary_logloss: 0.0514074 valid_1's binary_logloss: 0.261586
[63] training's binary_logloss: 0.0494837 valid_1's binary_logloss: 0.261797
[64] training's binary_logloss: 0.0477826 valid_1's binary_logloss: 0.262533
[65] training's binary_logloss: 0.0460364 valid_1's binary_logloss: 0.263305
[66] training's binary_logloss: 0.0444552 valid_1's binary_logloss: 0.264072
[67] training's binary_logloss: 0.0427638 valid_1's binary_logloss: 0.266223
[68] training's binary_logloss: 0.0412449 valid_1's binary_logloss: 0.266817
[69] training's binary_logloss: 0.0398589 valid_1's binary_logloss: 0.267819
[70] training's binary_logloss: 0.0383095 valid_1's binary_logloss: 0.267484
[71] training's binary_logloss: 0.0368803 valid_1's binary_logloss: 0.270233
[72] training's binary_logloss: 0.0355637 valid_1's binary_logloss: 0.268442
[73] training's binary_logloss: 0.0341747 valid_1's binary_logloss: 0.26895
[74] training's binary_logloss: 0.0328302 valid_1's binary_logloss: 0.266958
[75] training's binary_logloss: 0.0317853 valid_1's binary_logloss: 0.268091
[76] training's binary_logloss: 0.0305626 valid_1's binary_logloss: 0.266419
[77] training's binary_logloss: 0.0295001 valid_1's binary_logloss: 0.268588
[78] training's binary_logloss: 0.0284699 valid_1's binary_logloss: 0.270964
[79] training's binary_logloss: 0.0273953 valid_1's binary_logloss: 0.270293
[80] training's binary_logloss: 0.0264668 valid_1's binary_logloss: 0.270523
[81] training's binary_logloss: 0.0254636 valid_1's binary_logloss: 0.270683
[82] training's binary_logloss: 0.0245911 valid_1's binary_logloss: 0.273187
[83] training's binary_logloss: 0.0236486 valid_1's binary_logloss: 0.275994
[84] training's binary_logloss: 0.0228047 valid_1's binary_logloss: 0.274053
[85] training's binary_logloss: 0.0221693 valid_1's binary_logloss: 0.273211
[86] training's binary_logloss: 0.0213043 valid_1's binary_logloss: 0.272626
[87] training's binary_logloss: 0.0203934 valid_1's binary_logloss: 0.27534
[88] training's binary_logloss: 0.0195552 valid_1's binary_logloss: 0.276228
[89] training's binary_logloss: 0.0188623 valid_1's binary_logloss: 0.27525
[90] training's binary_logloss: 0.0183664 valid_1's binary_logloss: 0.276485
[91] training's binary_logloss: 0.0176788 valid_1's binary_logloss: 0.277052
[92] training's binary_logloss: 0.0170059 valid_1's binary_logloss: 0.277686
[93] training's binary_logloss: 0.0164317 valid_1's binary_logloss: 0.275332
[94] training's binary_logloss: 0.015878 valid_1's binary_logloss: 0.276236
[95] training's binary_logloss: 0.0152959 valid_1's binary_logloss: 0.274538
[96] training's binary_logloss: 0.0147216 valid_1's binary_logloss: 0.275244
[97] training's binary_logloss: 0.0141758 valid_1's binary_logloss: 0.275829
[98] training's binary_logloss: 0.0136551 valid_1's binary_logloss: 0.276654
[99] training's binary_logloss: 0.0131585 valid_1's binary_logloss: 0.277859
[100] training's binary_logloss: 0.0126961 valid_1's binary_logloss: 0.279265
[101] training's binary_logloss: 0.0122421 valid_1's binary_logloss: 0.276695
[102] training's binary_logloss: 0.0118067 valid_1's binary_logloss: 0.278488
[103] training's binary_logloss: 0.0113994 valid_1's binary_logloss: 0.278932
[104] training's binary_logloss: 0.0109799 valid_1's binary_logloss: 0.280997
[105] training's binary_logloss: 0.0105953 valid_1's binary_logloss: 0.281454
[106] training's binary_logloss: 0.0102381 valid_1's binary_logloss: 0.282058
[107] training's binary_logloss: 0.00986714 valid_1's binary_logloss: 0.279275
[108] training's binary_logloss: 0.00950998 valid_1's binary_logloss: 0.281427
[109] training's binary_logloss: 0.00915965 valid_1's binary_logloss: 0.280752
[110] training's binary_logloss: 0.00882581 valid_1's binary_logloss: 0.282152
[111] training's binary_logloss: 0.00850714 valid_1's binary_logloss: 0.280894
Early stopping, best iteration is:
[61] training's binary_logloss: 0.0532381 valid_1's binary_logloss: 0.260236
LGBMClassifier(learning_rate=0.05, n_estimators=400)
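The best round from early stopping can be read back off the fitted wrapper; a quick check, assuming lightgbm's scikit-learn attributes best_iteration_ and best_score_:
print(lgbm.best_iteration_) # 61, matching the log above
print(lgbm.best_score_['valid_1']['binary_logloss']) # 0.260236, the best validation logloss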
pred = lgbm.predict(X_test)
pred_proba = lgbm.predict_proba(X_test)[:,1] # all rows, second column: the probability of class 1
def get_clf_eval(y_test,pred,pred_proba_1): # pred = predicted labels, pred_proba_1 = predicted probabilities for class 1
    from sklearn.metrics import accuracy_score,precision_score,recall_score,confusion_matrix,f1_score,roc_auc_score
    confusion = confusion_matrix(y_test,pred)
    accuracy = accuracy_score(y_test,pred)
    precision = precision_score(y_test,pred)
    recall = recall_score(y_test,pred)
    f1 = f1_score(y_test,pred)
    auc = roc_auc_score(y_test,pred_proba_1)
    print('Confusion matrix')
    print(confusion)
    print(f'Accuracy:{accuracy:.4f}, Precision:{precision:.4f}, Recall:{recall:.4f}, F1:{f1:.4f}, AUC:{auc:.4f}')
get_clf_eval(y_test,pred,pred_proba)
Confusion matrix
[[34 3]
[ 2 75]]
Accuracy:0.9561, Precision:0.9615, Recall:0.9740, F1:0.9677, AUC:0.9877
from xgboost import XGBClassifier # import needed here (also imported again further down)
model = XGBClassifier(n_estimators=500,learning_rate=0.05,max_depth=3)
evals=[(X_tr,y_tr),(X_val,y_val)] # a list of tuples, used for validation
model.fit(X_tr, # the 90% portion of the train split
          y_tr,
          verbose=True,
          eval_set=evals,
          early_stopping_rounds=50, # early stopping
          eval_metric='logloss') # evaluation metric
pred = model.predict(X_test)
pred_proba = model.predict_proba(X_test)
get_clf_eval(y_test,pred,pred_proba[:,1])
Compared with the LightGBM result above:
Confusion matrix
[[34 3]
[ 2 75]]
Accuracy:0.9561, Precision:0.9615, Recall:0.9740, F1:0.9677, AUC:0.9933
lightgbm.plot_importance(lgbm)
<AxesSubplot:title={'center':'Feature importance'}, xlabel='Feature importance', ylabel='Features'>
lightgbm.plot_tree(lgbm)
<AxesSubplot:>
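plot_importance draws every feature by default; if only the top ones are wanted, it takes a max_num_features argument (a small sketch):
import matplotlib.pyplot as plt
lightgbm.plot_importance(lgbm, max_num_features=10) # only the 10 most important features
plt.show()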
Bayesian optimization: the surrogate function is nudged step by step toward the correct answer.
- Textbook p.255
Black dots: actually observed parameter values.
Orange line: the true function we are trying to find.
Blue line: the surrogate's prediction based on the observed data.
Blue shaded band: the confidence interval.
Get one suggested value, check the actually observed result, and update the blue surrogate prediction so that it tracks the orange line more closely.
hyperopt is not bundled by default, so it has to be installed.
Set up the hyperparameters to tune and their search space.
Objective function: the figure on the left.
The suggestion function proposes a value, the objective function scores it, and fmin moves toward the smaller values.
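Before applying this to XGBoost, a toy run shows the mechanics; the quadratic below is my own example, not from the textbook. fmin keeps proposing x values and moves toward whatever makes the objective smaller:
from hyperopt import fmin, tpe, hp
# minimize (x-2)^2 over -10 <= x <= 10; the true minimum is at x = 2
best_x = fmin(fn=lambda x: (x - 2) ** 2,
              space=hp.uniform('x', -10, 10),
              algo=tpe.suggest,
              max_evals=100)
print(best_x) # e.g. {'x': 2.0...}, close to 2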
import hyperopt
hyperopt.__version__
'0.2.7'
dataset = load_breast_cancer(as_frame=True)
X = dataset.data
y = dataset.target
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=156)
X_tr,X_val,y_tr,y_val = train_test_split(X_train,y_train,test_size=0.1,random_state=156) # split 9:1 into train/validation
from hyperopt import hp
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier
from hyperopt import STATUS_OK
import warnings
warnings.filterwarnings('ignore')
search_space = { # similar to a params dict
    'max_depth':hp.quniform('max_depth',5,20,1), # quantized uniform: values drawn uniformly, in steps of 1
    'min_child_weigh':hp.quniform('min_child_weight',1,2,1), # note: the dict key is misspelled ('weigh'); the objective function below must use the same key
    'learning_rete':hp.uniform('learning_rete',0.01,0.2), # note: 'learning_rete' is a typo for 'learning_rate', kept because the outputs below were produced with it
    'colsample_bytree':hp.uniform('colsample_bytree',0.5,1), # values are drawn at random from this range
}
Textbook p.257
hp.quniform: a search space for the variable named label, from the minimum low to the maximum high in steps of q # uniform, so values are drawn from a uniform (not a normal) distribution
hp.uniform: continuous values drawn uniformly between low and high
hp.randint: a random integer in [0, upper)
hp.loguniform: values whose logarithm is uniformly distributed
hp.choice: picks one option from a given list
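To see what these expressions actually generate, hyperopt can draw random samples from a space via hyperopt.pyll.stochastic.sample; applied to the search_space defined above:
from hyperopt.pyll.stochastic import sample
print(sample(search_space)) # one random draw from each hyperparameter's distribution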
Textbook p.264
cross_val_score runs cross-validation and returns an array of scores.
The orange dashed line is what we are trying to find; the blue solid line is our current estimate.
Getting from the blue solid line to the orange dashed line is the objective function's job.
def objective_func(search_space): # the objective function
    xgb_clf = XGBClassifier(n_estimators=100,
                            max_depth=int(search_space['max_depth']), # hp.quniform returns floats, so cast to int
                            min_child_weight=int(search_space['min_child_weigh']), # misspelled key, matching the search space above
                            learning_rate=search_space['learning_rete'], # misspelled label, matching the search space above
                            colsample_bytree=search_space['colsample_bytree'],
                            eval_metric='logloss')
    accuracy = cross_val_score(xgb_clf,X_train,y_train,scoring='accuracy',cv=3) # returns one accuracy score per fold
    return {'loss':-1 *np.mean(accuracy),'status':STATUS_OK} # accuracy: bigger is better, but fmin minimizes, so multiply by -1
# fmin moves toward smaller loss values, i.e. toward higher accuracy
from hyperopt import fmin,tpe,Trials
import numpy as np
trial_val = Trials()
best = fmin(fn=objective_func, # the objective function
            space=search_space,
            algo=tpe.suggest,
            max_evals=50,
            trials=trial_val,
            rstate=np.random.default_rng(seed=9)) # rstate = random state; without it, results change on every run
100%|███████████████████████████████████████████████| 50/50 [00:07<00:00, 6.84trial/s, best loss: -0.9670616939700244]
best # faster than GridSearchCV; the best parameters found
{'colsample_bytree': 0.5017622652385679,
'learning_rete': 0.17202786449634327,
'max_depth': 11.0,
'min_child_weight': 2.0}
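The Trials object records every evaluation, which is handy for checking what was actually tried; vals and results are standard attributes of hyperopt's Trials:
print(trial_val.vals) # dict of lists: every sampled value per hyperparameter
print(trial_val.results) # one {'loss': ..., 'status': 'ok'} dict per trial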
model = XGBClassifier(n_estimators=400, # run the model with the best parameters
                      learning_rete=round(best['learning_rete'],5), # best is a dict, so access by key; this misspelling is why XGBoost warns below that 'learning_rete' might not be used
                      max_depth=int(best['max_depth']),
                      min_child_weight=int(best['min_child_weight']),
                      colsample_bytree=round(best['colsample_bytree'],5)
                      )
evals=[(X_tr,y_tr),(X_val,y_val)]
model.fit(X_tr,
          y_tr,
          verbose=True,
          eval_set=evals,
          early_stopping_rounds=50,
          eval_metric='logloss')
pred = model.predict(X_test)
pred_proba = model.predict_proba(X_test)
get_clf_eval(y_test,pred,pred_proba[:,1])
[13:08:54] WARNING: ..\src\learner.cc:576:
Parameters: { "learning_rete" } might not be used.
This could be a false alarm, with some parameters getting used by language bindings but
then being mistakenly passed down to XGBoost core, or some parameter actually being used
but getting flagged wrongly here. Please open an issue if you find any such cases.
[0] validation_0-logloss:0.46780 validation_1-logloss:0.53951
[1] validation_0-logloss:0.33860 validation_1-logloss:0.45055
[2] validation_0-logloss:0.25480 validation_1-logloss:0.38982
[3] validation_0-logloss:0.19908 validation_1-logloss:0.36525
[4] validation_0-logloss:0.15836 validation_1-logloss:0.34947
[5] validation_0-logloss:0.12936 validation_1-logloss:0.33215
[6] validation_0-logloss:0.10800 validation_1-logloss:0.32261
[7] validation_0-logloss:0.09188 validation_1-logloss:0.31803
[8] validation_0-logloss:0.07969 validation_1-logloss:0.31458
[9] validation_0-logloss:0.06982 validation_1-logloss:0.29838
[10] validation_0-logloss:0.06112 validation_1-logloss:0.29127
[11] validation_0-logloss:0.05569 validation_1-logloss:0.29192
[12] validation_0-logloss:0.04953 validation_1-logloss:0.29192
[13] validation_0-logloss:0.04482 validation_1-logloss:0.28254
[14] validation_0-logloss:0.04086 validation_1-logloss:0.28237
[15] validation_0-logloss:0.03751 validation_1-logloss:0.28031
[16] validation_0-logloss:0.03485 validation_1-logloss:0.26671
[17] validation_0-logloss:0.03265 validation_1-logloss:0.26695
[18] validation_0-logloss:0.03086 validation_1-logloss:0.26435
[19] validation_0-logloss:0.02923 validation_1-logloss:0.26792
[20] validation_0-logloss:0.02784 validation_1-logloss:0.26543
[21] validation_0-logloss:0.02651 validation_1-logloss:0.26652
[22] validation_0-logloss:0.02606 validation_1-logloss:0.26353
[23] validation_0-logloss:0.02520 validation_1-logloss:0.26226
[24] validation_0-logloss:0.02427 validation_1-logloss:0.25847
[25] validation_0-logloss:0.02377 validation_1-logloss:0.26393
[26] validation_0-logloss:0.02296 validation_1-logloss:0.26746
[27] validation_0-logloss:0.02264 validation_1-logloss:0.26769
[28] validation_0-logloss:0.02236 validation_1-logloss:0.27243
[29] validation_0-logloss:0.02116 validation_1-logloss:0.26105
[30] validation_0-logloss:0.02091 validation_1-logloss:0.26321
[31] validation_0-logloss:0.02065 validation_1-logloss:0.25900
[32] validation_0-logloss:0.02042 validation_1-logloss:0.25218
[33] validation_0-logloss:0.02014 validation_1-logloss:0.25071
[34] validation_0-logloss:0.01987 validation_1-logloss:0.25543
[35] validation_0-logloss:0.01962 validation_1-logloss:0.25458
[36] validation_0-logloss:0.01940 validation_1-logloss:0.25232
[37] validation_0-logloss:0.01918 validation_1-logloss:0.25137
[38] validation_0-logloss:0.01892 validation_1-logloss:0.25232
[39] validation_0-logloss:0.01872 validation_1-logloss:0.25478
[40] validation_0-logloss:0.01852 validation_1-logloss:0.24843
[41] validation_0-logloss:0.01835 validation_1-logloss:0.25245
[42] validation_0-logloss:0.01818 validation_1-logloss:0.24804
[43] validation_0-logloss:0.01800 validation_1-logloss:0.24821
[44] validation_0-logloss:0.01782 validation_1-logloss:0.24605
[45] validation_0-logloss:0.01765 validation_1-logloss:0.24532
[46] validation_0-logloss:0.01749 validation_1-logloss:0.24621
[47] validation_0-logloss:0.01736 validation_1-logloss:0.24575
[48] validation_0-logloss:0.01722 validation_1-logloss:0.24590
[49] validation_0-logloss:0.01707 validation_1-logloss:0.24403
[50] validation_0-logloss:0.01695 validation_1-logloss:0.24420
[51] validation_0-logloss:0.01679 validation_1-logloss:0.24644
[52] validation_0-logloss:0.01668 validation_1-logloss:0.24758
[53] validation_0-logloss:0.01652 validation_1-logloss:0.24236
[54] validation_0-logloss:0.01640 validation_1-logloss:0.23969
[55] validation_0-logloss:0.01630 validation_1-logloss:0.23905
[56] validation_0-logloss:0.01619 validation_1-logloss:0.23847
[57] validation_0-logloss:0.01607 validation_1-logloss:0.23958
[58] validation_0-logloss:0.01594 validation_1-logloss:0.24174
[59] validation_0-logloss:0.01584 validation_1-logloss:0.24002
[60] validation_0-logloss:0.01573 validation_1-logloss:0.23589
[61] validation_0-logloss:0.01561 validation_1-logloss:0.23594
[62] validation_0-logloss:0.01552 validation_1-logloss:0.23950
[63] validation_0-logloss:0.01542 validation_1-logloss:0.23957
[64] validation_0-logloss:0.01532 validation_1-logloss:0.23573
[65] validation_0-logloss:0.01524 validation_1-logloss:0.23897
[66] validation_0-logloss:0.01515 validation_1-logloss:0.23894
[67] validation_0-logloss:0.01507 validation_1-logloss:0.23711
[68] validation_0-logloss:0.01496 validation_1-logloss:0.23724
[69] validation_0-logloss:0.01488 validation_1-logloss:0.23623
[70] validation_0-logloss:0.01482 validation_1-logloss:0.23321
[71] validation_0-logloss:0.01473 validation_1-logloss:0.23709
[72] validation_0-logloss:0.01465 validation_1-logloss:0.23816
[73] validation_0-logloss:0.01458 validation_1-logloss:0.23679
[74] validation_0-logloss:0.01452 validation_1-logloss:0.23688
[75] validation_0-logloss:0.01444 validation_1-logloss:0.23684
[76] validation_0-logloss:0.01437 validation_1-logloss:0.23980
[77] validation_0-logloss:0.01432 validation_1-logloss:0.23685
[78] validation_0-logloss:0.01424 validation_1-logloss:0.23752
[79] validation_0-logloss:0.01418 validation_1-logloss:0.23639
[80] validation_0-logloss:0.01412 validation_1-logloss:0.23636
[81] validation_0-logloss:0.01406 validation_1-logloss:0.23700
[82] validation_0-logloss:0.01401 validation_1-logloss:0.23555
[83] validation_0-logloss:0.01396 validation_1-logloss:0.23566
[84] validation_0-logloss:0.01391 validation_1-logloss:0.23430
[85] validation_0-logloss:0.01385 validation_1-logloss:0.23662
[86] validation_0-logloss:0.01379 validation_1-logloss:0.23934
[87] validation_0-logloss:0.01375 validation_1-logloss:0.23858
[88] validation_0-logloss:0.01370 validation_1-logloss:0.23759
[89] validation_0-logloss:0.01364 validation_1-logloss:0.23757
[90] validation_0-logloss:0.01358 validation_1-logloss:0.23869
[91] validation_0-logloss:0.01354 validation_1-logloss:0.23930
[92] validation_0-logloss:0.01349 validation_1-logloss:0.23792
[93] validation_0-logloss:0.01344 validation_1-logloss:0.23789
[94] validation_0-logloss:0.01339 validation_1-logloss:0.23693
[95] validation_0-logloss:0.01335 validation_1-logloss:0.23936
[96] validation_0-logloss:0.01331 validation_1-logloss:0.23997
[97] validation_0-logloss:0.01326 validation_1-logloss:0.23996
[98] validation_0-logloss:0.01322 validation_1-logloss:0.23865
[99] validation_0-logloss:0.01318 validation_1-logloss:0.23809
[100] validation_0-logloss:0.01314 validation_1-logloss:0.23908
[101] validation_0-logloss:0.01311 validation_1-logloss:0.23965
[102] validation_0-logloss:0.01307 validation_1-logloss:0.23735
[103] validation_0-logloss:0.01303 validation_1-logloss:0.23652
[104] validation_0-logloss:0.01300 validation_1-logloss:0.23871
[105] validation_0-logloss:0.01297 validation_1-logloss:0.23818
[106] validation_0-logloss:0.01294 validation_1-logloss:0.23810
[107] validation_0-logloss:0.01291 validation_1-logloss:0.23868
[108] validation_0-logloss:0.01288 validation_1-logloss:0.23957
[109] validation_0-logloss:0.01286 validation_1-logloss:0.23902
[110] validation_0-logloss:0.01283 validation_1-logloss:0.23806
[111] validation_0-logloss:0.01281 validation_1-logloss:0.23858
[112] validation_0-logloss:0.01279 validation_1-logloss:0.23779
[113] validation_0-logloss:0.01276 validation_1-logloss:0.23971
[114] validation_0-logloss:0.01274 validation_1-logloss:0.23891
[115] validation_0-logloss:0.01272 validation_1-logloss:0.23843
[116] validation_0-logloss:0.01270 validation_1-logloss:0.23919
[117] validation_0-logloss:0.01268 validation_1-logloss:0.23903
[118] validation_0-logloss:0.01266 validation_1-logloss:0.23950
[119] validation_0-logloss:0.01264 validation_1-logloss:0.23906
Confusion matrix
[[33 4]
[ 3 74]]
Accuracy:0.9386, Precision:0.9487, Recall:0.9610, F1:0.9548, AUC:0.9933
- Textbook p.266
Accuracy comes out at about 0.964.
Both classification and regression models are provided.
- Textbook p.267
https://www.kaggle.com/competitions/santander-customer-satisfaction
Submissions are evaluated on the area under the ROC curve between the predicted probabilities and the observed targets.
File descriptions
train.csv - the training set, including the target
test.csv - the test set, without the target
sample_submission.csv - a sample submission file in the correct format
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv('santander.csv',encoding='latin-1')
df.head(2)
ID | var3 | var15 | imp_ent_var16_ult1 | imp_op_var39_comer_ult1 | imp_op_var39_comer_ult3 | imp_op_var40_comer_ult1 | imp_op_var40_comer_ult3 | imp_op_var40_efect_ult1 | imp_op_var40_efect_ult3 | ... | saldo_medio_var33_hace2 | saldo_medio_var33_hace3 | saldo_medio_var33_ult1 | saldo_medio_var33_ult3 | saldo_medio_var44_hace2 | saldo_medio_var44_hace3 | saldo_medio_var44_ult1 | saldo_medio_var44_ult3 | var38 | TARGET | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 23 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 39205.17 | 0 |
1 | 3 | 2 | 34 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 49278.03 | 0 |
2 rows × 371 columns
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76020 entries, 0 to 76019
Columns: 371 entries, ID to TARGET
dtypes: float64(111), int64(260)
memory usage: 215.2 MB
df['TARGET'].value_counts() # value_counts(): counts for each unique value
0 73012
1 3008
Name: TARGET, dtype: int64
The distribution is imbalanced, so it is hard to judge this data on accuracy alone.
un_cnt = df[df['TARGET']==1].TARGET.count() # count of dissatisfied (TARGET==1) rows
total_cnt = df.TARGET.count()
un_cnt/total_cnt # about 4% dissatisfied
0.0395685345961589
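The same ratio falls straight out of value_counts with normalize=True:
df['TARGET'].value_counts(normalize=True) # 0: ~0.9604, 1: ~0.0396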
df.describe() # inspect the contents
ID | var3 | var15 | imp_ent_var16_ult1 | imp_op_var39_comer_ult1 | imp_op_var39_comer_ult3 | imp_op_var40_comer_ult1 | imp_op_var40_comer_ult3 | imp_op_var40_efect_ult1 | imp_op_var40_efect_ult3 | ... | saldo_medio_var33_hace2 | saldo_medio_var33_hace3 | saldo_medio_var33_ult1 | saldo_medio_var33_ult3 | saldo_medio_var44_hace2 | saldo_medio_var44_hace3 | saldo_medio_var44_ult1 | saldo_medio_var44_ult3 | var38 | TARGET | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | ... | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 7.602000e+04 | 76020.000000 |
mean | 75964.050723 | -1523.199277 | 33.212865 | 86.208265 | 72.363067 | 119.529632 | 3.559130 | 6.472698 | 0.412946 | 0.567352 | ... | 7.935824 | 1.365146 | 12.215580 | 8.784074 | 31.505324 | 1.858575 | 76.026165 | 56.614351 | 1.172358e+05 | 0.039569 |
std | 43781.947379 | 39033.462364 | 12.956486 | 1614.757313 | 339.315831 | 546.266294 | 93.155749 | 153.737066 | 30.604864 | 36.513513 | ... | 455.887218 | 113.959637 | 783.207399 | 538.439211 | 2013.125393 | 147.786584 | 4040.337842 | 2852.579397 | 1.826646e+05 | 0.194945 |
min | 1.000000 | -999999.000000 | 5.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5.163750e+03 | 0.000000 |
25% | 38104.750000 | 2.000000 | 23.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6.787061e+04 | 0.000000 |
50% | 76043.000000 | 2.000000 | 28.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.064092e+05 | 0.000000 |
75% | 113748.750000 | 2.000000 | 40.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.187563e+05 | 0.000000 |
max | 151838.000000 | 238.000000 | 105.000000 | 210000.000000 | 12888.030000 | 21024.810000 | 8237.820000 | 11073.570000 | 6600.000000 | 6600.000000 | ... | 50003.880000 | 20385.720000 | 138831.630000 | 91778.730000 | 438329.220000 | 24650.010000 | 681462.900000 | 397884.300000 | 2.203474e+07 | 1.000000 |
8 rows × 371 columns
df['var3'].replace(-999999,2,inplace=True) # replace the placeholder -999999 with 2 (the most common value)
df.drop(columns=['ID'],inplace=True) # drop the 'ID' column
df.describe()
var3 | var15 | imp_ent_var16_ult1 | imp_op_var39_comer_ult1 | imp_op_var39_comer_ult3 | imp_op_var40_comer_ult1 | imp_op_var40_comer_ult3 | imp_op_var40_efect_ult1 | imp_op_var40_efect_ult3 | imp_op_var40_ult1 | ... | saldo_medio_var33_hace2 | saldo_medio_var33_hace3 | saldo_medio_var33_ult1 | saldo_medio_var33_ult3 | saldo_medio_var44_hace2 | saldo_medio_var44_hace3 | saldo_medio_var44_ult1 | saldo_medio_var44_ult3 | var38 | TARGET | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | ... | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 76020.000000 | 7.602000e+04 | 76020.000000 |
mean | 2.716483 | 33.212865 | 86.208265 | 72.363067 | 119.529632 | 3.559130 | 6.472698 | 0.412946 | 0.567352 | 3.160715 | ... | 7.935824 | 1.365146 | 12.215580 | 8.784074 | 31.505324 | 1.858575 | 76.026165 | 56.614351 | 1.172358e+05 | 0.039569 |
std | 9.447971 | 12.956486 | 1614.757313 | 339.315831 | 546.266294 | 93.155749 | 153.737066 | 30.604864 | 36.513513 | 95.268204 | ... | 455.887218 | 113.959637 | 783.207399 | 538.439211 | 2013.125393 | 147.786584 | 4040.337842 | 2852.579397 | 1.826646e+05 | 0.194945 |
min | 0.000000 | 5.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5.163750e+03 | 0.000000 |
25% | 2.000000 | 23.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6.787061e+04 | 0.000000 |
50% | 2.000000 | 28.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.064092e+05 | 0.000000 |
75% | 2.000000 | 40.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.187563e+05 | 0.000000 |
max | 238.000000 | 105.000000 | 210000.000000 | 12888.030000 | 21024.810000 | 8237.820000 | 11073.570000 | 6600.000000 | 6600.000000 | 8237.820000 | ... | 50003.880000 | 20385.720000 | 138831.630000 | 91778.730000 | 438329.220000 | 24650.010000 | 681462.900000 | 397884.300000 | 2.203474e+07 | 1.000000 |
8 rows × 370 columns
X = df.iloc[:,:-1] # :-1 = everything from the start up to (not including) the last column
y = df.iloc[:,-1]
#split the data into train and test portions
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=0) # the data is imbalanced; passing stratify=y would preserve the class ratio
train_cnt = y_train.count()
test_cnt = y_test.count()
y_train.value_counts()/train_cnt
0 0.960964
1 0.039036
Name: TARGET, dtype: float64
y_test.value_counts()/train_cnt # value_counts works because y_test is a Series; note the divisor is train_cnt, so these are fractions of the training-set size
0 0.239575
1 0.010425
Name: TARGET, dtype: float64
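The stratify option mentioned in the comment above would force both splits to keep the same 96:4 class ratio; a sketch:
# stratified split: the class ratio is preserved in both train and test
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(X, y, test_size=0.2,
                                                  random_state=0, stratify=y)
y_te_s.value_counts(normalize=True) # ~0.96 / ~0.04, same as the full data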
X_tr,X_val,y_tr,y_val = train_test_split(X_train,y_train,test_size=0.3,random_state=0)
from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score
xgb_clf = XGBClassifier(n_estimators=500,learning_rate=0.05,random_state=156) # train
xgb_clf.fit(X_tr,y_tr,early_stopping_rounds=100,eval_metric='auc',eval_set=[(X_tr,y_tr),(X_val,y_val)]) # AUC: the closer to 1, the better
[0] validation_0-auc:0.82179 validation_1-auc:0.80068
[1] validation_0-auc:0.83092 validation_1-auc:0.80941
[2] validation_0-auc:0.83207 validation_1-auc:0.80903
[3] validation_0-auc:0.83288 validation_1-auc:0.80889
[4] validation_0-auc:0.83414 validation_1-auc:0.80924
[5] validation_0-auc:0.83524 validation_1-auc:0.80907
[6] validation_0-auc:0.83568 validation_1-auc:0.81005
[7] validation_0-auc:0.83741 validation_1-auc:0.81088
[8] validation_0-auc:0.83896 validation_1-auc:0.81305
[9] validation_0-auc:0.83949 validation_1-auc:0.81363
[10] validation_0-auc:0.83908 validation_1-auc:0.81277
[11] validation_0-auc:0.83913 validation_1-auc:0.81260
[12] validation_0-auc:0.84009 validation_1-auc:0.81325
[13] validation_0-auc:0.84081 validation_1-auc:0.81329
[14] validation_0-auc:0.84196 validation_1-auc:0.81380
[15] validation_0-auc:0.84394 validation_1-auc:0.81540
[16] validation_0-auc:0.84414 validation_1-auc:0.81573
[17] validation_0-auc:0.84437 validation_1-auc:0.81577
[18] validation_0-auc:0.84468 validation_1-auc:0.81569
[19] validation_0-auc:0.84586 validation_1-auc:0.81625
[20] validation_0-auc:0.84641 validation_1-auc:0.81619
[21] validation_0-auc:0.84685 validation_1-auc:0.81611
[22] validation_0-auc:0.84735 validation_1-auc:0.81671
[23] validation_0-auc:0.84793 validation_1-auc:0.81682
[24] validation_0-auc:0.84825 validation_1-auc:0.81675
[25] validation_0-auc:0.84893 validation_1-auc:0.81647
[26] validation_0-auc:0.85104 validation_1-auc:0.81724
[27] validation_0-auc:0.85206 validation_1-auc:0.81764
[28] validation_0-auc:0.85327 validation_1-auc:0.81873
[29] validation_0-auc:0.85425 validation_1-auc:0.82038
[30] validation_0-auc:0.85624 validation_1-auc:0.82231
[31] validation_0-auc:0.85716 validation_1-auc:0.82223
[32] validation_0-auc:0.85785 validation_1-auc:0.82261
[33] validation_0-auc:0.85878 validation_1-auc:0.82289
[34] validation_0-auc:0.85931 validation_1-auc:0.82389
[35] validation_0-auc:0.86006 validation_1-auc:0.82446
[36] validation_0-auc:0.86079 validation_1-auc:0.82537
[37] validation_0-auc:0.86101 validation_1-auc:0.82546
[38] validation_0-auc:0.86156 validation_1-auc:0.82593
[39] validation_0-auc:0.86224 validation_1-auc:0.82610
[40] validation_0-auc:0.86284 validation_1-auc:0.82603
[41] validation_0-auc:0.86314 validation_1-auc:0.82624
[42] validation_0-auc:0.86388 validation_1-auc:0.82694
[43] validation_0-auc:0.86493 validation_1-auc:0.82741
[44] validation_0-auc:0.86557 validation_1-auc:0.82757
[45] validation_0-auc:0.86643 validation_1-auc:0.82795
[46] validation_0-auc:0.86733 validation_1-auc:0.82860
[47] validation_0-auc:0.86788 validation_1-auc:0.82878
[48] validation_0-auc:0.86815 validation_1-auc:0.82881
[49] validation_0-auc:0.86902 validation_1-auc:0.83000
[50] validation_0-auc:0.86956 validation_1-auc:0.83040
[51] validation_0-auc:0.86992 validation_1-auc:0.83036
[52] validation_0-auc:0.87037 validation_1-auc:0.83061
[53] validation_0-auc:0.87088 validation_1-auc:0.83071
[54] validation_0-auc:0.87157 validation_1-auc:0.83092
[55] validation_0-auc:0.87206 validation_1-auc:0.83143
[56] validation_0-auc:0.87277 validation_1-auc:0.83170
[57] validation_0-auc:0.87329 validation_1-auc:0.83171
[58] validation_0-auc:0.87369 validation_1-auc:0.83168
[59] validation_0-auc:0.87428 validation_1-auc:0.83172
[60] validation_0-auc:0.87489 validation_1-auc:0.83166
[61] validation_0-auc:0.87565 validation_1-auc:0.83160
[62] validation_0-auc:0.87618 validation_1-auc:0.83164
[63] validation_0-auc:0.87685 validation_1-auc:0.83174
[64] validation_0-auc:0.87749 validation_1-auc:0.83209
[65] validation_0-auc:0.87810 validation_1-auc:0.83233
[66] validation_0-auc:0.87867 validation_1-auc:0.83246
[67] validation_0-auc:0.87932 validation_1-auc:0.83256
[68] validation_0-auc:0.87982 validation_1-auc:0.83264
[69] validation_0-auc:0.88036 validation_1-auc:0.83250
[70] validation_0-auc:0.88087 validation_1-auc:0.83226
[71] validation_0-auc:0.88182 validation_1-auc:0.83208
[72] validation_0-auc:0.88232 validation_1-auc:0.83234
[73] validation_0-auc:0.88293 validation_1-auc:0.83247
[74] validation_0-auc:0.88342 validation_1-auc:0.83244
[75] validation_0-auc:0.88401 validation_1-auc:0.83246
[76] validation_0-auc:0.88451 validation_1-auc:0.83238
[77] validation_0-auc:0.88487 validation_1-auc:0.83224
[78] validation_0-auc:0.88518 validation_1-auc:0.83234
[79] validation_0-auc:0.88561 validation_1-auc:0.83233
[80] validation_0-auc:0.88637 validation_1-auc:0.83253
[81] validation_0-auc:0.88665 validation_1-auc:0.83255
[82] validation_0-auc:0.88703 validation_1-auc:0.83245
[83] validation_0-auc:0.88756 validation_1-auc:0.83261
[84] validation_0-auc:0.88791 validation_1-auc:0.83249
[85] validation_0-auc:0.88852 validation_1-auc:0.83263
[86] validation_0-auc:0.88895 validation_1-auc:0.83251
[87] validation_0-auc:0.88933 validation_1-auc:0.83237
[88] validation_0-auc:0.88970 validation_1-auc:0.83233
[89] validation_0-auc:0.89021 validation_1-auc:0.83231
[90] validation_0-auc:0.89065 validation_1-auc:0.83222
[91] validation_0-auc:0.89105 validation_1-auc:0.83236
[92] validation_0-auc:0.89142 validation_1-auc:0.83218
[93] validation_0-auc:0.89176 validation_1-auc:0.83239
[94] validation_0-auc:0.89213 validation_1-auc:0.83220
[95] validation_0-auc:0.89241 validation_1-auc:0.83227
[96] validation_0-auc:0.89278 validation_1-auc:0.83213
[97] validation_0-auc:0.89302 validation_1-auc:0.83223
[98] validation_0-auc:0.89329 validation_1-auc:0.83209
[99] validation_0-auc:0.89361 validation_1-auc:0.83227
[100] validation_0-auc:0.89380 validation_1-auc:0.83236
[101] validation_0-auc:0.89410 validation_1-auc:0.83232
[102] validation_0-auc:0.89438 validation_1-auc:0.83227
[103] validation_0-auc:0.89474 validation_1-auc:0.83220
[104] validation_0-auc:0.89509 validation_1-auc:0.83221
[105] validation_0-auc:0.89550 validation_1-auc:0.83226
[106] validation_0-auc:0.89586 validation_1-auc:0.83224
[107] validation_0-auc:0.89604 validation_1-auc:0.83231
[108] validation_0-auc:0.89611 validation_1-auc:0.83229
[109] validation_0-auc:0.89634 validation_1-auc:0.83230
[110] validation_0-auc:0.89666 validation_1-auc:0.83242
[111] validation_0-auc:0.89677 validation_1-auc:0.83238
[112] validation_0-auc:0.89695 validation_1-auc:0.83241
[113] validation_0-auc:0.89720 validation_1-auc:0.83241
[114] validation_0-auc:0.89728 validation_1-auc:0.83247
[115] validation_0-auc:0.89739 validation_1-auc:0.83249
[116] validation_0-auc:0.89764 validation_1-auc:0.83240
[117] validation_0-auc:0.89780 validation_1-auc:0.83240
[118] validation_0-auc:0.89793 validation_1-auc:0.83257
[119] validation_0-auc:0.89851 validation_1-auc:0.83260
[120] validation_0-auc:0.89886 validation_1-auc:0.83279
[121] validation_0-auc:0.89929 validation_1-auc:0.83272
[122] validation_0-auc:0.89957 validation_1-auc:0.83273
[123] validation_0-auc:0.90005 validation_1-auc:0.83269
[124] validation_0-auc:0.90036 validation_1-auc:0.83284
[125] validation_0-auc:0.90077 validation_1-auc:0.83297
[126] validation_0-auc:0.90086 validation_1-auc:0.83300
[127] validation_0-auc:0.90114 validation_1-auc:0.83315
[128] validation_0-auc:0.90151 validation_1-auc:0.83316
[129] validation_0-auc:0.90181 validation_1-auc:0.83337
[130] validation_0-auc:0.90211 validation_1-auc:0.83340
[131] validation_0-auc:0.90240 validation_1-auc:0.83340
[132] validation_0-auc:0.90266 validation_1-auc:0.83353
[133] validation_0-auc:0.90277 validation_1-auc:0.83347
[134] validation_0-auc:0.90279 validation_1-auc:0.83353
[135] validation_0-auc:0.90292 validation_1-auc:0.83353
[136] validation_0-auc:0.90302 validation_1-auc:0.83344
[137] validation_0-auc:0.90309 validation_1-auc:0.83348
[138] validation_0-auc:0.90312 validation_1-auc:0.83344
[139] validation_0-auc:0.90325 validation_1-auc:0.83340
[140] validation_0-auc:0.90338 validation_1-auc:0.83335
[141] validation_0-auc:0.90339 validation_1-auc:0.83339
[142] validation_0-auc:0.90363 validation_1-auc:0.83351
[143] validation_0-auc:0.90383 validation_1-auc:0.83358
[144] validation_0-auc:0.90395 validation_1-auc:0.83357
[145] validation_0-auc:0.90399 validation_1-auc:0.83361
[146] validation_0-auc:0.90417 validation_1-auc:0.83354
[147] validation_0-auc:0.90430 validation_1-auc:0.83349
[148] validation_0-auc:0.90434 validation_1-auc:0.83346
[149] validation_0-auc:0.90451 validation_1-auc:0.83346
[150] validation_0-auc:0.90459 validation_1-auc:0.83343
[151] validation_0-auc:0.90462 validation_1-auc:0.83344
[152] validation_0-auc:0.90476 validation_1-auc:0.83342
[153] validation_0-auc:0.90494 validation_1-auc:0.83339
[154] validation_0-auc:0.90507 validation_1-auc:0.83336
[155] validation_0-auc:0.90512 validation_1-auc:0.83334
[156] validation_0-auc:0.90518 validation_1-auc:0.83331
[157] validation_0-auc:0.90524 validation_1-auc:0.83339
[158] validation_0-auc:0.90543 validation_1-auc:0.83330
[159] validation_0-auc:0.90553 validation_1-auc:0.83331
[160] validation_0-auc:0.90567 validation_1-auc:0.83342
[161] validation_0-auc:0.90586 validation_1-auc:0.83339
[162] validation_0-auc:0.90592 validation_1-auc:0.83340
[163] validation_0-auc:0.90594 validation_1-auc:0.83340
[164] validation_0-auc:0.90622 validation_1-auc:0.83337
[165] validation_0-auc:0.90634 validation_1-auc:0.83333
[166] validation_0-auc:0.90645 validation_1-auc:0.83329
[167] validation_0-auc:0.90654 validation_1-auc:0.83329
[168] validation_0-auc:0.90659 validation_1-auc:0.83336
[169] validation_0-auc:0.90670 validation_1-auc:0.83339
[170] validation_0-auc:0.90675 validation_1-auc:0.83341
[171] validation_0-auc:0.90679 validation_1-auc:0.83334
[172] validation_0-auc:0.90701 validation_1-auc:0.83321
[173] validation_0-auc:0.90702 validation_1-auc:0.83321
[174] validation_0-auc:0.90706 validation_1-auc:0.83319
[175] validation_0-auc:0.90720 validation_1-auc:0.83323
[176] validation_0-auc:0.90730 validation_1-auc:0.83325
[177] validation_0-auc:0.90741 validation_1-auc:0.83323
[178] validation_0-auc:0.90753 validation_1-auc:0.83318
[179] validation_0-auc:0.90761 validation_1-auc:0.83317
[180] validation_0-auc:0.90768 validation_1-auc:0.83315
[181] validation_0-auc:0.90773 validation_1-auc:0.83313
[182] validation_0-auc:0.90785 validation_1-auc:0.83312
[183] validation_0-auc:0.90804 validation_1-auc:0.83303
[184] validation_0-auc:0.90816 validation_1-auc:0.83305
[185] validation_0-auc:0.90821 validation_1-auc:0.83307
[186] validation_0-auc:0.90823 validation_1-auc:0.83307
[187] validation_0-auc:0.90836 validation_1-auc:0.83308
[188] validation_0-auc:0.90841 validation_1-auc:0.83309
[189] validation_0-auc:0.90882 validation_1-auc:0.83304
[190] validation_0-auc:0.90885 validation_1-auc:0.83306
[191] validation_0-auc:0.90897 validation_1-auc:0.83301
[192] validation_0-auc:0.90909 validation_1-auc:0.83302
[193] validation_0-auc:0.90914 validation_1-auc:0.83303
[194] validation_0-auc:0.90927 validation_1-auc:0.83300
[195] validation_0-auc:0.90946 validation_1-auc:0.83298
[196] validation_0-auc:0.90959 validation_1-auc:0.83291
[197] validation_0-auc:0.90970 validation_1-auc:0.83293
[198] validation_0-auc:0.90972 validation_1-auc:0.83293
[199] validation_0-auc:0.90986 validation_1-auc:0.83293
[200] validation_0-auc:0.90992 validation_1-auc:0.83293
[201] validation_0-auc:0.90997 validation_1-auc:0.83291
[202] validation_0-auc:0.91010 validation_1-auc:0.83290
[203] validation_0-auc:0.91016 validation_1-auc:0.83287
[204] validation_0-auc:0.91025 validation_1-auc:0.83289
[205] validation_0-auc:0.91045 validation_1-auc:0.83282
[206] validation_0-auc:0.91056 validation_1-auc:0.83280
[207] validation_0-auc:0.91060 validation_1-auc:0.83287
[208] validation_0-auc:0.91063 validation_1-auc:0.83291
[209] validation_0-auc:0.91068 validation_1-auc:0.83292
[210] validation_0-auc:0.91069 validation_1-auc:0.83290
[211] validation_0-auc:0.91077 validation_1-auc:0.83286
[212] validation_0-auc:0.91084 validation_1-auc:0.83286
[213] validation_0-auc:0.91099 validation_1-auc:0.83293
[214] validation_0-auc:0.91133 validation_1-auc:0.83279
[215] validation_0-auc:0.91137 validation_1-auc:0.83276
[216] validation_0-auc:0.91143 validation_1-auc:0.83274
[217] validation_0-auc:0.91150 validation_1-auc:0.83274
[218] validation_0-auc:0.91158 validation_1-auc:0.83268
[219] validation_0-auc:0.91163 validation_1-auc:0.83267
[220] validation_0-auc:0.91165 validation_1-auc:0.83267
[221] validation_0-auc:0.91175 validation_1-auc:0.83269
[222] validation_0-auc:0.91192 validation_1-auc:0.83259
[223] validation_0-auc:0.91194 validation_1-auc:0.83260
[224] validation_0-auc:0.91199 validation_1-auc:0.83258
[225] validation_0-auc:0.91206 validation_1-auc:0.83262
[226] validation_0-auc:0.91210 validation_1-auc:0.83262
[227] validation_0-auc:0.91215 validation_1-auc:0.83263
[228] validation_0-auc:0.91231 validation_1-auc:0.83247
[229] validation_0-auc:0.91255 validation_1-auc:0.83239
[230] validation_0-auc:0.91281 validation_1-auc:0.83225
[231] validation_0-auc:0.91286 validation_1-auc:0.83222
[232] validation_0-auc:0.91294 validation_1-auc:0.83224
[233] validation_0-auc:0.91299 validation_1-auc:0.83227
[234] validation_0-auc:0.91317 validation_1-auc:0.83221
[235] validation_0-auc:0.91323 validation_1-auc:0.83221
[236] validation_0-auc:0.91349 validation_1-auc:0.83213
[237] validation_0-auc:0.91351 validation_1-auc:0.83208
[238] validation_0-auc:0.91362 validation_1-auc:0.83204
[239] validation_0-auc:0.91365 validation_1-auc:0.83201
[240] validation_0-auc:0.91370 validation_1-auc:0.83198
[241] validation_0-auc:0.91380 validation_1-auc:0.83197
[242] validation_0-auc:0.91385 validation_1-auc:0.83197
[243] validation_0-auc:0.91387 validation_1-auc:0.83197
[244] validation_0-auc:0.91395 validation_1-auc:0.83204
[245] validation_0-auc:0.91402 validation_1-auc:0.83196
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.05, max_delta_step=0,
max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=500, n_jobs=8,
num_parallel_tree=1, predictor='auto', random_state=156,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', validate_parameters=1, verbosity=None)
xgb_roc_score = roc_auc_score(y_test,xgb_clf.predict_proba(X_test)[:,1])
xgb_roc_score
0.842853493090032
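For the actual Kaggle competition, a submission would pair the test-set IDs with these predicted probabilities; a hedged sketch (the test.csv handling is my assumption, this step was not run in class):
test_df = pd.read_csv('test.csv') # hypothetical: the Kaggle test set
test_df['var3'].replace(-999999,2,inplace=True) # same preprocessing as the training data
sub = pd.DataFrame({'ID': test_df['ID'],
                    'TARGET': xgb_clf.predict_proba(test_df.drop(columns=['ID']))[:, 1]})
sub.to_csv('submission.csv', index=False) # same format as sample_submission.csv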
#check whether the score can be improved through hyperparameter tuning
from hyperopt import hp
# max_depth from 5 to 15 in steps of 1, min_child_weight from 1 to 6 in steps of 1
# colsample_bytree between 0.5 and 0.95, learning_rate between 0.01 and 0.2, drawn from uniform distributions.
xgb_search_space = {'max_depth': hp.quniform('max_depth', 5, 15, 1), # max_depth must end up an integer
                    'min_child_weight': hp.quniform('min_child_weight', 1, 6, 1),
                    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 0.95),
                    'learning_rate': hp.uniform('learning_rate', 0.01, 0.2)
                    }
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score
# Set up the objective function.
# fmin() will feed in search_space values; cross-validate XGBClassifier and return -1 * the mean roc_auc.
def objective_func(search_space):
    xgb_clf = XGBClassifier(n_estimators=100, max_depth=int(search_space['max_depth']),
                            min_child_weight=int(search_space['min_child_weight']),
                            colsample_bytree=search_space['colsample_bytree'],
                            learning_rate=search_space['learning_rate']
                            )
    # list holding the roc_auc scores from the 3-fold evaluation
    roc_auc_list= []
    # 3-fold cross validation
    kf = KFold(n_splits=3)
    # split X_train again into train and validation folds
    for tr_index, val_index in kf.split(X_train):
        # build train/validation sets from the index arrays returned by kf.split(X_train)
        X_tr, y_tr = X_train.iloc[tr_index], y_train.iloc[tr_index] # train
        X_val, y_val = X_train.iloc[val_index], y_train.iloc[val_index] # validation
        # train XGBClassifier on the fold, with early stopping set to 30 rounds
        xgb_clf.fit(X_tr, y_tr, early_stopping_rounds=30, eval_metric='auc',
                    eval_set=[(X_tr, y_tr), (X_val, y_val)])
        # take the predicted probability of class 1, compute roc auc, and collect it for averaging
        score = roc_auc_score(y_val, xgb_clf.predict_proba(X_val)[:, 1])
        roc_auc_list.append(score)
    # return the mean of the 3 roc_auc values,
    # multiplied by -1 because HyperOpt searches for the input that minimizes the objective.
    return -1 * np.mean(roc_auc_list)
from hyperopt import fmin, tpe, Trials
trials = Trials()
# Call fmin(): iterate max_evals times, then extract the input values that minimize the objective function.
best = fmin(fn=objective_func,
            space=xgb_search_space,
            algo=tpe.suggest,
            max_evals=50, # maximum number of evaluations
            trials=trials, rstate=np.random.default_rng(seed=30))
print('best:', best)
[0] validation_0-auc:0.81678 validation_1-auc:0.79160
[1] validation_0-auc:0.82454 validation_1-auc:0.79688
[2] validation_0-auc:0.83323 validation_1-auc:0.80572
[3] validation_0-auc:0.83854 validation_1-auc:0.81095
[4] validation_0-auc:0.83847 validation_1-auc:0.80989
[5] validation_0-auc:0.83879 validation_1-auc:0.80978
[6] validation_0-auc:0.84053 validation_1-auc:0.81042
[7] validation_0-auc:0.84129 validation_1-auc:0.81116
[8] validation_0-auc:0.84224 validation_1-auc:0.81135
[9] validation_0-auc:0.84515 validation_1-auc:0.81587
[10] validation_0-auc:0.84736 validation_1-auc:0.81683
[11] validation_0-auc:0.84784 validation_1-auc:0.81750
[12] validation_0-auc:0.84909 validation_1-auc:0.81925
[13] validation_0-auc:0.84984 validation_1-auc:0.82003
[14] validation_0-auc:0.85307 validation_1-auc:0.82285
[15] validation_0-auc:0.85493 validation_1-auc:0.82363
[16] validation_0-auc:0.85640 validation_1-auc:0.82444
[17] validation_0-auc:0.85791 validation_1-auc:0.82505
[18] validation_0-auc:0.85803 validation_1-auc:0.82578
[19] validation_0-auc:0.85864 validation_1-auc:0.82518
[20] validation_0-auc:0.85944 validation_1-auc:0.82483
[21] validation_0-auc:0.86087 validation_1-auc:0.82507
[22] validation_0-auc:0.86286 validation_1-auc:0.82542
[23] validation_0-auc:0.86395 validation_1-auc:0.82617
[24] validation_0-auc:0.86454 validation_1-auc:0.82586
[25] validation_0-auc:0.86530 validation_1-auc:0.82628
[26] validation_0-auc:0.86599 validation_1-auc:0.82721
[27] validation_0-auc:0.86691 validation_1-auc:0.82734
[28] validation_0-auc:0.86755 validation_1-auc:0.82787
[29] validation_0-auc:0.86836 validation_1-auc:0.82835
[30] validation_0-auc:0.86887 validation_1-auc:0.82912
[31] validation_0-auc:0.86988 validation_1-auc:0.82956
[32] validation_0-auc:0.87047 validation_1-auc:0.82990
[33] validation_0-auc:0.87144 validation_1-auc:0.83007
[34] validation_0-auc:0.87253 validation_1-auc:0.82995
[35] validation_0-auc:0.87350 validation_1-auc:0.83031
[36] validation_0-auc:0.87375 validation_1-auc:0.83091
[37] validation_0-auc:0.87443 validation_1-auc:0.83088
[38] validation_0-auc:0.87521 validation_1-auc:0.83091
[39] validation_0-auc:0.87620 validation_1-auc:0.83137
[40] validation_0-auc:0.87686 validation_1-auc:0.83113
[41] validation_0-auc:0.87799 validation_1-auc:0.83149
[42] validation_0-auc:0.87908 validation_1-auc:0.83174
[43] validation_0-auc:0.87966 validation_1-auc:0.83152
[44] validation_0-auc:0.88016 validation_1-auc:0.83162
[45] validation_0-auc:0.88107 validation_1-auc:0.83126
[46] validation_0-auc:0.88178 validation_1-auc:0.83151
[47] validation_0-auc:0.88252 validation_1-auc:0.83172
[48] validation_0-auc:0.88320 validation_1-auc:0.83204
[49] validation_0-auc:0.88356 validation_1-auc:0.83195
[50] validation_0-auc:0.88411 validation_1-auc:0.83212
[51] validation_0-auc:0.88469 validation_1-auc:0.83217
[52] validation_0-auc:0.88510 validation_1-auc:0.83272
[53] validation_0-auc:0.88562 validation_1-auc:0.83264
[54] validation_0-auc:0.88674 validation_1-auc:0.83268
[55] validation_0-auc:0.88719 validation_1-auc:0.83291
[56] validation_0-auc:0.88780 validation_1-auc:0.83279
[57] validation_0-auc:0.88854 validation_1-auc:0.83297
[58] validation_0-auc:0.88885 validation_1-auc:0.83277
[59] validation_0-auc:0.88919 validation_1-auc:0.83298
[60] validation_0-auc:0.88993 validation_1-auc:0.83282
[61] validation_0-auc:0.89042 validation_1-auc:0.83248
[62] validation_0-auc:0.89106 validation_1-auc:0.83279
[63] validation_0-auc:0.89142 validation_1-auc:0.83290
[64] validation_0-auc:0.89172 validation_1-auc:0.83277
[65] validation_0-auc:0.89200 validation_1-auc:0.83248
[66] validation_0-auc:0.89232 validation_1-auc:0.83274
[67] validation_0-auc:0.89232 validation_1-auc:0.83283
[68] validation_0-auc:0.89247 validation_1-auc:0.83282
[69] validation_0-auc:0.89266 validation_1-auc:0.83292
[70] validation_0-auc:0.89306 validation_1-auc:0.83292
[71] validation_0-auc:0.89358 validation_1-auc:0.83264
[72] validation_0-auc:0.89402 validation_1-auc:0.83256
[73] validation_0-auc:0.89428 validation_1-auc:0.83239
[74] validation_0-auc:0.89455 validation_1-auc:0.83261
[75] validation_0-auc:0.89482 validation_1-auc:0.83233
[76] validation_0-auc:0.89494 validation_1-auc:0.83225
[77] validation_0-auc:0.89509 validation_1-auc:0.83233
[78] validation_0-auc:0.89546 validation_1-auc:0.83236
[79] validation_0-auc:0.89578 validation_1-auc:0.83220
[80] validation_0-auc:0.89591 validation_1-auc:0.83214
[81] validation_0-auc:0.89652 validation_1-auc:0.83189
[82] validation_0-auc:0.89669 validation_1-auc:0.83193
[83] validation_0-auc:0.89721 validation_1-auc:0.83186
[84] validation_0-auc:0.89746 validation_1-auc:0.83174
[85] validation_0-auc:0.89802 validation_1-auc:0.83179
[86] validation_0-auc:0.89823 validation_1-auc:0.83172
[87] validation_0-auc:0.89861 validation_1-auc:0.83173
[88] validation_0-auc:0.89889 validation_1-auc:0.83155
[0] validation_0-auc:0.81645 validation_1-auc:0.80415
[1] validation_0-auc:0.82149 validation_1-auc:0.80712
[2] validation_0-auc:0.83100 validation_1-auc:0.82058
[3] validation_0-auc:0.83162 validation_1-auc:0.81977
[4] validation_0-auc:0.83682 validation_1-auc:0.81846
[5] validation_0-auc:0.83858 validation_1-auc:0.82111
[6] validation_0-auc:0.84021 validation_1-auc:0.82048
[7] validation_0-auc:0.84093 validation_1-auc:0.82130
[8] validation_0-auc:0.84181 validation_1-auc:0.82142
[9] validation_0-auc:0.84628 validation_1-auc:0.82601
[10] validation_0-auc:0.84747 validation_1-auc:0.82481
[11] validation_0-auc:0.84642 validation_1-auc:0.82134
[12] validation_0-auc:0.85104 validation_1-auc:0.82403
[13] validation_0-auc:0.85295 validation_1-auc:0.82562
[14] validation_0-auc:0.85536 validation_1-auc:0.82714
[15] validation_0-auc:0.85630 validation_1-auc:0.82832
[16] validation_0-auc:0.85751 validation_1-auc:0.82827
[17] validation_0-auc:0.85842 validation_1-auc:0.82859
[18] validation_0-auc:0.85831 validation_1-auc:0.82826
[19] validation_0-auc:0.85946 validation_1-auc:0.82842
[20] validation_0-auc:0.86060 validation_1-auc:0.82849
[21] validation_0-auc:0.86173 validation_1-auc:0.82915
[22] validation_0-auc:0.86369 validation_1-auc:0.82993
[23] validation_0-auc:0.86460 validation_1-auc:0.82943
[24] validation_0-auc:0.86562 validation_1-auc:0.83030
[25] validation_0-auc:0.86608 validation_1-auc:0.83044
[26] validation_0-auc:0.86676 validation_1-auc:0.83060
[27] validation_0-auc:0.86760 validation_1-auc:0.83127
[28] validation_0-auc:0.86829 validation_1-auc:0.83130
[29] validation_0-auc:0.86898 validation_1-auc:0.83197
[30] validation_0-auc:0.86983 validation_1-auc:0.83139
[31] validation_0-auc:0.87046 validation_1-auc:0.83214
[32] validation_0-auc:0.87121 validation_1-auc:0.83274
[33] validation_0-auc:0.87196 validation_1-auc:0.83310
...
[98] validation_0-auc:0.90206 validation_1-auc:0.83620
[99] validation_0-auc:0.90244 validation_1-auc:0.83622
(similar per-round AUC logs, each capped at 100 rounds with early stopping after 30 stagnant rounds, repeat for every cross-validation fold of every HyperOpt trial; eight further fold logs are abbreviated here)
[36] validation_0-auc:0.88681 validation_1-auc:0.82852
[37] validation_0-auc:0.88729 validation_1-auc:0.82778
[38] validation_0-auc:0.88757 validation_1-auc:0.82783
6%|██▉ | 3/50 [01:56<30:20, 38.74s/trial, best loss: -0.8358993926439547]
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_5288\352038483.py in <module>
     36
     37 # Call fmin(): iterate max_evals times, then extract the input values that minimize the objective function.
---> 38 best = fmin(fn=objective_func,
     39             space=xgb_search_space,
     40             algo=tpe.suggest,
...(hyperopt internals: trials.fmin -> fmin -> rval.exhaust -> run -> serial_evaluate -> domain.evaluate)...
~\AppData\Local\Temp\ipykernel_5288\352038483.py in objective_func(search_space)
     21     X_val, y_val = X_train.iloc[val_index], y_train.iloc[val_index]  # validation fold
     22     # Set early stopping to 30 rounds and fit XGBClassifier on the extracted train/validation folds.
---> 23     xgb_clf.fit(X_tr, y_tr, early_stopping_rounds=30, eval_metric='auc',
     24                 eval_set=[(X_tr, y_tr), (X_val, y_val)])
     25
...(xgboost internals: sklearn fit -> train -> _train_internal -> bst.update)...
KeyboardInterrupt:
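The run above was interrupted by hand after 3 of 50 trials (roughly 39 seconds per trial). As a hedged sketch that is not part of the original notebook, a rerun can be made quieter and shorter by suppressing the per-round eval log and probing with fewer trials first; the names (objective_func, xgb_search_space, xgb_clf, X_tr, y_tr, X_val, y_val) follow the interrupted cell above.
# Sketch of a quieter, shorter HyperOpt rerun (names follow the interrupted cell).
# Inside objective_func, silence the per-round AUC output:
xgb_clf.fit(X_tr, y_tr, early_stopping_rounds=30, eval_metric='auc',
            eval_set=[(X_tr, y_tr), (X_val, y_val)], verbose=False)
# Probe the runtime with a handful of trials before committing to 50:
from hyperopt import fmin, tpe, Trials
best = fmin(fn=objective_func, space=xgb_search_space, algo=tpe.suggest,
            max_evals=5, trials=Trials())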
Understanding undersampling and oversampling
Data with a badly imbalanced label distribution: in the credit card fraud dataset, fraud records are only 0.172% of the whole; everything else is a normal transaction.
We want to find the fraud cases, but there are too few of them for a model to learn properly, so we create additional samples -> oversampling.
undersampling: cut the large class down; oversampling: grow the small class (synthesize new samples near the existing minority values).
Requires the imbalanced-learn package; a minimal sketch of both approaches follows.
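As a minimal sketch on toy data (not from the lecture code; RandomUnderSampler and SMOTE are imbalanced-learn classes):
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE
# Toy imbalanced data: roughly 5% positive class
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
print('original     :', Counter(y))
# Undersampling: shrink the majority class down to the minority size
print('undersampled :', Counter(RandomUnderSampler(random_state=0).fit_resample(X, y)[1]))
# Oversampling: SMOTE synthesizes new minority samples near existing ones
print('oversampled  :', Counter(SMOTE(random_state=0).fit_resample(X, y)[1]))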
Why a log transformation is needed: the collected data may not follow a normal distribution even when the underlying quantity should.
StandardScaler rescales data to mean 0 and variance 1 (a standard normal shape).
A log transformation often does a better job than StandardScaler at pulling a heavily skewed feature toward a normal distribution.
Also check the results after removing outlier data.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
card_df = pd.read_csv('creditcard.csv')
card_df.head(3)
 | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 |
1 | 0.0 | 1.191857 | 0.266151 | 0.166480 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.167170 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 |
2 | 1.0 | -1.358354 | -1.340163 | 1.773209 | 0.379780 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 |
3 rows × 31 columns
from sklearn.model_selection import train_test_split
# Copy the DataFrame passed in, drop only the Time column, and return the copy
def get_preprocessed_df(df=None):
    df_copy = df.copy()
    df_copy.drop('Time', axis=1, inplace=True)
    return df_copy
# Returns train and test datasets after the preprocessing above.
def get_train_test_dataset(df=None):
    # Get a preprocessed copy of the input DataFrame
    df_copy = get_preprocessed_df(df)
    # The last column of the DataFrame is the label; the rest are features
    X_features = df_copy.iloc[:, :-1]
    y_target = df_copy.iloc[:, -1]
    # Split with train_test_split(); stratify=y_target gives a stratified split
    X_train, X_test, y_train, y_test = \
        train_test_split(X_features, y_target, test_size=0.3, random_state=0, stratify=y_target)  # stratify keeps the label ratio equal across train and test
    # Return the train and test datasets
    return X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = get_train_test_dataset(card_df)
print('Train data label value ratio')
print(y_train.value_counts()/y_train.shape[0] * 100)  # y_train.shape[0] = total number of rows
print('Test data label value ratio')
print(y_test.value_counts()/y_test.shape[0] * 100)
Train data label value ratio
0    99.828453
1     0.171547
Name: Class, dtype: float64
Test data label value ratio
0    99.829122
1     0.170878
Name: Class, dtype: float64
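As a quick sanity check (a sketch added here, not in the original notes), splitting without stratify lets the rare-class ratio drift between train and test:
# Sketch: compare the test-set fraud ratio with and without stratify.
X_f = card_df.drop(['Time', 'Class'], axis=1)
y_t = card_df['Class']
for strat in (None, y_t):
    _, _, _, y_te = train_test_split(X_f, y_t, test_size=0.3, random_state=0, stratify=strat)
    print('stratify set :' if strat is not None else 'stratify=None:',
          round(y_te.mean() * 100, 4), '% fraud')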
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import roc_auc_score
def get_clf_eval(y_test, pred=None, pred_proba=None):
    confusion = confusion_matrix(y_test, pred)
    accuracy = accuracy_score(y_test, pred)
    precision = precision_score(y_test, pred)
    recall = recall_score(y_test, pred)
    f1 = f1_score(y_test, pred)
    # ROC-AUC added
    roc_auc = roc_auc_score(y_test, pred_proba)
    print('Confusion matrix')
    print(confusion)
    # ROC-AUC print added
    print('Accuracy: {0:.4f}, Precision: {1:.4f}, Recall: {2:.4f},\
 F1: {3:.4f}, AUC:{4:.4f}'.format(accuracy, precision, recall, f1, roc_auc))
from sklearn.linear_model import LogisticRegression
lr_clf = LogisticRegression()
lr_clf.fit(X_train, y_train)
lr_pred = lr_clf.predict(X_test)
lr_pred_proba = lr_clf.predict_proba(X_test)[:, 1]
# Evaluate with the get_clf_eval() function introduced in chapter 3.
get_clf_eval(y_test, lr_pred, lr_pred_proba)
Confusion matrix
[[85281    14]
 [   48    98]]
Accuracy: 0.9993, Precision: 0.8750, Recall: 0.6712, F1: 0.7597, AUC:0.9743
# Takes a scikit-learn Estimator object plus train/test datasets, then fits, predicts, and evaluates.
def get_model_train_eval(model, ftr_train=None, ftr_test=None, tgt_train=None, tgt_test=None):
    model.fit(ftr_train, tgt_train)
    pred = model.predict(ftr_test)
    pred_proba = model.predict_proba(ftr_test)[:, 1]
    get_clf_eval(tgt_test, pred, pred_proba)
from lightgbm import LGBMClassifier
lgbm_clf = LGBMClassifier(n_estimators=1000, num_leaves=64, n_jobs=-1, boost_from_average=False)  # num_leaves: maximum number of leaf nodes
get_model_train_eval(lgbm_clf, ftr_train=X_train, ftr_test=X_test, tgt_train=y_train, tgt_test=y_test)
# Confusion matrix              # LogisticRegression
# [[85281    14]
#  [   48    98]]
# Accuracy: 0.9993, Precision: 0.8750, Recall: 0.6712, F1: 0.7597, AUC:0.9743
# Confusion matrix              # LGBMClassifier
# [[85290     5]
#  [   25   121]]
# Accuracy: 0.9996, Precision: 0.9603, Recall: 0.8288, F1: 0.8897, AUC:0.9780
Confusion matrix
[[85290     5]
 [   25   121]]
Accuracy: 0.9996, Precision: 0.9603, Recall: 0.8288, F1: 0.8897, AUC:0.9780
import seaborn as sns
plt.figure(figsize=(8, 4))
plt.xticks(range(0, 30000, 1000), rotation=60)
sns.histplot(card_df['Amount'], bins=100, kde=True)
plt.show()
from sklearn.preprocessing import StandardScaler  # StandardScaler - reshape toward a standard normal distribution
# Modified so that the Amount feature is transformed with scikit-learn's StandardScaler.
def get_preprocessed_df(df=None):
    df_copy = df.copy()
    scaler = StandardScaler()
    amount_n = scaler.fit_transform(df_copy['Amount'].values.reshape(-1, 1))
    # Rename the transformed Amount to Amount_Scaled and insert it as the first column of the DataFrame
    df_copy.insert(0, 'Amount_Scaled', amount_n)  # insert lets you choose the position
    # Drop the original Time and Amount features
    df_copy.drop(['Time','Amount'], axis=1, inplace=True)
    return df_copy
# Scale Amount to a standard normal shape, then run logistic regression and LightGBM.
X_train, X_test, y_train, y_test = get_train_test_dataset(card_df)
X_train.head(1)
 | Amount_Scaled | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
211605 | -0.350471 | -8.367621 | 7.402969 | -5.114191 | -2.966792 | -0.985904 | -1.660018 | 0.397816 | 1.00825 | 5.290976 | ... | -0.750795 | 3.589299 | -0.557927 | 0.349087 | 0.301734 | 0.66233 | 1.145939 | -0.012273 | 1.513736 | 0.669504 |
1 rows × 29 columns
print('### Logistic regression prediction performance ###')
lr_clf = LogisticRegression()
get_model_train_eval(lr_clf, ftr_train=X_train, ftr_test=X_test, tgt_train=y_train, tgt_test=y_test)
print('### LightGBM prediction performance ###')
lgbm_clf = LGBMClassifier(n_estimators=1000, num_leaves=64, n_jobs=-1, boost_from_average=False)
get_model_train_eval(lgbm_clf, ftr_train=X_train, ftr_test=X_test, tgt_train=y_train, tgt_test=y_test)
# Confusion matrix              # LogisticRegression
# [[85281    14]
#  [   48    98]]
# Accuracy: 0.9993, Precision: 0.8750, Recall: 0.6712, F1: 0.7597, AUC:0.9743
# Confusion matrix              # LGBMClassifier
# [[85290     5]
#  [   25   121]]
# Accuracy: 0.9996, Precision: 0.9603, Recall: 0.8288, F1: 0.8897, AUC:0.9780
### Logistic regression prediction performance ###
Confusion matrix
[[85283    12]
 [   59    89]]
Accuracy: 0.9992, Precision: 0.8812, Recall: 0.6014, F1: 0.7149, AUC:0.9727
### LightGBM prediction performance ###
Confusion matrix
[[85290     5]
 [   35   113]]
Accuracy: 0.9995, Precision: 0.9576, Recall: 0.7635, F1: 0.8496, AUC:0.9796
Check whether the data (the input values) follow a normal distribution.
If the shape is far from normal, the collected sample may not represent the underlying distribution well.
Apply log1p to reshape it toward a normal distribution.
For regression, also check whether the target values follow a normal distribution.
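A quick way to see the effect (a hedged sketch added here; scipy.stats.skew measures asymmetry, with values near 0 for a roughly symmetric distribution):
# Sketch: quantify how much log1p reduces the skew of Amount.
from scipy.stats import skew
print('Amount skew before log1p:', skew(card_df['Amount']))
print('Amount skew after  log1p:', skew(np.log1p(card_df['Amount'])))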
def get_preprocessed_df(df=None):
    df_copy = df.copy()
    # Log-transform Amount with NumPy's log1p( ) -> closer to a normal distribution
    amount_n = np.log1p(df_copy['Amount'])  # reversible: np.expm1 restores the original values
    df_copy.insert(0, 'Amount_Scaled', amount_n)
    df_copy.drop(['Time','Amount'], axis=1, inplace=True)
    return df_copy
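Since log1p is invertible, the raw values can always be recovered, e.g.:
# Sketch: np.expm1 undoes np.log1p, so the original Amount is recoverable.
sample = card_df['Amount'].head(3).values
print(sample)
print(np.expm1(np.log1p(sample)))  # same values back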
X_train, X_test, y_train, y_test = get_train_test_dataset(card_df)
print('### Logistic regression prediction performance ###')
get_model_train_eval(lr_clf, ftr_train=X_train, ftr_test=X_test, tgt_train=y_train, tgt_test=y_test)
print('### LightGBM prediction performance ###')
get_model_train_eval(lgbm_clf, ftr_train=X_train, ftr_test=X_test, tgt_train=y_train, tgt_test=y_test)
# Confusion matrix              # LogisticRegression
# [[85281    14]
#  [   48    98]]
# Accuracy: 0.9993, Precision: 0.8750, Recall: 0.6712, F1: 0.7597, AUC:0.9743
# Confusion matrix              # LGBMClassifier
# [[85290     5]
#  [   25   121]]
# Accuracy: 0.9996, Precision: 0.9603, Recall: 0.8288, F1: 0.8897, AUC:0.9780
### Logistic regression prediction performance ###
Confusion matrix
[[85283    12]
 [   59    89]]
Accuracy: 0.9992, Precision: 0.8812, Recall: 0.6014, F1: 0.7149, AUC:0.9727
### LightGBM prediction performance ###
Confusion matrix
[[85290     5]
 [   35   113]]
Accuracy: 0.9995, Precision: 0.9576, Recall: 0.7635, F1: 0.8496, AUC:0.9796
import seaborn as sns
plt.figure(figsize=(9, 9))
corr = card_df.corr()  # correlation matrix
sns.heatmap(corr, cmap='RdBu')
<AxesSubplot:>
Closer to +1: as one variable increases, the other increases with it (positive correlation).
Closer to -1: as one variable increases, the other decreases (negative correlation).
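To see which features matter most for the Class label, the Class column of the correlation matrix can be sorted (a sketch added here; features such as V14 and V17 tend to surface with the strongest negative correlation, which motivates the V14 outlier handling below):
# Sketch: features most negatively correlated with Class
print(corr['Class'].sort_values().head(5))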
import numpy as np
def get_outlier(df=None, column=None, weight=1.5):
    # Extract only the fraud rows of the given column, then find the 25% and 75% quantiles with np.percentile.
    fraud = df[df['Class']==1][column]
    quantile_25 = np.percentile(fraud.values, 25)
    quantile_75 = np.percentile(fraud.values, 75)
    # Compute the IQR, multiply it by 1.5, and derive the lower and upper bounds.
    iqr = quantile_75 - quantile_25
    iqr_weight = iqr * weight
    lowest_val = quantile_25 - iqr_weight
    highest_val = quantile_75 + iqr_weight
    # Treat values above the upper bound or below the lower bound as outliers and return their DataFrame index.
    outlier_index = fraud[(fraud < lowest_val) | (fraud > highest_val)].index
    return outlier_index
weight=1.5
quantile_25 = np.percentile(fraud.values, 25) -> value at the 25th percentile
quantile_75 = np.percentile(fraud.values, 75) -> value at the 75th percentile
Find the values that fall outside these boundaries.
fraud < lowest_val : values below the lower bound
fraud > highest_val : values above the upper bound
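A tiny worked example of the same rule with made-up numbers:
# Sketch: IQR rule on hypothetical toy data
data = np.array([1, 2, 3, 4, 5, 6, 7, 100])
q25, q75 = np.percentile(data, [25, 75])              # 2.75, 6.25
iqr_weight = (q75 - q25) * 1.5                        # 3.5 * 1.5 = 5.25
print(q25 - iqr_weight, q75 + iqr_weight)             # bounds: -2.5, 11.5
print(data[(data < q25 - iqr_weight) | (data > q75 + iqr_weight)])  # [100]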
outlier_index = get_outlier(df=card_df, column='V14', weight=1.5)
print('Outlier data index:', outlier_index)
Outlier data index: Int64Index([8296, 8615, 9035, 9252], dtype='int64')
# Change get_preprocessed_df( ) to also drop the V14 outliers after the log transform.
def get_preprocessed_df(df=None):
    df_copy = df.copy()
    amount_n = np.log1p(df_copy['Amount'])
    df_copy.insert(0, 'Amount_Scaled', amount_n)
    df_copy.drop(['Time','Amount'], axis=1, inplace=True)
    # Added: drop the outlier rows
    outlier_index = get_outlier(df=df_copy, column='V14', weight=1.5)
    df_copy.drop(outlier_index, axis=0, inplace=True)
    return df_copy
X_train, X_test, y_train, y_test = get_train_test_dataset(card_df)
print('### Logistic regression prediction performance ###')
get_model_train_eval(lr_clf, ftr_train=X_train, ftr_test=X_test, tgt_train=y_train, tgt_test=y_test)
print('### LightGBM prediction performance ###')
get_model_train_eval(lgbm_clf, ftr_train=X_train, ftr_test=X_test, tgt_train=y_train, tgt_test=y_test)
# Outlier removal has the biggest impact on the results
# Confusion matrix              # LogisticRegression
# [[85281    14]
#  [   48    98]]
# Accuracy: 0.9993, Precision: 0.8750, Recall: 0.6712, F1: 0.7597, AUC:0.9743
# Confusion matrix              # LGBMClassifier
# [[85290     5]
#  [   25   121]]
# Accuracy: 0.9996, Precision: 0.9603, Recall: 0.8288, F1: 0.8897, AUC:0.9780
### Logistic regression prediction performance ###
Confusion matrix
[[85281    14]
 [   48    98]]
Accuracy: 0.9993, Precision: 0.8750, Recall: 0.6712, F1: 0.7597, AUC:0.9743
### LightGBM prediction performance ###
Confusion matrix
[[85290     5]
 [   25   121]]
Accuracy: 0.9996, Precision: 0.9603, Recall: 0.8288, F1: 0.8897, AUC:0.9780
# conda install -c conda-forge imbalanced-learn
from imblearn.over_sampling import SMOTE
smote = SMOTE(random_state=0)  # create the SMOTE object
X_train_over, y_train_over = smote.fit_resample(X_train, y_train)  # scikit-learn transformers use fit/transform; imbalanced-learn resamplers use fit_resample, since resampling changes the number of rows in both X and y
print('Train feature/label set before applying SMOTE: ', X_train.shape, y_train.shape)
print('Train feature/label set after applying SMOTE: ', X_train_over.shape, y_train_over.shape)
print('Label value distribution after applying SMOTE: \n', pd.Series(y_train_over).value_counts())
Train feature/label set before applying SMOTE:  (199362, 29) (199362,)
Train feature/label set after applying SMOTE:  (398040, 29) (398040,)
Label value distribution after applying SMOTE:
0    199020
1    199020
Name: Class, dtype: int64
lr_clf = LogisticRegression()
# Note that the ftr_train and tgt_train arguments are now the SMOTE-augmented X_train_over and y_train_over
get_model_train_eval(lr_clf, ftr_train=X_train_over, ftr_test=X_test, tgt_train=y_train_over, tgt_test=y_test)
# Confusion matrix              # LogisticRegression
# [[85281    14]
#  [   48    98]]
# Accuracy: 0.9993, Precision: 0.8750, Recall: 0.6712, F1: 0.7597, AUC:0.9743
Confusion matrix
[[82937  2358]
 [   11   135]]
Accuracy: 0.9723, Precision: 0.0542, Recall: 0.9247, F1: 0.1023, AUC:0.9737
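Recall jumps to 0.9247 but precision collapses to 0.0542: after oversampling, the model calls fraud far too readily at the default 0.5 cutoff. A hedged way to trade some recall back for precision is to raise the threshold by hand (the 0.98 below is an arbitrary illustration, not a tuned value):
# Sketch: re-evaluate the SMOTE-trained model at a stricter threshold.
proba = lr_clf.predict_proba(X_test)[:, 1]
custom_pred = (proba >= 0.98).astype(int)  # 0.98 chosen only for illustration
get_clf_eval(y_test, custom_pred, proba)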
import matplotlib.pyplot as plt  # curve where precision and recall cross
import matplotlib.ticker as ticker
from sklearn.metrics import precision_recall_curve
%matplotlib inline
def precision_recall_curve_plot(y_test, pred_proba_c1):
    # Extract the threshold ndarray and the precision/recall ndarrays for each threshold.
    precisions, recalls, thresholds = precision_recall_curve(y_test, pred_proba_c1)
    # Plot precision (dashed line) and recall against the threshold values on the X axis.
    plt.figure(figsize=(8,6))
    threshold_boundary = thresholds.shape[0]
    plt.plot(thresholds, precisions[0:threshold_boundary], linestyle='--', label='precision')
    plt.plot(thresholds, recalls[0:threshold_boundary], label='recall')
    # Set the X-axis (threshold) ticks at 0.1 intervals
    start, end = plt.xlim()
    plt.xticks(np.round(np.arange(start, end, 0.1), 2))
    # Axis labels, legend, and grid
    plt.xlabel('Threshold value'); plt.ylabel('Precision and Recall value')
    plt.legend(); plt.grid()
    plt.show()
precision_recall_curve_plot( y_test, lr_clf.predict_proba(X_test)[:, 1] )
lgbm_clf = LGBMClassifier(n_estimators=1000, num_leaves=64, n_jobs=-1, boost_from_average=False)
get_model_train_eval(lgbm_clf, ftr_train=X_train_over, ftr_test=X_test,
                     tgt_train=y_train_over, tgt_test=y_test)
# Confusion matrix              # LGBMClassifier
# [[85290     5]
#  [   25   121]]
# Accuracy: 0.9996, Precision: 0.9603, Recall: 0.8288, F1: 0.8897, AUC:0.9780
Confusion matrix
[[85283    12]
 [   22   124]]
Accuracy: 0.9996, Precision: 0.9118, Recall: 0.8493, F1: 0.8794, AUC:0.9814