sklearn.model_selection.train_test_split

SeungHyun · December 15, 2023

The notes below were written against scikit-learn 1.3.2.

0. Quick start

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

X: feature data (the input data to split)
y: target data
test_size: fraction of the original data that becomes test data
random_state: random seed, so the same shuffle is reproduced on every call
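To make the quick-start call above runnable end to end, here is a minimal sketch. The toy arrays `X` and `y` are assumptions made up for illustration; any array-like data with matching lengths should work the same way.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features, binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 30% of 10 samples go to the test set, the remaining 7 to the train set.
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
print(y_train.shape, y_test.shape)  # (7,) (3,)
```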

1. Full form

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    train_size=0.7,
    random_state=42,
    shuffle=True,
    stratify=y,
)

2. What it does

Takes the input data and splits it into a train set and a test set.

3. Parameters

X

The data to split into train and test sets
- Features and target can also be passed together as a single array object
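As a sketch of the single-array case: when only one array is passed, `train_test_split` returns just two pieces (train and test). The `data` array below is a made-up example, with the assumption that its last column is the target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical table: two feature columns, last column is the target.
data = np.array([[i, i * 10, i % 2] for i in range(8)])

# One input array gives two outputs.
train, test = train_test_split(data, test_size=0.25, random_state=0)

# Separate features and target after the split.
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]

print(train.shape, test.shape)  # (6, 3) (2, 3)
```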

y

The target data that goes with X, split the same way

test_size

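test_size = 0.25: fraction of the data that becomes test data (an int is read as the absolute number of test samples)
default: None, which resolves to 0.25 when train_size is also None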
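The float, int, and default cases of `test_size` can be compared with a small sketch. The array `X` is toy data assumed for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples, 1 feature.
X = np.arange(20).reshape(20, 1)

# float: fraction of the samples
_, test_frac = train_test_split(X, test_size=0.2, random_state=0)

# int: absolute number of samples
_, test_abs = train_test_split(X, test_size=3, random_state=0)

# neither test_size nor train_size given: falls back to 0.25
_, test_default = train_test_split(X, random_state=0)

print(len(test_frac), len(test_abs), len(test_default))  # 4 3 5
```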

train_size

train_size = 0.75: fraction of the data that becomes train data
default: None, which resolves to 1 - test_size
Usually only test_size is set
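A small sketch of how `train_size` interacts with `test_size`, using toy data assumed for illustration. One thing worth knowing: if both are given and add up to less than 1, the leftover samples end up in neither set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples, 1 feature.
X = np.arange(20).reshape(20, 1)

# Only train_size given: the test set is the complement.
train_a, test_a = train_test_split(X, train_size=0.75, random_state=0)
print(len(train_a), len(test_a))  # 15 5

# Both given and summing to 0.75: 5 of the 20 samples are left out.
train_b, test_b = train_test_split(
    X, train_size=0.5, test_size=0.25, random_state=0
)
print(len(train_b), len(test_b))  # 10 5
```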

random_state

random_state = 42: random seed that controls the shuffle, so the same split is reproduced on every call
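Reproducibility can be checked with a quick sketch: two calls with the same seed should return identical splits. The array `X` is toy data assumed for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples, 1 feature.
X = np.arange(20).reshape(20, 1)

train_1, test_1 = train_test_split(X, test_size=0.25, random_state=42)
train_2, test_2 = train_test_split(X, test_size=0.25, random_state=42)

# Same seed, same shuffle, so the two splits match element by element.
same_split = np.array_equal(train_1, train_2) and np.array_equal(test_1, test_2)
print(same_split)  # True
```

Leaving `random_state` unset means the split can change from run to run, which makes results harder to compare.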

shuffle

shuffle = True: shuffle the data before splitting
default: True
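For contrast, a sketch of `shuffle=False` on toy data (assumed for illustration): the original order is kept, with the first part going to train and the tail going to test. This is the kind of split one might want for ordered data such as time series.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical ordered data: 0, 1, ..., 9.
X = np.arange(10)

train, test = train_test_split(X, test_size=0.3, shuffle=False)

print(train.tolist())  # [0, 1, 2, 3, 4, 5, 6]
print(test.tolist())   # [7, 8, 9]
```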

stratify

stratify = y: split so that the class ratio of the original y is preserved in both train and test
The class ratio of y is the same in train and test
stratify cannot be set when shuffle = False (it must stay None)
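A sketch of stratified splitting on an imbalanced toy label vector (80 samples of class 0 and 20 of class 1, assumed for illustration). It also shows that combining `stratify` with `shuffle=False` raises a `ValueError`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 80% class 0, 20% class 1.
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# The 80:20 ratio is kept in both parts.
print(np.bincount(y_train).tolist())  # [60, 15]
print(np.bincount(y_test).tolist())   # [20, 5]

# stratify needs shuffling, so shuffle=False is rejected.
try:
    train_test_split(X, y, shuffle=False, stratify=y)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```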

ref

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn-model-selection-train-test-split
