W2

ewillwin · April 15, 2023

Artificial Intelligence Project


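# numeric_only=True skips string columns, which raise an error in corr() on pandas 2.0+
train.corr(numeric_only=True)['Survived']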

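Correlation between the 'Survived' column and each of the other columns.
The closer a value is to 1, the stronger the positive correlation; the closer to -1, the stronger the negative correlation; the closer to 0, the weaker the linear relationship between the two variables.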
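
To make the sign convention concrete, here is a minimal sketch on a made-up table. The name `toy` and all of its values are assumptions for illustration only, not the real Titanic data.

```python
import pandas as pd

# Hypothetical toy data: column names mimic the Titanic set, values are invented.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Fare":     [7.0, 71.0, 53.0, 8.0, 30.0, 8.5],
    "Pclass":   [3, 1, 1, 3, 2, 3],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
})

# Pearson correlation of every numeric column with 'Survived'.
# numeric_only=True leaves the string column 'Sex' out of the calculation.
corr_with_survived = toy.corr(numeric_only=True)["Survived"]
print(corr_with_survived)
```

In this toy table the survivors paid higher fares, so 'Fare' comes out positive, while a larger 'Pclass' number goes with not surviving, so 'Pclass' comes out negative. A column always correlates perfectly with itself.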

# convert the numeric variable Age into a categorical variable (10-year bins)
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
labels = ['<10', '<20', '<30', '<40', '<50', '<60', '<70', '<80', '<90']
train['Age_cat'] = pd.cut(train['Age'], bins, right=False, labels=labels)

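Converts a numeric variable into a categorical one. With right=False each bin is closed on the left and open on the right, e.g. [0, 10), so an age of exactly 10 falls in '<20'.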
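
The boundary behavior of `pd.cut` with `right=False` is easy to check on a few hand-picked ages. This is a small sketch; the `ages` series is made up for illustration, while `bins` and `labels` are the same as above.

```python
import pandas as pd

# Hypothetical ages chosen to sit on and around bin edges, plus one missing value.
ages = pd.Series([0.5, 9, 10, 29.9, 30, 80, float("nan")])

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
labels = ['<10', '<20', '<30', '<40', '<50', '<60', '<70', '<80', '<90']

# right=False makes each interval [left, right): the left edge is included.
age_cat = pd.cut(ages, bins, right=False, labels=labels)
print(age_cat.tolist())
```

Ages of exactly 10, 30, and 80 land in '<20', '<40', and '<90' respectively, and a missing age stays missing instead of being assigned to a bin.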

# count the surviving passengers by sex and age group
train.groupby(['Sex', 'Age_cat'])['Survived'].sum().unstack('Age_cat')

unstack() works like a pivot.
The 'Age_cat' level of the index moves to the columns, so the result becomes a two-dimensional table.
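
A before-and-after view may help here. The sketch below uses a tiny made-up frame (`toy` and its values are assumptions) to show the grouped result first as a Series with a two-level index, then as a table after `unstack`.

```python
import pandas as pd

# Hypothetical toy data with plain string age groups.
toy = pd.DataFrame({
    "Sex":      ["female", "female", "male", "male", "male"],
    "Age_cat":  ["<10", "<20", "<10", "<20", "<20"],
    "Survived": [1, 0, 1, 0, 1],
})

# Long form: one value per (Sex, Age_cat) pair, stored in a two-level index.
stacked = toy.groupby(["Sex", "Age_cat"])["Survived"].sum()

# Wide form: the 'Age_cat' level becomes the columns, 'Sex' stays as the rows.
wide = stacked.unstack("Age_cat")

print(stacked)
print(wide)
```

The numbers are identical in both forms; only the layout changes, which makes the wide table easier to read and to divide element-wise by another table of the same shape.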

# count the surviving passengers by sex and age group
train.groupby(['Sex', 'Age_cat'])['Survived'].sum().unstack('Age_cat')

# count all passengers by sex and age group
train.groupby(['Sex', 'Age_cat'])['Survived'].count().unstack('Age_cat')

# Ratio = number of surviving passengers / number of passengers
round(train.groupby(['Sex', 'Age_cat'])['Survived'].sum().unstack('Age_cat')/train.groupby(['Sex', 'Age_cat'])['Survived'].count().unstack('Age_cat'), 2)

Dividing the number of survivors in each 'Sex' and 'Age_cat' group by the total number of passengers in that group gives the survival rate per sex and age group.
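
Because 'Survived' only holds 0 or 1, the sum divided by the count is the group mean. The sketch below checks this on made-up data (`toy` and its values are assumptions); the `mean()` version is an alternative shortcut, not the approach used above.

```python
import pandas as pd

# Hypothetical toy data: 0/1 survival flags for a few passengers.
toy = pd.DataFrame({
    "Sex":      ["female"] * 5 + ["male"] * 5,
    "Age_cat":  ["<10", "<10", "<20", "<20", "<20"] * 2,
    "Survived": [1, 1, 1, 0, 1, 1, 0, 0, 0, 1],
})

grouped = toy.groupby(["Sex", "Age_cat"])["Survived"]

# Survivors divided by passengers, as in the post.
ratio_by_division = (
    grouped.sum().unstack("Age_cat") / grouped.count().unstack("Age_cat")
).round(2)

# Shortcut: the mean of a 0/1 column is the same ratio.
ratio_by_mean = grouped.mean().unstack("Age_cat").round(2)

print(ratio_by_division)
```

Both tables hold the same survival rates, so either form can be used. The explicit division keeps the two intermediate tables available for inspection.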
