Examining Individual Attributes of the Data
1. Survived Column
titanic_df['Survived'].value_counts()
0 549
1 342
Name: Survived, dtype: int64
- Plot the number of survivors and deaths as a bar plot with sns.countplot()
sns.countplot(x = 'Survived', data = titanic_df)
plt.show()
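
The raw counts are easier to compare as shares. A minimal sketch, assuming only the two counts printed above (the names survived_counts and survived_share are illustrative); on the full DataFrame, titanic_df['Survived'].value_counts(normalize=True) should give the same proportions, with roughly 38% of passengers surviving.

```python
import pandas as pd

# Counts copied from the value_counts() output above (0 = died, 1 = survived).
survived_counts = pd.Series({0: 549, 1: 342}, name="Survived")

# Dividing by the total is what value_counts(normalize=True) does internally.
survived_share = survived_counts / survived_counts.sum()
print(survived_share.round(3))
```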

2. Pclass
titanic_df[['Pclass', 'Survived']].groupby(['Pclass']).count()

titanic_df[['Pclass', 'Survived']].groupby(['Pclass']).sum()

titanic_df[['Pclass', 'Survived']].groupby(['Pclass']).mean()

sns.heatmap(titanic_df[['Pclass', 'Survived']].groupby(['Pclass']).mean())
plt.show()
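
The three groupby calls above (count, sum, mean) can also be collected into one table. A sketch on hypothetical toy rows, not the real Titanic data; toy_df and pclass_summary are illustrative names. Because Survived is coded 0/1, the sum is the number of survivors and the mean is the survival rate for each class.

```python
import pandas as pd

# Hypothetical toy rows, not the real Titanic data.
toy_df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# count = passengers per class, sum = survivors per class, mean = survival rate.
pclass_summary = toy_df.groupby("Pclass")["Survived"].agg(["count", "sum", "mean"])
print(pclass_summary)
```

The same agg call should work unchanged on titanic_df, assuming it has the Pclass and Survived columns used above.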

3. Sex
titanic_df.groupby(['Sex', 'Survived'])['Survived'].count()
Sex     Survived
female  0            81
        1           233
male    0           468
        1           109
Name: Survived, dtype: int64
Visualizing survivor and death counts by sex with sns.catplot
sns.catplot(x = 'Sex', col = 'Survived', kind = 'count', data = titanic_df)
plt.show()
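
Since there are far more men than women on board, the counts alone hide the rate within each sex. A minimal sketch that normalizes the counts from the groupby output above row by row (sex_counts and sex_rates are illustrative names): about 74% of women and 19% of men survived.

```python
import pandas as pd

# Counts copied from the groupby(['Sex', 'Survived']) output above.
sex_counts = pd.DataFrame(
    {0: [81, 468], 1: [233, 109]},
    index=pd.Index(["female", "male"], name="Sex"),
)
sex_counts.columns.name = "Survived"

# Divide each row by its own total to get the share of each outcome per sex.
sex_rates = sex_counts.div(sex_counts.sum(axis=1), axis=0)
print(sex_rates.round(3))
```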

4. Age
titanic_df.describe()['Age']
count 714.000000
mean 29.699118
std 14.526497
min 0.420000
25% 20.125000
50% 28.000000
75% 38.000000
max 80.000000
Name: Age, dtype: float64
fig, ax = plt.subplots(1,1,figsize = (10,5))
sns.kdeplot(x = titanic_df[titanic_df.Survived == 1]['Age'], ax = ax)
sns.kdeplot(x = titanic_df[titanic_df.Survived == 0]['Age'], ax = ax)
plt.legend(['Survived', 'Dead'])
plt.show()
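
The Age count of 714 is lower than the 891 rows implied by the Survived counts (549 + 342), so 177 ages are missing; describe() only counts non-missing values. A sketch of how to check this directly, using a hypothetical toy column rather than the real data (toy_age is an illustrative name):

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two missing ages, not the real data.
toy_age = pd.Series([22.0, 38.0, np.nan, 35.0, np.nan, 54.0], name="Age")

n_missing = toy_age.isna().sum()          # rows without an age
n_present = toy_age.describe()["count"]   # describe() counts non-missing values only
print(n_missing, n_present)
```

On the real data the equivalent check would be titanic_df['Age'].isna().sum().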

Appendix 1. Sex + Pclass vs Survived
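- Examine how survival changes across passenger classes for each sex (the point plot shows the mean of Survived, i.e. the survival rate)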
sns.catplot(x = 'Pclass', y = 'Survived', hue = 'Sex', kind = 'point', data = titanic_df)
plt.show()
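
The numbers behind such a point plot can be tabulated with a pivot table of mean survival by class and sex. A sketch on hypothetical toy rows, not the real Titanic data; toy_df and rate_table are illustrative names.

```python
import pandas as pd

# Hypothetical toy rows, not the real Titanic data.
toy_df = pd.DataFrame({
    "Sex":      ["female", "female", "male", "male", "female", "male", "male", "female"],
    "Pclass":   [1, 1, 1, 1, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# One row per class, one column per sex, each cell the mean of Survived.
rate_table = toy_df.pivot_table(
    index="Pclass", columns="Sex", values="Survived", aggfunc="mean"
)
print(rate_table)
```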

Appendix 2. Age + Pclass
titanic_df['Age'][titanic_df.Pclass == 1].plot(kind = 'kde')
titanic_df['Age'][titanic_df.Pclass == 2].plot(kind = 'kde')
titanic_df['Age'][titanic_df.Pclass == 3].plot(kind = 'kde')
plt.legend(['1st class', '2nd class', '3rd class'])
plt.show()

