https://www.kaggle.com/competitions/titanic/overview
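survival: whether the passenger survived; this is the target value (0 = died, 1 = survived)
pclass: ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
sex: sex
age: age in years
sibsp: siblings + spouses (=> total number of siblings and spouses aboard with the passenger)
parch: parents + children (=> total number of parents and children aboard with the passenger)
ticket: ticket number
fare: passenger fare
cabin: cabin number
embarked: port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)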
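The data dictionary lists the fields in lowercase, but the headers in the Kaggle CSV files are capitalized (`Survived`, `Pclass`, `SibSp`, `Embarked`, ...). The minimal sketch below decodes the one-letter embarked codes into port names. It uses a made-up five-row frame instead of the real file, and `PORTS` and `toy` are illustrative names that are not part of the original notebook.

```python
import pandas as pd

# Made-up stand-in for train.csv: only the Embarked column, values invented.
toy = pd.DataFrame({"Embarked": ["S", "C", "Q", "S", None]})

# Assumed decoding of the one-letter port codes (from the Kaggle data dictionary).
PORTS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# Series.map with a dict turns values that are not dict keys (here the missing one) into NaN.
toy["EmbarkedPort"] = toy["Embarked"].map(PORTS)
print(toy)
```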
from google.colab import files
myfile = files.upload()
Saving test.csv to test.csv
Saving train.csv to train.csv
import pandas as pd
import numpy as np
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
train.head()
test.head()
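`pd.read_csv` accepts any file-like object as well as a path, so what `head()` returns can be seen without the uploaded files. In the sketch below the rows are made up, and only the column layout is meant to mirror Kaggle's train.csv. `csv_text` and `toy_train` are illustrative names.

```python
import io

import pandas as pd

# A few invented rows using the same column layout as Kaggle's train.csv
# (not real passengers; an empty field becomes NaN when parsed).
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,Passenger A,male,30,1,0,T1,10.5,,S
2,1,1,Passenger B,female,40,1,0,T2,60.0,B1,C
3,1,3,Passenger C,female,25,0,0,T3,8.0,,S
"""

# StringIO stands in for the path 'train.csv'.
toy_train = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows (n defaults to 5).
print(toy_train.head(2))
```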
train.shape  # number of rows and columns in the data
test.shape
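`shape` is a plain `(rows, columns)` tuple. In this competition the test file should come out one column narrower than the training file because it has no `Survived` target, which is the value to be predicted. The sketch below shows that comparison on two tiny made-up frames. `toy_train` and `toy_test` are illustrative names.

```python
import pandas as pd

# Invented frames mimicking the train/test split: test has no 'Survived' column.
toy_train = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
})
toy_test = pd.DataFrame({"Pclass": [2, 3], "Sex": ["male", "female"]})

print(toy_train.shape)  # (3, 3)
print(toy_test.shape)   # (2, 2)

# Columns present in train but missing from test
print(set(toy_train.columns) - set(toy_test.columns))  # {'Survived'}
```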
train.isnull()  # check each value for missingness (False: value present, True: value missing)
train.isnull().sum()  # number of missing values per column, much easier to read
test.isnull().sum()
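`isnull()` returns a boolean DataFrame, and `True` counts as 1 when summed. That means `.sum()` gives the number of missing values per column, and `.mean()` gives the fraction missing. The small made-up frame below shows this. The real counts depend on the uploaded files and are not reproduced here.

```python
import numpy as np
import pandas as pd

# Invented frame with gaps, standing in for train.csv
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [None, "B1", None, None],
    "Embarked": ["S", "C", "S", None],
})

mask = toy.isnull()          # boolean DataFrame, True where a value is missing
counts = mask.sum()          # missing values per column
share = toy.isnull().mean()  # fraction of missing values per column

print(counts)  # Age 2, Cabin 3, Embarked 1
print(share)   # Age 0.50, Cabin 0.75, Embarked 0.25
```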
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()  # apply seaborn's default plot style
=> %matplotlib inline: renders plots directly in the notebook, below the cell that creates them
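def bar_chart(feature):
    # count each value of `feature` among survivors and among the dead
    survived = train[train['Survived'] == 1][feature].value_counts()
    dead = train[train['Survived'] == 0][feature].value_counts()
    # one row per outcome, one column per feature value
    df = pd.DataFrame([survived, dead])
    df.index = ['Survived', 'Dead']
    # each bar is an outcome, split into segments by feature value
    df.plot(kind='bar', stacked=True, figsize=(10, 5))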
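To check the numbers behind each chart, the table that `bar_chart` plots can be built without drawing it. The sketch below repeats the same steps on a made-up frame and compares the result with `pd.crosstab`, which produces the same counts in one call. `survival_table` and `toy` are hypothetical names.

```python
import pandas as pd

# Invented stand-in for train.csv (only the columns needed here)
toy = pd.DataFrame({
    "Survived": [1, 0, 0, 1, 0, 1],
    "Sex": ["female", "male", "male", "female", "female", "male"],
})

def survival_table(df, feature):
    """Same table bar_chart() plots: rows = Survived/Dead, columns = feature values."""
    survived = df[df["Survived"] == 1][feature].value_counts()
    dead = df[df["Survived"] == 0][feature].value_counts()
    table = pd.DataFrame([survived, dead])
    table.index = ["Survived", "Dead"]
    return table

print(survival_table(toy, "Sex"))

# pd.crosstab gives the same counts; its rows are the Survived values 0 and 1.
print(pd.crosstab(toy["Survived"], toy["Sex"]))
```

One difference is worth knowing. If a feature value never appears among the survivors (or among the dead), the `value_counts` approach leaves `NaN` in that cell, while `pd.crosstab` fills it with 0.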
bar_chart('Sex')
bar_chart('Pclass')
bar_chart('SibSp')
bar_chart('Parch')
bar_chart('Embarked')
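Stacked counts make categories hard to compare when some of them contain many more passengers than others. Because `Survived` is coded 0/1, its mean within each group is that group's survival rate. The sketch below computes it on made-up data. `toy`, `rate` and `rates_ct` are illustrative names, and the numbers say nothing about the real dataset.

```python
import pandas as pd

# Invented stand-in for train.csv
toy = pd.DataFrame({
    "Survived": [1, 0, 0, 1, 0, 1, 0, 0],
    "Pclass": [1, 3, 3, 1, 2, 2, 3, 1],
})

# Mean of a 0/1 column = fraction of survivors in each class
rate = toy.groupby("Pclass")["Survived"].mean()
print(rate)  # 1 -> 0.667, 2 -> 0.5, 3 -> 0.0

# Same idea via crosstab: normalize="index" turns each row into proportions
rates_ct = pd.crosstab(toy["Pclass"], toy["Survived"], normalize="index")
print(rates_ct)
```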