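Boolean indexing: selecting only the elements of a Series or DataFrame that satisfy a given conditional expression.
Boolean indexing on a DataFrame: DataFrame object[boolean Series] keeps the rows where the boolean Series is True. DataFrame.loc[boolean Series, columns] works the same way and can also select columns.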
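Here is a minimal sketch of the idea on small, made-up data. The `ages` and `df` values are hypothetical, chosen so the example runs without downloading the seaborn dataset. Comparing a column with a value produces a boolean Series of the same length, and indexing with that Series keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical data standing in for the titanic dataset so the example runs offline
ages = pd.Series([5, 12, 35, 17, 64], name="age")

# Comparing a Series with a scalar yields a boolean Series of the same length
mask = ages >= 10
print(mask.tolist())  # [False, True, True, True, True]

# Indexing with the boolean Series keeps only the elements where the mask is True
print(ages[mask].tolist())  # [12, 35, 17, 64]

df = pd.DataFrame({"age": [5, 12, 35, 17, 64],
                   "sex": ["female", "male", "female", "female", "male"]})

# DataFrame[boolean Series] filters rows; the original index labels are preserved
teen = df[(df["age"] >= 10) & (df["age"] < 20)]
print(teen.index.tolist())  # [1, 3]
```

The surviving rows keep their original index labels (1 and 3). This is why the titanic outputs in these notes show labels such as 9, 14, 22 rather than 0, 1, 2.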
Example: 6.9_filter_boolean.py
import seaborn as sns
titanic = sns.load_dataset('titanic')
# Rows where 10 <= age < 20
mask = (titanic.age >= 10) & (titanic.age < 20)
df_teenage = titanic.loc[mask, :]
print(df_teenage)
# Rows where age < 10 and sex is female
mask2 = (titanic.age < 10) & (titanic.sex == 'female')
df_female_under10 = titanic.loc[mask2, :]
print(df_female_under10)
# Rows where age < 10 or age >= 60, keeping only the age, sex, and alone columns
mask3 = (titanic.age < 10) | (titanic.age >= 60)
df_under10_morethan60 = titanic.loc[mask3, ['age', 'sex', 'alone']]
print(df_under10_morethan60)
Output (excerpt):

df_teenage
     survived  pclass     sex   age  sibsp  parch      fare  embarked   class    who  adult_male  deck  embark_town  alive  alone
  9         1       2  female  14.0      1      0   30.0708         C  Second  child       False   NaN    Cherbourg    yes  False
 14         0       3  female  14.0      0      0    7.8542         S   Third  child       False   NaN  Southampton     no   True
 22         1       3  female  15.0      0      0    8.0292         Q   Third  child       False   NaN   Queenstown    yes   True
 27         0       1    male  19.0      3      2  263.0000         S   First    man        True     C  Southampton     no  False
 38         0       3  female  18.0      2      0   18.0000         S   Third  woman       False   NaN  Southampton     no  False

df_female_under10
     survived  pclass     sex   age  sibsp  parch      fare  embarked   class    who  adult_male  deck  embark_town  alive  alone
 10         1       3  female   4.0      1      1   16.7000         S   Third  child       False     G  Southampton    yes  False
 24         0       3  female   8.0      3      1   21.0750         S   Third  child       False   NaN  Southampton     no  False
 43         1       2  female   3.0      1      2   41.5792         C  Second  child       False   NaN    Cherbourg    yes  False
 58         1       2  female   5.0      1      2   27.7500         S  Second  child       False   NaN  Southampton    yes  False
119         0       3  female   2.0      4      2   31.2750         S   Third  child       False   NaN  Southampton     no  False

df_under10_morethan60
       age     sex  alone
  7   2.00    male  False
 10   4.00  female  False
 16   2.00    male  False
 24   8.00  female  False
 33  66.00    male   True
..     ...     ...    ...
831   0.83    male  False
850   4.00    male  False
851  74.00    male   True
852   9.00  female  False
869   4.00    male  False
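The masks above are combined with `&` (and) and `|` (or) rather than Python's `and`/`or`. The sketch below uses a hypothetical four-row DataFrame, not the titanic data. It shows the element-wise operators, `~` for negation, and what happens if `and` is used instead.

```python
import pandas as pd

# Hypothetical four-row DataFrame (not the titanic data)
df = pd.DataFrame({"age": [4.0, 15.0, 70.0, 30.0],
                   "sex": ["female", "male", "male", "female"]})

# & (and), | (or), ~ (not) combine boolean Series element-wise.
# Each comparison needs its own parentheses: & and | bind more tightly than >=, <, ==,
# so df.age >= 10 & df.age < 20 would be parsed as df.age >= (10 & df.age) < 20.
both = (df.age < 10) & (df.sex == "female")
either = (df.age < 10) | (df.age >= 60)
neither = ~either

print(df.loc[both, "age"].tolist())     # [4.0]
print(df.loc[either, "age"].tolist())   # [4.0, 70.0]
print(df.loc[neither, "age"].tolist())  # [15.0, 30.0]

# Python's `and` tries to turn the whole Series into one bool, which pandas refuses
try:
    df[(df.age < 10) and (df.sex == "female")]
    err = None
except ValueError as exc:
    err = exc
    print("ValueError:", exc)  # message says the truth value of a Series is ambiguous
```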
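Filtering with the isin() method: DataFrame column.isin(list of values to extract) returns True for every row whose value in that column matches any value in the list.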
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')
# Change the IPython display setting: maximum number of columns to print
pd.set_option('display.max_columns',10)
# Filtering by joining three == masks with | (or)
mask3 = titanic['sibsp'] == 3
mask4 = titanic['sibsp'] == 4
mask5 = titanic['sibsp'] == 5
df_boolean = titanic[mask3|mask4|mask5]
print(df_boolean.head())
# Filtering with the isin() method (selects the same rows as above)
isin_filter = titanic['sibsp'].isin([3,4,5])
df_isin = titanic[isin_filter]
print(df_isin.head())
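The single `isin()` call selects the same rows as the chain of `==` masks joined with `|`. Here is a small check on hypothetical `sibsp` values (not the real titanic column). It also uses `~` to keep the rows whose value is *not* in the list.

```python
import pandas as pd

# Hypothetical sibsp values standing in for the titanic column
df = pd.DataFrame({"sibsp": [0, 3, 1, 4, 5, 8, 3]})

# Chaining == comparisons with | ...
or_mask = (df["sibsp"] == 3) | (df["sibsp"] == 4) | (df["sibsp"] == 5)
# ... gives the same boolean Series as a single isin() call
isin_mask = df["sibsp"].isin([3, 4, 5])
print(or_mask.equals(isin_mask))  # True

print(df[isin_mask].index.tolist())   # [1, 3, 4, 6]

# ~ inverts the mask: rows whose value is NOT in the list
print(df[~isin_mask].index.tolist())  # [0, 2, 5]
```

With `isin()`, the list of values can grow without adding another `==` clause for each value.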