🔍 Code review
https://colab.research.google.com/drive/1QRT-rhxyQfBclT1hikOf7pT50sxUJNQ-?usp=sharing
📢 Study with a focus on the hands-on practical exam! 📢
📌 Task Type 1 basics: data preprocessing (2)
#quantile
✔ Finding outliers in the data
import seaborn as sns
sns.get_dataset_names()
df = sns.load_dataset('planets')
print(df.head())
# Normal range: Q25 - IQR*1.5 to Q75 + IQR*1.5
            method  number  orbital_period   mass  distance  year
0  Radial Velocity       1         269.300   7.10     77.40  2006
1  Radial Velocity       1         874.774   2.21     56.95  2008
2  Radial Velocity       1         763.000   2.60     19.84  2011
3  Radial Velocity       1         326.030  19.40    110.62  2007
4  Radial Velocity       1         516.220  10.50    119.47  2009
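The rule in the comment (normal range: Q25 - IQR*1.5 to Q75 + IQR*1.5) can be checked by hand on a small list. The sketch below is plain Python, not the notebook's code. The helper `quantile_linear` and the sample values are made up for illustration, and it assumes the linear-interpolation definition of a quantile, which is the default in pandas.

```python
import math

def quantile_linear(values, q):
    """Quantile by linear interpolation between the two nearest sorted values."""
    data = sorted(values)
    pos = (len(data) - 1) * q      # fractional position in the sorted list
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return data[lo] + (data[hi] - data[lo]) * frac

sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]   # hypothetical data, not from planets

q25 = quantile_linear(sample, 0.25)
q75 = quantile_linear(sample, 0.75)
iqr = q75 - q25
lower = q25 - 1.5 * iqr
upper = q75 + 1.5 * iqr

# Values at or beyond either fence count as outliers
outliers = [v for v in sample if v <= lower or v >= upper]
print(q25, q75, lower, upper, outliers)   # 3.0 7.0 -3.0 13.0 [100]
```

Only 100 falls outside the range from -3.0 to 13.0, so it is the single outlier in this sample.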
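# First and third quartiles of orbital_period
Q25 = df['orbital_period'].quantile(0.25)
Q75 = df['orbital_period'].quantile(0.75)
IQR = Q75 - Q25
# Lower and upper fences of the normal range
lower = Q25 - IQR*1.5
upper = Q75 + IQR*1.5
# Rows at or beyond either fence are treated as outliers
df_outlier = df[(df['orbital_period'] <= lower) | (df['orbital_period'] >= upper)]
print(df_outlier)
print(Q25)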
                        method  number  orbital_period  mass  distance  year
6              Radial Velocity       1          1773.4  4.64     18.15  2002
14             Radial Velocity       3          2391.0  0.54     14.08  2001
15             Radial Velocity       3         14002.0  1.64     14.08  2009
19             Radial Velocity       5          4909.0  3.53     12.53  2002
32   Eclipse Timing Variations       1         10220.0  6.05       NaN  2009
..                         ...     ...             ...   ...       ...   ...
920               Microlensing       1          3500.0   NaN       NaN  2005
921               Microlensing       2          1825.0   NaN       NaN  2008
922               Microlensing       2          5100.0   NaN       NaN  2008
937                    Imaging       1        730000.0   NaN       NaN  2006
944              Pulsar Timing       1         36525.0   NaN       NaN  2003

[126 rows x 6 columns]
5.4425405
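To reuse the same steps on another column, they can be wrapped in a small function. This is only a sketch: the name `iqr_outliers`, the parameter `k`, and the toy DataFrame are assumptions for illustration and do not come from the notebook. The toy data is built by hand so the example runs without downloading a dataset.

```python
import pandas as pd

def iqr_outliers(df, column, k=1.5):
    """Return rows whose `column` value is at or beyond the IQR fences."""
    q25 = df[column].quantile(0.25)   # quantile() skips NaN by default
    q75 = df[column].quantile(0.75)
    iqr = q75 - q25
    lower = q25 - k * iqr
    upper = q75 + k * iqr
    # Comparisons with NaN are False, so rows with a missing value are never selected
    mask = (df[column] <= lower) | (df[column] >= upper)
    return df[mask]

# Hypothetical data: one obvious outlier and one missing value
toy = pd.DataFrame(
    {"period": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0, float("nan")]}
)

result = iqr_outliers(toy, "period")
print(result)
```

With the planets data the equivalent call would be `iqr_outliers(df, 'orbital_period')`. Rows where `orbital_period` itself is missing land in neither the outlier set nor the normal set, so they need separate handling if they matter.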