[22.11.26] A Non-Major's Guide to Finishing the Big Data Analysis Engineer Practical Exam in One Week (3)

jinz.develog · November 26, 2022

🔍Reference video
https://www.inflearn.com/course/%EB%B9%85%EB%B6%84%EA%B8%B0-%EC%8B%9C%ED%97%98%EC%8B%A4%EA%B8%B0-%ED%8C%8C%EC%9D%B4%EC%8D%AC

🔍Code review
https://colab.research.google.com/drive/1QRT-rhxyQfBclT1hikOf7pT50sxUJNQ-?usp=sharing

📢Study with a focus on the practical exam!📢

📌Task Type 1 Basics: Data Preprocessing (2)

#quantile
✔Finding outliers in the data

import seaborn as sns
sns.get_dataset_names()
df = sns.load_dataset('planets')
print(df.head())
# Normal range: Q25 - IQR*1.5 to Q75 + IQR*1.5

            method  number  orbital_period   mass  distance  year
0  Radial Velocity       1         269.300   7.10     77.40  2006
1  Radial Velocity       1         874.774   2.21     56.95  2008
2  Radial Velocity       1         763.000   2.60     19.84  2011
3  Radial Velocity       1         326.030  19.40    110.62  2007
4  Radial Velocity       1         516.220  10.50    119.47  2009

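# First and third quartiles of the orbital period
Q25 = df['orbital_period'].quantile(0.25)
Q75 = df['orbital_period'].quantile(0.75)
IQR = Q75 - Q25
# Fences of the normal range; named lower/upper so the built-in min()/max() are not shadowed
lower = Q25 - IQR*1.5
upper = Q75 + IQR*1.5
# Rows on or outside the fences are treated as outliers
df_outlier = df[(df['orbital_period'] <= lower) | (df['orbital_period'] >= upper)]
print(df_outlier)
print(Q25)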

                        method  number  orbital_period  mass  distance  year
6              Radial Velocity       1          1773.4  4.64     18.15  2002
14             Radial Velocity       3          2391.0  0.54     14.08  2001
15             Radial Velocity       3         14002.0  1.64     14.08  2009
19             Radial Velocity       5          4909.0  3.53     12.53  2002
32   Eclipse Timing Variations       1         10220.0  6.05       NaN  2009
..                         ...     ...             ...   ...       ...   ...
920               Microlensing       1          3500.0   NaN       NaN  2005
921               Microlensing       2          1825.0   NaN       NaN  2008
922               Microlensing       2          5100.0   NaN       NaN  2008
937                    Imaging       1        730000.0   NaN       NaN  2006
944              Pulsar Timing       1         36525.0   NaN       NaN  2003

[126 rows x 6 columns]
5.4425405
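
The same rule can be wrapped in a small helper so it is easy to reuse on other columns. This is only a minimal sketch, assuming pandas is installed. The function name `iqr_outlier_mask`, the parameter `k`, and the sample values are made up for illustration and are not part of the original notebook.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values on or outside the IQR fences (hypothetical helper)."""
    q25 = values.quantile(0.25)
    q75 = values.quantile(0.75)
    iqr = q75 - q25
    lower = q25 - k * iqr
    upper = q75 + k * iqr
    # Comparisons against NaN evaluate to False, so missing values are never flagged.
    return (values <= lower) | (values >= upper)


# Made-up sample values, not taken from the planets dataset.
sample = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])
mask = iqr_outlier_mask(sample)
outliers = sample[mask]
print(outliers)
```

With the planets data this could be called as `df[iqr_outlier_mask(df['orbital_period'])]`. Rows where `orbital_period` is NaN are not selected, because comparisons with NaN are False.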
