[Python] Learning Python by Example - Session 1 Review (2)

HEY! MIN · October 22, 2024

2-5. Checking a table for missing values

# Count missing values (cells with no data) per column
df.isna().sum()
df.isnull().sum()  # isnull() is an alias of isna(), so the result is identical

# Another example
df.isna()  # True marks a missing value; here the True cells appear in column B
df['B'].isna()  # check column B only
df[df['B'].isna()]  # return the rows where column B is missing, as a table
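
The checks above can be tried on a small table. This is a minimal sketch: the columns A and B and their values are made up for illustration and are not the dataset used in the session.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column B has exactly one missing value
df = pd.DataFrame({"A": [1, 2, 3], "B": [10.0, np.nan, 30.0]})

# Number of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Boolean Series that is True where B is missing
b_is_missing = df["B"].isna()

# Using that Series as a filter returns only the rows where B is missing
rows_missing_b = df[b_is_missing]
print(rows_missing_b)
```

Passing the boolean Series back into `df[...]` is what turns a True/False check into an actual table of the affected rows.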

2-6. Selecting columns

Selecting a single column

# Method 1: attribute access with a dot
df.Category
# Method 2: the [] operator
df['Category']
# Method 3: iloc
# The : selects every row, and 4 selects the column at position 4 (Category); positions start at 0
df.iloc[:,4]

Selecting multiple columns

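# Method 1: the [[]] operator
# With a single pair of brackets, df['Category','Selling Price'] is read as one tuple key and raises a KeyError. Passing a list of names ([[ ]]) selects each column and returns a DataFrame.
df[['Category','Selling Price']]
# Method 2: iloc
# The : selects every row, and [4,7] selects the columns at positions 4 and 7
df.iloc[:,[4,7]]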
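
To see what each selection style returns, here is a small sketch. The column names follow the post, but the values are hypothetical, and the two-column table means the positions are 0 and 1 rather than 4 and 7.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the post
df = pd.DataFrame({
    "Category": ["Toys", "Books"],
    "Selling Price": [12.5, 8.0],
})

single = df["Category"]                        # one label -> Series
multiple = df[["Category", "Selling Price"]]   # list of labels -> DataFrame
by_position = df.iloc[:, [0, 1]]               # list of positions -> DataFrame

print(type(single).__name__)
print(type(multiple).__name__)

# Two labels without the inner list are treated as a single tuple key,
# and no column with that key exists
try:
    df["Category", "Selling Price"]
except KeyError as exc:
    print("KeyError:", exc)
```

Attribute access such as `df.Category` only works when the column name is a valid Python identifier, so a name containing a space like `Selling Price` has to be selected with brackets.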

2-8. Selecting rows that match a condition

# Rows that satisfy the condition are returned unchanged; all other rows are filled with NaN
df2.where(df2['Age']>50)

# To get only the rows that match the condition, build a boolean mask and index with it
mask = ((df2['Age']>50) & (df2['Gender']=='Male'))
df2[mask]
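
The difference between `where` and boolean indexing is easiest to see side by side. The sketch below assumes a tiny hypothetical `df2` with made-up values; only the column names match the post.

```python
import pandas as pd

# Hypothetical sample data
df2 = pd.DataFrame({
    "Customer ID": [1, 2, 3, 4],
    "Age": [62, 35, 55, 70],
    "Gender": ["Male", "Female", "Female", "Male"],
})

# where keeps the original shape; rows that fail the condition become NaN
kept_shape = df2.where(df2["Age"] > 50)

# Boolean indexing drops the rows that fail the condition
mask = (df2["Age"] > 50) & (df2["Gender"] == "Male")
filtered = df2[mask]

print(kept_shape)
print(filtered)
```

Each condition needs its own parentheses because `&` binds more tightly than comparison operators such as `>` and `==`.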

2-9. Grouping a table

[Example 1]: Count Customer ID by Gender in the df2 table

1. Using SQL
select Gender, count("Customer ID")
from df2
group by Gender

2. Using Python
df2.groupby('Gender')['Customer ID'].count()

[Example 2]: Count Customer ID by Gender and Location in the df2 table

1. Using SQL
select Gender, Location, count("Customer ID")
from df2
group by Gender, Location

2. Using Python
df2.groupby(['Gender','Location'])['Customer ID'].count()
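
Grouping by two columns returns a Series whose index holds both keys. If a flat table shaped like the SQL result is preferred, one option is `reset_index`. This is a sketch on hypothetical data, and the column name `cnt` is an arbitrary choice for the example.

```python
import pandas as pd

# Hypothetical sample data
df2 = pd.DataFrame({
    "Customer ID": [1, 2, 3, 4],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Location": ["Seoul", "Seoul", "Seoul", "Busan"],
})

# Series indexed by (Gender, Location)
counts = df2.groupby(["Gender", "Location"])["Customer ID"].count()

# Move the group keys back into ordinary columns and name the count column
flat = counts.reset_index(name="cnt")

print(counts)
print(flat)
```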

[Example 3]: Count distinct Age values by Location in the df2 table

1. Using SQL
select Location, count(distinct Age)
from df2
group by Location

2. Using Python
df2.groupby('Location')['Age'].nunique()

[Example 4]: Count distinct Age values by Location in the df2 table, then sort

1. Using SQL
select Location, count(distinct Age) as cnt
from df2
group by Location
order by cnt desc

2. Using Python
df2.groupby('Location')['Age'].nunique().sort_values(ascending=False)
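
`count()` and `nunique()` are easy to mix up: `count()` counts non-missing rows per group, while `nunique()` counts distinct values, which is what `count(distinct Age)` asks for. A minimal sketch with hypothetical data shows the difference.

```python
import pandas as pd

# Hypothetical sample data with repeated ages inside each location
df2 = pd.DataFrame({
    "Location": ["Seoul", "Seoul", "Seoul", "Busan", "Busan"],
    "Age": [30, 30, 41, 25, 25],
})

# Rows per location (duplicates included)
row_counts = df2.groupby("Location")["Age"].count()

# Distinct ages per location, largest first
distinct_counts = (
    df2.groupby("Location")["Age"]
    .nunique()
    .sort_values(ascending=False)
)

print(row_counts)
print(distinct_counts)
```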
