100 pandas puzzles No.21 ~ 25 (2)

오유찬 · March 23, 2026


21. For each animal type and each number of visits, find the mean age. In other words, each row is an animal, each column is a number of visits and the values are the mean ages (hint: use a pivot table).

rows: animal, columns: number of visits, values: mean age

pivot table

pandas.pivot_table(data
                , index = None     # what defines each row
                , columns = None   # what defines each column
                , values = None    # which column is aggregated into each cell
                , aggfunc = 'mean' # how to aggregate, e.g. 'mean', 'sum', 'nunique', 'std'
                # additional options: fill_value, margins, dropna, margins_name, observed
)

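When a pivot_table shows a lot of data you do not need, you can filter the rows first with the query function so that only the data you want goes into the table.
query('column_name == condition')
Usage example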

r1.query('age == [2, 3]').pivot_table(index=['animal'], columns=['visits'], values=['age'], aggfunc=['mean'])
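To make the query step concrete, here is a minimal sketch on a small, made-up r1. The data is hypothetical; only the column names (animal, age, visits) follow the puzzle. Scalar arguments are used instead of lists so the result has flat row and column labels.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the puzzle.
r1 = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "snake"],
    "age": [2, 3, 2, 7, 3],
    "visits": [1, 3, 1, 2, 1],
})

# Inside query, comparing a column to a list with == behaves like "in",
# so this keeps only the rows whose age is 2 or 3.
filtered = r1.query("age == [2, 3]")

# Pivot the filtered rows: one row per animal, one column per visit count.
result = filtered.pivot_table(
    index="animal", columns="visits", values="age", aggfunc="mean"
)
print(result)
```

The dog with age 7 never reaches the pivot table, so the visits = 2 column does not appear at all.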

answer

df.pivot_table(
    index='animal', columns = 'visits', values = 'age', aggfunc = 'mean')
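As a quick check of what the answer returns, the sketch below runs the same call on a tiny hypothetical df (the values are made up for illustration, not taken from the puzzle data). Combinations of animal and visits that never occur show up as NaN.

```python
import pandas as pd

# Hypothetical data: two cats share visits == 1, so their ages are averaged.
df = pd.DataFrame({
    "animal": ["cat", "cat", "cat", "dog", "dog"],
    "age": [2.0, 4.0, 3.0, 5.0, 7.0],
    "visits": [1, 1, 3, 2, 2],
})

table = df.pivot_table(
    index="animal", columns="visits", values="age", aggfunc="mean"
)
print(table)

# (cat, 1) is the mean of 2.0 and 4.0; (dog, 1) has no rows, so it is NaN.
print(table.loc["cat", 1], table.loc["dog", 1])
```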

22. Printing values without duplicates?

df.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
Return DataFrame with duplicate rows removed.
keep : determines which duplicates to keep ('first', 'last', or False to drop all of them)
subset : only consider certain columns for identifying duplicates

answer

# 01. drop_duplicates
df.drop_duplicates(subset='A')

# 02. loc and shift
df.loc[df['A'].shift() != df['A']]

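Solution 02 may look puzzling at first. The periods argument of shift() defaults to 1, so shift moves every value down by one row. Selecting only the rows whose value differs from the shifted value (that is, from the row directly above) filters out the repeats. This removes all duplicates only because the values are sorted, so equal values are always adjacent.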
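The dependence on sorted input is easy to see by running both solutions on sorted and unsorted data. This is a small sketch with made-up values; drop_consecutive is a hypothetical helper name that just wraps solution 02.

```python
import pandas as pd

# Hypothetical inputs: one sorted column and one unsorted column.
sorted_df = pd.DataFrame({"A": [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})
unsorted_df = pd.DataFrame({"A": [1, 2, 2, 1, 3, 3, 2]})


def drop_consecutive(frame):
    # Keep a row only when its value differs from the row directly above.
    return frame.loc[frame["A"].shift() != frame["A"]]


# On sorted data the two solutions agree.
shift_sorted = drop_consecutive(sorted_df)["A"].tolist()
dedup_sorted = sorted_df.drop_duplicates(subset="A")["A"].tolist()

# On unsorted data shift only removes adjacent repeats.
shift_unsorted = drop_consecutive(unsorted_df)["A"].tolist()
dedup_unsorted = unsorted_df.drop_duplicates(subset="A")["A"].tolist()

print(shift_sorted, dedup_sorted)
print(shift_unsorted, dedup_unsorted)
```

On the unsorted column, the shift version keeps a value again whenever it reappears after a different value, while drop_duplicates keeps only its first occurrence.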

23. Given a DataFrame of numeric values, how do you subtract the row mean from each element in the row?

Subtract each row's mean from every element in that row.

df = pd.DataFrame(np.random.random(size=(5,3)))

df.sub(df.mean(axis=1), axis=0)

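In case axis is confusing:

axis = 0 (the default for mean() and sum()) : compute down the rows (top to bottom), giving one result per column
axis = 1 : compute across the columns (left to right), giving one result per row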
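A small sketch with fixed numbers (hypothetical values, chosen so the means are easy to verify by hand) shows both directions and why the subtraction needs axis=0:

```python
import pandas as pd

# Hypothetical 2x3 frame with easy-to-check means.
df = pd.DataFrame([[1.0, 2.0, 3.0],
                   [4.0, 6.0, 8.0]])

col_means = df.mean(axis=0)  # down the rows: one mean per column
row_means = df.mean(axis=1)  # across the columns: one mean per row

# row_means is indexed by row label, so axis=0 tells sub to align it
# with the rows and subtract each row's mean from that row.
centered = df.sub(row_means, axis=0)

print(col_means.tolist())
print(row_means.tolist())
print(centered)
```

After the subtraction, every row of centered has a mean of zero.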

24. Suppose you have a DataFrame with 10 columns of real numbers. Which column of numbers has the smallest sum? Return that column's label.

# The smallest column sum (the value itself, not the label)
df.sum(axis = 0).min(axis = 0)

# 01. Use the value found above as a condition in loc to get the column label.
df.loc[:, df.sum() == df.sum().min()].columns[0]

# 02. idxmin()
df.sum().idxmin()
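Both solutions can be checked against each other on a small hypothetical frame (three columns instead of ten, with made-up values):

```python
import pandas as pd

# Hypothetical data; column "b" has the smallest sum.
df = pd.DataFrame({
    "a": [1.0, 2.0],
    "b": [-3.0, 0.5],
    "c": [0.0, 4.0],
})

sums = df.sum()  # one sum per column

# 01. Boolean mask over the columns, then take the first matching label.
label_loc = df.loc[:, sums == sums.min()].columns[0]

# 02. idxmin returns the label of the smallest value directly.
label_idx = sums.idxmin()

print(sums.min(), label_loc, label_idx)
```

idxmin is the shorter of the two and returns the label in a single step.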

25. How do you count how many unique rows a DataFrame has (i.e. ignore all rows that are duplicates)?

The number of unique rows?

len(df.drop_duplicates(keep=False))

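Because we are looking for rows that are truly unique, every row that has a duplicate must be dropped entirely. -> keep=False
Calling len() on a DataFrame returns its number of rows.
You can also get the row count with df.shape[0].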
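The difference between keep=False and the default keep='first' is easiest to see on a small example. The frame below is hypothetical and has two groups of repeated rows:

```python
import pandas as pd

# Hypothetical data: (1, "a") appears twice and (3, "c") appears twice.
df = pd.DataFrame({
    "x": [1, 1, 2, 3, 3, 3],
    "y": ["a", "a", "b", "c", "c", "d"],
})

# keep=False drops every row that has a duplicate anywhere in the frame.
unique_rows = len(df.drop_duplicates(keep=False))

# The default keep="first" keeps one copy of each repeated row instead.
distinct_rows = len(df.drop_duplicates())

# shape[0] gives the same row count as len().
unique_rows_shape = df.drop_duplicates(keep=False).shape[0]

print(unique_rows, distinct_rows, unique_rows_shape)
```

Only (2, "b") and (3, "d") occur exactly once, so keep=False counts two rows, while the default counts four distinct rows.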
