Pandas: When DataFrame Columns Have Different Lengths

calico · November 25, 2025

When a pandas DataFrame is built from a dict of lists or arrays, every column must have the same length.

If the lengths differ, pandas raises a ValueError.

The model input (X) and the labels (y) must also contain the same number of samples. If they do not, training cannot even start.

1. When DataFrame columns have different lengths

For example, if age has 8 values and income has 7:

ValueError: arrays must all be same length

In other words, the DataFrame is never created in the first place.
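The failure above can be reproduced with a minimal sketch like the one below. The age and income values are made up for illustration, and the exact wording of the error message may vary between pandas versions.

```python
import pandas as pd

# Hypothetical sample data: age has 8 values, income has only 7.
age = [25, 32, 47, 51, 38, 29, 44, 60]
income = [3200, 4100, 5800, 6100, 4500, 3900, 5200]

try:
    # Building a DataFrame from a dict of unequal-length lists fails.
    df = pd.DataFrame({"age": age, "income": income})
except ValueError as exc:
    df = None
    error_message = str(exc)
    print(f"ValueError: {error_message}")
```

Because the constructor raises before returning, no partial DataFrame is left behind; the lengths have to be reconciled first.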

Example where X and y have different numbers of samples:

X.shape → (8, 2)
y.shape → (7,)

In this case, calling fit on the model raises the following error:

ValueError: Found input variables with inconsistent numbers of samples
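As a rough sketch of how this error shows up, the snippet below assumes scikit-learn is installed and uses LinearRegression purely as a stand-in estimator; the original post does not say which model is used, and the arrays are placeholder data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 8 samples in X, but only 7 labels in y.
X = np.arange(16, dtype=float).reshape(8, 2)
y = np.arange(7, dtype=float)

print(X.shape)
print(y.shape)

try:
    # scikit-learn validates that X and y have the same number of samples.
    LinearRegression().fit(X, y)
    fit_error = None
except ValueError as exc:
    fit_error = str(exc)
    print(f"ValueError: {fit_error}")
```

The check runs before any training happens, so the estimator is left unfitted.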

In general, one of the following approaches is used.

If the lengths differ because of missing values:

df = df.dropna()
X = df[['age', 'income']]
y = df['label']
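A self-contained version of this approach might look like the sketch below. The column values are invented for illustration; the only assumption is that a missing income is stored as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one income value is missing (NaN).
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 60],
    "income": [3200, 4100, np.nan, 6100, 4500, 3900, 5200, 7000],
    "label": [0, 0, 1, 1, 0, 0, 1, 1],
})

# Drop every row that contains at least one missing value.
df = df.dropna()

# Derive X and y from the same cleaned DataFrame so they stay aligned.
X = df[["age", "income"]]
y = df["label"]

print(len(df), len(X), len(y))
```

Taking X and y from the same DataFrame after dropna() is what keeps them the same length; dropping rows from only one of them would reintroduce the mismatch.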

For example, to fill missing values with the median instead of dropping rows:

df['income'] = df['income'].fillna(df['income'].median())
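The sketch below shows the same fill on a small made-up DataFrame. Unlike dropping rows, it keeps all 8 rows; the median is computed over the non-missing income values only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one income value is missing (NaN).
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 60],
    "income": [3200, 4100, np.nan, 6100, 4500, 3900, 5200, 7000],
    "label": [0, 0, 1, 1, 0, 0, 1, 1],
})

# Series.median() skips NaN by default, so the fill value comes
# from the 7 observed incomes.
income_median = df["income"].median()
df["income"] = df["income"].fillna(income_median)

X = df[["age", "income"]]
y = df["label"]

print(income_median)
print(len(df), len(X), len(y))
```

Whether to fill or drop depends on the data; the median is just one common choice for a skewed column such as income.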

Before training, confirm that the lengths match:

print(len(df))
print(len(X))
print(len(y))
