#10 Pandas DataFrame: Conversion, Creating/Modifying Column Data, Deletion, and an Introduction to the Index Object

박수경 · September 25, 2021

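Conversion	Description
list -> DataFrame	df_list = pd.DataFrame(list, columns=col_name1): pass the list object and the column names that map to it as arguments to the DataFrame constructor.
ndarray -> DataFrame	df_array2 = pd.DataFrame(array2, columns=col_name2): pass the ndarray and the column names that map to it as arguments to the DataFrame constructor.
dict -> DataFrame	dict = {'col1' : [1, 11], 'col2' : [2, 22], 'col3' : [3, 33]}; df_dict = pd.DataFrame(dict): the dictionary keys become the column names, and each value is given as a list holding that column's data.
DataFrame -> ndarray	Use the values attribute of the DataFrame object to convert it to an ndarray.
DataFrame -> list	Use the values attribute of the DataFrame object to convert it to an ndarray first, then call tolist() to convert that to a list.
DataFrame -> dict	Use the to_dict() method of the DataFrame object.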
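
The conversions in the table can be tried out with a short sketch like the one below. The data and the variable names (rows, col_names, data, and so on) are made up for illustration and are not from the original notes.

```python
import numpy as np
import pandas as pd

# Illustrative data; every variable name here is hypothetical.
col_names = ['col1', 'col2', 'col3']
rows = [[1, 2, 3], [11, 22, 33]]

# list -> DataFrame and ndarray -> DataFrame: pass the data plus column names
df_list = pd.DataFrame(rows, columns=col_names)
df_array = pd.DataFrame(np.array(rows), columns=col_names)

# dict -> DataFrame: keys become column names, values become column data
data = {'col1': [1, 11], 'col2': [2, 22], 'col3': [3, 33]}
df_dict = pd.DataFrame(data)

# DataFrame -> ndarray, list, dict
as_array = df_dict.values                    # 2-D ndarray, one row per record
as_list = df_dict.values.tolist()            # nested list, one inner list per row
as_dict = df_dict.to_dict(orient='list')     # column name -> list of values
```

Note that to_dict() with no arguments returns a nested dictionary of the form {column: {index label: value}}; orient='list' is used here only to get back the same shape as the input dictionary.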

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

axis: set axis=0 to delete rows from the DataFrame and axis=1 to delete columns.
To keep the original DataFrame unchanged and receive the result of the drop as a new object, set inplace=False (the default is False).

ex. titanic_drop_df = titanic_df.drop('Age_0', axis=1, inplace=False)
To apply the drop to the original DataFrame itself, set inplace=True.

ex. titanic_df.drop('Age_0', axis=1, inplace=True)
Assigning the DataFrame returned by drop back to the original variable has the same effect as applying the drop to the original DataFrame. (The object the variable previously pointed to is later removed from memory.)

ex. titanic_df = titanic_df.drop('Age_0', axis=1, inplace=False)
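
The three drop patterns above can be compared side by side in a minimal sketch. The small titanic_df below is a made-up stand-in for the real Titanic data, and names such as rows_dropped and result are hypothetical.

```python
import pandas as pd

# Tiny stand-in for titanic_df; the values are invented for illustration.
titanic_df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Age': [22, 38, 26],
    'Age_0': [0, 0, 0],
})

# inplace=False (the default): a new DataFrame is returned
titanic_drop_df = titanic_df.drop('Age_0', axis=1)
original_still_has_column = 'Age_0' in titanic_df.columns

# axis=0 drops rows by index label; a list drops several rows at once
rows_dropped = titanic_df.drop([0, 1], axis=0)

# inplace=True: titanic_df itself is modified and the call returns None
result = titanic_df.drop('Age_0', axis=1, inplace=True)
```

Because drop returns None when inplace=True, writing titanic_df = titanic_df.drop(..., inplace=True) would replace the DataFrame with None. Reassignment only makes sense together with inplace=False.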

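The pandas Index object identifies each record of a DataFrame or Series, much like a primary key (PK) in an RDBMS.
To extract only the Index object from a DataFrame or Series, use the DataFrame.index or Series.index attribute.
A Series contains an Index object, but when an arithmetic or aggregate function is applied to the Series, the Index values are excluded from the computation. The Index is used only for identification.
Calling reset_index() on a DataFrame or Series assigns a new index of consecutive integers and adds the previous index as a new column named 'index' (if the previous index had a name, that name is used for the column instead).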
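
A short sketch of these points follows, using a made-up Series called fares with a non-default index. All names and values are hypothetical.

```python
import pandas as pd

# Hypothetical Series whose index labels are not 0, 1, 2.
fares = pd.Series([10, 20, 30], index=[3, 1, 2])

index_obj = fares.index      # the Index object holding labels 3, 1, 2
total = fares.sum()          # only the values are summed; the labels are ignored
shifted = fares + 3          # values change, index labels stay as they were

# reset_index() on a Series returns a DataFrame:
# the old labels move into a column named 'index'
reset_df = fares.reset_index()

# drop=True discards the old index instead of keeping it as a column
reset_series = fares.reset_index(drop=True)
```

With drop=True the result stays a Series, which is convenient when the old labels are no longer needed.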
