[Python] Pandas에서 데이터 read 시 인코딩 문제 (Pandas read files without encoding errors)

이향기·2021년 9월 9일

신문 기사 데이터를 읽던 중 파일 인코딩 문제 때문에 한글 깨짐이 발생했다.
문제는 행별 데이터마다 인코딩이 조금씩 달라서 어떤 인코딩을 사용하더라도 에러를 피할 수가 없었다는 것이다ㅠ.

그래서 대부분의 행이 어떤 인코딩을 사용하는지 알 수 있다면, 아래와 같이 인코딩 에러를 무시하고 데이터를 읽어들이는 방법이 있다.

아래는 pandas.read_table 공식 문서 의 내용 중 encoding 관련 내용이다.

pandas 최신 버전으로 update하기

-위의 문서에 따르면, pandas의 버전이 1.3.0 이상이여야 지원하는 기능이라고 나온다.

! conda update pandas

아래와 같은 옵션으로 파일 읽기

-encoding_errors = 'igonre' 옵션이 핵심!

data = pd.read_table(os.path.join(data_path, 'THHTSF060H00.dat'), sep = ",", header = None, encoding = "cp949", engine = "python", encoding_errors = 'ignore')

이향기

Data science & Machine learning, baking and reading(≪,≫)

이전 포스트

Docker 사용법 : 기본 명령어 (Introduction to docker) (install docker, pull images, run container)

다음 포스트

[Python] Pandas에서 데이터 read 시 인코딩 문제 (Pandas read files without encoding errors)

pandas 최신 버전으로 update하기

아래와 같은 옵션으로 파일 읽기

Docker 사용법 : 기본 명령어 (Introduction to docker) (install docker, pull images, run container)

[Python] tqdm, enumerate, pandas의 iterrows() 동시에 쓰기

0개의 댓글