UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 107692: invalid continuation byte

Sejin Jeong · April 21, 2024



[Before the fix]
On the 197 server,
in VS Code,
jeongsj@ubuntu:~$ pip install simpletransformers
jeongsj@ubuntu:~$ !wget https://raw.githubusercontent.com/korquad/korquad.github.io/master/dataset/KorQuAD_v1.0_train.json -O KorQuAD_v1.0_train.json
jeongsj@ubuntu:~$ !wget https://raw.githubusercontent.com/korquad/korquad.github.io/master/dataset/KorQuAD_v1.0_dev.json -O KorQuAD_v1.0_dev.json

# python_file.py
import json
with open('KorQuAD_v1.0_train.json', 'r') as f:
    train_data = json.load(f)
train_data = [item for topic in train_data['data'] for item in topic['paragraphs']]

print(train_data[0:10])

jeongsj@ubuntu:~$ python python_file.py

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 107692: invalid continuation byte
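
To see what the file actually contains at the failing offset, it can help to read it as raw bytes and look around the position reported in the error. The following is only a minimal sketch: the helper name locate_decode_error and the context parameter are hypothetical and not part of the original post.

```python
def locate_decode_error(raw, context=16):
    """Return (offset, nearby bytes) for the first invalid UTF-8 byte, or None.

    Hypothetical helper for inspecting a file that fails to decode.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the byte that could not be decoded.
        start = max(exc.start - context, 0)
        return exc.start, raw[start:exc.start + context]
    return None


# Possible usage on the downloaded file (read in binary mode, so no decoding happens):
# with open("KorQuAD_v1.0_train.json", "rb") as f:
#     print(locate_decode_error(f.read()))
```

If the bytes around the offset do not look like JSON text, the file on disk is probably not the dataset that was meant to be downloaded.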

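[Cause]
Shell commands are written differently in a Jupyter notebook (code cell) and in VS Code (shell prompt).
Jupyter notebook: prefix the command with ! (exclamation mark).
VS Code: do not prefix the command with !. In an interactive bash shell a leading ! triggers history expansion, so '!wget ...' is not run as the wget command that was typed.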

[Solution]
On the 197 server,
in VS Code,
jeongsj@ubuntu:~$ pip install simpletransformers
jeongsj@ubuntu:~$ wget https://raw.githubusercontent.com/korquad/korquad.github.io/master/dataset/KorQuAD_v1.0_train.json -O KorQuAD_v1.0_train.json
('-O KorQuAD_v1.0_train.json' can be omitted)
jeongsj@ubuntu:~$ wget https://raw.githubusercontent.com/korquad/korquad.github.io/master/dataset/KorQuAD_v1.0_dev.json -O KorQuAD_v1.0_dev.json
('-O KorQuAD_v1.0_dev.json' can be omitted)
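
As an alternative to wget, the same files could be fetched from Python itself, which avoids the notebook-versus-terminal difference altogether. This is a minimal sketch using urllib.request from the standard library; the function name fetch is hypothetical and the post itself uses wget.

```python
import os
import urllib.request


def fetch(url, dest):
    """Download url to the local path dest and return the file size in bytes.

    Hypothetical helper; the original post downloads with wget instead.
    """
    urllib.request.urlretrieve(url, dest)
    return os.path.getsize(dest)


# Example call (performs a network request, so it is left commented out):
# fetch(
#     "https://raw.githubusercontent.com/korquad/korquad.github.io/master/dataset/KorQuAD_v1.0_train.json",
#     "KorQuAD_v1.0_train.json",
# )
```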

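# python_file.py
import json

# Read the file as UTF-8 explicitly instead of relying on the locale default.
with open('KorQuAD_v1.0_train.json', 'r', encoding='utf-8') as f:
    train_data = json.load(f)

# Flatten the per-topic paragraph lists into a single list of paragraphs.
train_data = [item for topic in train_data['data'] for item in topic['paragraphs']]

print(train_data[0:10])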

jeongsj@ubuntu:~$ python python_file.py
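
A slightly more defensive version of the loading step can make a bad download fail with a clearer message. The sketch below assumes only the structure already used in this post (a top-level 'data' list whose items each have 'paragraphs'); the function name load_paragraphs is hypothetical.

```python
import json


def load_paragraphs(path):
    """Load a KorQuAD-style JSON file and return a flat list of paragraphs.

    Hypothetical helper; assumes a top-level 'data' list of topics,
    each with a 'paragraphs' list, as in the script in this post.
    """
    # Explicit encoding so the result does not depend on the system locale.
    with open(path, "r", encoding="utf-8") as f:
        dataset = json.load(f)
    if not isinstance(dataset, dict) or "data" not in dataset:
        raise ValueError(f"{path} does not look like a KorQuAD file: no 'data' key")
    return [item for topic in dataset["data"] for item in topic["paragraphs"]]


# Possible usage:
# train_data = load_paragraphs("KorQuAD_v1.0_train.json")
# print(train_data[0:10])
```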

[After the fix]
