[PyTorch] Tutorial (2)

rkqhwkrn · August 8, 2023

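Using PyTorch torchtext

What is Torchtext?

  • A library from the PyTorch project
  • A data-loading library dedicated to NLP that makes natural-language and text processing simpler and easier

Getting started with Torchtext

1) Environment setup

  • The exercises were run on Google Colab
  • To load a Torchtext dataset (e.g. WikiText-2), torchdata must be installed
  • The torch, torchtext, and torchdata versions must be compatible with one another
  • A few additional modules must also be installed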
pip install torch==1.12.0
pip install folium==0.2.1
pip install torchdata==0.4.0
pip install torchtext==0.13.0
pip install 'portalocker>=2.0.0'
  • How to check the installed versions
pip show torch
pip show torchdata
pip show torchtext
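
The same check can also be done from a notebook cell. The following is a minimal sketch, not part of the original tutorial: it compares the installed versions against the versions pinned above using only the standard library. The names PINNED, installed_version, and check_pins are hypothetical helpers introduced for illustration, and stripping a local suffix such as "+cu113" is an assumption about how some torch builds report their version.

```python
from importlib import metadata

# Versions pinned in the setup step above.
PINNED = {"torch": "1.12.0", "torchdata": "0.4.0", "torchtext": "0.13.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned):
    """Map each package name to (installed, expected, matches)."""
    report = {}
    for name, expected in pinned.items():
        found = installed_version(name)
        # Assumption: some builds append a local suffix (e.g. "+cu113");
        # compare only the part before "+".
        public = found.split("+")[0] if found else None
        report[name] = (found, expected, public == expected)
    return report


for name, (found, expected, ok) in check_pins(PINNED).items():
    print(f"{name}: installed={found} expected={expected} match={ok}")
```

If any line reports match=False, reinstalling that package with the pinned version is probably the first thing to try before loading a dataset.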

2) Loading the dataset

  • Load the WikiText-2 dataset
  • The split option lets you fetch only the part you need
from torchtext.datasets import WikiText2

# Load only the training split
train = WikiText2(split='train')

# Print the first five lines of the split
for i, text in enumerate(train):
    if i == 5:
        break
    print(text)

Output

 = Valkyria Chronicles III = 

 

 Senjō no Valkyria 3 : <unk> Chronicles ( Japanese : 戦場のヴァルキュリア3 , lit . Valkyria of the Battlefield 3 ) , commonly referred to as Valkyria Chronicles III outside Japan , is a tactical role @-@ playing video game developed by Sega and Media.Vision for the PlayStation Portable . Released in January 2011 in Japan , it is the third game in the Valkyria series . <unk> the same fusion of tactical and real @-@ time gameplay as its predecessors , the story runs parallel to the first game and follows the " Nameless " , a penal military unit serving the nation of Gallia during the Second Europan War who perform secret black operations and are pitted against the Imperial unit " <unk> Raven " . 

 The game began development in 2010 , carrying over a large portion of the work done on Valkyria Chronicles II . While it retained the standard features of the series , it also underwent multiple adjustments , such as making the game more <unk> for series newcomers . Character designer <unk> Honjou and composer Hitoshi Sakimoto both returned from previous entries , along with Valkyria Chronicles II director Takeshi Ozawa . A large team of writers handled the script . The game 's opening theme was sung by May 'n . 

As shown above, the first few lines of the train split are printed. The <unk> token stands for "unknown token", i.e. a word that is not in the dataset's vocabulary.
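
To get a feel for how often <unk> appears, one can count it directly. This is a rough sketch rather than anything from the original post: it assumes the text is already tokenized by whitespace, as the printed lines suggest, and it runs on a shortened copy of one output line instead of the real dataset iterator. The helper name unk_stats is hypothetical.

```python
# A shortened copy of one line from the output above; in practice the lines
# would come from iterating over the dataset.
sample = (
    "Senjō no Valkyria 3 : <unk> Chronicles ( Japanese : "
    "戦場のヴァルキュリア3 , lit . Valkyria of the Battlefield 3 )"
)


def unk_stats(line, unk="<unk>"):
    """Return (total tokens, number of unknown tokens) for one line."""
    tokens = line.split()  # assumption: tokens are separated by whitespace
    n_unk = sum(1 for token in tokens if token == unk)
    return len(tokens), n_unk


total, unknown = unk_stats(sample)
print(f"tokens={total} unk={unknown}")
```

Applying the same function to every line yielded by the train iterator would give a dataset-wide count.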
