
Masked Language Modeling (MLM)
We use the pipeline API provided by the Transformers library.
The full code is available on GitHub:
git clone https://github.com/MachuEngine/BERT-TextAnalysis.git
from transformers import pipeline

def fill_mask():
    """
    Demo of a masked language model (BERT) predicting a masked token.
    - Uses the bert-base-multilingual-cased model
    """
    mask_filler = pipeline("fill-mask", model="bert-base-multilingual-cased")
    masked_text = "I drank [MASK] today."
    predictions = mask_filler(masked_text)

    print(f"Input: {masked_text}")
    print("Predictions:")
    for pred in predictions:
        print(
            f"- {pred['sequence']} "
            f"(score={pred['score']:.4f}, token={pred['token_str']})"
        )
Input: I drank [MASK] today.
Predictions:
- I drank it today. (score=0.1909, token=it)
- I drank you today. (score=0.0367, token=you)
- I drank things today. (score=0.0223, token=things)
- I drank water today. (score=0.0188, token=water)
- I drank in today. (score=0.0178, token=in)
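The scores shown above are softmax probabilities over the model's vocabulary at the [MASK] position. As a minimal sketch (the logit values below are hypothetical, chosen only to illustrate the computation):

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable:
    subtracting the max does not change the result)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for five candidate tokens at the [MASK] position
logits = [2.1, 0.4, -0.1, -0.3, -0.35]
probs = softmax(logits)
print([round(p, 4) for p in probs])
```

The probabilities always sum to 1, so a low top score (like 0.1909 here) means the model spread its probability mass over many plausible tokens.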