Hugging Face: Tasks

InSung-Na · May 8, 2023

Part 12. Natural Language Processing

This post was written with reference to Zerobase Data School course materials.

HuggingFace

https://github.com/huggingface/transformers
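A startup in the NLP field
A module that provides a variety of transformer models (transformers.models) and a training API (transformers.Trainer)
Lets developers build and deploy natural language processing applications and services quickly and efficiently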

Pipeline

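A series of processes connected in a linear or sequential manner to perform a specific task or achieve a specific goal
Chains three stages: preprocess -> model -> postprocess
The model is downloaded the first time the pipeline runs
pipeline(task, model, config, tokenizer, ...)
- task : the task you want to perform (e.g. 'sentiment-analysis', 'zero-shot-classification', ...)
- model : the model to use (default = a model suited to the task is assigned)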
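
The three stages can be sketched by hand for a sentiment classifier. This is only an illustration under assumptions, not the library's internal implementation: the helper names (`softmax`, `postprocess`, `classify_manually`) are hypothetical, and the checkpoint name is assumed to be the default sentiment model.

```python
import math

def softmax(logits):
    """Turn raw model scores (logits) into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits, id2label):
    """Postprocess stage: pick the most probable label and report its score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

def classify_manually(text, model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    """Run the three stages by hand. Downloads the model on first call.

    The checkpoint name is an assumption; any sequence-classification model works.
    """
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    inputs = tokenizer(text, return_tensors="pt")       # 1. preprocess: text -> token ids
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()     # 2. model: token ids -> logits
    return postprocess(logits, model.config.id2label)   # 3. postprocess: logits -> label
```

Calling `classify_manually("I love this!")` should return a dictionary with the same `label` and `score` keys that the pipeline produces, which makes it easier to see what `pipeline()` is bundling together.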

!pip install transformers
!pip install datasets

from transformers import pipeline

Sentiment analysis

Determines whether a sentence is positive or negative

classifier = pipeline('sentiment-analysis')
classifier.model
classifier("I've been waiting for a HuggingFace course my whole life.")
-------------------------------------------------------
[{'label': 'POSITIVE', 'score': 0.9598049521446228}]

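Zero-shot classification

Zero-shot learning : machine learning in which a model is trained to recognize objects or concepts it has not seen before
Zero-shot classification : classifying into classes the model has not seen before, without any explicit training on those classes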

classifier = pipeline("zero-shot-classification")   # Default Model = facebook/bart-large-mnli
classifier(
    "This is a course about the transformers library",        # input text
    candidate_labels=["education", "politics", "business"],    # candidate labels
)
-----------------------------------------------------------------------------------------------------------------
{'sequence': 'This is a course about the transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.9192408919334412, 0.060778193175792694, 0.01998087950050831]}

classifier(
    "This is a course about the transformers library",        # input text
    candidate_labels=["course", "library", "game", "This"],    # candidate labels
)
----------------------------------------------------------------
{'sequence': 'This is a course about the transformers library',
 'labels': ['course', 'library', 'This', 'game'],
 'scores': [0.732907235622406,
  0.19588284194469452,
  0.06776876002550125,
  0.003441136097535491]}
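
The returned dictionary can be read as follows. This is a small sketch, not part of the original notes: `best_label` and `classify_independently` are hypothetical helper names, and the `result` value is copied from the first call above. With the default settings the candidate labels compete with each other, so the scores add up to roughly 1.

```python
def best_label(result):
    """Return the (label, score) pair that the pipeline ranked first."""
    # 'labels' and 'scores' are parallel lists sorted by descending score.
    return result["labels"][0], result["scores"][0]

# Output copied from the first zero-shot call above.
result = {
    "sequence": "This is a course about the transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.9192408919334412, 0.060778193175792694, 0.01998087950050831],
}

label, score = best_label(result)
total = sum(result["scores"])  # close to 1.0 because the labels compete
print(label, round(score, 3), round(total, 3))

def classify_independently(classifier, text, labels):
    """Score each label on its own (expects a zero-shot pipeline object)."""
    # With multi_label=True every label gets an independent score in [0, 1],
    # so the scores no longer have to sum to 1.
    return classifier(text, candidate_labels=labels, multi_label=True)
```

`multi_label=True` is worth trying when several labels can be true at once, for example a text that is about both "education" and "business".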

Text generation

Generates the text that follows the given prompt

generator = pipeline("text-generation")	# Default Model = gpt2
generator("In this course, we will teach you how to ")
----------------------------------------------------------------
[{'generated_text': 'In this course, we will teach you how to \xa0create simple, beautiful, dynamic design diagrams, and how to create them with a variety of basic software tools. We will make use of our favorite tools like Sketch, L.A. Sketch'}]

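Tuning the output

num_return_sequences : number of sequences to generate
max_length : maximum length of each sequence, in tokens (the prompt counts toward it)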

generator("In this course, we will teach you how to ", num_return_sequences=5, max_length=20)
---------------------------------------------------------------------
[{'generated_text': 'In this course, we will teach you how to \xa0communicate with fellow listeners.\n2'},
 {'generated_text': 'In this course, we will teach you how to \xa0install the new web application for PHP 5'},
 {'generated_text': 'In this course, we will teach you how to \xa0understand the most basic \xa0of'},
 {'generated_text': 'In this course, we will teach you how to \xa0compete with the enemy. By taking'},
 {'generated_text': 'In this course, we will teach you how to \xa0explicitly use the\xa0cargo'}]

list_ = ["In this course, we will teach you how to ", "This is a course about the transformers library"]

for sentence in list_:
    print(generator(sentence, num_return_sequences=1, max_length=20))
--------------------------------------------------------------------------------------------
[{'generated_text': 'In this course, we will teach you how to \xa0help others get involved in social media.'}]
[{'generated_text': 'This is a course about the transformers library.\n\nThis is part of the Meej'}]
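
Two practical details show up in the outputs above: the result repeats the prompt, and the text changes on every run because the default model samples. A minimal sketch of handling both, assuming hypothetical helper names (`continuation`, `generate_reproducibly`) and an arbitrary seed value:

```python
def continuation(prompt, generated_text):
    """Return only the newly generated part, with non-breaking spaces normalized."""
    # By default the pipeline returns the prompt followed by the new text.
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.replace("\xa0", " ").strip()

prompt = "In this course, we will teach you how to "
# Output copied from the loop above.
outputs = [{"generated_text": "In this course, we will teach you how to \xa0help others get involved in social media."}]

new_text = continuation(prompt, outputs[0]["generated_text"])
print(new_text)

def generate_reproducibly(generator, prompt, seed=42):
    """Fix the random seed before sampling (expects a text-generation pipeline)."""
    from transformers import set_seed

    set_seed(seed)  # the seed value itself is arbitrary
    # max_new_tokens bounds only the generated part, unlike max_length,
    # which also counts the prompt tokens. return_full_text=False asks the
    # pipeline to leave the prompt out of the result.
    return generator(prompt, max_new_tokens=20, return_full_text=False)
```

Seeding should make repeated runs on the same machine and library version produce the same text, which helps when comparing parameter settings.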

Using a model from the Hugging Face site

Set filters such as Tasks and Libraries to find the model you want
Check the usage described on the model's page and apply it

generator = pipeline("text-generation", model="huggingtweets/dril")
generator("My dream is ", num_return_sequences=5)
-------------------------------------------------------------------------------------------------
[{'generated_text': 'My dream is ive invented. ive invented what is basically the most popular movie ever made and I need over $10,000 to make it go away. Thank you.'},
 {'generated_text': 'My dream is ive been to be the next "Powerball jack"'},
 {'generated_text': 'My dream is ive come to the realization that i have the power of unlimited consciousness. I would get a brain if i could simply convince all the guys in the house where i live to stop smoking pot that i can still tell the difference between a man and a woman and I would become completely Normal'},
 {'generated_text': "My dream is ive gotten over 1000 jobs. That's what my self believes. Ive fucked over 1000 people"},
 {'generated_text': 'My dream is \ue001� That the \ue003� of humanity \ue006nh is To See The \ue006nht That Is As The \ue001� Of My Dream.'}]

Models can also be run directly on the site

Mask Filling

Predicts the word that belongs in <mask>

unmasker = pipeline("fill-mask")	# Default Model = distilroberta-base
unmasker("This coures will teach you all about <mask> models", top_k=5)
---------------------------------------------------------------------------------------------------
[{'score': 0.040895454585552216,
  'token': 745,
  'token_str': ' building',
  'sequence': 'This coures will teach you all about building models'},
 {'score': 0.03127061203122139,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This coures will teach you all about mathematical models'},
 {'score': 0.025371771305799484,
  'token': 774,
  'token_str': ' role',
  'sequence': 'This coures will teach you all about role models'},
 {'score': 0.01844116672873497,
  'token': 265,
  'token_str': ' business',
  'sequence': 'This coures will teach you all about business models'},
 {'score': 0.015211271122097969,
  'token': 3034,
  'token_str': ' computer',
  'sequence': 'This coures will teach you all about computer models'}]
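
The mask token is not the same for every model: RoBERTa-style models such as distilroberta-base use `<mask>`, while BERT-style models use `[MASK]`. The sketch below builds the prompt from whatever token the loaded model expects. `mask_word` and `fill` are hypothetical helper names, and the example sentence is made up for illustration.

```python
def mask_word(sentence, word, mask_token):
    """Replace the first occurrence of `word` with the model's mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, mask_token, 1)

sentence = "This course will teach you all about language models"
roberta_prompt = mask_word(sentence, "language", "<mask>")  # RoBERTa-style token
bert_prompt = mask_word(sentence, "language", "[MASK]")     # BERT-style token
print(roberta_prompt)
print(bert_prompt)

def fill(unmasker, sentence, word, top_k=5):
    """Mask `word` with the loaded model's own token (expects a fill-mask pipeline)."""
    prompt = mask_word(sentence, word, unmasker.tokenizer.mask_token)
    # Keep just the predicted word and its score from each candidate.
    return [(r["token_str"].strip(), r["score"]) for r in unmasker(prompt, top_k=top_k)]
```

Reading the token from `unmasker.tokenizer.mask_token` avoids hard-coding `<mask>` when switching to a different checkpoint.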

Grouped entities

Finds the entity class of words in a sentence, including words the model was not explicitly trained on
Sylvain: Person, Hugging Face: Organization, Brooklyn: Location

ner = pipeline("ner", grouped_entities=True) # Default Model = dbmdz/bert-large-cased-finetuned-conll03-english
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
--------------------------------------------------------------------------------
[{'entity_group': 'PER',
  'score': 0.9981694,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.9796019,
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': 0.9932106,
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
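
The `start` and `end` values are character offsets into the input string, so each entity can be sliced straight out of the original text. A short sketch using the output above (scores omitted for brevity; `spans_by_group` is a hypothetical helper name):

```python
text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# Output copied from the call above, without the scores.
entities = [
    {"entity_group": "PER", "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "word": "Brooklyn", "start": 49, "end": 57},
]

def spans_by_group(text, entities):
    """Map each entity group to the exact substrings found in `text`."""
    found = {}
    for ent in entities:
        # Python slicing excludes `end`, which matches the pipeline's convention.
        found.setdefault(ent["entity_group"], []).append(text[ent["start"]:ent["end"]])
    return found

groups = spans_by_group(text, entities)
print(groups)
```

Slicing by offset is safer than searching for `word`, since the same word can appear more than once in a sentence. In recent versions of transformers, `aggregation_strategy="simple"` appears to be the preferred way to request the grouping that `grouped_entities=True` asks for here.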

QnA

Context : I went home. I was hungry, so I ate a hamburger.
Question : Who ate the hamburger?
Answer : I

question_answer = pipeline("question-answering") # Default Model = distilbert-base-cased-distilled-squad
question_answer(
    question="what's my name?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn.",
)
-------------------------------------------
{'score': 0.9988495111465454, 'start': 11, 'end': 18, 'answer': 'Sylvain'}

Summarization

Limitation : the summary tends to be built by extracting parts of the long text

summarizer = pipeline("summarization") # Default Model = sshleifer/distilbart-cnn-12-6
summarizer(
    """
    National Commercial Bank (NCB), 
    Saudi Arabia’s largest lender by assets, 
    agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.
    NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, 
    valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, 
    at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework 
    agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and 
    about 24% higher than the level the shares traded at before the talks were made public. Bloomberg 
    News first reported the merger discussions.The new bank will have total assets of more than $220 billion, 
    creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches 
    that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of 
    assets.
    """
)
---------------------------------------------------------------------------------------------------------------------------
[{'summary_text': " Saudi Arabia's largest lender National Commercial Bank agrees to buy rival Samba Financial Group for $15 billion . NCB will pay 28.45 riyals ($7.58) for each Samba share, valuing it at about 55.7 billion . The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender ."}]