241025 TIL NLTK

윤수용 · October 25, 2024


NLTK (Natural Language Toolkit)

NLTK

A Python library for natural language processing

Key features

Tokenization: splits text into tokens (words and punctuation marks) so it can be analyzed

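# Requires the tokenizer data, e.g. nltk.download('punkt')
# (newer NLTK releases use 'punkt_tab' instead)
from nltk.tokenize import word_tokenize

text = "This is an example sentence."
tokens = word_tokenize(text)  # split the sentence into word and punctuation tokens
print(tokens)   # ['This', 'is', 'an', 'example', 'sentence', '.']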

Stopword removal: filters out words that carry little meaning

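# Requires the stopword lists, e.g. nltk.download('stopwords')
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
# `tokens` is the list produced by word_tokenize above; compare in lowercase
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
print(filtered_tokens)  # ['example', 'sentence', '.']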

Stemming and lemmatization: reduce a word to its stem or its base (dictionary) form
- Stemming: simply strips suffixes
- Lemmatization: derives the base form, taking the word's grammatical role into account
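
The difference between the two is easier to see side by side. The snippet below is a minimal sketch, not part of the original notes: it assumes NLTK's `PorterStemmer` and `WordNetLemmatizer`, and the word list is made up for illustration. The lemmatizer also assumes the WordNet data has already been downloaded (for example with `nltk.download('wordnet')`), so that part is guarded.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Illustrative words (an assumption, not from the original post)
words = ["running", "studies", "flies"]

# Stemming: rule-based suffix stripping; the result need not be a real word.
stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in words]
print(stems)  # ['run', 'studi', 'fli']

# Lemmatization: dictionary lookup that takes the part of speech into account.
# pos="v" tells the lemmatizer to treat each word as a verb.
try:
    lemmatizer = WordNetLemmatizer()
    lemmas = [lemmatizer.lemmatize(w, pos="v") for w in words]
    print(lemmas)  # ['run', 'study', 'fly']
except LookupError:
    # Raised when the WordNet corpus is not installed locally.
    lemmas = None
    print("WordNet data not found; run nltk.download('wordnet') first.")
```

Stemming is quick and needs no extra data, but it can produce non-words such as `studi`. Lemmatization returns real dictionary forms, at the cost of needing the WordNet corpus and a part-of-speech hint.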
