📕Week2 day4 (Web Scraping Basics)

๋ฐ•์ค€ํฌยท2023๋…„ 8์›” 31์ผ

Programmers


์‹œ๊ฐํ™”๋กœ ๊ฒฐ๊ณผ ์š”์•ฝํ•˜๊ธฐ -Seaborn


์‹œ๊ฐํ™” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ seaborn

seaborn์€ ํŒŒ์ด์ฌ์˜ ๋ฐ์ดํ„ฐ ์‹œ๊ฐํ™” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ˆ˜๋ คํ•œ ๊ทธ๋ž˜ํ”„๋ฅผ ๊ทธ๋ฆด ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  • Line Plot
    A graph that connects data points with a line to show how one variable changes with another. You can draw one with sns.lineplot().
import seaborn as sns

sns.lineplot(x=[1, 3, 2, 4], y=[4, 3, 2, 1])

์œ„์˜ ๋ช…๋ น์–ด๋ฅผ ์ž‘์„ฑํ•˜๋ฉด ๋ฐ‘์— ๊ทธ๋ž˜ํ”„๊ฐ€ ๋‚˜์˜ต๋‹ˆ๋‹ค.

  • Bar Plot
    A chart that shows each value of categorical data as a rectangle whose size reflects that value.
    You can draw one with sns.barplot().
sns.barplot(x=[1,2,3,4],y=[0.7,0.2,0.1,0.05])
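`sns.barplot` aggregates its input rather than drawing raw values. When a category appears more than once, the bar height is an estimate over that category's values, which is the mean by default (the `estimator` parameter), with an error bar drawn on top. The hypothetical helper below reproduces only the height calculation in plain Python and assumes the default mean estimator.

```python
from collections import defaultdict
from statistics import mean

def bar_heights(xs, ys, estimator=mean):
    """Group y values by category and reduce each group with `estimator`.

    This mirrors the idea behind sns.barplot's default (estimator=mean);
    it does not compute the error bars seaborn also draws.
    """
    groups = defaultdict(list)
    for category, value in zip(xs, ys):
        groups[category].append(value)
    return {category: estimator(values) for category, values in groups.items()}

# One value per category: each bar is just that value.
print(bar_heights([1, 2, 3, 4], [0.7, 0.2, 0.1, 0.05]))
# Repeated categories: the bar shows the mean of the group.
print(bar_heights(["a", "a", "b"], [1, 3, 5]))  # {'a': 2, 'b': 5}
```

In the post's example every category occurs once, so the bars simply match the y values.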

Plot์˜ ์†์„ฑ


seaborn์€ ํŒŒ์ด์ฌ์˜ ์‹œ๊ฐํ™” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ matplotlib์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋งŒ๋“ค์–ด์กŒ์Šต๋‹ˆ๋‹ค.
matplotlib.pyplot์˜ ์†์„ฑ์„ ๋ณ€๊ฒฝํ•ด์„œ ๊ทธ๋ž˜ํ”„์— ๋‹ค์–‘ํ•œ ์š”์†Œ๋ฅผ ๋ณ€๊ฒฝ/์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

import matplotlib.pyplot as plt
import seaborn as sns

# plt.figure(figsize=(width, height)): sets the size of the graph (in inches)
plt.figure(figsize=(20, 10))  # must be called before drawing the plot

# plt.title(): adds a title to the graph
sns.barplot(x=[1, 2, 3, 4], y=[0.7, 0.2, 0.1, 0.05])
plt.title("Bar Plot")

# plt.xlabel() / plt.ylabel(): add a description to each axis
plt.xlabel("this is x-label")
plt.ylabel("this is y-label")

# plt.xlim() / plt.ylim(): set the range of each axis
plt.ylim(0, 1)  # keeps the bars (heights 0.05 to 0.7) in view

plt.show()
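`plt.ylim(bottom, top)` only changes the visible range of the y-axis and does not rescale the data. With bar heights between 0.05 and 0.7, a range such as (2, 3) leaves every bar outside the view. The helper below is a hypothetical check that is not part of matplotlib. It tests whether a chosen range can show the data at all.

```python
def visible_bars(heights, ylim):
    """Return the indices of bars whose span [0, height] overlaps the y-range.

    Hypothetical helper for sanity-checking plt.ylim(bottom, top);
    it only does interval arithmetic and never touches matplotlib.
    """
    bottom, top = ylim
    visible = []
    for i, h in enumerate(heights):
        low, high = min(0, h), max(0, h)  # a bar is drawn from 0 up (or down) to h
        if low < top and high > bottom:   # the two intervals overlap
            visible.append(i)
    return visible

heights = [0.7, 0.2, 0.1, 0.05]
print(visible_bars(heights, (2, 3)))  # [] -> nothing to see
print(visible_bars(heights, (0, 1)))  # [0, 1, 2, 3]
```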

Wordcloud


wordcloud ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ
wordcloud is a Python library for drawing text clouds (word clouds). konlpy is a Korean morphological analyzer library that can extract nouns and other parts of speech from a given sentence.

A word cloud is made in three steps:

  1. Preprocess the Korean sentences with the KoNLPy library
  2. Count word frequencies with Counter
  3. Visualize them with WordCloud
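Steps 1 and 2 can be sketched without KoNLPy, which needs a Java runtime. The `toy_nouns` function below is a hypothetical stand-in for `Hannanum().nouns()`. It only splits on whitespace and strips punctuation, so it is far cruder than real morphological analysis. The length filter and the `Counter` call mirror the `len(noun) > 1` filter and counting used in this post.

```python
import string
from collections import Counter

def toy_nouns(sentence):
    """Hypothetical stand-in for Hannanum().nouns(): whitespace split + punctuation strip.

    A real morphological analyzer would also drop particles and verbs;
    this toy version keeps every token.
    """
    return [token.strip(string.punctuation) for token in sentence.split()]

def word_frequencies(sentences, min_len=2):
    """Step 1 (tokenize) and step 2 (count) of the word-cloud pipeline."""
    words = []
    for sentence in sentences:
        # Keep tokens of at least `min_len` characters, like `len(noun) > 1`.
        words += [w for w in toy_nouns(sentence) if len(w) >= min_len]
    return Counter(words)

counts = word_frequencies(["python seaborn plot", "python wordcloud, python!"])
print(counts.most_common(2))  # [('python', 3), ('seaborn', 1)]
```

Step 3 then passes a mapping like `counts` to `WordCloud.generate_from_frequencies`. A `Counter` is a `dict` subclass, so it works there directly.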

# ์‹œ๊ฐํ™”์— ์“ฐ์ด๋Š” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# ํšŸ์ˆ˜๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋”•์…”๋„ˆ๋ฆฌ ์ƒ์„ฑ
from collections import Counter

# ๋ฌธ์žฅ์—์„œ ๋ช…์‚ฌ๋ฅผ ์ถ”์ถœํ•˜๋Š” ํ˜•ํƒœ์†Œ ๋ถ„์„ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ
from konlpy.tag import Hannanum

national_anthem ="์›Œ๋“œํด๋ผ์šฐ๋“œ ๋งŒ๋“œ๋Š”๋ฐ ์‚ฌ์šฉํ•  ๋ฌธ์žฅ"

# Hannanum ๊ฐ์ฒด๋ฅผ ์ƒ์„ฑํ•œ ํ›„, .nouns()๋ฅผ ํ†ตํ•ด ๋ช…์‚ฌ๋ฅผ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.

hannanum = Hannanum()
nouns = hannanum.nouns(national_anthem)
words = [noun for noun in nouns if len(noun) > 1]

counter = Counter(words)

wordcloud = WordCloud(
    font_path = "C:/Windows/Fonts/gulim.ttc",
    background_color = "white",
    width = 1000,
    height = 1000,
)

img = wordcloud.generate_from_frequencies(counter)
plt.imshow(img)
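My reading of the wordcloud source is that `generate_from_frequencies` keeps at most `max_words` of the most frequent words (default 200) and divides each frequency by the largest one before it sizes the fonts. The sketch below reproduces only that normalization idea in plain Python. The function name is hypothetical and the counts are made up. The real font sizing and layout are more involved, for example through `relative_scaling`.

```python
from collections import Counter

def normalize_frequencies(counter, max_words=200):
    """Sketch of turning raw counts into relative weights for a word cloud.

    Keeps the `max_words` most frequent words and divides each count by the
    largest one, so the most frequent word gets weight 1.0. This is a
    simplified illustration, not wordcloud's actual implementation.
    """
    top = counter.most_common(max_words)
    if not top:
        raise ValueError("need at least one word")
    max_count = top[0][1]
    return {word: count / max_count for word, count in top}

# Made-up counts for illustration
weights = normalize_frequencies(Counter({"python": 4, "error": 2, "list": 1}))
print(weights)  # {'python': 1.0, 'error': 0.5, 'list': 0.25}
```

Only the ratios between counts matter here, so doubling every count would produce the same cloud.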

Using the lyrics of the Korean national anthem (Aegukga) as input produces a word cloud of its most frequent nouns.

๋‹ค์Œ์€ ์ง์ ‘ ์›น์‚ฌ์ดํŠธ๋ฅผ ์Šคํฌ๋ž˜ํ•‘ํ•ด์„œ ๊ทธ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์‹œ๊ฐํ™”๋ฅผ ํ•ด๋ณด์•˜๋‹ค.

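import time
from collections import Counter

import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup
from konlpy.tag import Hannanum
from wordcloud import WordCloud

# Placeholder User-Agent; replace it with your own browser's value
user_agent = "Mozilla/5.0"
questions = []

# Collect question titles from pages 1-5 of the Q&A list
for i in range(1, 6):
    res = requests.get("https://hashcode.co.kr/?page={}".format(i), headers={"User-Agent": user_agent})
    soup = BeautifulSoup(res.text, "html.parser")

    parsed_datas = soup.find_all("li", "question-list-item")
    for data in parsed_datas:
        questions.append(data.h4.text.strip())
    time.sleep(0.5)  # pause between requests so the server is not overloaded

words = []
hannanum = Hannanum()

for question in questions:
    nouns = hannanum.nouns(question)  # nouns from this question title
    words += nouns  # accumulate nouns across all questions

counter = Counter(words)
wordcloud = WordCloud(
    font_path="C:/Windows/Fonts/gulim.ttc",  # a font with Hangul glyphs (Windows path)
    background_color="white",
    width=1000,
    height=1000,
)

img = wordcloud.generate_from_frequencies(counter)
plt.imshow(img)
plt.axis("off")
plt.show()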
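The parsing step can also be tried offline with only the standard library. The sketch below uses `html.parser.HTMLParser` in place of BeautifulSoup to collect the text of each `<h4>` inside an `<li class="question-list-item">`. The class name `QuestionTitleParser` and the sample HTML are made up. The sample only assumes the structure that the scraping code relies on, and the live site's markup may differ.

```python
from html.parser import HTMLParser

class QuestionTitleParser(HTMLParser):
    """Collect the text of <h4> tags nested in <li class="question-list-item">."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_item = False   # inside a matching <li>
        self._in_title = False  # inside the <h4> of that <li>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "question-list-item" in classes:
            self._in_item = True
        elif tag == "h4" and self._in_item:
            self._in_title = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h4" and self._in_title:
            self.titles.append("".join(self._buffer).strip())
            self._in_title = False
        elif tag == "li":
            self._in_item = False

    def handle_data(self, data):
        # Text may arrive in several chunks (e.g. around <b>), so buffer it.
        if self._in_title:
            self._buffer.append(data)

# Made-up sample markup, shaped like the structure the scraper expects
SAMPLE_HTML = """
<ul>
  <li class="question-list-item"><h4> How do I sort a dict? </h4></li>
  <li class="other-item"><h4>Not a question</h4></li>
  <li class="question-list-item"><h4>Why is my <b>loop</b> slow?</h4></li>
</ul>
"""

parser = QuestionTitleParser()
parser.feed(SAMPLE_HTML)
print(parser.titles)  # ['How do I sort a dict?', 'Why is my loop slow?']
```

BeautifulSoup is the more convenient choice for real pages. This version mainly makes explicit what `find_all("li", "question-list-item")` and `.h4.text.strip()` select.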


💡 I used a visualization library to render an image showing word frequencies, then went a step further and visualized data I crawled myself.
