📚 A Python Analysis & Visualization Project Starting with Kyobo Book Centre (교보문고) Data 🍀

Kim Donghyuk · January 14, 2025

In this post, we'll take Kyobo Book Centre data and use KoNLPy and WordCloud to go from keyword extraction all the way to a word cloud visualization in one pass!
💡 Using only the NanumGothic font can sometimes leave characters broken, so let's set other fonts as fallbacks to keep errors to a minimum!


1️⃣ Preparing the Data 🛍️

📷 Getting data from Kyobo Book Centre

First, let's assume you've exported the category data you want from the Kyobo site to an Excel file. (In practice you can use web crawling, a download, manual collection, or any other method.)

[Image: Kyobo Book Centre screenshot]
(The image above is only an example. Check the data on the actual Kyobo site!)

Put that Excel file in your project folder and save it as 교보문고_카테고리_상품리스트.xlsx.


2️⃣ The Full Code 🎀

Below is the full code, set up to use NanumGothic first and to fall back to other fonts when NanumGothic is missing a glyph. This helps mitigate the "missing from current font" problem.

import os
import pandas as pd
from konlpy.tag import Okt
from collections import Counter
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
import matplotlib.font_manager as fm

# -----------------------------
# 1. Font setup
# -----------------------------
# (1) NanumGothic (preferred)
# (2) If a glyph is missing, fall back to AppleGothic, Malgun Gothic, etc.

# Explicitly register the NanumGothic font file (this path is specific to the author's machine)
nanum_font_path = '/Users/kimdonghyuk/Library/Fonts/NanumGothic-Regular.ttf'
if os.path.exists(nanum_font_path):
    fm.fontManager.addfont(nanum_font_path)
else:
    print(f"⚠️ Font file not found, relying on the fallback fonts: {nanum_font_path}")

# List several fonts in order to mitigate missing-glyph problems
plt.rcParams['font.family'] = [
    'NanumGothic',   # tried first
    'AppleGothic',   # Korean font bundled with macOS (fallback)
    'Malgun Gothic', # default Korean font on Windows (fallback)
    'sans-serif'     # generic sans-serif font
]
plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign from rendering as a broken glyph

print("📢 Font setup complete: checking NanumGothic -> AppleGothic -> Malgun Gothic in that order!")

# -----------------------------
# 2. Load the Excel data
# -----------------------------
file_path = '교보문고_카테고리_상품리스트.xlsx'  # path to the Excel file
df = pd.read_excel(file_path)
print(f"✅ Data loaded: {file_path}")

# -----------------------------
# 3. Preprocessing
# -----------------------------
required_columns = ['상품명', '정가', '판매가', '할인율', '적립율', '적립예정포인트', '출판사']

# Check that every required column is present
missing_cols = [col for col in required_columns if col not in df.columns]
if missing_cols:
    raise KeyError(f"The following columns are missing: {missing_cols}")

# Copy the selection so the assignments below don't act on a view of the original frame
df = df[required_columns].copy()

numeric_columns = ['정가', '판매가', '할인율', '적립율', '적립예정포인트']
for col in numeric_columns:
    # Strip thousands separators and percent signs, then convert to float
    df[col] = (
        df[col].astype(str)
        .str.replace(',', '', regex=False)
        .str.replace('%', '', regex=False)
        .astype(float)
    )

print("✅ Preprocessing complete")

# -----------------------------
# 4. Keyword extraction (KoNLPy)
# -----------------------------
okt = Okt()
stopwords = ['기술', '이론', '기본서', '문제집', '실무', '기출', '필기', '실기']  # words to exclude from the analysis

all_nouns = []
# Skip missing titles so that only strings reach okt.nouns()
for title in df['상품명'].dropna().astype(str):
    nouns = okt.nouns(title)  # extract nouns only
    # Drop stopwords and single-character nouns
    filtered = [n for n in nouns if n not in stopwords and len(n) > 1]
    all_nouns.extend(filtered)

print("✅ Keyword extraction complete")

# -----------------------------
# 5. Frequency analysis
# -----------------------------
word_counts = Counter(all_nouns)
top_keywords = word_counts.most_common(20)
keywords_df = pd.DataFrame(top_keywords, columns=['키워드', '빈도수'])  # columns: keyword, frequency

print("✅ Frequency analysis complete")
print(keywords_df.head(10))

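# -----------------------------
# 6. Visualization
# -----------------------------
# sns.set() resets font.family to seaborn's own default, so pass the
# currently configured font list back in through rc to keep it active.
sns.set(style="whitegrid", rc={
    'font.family': plt.rcParams['font.family'],
    'axes.unicode_minus': False,
})

# (1) Bar chart
plt.figure(figsize=(12, 8))
sns.barplot(x='빈도수', y='키워드', data=keywords_df, palette='viridis')
# Emoji left out of the title: Korean text fonts typically have no emoji glyphs
plt.title('Top 20 Keywords by Frequency')
plt.xlabel('Frequency')
plt.ylabel('Keyword')
plt.show()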

# (2) Word cloud
try:
    wordcloud = WordCloud(
        font_path=nanum_font_path,  # use the NanumGothic font file
        background_color='white',
        width=800,
        height=600
    ).generate_from_frequencies(dict(word_counts))

    plt.figure(figsize=(12, 8))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.title('Keyword Word Cloud')
    plt.show()
except (ValueError, OSError) as e:
    # ValueError: no words to draw; OSError: the font file could not be opened
    print(f"⚠️ Problem while generating the word cloud: {e}")
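
Before relying on the fallback list in the code above, it can help to check which of those font families matplotlib has actually registered on your machine. The following is a minimal sketch and not part of the original script: the helper name `available_families` is hypothetical, and it only compares family names without checking which glyphs each font contains.

```python
import matplotlib.font_manager as fm


def available_families(candidates):
    """Return the candidate font families that matplotlib has registered.

    Hypothetical helper for illustration only; it compares names and does
    not inspect glyph coverage.
    """
    registered = {font.name for font in fm.fontManager.ttflist}
    return [name for name in candidates if name in registered]


# 'DejaVu Sans' ships with matplotlib, so it serves as a sanity check.
candidates = ['NanumGothic', 'AppleGothic', 'Malgun Gothic', 'DejaVu Sans']
found = available_families(candidates)

print("Registered:", found)
print("Not found:", [name for name in candidates if name not in found])
```

If `NanumGothic` shows up under "Not found", the `addfont` call with the font file path has to run first, or the font has to be installed system-wide.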

3️⃣ Walking Through the Code Step by Step 🧐

  1. Font setup

    • If you put several fonts in matplotlib's rcParams['font.family'] as a list, matplotlib (3.6 or later) uses a later font in the list when an earlier one lacks a glyph. This goes a long way toward mitigating the missing from current font warning.
  2. Reading the Excel file

    • pandas reads the Excel file exported from Kyobo Book Centre.
  3. Preprocessing

    • Keep only the required columns, then strip commas (,) and percent signs (%) and convert the values to numbers.
  4. Keyword extraction with KoNLPy

    • The Okt morphological analyzer extracts the nouns, and any noun that appears in the stopword list (stopwords) or is only one character long is dropped.
  5. Frequency analysis & visualization

    • collections.Counter counts keyword frequencies, and a Seaborn bar chart shows the frequencies of the top 20 keywords.
    • WordCloud generates a word cloud so you can grasp the keywords at a glance.
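
To make step 3 concrete, here is a small sketch of the same cleanup on made-up rows. The values are invented for illustration and are not Kyobo data. As an assumption of this sketch, `pd.to_numeric(..., errors='coerce')` is swapped in for the script's plain `astype(float)` so that cells that cannot be parsed become NaN instead of raising an error.

```python
import pandas as pd

# Made-up rows that mimic the exported format (not real Kyobo data).
sample = pd.DataFrame({
    '정가': ['25,000', '18,000', None],   # list price, with a missing cell
    '할인율': ['10%', '5%', '0%'],         # discount rate
})

for col in ['정가', '할인율']:
    cleaned = (
        sample[col]
        .astype(str)
        .str.replace(',', '', regex=False)  # drop thousands separators
        .str.replace('%', '', regex=False)  # drop percent signs
    )
    # errors='coerce' turns anything that is still not a number into NaN.
    sample[col] = pd.to_numeric(cleaned, errors='coerce')

print(sample)
print(sample.dtypes)
```

Whether to coerce or to fail loudly is a judgment call: coercing keeps the pipeline running on messy exports, while `astype(float)` surfaces bad cells immediately.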

4️⃣ Previewing the Results 🔎

  • Bar chart of the top 20 keywords
    Shows at a glance how often each keyword appeared.
  • Word cloud
    A cloud-shaped visualization in which more frequent words are drawn larger, so the popular keywords stand out immediately.
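
The link between frequency and size is easier to see on a tiny example. The sketch below uses only the standard library and starts from already-tokenized nouns, so it needs neither KoNLPy nor Java; the token list is made up for illustration. It applies the same filter as the script and then computes each keyword's weight relative to the most frequent one, which is roughly the kind of relative weight a word cloud turns into font size.

```python
from collections import Counter

# Made-up nouns standing in for the output of okt.nouns() over many titles.
tokens = ['파이썬', '데이터', '분석', '파이썬', '기출', '데이터', '파이썬', '책', '실기']
stopwords = {'기출', '실기'}  # a set keeps membership checks cheap

# Same filter as the script: drop stopwords and single-character nouns.
filtered = [t for t in tokens if t not in stopwords and len(t) > 1]

word_counts = Counter(filtered)
top_keywords = word_counts.most_common(2)
print(top_keywords)  # [('파이썬', 3), ('데이터', 2)]

# Weight of each keyword relative to the most frequent one (1.0 = largest).
max_count = top_keywords[0][1]
relative = {word: count / max_count for word, count in word_counts.items()}
print(relative)
```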

5️⃣ Wrapping Up ✨

In this post we analyzed Kyobo Book Centre data and looked at how to specify fallback fonts to keep Korean font issues to a minimum! 😎

  • If the missing from current font message still appears, deleting the matplotlib cache and

✨
Setting up Korean fonts on my Japanese Mac proved difficult, so I gave up on Korean text, but I still finished the analysis and visualization all the way through!
If you are struggling with font problems, I hope you find a way through, whether by switching to English labels as I did or by installing a Korean font.
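
If you do install a Korean font, matplotlib may keep using a stale font cache. A minimal sketch for locating that cache follows. `matplotlib.get_cachedir()` is a real matplotlib function, while the `fontlist-*.json` file name pattern is an assumption based on recent matplotlib versions and may differ on yours. The sketch only lists files and deletes nothing.

```python
import glob
import os

import matplotlib

# Directory where matplotlib stores its caches, including the font list.
cache_dir = matplotlib.get_cachedir()

# Assumed naming pattern for the font cache file; adjust if yours differs.
font_caches = glob.glob(os.path.join(cache_dir, 'fontlist-*.json'))

print("Cache directory:", cache_dir)
print("Font cache files:", font_caches)
```

Removing the listed file(s) and restarting Python makes matplotlib rebuild its font list the next time it is imported.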

If you have questions or further thoughts, please leave a comment!
I hope this helps with your own analyses too. 🙌

Data source: Kyobo Book Centre (교보문고)
Reference libraries:

Thank you! 💖 Happy data analyzing!
