๐Ÿ—ณ๏ธ ๋„ค์ด๋ฒ„ ๊ฒ€์ƒ‰ API ๋ฐœ๊ธ‰ํ•˜์—ฌ ํ‚ค์›Œ๋“œ ์–ธ๊ธ‰๋Ÿ‰ ์ถ”์ถœ | ์žฅ๋ž˜ ์ •์น˜ ์ง€๋„์ž ํŠธ๋ Œ๋“œ ๋ถ„์„ ํ”„๋กœ์ ํŠธ #3

Tough Cookieยท2025๋…„ 3์›” 12์ผ
0

๐Ÿ’ก JSON vs XML ์ค‘ JSON์„ ์„ ํƒํ•œ ์ด์œ 

  • ๊ฐ€๋…์„ฑ๊ณผ ์‚ฌ์šฉ ํŽธ์˜์„ฑ : JSON์€ {key: value} ํ˜•์‹์ด๋ผ ์‚ฌ๋žŒ์ด ์ฝ๊ณ  ์“ฐ๊ธฐ ์‰ฝ๊ณ , XML๋ณด๋‹ค ๊ตฌ์กฐ๊ฐ€ ๊ฐ„๋‹จํ•จ, Python์—์„œ๋Š” dict๋กœ ์‰ฝ๊ฒŒ ๋ณ€ํ™˜ํ•  ์ˆ˜ ์žˆ์–ด ๋‹ค๋ฃจ๊ธฐ ํŽธ๋ฆฌํ•จ.
  • ๋น ๋ฅธ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ์†๋„ : JSON์€ Python์˜ json.loads()๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ๋ฐ”๋กœ dict๋กœ ๋ณ€ํ™˜๋จ โ†’ ํŒŒ์‹ฑ ์†๋„๊ฐ€ ๋น ๋ฆ„, XML์€ ElementTree ๊ฐ™์€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์จ์•ผ ํ•˜๊ณ , ํƒœ๊ทธ๋ฅผ ํŒŒ์‹ฑํ•ด์•ผ ํ•˜๋ฏ€๋กœ ๋” ๋А๋ฆผ.
  • ๋ฐ์ดํ„ฐ ํฌ๊ธฐ ๋ฐ ๋„คํŠธ์›Œํฌ ํšจ์œจ์„ฑ : JSON์€ ์—†์ด {} ํ˜•์‹์ด๋ผ XML๋ณด๋‹ค ๋ฐ์ดํ„ฐ ํฌ๊ธฐ๊ฐ€ ์ž‘์Œ โ†’ ๋„คํŠธ์›Œํฌ ์ „์†ก ๋น„์šฉ ์ ˆ๊ฐ, ๊ฐ™์€ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณด๋‚ผ ๋•Œ JSON์ด ๋” ๊ฐ€๋ณ๊ธฐ ๋•Œ๋ฌธ์— API ์‘๋‹ต ์†๋„๊ฐ€ ๋” ๋น ๋ฅผ ์ˆ˜ ์žˆ์Œ.

๐Ÿ€ ๋„ค์ด๋ฒ„ ๊ฐœ๋ฐœ์ž์„ผํ„ฐ API ๋ฐœ๊ธ‰


1. ๊ณต์‹ ์‚ฌ์ดํŠธ


2. key ๋ฐœ๊ธ‰ ์‹ ์ฒญ

  • ๋‰ด์Šค์™€ ๋ธ”๋กœ๊ทธ๋Š” '๊ฒ€์ƒ‰' ์นดํ…Œ๊ณ ๋ฆฌ์ด๊ณ , ํ•˜๋ฃจ 25,000ํšŒ ํ˜ธ์ถœ์ด ๊ฐ€๋Šฅํ•œ, ๋น„๋กœ๊ทธ์ธ ์˜คํ”ˆ API ์ž…๋‹ˆ๋‹ค.
  • ์œ„ ์„ค๋ช…์„ ์ฐธ๊ณ ํ•˜์—ฌ API ๋ฐœ๊ธ‰์„ ์œ„ํ•ด์„œ ํ•„์š”ํ•œ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ๋งŒ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค.
  • http://localhost:3000 8080 ํฌํŠธ๋Š” airflow๊ฐ€ ์ฐจ์ง€ํ•˜์˜€์œผ๋ฏ€๋กœ, ํ”ํžˆ ์“ฐ์ด๋Š” 3000ํฌํŠธ๋ฅผ ์ง€์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

3. key ๋ฐœ๊ธ‰ ์™„๋ฃŒ

  • ์œ„์™€ ๊ฐ™์ด ๋ฐœ๊ธ‰๋ฐ›์€ ํ‚ค๋ฅผ docker์˜ settings.py ํŒŒ์ผ์— ํ•˜๋“œ์ฝ”๋”ฉํ• ์ง€, ํ˜น์€ ๋‹ค๋ฅธ ๋” ๋‚˜์€ ๋ฐฉ๋ฒ•์ด ์žˆ๋Š”์ง€ ๊ณ ๋ฏผํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
CLIENT_ID = 'YOUR_CLIENT_ID'  # ๋„ค์ด๋ฒ„ ๊ฐœ๋ฐœ์ž ์„ผํ„ฐ์—์„œ ๋ฐœ๊ธ‰๋ฐ›์€ ํด๋ผ์ด์–ธํŠธ ์•„์ด๋””
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # ๋„ค์ด๋ฒ„ ๊ฐœ๋ฐœ์ž ์„ผํ„ฐ์—์„œ ๋ฐœ๊ธ‰๋ฐ›์€ ํด๋ผ์ด์–ธํŠธ ์‹œํฌ๋ฆฟ

๐Ÿ€ ์ƒ˜ํ”Œ ๋ฐ์ดํ„ฐ ํ™•์ธํ•˜๊ธฐ


1. ๋„ค์ด๋ฒ„ ๊นƒํ—ˆ๋ธŒ api ๋ช…์„ธ์„œ

  • ๋„ค์ด๋ฒ„์˜ ๊ณต์‹ ๊นƒํ—ˆ๋ธŒ์— ๋“ค์–ด๊ฐ€๋ฉด api์˜ ํŒŒ๋ผ๋ฏธํ„ฐ์™€ ์‘๋‹ต ์˜ˆ์‹œ๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๋‰ด์Šค : naver/naver-openapi-guide/ko/service-apis/search/news
  • ๋ธ”๋กœ๊ทธ : naver-openapi-guide/ko/service-apis/search/blog/blog
  • 25๋…„ 3์›” ๊ธฐ์ค€ ์ œ๊ณต๋˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ์ ์–ด ์กฐ์ž‘์„ ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์ด ๋งŽ์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๊ทธ๋ž˜๋„ ํ•˜๋ฃจ 25,000๋ฒˆ ํ˜ธ์ถœ์ด ๊ฐ€๋Šฅํ•˜๋‹ˆ ๋งŽ์€ ๋ฐ์ดํ„ฐ๋ฅผ ๋ชจ์„ ์ˆ˜๋Š” ์žˆ์„ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.
  • ๋„ค์ด๋ฒ„ ๋‰ด์Šค ๋ฐ ๋ธ”๋กœ๊ทธ api๋Š” ํŠน์ • "ํ‚ค์›Œ๋“œ"๋ฅผ ๊ฒ€์ƒ‰ํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ˜ํ™˜ํ•˜๋Š” ๊ฒƒ์ด ํŠน์ง•์ž…๋‹ˆ๋‹ค. ๊ทธ๋ ‡๊ธฐ ๋•Œ๋ฌธ์— ์ €๋Š” ์žฅ๋ž˜ ์ •์น˜ ์ง€๋„์ž ํ›„๋ณด์ž๋“ค์˜ ์ด๋ฆ„์„ ํ‚ค์›Œ๋“œ๋กœ ์„ค์ •ํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

2. ์ƒ˜ํ”Œ ์ฝ”๋“œ

  • 25๋…„๋„ 3์›” ๊ธฐ์ค€ ์˜ˆ์ œ ์ฝ”๋“œ๋Š” ๋ธ”๋กœ๊ทธ ๊ฒ€์ƒ‰์˜ ๊ฒฝ์šฐ๋งŒ ์ œ๊ณต๋˜์–ด ์žˆ์ง€๋งŒ, ๋‰ด์Šค๋„ ๊ฑฐ์˜ ๋™์ผํ•œ ๋ฐฉ์‹์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”์ถœํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
# ๋„ค์ด๋ฒ„ ๊ฒ€์ƒ‰ API ์˜ˆ์ œ - ๋ธ”๋กœ๊ทธ ๊ฒ€์ƒ‰
import os
import sys
import urllib.request
client_id = "YOUR_CLIENT_ID"
client_secret = "YOUR_CLIENT_SECRET"
encText = urllib.parse.quote("๊ฒ€์ƒ‰ํ•  ๋‹จ์–ด")
url = "https://openapi.naver.com/v1/search/blog?query=" + encText # JSON ๊ฒฐ๊ณผ
# url = "https://openapi.naver.com/v1/search/blog.xml?query=" + encText # XML ๊ฒฐ๊ณผ
request = urllib.request.Request(url)
request.add_header("X-Naver-Client-Id",client_id)
request.add_header("X-Naver-Client-Secret",client_secret)
response = urllib.request.urlopen(request)
rescode = response.getcode()
if(rescode==200):
    response_body = response.read()
    print(response_body.decode('utf-8'))
else:
    print("Error Code:" + rescode)

๐Ÿ€ PostgreSQL ํ…Œ์ด๋ธ” ์ฝ”๋“œ ์งœ๊ธฐ


1. ๋‰ด์Šค ํ…Œ์ด๋ธ”

  • ๋‰ด์Šค ๋ฐ์ดํ„ฐ ๋ฐ˜ํ™˜ ์ƒ˜ํ”Œ
  • ๊ณ ๋ ค์‚ฌํ•ญ 1 : original_link๊ฐ€ PK๊ฒฉ์ธ๋ฐ(๊ธฐ์‚ฌ๊ฐ€ ์‚ญ์ œ๋˜๋Š” ๊ฒฝ์šฐ๋„ ์žˆ๊ธฐ ๋•Œ๋ฌธ) ์ค‘๋ณต ๋ฐ์ดํ„ฐ ๋ฐฉ์ง€๋ฅผ ์œ„ํ•ด UNIQUE NOT NULL ์กฐ๊ฑด์ด ๋“ค์–ด๊ฐ€์•ผ ํ•œ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ์ƒ˜ํ”Œ์„ ๋ณด๋‹ˆ NULL๊ฐ’์ด ์žˆ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด Transform ๊ณผ์ •์—์„œ ํ•ด๊ฒฐํ•˜๋„๋ก ํ•˜์ž. ELT๊ฐ€ ์•„๋‹Œ ETL๋กœ ๊ฐ€์•ผ ํ•œ๋‹ค.
  • ๊ณ ๋ ค์‚ฌํ•ญ 2 : title์— ํ›„๋ณด์ž์˜ ์ด๋ฆ„, ์ฆ‰ ํ‚ค์›Œ๋“œ๊ฐ€ ํฌํ•จ๋˜์ง€ ์•Š์€ ๊ฒฝ์šฐ๊ฐ€ ์žˆ๋‹ค. ์ด๊ฒƒ๋„ Transform์—์„œ ๊ฑธ๋Ÿฌ์•ผ๊ฒ ์ง€..?
  • ๊ณ ๋ ค์‚ฌํ•ญ 3 : pubdate๋Š” ๋‰ด์Šค ๋ฐœํ–‰์ผ์ธ๋ฐ, ํ˜•์‹์ด ๋ง˜์— ๋“ค์ง€ ์•Š๋Š”๋‹ค. yyyy-mm-dd ํ˜•์‹์œผ๋กœ ๋ฐ”๊พธ๋„๋ก ํ•œ๋‹ค.
-- naver_news
CREATE TABLE IF NOT EXISTS naver_news (
    kid SERIAL PRIMARY KEY,
    keyword TEXT NOT NULL,
    title TEXT NOT NULL,
    original_link TEXT UNIQUE NOT NULL,  -- ์ค‘๋ณต ๋ฐ์ดํ„ฐ ๋ฐฉ์ง€
    naver_link TEXT UNIQUE NOT NULL,
    description TEXT,
    published_date DATE NOT NULL,
    FOREIGN KEY (keyword) REFERENCES candidate_info(candidate_name) ON DELETE CASCADE
);

2. ๋ธ”๋กœ๊ทธ ํ…Œ์ด๋ธ”

  • ๋ธ”๋กœ๊ทธ ๋ฐ์ดํ„ฐ ๋ฐ˜ํ™˜ ์ƒ˜ํ”Œ
  • ๊ณ ๋ ค์‚ฌํ•ญ 1 : ๋ธ”๋กœ๊ทธ ๋ฐ์ดํ„ฐ๋Š” ์ฒด๊ฐ์ƒ ๋‰ด์Šค๋ณด๋‹ค ์ •ํ™•๋„๊ฐ€ ๋” ๊ดœ์ฐฎ์€ ๊ฒƒ ๊ฐ™๋‹ค. ๋‰ด์Šค๋Š” ์—ฌ๋Ÿฌ ์‹ ๋ฌธ์‚ฌ ์›น์‚ฌ์ดํŠธ ์ •๋ณด๋ฅผ ๊ฐ€์ ธ์™”์ง€๋งŒ ๋ธ”๋กœ๊ทธ๋Š” ๋„ค์ด๋ฒ„ ์˜ค๋ฆฌ์ง€๋‚ ์ด๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. ๊ทธ๋ ‡์ง€๋งŒ ํ˜น์‹œ ๋ชฐ๋ผ title์— ํ‚ค์›Œ๋“œ ํฌํ•จ์—ฌ๋ถ€๋ฅผ ํ™•์ธํ•˜๊ณ  ์ ์žฌํ•˜๋„๋ก ํ•œ๋‹ค.
  • ๊ณ ๋ ค์‚ฌํ•ญ 2 : ๋ธ”๋กœ๊ทธ ์‚ฌ์šฉ์ž๋“ค์ด ์ด๋ชจ์ง€๋ฅผ ์จ์„œ ์ž๋ฃŒํ˜•์ด TEXT์ธ ์ปฌ๋Ÿผ์— ์ด๋ชจ์ง€๊ฐ€ ๋“ค์–ด๊ฐ„๋‹ค. GPT์—๊ฒŒ ๋ฌผ์–ด๋ณธ ๊ฒฐ๊ณผ PostgreSQL์˜ TEXT ์ž๋ฃŒํ˜•์€ UTF-8์„ ๊ธฐ๋ณธ ์ธ์ฝ”๋”ฉ์œผ๋กœ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ชจํ‹ฐ์ฝ˜์„ ์ •์ƒ์ ์œผ๋กœ ์ €์žฅํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ํ•œ๋‹ค. ๋‹คํ–‰
  • ๊ณ ๋ ค์‚ฌํ•ญ 3 : ๋‚ ์งœ ํ˜•์‹
-- naver_blog
CREATE TABLE IF NOT EXISTS naver_blog (
    kid SERIAL PRIMARY KEY,
    keyword TEXT NOT NULL,
    title TEXT NOT NULL,
    link TEXT UNIQUE NOT NULL,  -- ์ค‘๋ณต ๋ฐ์ดํ„ฐ ๋ฐฉ์ง€
    description TEXT,
    blogger_name TEXT,
    blogger_link TEXT,
    post_date DATE NOT NULL,
    FOREIGN KEY (keyword) REFERENCES candidate_info(candidate_name) ON DELETE CASCADE
);
profile
Data Engineering

0๊ฐœ์˜ ๋Œ“๊ธ€