Dev Log 31 (needs revision) - Resolving the TV Series and Drama Types

tk7580 · June 26, 2025

Personal Project - Web App Developer Training Course Using Public Data

Dev Log 30: Redefining the Work Type Classification Scheme and Revising the Script Logic

Date

June 26, 2025

Author

(user nickname)

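1. Problem Definition: The Ambiguity of the 'TV Series' and 'Drama' Types

The existing TV Series type had blurry boundaries, which caused confusion in data classification. For example, a talk show such as the 'Conan' show, a narrative-driven drama such as 'Game of Thrones', and an animated series such as 'JoJo's Bizarre Adventure' could all be lumped into the single 'TV' category.

In particular, for a work like 'Arcane', whose medium is Animation but whose format is Drama, the old rule-based logic assigned the Drama type simply because the genre tags included 'Drama'. As a result, it was treated the same as an action-oriented work such as 'JoJo's Bizarre Adventure'.

I judged this to be a fundamental problem that undermines the accuracy of work classification, which is the core of the service, and decided to review the type classification scheme from the ground up.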
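
To make the failure concrete, here is a minimal reconstruction of the kind of tag-based rule described above. The function name `legacy_types` and the sample genre lists are illustrative assumptions, not the project's original code; only the genre IDs 16 (Animation) and 18 (Drama) come from the TMDB logic discussed later in this post.

```python
# Hypothetical reconstruction of the old rule: any TV work tagged with the
# TMDB 'Drama' genre (ID 18) receives the 'Drama' type.
ANIMATION_GENRE_ID = 16
DRAMA_GENRE_ID = 18

def legacy_types(genre_ids):
    """Return the types the old tag-based rule would assign to a TV work."""
    types = ['Animation' if ANIMATION_GENRE_ID in genre_ids else 'Live-Action']
    if DRAMA_GENRE_ID in genre_ids:
        types.append('Drama')
    return types

# Illustrative genre lists (assumed for this example, not fetched from TMDB).
narrative_animation = [16, 18, 10765]
action_animation = [16, 18, 10759]

# Both works end up with identical types, so the rule cannot tell them apart.
print(legacy_types(narrative_animation))  # ['Animation', 'Drama']
print(legacy_types(action_animation))     # ['Animation', 'Drama']
```

Because the rule only looks at whether a tag is present, it has no way to express how central the narrative is, which is the distinction the new scheme needs.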

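2. Establishing a New Type Classification Strategy

I decided to focus the project's core identity on 'works with a narrative' and finalized the type classification strategy as follows.

Abolishing the 'TV Series' type: Having decided to exclude non-narrative programs such as the 'Conan' show from the project scope, I completely removed the now-ambiguous 'TV Series' type from the master data (work_type).
Redefining the 'Drama' type: 'Drama' is now explicitly defined as 'a work with a narrative structure spanning multiple episodes'. This can apply to both live-action and animation.
Introducing a two-stage pipeline (separation of roles):
- Stage 1 collectors (tmdb_collector, anilist_collector): collect and store only the objective facts provided by the APIs, such as Movie, Animation, and Live-Action. Whether a work is a 'Drama' is not decided at this stage.
- Stage 2 classifier (type_fixer.py): based on the stage 1 data, asks an LLM (Gemini) to make nuanced judgments such as "Is this work a full-fledged, narrative-driven drama?" and is solely responsible for adding the 'Drama' type.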
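
The separation of roles above can be sketched as follows. This is only an outline under stated assumptions: `add_drama_type`, `stub_judge`, and the sample records are hypothetical names invented for illustration, and the LLM call is replaced by canned answers so the example runs offline. The actual type_fixer.py is not shown in this post and may be structured differently.

```python
from typing import Callable, Dict, List

def add_drama_type(work: Dict, is_narrative_drama: Callable[[Dict], bool]) -> List[str]:
    """Return the work's types, appending 'Drama' only when the judge agrees."""
    types = list(work.get('types', []))  # stage 1 facts are copied, never altered
    if 'Drama' not in types and is_narrative_drama(work):
        types.append('Drama')
    return types

# Stand-in for the LLM: canned answers keyed by title (hypothetical data).
CANNED_ANSWERS = {'Sample Narrative Series': True, 'Sample Talk Show': False}

def stub_judge(work: Dict) -> bool:
    """Pretend to ask the model whether the work is a narrative-driven drama."""
    return CANNED_ANSWERS.get(work['titleKr'], False)

narrative = {'titleKr': 'Sample Narrative Series', 'types': ['Animation']}
talk_show = {'titleKr': 'Sample Talk Show', 'types': ['Live-Action']}

print(add_drama_type(narrative, stub_judge))  # ['Animation', 'Drama']
print(add_drama_type(talk_show, stub_judge))  # ['Live-Action']
```

Passing the judge in as a function keeps the stage 2 rule testable without network access, and a real implementation could swap in a function that queries the model.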

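3. Script Code Changes

Following the strategy above, I removed all logic in the stage 1 collectors that automatically assigned the 'Drama' type by rule.

3-1. Revised `tmdb_collector.py`

In the normalize_tmdb_data function, I deleted the logic that assigned the 'Drama' type based on TMDB's 'Drama' genre tag (ID 18). TV works are now first classified only as 'Animation' or 'Live-Action'.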

# tmdb_collector.py (full code)

import os
import requests
import json
import mysql.connector
from mysql.connector import Error
from dotenv import load_dotenv, find_dotenv
import time
import argparse

def find_work_by_external_id(cursor, source_name, source_id):
    query = "SELECT workId FROM work_identifier WHERE sourceName = %s AND sourceId = %s LIMIT 1"
    cursor.execute(query, (source_name, str(source_id)))
    result = cursor.fetchone()
    return result[0] if result else None

def get_or_create_genre_ids(cursor, connection, genre_names):
    genre_ids = []
    if not genre_names: return genre_ids
    select_query = "SELECT id FROM genre WHERE name = %s"
    insert_query = "INSERT INTO genre (name, regDate, updateDate) VALUES (%s, NOW(), NOW())"
    for name in genre_names:
        cursor.execute(select_query, (name,))
        result = cursor.fetchone()
        if result:
            genre_ids.append(result[0])
        else:
            cursor.execute(insert_query, (name,))
            genre_ids.append(cursor.lastrowid)
    connection.commit()
    return genre_ids

def link_genres_to_work(cursor, connection, work_id, genre_ids):
    if not genre_ids: return
    delete_query = "DELETE FROM work_genre WHERE workId = %s"
    insert_query = "INSERT INTO work_genre (workId, genreId, regDate) VALUES (%s, %s, NOW())"
    try:
        cursor.execute(delete_query, (work_id,))
        data_to_insert = [(work_id, genre_id) for genre_id in genre_ids]
        cursor.executemany(insert_query, data_to_insert)
        connection.commit()
    except Error as e:
        print(f"  [Error] DB error while linking work to genres: {e}")
        connection.rollback()

def get_type_ids_from_names(cursor, type_names):
    type_ids = []
    if not type_names: return type_ids
    format_strings = ','.join(['%s'] * len(type_names))
    query = f"SELECT id FROM work_type WHERE name IN ({format_strings})"
    cursor.execute(query, tuple(type_names))
    results = cursor.fetchall()
    for row in results:
        type_ids.append(row[0])
    return type_ids

def link_types_to_work(cursor, connection, work_id, type_ids):
    if not type_ids: return
    delete_query = "DELETE FROM work_type_mapping WHERE workId = %s"
    insert_query = "INSERT INTO work_type_mapping (workId, typeId, regDate) VALUES (%s, %s, NOW())"
    try:
        cursor.execute(delete_query, (work_id,))
        data_to_insert = [(work_id, type_id) for type_id in type_ids]
        cursor.executemany(insert_query, data_to_insert)
        connection.commit()
    except Error as e:
        print(f"  [Error] DB error while linking work to types: {e}")
        connection.rollback()

def find_or_create_series(cursor, connection, item_data):
    title_kr = item_data.get('titleKr')
    title_original = item_data.get('titleOriginal')
    
    collection_info = item_data.get('collection')
    if collection_info and collection_info.get('name'):
        series_title = collection_info['name']
        cursor.execute("SELECT id FROM series WHERE titleKr = %s OR titleOriginal = %s LIMIT 1", (series_title, series_title))
        result = cursor.fetchone()
        if result: return result[0]
        
        poster_path = f"https://image.tmdb.org/t/p/w500{collection_info.get('poster_path')}" if collection_info.get('poster_path') else None
        series_data = (series_title, series_title, None, poster_path, None, None, None, None)
        insert_query = "INSERT INTO series (titleKr, titleOriginal, description, thumbnailUrl, coverImageUrl, author, studios, publisher, regDate, updateDate) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, NOW(), NOW())"
        cursor.execute(insert_query, series_data)
        return cursor.lastrowid
            
    series_data = (title_kr, title_original, item_data.get('description'), item_data.get('thumbnailUrl'), None, None, item_data.get('studios'), None)
    insert_query = "INSERT INTO series (titleKr, titleOriginal, description, thumbnailUrl, coverImageUrl, author, studios, publisher, regDate, updateDate) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, NOW(), NOW())"
    cursor.execute(insert_query, series_data)
    return cursor.lastrowid

def get_api_data(url):
    try:
        response = requests.get(url, timeout=10)  # do not hang forever on a stalled connection
        response.raise_for_status()
        return response.json()
    except Exception as e:
        print(f"  [Error] API request failed: {url} - {e}")
        return None

def normalize_tmdb_data(item_detail, media_type):
    genres = [genre['name'] for genre in item_detail.get('genres', [])]
    types = []
    is_animation = 16 in [g['id'] for g in item_detail.get('genres', [])]

    if media_type == 'movie':
        types.append('Movie')
        if is_animation:
            types.append('Animation')
        else:
            types.append('Live-Action')
    elif media_type == 'tv':
        if is_animation:
            types.append('Animation')
        else:
            types.append('Live-Action')
    
    trailer_key = None
    if 'videos' in item_detail and item_detail['videos']['results']:
        for video in item_detail['videos']['results']:
            if video.get('site') == 'YouTube' and video.get('type') == 'Trailer':
                trailer_key = video.get('key')
                if video.get('official'):
                    break
    
    trailer_url = f"https://www.youtube.com/watch?v={trailer_key}" if trailer_key else None

    base_data = {
        'id': item_detail.get('id'), 'description': item_detail.get('overview'),
        'thumbnailUrl': f"https://image.tmdb.org/t/p/w500{item_detail.get('poster_path')}" if item_detail.get('poster_path') else None,
        'genres': genres, 'types': types,
        'collection': item_detail.get('belongs_to_collection'),
        'studios': ", ".join([c['name'] for c in item_detail.get('production_companies', [])]),
        'trailerUrl': trailer_url
    }
    
    if media_type == 'movie':
        creators = ", ".join([p['name'] for p in item_detail.get('credits', {}).get('crew', []) if p.get('job') == 'Director'])
        base_data.update({'titleKr': item_detail.get('title'), 'titleOriginal': item_detail.get('original_title'), 'releaseDate': item_detail.get('release_date') or None, 'episodes': 1, 'duration': item_detail.get('runtime'), 'creators': creators})
    elif media_type == 'tv':
        creators = ", ".join([p['name'] for p in item_detail.get('created_by', [])])
        base_data.update({'titleKr': item_detail.get('name'), 'titleOriginal': item_detail.get('original_name'), 'releaseDate': item_detail.get('first_air_date') or None, 'isCompleted': item_detail.get('status') == 'Ended', 'episodes': item_detail.get('number_of_episodes'), 'duration': item_detail.get('episode_run_time')[0] if item_detail.get('episode_run_time') else None, 'creators': creators})
    return base_data

def upsert_item(cursor, connection, item_summary, media_type, api_key):
    if media_type == 'tv' and 16 in item_summary.get('genre_ids', []):
        print(f"  [SKIP] '{item_summary.get('name')}' -> animated TV series, handled by anilist_collector instead.")
        return

    source_name = f"TMDB_{media_type.upper()}"
    tmdb_id = item_summary.get('id')
    if not tmdb_id: return None

    detail_url = f"https://api.themoviedb.org/3/{media_type}/{tmdb_id}?api_key={api_key}&language=ko-KR&append_to_response=videos,credits"
    detail_data = get_api_data(detail_url)
    if not detail_data: return None

    normalized_data = normalize_tmdb_data(detail_data, media_type)
    
    print("-" * 40)
    print(f"  >> Processing '{normalized_data.get('titleKr')}'")
    
    work_id = find_work_by_external_id(cursor, source_name, tmdb_id)

    if work_id:
        update_query = "UPDATE work SET titleKr=%s, titleOriginal=%s, releaseDate=%s, description=%s, thumbnailUrl=%s, studios=%s, creators=%s, episodes=%s, duration=%s, isCompleted=%s, trailerUrl=%s, updateDate=NOW() WHERE id=%s"
        params = (
            normalized_data['titleKr'], normalized_data['titleOriginal'], normalized_data.get('releaseDate'), 
            normalized_data['description'], normalized_data['thumbnailUrl'], normalized_data['studios'],
            normalized_data.get('creators'), normalized_data.get('episodes'), normalized_data.get('duration'),
            normalized_data.get('isCompleted'), normalized_data.get('trailerUrl'),
            work_id
        )
        cursor.execute(update_query, params)
        print(f"  [Updated] (workId: {work_id}) -> information updated")
    else:
        series_id = find_or_create_series(cursor, connection, normalized_data)
        insert_query = "INSERT INTO work (seriesId, regDate, updateDate, titleKr, titleOriginal, releaseDate, description, thumbnailUrl, studios, creators, episodes, duration, isCompleted, trailerUrl) VALUES (%s, NOW(), NOW(), %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
        params = (
            series_id, normalized_data['titleKr'], normalized_data['titleOriginal'], normalized_data.get('releaseDate'), 
            normalized_data['description'], normalized_data['thumbnailUrl'], normalized_data['studios'],
            normalized_data.get('creators'), normalized_data.get('episodes'), normalized_data.get('duration'),
            normalized_data.get('isCompleted'), normalized_data.get('trailerUrl')
        )
        cursor.execute(insert_query, params)
        work_id = cursor.lastrowid
        cursor.execute("INSERT INTO work_identifier (workId, sourceName, sourceId, regDate, updateDate) VALUES (%s, %s, %s, NOW(), NOW())", (work_id, source_name, str(tmdb_id)))
        print(f"  [New] (workId: {work_id}) -> work, series, identifier saved")

    link_genres_to_work(cursor, connection, work_id, get_or_create_genre_ids(cursor, connection, normalized_data.get('genres', [])))
    link_types_to_work(cursor, connection, work_id, get_type_ids_from_names(cursor, normalized_data.get('types', [])))
    print(f"    - Types and genres processed: {normalized_data.get('types')}")
    
    connection.commit()

def main():
    parser = argparse.ArgumentParser(description="Collect movie and TV series information from TMDB.")
    parser.add_argument('--pages', type=int, default=1, help="Number of pages to collect from each list (20 items per page)")
    parser.add_argument('--type', type=str, choices=['movie', 'tv'], help="Media type to collect (movie or tv)")
    parser.add_argument('--endpoint', type=str, default='popular', choices=['popular', 'top_rated'], help="List to collect (popular or top_rated)")
    args = parser.parse_args()

    load_dotenv(find_dotenv())
    api_key = os.getenv('TMDB_API_KEY')
    db_host = os.getenv('DB_HOST'); db_user = os.getenv('DB_USER'); db_password = os.getenv('DB_PASSWORD'); db_database = os.getenv('DB_DATABASE')

    if not all([api_key, db_host, db_user, db_password, db_database]):
        print("Error: please set all required values (API key, DB connection info) in the .env file.")
        return

    connection = None
    try:
        connection = mysql.connector.connect(host=db_host, user=db_user, password=db_password, database=db_database, port=3306)
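        # Use plain tuple rows: the helper functions above index results by position (result[0]),
        # which would raise KeyError on the dict rows returned by dictionary=True.
        cursor = connection.cursor(buffered=True)
        print("Successfully connected to the database.")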
        
        media_types_to_process = [args.type] if args.type else ['movie', 'tv']
        
        for media_type in media_types_to_process:
            for page in range(1, args.pages + 1):
                list_url = f"https://api.themoviedb.org/3/{media_type}/{args.endpoint}?api_key={api_key}&language=ko-KR&page={page}"
                print(f"\n===== Processing '{media_type.upper()}' / '{args.endpoint}' data (page: {page}) =====")
                list_data = get_api_data(list_url)
                if not list_data or not list_data.get('results'):
                    print("  [Info] Could not fetch the list. Moving on to the next one.")
                    continue
                
                print(f"  > Processing {len(list_data['results'])} items...")
                for item_summary in list_data['results']:
                    upsert_item(cursor, connection, item_summary, media_type, api_key)
                    time.sleep(0.5)

    except Error as e:
        print(f"Error during database operation: {e}")
    finally:
        if connection and connection.is_connected():
            cursor.close()
            connection.close()
            print("\nDatabase connection closed.")

if __name__ == "__main__":
    main()
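
As a quick check of the revised behavior, the type-assignment branch of normalize_tmdb_data can be isolated into a small pure function. The helper name `tmdb_types` is a hypothetical extraction made for this example; it mirrors the branch in the script above but is not part of it.

```python
TMDB_ANIMATION_GENRE_ID = 16
TMDB_DRAMA_GENRE_ID = 18

def tmdb_types(genre_ids, media_type):
    """Mirror the type branch of normalize_tmdb_data after the change."""
    types = ['Movie'] if media_type == 'movie' else []
    types.append('Animation' if TMDB_ANIMATION_GENRE_ID in genre_ids else 'Live-Action')
    return types

# A TV work tagged with the Drama genre no longer receives the 'Drama' type.
print(tmdb_types([TMDB_DRAMA_GENRE_ID], 'tv'))                           # ['Live-Action']
print(tmdb_types([TMDB_ANIMATION_GENRE_ID, TMDB_DRAMA_GENRE_ID], 'tv'))  # ['Animation']
print(tmdb_types([TMDB_DRAMA_GENRE_ID], 'movie'))                        # ['Movie', 'Live-Action']
```

Keeping this logic free of database and network calls would make it easy to cover with unit tests, should the collector be refactored that way.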

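3-2. Revised `data_reconciler.py`

In the determine_types_from_anilist function, I deleted the logic that checked whether AniList's genre list included 'Drama'. The AniList collector now decides only the format (Movie and so on) and the medium (Animation) of a work.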

# data_reconciler.py (full code)
import os
import requests
import json
import time
from dotenv import load_dotenv, find_dotenv
import mysql.connector
from mysql.connector import Error

load_dotenv(find_dotenv())
API_URL = 'https://graphql.anilist.co'

def get_db_connection():
    try:
        connection = mysql.connector.connect(
            host=os.getenv('DB_HOST'), user=os.getenv('DB_USER'),
            password=os.getenv('DB_PASSWORD'), database=os.getenv('DB_DATABASE'), port=3306)
        return connection
    except Error as e:
        print(f"DB connection error: {e}")
        return None

def find_or_create_series_in_db(cursor, title):
    search_term = f"%{title}%"
    cursor.execute("SELECT id FROM series WHERE titleKr LIKE %s OR titleOriginal LIKE %s LIMIT 1", (search_term, search_term))
    result = cursor.fetchone()
    if result:
        print(f"-> Found existing series '{title}' in the DB. (seriesId: {result['id']})")
        return result['id']
    else:
        print(f"-> New series '{title}' not in the DB. Creating it.")
        cursor.execute("INSERT INTO series (regDate, updateDate, titleKr) VALUES (NOW(), NOW(), %s)", (title,))
        new_series_id = cursor.lastrowid
        print(f"   - New series created. (seriesId: {new_series_id})")
        return new_series_id

def find_work_id_by_anilist_id(cursor, anilist_id):
    cursor.execute("SELECT workId FROM work_identifier WHERE sourceName = 'ANILIST_ANIME' AND sourceId = %s", (str(anilist_id),))
    result = cursor.fetchone()
    return result['workId'] if result else None

def get_full_details_from_anilist(anilist_id):
    query = '''
    query ($id: Int) {
      Media (id: $id, type: ANIME) {
        id
        title { romaji english native }
        format
        status
        description(asHtml: false)
        startDate { year month day }
        episodes
        duration
        genres
        studios(isMain: true) { nodes { name } }
        coverImage { extraLarge }
        trailer { id site }
      }
    }
    '''
    variables = {'id': anilist_id}
    print(f"   (Fetching details for ID '{anilist_id}' from AniList...)")
    response = requests.post(API_URL, json={'query': query, 'variables': variables})
    response.raise_for_status()
    return response.json()['data']['Media']

def fetch_anime_with_relations(anime_id):
    query = '''
    query ($id: Int) {
      Media (id: $id, type: ANIME) {
        id
        title { romaji english native }
        relations { edges { relationType(version: 2) node { id } } }
      }
    }
    '''
    variables = {'id': anime_id}
    response = requests.post(API_URL, json={'query': query, 'variables': variables})
    response.raise_for_status()
    return response.json()['data']['Media']
    
def get_type_ids_from_names(cursor, type_names):
    type_ids = []
    if not type_names: return type_ids
    format_strings = ','.join(['%s'] * len(type_names))
    query = f"SELECT id FROM work_type WHERE name IN ({format_strings})"
    cursor.execute(query, tuple(type_names))
    results = cursor.fetchall()
    for row in results:
        type_ids.append(row['id'])  # this script's cursor uses dictionary=True, so rows are dicts
    return type_ids

def link_types_to_work(cursor, connection, work_id, type_ids):
    if not type_ids: return
    delete_query = "DELETE FROM work_type_mapping WHERE workId = %s"
    insert_query = "INSERT INTO work_type_mapping (workId, typeId, regDate) VALUES (%s, %s, NOW())"
    try:
        cursor.execute(delete_query, (work_id,))
        data_to_insert = [(work_id, type_id) for type_id in type_ids]
        cursor.executemany(insert_query, data_to_insert)
        connection.commit()
    except Error as e:
        print(f"  [Error] DB error while linking work to types: {e}")
        connection.rollback()

def determine_types_from_anilist(anilist_data):
    types = {'Animation'}
    anilist_format = anilist_data.get('format')
    if anilist_format == 'MOVIE':
        types.add('Movie')
    # The 'Drama' type is now decided by the LLM, so it is not assigned here
    return list(types)

def process_series_from_entry_point(entry_anilist_id):
    print(f"\n{'='*20} [ Series processing started (entry ID: {entry_anilist_id}) ] {'='*20}")
    connection = get_db_connection()
    if not connection: return
    cursor = connection.cursor(dictionary=True)

    try:
        main_work_data = fetch_anime_with_relations(entry_anilist_id)
        works_to_process_ids = [main_work_data['id']]
        for edge in main_work_data.get('relations', {}).get('edges', []):
            if edge.get('relationType') == 'SEQUEL':
                works_to_process_ids.append(edge['node']['id'])
        print(f"✅ {len(works_to_process_ids)} works (seasons) selected for processing: {works_to_process_ids}")

        series_id = find_or_create_series_in_db(cursor, main_work_data['title']['english'] or main_work_data['title']['romaji'])
        connection.commit()

        for anilist_id in works_to_process_ids:
            work_id_in_db = find_work_id_by_anilist_id(cursor, anilist_id)
            details = get_full_details_from_anilist(anilist_id)
            title_kr = details['title']['english'] or details['title']['romaji']
            print(f"\n--- Processing '{title_kr}' (AniList ID: {anilist_id}) ---")

            type_names = determine_types_from_anilist(details)
            type_ids = get_type_ids_from_names(cursor, type_names)

            if work_id_in_db:
                print(f"  [UPDATE]: Already in the DB as workId '{work_id_in_db}'. Enriching its information.")
                update_query = "UPDATE work SET seriesId = %s, description = %s, episodes = %s, duration = %s, studios = %s, isCompleted = %s, updateDate = NOW(), titleKr = %s, titleOriginal = %s, releaseDate = %s, thumbnailUrl = %s, trailerUrl = %s WHERE id = %s"
                # If the start year is missing, store NULL instead of a malformed date such as "None-01-01".
                start_date_str = f"{details['startDate']['year']}-{(details['startDate']['month'] or 1):02d}-{(details['startDate']['day'] or 1):02d}" if details['startDate']['year'] else None
                studios_str = ", ".join(node['name'] for node in details.get('studios', {}).get('nodes', []))
                trailer_url = f"https://www.youtube.com/watch?v={details['trailer']['id']}" if details.get('trailer') and details['trailer']['site'] == 'youtube' else None
                params = (series_id, details.get('description'), details.get('episodes'), details.get('duration'), studios_str, 1 if details.get('status') == 'FINISHED' else 0, title_kr, details['title']['native'], start_date_str, details.get('coverImage', {}).get('extraLarge'), trailer_url, work_id_in_db)
                cursor.execute(update_query, params)
                link_types_to_work(cursor, connection, work_id_in_db, type_ids)
                print(f"  -> Existing work (id:{work_id_in_db}) info/types updated: {type_names}")
            else:
                print("  [INSERT]: Not in the DB. Adding it as a new work.")
                # If the start year is missing, store NULL instead of a malformed date such as "None-01-01".
                start_date_str = f"{details['startDate']['year']}-{(details['startDate']['month'] or 1):02d}-{(details['startDate']['day'] or 1):02d}" if details['startDate']['year'] else None
                studios_str = ", ".join(node['name'] for node in details.get('studios', {}).get('nodes', []))
                trailer_url = f"https://www.youtube.com/watch?v={details['trailer']['id']}" if details.get('trailer') and details['trailer']['site'] == 'youtube' else None
                insert_query = "INSERT INTO work (seriesId, regDate, updateDate, titleKr, titleOriginal, releaseDate, episodes, duration, studios, description, thumbnailUrl, trailerUrl, isCompleted) VALUES (%s, NOW(), NOW(), %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
                params = (series_id, title_kr, details['title']['native'], start_date_str, details.get('episodes'), details.get('duration'), studios_str, details.get('description'), details.get('coverImage', {}).get('extraLarge'), trailer_url, 1 if details.get('status') == 'FINISHED' else 0)
                cursor.execute(insert_query, params)
                new_work_id = cursor.lastrowid
                cursor.execute("INSERT INTO work_identifier (workId, sourceName, sourceId, regDate, updateDate) VALUES (%s, 'ANILIST_ANIME', %s, NOW(), NOW())", (new_work_id, str(anilist_id)))
                link_types_to_work(cursor, connection, new_work_id, type_ids)
                print(f"  -> New work added (new workId: {new_work_id}), types: {type_names}")
        
        connection.commit()
    except Exception as e:
        print(f"Error: {e}")
        connection.rollback()
    finally:
        if connection.is_connected():
            cursor.close()
            connection.close()
            print("\nDB connection closed.")

if __name__ == "__main__":
    process_series_from_entry_point(21459)
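
The effect of the change on the AniList side can be verified in isolation. The snippet below repeats determine_types_from_anilist with one small deviation, sorting the result so the output order is deterministic; the sample records are made up for illustration and are not real AniList responses.

```python
def determine_types_from_anilist(anilist_data):
    """Decide types from medium and format only; genres are ignored."""
    types = {'Animation'}
    if anilist_data.get('format') == 'MOVIE':
        types.add('Movie')
    return sorted(types)  # sorted here only to make the output order stable

# Hypothetical records: the 'Drama' genre no longer influences the result.
tv_with_drama_genre = {'format': 'TV', 'genres': ['Action', 'Drama']}
movie_record = {'format': 'MOVIE', 'genres': ['Drama']}

print(determine_types_from_anilist(tv_with_drama_genre))  # ['Animation']
print(determine_types_from_anilist(movie_record))         # ['Animation', 'Movie']
```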

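4. Conclusion and Expected Benefits

With these changes, the division of roles in the data pipeline is clearer.

Stage 1 collectors (tmdb, anilist): collect as much objective, structured information as possible (format, medium, studios, trailers, and so on).
Stage 2 classifier (type_fixer): uses an LLM to assign types that call for nuanced judgment, such as 'Drama'.
Stage 3 enricher (llm_enricher): uses an LLM to generate and fill in information that is often missing, such as Korean titles and synopses.

As a result, each script has a clearly defined responsibility, and the structure makes it easier to maintain data quality and consistency over the long term.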
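
One way to keep the stage 3 enricher from interfering with the earlier stages is to let it write only into empty fields. The sketch below assumes that policy; `fill_missing` and the sample values are hypothetical, and llm_enricher itself is not shown in this post, so its real behavior may differ.

```python
def fill_missing(work, suggestions):
    """Return a copy of `work` where only empty fields take suggested values."""
    enriched = dict(work)
    for field, value in suggestions.items():
        # Facts collected in stage 1 win; suggestions only fill None or ''.
        if enriched.get(field) in (None, ''):
            enriched[field] = value
    return enriched

# Hypothetical stage 1 record with a missing Korean title.
collected = {'titleKr': None, 'description': 'Collected synopsis', 'episodes': 12}
# Hypothetical values an LLM might propose.
suggested = {'titleKr': 'Suggested title', 'description': 'Generated synopsis'}

result = fill_missing(collected, suggested)
print(result['titleKr'])      # Suggested title
print(result['description'])  # Collected synopsis
```

Under this policy, rerunning the collectors and the enricher in any order cannot replace a collected value with a generated one.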
