[0606] TIL Day 40

nikevapormax · June 6, 2022

😂 Django Project

😭 Machine Learning part

  • I considered several ways to move the user-based, item-based, and latent factor collaborative filtering I had built in Google Colab over to PyCharm.
  • I assumed I would have to package them as a model and bring them over like last time, which felt overwhelming, so I asked a tutor. After a quick look at the code, the tutor said there was no need to turn it into a model.
  • That leaves me, I think, with two options:
    • Wrap the code in functions
    • Wrap the code in a class
  • I really like the idea of building it as a class. It is something I feel I need to keep practicing, because whenever I see the tutors' code written as classes, I wonder why I didn't think of that myself. For now, though, I focused on functions. (If time and my knowledge allow, I will definitely refactor it.)
  • The filtering methods I want to use in our project are item-based collaborative filtering and latent factor collaborative filtering. The latent factor version currently fails on some module imports; Googling didn't solve it, so I plan to ask about it tomorrow.
  • First, I turned item-based collaborative filtering into a function.
    • One part gave me some trouble: printing the movie titles.
    • This filtering is built around cosine similarity.
    • Once the similarities are computed, simply printing movie_list shows the 20 movies with the highest cosine similarity, with the movie titles appearing as the index.
    • At first I didn't realize the result was a Series and tried to pull values out the way I would from a list. That didn't work, so I Googled it.
    • Once I realized the return value was a Series, I returned its index instead and was able to get the 20 movie titles.
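To make the Series point concrete, here is a toy version on a made-up three-movie rating matrix (the titles and ratings are invented for illustration, not taken from the project's CSVs). It checks sklearn's `cosine_similarity` against the textbook definition, the dot product divided by the product of the vector norms, and shows that ranking one column gives a `Series` whose titles live in the index.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Made-up ratings: rows are movie titles, columns are user IDs, 0 means "not rated"
user_title = pd.DataFrame(
    [[5, 4, 0],
     [4, 5, 0],
     [0, 0, 5]],
    index=['Movie A', 'Movie B', 'Movie C'],
    columns=[1, 2, 3],
)

# Movie-by-movie similarity table, labeled with titles on both axes
sim = pd.DataFrame(cosine_similarity(user_title),
                   index=user_title.index, columns=user_title.index)

# Same value by hand: dot product divided by the product of the vector norms
a = user_title.loc['Movie A'].to_numpy()
b = user_title.loc['Movie B'].to_numpy()
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Sorting one column returns a Series, so the titles are in its index, not its values
ranked = sim['Movie A'].sort_values(ascending=False)
print(type(ranked).__name__)      # Series
print(ranked.index[1:].tolist())  # ['Movie B', 'Movie C'] (position 0 is Movie A itself)
```

Because every movie is perfectly similar to itself, the query movie normally lands in position 0, which is why the real code starts its slice at 1.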
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')

def item_based_filtering(movie):
    # Join ratings and movies on movieId so each rating carries its movie title
    movie_ratings = pd.merge(ratings, movies, on='movieId')

    # Rows = movie titles, columns = users; unrated cells become 0
    user_title = movie_ratings.pivot_table('rating', index='title', columns='userId')
    user_title = user_title.fillna(0)

    # Movie-by-movie cosine similarity, labeled with titles on both axes
    item_based_collab = cosine_similarity(user_title, user_title)
    item_based_collab = pd.DataFrame(item_based_collab, index=user_title.index, columns=user_title.index)

    # Sort by similarity to the given movie; position 0 is normally the movie itself
    # (similarity 1), so take positions 1-20 to get 20 recommendations
    movie_list = item_based_collab[movie].sort_values(ascending=False).iloc[1:21]

    # movie_list is a Series indexed by title, so the titles live in its index
    return movie_list.index

m = item_based_filtering('Dark Knight, The (2008)')
for i in m:
    print(i)
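Since the plan is to refactor into a class eventually, here is one possible shape for that, offered as a sketch rather than the project's code. The class name `ItemBasedRecommender`, its `recommend` method, and the `top_n` parameter are hypothetical; the sketch assumes the MovieLens-style columns used above (`movieId`, `title`, `userId`, `rating`). The main gain is that the pivot table and similarity matrix are built once in `__init__` instead of on every call.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


class ItemBasedRecommender:
    """Hypothetical class version of item_based_filtering (not the author's code).

    Builds the title-by-user matrix and the cosine-similarity table once,
    so repeated recommendations don't redo the expensive work.
    Assumes MovieLens-style columns: movieId, title, userId, rating.
    """

    def __init__(self, movies: pd.DataFrame, ratings: pd.DataFrame):
        # Same steps as the function version: join, pivot, fill gaps with 0
        movie_ratings = pd.merge(ratings, movies, on='movieId')
        user_title = movie_ratings.pivot_table('rating', index='title', columns='userId').fillna(0)
        # Movie-by-movie similarity, computed once and kept on the instance
        self.similarity = pd.DataFrame(
            cosine_similarity(user_title),
            index=user_title.index,
            columns=user_title.index,
        )

    def recommend(self, movie: str, top_n: int = 20) -> list:
        # Drop the query movie by label instead of assuming it sits at position 0
        scores = self.similarity[movie].drop(movie)
        return scores.sort_values(ascending=False).head(top_n).index.tolist()
```

With the project's files, usage would look like `ItemBasedRecommender(pd.read_csv('movies.csv'), pd.read_csv('ratings.csv')).recommend('Dark Knight, The (2008)')`. Dropping the query title by label, rather than slicing from position 1, also avoids returning the movie itself if another title happens to tie with it at similarity 1.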
  • I don't expect to use user-based collaborative filtering in my current plan, but it might turn out to be the right choice, so I turned it into a function as well while I was at it.
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')


def user_based_filtering(movie, user_id=5):
    # Think of this as joining ratings and movies on movieId
    movie_ratings = pd.merge(ratings, movies, on='movieId')

    # Rows = users, columns = movie titles; unrated cells become 0
    title_user = movie_ratings.pivot_table('rating', index='userId', columns='title')
    title_user = title_user.fillna(0)

    # User-by-user cosine similarity, labeled with user IDs on both axes
    user_based_collab = cosine_similarity(title_user, title_user)
    user_based_collab = pd.DataFrame(user_based_collab, index=title_user.index, columns=title_user.index)

    # The 10 users most similar to the reference user (the reference user itself comes first)
    top_10 = user_based_collab[user_id].sort_values(ascending=False).iloc[:10]

    # Most similar other user, and that user's ratings sorted high to low (for inspection only)
    chosen_user = top_10.index[1]
    result = title_user.query(f'userId == {chosen_user}').sort_values(ascending=False, by=chosen_user, axis=1)

    # User IDs in order of similarity to the reference user
    user_index_list = top_10.index.tolist()
    print(f'Users ordered by similarity to the reference user (first is the reference user): {user_index_list}')
    # Same order as above, but the similarity values (weights) instead of the user IDs
    user_weight_list = top_10.tolist()
    print(f'Weights: {user_weight_list}')

    movie_title = movie
    weighted_sum, weighted_user = [], []

    # Index 0 is the reference user, so loop over indices 1-9 (the 9 most similar other users)
    for i in range(1, 10):
        value = title_user.loc[user_index_list[i], movie_title]
        # Compare the float directly: int(value) would turn a 0.5 rating into 0 and skip it
        if value != 0:
            # Rating from a similar user, multiplied by that user's similarity weight
            weighted_sum.append(value * user_weight_list[i])
            # Keep the weights of only the users who actually rated the movie
            weighted_user.append(user_weight_list[i])

    print(weighted_sum)
    print(weighted_user)

    # None of the similar users rated the movie, so there is nothing to base a prediction on
    if not weighted_user:
        return None
    pred_rating = sum(weighted_sum) / sum(weighted_user)

    return pred_rating

# Predicted rating of Batman Forever (1995) for user 5
print(user_based_filtering("Batman Forever (1995)"))
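The prediction at the end of user_based_filtering is a similarity-weighted average of the neighbors' ratings. Written out, with u the reference user (user 5), N(u) the nine most similar other users, s(u,v) their cosine similarity, and r_{v,m} user v's rating of movie m (0 meaning unrated), the code computes:

$$\hat{r}_{u,m} = \frac{\sum_{v \in N(u),\ r_{v,m} \neq 0} s(u,v)\, r_{v,m}}{\sum_{v \in N(u),\ r_{v,m} \neq 0} s(u,v)}$$

The numerator is sum(weighted_sum) and the denominator is sum(weighted_user). As a made-up example, if only two neighbors rated the movie, with similarities 0.5 and 0.3 and ratings 4 and 3, the prediction is (0.5 × 4 + 0.3 × 3) / (0.5 + 0.3) = 2.9 / 0.8 = 3.625.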
https://github.com/nikevapormax

0๊ฐœ์˜ ๋Œ“๊ธ€