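After P and Q are learned, unobserved ratings are predicted with $\hat{R} = P Q^T$. For each observed rating $r_{ij}$, the error is $e_{ij} = r_{ij} - p_i q_j^T$, and SGD applies the updates below with learning rate $\eta$ and regularization coefficient $\lambda$.
User matrix (P) update:
$$p_i \leftarrow p_i + \eta\,(e_{ij}\, q_j - \lambda\, p_i)$$
Item matrix (Q) update:
$$q_j \leftarrow q_j + \eta\,(e_{ij}\, p_i - \lambda\, q_j)$$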
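To make the two update rules concrete, here is a minimal sketch of a single SGD step for one observed rating. The vectors, the rating, and the hyperparameter values are made up for illustration. As in the training loop later in these notes, Q is updated with the already-updated user vector.

```python
import numpy as np

# Hypothetical latent vectors for one user and one item (K = 2).
p_i = np.array([0.5, 0.1])
q_j = np.array([0.4, 0.3])
r_ij = 4.0            # observed rating (illustrative value)
learning_rate = 0.1   # eta (illustrative value)
r_lambda = 0.01       # L2 regularization strength (illustrative value)

# Error before the update: e_ij = r_ij - p_i . q_j
e_before = r_ij - p_i @ q_j

# User vector update, then item vector update using the new user vector.
p_new = p_i + learning_rate * (e_before * q_j - r_lambda * p_i)
q_new = q_j + learning_rate * (e_before * p_new - r_lambda * q_j)

# Error after one step
e_after = r_ij - p_new @ q_new
print(abs(e_after) < abs(e_before))  # True
```

A single step only shrinks the error slightly; training repeats this step over every observed rating for many epochs.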
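import numpy as np
from sklearn.metrics import mean_squared_error

def get_rmse(R, P, Q, non_zeros):
    # Predicted rating matrix from the current factors
    full_pred_matrix = np.dot(P, Q.T)

    # Row and column indices of the observed (non-zero) ratings
    x_non_zero_ind = [non_zero[0] for non_zero in non_zeros]
    y_non_zero_ind = [non_zero[1] for non_zero in non_zeros]

    # Compare actual and predicted values at the observed positions only
    R_non_zeros = R[x_non_zero_ind, y_non_zero_ind]
    full_pred_matrix_non_zeros = full_pred_matrix[x_non_zero_ind, y_non_zero_ind]
    mse = mean_squared_error(R_non_zeros, full_pred_matrix_non_zeros)
    rmse = np.sqrt(mse)
    return rmse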
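The same idea, measuring error only where a rating exists, can also be sketched with a boolean mask and plain NumPy. The helper name `rmse_observed` and the toy matrices are hypothetical and not part of the original notes.

```python
import numpy as np

def rmse_observed(R, P, Q):
    """RMSE between R and P @ Q.T over entries where R > 0 (hypothetical helper)."""
    mask = R > 0                      # unrated entries (0 or NaN) evaluate to False
    diff = R[mask] - (P @ Q.T)[mask]  # errors at observed positions only
    return np.sqrt(np.mean(diff ** 2))

# Toy data: two users, two items, one latent factor.
R = np.array([[4.0, 0.0],
              [0.0, 2.0]])
P = np.array([[2.0], [1.0]])
Q = np.array([[1.5], [1.0]])

# P @ Q.T = [[3.0, 2.0], [1.5, 1.0]]; observed errors are 4-3 and 2-1.
toy_rmse = rmse_observed(R, P, Q)
print(toy_rmse)  # 1.0
```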
def matrix_factorization(R, K, steps=200, learning_rate=0.01, r_lambda=0.01):
    num_users, num_items = R.shape

    # Initialize P and Q with small random values (fixed seed for reproducibility)
    np.random.seed(1)
    P = np.random.normal(scale=1./K, size=(num_users, K))
    Q = np.random.normal(scale=1./K, size=(num_items, K))

    # (row, column, rating) for every observed rating; NaN entries fail the > 0 test and are skipped
    non_zeros = [(i, j, R[i, j]) for i in range(num_users) for j in range(num_items) if R[i, j] > 0]

    for step in range(steps):
        for i, j, r in non_zeros:
            # Error between the actual and the predicted rating
            eij = r - np.dot(P[i, :], Q[j, :].T)
            # SGD updates with L2 regularization
            P[i, :] = P[i, :] + learning_rate * (eij * Q[j, :] - r_lambda * P[i, :])
            # Note: the Q update indexes Q[j, :], not Q[i, :]
            Q[j, :] = Q[j, :] + learning_rate * (eij * P[i, :] - r_lambda * Q[j, :])

        # Report RMSE on the observed ratings every 10 steps
        if (step % 10) == 0:
            rmse = get_rmse(R, P, Q, non_zeros)
            print(f"### iteration step : {step}, RMSE : {rmse}")

    return P, Q
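Before running on the full MovieLens data, it can help to see the procedure on a matrix small enough to inspect by hand. The block below is a self-contained sketch under assumptions: `factorize_toy` is a hypothetical, condensed version of the loop above, and the 4 × 5 matrix and the choice of K=3 with 1000 steps are illustrative.

```python
import numpy as np

def factorize_toy(R, K=3, steps=1000, learning_rate=0.01, r_lambda=0.01, seed=1):
    """Condensed SGD matrix factorization (hypothetical helper for illustration)."""
    rng = np.random.RandomState(seed)
    num_users, num_items = R.shape
    P = rng.normal(scale=1.0 / K, size=(num_users, K))
    Q = rng.normal(scale=1.0 / K, size=(num_items, K))
    # Only entries with a rating (> 0) take part in training.
    observed = [(i, j, R[i, j]) for i in range(num_users)
                for j in range(num_items) if R[i, j] > 0]
    for _ in range(steps):
        for i, j, r in observed:
            e = r - P[i] @ Q[j]
            P[i] += learning_rate * (e * Q[j] - r_lambda * P[i])
            Q[j] += learning_rate * (e * P[i] - r_lambda * Q[j])
    return P, Q

# Toy rating matrix; 0 marks an unrated item.
R = np.array([[4.0, 0.0, 0.0, 2.0, 0.0],
              [0.0, 5.0, 0.0, 3.0, 1.0],
              [0.0, 0.0, 3.0, 4.0, 4.0],
              [5.0, 2.0, 1.0, 2.0, 0.0]])

P, Q = factorize_toy(R)
pred = P @ Q.T

# RMSE on the observed entries only
mask = R > 0
rmse = np.sqrt(np.mean((R[mask] - pred[mask]) ** 2))
print(np.round(pred, 2))
print(rmse)
```

The observed entries of `pred` should land close to the original ratings, while the entries that were 0 in `R` now hold predicted values.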
import pandas as pd
import numpy as np

movies = pd.read_csv('./movies.csv')
ratings = pd.read_csv('./ratings.csv')
ratings = ratings[['userId', 'movieId', 'rating']]
ratings_matrix = ratings.pivot_table('rating', index='userId', columns='movieId')

# Join with movies to obtain the title column
rating_movies = pd.merge(ratings, movies, on='movieId')

# Pivot on the title column by passing columns='title'
ratings_matrix = rating_movies.pivot_table('rating', index='userId', columns='title')
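The effect of the pivot step can be checked on a few rows. This is a minimal sketch on made-up data: the frame `ratings_toy` and the titles "A", "B", "C" are hypothetical. Each (userId, title) pair with no rating becomes NaN, and those are the cells that matrix factorization later fills in.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, title) rating.
ratings_toy = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "title": ["A", "B", "B", "C"],
    "rating": [4.0, 3.0, 5.0, 2.0],
})

# Users become rows, titles become columns, ratings become cell values.
matrix_toy = ratings_toy.pivot_table(values="rating", index="userId", columns="title")

n_missing = int(matrix_toy.isna().sum().sum())
print(matrix_toy.shape)  # (3, 3)
print(n_missing)         # 5
```

Note that `pivot_table` aggregates duplicate (userId, title) pairs with the mean by default, which matters when two different movies share a title.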
With the ratings converted to a user × movie matrix by pivot_table, factorize it and rebuild the full prediction matrix:
P, Q = matrix_factorization(ratings_matrix.values, K=50, steps=200, learning_rate=0.01, r_lambda=0.01)
pred_matrix = np.dot(P, Q.T)
out:
### iteration step : 0 rmse : 8.428744210189846
### iteration step : 10 rmse : 0.5345336287556501
### iteration step : 20 rmse : 0.25504927389945564
### iteration step : 30 rmse : 0.13464253769141935
### iteration step : 40 rmse : 0.08475660660649217
### iteration step : 50 rmse : 0.06110664290542235
### iteration step : 60 rmse : 0.04828870941971258
### iteration step : 70 rmse : 0.04047612643952627
### iteration step : 80 rmse : 0.03526271381807897
### iteration step : 90 rmse : 0.03153949519066271
### iteration step : 100 rmse : 0.02874234001559589
### iteration step : 110 rmse : 0.02655884622287304
### iteration step : 120 rmse : 0.02480282354043634
### iteration step : 130 rmse : 0.023356384569296406
### iteration step : 140 rmse : 0.022141257793697803
### iteration step : 150 rmse : 0.021103502876817563
### iteration step : 160 rmse : 0.02020480521296684
### iteration step : 170 rmse : 0.019417242940681048
### iteration step : 180 rmse : 0.018720008700939508
### iteration step : 190 rmse : 0.018097288248668287
ratings_pred_matrix = pd.DataFrame(data=pred_matrix, index=ratings_matrix.index,
                                   columns=ratings_matrix.columns)
ratings_pred_matrix.head(3)
out:
| userId | '71 (2014) | 'Hellboy': The Seeds of Creation (2004) | 'Round Midnight (1986) | 'Salem's Lot (2004) | 'Til There Was You (1997) | 'Tis the Season for Love (2015) | 'burbs, The (1989) | 'night Mother (1986) | (500) Days of Summer (2009) | \*batteries not included (1987) | ... | Zulu (2013) | [REC] (2007) | [REC]² (2009) | [REC]³ 3 Génesis (2012) | anohana: The Flower We Saw That Day - The Movie (2013) | eXistenZ (1999) | xXx (2002) | xXx: State of the Union (2005) | ¡Three Amigos! (1986) | À nous la liberté (Freedom for Us) (1931) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.377643 | 4.505318 | 3.861761 | 4.809262 | 4.230209 | 1.419098 | 3.622598 | 2.588018 | 5.194875 | 4.107374 | ... | 1.505721 | 4.367506 | 3.908597 | 2.927652 | 2.946106 | 3.674696 | 3.158186 | 2.244369 | 4.046990 | 0.907779 |
| 2 | 3.284211 | 3.978786 | 3.509599 | 4.386566 | 4.480689 | 1.371289 | 4.377122 | 2.157676 | 3.437670 | 3.868025 | ... | 1.136970 | 3.545436 | 3.462178 | 2.752496 | 2.581022 | 4.440165 | 3.081573 | 1.772660 | 4.339104 | 0.781266 |
| 3 | 2.637018 | 2.227526 | 1.834213 | 2.671847 | 2.630876 | 0.955755 | 2.277016 | 1.336127 | 3.275131 | 2.961405 | ... | 0.709452 | 1.913354 | 2.583782 | 2.021059 | 1.895893 | 1.642332 | 3.054965 | 1.175833 | 2.707816 | 0.479738 |

3 rows × 9719 columns
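Once a prediction matrix like the one above exists, one possible way to turn it into recommendations is to rank, for a given user, the titles that user has not rated yet. The sketch below assumes two DataFrames with the same index and columns; the helper `recommend_unseen` and the toy values are hypothetical and not part of the original notes.

```python
import numpy as np
import pandas as pd

def recommend_unseen(ratings_matrix, pred_matrix, user_id, top_n=2):
    """Top-n predicted titles the user has not rated (hypothetical helper)."""
    unseen = ratings_matrix.loc[user_id].isna()   # True where no rating exists
    user_pred = pred_matrix.loc[user_id][unseen]  # keep predictions for unseen titles
    return user_pred.sort_values(ascending=False).head(top_n)

# Toy actual ratings (NaN = not rated) and toy predictions with matching labels.
actual = pd.DataFrame([[5.0, np.nan, np.nan],
                       [np.nan, 3.0, 4.0]],
                      index=[1, 2], columns=["A", "B", "C"])
predicted = pd.DataFrame([[4.8, 2.1, 3.9],
                          [3.0, 3.1, 4.2]],
                         index=[1, 2], columns=["A", "B", "C"])

top = recommend_unseen(actual, predicted, user_id=1, top_n=2)
print(list(top.index))  # ['C', 'B']
```

With the real data, the same call would take `ratings_matrix` and `ratings_pred_matrix` as its first two arguments.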
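| Item | Description |
|---|---|
| Purpose | Predict missing ratings for use in a recommender system |
| Key formula | $\hat{R} = P Q^T$ |
| Method | Learn P and Q with SGD, optimizing iteratively |
| Core of the update | Move each factor a small step in the direction of error ($e_{ij}$) × the other matrix's vector, plus regularization |
| Applications | Recommender systems such as Netflix, Watcha, Coupang, 11st (11번가), and YouTube |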