Recommendation System

been_29 · September 12, 2024

The Korea Economic Daily with Toss Bank MLOps Course

💡 Recommendation System

A machine learning-based technique that provides optimal suggestions based on user preferences, behavior, and past interaction data

🎨 Collaborative Filtering

Recommendations generated based on user interactions

User-based Collaborative Filtering

Definition : Based on the assumption that 'users with similar preferences are likely to like similar items,' the algorithm recommends new items by utilizing the item preferences of similar users
How it works
1. User-Item Interaction Matrix generation : Create a User-Item Interaction Matrix based on the ratings or interaction data that each user gives to each item
  
  $\begin{bmatrix} r_{1,1} & r_{1,2} & r_{1,3} & \cdots & r_{1,n} \\ r_{2,1} & r_{2,2} & r_{2,3} & \cdots & r_{2,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{m,1} & r_{m,2} & r_{m,3} & \cdots & r_{m,n} \end{bmatrix}$
  - Here, $r_{i,j}$ is the rating user $i$ gave to item $j$
2. Calculate Similarity : Calculate the similarity between users, usually using Cosine Similarity or Pearson Correlation
  - Cosine Similarity formula
  $\text{sim}(u, v) = \frac{\sum_{i} r_{u,i} \cdot r_{v,i}}{\sqrt{\sum_{i} r_{u,i}^2} \cdot \sqrt{\sum_{i} r_{v,i}^2}}$
  - Here, $r_{u,i}$ is the rating user $u$ gave to item $i$
  - Pearson Correlation formula
  $\text{sim}(u, v) = \frac{\sum_{i} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i} (r_{u,i} - \bar{r}_u)^2} \cdot \sqrt{\sum_{i} (r_{v,i} - \bar{r}_v)^2}}$
  - Here, $\bar{r}_u$ is the average rating given by user $u$
3. Selecting the most similar users : Based on the calculated similarity values, the most similar users are selected using the K-NN method
4. Rating prediction : Using the ratings of the similar users, calculate the predicted rating for each item that the current user hasn't rated yet
5. Recommendation generation : Recommend the items with the highest predicted ratings
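
As a rough illustration of step 2, the Pearson Correlation formula can be sketched in NumPy. The function name `pearson_similarity` and the sample matrix are assumptions made for this sketch. It also takes each user's mean over all items, including unrated entries stored as 0, which is a simplification; practical systems often restrict the computation to co-rated items.

```python
import numpy as np

def pearson_similarity(ratings):
    # Center each user's row on that user's mean rating
    centered = ratings - ratings.mean(axis=1, keepdims=True)
    # Numerator: sum of products of centered ratings for every user pair
    numerator = centered @ centered.T
    # Denominator: product of the two users' centered-vector norms
    norms = np.sqrt((centered ** 2).sum(axis=1))
    return numerator / np.outer(norms, norms)

# Sample User-Item Rating Matrix (rows: users, columns: items, 0 = not rated)
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [1, 0, 0, 4],
                    [0, 1, 5, 4]], dtype=float)

pearson = pearson_similarity(ratings)
print(np.round(pearson, 3))
```

Values close to 1 indicate users whose ratings rise and fall together, while negative values indicate opposite tastes. A user whose ratings are all identical would produce a zero denominator, so a real implementation would need to guard against that case.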

Code Example

Python code for a recommendation system using User-based Collaborative Filtering
Cosine Similarity is used to calculate the similarity between users, and items are recommended based on the ratings of similar users
The code computes recommendations for user 0: among the items user 0 hasn't rated yet, it recommends those with the highest predicted ratings

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# User-Item Rating Matrix (rows: users, columns: items, 0 = not rated)
user_item_matrix = np.array([[5, 3, 0, 1],
                             [4, 0, 0, 1],
                             [1, 1, 0, 5],
                             [1, 0, 0, 4],
                             [0, 1, 5, 4]])

# Calculate similarity between users (Cosine Similarity)
user_similarity = cosine_similarity(user_item_matrix)

# User-based Collaborative Filtering function
def recommend(user_id, user_similarity, user_item_matrix, k=2):
    # Rank users by similarity, excluding the target user (self-similarity is always 1)
    ranked_users = np.argsort(-user_similarity[user_id])
    similar_users = ranked_users[ranked_users != user_id][:k]
    # Ratings of similar users
    similar_users_ratings = user_item_matrix[similar_users]
    # Predict ratings by averaging the ratings of similar users (unrated entries count as 0)
    predicted_ratings = similar_users_ratings.mean(axis=0)

    # Recommend items that the user has not rated yet, with the highest predicted ratings
    unrated_items = np.where(user_item_matrix[user_id] == 0)[0]
    recommendations = np.argsort(-predicted_ratings[unrated_items])
    return unrated_items[recommendations]

# Calculate recommended items for user 0
recommendations = recommend(user_id=0,
                            user_similarity=user_similarity,
                            user_item_matrix=user_item_matrix)
print(f"User 0's recommended items: {recommendations}")
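
The example above predicts ratings with a plain average over the neighbors. A common variant weights each neighbor's rating by its similarity to the target user and ignores neighbors who have not rated the item. The following is a minimal sketch of that variant in plain NumPy; the helper names `cosine_similarity_matrix` and `predict_rating` are hypothetical and introduced only for illustration.

```python
import numpy as np

ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [1, 0, 0, 4],
                    [0, 1, 5, 4]], dtype=float)

def cosine_similarity_matrix(matrix):
    # Normalize each row to unit length, then take all pairwise dot products
    unit_rows = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return unit_rows @ unit_rows.T

def predict_rating(user_id, item_id, ratings, similarity, k=2):
    # Candidate neighbors: other users who have actually rated the item
    candidates = [v for v in range(ratings.shape[0])
                  if v != user_id and ratings[v, item_id] > 0]
    # Keep the k candidates most similar to the target user
    neighbors = sorted(candidates, key=lambda v: -similarity[user_id, v])[:k]
    weights = np.array([similarity[user_id, v] for v in neighbors])
    if len(neighbors) == 0 or weights.sum() == 0:
        return 0.0  # no usable neighbor: fall back to "no prediction"
    neighbor_ratings = np.array([ratings[v, item_id] for v in neighbors])
    # Similarity-weighted average of the neighbors' ratings
    return float(weights @ neighbor_ratings / weights.sum())

similarity = cosine_similarity_matrix(ratings)
prediction = predict_rating(user_id=0, item_id=2, ratings=ratings, similarity=similarity)
print(f"Predicted rating of user 0 for item 2: {prediction:.2f}")
```

In this sample matrix only user 4 has rated item 2, so the prediction for user 0 simply equals that user's rating of 5. With more raters, closer neighbors pull the prediction more strongly toward their own ratings.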

Item-based Collaborative Filtering

Definition : Based on the assumption that 'a user who likes an item is likely to also like items similar to it,' the algorithm calculates the similarity between items and makes recommendations based on that similarity
How it works
1. User-Item Interaction Matrix generation : Create a User-Item Interaction Matrix based on the ratings that users have given to each item
2. Calculating item similarity : Mainly uses Cosine Similarity or Pearson Correlation to calculate the similarity between items
3. Selecting the most similar items : Find items similar to those the user likes based on the calculated similarity values
4. Prediction and recommendation generation : For items the user hasn’t rated yet, predict the ratings using the ratings of similar items
Overall, it's similar to User-based Collaborative Filtering, but focuses on items instead of users
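
The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names `item_similarity_matrix` and `recommend_items` are hypothetical, items are compared through the columns of the rating matrix, and the predicted score is a similarity-weighted average of the user's own ratings.

```python
import numpy as np

# User-Item Rating Matrix (rows: users, columns: items, 0 = not rated)
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [1, 0, 0, 4],
                    [0, 1, 5, 4]], dtype=float)

def item_similarity_matrix(ratings):
    # Each item is represented by its column of user ratings
    item_vectors = ratings.T
    unit_rows = item_vectors / np.linalg.norm(item_vectors, axis=1, keepdims=True)
    return unit_rows @ unit_rows.T

def recommend_items(user_id, ratings, item_similarity, top_n=2):
    user_ratings = ratings[user_id]
    rated = user_ratings > 0
    # Similarity of every item to the items this user has already rated
    weights = item_similarity[:, rated]
    # Predicted score: similarity-weighted average of the user's own ratings
    # (the small constant avoids division by zero)
    scores = weights @ user_ratings[rated] / (weights.sum(axis=1) + 1e-9)
    unrated = np.where(~rated)[0]
    ranked = unrated[np.argsort(-scores[unrated])]
    return ranked[:top_n]

item_similarity = item_similarity_matrix(ratings)
recommended = recommend_items(user_id=1, ratings=ratings, item_similarity=item_similarity)
print(f"User 1's recommended items: {recommended}")
```

Because item-item similarities tend to change more slowly than user-user similarities, they are often precomputed once and reused across many users.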

🎨 Content-Based Filtering

Finding items with similar features by analyzing the user's past behavior and preference information

How it works
1. Feature extraction of items: Extract the attributes of an item and represent them as a 'Feature Vector'
  ex) For a movie, attributes like genre, director, and cast can be converted into a vector
  
  $X_i = [x_{i1}, x_{i2}, \dots, x_{in}]$
  - Here, $X_i$ is the feature vector of item $i$, and each component $x_{ik}$ represents a specific attribute of the item
  - ex) $x_{i1}$ is the genre, $x_{i2}$ is the director, and $x_{i3}$ is the cast
2. Creating a user profile: Generate a user profile vector that reflects the user's preferences by combining the Feature Vectors of items they have rated
3. Calculating similarity between items and users: Calculate the similarity between the generated user profile vector and the feature vector of the item to recommend - usually using cosine similarity
  - Cosine Similarity is calculated by dividing the dot product of two vectors by the product of their magnitudes; the closer the result is to 1, the more similar the two vectors are
4. Generating recommendations: Based on the calculated similarity, recommend the most similar items that the user has not rated yet
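
As a sketch of step 1, categorical attributes can be turned into a numeric Feature Vector with a simple one-hot encoding. The movie metadata, the `vocabulary` list, and the `to_feature_vector` helper below are all invented for this illustration; a real system would likely use richer features such as multiple genres or text embeddings.

```python
import numpy as np

# Hypothetical movie metadata, invented for this sketch
movies = {
    "Movie 1": {"genre": "action", "director": "Director A", "cast": "Actor X"},
    "Movie 2": {"genre": "drama", "director": "Director A", "cast": "Actor Y"},
    "Movie 3": {"genre": "action", "director": "Director B", "cast": "Actor Y"},
}

# Vocabulary: every distinct (attribute, value) pair, in a fixed order
vocabulary = sorted({(attr, value)
                     for meta in movies.values()
                     for attr, value in meta.items()})

def to_feature_vector(meta):
    # 1 if the movie has that attribute value, 0 otherwise
    return np.array([1 if meta.get(attr) == value else 0
                     for attr, value in vocabulary])

movie_features = np.array([to_feature_vector(meta) for meta in movies.values()])
print(vocabulary)
print(movie_features)
```

Each row of `movie_features` is one item's feature vector $X_i$, and each column corresponds to one entry of `vocabulary`, so movies that share a director or a genre overlap in the corresponding columns.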

Code Example

Recommend similar movies based on the user's movie preferences using the Content-based Filtering method

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# The feature vector of a movie (e.g., [genre, director, cast])
movie_features = np.array([[1, 1, 0],  # Movie 1
                           [0, 1, 1],  # Movie 2
                           [1, 0, 1],  # Movie 3
                           [0, 0, 1]]) # Movie 4

# Ratings the user gave to each movie (0 = not rated; here the user rated Movie 1 and Movie 3)
user_ratings = np.array([5, 0, 4, 0])

# User profile vector calculation (weighted sum of the feature vectors of rated movies)
user_profile = user_ratings @ movie_features

# Calculation of similarity between the user profile and the feature vectors of each movie
similarities = cosine_similarity([user_profile], movie_features)[0]

# Recommend movies with high similarity from those the user has not rated yet
unrated_movies = np.where(user_ratings == 0)[0]
recommended_movies = np.argsort(-similarities[unrated_movies])

print(f"Recommended movies: {unrated_movies[recommended_movies]}")

User Profile Vector

Process of generating the user profile vector
1. Prepare the Feature Vector of items: A vector that quantifies the attributes of items
  ex) Assume each movie has the following attributes
  
  | Item | Genre | Director | Cast |
  | --- | --- | --- | --- |
  | Movie A | 1 | 1 | 0 |
  | Movie B | 0 | 1 | 1 |
  | Movie C | 1 | 0 | 1 |
  - The feature vectors of each item can be expressed as: $X_A = [1, 1, 0], \quad X_B = [0, 1, 1], \quad X_C = [1, 0, 1]$
2. Prepare user rating information: Information representing how much a user prefers each item
  ex) Assume the user's ratings for the three movies are as follows (a rating of 0 means the movie has not been rated):
  
  | Item | Rating |
  | --- | --- |
  | Movie A | 5 |
  | Movie B | 0 |
  | Movie C | 4 |
3. Calculate the user profile vector: Multiply the feature vector of each item by the user’s rating, sum them up, and normalize
- The user profile vector $U_u$ is the rating-weighted sum of the feature vectors of the items rated by the user, divided by the number of rated items
  
  $U_u = \frac{1}{N_u} \sum_{i \in I_u} r_{u,i} \cdot X_i$
  - $U_u$ is the user $u$ 's profile vector
  - $N_u$ is the number of items rated by the user
  - $I_u$ is the set of items rated by the user
  - $r_{u,i}$ is the rating that user $u$ gave to item $i$
  - $X_i$ is the feature vector of item $i$

Concrete example
1. Prepare the feature vectors of items: The feature vectors for movies A, B, and C are as follows:
  $X_A = [1, 1, 0], \quad X_B = [0, 1, 1], \quad X_C = [1, 0, 1]$
2. Prepare user rating information: The ratings the user gave to movies A, B, and C are as follows:
  $r_{u,A} = 5, \quad r_{u,B} = 0, \quad r_{u,C} = 4$
3. Calculate the user profile vector
  - Calculate the user profile vector:
    $U_u = \frac{1}{N_u} \left( r_{u,A} \cdot X_A + r_{u,B} \cdot X_B + r_{u,C} \cdot X_C \right)$
  - Here, $N_u=2$ because only Movies A and C have nonzero ratings (Movie B's rating of 0 is treated as unrated), and the feature vector of each item is multiplied by the user's rating:
    $U_u = \frac{1}{2} \left( 5 \cdot [1, 1, 0] + 0 \cdot [0, 1, 1] + 4 \cdot [1, 0, 1] \right)$
  - After calculating:
    $U_u = \frac{1}{2} \left( [5, 5, 0] + [0, 0, 0] + [4, 0, 4] \right) = \frac{1}{2} [9, 5, 4] = [4.5, 2.5, 2.0]$
  - Therefore, this user’s profile vector is $[4.5, 2.5, 2.0]$
Code Example

import numpy as np

# Item feature vectors (e.g., [Genre, Director, Cast])
item_features = np.array([[1, 1, 0],  # Movie A
                          [0, 1, 1],  # Movie B
                          [1, 0, 1]]) # Movie C

# Ratings given by the user (0 = not rated)
user_ratings = np.array([5, 0, 4])  # Ratings for Movie A, Movie B, Movie C

# Calculate user profile vector
# (rating-weighted sum of the feature vectors, divided by the number of rated items)
user_profile = np.dot(user_ratings, item_features) / np.count_nonzero(user_ratings)

print(f"User profile vector: {user_profile}")

🎨 Hybrid Methods

A method that combines multiple recommendation algorithms to complement the weaknesses of individual algorithms and improve the performance and diversity of the recommendation system

Weighted Hybrid

A method that generates the final recommendation by assigning weights to the predicted scores of each algorithm
How it works: Calculate the result of each recommendation algorithm, and apply weights to those results to compute the final rating
Example formula (Combining Collaborative Filtering and Content-based Filtering)

$\hat{r}_{u,i} = w_1 \cdot \hat{r}_{u,i}^{CF} + w_2 \cdot \hat{r}_{u,i}^{CBF}$
- $\hat{r}_{u,i}$ is the predicted rating that user $u$ will give to item $i$
- $\hat{r}^{CF}_{u,i}$ is the predicted rating calculated by Collaborative Filtering
- $\hat{r}^{CBF}_{u,i}$ is the predicted rating calculated by Content-based Filtering
- $w_1$ and $w_2$ are the weights of each algorithm, and $w_1 + w_2 = 1$
Application example
- You can calculate the combined result of Collaborative Filtering and Content-based Filtering using the weighted method
- ex) If the user has a lot of past rating data, more weight is given to Collaborative Filtering; otherwise, more weight is given to Content-based Filtering
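
The weighted formula and the application example can be sketched as below. The `cf_weight` schedule (a weight that grows linearly with the number of ratings up to an assumed `saturation` of 20) and the sample scores are hypothetical choices for illustration only; in practice the weights are usually tuned on validation data.

```python
import numpy as np

def cf_weight(n_ratings, saturation=20):
    # Hypothetical schedule: the CF weight grows linearly with the number
    # of ratings the user has given, capped at 1
    return min(n_ratings / saturation, 1.0)

def weighted_hybrid(cf_scores, cbf_scores, w_cf):
    if not 0.0 <= w_cf <= 1.0:
        raise ValueError("w_cf must be between 0 and 1")
    cf_scores = np.asarray(cf_scores, dtype=float)
    cbf_scores = np.asarray(cbf_scores, dtype=float)
    # w_1 = w_cf and w_2 = 1 - w_cf, so the two weights sum to 1
    return w_cf * cf_scores + (1.0 - w_cf) * cbf_scores

# Made-up predicted ratings for three items from each algorithm
cf_scores = [4.0, 2.0, 3.0]
cbf_scores = [2.0, 4.0, 3.0]

w = cf_weight(n_ratings=15)
final_scores = weighted_hybrid(cf_scores, cbf_scores, w)
print(f"w_1 = {w}, final scores = {final_scores}")
```

A user with few ratings gets a small `w_cf`, so the final score leans on Content-based Filtering, which matches the application example above.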

Switching Hybrid

Selecting different recommendation algorithms depending on specific situations
How it works
1. When Cold Start problems occur: Use Content-based Filtering
2. When there is sufficient interaction data: Use Collaborative Filtering
Application example
- For newly registered users, provide recommendations that match the user profile using Content-based Filtering
- For existing users, recommend items liked by other users using Collaborative Filtering
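
A minimal sketch of the switching logic follows. The two recommender functions are placeholders that return fixed lists, and the `min_interactions` threshold of 5 is an arbitrary assumption; both would be replaced by real models and a tuned threshold.

```python
def recommend_with_cf(user_id):
    # Placeholder standing in for a Collaborative Filtering recommender
    return ["cf_item_1", "cf_item_2"]

def recommend_with_cbf(user_id):
    # Placeholder standing in for a Content-based Filtering recommender
    return ["cbf_item_1", "cbf_item_2"]

def switching_hybrid(user_id, interaction_counts, min_interactions=5):
    n_interactions = interaction_counts.get(user_id, 0)
    if n_interactions < min_interactions:
        # Cold Start: too little interaction data for Collaborative Filtering
        return "CBF", recommend_with_cbf(user_id)
    return "CF", recommend_with_cf(user_id)

# Hypothetical interaction counts per user
interaction_counts = {"new_user": 0, "existing_user": 42}

for user in interaction_counts:
    method, items = switching_hybrid(user, interaction_counts)
    print(f"{user}: {method} -> {items}")
```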

Mixed Hybrid

A method that runs multiple recommendation algorithms in parallel and combines their results to generate the final recommendation
How it works
- Run two or more recommendation algorithms in parallel and combine their results to create the final recommendation
  ex) Combine the items recommended by Collaborative Filtering and Content-based Filtering into a single list
Example Formula

$R(u) = \text{Combine}(R_{CF}(u), R_{CBF}(u))$
- $R(u)$ : The final list of items recommended to user $u$
- $R_{CF}(u)$ : The list of items recommended by Collaborative Filtering
- $R_{CBF}(u)$ : The list of items recommended by Content-based Filtering
- Combine: The method of merging the two lists
Application example
- Recommend items based on the user's interaction data using Collaborative Filtering
- Analyze the characteristics of items using Content-based Filtering to provide recommendations
- Combine the results of both algorithms to recommend items similar to those the user already likes
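
One possible choice for the Combine step is to interleave the two ranked lists and drop duplicates. The sketch below assumes that strategy; the `combine` function and the item IDs are illustrative only, and other merge rules (for example, concatenation or score-based merging) are equally valid.

```python
from itertools import zip_longest

def combine(cf_items, cbf_items, top_n=5):
    # Alternate between the two ranked lists, keeping the first occurrence of each item
    merged = []
    for pair in zip_longest(cf_items, cbf_items):
        for item in pair:
            if item is not None and item not in merged:
                merged.append(item)
    return merged[:top_n]

cf_recommendations = [1, 2, 3]    # ranked list from Collaborative Filtering
cbf_recommendations = [3, 4, 5]   # ranked list from Content-based Filtering

final_list = combine(cf_recommendations, cbf_recommendations)
print(f"Final recommendations: {final_list}")
```

Interleaving keeps the top picks of both algorithms near the front of the final list, so neither algorithm dominates the result.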

Cascade Hybrid

A method that performs the first filtering with one algorithm and then applies another algorithm to the filtered results to generate the final recommendation
How it works: Perform the first filtering, and then use another algorithm to generate the final recommendation list from the filtered items
Example Formula

$R(u) = \text{CF}( \text{CBF}(u) )$
- $R(u)$ : The final list of items recommended to user $u$
- $\text{CF}(\text{CBF}(u))$ : Apply Collaborative Filtering to the items filtered by Content-based Filtering
Application example
- Use Content-based Filtering to select candidate items for recommendation
- Use Collaborative Filtering to finalize the recommended items based on interaction data
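
The two stages can be sketched as follows, assuming each algorithm has already produced a score per item. The `cascade` function, the candidate and output sizes, and the score arrays are made up for illustration.

```python
import numpy as np

def cascade(cbf_scores, cf_scores, n_candidates=3, top_n=2):
    # Stage 1: Content-based Filtering keeps the n_candidates best-matching items
    candidates = np.argsort(-cbf_scores)[:n_candidates]
    # Stage 2: Collaborative Filtering re-ranks only those candidates
    reranked = candidates[np.argsort(-cf_scores[candidates])]
    return reranked[:top_n]

# Made-up scores for five items
cbf_scores = np.array([0.9, 0.1, 0.8, 0.7, 0.2])  # content similarity to the user profile
cf_scores = np.array([2.0, 5.0, 4.5, 3.0, 4.9])   # predicted ratings from CF

final_items = cascade(cbf_scores, cf_scores)
print(f"Final recommendations: {final_items}")
```

Items 1 and 4 have the highest Collaborative Filtering scores, but they never reach the second stage because the first filter removed them. This is the defining trade-off of a cascade: the second algorithm can only reorder what the first one lets through.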

Feature Augmentation

A method where the results or features from one algorithm are used as inputs to another algorithm
How it works: Use the feature information calculated by one algorithm as input to another algorithm to generate recommendations
Example Formula

$X_{new} = \text{Augment}(X, \text{CBF}(X))$
- $X_{new}$ : Item feature data augmented by the results of Content-based Filtering
- $\text{CBF}(X)$ : The result of Content-based Filtering
Application example
- First analyze the user profile and item feature vectors using Content-based Filtering
- Apply that information to Collaborative Filtering to achieve better recommendation results
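
One way to read the Augment step is to append the Content-based Filtering score to each item's feature vector as an extra column. The sketch below assumes that interpretation; the function names `cbf_scores` and `augment` are hypothetical, and the downstream Collaborative Filtering model that would consume `augmented_features` is not shown.

```python
import numpy as np

def cbf_scores(item_features, user_profile):
    # Cosine similarity between each item's feature vector and the user profile
    dots = item_features @ user_profile
    norms = np.linalg.norm(item_features, axis=1) * np.linalg.norm(user_profile)
    return dots / norms

def augment(item_features, user_profile):
    scores = cbf_scores(item_features, user_profile)
    # Append the CBF score as one extra feature column
    return np.hstack([item_features, scores[:, np.newaxis]])

# Item feature vectors (e.g., [genre, director, cast])
item_features = np.array([[1, 1, 0],
                          [0, 1, 1],
                          [1, 0, 1],
                          [0, 0, 1]], dtype=float)
# Example user profile vector (rating-weighted sum of rated items' features)
user_profile = np.array([9.0, 5.0, 4.0])

augmented_features = augment(item_features, user_profile)
print(np.round(augmented_features, 3))
```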