Recommendation System

been_29 · September 12, 2024

💡 Recommendation System

A machine learning-based technique that suggests the items a user is most likely to want, based on user preferences, behavior, and past interaction data


🎨 Collaborative Filtering

Recommendations generated based on user interactions

User-based Collaborative Filtering

  • Definition : Based on the assumption that 'users with similar preferences are likely to like similar items,' the algorithm recommends new items by utilizing the item preferences of similar users

  • How it works

    1. User-Item Interaction Matrix generation : Create a User-Item Interaction Matrix based on the ratings or interaction data that each user gives to each item

      $$\begin{bmatrix} r_{1,1} & r_{1,2} & r_{1,3} & \cdots & r_{1,n} \\ r_{2,1} & r_{2,2} & r_{2,3} & \cdots & r_{2,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{m,1} & r_{m,2} & r_{m,3} & \cdots & r_{m,n} \end{bmatrix}$$
      • Here, $r_{i,j}$ is the rating user $i$ gave to item $j$
    2. Calculate Similarity : Calculate the similarity between users, usually using Cosine Similarity or Pearson Correlation

      • Cosine Similarity formula
      $$\text{sim}(u, v) = \frac{\sum_{i} r_{u,i} \cdot r_{v,i}}{\sqrt{\sum_{i} r_{u,i}^2} \cdot \sqrt{\sum_{i} r_{v,i}^2}}$$
      • Here, $r_{u,i}$ is the rating user $u$ gave to item $i$
      • Pearson Correlation formula
      $$\text{sim}(u, v) = \frac{\sum_{i} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i} (r_{u,i} - \bar{r}_u)^2} \cdot \sqrt{\sum_{i} (r_{v,i} - \bar{r}_v)^2}}$$
      • Here, $\bar{r}_u$ is the average rating given by user $u$
    3. Selecting the most similar users : Based on the calculated similarity values, the most similar users are selected using the K-NN method

    4. Rating prediction : Based on the ratings from the similar users, the predicted rating for items that the current user hasn't rated yet is calculated

    5. Recommendation generation : Recommend the items with the highest predicted ratings
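The two similarity measures from step 2 can be compared on a pair of rating vectors. This is a minimal sketch, assuming a rating of 0 means "not rated" and that the Pearson sums run only over items both users have rated (a common convention that the formulas above leave open). The helper names `cosine_sim` and `pearson_sim` are hypothetical.

```python
import numpy as np

def cosine_sim(ru, rv):
    """Cosine similarity of two rating vectors (0 = not rated)."""
    denom = np.linalg.norm(ru) * np.linalg.norm(rv)
    return float(ru @ rv / denom) if denom > 0 else 0.0

def pearson_sim(ru, rv):
    """Pearson correlation over co-rated items, centered on each user's mean rating."""
    co_rated = (ru > 0) & (rv > 0)
    if co_rated.sum() < 2:
        return 0.0  # not enough overlap to measure correlation
    du = ru[co_rated] - ru[ru > 0].mean()
    dv = rv[co_rated] - rv[rv > 0].mean()
    denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom > 0 else 0.0

u = np.array([5.0, 3.0, 0.0, 1.0])  # likes item 0, dislikes item 3
v = np.array([1.0, 1.0, 0.0, 5.0])  # dislikes item 0, likes item 3
print(cosine_sim(u, v), pearson_sim(u, v))
```

Cosine similarity stays positive for these two users because every rating is non-negative, while Pearson turns negative because it centers each user's ratings on their own mean first. This is one reason Pearson is often preferred when users rate on different personal scales.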

  • Code Example

    • Python code for a recommendation system using User-based Collaborative Filtering
    • Cosine Similarity is used to calculate the similarity between users, and items are recommended based on the ratings of similar users
    • Computes predicted ratings for user 0 and recommends the items with the highest predicted ratings among the ones user 0 hasn’t rated yet
    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    # User-Item Rating Matrix (rows: users, columns: items, 0 = not rated)
    user_item_matrix = np.array([[5, 3, 0, 1],
                                 [4, 0, 0, 1],
                                 [1, 1, 0, 5],
                                 [1, 0, 0, 4],
                                 [0, 1, 5, 4]])

    # Calculate similarity between users (Cosine Similarity)
    user_similarity = cosine_similarity(user_item_matrix)

    # User-based Collaborative Filtering function
    def recommend(user_id, user_similarity, user_item_matrix, k=2):
        # Rank users by similarity and drop the target user,
        # whose similarity to themselves is always 1
        ranked_users = np.argsort(-user_similarity[user_id])
        similar_users = ranked_users[ranked_users != user_id][:k]
        # Ratings of similar users
        similar_users_ratings = user_item_matrix[similar_users]
        # Predict ratings by averaging the ratings of similar users
        # (unrated items count as 0 in this simple average)
        predicted_ratings = similar_users_ratings.mean(axis=0)

        # Recommend items that the user has not rated yet, with the highest predicted ratings
        unrated_items = np.where(user_item_matrix[user_id] == 0)[0]
        recommendations = np.argsort(-predicted_ratings[unrated_items])
        return unrated_items[recommendations]

    # Calculate recommended items for user 0
    recommendations = recommend(user_id=0, user_similarity=user_similarity, user_item_matrix=user_item_matrix)
    print(f"User 0's recommended items: {recommendations}")
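The example above gives every neighbor equal weight and counts unrated entries (0) as if they were ratings. A common refinement, sketched here under the assumption that 0 means "not rated", weights each neighbor's rating by its similarity and skips neighbors who have not rated the item. The function name `predict_weighted` is hypothetical, and similarity is computed with plain NumPy so the block has no extra dependencies.

```python
import numpy as np

ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [1, 0, 0, 4],
                    [0, 1, 5, 4]], dtype=float)

# User-user cosine similarity between the rows of the rating matrix
norms = np.linalg.norm(ratings, axis=1, keepdims=True)
similarity = (ratings @ ratings.T) / (norms * norms.T)

def predict_weighted(user_id, ratings, similarity, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings, per item."""
    order = np.argsort(-similarity[user_id])
    neighbors = order[order != user_id][:k]             # exclude the user itself
    weights = similarity[user_id, neighbors][:, None]   # shape (k, 1)
    rated = ratings[neighbors] > 0                      # True where a neighbor rated the item
    numer = (weights * ratings[neighbors] * rated).sum(axis=0)
    denom = (weights * rated).sum(axis=0)
    # Items that no neighbor has rated get a prediction of 0
    return np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)

pred = predict_weighted(0, ratings, similarity)
unrated = np.where(ratings[0] == 0)[0]
print(unrated[np.argsort(-pred[unrated])])
```

With `k=2`, neither of user 0's neighbors has rated item 2, so its prediction falls back to 0. A larger `k` would pull in more distant users and improve coverage at the cost of neighbor quality.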

Item-based Collaborative Filtering

  • Definition : Based on the assumption that 'users who like an item are likely to like other items similar to it,' the algorithm calculates the similarity between items and makes recommendations based on that similarity
  • How it works
    1. User-Item Interaction Matrix generation : Create a User-Item Interaction Matrix based on the ratings that users have given to each item
    2. Calculating item similarity : Calculate the similarity between items, usually using Cosine Similarity or Pearson Correlation
    3. Selecting the most similar items : Find items similar to those the user likes based on the calculated similarity values
    4. Prediction and recommendation generation : For items the user hasn’t rated yet, predict the ratings using the ratings of similar items
  • Overall, it's similar to User-based Collaborative Filtering, but focuses on items instead of users
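The item-based steps can be sketched the same way as the user-based example, by comparing the columns of the rating matrix instead of its rows. This is a minimal sketch, assuming 0 means "not rated" and using a similarity-weighted average of the user's own ratings as the prediction. The function name `predict_item_based` is hypothetical.

```python
import numpy as np

ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [1, 0, 0, 4],
                    [0, 1, 5, 4]], dtype=float)

# Item-item cosine similarity: compare the columns of the rating matrix
item_norms = np.linalg.norm(ratings, axis=0, keepdims=True)
item_similarity = (ratings.T @ ratings) / (item_norms.T * item_norms)

def predict_item_based(user_id, ratings, item_similarity):
    """Predict each item's rating from the user's own ratings of similar items."""
    user_ratings = ratings[user_id]
    rated = user_ratings > 0
    numer = item_similarity[:, rated] @ user_ratings[rated]
    denom = item_similarity[:, rated].sum(axis=1)
    # Items with no similarity to anything the user rated get a prediction of 0
    return np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)

pred = predict_item_based(0, ratings, item_similarity)
unrated = np.where(ratings[0] == 0)[0]
print(unrated[np.argsort(-pred[unrated])])
```

Because the item-item similarity matrix depends only on the columns, it can be precomputed once and reused for every user, which is a frequent practical reason to choose the item-based variant.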






🎨 Content-Based Filtering

Recommends items whose features are similar to those of items the user has liked, by analyzing the user's past behavior and preference information

  • How it works

    1. Feature extraction of items: Extract the attributes of an item and represent them as a 'Feature Vector'
      ex) For a movie, attributes like genre, director, and cast can be converted into a vector

      $$X_i = [x_{i1}, x_{i2}, \dots, x_{in}]$$
      • Here, $X_i$ is the feature vector of item $i$, and each feature $x_{ij}$ represents a specific attribute of the item
      • ex) $x_{i1}$ is the genre, $x_{i2}$ is the director, and $x_{i3}$ is the cast
    2. Creating a user profile: Generate a user profile vector that reflects the user's preferences by combining the Feature Vectors of items they have rated

    3. Calculating similarity between items and users: Calculate the similarity between the generated user profile vector and the feature vector of the item to recommend - usually using cosine similarity

      • Cosine Similarity is calculated by dividing the dot product of two vectors by the product of their magnitudes; the closer the result is to 1, the more similar the two vectors are
    4. Generating recommendations: Based on the calculated similarity, recommend the most similar items that the user has not rated yet

  • Code Example

    • Recommend similar movies based on the user's movie preferences using the Content-based Filtering method

      import numpy as np
      from sklearn.metrics.pairwise import cosine_similarity
      
      # The feature vector of a movie (e.g., [genre, director, cast])
      movie_features = np.array([[1, 1, 0],  # Movie 1
                                 [0, 1, 1],  # Movie 2
                                 [1, 0, 1],  # Movie 3
                                 [0, 0, 1]]) # Movie 4
      
      # Ratings the user gave to each movie (0 = not rated; here the user likes Movie 1 and Movie 3)
      user_ratings = np.array([5, 0, 4, 0])  # One rating per movie, in the same order as movie_features
      
      # User profile vector calculation (weighted sum of the feature vectors of rated movies)
      user_profile = user_ratings @ movie_features
      
      # Calculation of similarity between the user profile and the feature vectors of each movie
      similarities = cosine_similarity([user_profile], movie_features)[0]
      
      # Recommend movies with high similarity from those the user has not rated yet
      unrated_movies = np.where(user_ratings == 0)[0]
      recommended_movies = np.argsort(-similarities[unrated_movies])
      
      print(f"Recommended movies: {unrated_movies[recommended_movies]}")

User Profile Vector

  • Process of generating the user profile vector

    1. Prepare the Feature Vector of items: A vector that quantifies the attributes of items
      ex) Assume each movie has the following attributes

      Item     Genre  Director  Cast
      Movie A  1      1         0
      Movie B  0      1         1
      Movie C  1      0         1
      • The feature vectors of each item can be expressed as:
      $$X_A = [1, 1, 0], \quad X_B = [0, 1, 1], \quad X_C = [1, 0, 1]$$
    2. Prepare user rating information: Information representing how much a user prefers each item
      ex) Assume the user's ratings for the three movies are as follows (a rating of 0 means the movie has not been rated):

      Item     Rating
      Movie A  5
      Movie B  0
      Movie C  4
    3. Calculate the user profile vector: Multiply the feature vector of each item by the user’s rating, sum them up, and divide by the number of rated items

    • The user profile vector $U_u$ is the rating-weighted sum of the feature vectors of the items rated by the user, divided by the number of rated items

      $$U_u = \frac{1}{N_u} \sum_{i \in I_u} r_{u,i} \cdot X_i$$
      • $U_u$ is user $u$'s profile vector
      • $N_u$ is the number of items rated by the user
      • $I_u$ is the set of items rated by the user
      • $r_{u,i}$ is the rating that user $u$ gave to item $i$
      • $X_i$ is the feature vector of item $i$
  • Concrete example

    1. Prepare the feature vectors of items: The feature vectors for movies A, B, and C are as follows:

      $$X_A = [1, 1, 0], \quad X_B = [0, 1, 1], \quad X_C = [1, 0, 1]$$
    2. Prepare user rating information: The ratings the user gave to movies A, B, and C are as follows:

      $$r_{u,A} = 5, \quad r_{u,B} = 0, \quad r_{u,C} = 4$$
    3. Calculate the user profile vector

      • Calculate the user profile vector:

        $$U_u = \frac{1}{N_u} \left( r_{u,A} \cdot X_A + r_{u,B} \cdot X_B + r_{u,C} \cdot X_C \right)$$
      • Here, $N_u = 2$ (the number of items rated by the user; Movie B's rating of 0 is treated as unrated), and the feature vector of each item is multiplied by the user's rating:

        $$U_u = \frac{1}{2} \left( 5 \cdot [1, 1, 0] + 0 \cdot [0, 1, 1] + 4 \cdot [1, 0, 1] \right)$$
      • After calculating:

        $$\begin{aligned} U_u &= \frac{1}{2} \left( [5, 5, 0] + [0, 0, 0] + [4, 0, 4] \right) \\ &= \frac{1}{2} [9, 5, 4] \\ &= [4.5, 2.5, 2.0] \end{aligned}$$
      • Therefore, this user’s profile vector is $[4.5, 2.5, 2.0]$

  • Code Example

    import numpy as np

    # Item feature vectors (e.g., [Genre, Director, Cast])
    item_features = np.array([[1, 1, 0],  # Movie A
                              [0, 1, 1],  # Movie B
                              [1, 0, 1]]) # Movie C

    # Ratings given by the user (0 = not rated)
    user_ratings = np.array([5, 0, 4])  # Ratings for Movie A, Movie B, Movie C

    # Calculate user profile vector: rating-weighted sum of the feature vectors,
    # divided by the number of rated (non-zero) items
    user_profile = np.dot(user_ratings, item_features) / np.count_nonzero(user_ratings)

    print(f"User profile vector: {user_profile}")






🎨 Hybrid Methods

A method that combines multiple recommendation algorithms to complement the weaknesses of individual algorithms and improve the performance and diversity of the recommendation system

Weighted Hybrid

  • A method that generates the final recommendation by assigning weights to the predicted scores of each algorithm

  • How it works: Calculate the result of each recommendation algorithm, and apply weights to those results to compute the final rating

  • Example formula (Combining Collaborative Filtering and Content-based Filtering)

    $$\hat{r}_{u,i} = w_1 \cdot \hat{r}_{u,i}^{CF} + w_2 \cdot \hat{r}_{u,i}^{CBF}$$
    • $\hat{r}_{u,i}$ is the predicted rating that user $u$ will give to item $i$
    • $\hat{r}_{u,i}^{CF}$ is the predicted rating calculated by Collaborative Filtering
    • $\hat{r}_{u,i}^{CBF}$ is the predicted rating calculated by Content-based Filtering
    • $w_1$ and $w_2$ are the weights of each algorithm, and $w_1 + w_2 = 1$
  • Application example

    • You can calculate the combined result of Collaborative Filtering and Content-based Filtering using the weighted method
    • ex) If the user has a lot of past rating data, more weight is given to Collaborative Filtering; otherwise, more weight is given to Content-based Filtering
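The weighted formula and the "more history, more CF weight" idea can be sketched as follows. The score values are made up for illustration, and `cf_weight` with its `saturation` parameter is a hypothetical rule for choosing $w_1$, not something prescribed by the method itself.

```python
import numpy as np

def weighted_hybrid(cf_scores, cbf_scores, w_cf):
    """Blend two score vectors; the weights w_cf and (1 - w_cf) sum to 1."""
    if not 0.0 <= w_cf <= 1.0:
        raise ValueError("w_cf must be in [0, 1]")
    return w_cf * np.asarray(cf_scores) + (1.0 - w_cf) * np.asarray(cbf_scores)

def cf_weight(num_ratings, saturation=20):
    """Hypothetical rule: trust CF more as the user's rating history grows."""
    return min(num_ratings / saturation, 1.0)

cf_scores = np.array([4.0, 2.0, 3.5])   # made-up CF predictions for three items
cbf_scores = np.array([3.0, 4.5, 3.5])  # made-up CBF predictions for the same items

new_user = weighted_hybrid(cf_scores, cbf_scores, cf_weight(2))     # w_cf = 0.1
heavy_user = weighted_hybrid(cf_scores, cbf_scores, cf_weight(40))  # w_cf = 1.0
print(new_user, heavy_user)
```

In practice the weights are often tuned on held-out data rather than set by a fixed rule like this one.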

Switching Hybrid

  • Selecting different recommendation algorithms depending on specific situations
  • How it works
    1. When Cold Start problems occur: Use Content-based Filtering
    2. When there is sufficient interaction data: Use Collaborative Filtering
  • Application example
    • For newly registered users, provide recommendations that match the user profile using Content-based Filtering
    • For existing users, recommend items liked by other users using Collaborative Filtering
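A switching hybrid is essentially a dispatch rule. The sketch below assumes the switch is made on the number of interactions a user has, with a hypothetical threshold `min_interactions`. The two recommenders are stand-in functions returning fixed lists, used only to show which branch is taken.

```python
def switching_hybrid(user_history, cf_recommender, cbf_recommender, min_interactions=5):
    """Pick one recommender per request based on how much data the user has."""
    if len(user_history) < min_interactions:
        return cbf_recommender(user_history)  # cold start: rely on item features
    return cf_recommender(user_history)       # enough data: rely on interactions

# Stand-in recommenders that return fixed lists, for illustration only
cf = lambda history: ["item_cf_1", "item_cf_2"]
cbf = lambda history: ["item_cbf_1", "item_cbf_2"]

print(switching_hybrid(["a"], cf, cbf))           # new user -> CBF branch
print(switching_hybrid(list("abcdef"), cf, cbf))  # existing user -> CF branch
```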

Mixed Hybrid

  • A method that runs multiple recommendation algorithms in parallel and combines their results to generate the final recommendation

  • How it works

    • Run two or more recommendation algorithms in parallel and combine their results to create the final recommendation
      ex) Combine the items recommended by Collaborative Filtering and Content-based Filtering into a single list
  • Example Formula

    $$R(u) = \text{Combine}(R_{CF}(u), R_{CBF}(u))$$
    • $R(u)$: The final list of items recommended to user $u$
    • $R_{CF}(u)$: The list of items recommended by Collaborative Filtering
    • $R_{CBF}(u)$: The list of items recommended by Content-based Filtering
    • Combine: The method of merging the two lists
  • Application example

    • Recommend items based on the user's interaction data using Collaborative Filtering
    • Analyze the characteristics of items using Content-based Filtering to provide recommendations
    • Combine the results of both algorithms to recommend items similar to those the user already likes
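The Combine step can take many forms. One simple option, sketched here as an assumption rather than the definitive method, is to interleave the two ranked lists and drop duplicates. The item labels are made up.

```python
from itertools import zip_longest

def combine(cf_list, cbf_list, top_n=5):
    """Interleave two ranked lists, dropping duplicates, and keep the first top_n."""
    merged, seen = [], set()
    for pair in zip_longest(cf_list, cbf_list):
        for item in pair:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:top_n]

r_cf = ["A", "B", "C"]        # made-up CF recommendations, best first
r_cbf = ["B", "D", "E", "F"]  # made-up CBF recommendations, best first
print(combine(r_cf, r_cbf))   # ['A', 'B', 'D', 'C', 'E']
```

Interleaving keeps the top picks of both algorithms near the top of the final list, which is what distinguishes a mixed hybrid from a weighted one that merges scores instead of lists.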

Cascade Hybrid

  • A method that performs the first filtering with one algorithm and then applies another algorithm to the filtered results to generate the final recommendation

  • How it works: Perform the first filtering, and then use another algorithm to generate the final recommendation list from the filtered items

  • Example Formula

    $$R(u) = \text{CF}(\text{CBF}(u))$$
    • $R(u)$: The final list of items recommended to user $u$
    • $\text{CF}(\text{CBF}(u))$: Apply Collaborative Filtering to the items filtered by Content-based Filtering
  • Application example

    • Use Content-based Filtering to select candidate items for recommendation
    • Use Collaborative Filtering to finalize the recommended items based on interaction data
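The two stages can be sketched with precomputed score vectors. The scores are made up, and the parameters `n_candidates` and `top_n` are hypothetical knobs for the size of each stage.

```python
import numpy as np

def cascade(cbf_scores, cf_scores, n_candidates=3, top_n=2):
    """Stage 1: keep the top CBF candidates. Stage 2: rank them by CF score."""
    cbf_scores, cf_scores = np.asarray(cbf_scores), np.asarray(cf_scores)
    candidates = np.argsort(-cbf_scores)[:n_candidates]
    reranked = candidates[np.argsort(-cf_scores[candidates])]
    return reranked[:top_n]

cbf_scores = [0.9, 0.1, 0.8, 0.7, 0.2]  # made-up content similarity per item
cf_scores = [2.0, 5.0, 4.5, 3.0, 1.0]   # made-up CF predicted rating per item
print(cascade(cbf_scores, cf_scores))   # [2 3]
```

Item 1 has the highest CF score but never reaches the second stage because the content-based filter removed it first. This is the defining trade-off of a cascade: the second algorithm can only reorder what the first one lets through.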

Feature Augmentation

  • A method where the results or features from one algorithm are used as inputs to another algorithm

  • How it works: Use the feature information calculated by one algorithm as input to another algorithm to generate recommendations

  • Example Formula

    $$X_{new} = \text{Augment}(X, \text{CBF}(X))$$
    • $X_{new}$: Item feature data augmented by the results of Content-based Filtering
    • $\text{CBF}(X)$: The result of Content-based Filtering
  • Application example

    • First analyze the user profile and item feature vectors using Content-based Filtering
    • Apply that information to Collaborative Filtering to achieve better recommendation results
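One possible reading of this pipeline, sketched here as an assumption, is to let Content-based Filtering fill the empty cells of the rating matrix with pseudo-ratings and then run Collaborative Filtering on the denser matrix. The data, the helper `cosine_rows`, and the linear scaling to a 1-5 range are all illustrative choices.

```python
import numpy as np

ratings = np.array([[5, 0, 4],
                    [0, 3, 0],
                    [4, 0, 5]], dtype=float)        # users x items, 0 = not rated
item_features = np.array([[1, 1, 0],
                          [0, 1, 1],
                          [1, 0, 1]], dtype=float)  # items x features

def cosine_rows(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a_norm = np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = np.linalg.norm(b, axis=1, keepdims=True)
    return (a @ b.T) / (a_norm * b_norm.T)

# CBF step: user profiles from rated items, then profile-item similarity in [0, 1]
profiles = ratings @ item_features
cbf_scores = cosine_rows(profiles, item_features)

# Augment step: fill unrated cells with CBF scores scaled to the 1-5 rating range
pseudo_ratings = 1 + 4 * cbf_scores
augmented = np.where(ratings > 0, ratings, pseudo_ratings)

# CF step: user-user similarity computed on the augmented matrix
user_similarity = cosine_rows(augmented, augmented)
print(np.round(augmented, 2))
```

Real ratings are kept as they are, and only the missing cells receive content-based estimates, so Collaborative Filtering has more overlap between users to work with.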