This repository contains Python code for building a movie recommendation system using collaborative filtering techniques. Below is a breakdown of the files and functionalities included:
- Clone this repository:
  `git clone https://github.com/sadegh15khedry/MovieRecommendationSystem.git`
  `cd MovieRecommendationSystem`
- Install the required libraries from the `environment.yml` file using conda:
  `conda env create -f environment.yml`
- Download the MovieLens datasets (`movies.csv`, `tags.csv`, `ratings.csv`) and update their paths in the code.
- Run the `recommendation_system.ipynb` notebook to generate movie recommendations.
- Load the datasets (`movies.csv`, `tags.csv`, `ratings.csv`) using pandas.
- Select the relevant columns of each resulting DataFrame (`tag_df`, `rating_df`, `movie_df`) for further analysis.
- Perform exploratory data analysis (EDA) to inspect data shapes, missing values, duplicates, and basic statistics.
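A minimal sketch of the loading and EDA steps, assuming the standard MovieLens column names (`userId`, `movieId`, `rating`, `tag`, `title`). To keep it self-contained, a few made-up rows are read from in-memory strings instead of the real files, and which columns the notebook actually keeps is an assumption here.

```python
import io

import pandas as pd

# Made-up sample rows with MovieLens column names (not real data).
# With the real files you would call e.g. pd.read_csv("ratings.csv") instead.
MOVIES_CSV = """movieId,title,genres
1,Movie A (2000),Adventure|Animation
2,Movie B (2001),Comedy
3,Movie C (2002),Action|Thriller
"""
RATINGS_CSV = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,5.0,964981247
2,1,3.5,964982224
2,2,4.0,964983815
2,2,4.0,964983815
"""
TAGS_CSV = """userId,movieId,tag,timestamp
2,1,funny,1445714994
"""

# Keep only the columns assumed to be relevant for collaborative filtering.
movie_df = pd.read_csv(io.StringIO(MOVIES_CSV))[["movieId", "title"]]
rating_df = pd.read_csv(io.StringIO(RATINGS_CSV))[["userId", "movieId", "rating"]]
tag_df = pd.read_csv(io.StringIO(TAGS_CSV))[["userId", "movieId", "tag"]]

# Basic EDA: shapes, missing values, duplicate rows, summary statistics.
for name, frame in [("movie_df", movie_df), ("rating_df", rating_df), ("tag_df", tag_df)]:
    print(name, frame.shape)
    print("  missing values:", int(frame.isna().sum().sum()))
    print("  duplicate rows:", int(frame.duplicated().sum()))
print(rating_df["rating"].describe())
```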
- Merge `rating_df` and `movie_df` on `movieId` to create a combined DataFrame (`df`).
- Aggregate the ratings to find movies with more than 100 ratings (`agg_df`).
- Merge `df` with `agg_df` to filter out less popular movies (`df_gt100`).
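The merge-and-filter steps can be sketched as follows on hypothetical toy data. Grouping by `movieId` (rather than by title) is an assumption made so that two movies sharing a title are not counted together, and the cutoff is lowered from 100 because the toy data is tiny.

```python
import pandas as pd

# Made-up toy rows using MovieLens column names (not real data).
rating_df = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3, 3, 4],
    "movieId": [10, 20, 10, 20, 10, 30, 10],
    "rating":  [4.0, 3.0, 5.0, 2.0, 4.5, 1.0, 3.5],
})
movie_df = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title": ["Movie A", "Movie B", "Movie C"],
})

# The README keeps movies with more than 100 ratings; the toy data needs a smaller cutoff.
MIN_RATINGS = 1

# Attach a title to every rating.
df = pd.merge(rating_df, movie_df, on="movieId", how="inner")

# Per-movie rating count and mean, then keep movies above the cutoff.
agg_df = (
    df.groupby("movieId")
    .agg(mean_rating=("rating", "mean"), number_of_ratings=("rating", "count"))
    .reset_index()
)
agg_df = agg_df[agg_df["number_of_ratings"] > MIN_RATINGS]

# Inner merge keeps only the ratings of movies that passed the cutoff.
df_gt100 = pd.merge(df, agg_df[["movieId"]], on="movieId", how="inner")
print(df_gt100)
```

Here "Movie C" has a single rating and is dropped, while "Movie A" (4 ratings) and "Movie B" (2 ratings) remain.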
- Create a user-movie matrix (`user_movie_matrix`) using `pivot_table`, where rows represent users, columns represent movies, and values are ratings.
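A small sketch of the pivot on hypothetical data, assuming movies are identified by `title` in the columns. Note that `pivot_table` averages duplicate (user, movie) pairs by default (`aggfunc="mean"`), and any user-movie pair without a rating becomes `NaN`.

```python
import pandas as pd

# Hypothetical filtered ratings (same shape as df_gt100 in the steps above).
df_gt100 = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title":  ["Movie A", "Movie B", "Movie A", "Movie B", "Movie A"],
    "rating": [4.0, 3.0, 5.0, 2.0, 4.5],
})

# Rows = users, columns = movies, values = ratings; unrated cells are NaN.
user_movie_matrix = df_gt100.pivot_table(index="userId", columns="title", values="rating")
print(user_movie_matrix)
```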
- Normalize `user_movie_matrix` (`matrix_norm`) by subtracting each user's mean rating.
- Calculate cosine similarities (`user_similarity` and `movie_similarity`) from `matrix_norm` to find similar users and similar movies.
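The normalization and similarity steps can be sketched like this. Two assumptions: missing ratings are treated as 0 after mean-centering (so they count as "neutral"), and cosine similarity is computed directly with NumPy, whereas the notebook may instead use a library function such as scikit-learn's `cosine_similarity`. Movie-movie similarity is the same computation on the transposed matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical user-movie matrix; NaN means "not rated".
user_movie_matrix = pd.DataFrame(
    {"Movie A": [4.0, 5.0, 1.0], "Movie B": [2.0, 3.0, np.nan], "Movie C": [np.nan, 4.0, 5.0]},
    index=pd.Index([1, 2, 3], name="userId"),
)

# Mean-center each user's ratings (NaNs are skipped when computing the mean).
matrix_norm = user_movie_matrix.subtract(user_movie_matrix.mean(axis=1), axis="index")


def cosine_similarity_df(frame: pd.DataFrame) -> pd.DataFrame:
    """Pairwise cosine similarity between the rows of `frame`, treating NaN as 0."""
    values = frame.fillna(0.0).to_numpy()
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = values / norms
    return pd.DataFrame(unit @ unit.T, index=frame.index, columns=frame.index)


user_similarity = cosine_similarity_df(matrix_norm)     # users x users
movie_similarity = cosine_similarity_df(matrix_norm.T)  # movies x movies
print(user_similarity.round(3))
```

With these numbers, users 1 and 2 both rate "Movie A" one point above their mean and "Movie B" one point below it, so their similarity is 1.0, while user 3 rates in the opposite direction and gets -0.5 against both.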
- Select a user (`picked_userId`) and set the parameters `number_of_simlar_users` and `user_similarity_threshold`.
- Find similar users (`similar_users`) based on a similarity threshold.
- Identify the movies watched by the selected user (`picked_user_watched`) and by the similar users (`similar_users_movies`).
- Calculate item scores (`item_score`) based on weighted sums of ratings from similar users.
- Sort the movies by score and print the top recommendations (`ranked_item_score`).
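The recommendation steps can be sketched end to end as below. The `matrix_norm` and `user_similarity` values are hypothetical and hard-coded so the block runs on its own. Dividing the weighted sum by the total similarity of the users who actually rated each movie (a similarity-weighted average) is an assumption; the notebook may normalize differently. Variable names follow the README, including the `number_of_simlar_users` spelling.

```python
import numpy as np
import pandas as pd

# Hypothetical mean-centered ratings (users x movies); NaN = not rated.
matrix_norm = pd.DataFrame(
    {
        "Movie A": [1.0, 0.5, -2.0, 0.5],
        "Movie B": [-1.0, -1.0, np.nan, np.nan],
        "Movie C": [np.nan, 0.5, 2.0, -1.0],
        "Movie D": [np.nan, np.nan, np.nan, 0.5],
    },
    index=pd.Index([1, 2, 3, 4], name="userId"),
)
# Hypothetical user-user similarities (symmetric, 1.0 on the diagonal).
user_similarity = pd.DataFrame(
    [[1.0, 0.9, -0.5, 0.6],
     [0.9, 1.0, -0.4, 0.7],
     [-0.5, -0.4, 1.0, 0.1],
     [0.6, 0.7, 0.1, 1.0]],
    index=matrix_norm.index,
    columns=matrix_norm.index,
)

picked_userId = 1
number_of_simlar_users = 10  # spelling kept from the README
user_similarity_threshold = 0.3

# Other users above the threshold, most similar first.
similar_users = user_similarity[picked_userId].drop(index=picked_userId)
similar_users = similar_users[similar_users > user_similarity_threshold]
similar_users = similar_users.sort_values(ascending=False).head(number_of_simlar_users)

# Movies the picked user already rated, and unseen movies rated by similar users.
picked_user_watched = matrix_norm.loc[picked_userId].dropna().index
similar_users_movies = matrix_norm.loc[similar_users.index].dropna(axis=1, how="all")
similar_users_movies = similar_users_movies.drop(columns=picked_user_watched, errors="ignore")

# Similarity-weighted average of the similar users' centered ratings, per movie.
weighted_sum = similar_users_movies.fillna(0.0).mul(similar_users, axis="index").sum()
weight_total = similar_users_movies.notna().mul(similar_users.abs(), axis="index").sum()
item_score = weighted_sum / weight_total

ranked_item_score = item_score.sort_values(ascending=False)
print(ranked_item_score.head(10))
```

Here users 2 and 4 pass the threshold; "Movie D" scores 0.5 (only user 4 rated it, above their mean) and "Movie C" scores -0.1 (user 2 liked it, the slightly less similar user 4 did not).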
- The notebook outputs the top recommended movies for the selected user (`picked_userId`) based on collaborative filtering.
- Evaluation metrics (e.g., precision, recall) and visualizations (e.g., a heatmap of the similarity matrices) can be added for performance analysis.
- Implement evaluation metrics to quantify the performance of the recommendation system.
- Optimize code efficiency for larger datasets and real-time recommendations.
- Incorporate content-based filtering or hybrid approaches for improved recommendation accuracy.
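One possible starting point for the evaluation item, sketched under assumptions: hold out some of each user's ratings, treat the held-out movies the user rated highly as relevant, and compare them with the top-k recommendations. The `precision_recall_at_k` helper and the example data are hypothetical, not part of the repository.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    recommended: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Precision@k and recall@k for a single user (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(recommended)[:k]
    hits = sum(1 for movie in top_k if movie in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: ranked recommendations vs. held-out movies the user liked.
precision, recall = precision_recall_at_k(["Movie D", "Movie C", "Movie E"], {"Movie D", "Movie F"}, k=2)
print(f"precision@2={precision:.2f}, recall@2={recall:.2f}")
```

Averaging these per-user values over all test users gives a single score that can be compared across parameter settings such as `user_similarity_threshold`.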
- Author: Sadegh Khedry
This project is licensed under the Apache-2.0 License - see the LICENSE.md file for details.