This repository contains the code I used for the challenge of the Recommender Systems course at Politecnico di Milano.
The goal of the competition was to create a recommender system for TV programs, providing 10 recommended products to each target user.
Link to the official website of the challenge
I placed 2nd in the competition out of 83 participating teams. My final MAP on the private leaderboard is 0.06110.
The dataset represents the interactions between the users and the items of a streaming platform. Items can be of different types and different lengths (movies, TV series, ...).
A full description of the data is available on the challenge webpage. The main source file is `interactions_and_impressions.csv`,
which contains the interactions of each user with the items, for example:
| UserID | ItemID | Impressions | Data |
|---|---|---|---|
| 0 | 11 | 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 | 1 |
| 0 | 21 | | 0 |
| 0 | 21 | | 0 |
| 0 | 21 | 20,21,22,23,24,25,26,27,28,29 | 0 |
Where:
- `Data` is `0` if the user watched the item, `1` if the user opened the item details page.
- `Impressions`: string containing the items that were present on the screen when the user interacted with the item in column `ItemID`. Not all interactions have a corresponding impressions list.
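As a minimal sketch of how this file can be parsed, the snippet below loads a few sample rows (in the same format as the table above) with pandas and counts watches and details-page openings per user-item pair. The column handling is an assumption based on the description above, not the repository's actual loading code:

```python
import io

import pandas as pd

# A few rows in the format of interactions_and_impressions.csv (see table above).
sample = """UserID,ItemID,Impressions,Data
0,11,"0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19",1
0,21,,0
0,21,,0
0,21,"20,21,22,23,24,25,26,27,28,29",0
"""

# Impressions may be missing, so keep it as a (nullable) string column.
df = pd.read_csv(io.StringIO(sample), dtype={"Impressions": "string"})

# Data == 0 means the user watched the item, Data == 1 that they opened
# the item details page.
n_watches = (
    df[df["Data"] == 0].groupby(["UserID", "ItemID"]).size().rename("n_watches")
)
n_details = (
    df[df["Data"] == 1].groupby(["UserID", "ItemID"]).size().rename("n_details")
)

counts = pd.concat([n_watches, n_details], axis=1).fillna(0).astype(int)
print(counts)
```

These per-pair counts are exactly the quantities used later to build the weighted URMs.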
The recommender architecture is roughly the following:
The main body of the recommender is a linear hybrid that combines the item scores of its base recommenders (SLIM elastic net, item KNN, EASE_R, RP3beta, iALS).
The key point is that each base recommender uses a differently tuned URM. Each URM is built as a weighted combination of the number of views and the number of "opening the details page" events of each user-item pair, with the weights tuned separately for each base recommender.
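As an illustration of this weighting (the exact formula and the tuned weight values live in the tuning scripts; `w_views` and `w_details` below are hypothetical), a URM built as a weighted sum of the two interaction counts might look like:

```python
import numpy as np
import scipy.sparse as sps

# Hypothetical per-recommender weights; in the project they are tuned
# separately for each base recommender.
w_views, w_details = 1.0, 0.5

# Toy counts for 2 users and 3 items, in COO-style arrays.
users = np.array([0, 0, 1])
items = np.array([0, 2, 1])
views = np.array([3, 1, 2])      # times each pair was watched
details = np.array([1, 0, 4])    # details-page openings per pair

# Each URM entry is a weighted combination of the two counts.
ratings = w_views * views + w_details * details
urm = sps.csr_matrix((ratings, (users, items)), shape=(2, 3))
print(urm.toarray())
```

Tuning `w_views` and `w_details` per recommender lets each model weigh implicit signals (views vs. curiosity clicks) differently.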
Once the best URMs have been found, the models are combined in a linear hybrid and their weights are found by hyperparameter tuning.
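The linear hybrid step can be sketched as follows. The score matrices and the hybrid weights below are made up for illustration; in the project the scores come from the tuned base recommenders and the weights from hyperparameter tuning:

```python
import numpy as np

# Hypothetical item-score matrices (users x items) from three base
# recommenders; in the project these would come from SLIM, ItemKNN, EASE_R...
scores_slim = np.array([[0.9, 0.1, 0.4]])
scores_knn = np.array([[0.2, 0.8, 0.3]])
scores_ease = np.array([[0.5, 0.5, 0.9]])

# Hypothetical hybrid weights, found via hyperparameter tuning in the project.
weights = {"slim": 0.5, "knn": 0.3, "ease": 0.2}

# The hybrid score is a weighted sum of the base recommenders' scores.
hybrid = (
    weights["slim"] * scores_slim
    + weights["knn"] * scores_knn
    + weights["ease"] * scores_ease
)

# Recommend the top-k items per user by sorting the hybrid scores.
top_k = np.argsort(-hybrid, axis=1)[:, :2]
print(top_k)
```

In practice the base scores usually need to be normalized to comparable ranges before the weighted sum, otherwise one recommender can dominate the hybrid.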
The last step is to include impressions. This is done using the impression discounting technique, in which each item that was recommended but not chosen by the user is penalized; this is implemented by multiplying the score by an exponential function of the number of impressions.
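A sketch of the discounting step, assuming a penalty of the form `exp(-alpha * n_impressions)` for items that were shown but not chosen (`alpha` and the exact functional form are assumptions here; the tuned version is produced by `4_tune_impression_discounting.py`):

```python
import numpy as np

alpha = 0.1  # hypothetical discounting strength, tuned in the project

scores = np.array([0.8, 0.6, 0.4])     # hybrid scores for 3 candidate items
n_impressions = np.array([5, 0, 2])    # times each item was shown but not chosen

# The more often an item was shown without being picked, the stronger
# the exponential discount on its score.
discounted = scores * np.exp(-alpha * n_impressions)
print(discounted.round(3))
```

Note how the first item, despite the highest raw score, drops below the second one after discounting because it was shown five times without being chosen.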
- Python 3.8
- Poetry
- Cython configured (C compiler)
- Install packages with Poetry
- In the Poetry environment, compile the Cython modules with:

```shell
python run_compile_all_cython.py
```
To train and tune the recommender, the following files have to be run sequentially:
1. `1_tune_base_recommender_{RECOMMENDER_NAME}.py`: trains and tunes the base recommenders
2. `2_tune_URM_{RECOMMENDER_NAME}.py`: tunes the URM of each recommender
3. `3_tune_linear_hybrid.py`: tunes the linear hybrid of the base recommenders
4. `4_tune_impression_discounting.py`: tunes the parameters of the impression discounting step
5. `5_train_final_model_all_data.py`: trains the final model with all the data and creates the submission
The hyperparameter tuning was done on a CCX31 Hetzner cloud instance, which has 8 dedicated vCPUs and 32 GB of RAM.
The code in this repository is inspired by MaurizioFD/RecSys_Course_AT_PoliMi, a repository used during the Recommender Systems course at Politecnico di Milano.