Shivani Manivasagan, Min Jegal, Alba Arribas Cervan, Ana Real Terradez
This project implements and evaluates two recommendation systems for Spotify playlists: a content-based system built on song lyrics and a user-based collaborative filtering system based on playlist overlap. The final dataset contains 9,720 tracks from 79 playlists. The lyrics were preprocessed and vectorized using Bag-of-Words (BoW) and GloVe embeddings.
- Dataset Creation
- Text Preprocessing and Vectorization
- Recommendation Systems
- Dashboard
- Acknowledgements
Criteria:
- Overlapping tracks across playlists.
- Lyrics for each track.
- At least 10,000 tracks.
Method:
- Used the Million Playlist Dataset (MPD) as the source of playlists and the LRCLIB API to retrieve lyrics.
- Filtered the playlists to meet the criteria above, resulting in 79 playlists with a mean of 123 tracks each.
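A minimal sketch of the filtering step, assuming the playlists have already been loaded from the Million Playlist Dataset into a dict that maps a playlist id to its track ids, and that `tracks_with_lyrics` is the set of track ids for which LRCLIB returned lyrics. The function names and the `min_tracks` parameter are hypothetical, and the exact selection rules used in the project may differ.

```python
from collections import Counter


def filter_playlists(playlists, tracks_with_lyrics, min_tracks=1):
    """Hypothetical filter: playlists maps playlist_id -> list of track ids."""
    # Criterion "lyrics for each track": drop tracks without lyrics.
    with_lyrics = {
        pid: [t for t in tracks if t in tracks_with_lyrics]
        for pid, tracks in playlists.items()
    }
    # Count how many playlists each track appears in.
    counts = Counter(t for tracks in with_lyrics.values() for t in set(tracks))
    # Criterion "overlapping tracks": keep playlists that still have enough
    # tracks and share at least one track with another playlist.
    return {
        pid: tracks
        for pid, tracks in with_lyrics.items()
        if len(tracks) >= min_tracks and any(counts[t] > 1 for t in tracks)
    }


def summarize(playlists):
    """Return (number of playlists, unique tracks, mean tracks per playlist)."""
    if not playlists:
        return 0, 0, 0.0
    unique_tracks = {t for tracks in playlists.values() for t in tracks}
    mean_len = sum(len(tracks) for tracks in playlists.values()) / len(playlists)
    return len(playlists), len(unique_tracks), mean_len
```

`summarize` reports the same kind of figures quoted above: number of playlists, number of unique tracks, and mean tracks per playlist.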
Exploratory Analysis:
- Final dataset: 9720 tracks, 79 playlists.
- Issues: Low overlap between playlists and high token variance in lyrics.
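Both issues can be quantified with a few lines of standard-library code. This is a hypothetical sketch: overlap is measured here as the mean pairwise Jaccard similarity between playlists, and "token variance" is read as the variance of the number of tokens per song. The project may have measured either quantity differently.

```python
from itertools import combinations
from statistics import pvariance


def jaccard(a, b):
    """Jaccard similarity between two collections of track ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(playlists):
    """Average Jaccard similarity over all pairs of playlists."""
    pairs = list(combinations(playlists.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


def token_count_variance(lyrics):
    """Population variance of the whitespace-token count of each song."""
    return pvariance([len(text.split()) for text in lyrics])
```

A mean overlap close to 0 means that most pairs of playlists share almost no tracks, which leaves collaborative filtering little signal to work with.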
Text Preprocessing Pipeline:
- Tokenization, homogenization, and cleaning, implemented with NLTK.
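The three stages can be illustrated using only the standard library. This sketch is an assumption and not the project's code. It replaces NLTK's tokenizer and stopword corpus with a regular expression and a small hand-written stopword list (`STOPWORDS` is hypothetical). It also reduces homogenization to Unicode normalization and lowercasing, leaving out the stemming or lemmatization an NLTK pipeline would usually add.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stopword list; NLTK's English list is longer.
STOPWORDS = {"the", "a", "an", "and", "i", "you", "to", "of", "in", "is", "it", "my", "me"}


def preprocess(text):
    """Tokenize, homogenize, and clean a lyrics string."""
    # Homogenization: decompose accented characters, drop the accents, lowercase.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    # Tokenization: keep runs of ASCII letters (punctuation and digits are dropped).
    tokens = re.findall(r"[a-z]+", text)
    # Cleaning: remove stopwords and one-letter tokens.
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]
```

For example, `preprocess("I love the Música, and you LOVE me!")` yields `["love", "musica", "love"]`.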
Corpus and Dictionary Filtering:
- Removed infrequent and extremely frequent tokens.
- Analyzed unique token occurrences and implemented N-gram detection.
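Vocabulary filtering and N-gram detection can be sketched as follows. The parameter names `no_below` (minimum number of documents) and `no_above` (maximum fraction of documents) mirror gensim's `Dictionary.filter_extremes`. `merge_bigrams` is a simplified count-based stand-in for a phrase detector such as gensim's `Phrases`. Whether the project used these libraries, and with which thresholds, is an assumption.

```python
from collections import Counter


def filter_vocabulary(docs, no_below=2, no_above=0.5):
    """Drop tokens in fewer than no_below docs or in more than no_above of all docs."""
    # Document frequency: number of documents containing each token.
    df = Counter(t for doc in docs for t in set(doc))
    n_docs = len(docs)
    keep = {t for t, c in df.items() if c >= no_below and c / n_docs <= no_above}
    return [[t for t in doc if t in keep] for doc in docs]


def merge_bigrams(docs, min_count=2):
    """Join adjacent token pairs seen at least min_count times into 'a_b' tokens."""
    counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    frequent = {pair for pair, c in counts.items() if c >= min_count}
    merged_docs = []
    for doc in docs:
        out, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in frequent:
                out.append(doc[i] + "_" + doc[i + 1])  # frequent bigram
                i += 2
            else:
                out.append(doc[i])
                i += 1
        merged_docs.append(out)
    return merged_docs
```

Running bigram detection before filtering allows a phrase such as `new_york` to survive even if its component words are later removed as too frequent.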
Vectorization Techniques:
- BoW representation for LDA topic modeling; 4 topics were found to be optimal.
- GloVe word embeddings, used for clustering and similarity calculations.
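Both representations can be sketched on tokenized lyrics. `bow` returns sparse (token id, count) pairs, which is the input format LDA implementations typically expect. `mean_embedding` builds a song vector by averaging the GloVe vectors of the song's known tokens. Averaging is an assumed aggregation, and the function names are hypothetical. In practice `vectors` would hold pre-trained GloVe vectors loaded from file, not the toy two-dimensional values used in the check below.

```python
from collections import Counter


def build_vocab(docs):
    """Assign an integer id to every token, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for t in doc:
            vocab.setdefault(t, len(vocab))
    return vocab


def bow(doc, vocab):
    """Sparse Bag-of-Words: sorted list of (token_id, count) pairs."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


def mean_embedding(doc, vectors, dim):
    """Average the word vectors of tokens found in `vectors`; zeros if none."""
    found = [vectors[t] for t in doc if t in vectors]
    if not found:
        return [0.0] * dim
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]
```

Tokens that have no GloVe vector are simply skipped, so a song made up only of such tokens falls back to a zero vector.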
Content-Based Recommender System:
- Recommended tracks by the cosine similarity between GloVe embeddings of their lyrics.
- Achieved low accuracy, attributed to the characteristics of the data and the low overlap between playlists.
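A minimal sketch of the content-based recommender, assuming every track already has a lyrics embedding (for example, an averaged GloVe vector) stored in a dict. The sketch represents a playlist by the mean of its tracks' embeddings and ranks unseen tracks by cosine similarity to that profile. This is an assumption about how similarities were aggregated, and `recommend_by_content` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend_by_content(playlist, embeddings, k=10):
    """Return the k tracks outside `playlist` closest to the playlist profile."""
    in_playlist = set(playlist)
    seeds = [embeddings[t] for t in playlist if t in embeddings]
    if not seeds:
        return []
    dim = len(seeds[0])
    # Playlist profile: mean embedding of its tracks (assumed aggregation).
    profile = [sum(v[i] for v in seeds) / len(seeds) for i in range(dim)]
    scored = [
        (cosine(profile, vec), t)
        for t, vec in embeddings.items()
        if t not in in_playlist
    ]
    # Highest similarity first; ties broken by track id for determinism.
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [t for _, t in scored[:k]]
```

Because the ranking depends only on lyrics, two tracks can score as very similar even if they never appear together in any playlist. This is one reason a lyrics-based recommender can miss the tracks that listeners actually group together.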
User-Based Collaborative Filtering System:
- Built a sparse playlist-track matrix and measured playlist similarity with cosine distance.
- Used K-Nearest Neighbors (KNN) to select similar playlists and generate recommendations from them.
- Applied several optimizations to improve recommendation accuracy, including frequency weighting and the exclusion of test playlists.
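The collaborative filtering steps can be sketched without external libraries by storing each playlist as a set of track ids, a dictionary-of-sets stand-in for the sparse playlist-track matrix. For binary vectors, cosine similarity reduces to |A ∩ B| / sqrt(|A| |B|), and cosine distance is 1 minus this value. The k most similar playlists are the neighbors. Each candidate track is scored by summing the similarities of the neighbors that contain it, which is one possible reading of "frequency weighting". The `exclude` parameter models keeping test playlists out of the neighbor pool, so that a test playlist cannot retrieve its own hidden tracks. All names and defaults are hypothetical.

```python
import math
from collections import defaultdict


def playlist_cosine(a, b):
    """Cosine similarity of two binary track vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend_collaborative(target, playlists, k=5, n=10, exclude=()):
    """Recommend n tracks for `target` from its k nearest playlists."""
    target = set(target)
    # K nearest neighbors by cosine similarity (ties broken by playlist id).
    neighbors = sorted(
        (
            (playlist_cosine(target, set(tracks)), pid)
            for pid, tracks in playlists.items()
            if pid not in exclude
        ),
        key=lambda x: (-x[0], x[1]),
    )[:k]
    # Similarity-weighted frequency of each unseen track among the neighbors.
    scores = defaultdict(float)
    for sim, pid in neighbors:
        if sim <= 0:
            continue
        for t in set(playlists[pid]):
            if t not in target:
                scores[t] += sim
    ranked = sorted(scores.items(), key=lambda x: (-x[1], x[0]))
    return [t for t, _ in ranked[:n]]
```

Weighting by similarity instead of raw counts lets a track found in one very similar playlist outrank a track found in several weakly related ones.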
Performance Comparison:
- Evaluated by accuracy, measured as hits on obscured (held-out) playlist tracks.
- Collaborative filtering outperformed the content-based recommender.
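The evaluation protocol can be sketched as follows, assuming "accuracy" means the fraction of obscured tracks that appear among a playlist's top-n recommendations. `hit_accuracy`, the `hide_fraction` parameter, and the `recommend(playlist_id, visible_tracks, n)` callback signature are hypothetical. Any recommender can be plugged in by wrapping it in a function with that signature.

```python
import random


def hit_accuracy(playlists, recommend, hide_fraction=0.2, n=10, seed=0):
    """Hide part of each playlist and measure how many hidden tracks are recovered."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    hits = hidden_total = 0
    for pid, tracks in playlists.items():
        tracks = list(dict.fromkeys(tracks))  # remove duplicates, keep order
        n_hide = max(1, int(len(tracks) * hide_fraction))
        if len(tracks) <= n_hide:
            continue  # too short to hide anything and still keep a seed
        hidden = set(rng.sample(tracks, n_hide))
        visible = [t for t in tracks if t not in hidden]
        recs = recommend(pid, visible, n)[:n]
        hits += len(hidden & set(recs))
        hidden_total += len(hidden)
    return hits / hidden_total if hidden_total else 0.0
```

When the callback wraps a collaborative recommender, it should exclude `pid` from the neighbor pool. Otherwise the full playlist, hidden tracks included, would match itself.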
Dashboard:
- Developed using Python Dash.
- Features:
Professors:
- Carlos Sevilla Salcedo
- Jerónimo Arenas García
- Vanessa Gómez Verdejo
External Resources:
- Various academic papers and GitHub repositories on music recommendation systems and machine learning techniques.