This project enriches a raw dataset of songs with detailed musical features, popularity metrics, and genre information retrieved from the Spotify API using each song's Spotify URI. The enriched dataset supports deeper analysis and powers a song recommendation system that compares songs by these attributes.
- Project Overview
- Setup and Installation
- Data Flow Diagram
- Feature Extraction
- Data Processing
- Recommendation Algorithm
- Usage
- Contributing
- License
The project automates the process of fetching detailed song features using the Spotify API and integrates these features into the original dataset. The key steps include:
- Initial data loading from a CSV file.
- Parallel feature extraction from Spotify URIs.
- Data cleaning and preprocessing.
- Sentiment analysis on song titles.
- Transformation and scaling of features.
- Generation of song recommendations based on a user's playlist.
```bash
# Clone the repository
git clone https://github.com/your-username/song-feature-extraction.git

# Navigate to the project directory
cd song-feature-extraction

# Install dependencies (assumes Python 3 and pip are installed)
pip install -r requirements.txt
```
```mermaid
graph LR
    A[Raw Dataset] -->|Extract URIs| B(URI to Features)
    B --> C{Parallel Processing}
    C -->|Feature Extraction| D[Enriched Dataset]
    D --> E[Data Preprocessing]
    E --> F[Sentiment Analysis]
    F --> G[Feature Transformation & Scaling]
    G --> H[Feature Set Creation]
    H --> I[Generate Recommendations]
```
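The Extract URIs and Parallel Processing stages in the diagram can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the contents of uri.py: `fetch_features` is a hypothetical offline stub standing in for the real Spotify API lookup, and the function names and the `max_workers` default are illustrative. Spotify track URIs have the form `spotify:track:<id>`, so the track ID is the last segment.

```python
from __future__ import annotations

import re
from concurrent.futures import ThreadPoolExecutor

# Spotify track URIs look like "spotify:track:<base-62 track ID>".
TRACK_URI = re.compile(r"^spotify:track:([0-9A-Za-z]+)$")


def uri_to_id(uri: str) -> str:
    """Return the track ID from a Spotify track URI, or raise ValueError."""
    match = TRACK_URI.match(uri.strip())
    if match is None:
        raise ValueError(f"not a Spotify track URI: {uri!r}")
    return match.group(1)


def fetch_features(track_id: str) -> dict:
    """Hypothetical stand-in for the per-track lookup that uri.py performs.

    A real version would query the Spotify Web API; this stub returns a
    placeholder record so the parallel plumbing can run offline.
    """
    return {"id": track_id, "popularity": None, "genres": []}


def extract_all(uris: list[str], max_workers: int = 8) -> list[dict]:
    """Fetch features for every URI concurrently, preserving input order."""
    ids = [uri_to_id(u) for u in uris]
    # Threads suit this I/O-bound work: each lookup mostly waits on the network.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so rows stay aligned.
        return list(pool.map(fetch_features, ids))
```

Because results come back in input order, the returned records can be joined row by row onto the original dataset. Threads rather than processes are the natural choice here, since the work is dominated by waiting on HTTP responses.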
- Uses the uri.py script to extract features from Spotify URIs.
- Features include musical attributes, popularity scores, and genres.
- Fetches features for many URIs in parallel to reduce total extraction time.
- Cleans and preprocesses the data for analysis.
- Applies sentiment analysis to track names.
- Transforms genre data using TF-IDF.
- Scales numerical features to comparable magnitudes so that no single feature dominates the similarity scores.
- Summarizes the user's playlist into a single feature vector.
- Calculates the cosine similarity between the playlist vector and each song not in the playlist.
- Recommends songs with the highest similarity scores.
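The data-processing and recommendation steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the notebook's code: the song dictionaries, the field names (`name`, `genres`, and numeric keys such as `energy` or `tempo`), and all function names are hypothetical; min-max scaling stands in for whichever scaler the project actually uses; and TF-IDF here uses each genre's share of a song's genre list as term frequency with a smoothed IDF. A sentiment score for the track name could simply be included as one more numeric key.

```python
from __future__ import annotations

import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[list[float]]:
    """TF-IDF rows over a sorted genre vocabulary.

    TF is the genre's share of the song's genre list; IDF is the smoothed
    form ln((1 + N) / (1 + df)) + 1.
    """
    vocab = sorted({g for doc in docs for g in doc})
    n = len(docs)
    df = Counter(g for doc in docs for g in set(doc))
    idf = {g: math.log((1 + n) / (1 + df[g])) + 1 for g in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # songs with no genres get an all-zero row
        rows.append([counts[g] / total * idf[g] for g in vocab])
    return rows


def min_max_scale(values: list[float]) -> list[float]:
    """Rescale one column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_features(songs: list[dict], numeric_keys: list[str]) -> list[list[float]]:
    """Concatenate scaled numeric features with TF-IDF genre weights per song."""
    scaled = [min_max_scale([s[k] for s in songs]) for k in numeric_keys]
    genre_rows = tfidf([s["genres"] for s in songs])
    return [[col[i] for col in scaled] + genre_rows[i] for i in range(len(songs))]


def recommend(songs: list[dict], playlist: set[str], numeric_keys: list[str],
              top_n: int = 5) -> list[str]:
    """Rank songs outside the playlist by similarity to the playlist's mean vector."""
    vectors = build_features(songs, numeric_keys)
    members = [v for s, v in zip(songs, vectors) if s["name"] in playlist]
    if not members:
        raise ValueError("none of the playlist songs are in the dataset")
    # Summarize the playlist as the element-wise mean of its songs' vectors.
    profile = [sum(col) / len(members) for col in zip(*members)]
    scored = [
        (cosine(profile, v), s["name"])
        for s, v in zip(songs, vectors)
        if s["name"] not in playlist
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_n]]
```

In a pandas and scikit-learn notebook the same steps usually map onto `TfidfVectorizer`, `MinMaxScaler` (or `StandardScaler`), and `sklearn.metrics.pairwise.cosine_similarity`. Note that `TfidfVectorizer` defaults differ slightly from this sketch: it uses raw term counts and L2-normalizes each row.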
This project is structured around a Jupyter Notebook, which provides a detailed walkthrough of the data processing, feature extraction, and recommendation generation steps. To use this notebook:
- Ensure you have Jupyter installed. If not, you can install it using pip:
pip install notebook
- Start the Jupyter Notebook server from the command line:
jupyter notebook
- Navigate to the project directory in the Jupyter Notebook web interface.
- Open the Song_Classification_and_Recommendation_System.ipynb notebook.
- Run the cells sequentially to load the data, extract features, process the data, and generate recommendations.
- Each cell in the notebook is annotated with comments to guide you through the process and explain the purpose of each code block.
- To execute a cell, select it and press Shift + Enter.
- Make sure all the required dependencies are installed (refer to the Setup and Installation section).
- Ensure you have Spotify API credentials (a client ID and client secret) available, since fetching features from the Spotify API requires authentication.
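If the feature extraction script calls the Spotify Web API directly, each request needs an access token. One documented way to obtain one is Spotify's Client Credentials flow: POST `grant_type=client_credentials` to `https://accounts.spotify.com/api/token` with HTTP Basic auth built from your client ID and secret. The sketch below only builds that request with the standard library and does not send it; the environment variable names `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` are assumptions, not something this project defines.

```python
import base64
import os
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build, but do not send, a Client Credentials token request."""
    # HTTP Basic auth: base64 of "client_id:client_secret".
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def token_request_from_env() -> urllib.request.Request:
    """Read credentials from assumed (hypothetical) environment variable names."""
    try:
        client_id = os.environ["SPOTIFY_CLIENT_ID"]
        client_secret = os.environ["SPOTIFY_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"set the {missing.args[0]} environment variable") from None
    return build_token_request(client_id, client_secret)
```

Sending the request with `urllib.request.urlopen` should return JSON containing `access_token`, `token_type` (`Bearer`), and `expires_in`; later API calls pass the token in an `Authorization: Bearer <token>` header. Keeping the secret in environment variables avoids committing it to the notebook.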
This project is licensed under the Apache License; see the LICENSE.md file for details.