
Music recommendation system with exploratory data analysis and data visualization of Spotify data, built with Python, NumPy, Pandas, Seaborn, and SQL


Gokul-Raja84/Spotify-Music-Recommendation-and-Data-Analysis


Spotify - Music Recommendation System and Data Analysis

About the Project

Spotify is a Swedish audio streaming and media services provider founded in April 2006. It is the world's largest music streaming service provider, with over 381 million monthly active users, including 172 million paid subscribers.



Spotify Music Recommendation System

This project presents a Music Recommendation System built using the Spotify dataset. The system combines exploratory data analysis with a similarity-based recommender that suggests songs whose audio features are close to those of songs the user provides.

Introduction

The Spotify Music Recommendation System aims to recommend songs that align with a user's listening history. By analyzing patterns in song popularity, audio attributes, and genres, the system can suggest songs that are similar to those the user already enjoys. This project highlights the power of data analysis and machine learning in creating personalized user experiences.

Features

  • Data Extraction: Uses Spotipy to fetch song data from the Spotify Web API.
  • Exploratory Data Analysis (EDA): Identifies key features and patterns in the Spotify dataset.
  • Feature Engineering: Selects relevant features to build an accurate recommendation model.
  • Recommendation System: Recommends songs based on user-input songs using cosine similarity.

Installation

  1. Clone the repository:

    git clone https://github.com/Gokul-Raja84/Spotify-Music-Recommendation-and-Data-Analysis.git
    cd Spotify-Music-Recommendation-and-Data-Analysis
  2. Install the required libraries:

    pip install -r requirements.txt
  3. Set up Spotify API credentials:

    • Create an app on the Spotify Developer Dashboard.
    • Copy your Client ID and Client Secret.
    • Set the environment variables:
      export SPOTIFY_CLIENT_ID='your_client_id'
      export SPOTIFY_CLIENT_SECRET='your_client_secret'
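The usage code reads these variables with `os.environ[...]`, so a missing variable surfaces as a bare `KeyError`. A minimal sketch of a friendlier check, assuming the same two variable names as above (the helper name `load_spotify_credentials` is hypothetical, not part of the repository):

```python
import os


def load_spotify_credentials():
    """Hypothetical helper: return (client_id, client_secret) or raise a clear error."""
    names = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
    # Collect every variable that is unset or empty so all are reported at once
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing Spotify credentials: " + ", ".join(missing)
            + ". Export them before running the code."
        )
    return os.environ["SPOTIFY_CLIENT_ID"], os.environ["SPOTIFY_CLIENT_SECRET"]
```

The returned pair can be passed as `client_id` and `client_secret` to `SpotifyClientCredentials`.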

Usage

  1. Import necessary libraries:

    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials
    import pandas as pd
    import numpy as np
    from collections import defaultdict
    from sklearn.metrics import euclidean_distances
    from scipy.spatial.distance import cdist
    import difflib
    import os
  2. Authenticate and initialize Spotipy:

    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=os.environ["SPOTIFY_CLIENT_ID"],
                                                               client_secret=os.environ["SPOTIFY_CLIENT_SECRET"]))
  3. Define functions to fetch song data and calculate recommendations (full code provided in the repository).

  4. Get song recommendations:

    recommended_songs = recommend_songs([
        {'name': 'Come As You Are', 'year': 1991},
        {'name': 'Smells Like Teen Spirit', 'year': 1991},
        {'name': 'Lithium', 'year': 1992},
        {'name': 'All Apologies', 'year': 1993},
    ], data)  # data: the Spotify tracks DataFrame (assumed second argument)
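Step 3 leaves the helper functions to the repository. As a hedged illustration of the "fetch song data" part only, the sketch below looks a song up in a local DataFrame by name and release year, matching the `{'name': ..., 'year': ...}` inputs above. The helper name `find_song_in_dataset` and the `name`/`year` column names are assumptions; when a song is missing locally, the repository's code may instead query the Spotify Web API (for example via Spotipy's `sp.search`), which this sketch does not do.

```python
import pandas as pd


def find_song_in_dataset(name, year, data):
    """Hypothetical helper: return the first row of `data` matching name and year, else None."""
    # Case-insensitive title match combined with an exact release-year match
    mask = (data["name"].str.lower() == name.lower()) & (data["year"] == year)
    matches = data[mask]
    if matches.empty:
        return None
    return matches.iloc[0]


# Toy rows for illustration only (not real Spotify data)
data = pd.DataFrame({
    "name": ["Lithium", "Come As You Are", "Lithium"],
    "year": [1992, 1991, 2008],
    "energy": [0.71, 0.82, 0.55],
})
song = find_song_in_dataset("lithium", 1992, data)
print(song["energy"])  # 0.71
```

Matching on the year as well as the title disambiguates different songs that share a name.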

Objective

  • Use Spotify's dataset for personalized music recommendations: The primary objective of this project is to build a music recommendation system from Spotify's track data. An exploratory data analysis (EDA) first uncovers patterns and relationships in the data, in particular the audio features and metadata that relate to popularity and listener engagement, which can then inform recommendations tailored to individual preferences.

Music Recommendation System Explanation

  • Building a data-driven music recommendation engine: The system suggests songs based on a list of songs the user provides. Using Spotipy, a Python client for the Spotify Web API, it fetches audio features and metadata for each input song. It then averages these feature vectors into a single mean vector and computes the cosine similarity between that vector and every other song in the dataset; the songs with the highest similarity, i.e. the most similar audio characteristics and metadata, are returned as recommendations.
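The mean-vector step described above can be sketched in a few lines of NumPy and Pandas. This is an illustration under stated assumptions, not the repository's code: the feature list `FEATURES`, the `name` column, and the function name `recommend_by_mean_vector` are hypothetical, and the z-score standardization is an added assumption so that large-scale columns such as tempo do not dominate the similarity.

```python
import numpy as np
import pandas as pd

# Assumed numeric audio-feature columns; the repository's actual feature list may differ
FEATURES = ["valence", "energy", "danceability", "acousticness", "tempo"]


def recommend_by_mean_vector(seed_names, data, n_songs=3):
    """Rank non-seed songs by cosine similarity to the mean feature vector of the seeds."""
    X = data[FEATURES].to_numpy(dtype=float)
    # Standardize each column (zero mean, unit variance) so no single feature dominates
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    is_seed = data["name"].isin(seed_names).to_numpy()
    center = X[is_seed].mean(axis=0)
    # Cosine similarity: dot product divided by the product of the vector norms
    sims = X @ center / (np.linalg.norm(X, axis=1) * np.linalg.norm(center))
    ranked = data.assign(similarity=sims)[~is_seed]
    return ranked.sort_values("similarity", ascending=False).head(n_songs)[["name", "similarity"]]


# Toy rows for illustration only: "B" has exactly the same features as seed "A"
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "valence": [0.9, 0.9, 0.1, 0.5],
    "energy": [0.8, 0.8, 0.2, 0.5],
    "danceability": [0.7, 0.7, 0.3, 0.5],
    "acousticness": [0.1, 0.1, 0.9, 0.5],
    "tempo": [120.0, 120.0, 80.0, 100.0],
})
recs = recommend_by_mean_vector(["A"], toy)
```

Because the imports include `scipy.spatial.distance.cdist`, note that `cdist([center], X, metric="cosine")` returns the cosine distance, which equals 1 minus the similarity computed here.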

Exploratory data analysis (EDA) and Data Visualization

The project also includes an exploratory data analysis (EDA) and data visualization study of a large Spotify dataset, carried out in Python with standard statistical methods to uncover patterns in music streaming data.

  • The main goal of this part is to extract useful insights from Spotify's music catalog. The analysis uses Pandas for data manipulation, NumPy for numerical computation, Matplotlib for plotting, and Seaborn for statistical visualizations built on top of Matplotlib, in order to find patterns, relationships, and trends in the data.

  • The analysis begins by identifying the top 10 most and least popular songs on Spotify, which shows the range of listener preferences. A correlation heatmap then shows how the various audio features relate to one another.
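A minimal Pandas sketch of these two steps, assuming a tracks DataFrame with `name` and `popularity` columns (the function names are hypothetical). The correlation matrix it returns is what a heatmap colors; passing it to `seaborn.heatmap(corr, annot=True)` would draw the plot.

```python
import pandas as pd


def popularity_extremes(data, n=10):
    """Return the n most and the n least popular tracks (name and popularity only)."""
    cols = ["name", "popularity"]
    most = data.nlargest(n, "popularity")[cols]
    least = data.nsmallest(n, "popularity")[cols]
    return most, least


def feature_correlations(data):
    """Pearson correlation matrix of all numeric columns, i.e. the values a heatmap colors."""
    return data.select_dtypes(include="number").corr()


# Toy rows for illustration only (not real Spotify data)
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "popularity": [10, 80, 45, 0],
    "energy": [0.2, 0.9, 0.5, 0.1],
    "loudness": [-20.0, -4.0, -10.0, -25.0],
})
most, least = popularity_extremes(toy, n=2)
corr = feature_correlations(toy)
```

`nlargest`/`nsmallest` avoid sorting the whole table when only the extremes are needed.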

  • Regression plots then examine specific pairs of audio attributes more closely: loudness versus energy, and popularity versus acousticness, revealing trends related to listener engagement.
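A regression plot such as Seaborn's `regplot` overlays an ordinary least-squares line on a scatter plot. The sketch below computes that line and Pearson's r with NumPy, so the numbers behind the plot can be inspected directly; the function name `fit_line` is hypothetical, and with real data it would be called as, e.g., `fit_line(data["loudness"], data["energy"])`, assuming those column names.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least-squares line y ~ slope * x + intercept, plus Pearson's r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # polyfit with deg=1 returns coefficients highest power first: [slope, intercept]
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r


# Toy loudness (dB) and energy values, chosen to lie exactly on a line
slope, intercept, r = fit_line([-30.0, -20.0, -10.0], [0.2, 0.5, 0.8])
```

A positive slope with r close to 1 indicates that louder tracks tend to have higher energy; a negative r would indicate an inverse relationship.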

  • The project then turns to the temporal dimension, visualizing the distribution of songs on Spotify since 1992 and charting how song duration has changed over the years. It also compares song duration across different genres.
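Both yearly views reduce to one group-by over release year. A minimal sketch, assuming columns named `name`, `year`, and `duration_ms` (duration in milliseconds); the function name `yearly_summary` is hypothetical, and the 1992 start year follows the text above.

```python
import pandas as pd


def yearly_summary(data, start_year=1992):
    """Count songs and average duration (in minutes) per release year from start_year on."""
    recent = data[data["year"] >= start_year]
    summary = recent.groupby("year").agg(
        n_songs=("name", "count"),
        mean_duration_min=("duration_ms", "mean"),
    )
    # duration_ms is in milliseconds; convert the mean to minutes for readability
    summary["mean_duration_min"] = summary["mean_duration_min"] / 60_000
    return summary


# Toy rows for illustration only (not real Spotify data)
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "year": [1990, 1992, 1992, 1995],
    "duration_ms": [200_000, 180_000, 240_000, 300_000],
})
summary = yearly_summary(toy)
```

Plotting `summary["n_songs"]` and `summary["mean_duration_min"]` against the year index gives the two charts described above.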

  • Finally, the project identifies the top five genres by popularity, showing which genres attract the most listeners in the dataset.
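A short sketch of the genre ranking, assuming a table with `genres` and `popularity` columns (the function name `top_genres` is hypothetical). Averaging per genre first means the same code works whether the input has one row per track or is already aggregated by genre.

```python
import pandas as pd


def top_genres(genre_data, n=5):
    """Return the n genres with the highest mean popularity, highest first."""
    return (
        genre_data.groupby("genres")["popularity"]
        .mean()
        .nlargest(n)
    )


# Toy rows for illustration only (not real Spotify data)
toy = pd.DataFrame({
    "genres": ["rock", "pop", "jazz", "rock", "metal", "folk", "blues"],
    "popularity": [60, 80, 40, 70, 50, 30, 20],
})
ranking = top_genres(toy)
```

The resulting Series can be plotted directly as a bar chart of the top genres.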


  • In conclusion, this exploratory data analysis uses Python and its data libraries to examine Spotify's dataset and describe the musical trends, preferences, and influences reflected in it over the years. Its visualizations and findings illustrate how data analysis can be applied to music.

Analysis Objectives

  1. Top 10 most popular songs on Spotify
  2. Top 10 least popular songs on Spotify
  3. Correlation heatmap between variables
  4. Regression plot: correlation between loudness and energy
  5. Regression plot: correlation between popularity and acousticness
  6. Distribution plot: total number of songs on Spotify since 1992
  7. Change in song duration over the years
  8. Duration of songs in different genres
  9. Top 5 genres by popularity

Technologies used ⚙️

Python

Statistics

K-Means

Python Libraries: NumPy, Pandas, Matplotlib, Seaborn, Spotipy, scikit-learn, SciPy

Kaggle Spotify Datasets: Spotify Tracks and Artists

Acknowledgments:

  • The project structure and code implementation were guided by various online resources and tutorials.

  • Feel free to explore the project, modify it to suit your needs, and experiment with different approaches to improve the data analysis and data visualization.
