Skip to content

theking-leo/Movie_Rec

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Code Explanation

  1. Data Loading and Preprocessing

    Code Snippet:

    df1 = pd.read_csv("movies.csv") df2 = pd.read_csv("ratings.csv") combine_df = pd.concat([df1, df2], ignore_index=True)

    Explanation: The code reads two CSV files (movies.csv and ratings.csv) into Pandas DataFrames. It combines both DataFrames into a single DataFrame (combine_df) using pd.concat.

  2. Data Exploration and Cleaning

    Code Snippet:

    combine_df.describe() combine_df.info() combine_df.drop_duplicates() combine_df['userId'].fillna(-1, inplace=True) combine_df['rating'].fillna(0, inplace=True)

    Explanation: describe() and info() provide statistical information and data type details. drop_duplicates() removes duplicate rows if any. fillna() is used to handle missing values in the 'userId' and 'rating' columns.

  3. Popularity-based Recommender

    Code Snippet:

    popularity_df = combine_df.groupby('movieId').agg( {'userId': 'count', 'rating': 'mean'}).reset_index() popularity_df.rename( columns={'userId': 'rating_count', 'rating': 'average_rating'}, inplace=True) popularity_df = popularity_df.sort_values(by='rating_count', ascending=False)

    Explanation: The code calculates the popularity of movies based on the number of ratings and average rating. groupby() groups the data by movieId. agg() calculates count and mean of userId and rating, respectively. The DataFrame is sorted by 'rating_count' in descending order.

  4. User-Based Collaborative Filtering Recommender

    Code Snippet:

    user_item_matrix = combine_df.pivot_table( index='userId', columns='movieId', values='rating', fill_value=0) user_similarity = cosine_similarity(user_item_matrix) user_similarity_df = pd.DataFrame( user_similarity, index=user_item_matrix.index, columns=user_item_matrix.index)

    Explanation: The code creates a user-item matrix using pivot_table. cosine_similarity is used to calculate similarity between users. The similarity matrix is stored in user_similarity_df.

  5. Streamlit UI

    Code Snippet:

    import streamlit as st

    def main(): st.title('Movie Recommender') st.write('Welcome to the Movie Recommender App!') # ... (UI code)

    Explanation: Streamlit is used to create a web application. The main() function sets up the main interface using st.title and st.write.

  6. Collaborative Filtering Recommender using Surprise

    Code Snippet:

    from surprise import Dataset, Reader, KNNBasic from surprise.model_selection import train_test_split

    reader = Reader(rating_scale=(1, 5)) data = Dataset.load_from_df(df2[['userId', 'movieId', 'rating']], reader) trainset, _ = train_test_split(data, test_size=0.2)

    Explanation: The Surprise library is used for collaborative filtering. A Surprise Reader is created, and data is loaded using Dataset. train_test_split is used to split the data into training and testing sets.

  7. Running the Streamlit App

    Code Snippet:

    if name == "main": main()

    Explanation: The main application is run when the script is executed.

Feel free to customize the explanation based on your project's specific details and add any additional context that might be helpful for users.