NSVpriya/Youtube_Data_ETL_Project

Youtube_data_analytics_project

This project aims to analyze the popularity of YouTube content across different regions by leveraging datasets sourced from Kaggle. It employs a systematic approach to data preprocessing, cleaning, and analysis using various AWS (Amazon Web Services) services including S3, Lambda, Glue, and others, to build an automated ETL pipeline.

Objective:

The primary objective of this project is to provide insights into the most popular YouTube content in different regions by processing the data through an ETL pipeline and analyzing it in Microsoft Power BI.

Solution Approach:

Dataset:

The Kaggle dataset linked below contains statistics (CSV files) on daily popular YouTube videos over the course of many months: https://www.kaggle.com/datasets/datasnaek/youtube-new

1. Data Collection: Uses the Kaggle dataset above as the source of YouTube data.

2. Automated Data Cleaning: Implements Lambda functions that clean and preprocess the raw data before analysis.

3. ETL Pipeline: Builds an ETL pipeline with AWS Glue for data extraction, transformation, and loading.

4. Analysis: Provides reports and visualizations for analyzing the popularity of YouTube content across content types and regions.
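
As an illustration of the cleaning step, here is a minimal sketch of what a Lambda-style cleaning routine could look like. It is not the repository's actual code. The function names `parse_s3_event` and `clean_rows`, the column names (`video_id`, `views`, `likes`, `comment_count`), and the cleaning rules are assumptions made for this example. Reading the object from S3 and writing the result back are left out so the sketch runs with the standard library only.

```python
import csv
import io
from urllib.parse import unquote_plus

# Assumed numeric columns; adjust to match the actual CSV header.
NUMERIC_COLUMNS = ("views", "likes", "comment_count")


def parse_s3_event(event):
    """Return (bucket, key) from an S3 event notification payload."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["object"]["key"])
    return bucket, key


def clean_rows(csv_text):
    """Drop rows without a video_id, with non-numeric counts, or duplicated."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        video_id = (row.get("video_id") or "").strip()
        if not video_id:
            continue  # skip rows with a missing identifier
        try:
            numbers = {col: int(row[col]) for col in NUMERIC_COLUMNS}
        except (KeyError, TypeError, ValueError):
            continue  # skip rows whose counts are missing or not integers
        identity = (video_id, *numbers.values())
        if identity in seen:
            continue  # skip exact duplicates
        seen.add(identity)
        cleaned.append({"video_id": video_id, **numbers})
    return cleaned
```

In a real Lambda handler, `parse_s3_event` would identify the uploaded file, and the output of `clean_rows` would be written to a separate location in S3 for the Glue job to pick up.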

Technologies Used:

Amazon S3:

Storage for raw and processed data.

AWS Lambda:

Serverless computing for data preprocessing tasks.

AWS Glue:

Managed ETL service for data integration and transformation.

Amazon Athena:

Interactive query service for analyzing data in S3 using standard SQL.

Visualization (Microsoft Power BI):

Business intelligence platform for data visualization and analytics.

Expected Outcome:

1. Identify Trends: Discover popular content based on view counts.

2. Regional Comparisons: Compare the popularity of YouTube content among different regions to understand regional preferences and trends.

3. Audience Engagement: Analyze audience engagement metrics such as views, likes, and comments to gauge the impact and reception of different types of content.
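
The outcomes above can be sketched in a few lines of Python. This is only an illustration, not part of the project's pipeline, where this analysis is done in Athena and Power BI. The record layout (`region`, `views`, `likes`, `comment_count`), the helper names, and the engagement formula (likes plus comments, divided by views) are assumptions chosen for the example.

```python
from collections import defaultdict


def engagement_rate(video):
    """Assumed metric: (likes + comments) per view; 0.0 when there are no views."""
    views = video["views"]
    if views == 0:
        return 0.0
    return (video["likes"] + video["comment_count"]) / views


def top_by_region(videos, n=1):
    """Group videos by region and keep the n most-viewed in each."""
    by_region = defaultdict(list)
    for video in videos:
        by_region[video["region"]].append(video)
    return {
        region: sorted(items, key=lambda v: v["views"], reverse=True)[:n]
        for region, items in by_region.items()
    }


# Made-up sample records, for illustration only.
sample = [
    {"region": "US", "title": "a", "views": 100, "likes": 8, "comment_count": 2},
    {"region": "US", "title": "b", "views": 500, "likes": 5, "comment_count": 0},
    {"region": "GB", "title": "c", "views": 50, "likes": 1, "comment_count": 1},
]

top = top_by_region(sample)
```

Ranking by views addresses the trend and regional comparisons, while the engagement rate separates videos that are widely watched from those that draw the most reaction per view.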
