uche-madu/podcast_scraping_pipeline

Python Web Scraping Pipeline Orchestrated With Airflow

This project builds a data pipeline that scrapes podcast data into a PostgreSQL database managed by Google Cloud SQL. The pipeline, orchestrated with Airflow, also uploads the audio file of each podcast episode to a Google Cloud Storage bucket.
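
As a rough illustration of the scraping step, here is a minimal sketch that assumes the podcast is published as an RSS 2.0 feed; the README does not say how the data is actually scraped. The names `Episode`, `parse_episodes`, `object_name`, and `SAMPLE_FEED` are hypothetical and are not taken from this repository. The sketch only extracts episode metadata and derives a possible bucket object name. The database insert, the upload to Cloud Storage, and the Airflow task wiring are left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical record for one podcast episode."""
    title: str
    published: str
    audio_url: str


def parse_episodes(feed_xml: str) -> list[Episode]:
    """Extract one Episode per <item> that carries an <enclosure> (assumes RSS 2.0)."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # skip items that have no audio file attached
        episodes.append(
            Episode(
                title=(item.findtext("title") or "").strip(),
                published=(item.findtext("pubDate") or "").strip(),
                audio_url=enclosure.get("url", ""),
            )
        )
    return episodes


def object_name(episode: Episode, prefix: str = "episodes") -> str:
    """Build an assumed bucket object name from the file name in the audio URL."""
    filename = episode.audio_url.rsplit("/", 1)[-1].split("?", 1)[0]
    return f"{prefix}/{filename}"


# Made-up feed used only to exercise the functions above.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item>
    <title>Episode 1</title>
    <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
    <enclosure url="https://example.com/audio/ep1.mp3?x=1" type="audio/mpeg"/>
  </item>
  <item><title>No audio</title></item>
</channel></rss>"""

if __name__ == "__main__":
    for ep in parse_episodes(SAMPLE_FEED):
        print(ep.title, "->", object_name(ep))
```

In an Airflow setup, functions like these would typically be called from separate tasks, so that parsing, loading into PostgreSQL, and uploading audio can be retried independently.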

GCP resources are provisioned using Terraform.
