ETL using application streaming and creating a Data Lake
Updated Apr 7, 2023 - Jupyter Notebook
This project outlines the final project requirements for DAV6100 - Information Architectures, focusing on group assignments, scoring criteria, topic selection, core requirements, and project components such as design, development, visualization, and executive presentation.
Uses AWS Glue to perform ETL operations and load the resulting data into AWS Redshift. In the second phase, AWS CloudWatch rules and Lambda are used to run the Glue jobs automatically.
This is a data pipeline built to serve a business team.
This project was designed to establish an architecture on AWS, originating from the migration of an existing database in an on-premise environment.
This project aims to automate the process of infrastructure creation.
This project aims to analyze the popularity of YouTube content across different regions by leveraging datasets sourced from Kaggle. It employs a systematic approach to data preprocessing, cleaning, and analysis using various Amazon Web Services (AWS) offerings, including S3, Lambda, and Glue, to build an automated ETL pipeline.
Terraform configuration that provisions several AWS resources, uploads data to S3, and starts the Glue Crawler and Glue Job.
Glue Data Quality Example - Deploy to your AWS account with Terraform to test
Data Engineer project using Python and some AWS data services
Terraform module to create and manage an AWS Glue job
Terraform module which creates Glue Job resources on AWS.
The Sensitive Data Protection on AWS solution allows enterprise customers to create data catalogs and to discover, protect, and visualize sensitive data across multiple AWS accounts. The solution eliminates the need for manual tagging to track sensitive data such as Personally Identifiable Information (PII) and classified information.
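As a minimal illustration of the underlying idea of automatic PII discovery (not the solution's actual detection logic, which is far more sophisticated), a pure-Python sketch might scan record values against a handful of patterns and report which columns look sensitive:

```python
import re

# Minimal, illustrative patterns only -- real detection uses richer
# classifiers than these two regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify_value(value):
    """Return the names of PII categories matched in a string."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(value)]

def scan_records(records):
    """Map each column name to the set of PII categories found in its values."""
    findings = {}
    for record in records:
        for column, value in record.items():
            hits = classify_value(str(value))
            if hits:
                findings.setdefault(column, set()).update(hits)
    return findings
```

The output of such a scan is exactly the kind of metadata a data catalog can store instead of relying on hand-applied tags.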
Extract, transform, and load data for analytic processing using AWS Glue
Build and deploy a serverless data pipeline on AWS with no effort.
Glue scripts for converting AWS Service Logs for use in Athena
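A minimal sketch of the kind of conversion such scripts perform, assuming S3 server access logs as input; a real Glue script would cover the full field list and typically write Parquet for Athena to query.

```python
import re

# Simplified pattern covering the first eight fields of an S3 server
# access log line (bucket owner, bucket, time, remote IP, requester,
# request ID, operation, key); real scripts parse every field.
S3_LOG_PATTERN = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+)'
)

def parse_s3_log_line(line):
    """Extract structured columns from one access-log line, or None."""
    match = S3_LOG_PATTERN.match(line)
    return match.groupdict() if match else None
```

Rows parsed this way map directly onto a table schema that Athena can query, which is the point of converting the raw logs in the first place.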