A project to create a Hive data warehouse for E-commerce in AWS and perform data analysis.

shivananda199/hive-analytics-in-aws-for-e-commerce

hive-analytics-in-aws-for-e-commerce

Prerequisites

A few things you need to have before starting the project:

  • Basic understanding of AWS services and creation of an EC2 instance on AWS
  • Good knowledge of Hive, Spark, SQL, shell scripting and working knowledge of Docker
  • Strong understanding of databases, relational modeling and a zeal to learn :)

Project Motivation

The main goal of this project is to learn how to perform Hive analysis in Docker containers running on an AWS EC2 instance. In the process, you will learn how to create an AWS EC2 instance, set up Docker containers on that instance, load data from a local file to AWS using the CLI, ingest and transform data using Sqoop, Hive, and Spark, build a data warehouse, and perform data analysis using Hive queries.


Dataset

To run any sort of analytics, we need data. This project uses the AdventureWorks dataset for its Hive analysis. The AdventureWorks database supports standard online transaction processing scenarios for a fictitious bicycle manufacturer, Adventure Works Cycles. Its components include Manufacturing, Sales, Purchasing, Product Management, Contact Management, and Human Resources. The complete schema of the database can be found here. We will concentrate mainly on the AdventureWorks Sales and Customer Demographics data in this project.


Problem Statement

Perform data analysis on AdventureWorks Sales and Customer Demographics data in Hive and answer the following:

  • find the upper and lower discount limits offered for any product
  • find the top 10 customers with the highest contribution to sales
  • describe the purchase pattern of customers based on salary, education, and gender
  • compute the sales contribution percentage of each customer, and the sales contribution percentage by gender and salary
  • identify the top-performing territory based on sales
  • find the territory-wise sales and each territory's adherence to its defined sales quota
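
As a rough illustration of the first two questions, the sketch below runs plain SQL aggregations against an in-memory SQLite database from Python. SQLite is only a stand-in here so the example is self-contained; it is not the project's Hive setup. The `sales` table, its column names, and the sample rows are hypothetical and do not reflect the actual AdventureWorks schema. The discount query reads the question as overall limits across all products, which is one possible interpretation. The queries use only `MIN`, `MAX`, `SUM`, `GROUP BY`, `ORDER BY`, and `LIMIT`, so they should carry over to Hive with little change.

```python
import sqlite3

# Hypothetical, simplified stand-in for sales detail data.
# Table name, column names, and rows are illustrative assumptions,
# not the project's actual AdventureWorks schema.
ROWS = [
    # (customer_id, product_id, unit_price_discount, line_total)
    (1, 101, 0.00, 200.0),
    (1, 102, 0.10, 300.0),
    (2, 101, 0.05, 150.0),
    (3, 103, 0.40, 900.0),
    (3, 101, 0.00, 100.0),
]


def build_db():
    """Create an in-memory SQLite database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "customer_id INTEGER, product_id INTEGER, "
        "unit_price_discount REAL, line_total REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Lowest and highest discount found across all sales rows.
DISCOUNT_LIMITS = """
SELECT MIN(unit_price_discount) AS lower_limit,
       MAX(unit_price_discount) AS upper_limit
FROM sales
"""

# Top 10 customers ranked by their total sales amount.
TOP_CUSTOMERS = """
SELECT customer_id, SUM(line_total) AS total_sales
FROM sales
GROUP BY customer_id
ORDER BY total_sales DESC
LIMIT 10
"""

conn = build_db()
discount_limits = conn.execute(DISCOUNT_LIMITS).fetchone()
top_customers = conn.execute(TOP_CUSTOMERS).fetchall()

print(discount_limits)  # (lower_limit, upper_limit)
print(top_customers)    # [(customer_id, total_sales), ...], highest first
```

In the real warehouse the same aggregations would run over the Hive tables built from the AdventureWorks Sales data, with the actual table and column names substituted.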

Project Work

The steps below document the work done to solve the problem statement.

  • Project Setup in AWS - Refer here
  • Creating Hive data warehouse - Refer here
  • Performing Hive Analytics - Refer here
