Data mining techniques applied to two datasets: market segmentation using clustering analysis, and claim prediction using CART, Random Forest, and Artificial Neural Network models.

Honey28Git/Data-Mining

Data-Mining

K-means Clustering, Dendrogram, Random Forest Classifier

Problem 1: Clustering

A leading bank wants to develop a customer segmentation to give promotional offers to its customers. They collected a sample that summarizes the activities of users during the past few months. You are given the task to identify the segments based on credit card usage.

1.1 Read the data, do the necessary initial steps, and exploratory data analysis (Univariate, Bi-variate, and multivariate analysis).
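A minimal sketch of the initial checks and exploratory steps, assuming pandas and NumPy are available. The data frame below is synthetic (the values are made up and only three of the columns are imitated); with the real file, `df` would instead come from `pd.read_csv("bank_marketing_part1_Data.csv")`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the credit card data; values are made up.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "spending": rng.normal(15, 3, 200),
    "advance_payments": rng.normal(14, 1.5, 200),
    "credit_limit": rng.normal(3.2, 0.4, 200),
})

# Initial steps: shape, missing values, duplicate rows.
n_rows, n_cols = df.shape
missing = df.isna().sum()
n_duplicates = int(df.duplicated().sum())

# Univariate analysis: summary statistics and skewness per column.
summary = df.describe().T
skewness = df.skew()

# Bivariate / multivariate analysis: pairwise Pearson correlations.
corr = df.corr()

print(summary[["mean", "std", "min", "max"]].round(2))
print(corr.round(2))
```

Histograms, box plots, and a correlation heatmap are the usual visual companions to these tables; they are left out here to keep the sketch free of plotting dependencies.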

1.2 Do you think scaling is necessary for clustering in this case? Justify
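One way to reason about this question: the features are recorded in different units (1000s, 100s, 10000s), so distance-based clustering would otherwise be dominated by the columns with the largest spread. The sketch below, assuming scikit-learn is available and using a few made-up rows, standardizes each column to zero mean and unit variance with `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows: [spending (1000s), advance_payments (100s), credit_limit (10000s)]
X = np.array([
    [19.9, 16.9, 3.7],
    [16.0, 14.9, 3.6],
    [18.9, 16.4, 3.9],
    [11.0, 13.0, 2.8],
    [12.5, 13.5, 3.0],
])

# z-score scaling: subtract the column mean, divide by the column std.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("raw std per column:   ", X.std(axis=0).round(2))
print("scaled std per column:", X_scaled.std(axis=0).round(2))
```

Before scaling, the spending column varies far more than the credit limit column, so Euclidean distances would mostly reflect spending; after scaling, every column contributes on the same footing.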

1.3 Apply hierarchical clustering to scaled data. Identify the optimum number of clusters using a dendrogram and briefly describe them.
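A hedged sketch of this step using SciPy. The data are three synthetic blobs standing in for the scaled customer features, and Ward linkage and the cut at three clusters are illustrative choices, not something the problem statement prescribes.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(42)
# Three well-separated synthetic blobs (assumed stand-in for scaled data).
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(0, 0.5, size=(30, 2)) for c in centers])

# Ward linkage merges, at each step, the pair of clusters whose union
# gives the smallest increase in within-cluster variance.
Z = linkage(X, method="ward")

# Cut the tree so that at most 3 flat clusters remain.
labels = fcluster(Z, t=3, criterion="maxclust")

# Compute the dendrogram layout only (no_plot=True avoids needing matplotlib);
# truncate_mode="lastp" keeps just the last p merged clusters.
tree = dendrogram(Z, no_plot=True, truncate_mode="lastp", p=10)

print("cluster sizes:", np.bincount(labels)[1:])
```

When the dendrogram is actually drawn, the number of clusters is usually read off by looking for the tallest vertical stretch that no horizontal merge line crosses and cutting there.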

1.4 Apply K-Means clustering on scaled data and determine optimum clusters. Apply elbow curve and silhouette score. Explain the results properly. Interpret and write inferences on the finalized clusters.
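The elbow curve and silhouette score can be sketched as follows, assuming scikit-learn and synthetic blobs in place of the scaled bank data. The range of `k` and the `n_init` and `random_state` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three synthetic blobs (assumed stand-in for the scaled features).
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(0, 0.5, size=(40, 2)) for c in centers])

inertias = {}     # within-cluster sum of squares, used for the elbow curve
silhouettes = {}  # mean silhouette score, defined only for k >= 2
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
    if k >= 2:
        silhouettes[k] = silhouette_score(X, km.labels_)

# Pick the k with the highest mean silhouette score.
best_k = max(silhouettes, key=silhouettes.get)

for k in inertias:
    print(k, round(inertias[k], 1), round(silhouettes.get(k, float("nan")), 3))
print("best k by silhouette:", best_k)
```

Plotting `inertias` against `k` gives the elbow curve: the point where the drop flattens suggests a candidate `k`, and the silhouette score serves as a second opinion. On real data the two criteria can disagree, in which case interpretability of the resulting segments is a reasonable tie-breaker.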

1.5 Describe cluster profiles for the clusters defined. Recommend different promotional strategies for different clusters.
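Cluster profiles are commonly summarized as per-cluster feature means plus cluster sizes. A small sketch with pandas follows; the rows and the `cluster` labels are made up, and in practice the labels would come from the fitted clustering model.

```python
import pandas as pd

# Made-up customers with an assumed "cluster" label column.
df = pd.DataFrame({
    "spending": [19.9, 18.9, 11.0, 12.5, 15.0, 14.2],
    "credit_limit": [3.7, 3.9, 2.8, 3.0, 3.3, 3.2],
    "cluster": [0, 0, 1, 1, 2, 2],
})

# Mean of every feature within each cluster.
profile = df.groupby("cluster").mean()

# Number of customers per cluster, aligned on the cluster index.
profile["n_customers"] = df["cluster"].value_counts().sort_index()

# Order clusters from highest to lowest average spending.
profile = profile.sort_values("spending", ascending=False)
print(profile.round(2))
```

Profiles should be computed on the original, unscaled values so that the averages stay in business units (rupee amounts rather than z-scores) when promotional strategies are written up.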

Dataset for Problem 1: bank_marketing_part1_Data.csv

Data Dictionary for Market Segmentation:

spending: Amount spent by the customer per month (in 1000s)

advance_payments: Amount paid by the customer in advance by cash (in 100s)

probability_of_full_payment: Probability of payment done in full by the customer to the bank

current_balance: Balance amount left in the account to make purchases (in 1000s)

credit_limit: Limit of the amount in credit card (in 10000s)

min_payment_amt: Minimum paid by the customer while making payments for purchases made monthly (in 100s)

max_spent_in_single_shopping: Maximum amount spent in one purchase (in 1000s)

Problem 2: CART-RF-ANN

An Insurance firm providing tour insurance is facing higher claim frequency. The management decides to collect data from the past few years. You are assigned the task to make a model which predicts the claim status and provide recommendations to management. Use CART, RF & ANN and compare the models' performances in train and test sets.

2.1 Read the data, do the necessary initial steps, and exploratory data analysis (Univariate, Bi-variate, and multivariate analysis).

2.2 Data Split: Split the data into test and train, build classification model CART, Random Forest, Artificial Neural Network.

2.3 Performance Metrics: Comment and check the performance of predictions on train and test sets using Accuracy, Confusion Matrix, Plot ROC curve and get ROC_AUC score, classification reports for each model.

2.4 Final Model: Compare all the models and write an inference on which model is best/optimized.

2.5 Inference: Based on the whole analysis, what are the business insights and recommendations?
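The split, the three models, and the train/test metrics can be sketched together with scikit-learn. Everything below is an assumption for illustration: the data are generated with `make_classification` rather than read from the insurance file, and the hyperparameters are arbitrary. `DecisionTreeClassifier` is scikit-learn's CART-style tree, and `MLPClassifier` plays the role of the ANN (wrapped with a scaler, since neural networks are sensitive to feature scale).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data standing in for the claims table.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)

# 70/30 split, stratified so both sets keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "CART": DecisionTreeClassifier(max_depth=4, min_samples_leaf=20,
                                   random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, max_depth=6,
                                 random_state=0),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                      random_state=0)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    row = {}
    for split, Xs, ys in [("train", X_train, y_train),
                          ("test", X_test, y_test)]:
        pred = model.predict(Xs)
        proba = model.predict_proba(Xs)[:, 1]  # probability of class 1
        row[split] = {
            "accuracy": accuracy_score(ys, pred),
            "roc_auc": roc_auc_score(ys, proba),
            "confusion": confusion_matrix(ys, pred),
        }
    results[name] = row
    print(name, {s: round(m["roc_auc"], 3) for s, m in row.items()})
```

A large gap between train and test scores for a model points to overfitting. `sklearn.metrics.classification_report` and `roc_curve` can be added in the same loop for the per-class report and the ROC plot.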

Dataset for Problem 2: insurance_part2_data-1.csv

Attribute Information:

  1. Target: Claim Status (Claimed)
  2. Code of tour firm (Agency_Code)
  3. Type of tour insurance firms (Type)
  4. Distribution channel of tour insurance agencies (Channel)
  5. Name of the tour insurance products (Product)
  6. Duration of the tour (Duration in days)
  7. Destination of the tour (Destination)
  8. Amount of sales per customer for tour insurance policies, in rupees (in 100s)
  9. The commission received by the tour insurance firm (Commission is in percentage of sales)
  10. Age of insured (Age)
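Several of these attributes are categorical, and scikit-learn models expect numeric inputs, so an encoding step is typically needed before modelling. The sketch below is one possible approach with pandas; the column names follow the parenthesized names in the list above, while the category values (including the agency codes) are made-up placeholders.

```python
import pandas as pd

# Made-up rows; category values are placeholders, not taken from the dataset.
df = pd.DataFrame({
    "Age": [48, 36, 39, 33],
    "Agency_Code": ["A1", "A2", "A1", "A3"],
    "Type": ["Airlines", "Travel Agency", "Travel Agency", "Airlines"],
    "Channel": ["Online", "Online", "Offline", "Online"],
    "Duration": [7, 34, 3, 53],
    "Claimed": ["No", "No", "Yes", "No"],
})

# Binary target: 1 if a claim was made, else 0.
y = (df["Claimed"] == "Yes").astype(int)

# One-hot encode the categorical predictors; drop_first removes one
# redundant indicator column per attribute.
X = pd.get_dummies(
    df.drop(columns="Claimed"),
    columns=["Agency_Code", "Type", "Channel"],
    drop_first=True,
    dtype=int,
)
print(X.columns.tolist())
```

Integer category codes are a common alternative for tree-based models such as CART and Random Forest, whereas one-hot encoding tends to suit the neural network better.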
