Exploratory data analysis (EDA), data cleaning, feature extraction on 12 features, and testing of several ML models to predict whether a customer will subscribe to a term deposit.
Bank Term Deposit Predictions - Kaggle.com
Predicting Subscription to Term Deposits through Marketing Campaigns
This dataset, titled Direct Marketing Campaigns for Bank Term Deposits, is a collection of data related to the direct marketing campaigns conducted by a Portuguese banking institution. These campaigns primarily involved phone calls with customers, and the objective was to determine whether or not a customer would subscribe to a term deposit offered by the bank.
The dataset contains features that describe customer attributes and campaign outcomes: age, job, marital status, education, default, balance, housing loan, contact type, day, duration, campaign contacts count, days passed since last contact, and previous outcome of contact.
- ROC Score: 0.8901
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
0 | 0.93 | 0.95 | 0.94 | 7952 |
1 | 0.57 | 0.47 | 0.52 | 1091 |
Macro Avg | 0.75 | 0.71 | | |
Weighted Avg | 0.89 | 0.89 | | |
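The two average rows in this table are consistent with being the averaged precision and averaged recall, in that order. The sketch below is only a sanity check under that reading: it recomputes both averages from the per-class values reported above, using the standard definitions (unweighted mean for macro, support-weighted mean for weighted). The helper names `macro_avg` and `weighted_avg` are made up for illustration.

```python
# Per-class values copied from the table above (class 0, class 1).
precision = {0: 0.93, 1: 0.57}
recall = {0: 0.95, 1: 0.47}
support = {0: 7952, 1: 1091}


def macro_avg(per_class):
    """Unweighted mean over classes: every class counts the same."""
    return sum(per_class.values()) / len(per_class)


def weighted_avg(per_class, class_support):
    """Mean over classes, weighted by the number of samples in each class."""
    total = sum(class_support.values())
    return sum(per_class[c] * class_support[c] for c in per_class) / total


macro_precision = macro_avg(precision)
macro_recall = macro_avg(recall)
weighted_precision = weighted_avg(precision, support)
weighted_recall = weighted_avg(recall, support)

print(f"macro avg:    precision={macro_precision:.2f} recall={macro_recall:.2f}")
print(f"weighted avg: precision={weighted_precision:.2f} recall={weighted_recall:.2f}")
```

Because class 0 has roughly seven times the support of class 1, the weighted averages stay close to the class 0 values, while the macro averages expose the weaker class 1 performance.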
- ROC Score: 0.9025
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
0 | 0.92 | 0.97 | 0.94 | 7952 |
1 | 0.66 | 0.36 | 0.46 | 1091 |
Macro Avg | 0.79 | 0.67 | | |
Weighted Avg | 0.89 | 0.90 | | |
- ROC Score: 0.8026
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
0 | 0.91 | 0.97 | 0.94 | 7952 |
1 | 0.56 | 0.26 | 0.36 | 1091 |
Macro Avg | 0.73 | 0.62 | | |
Weighted Avg | 0.86 | 0.89 | | |
- ROC Score: 0.9148
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
0 | 0.92 | 0.97 | 0.94 | 7952 |
1 | 0.65 | 0.37 | 0.48 | 1091 |
Macro Avg | 0.78 | 0.67 | | |
Weighted Avg | 0.89 | 0.90 | | |
- ROC Score: 0.9100
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
0 | 0.93 | 0.96 | 0.94 | 7952 |
1 | 0.60 | 0.46 | 0.52 | 1091 |
Macro Avg | 0.76 | 0.71 | | |
Weighted Avg | 0.89 | 0.90 | | |
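The write-up does not say how the ROC scores were computed. Assuming they are the area under the ROC curve (ROC AUC), the sketch below shows what that number means: the probability that a randomly chosen subscriber receives a higher predicted score than a randomly chosen non-subscriber. The function `roc_auc` and the toy labels and scores are hypothetical and are not taken from the project.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    This pairwise version is quadratic in the sample size and is meant
    for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = subscribed, 0 = did not subscribe.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.05, 0.10, 0.20, 0.40, 0.35, 0.60, 0.70, 0.90]

auc = roc_auc(toy_labels, toy_scores)
print(f"ROC AUC on toy data: {auc:.4f}")
```

Unlike precision, recall and F1, this measure does not depend on a decision threshold, which is one reason a model can have a high ROC score and still a low class 1 recall at the default threshold. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead of this pairwise loop.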
The best model is Random Forest, with an F1-score of 0.52 for class 1. One possible improvement is to fine-tune the model using cross-validation and grid search. We could also pick a different decision threshold, one where we would not have as many true negatives (134) but would have more false positives (291). The reason is that a subscribing customer is worth more than the marketing resources spent on a call: we would rather make more calls to people who won't subscribe than miss people who would.
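The threshold trade-off described above can be sketched as follows. This is a minimal illustration, not the project's code: the labels, scores, candidate thresholds and the two cost values (`CALL_COST`, `MISSED_SUBSCRIBER_COST`) are all made up, and the only assumption carried over from the text is that a missed subscriber costs more than a wasted call.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, FN and TN when predicting class 1 for score >= threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical costs: a wasted call is cheap, a missed subscriber is expensive.
CALL_COST = 1.0
MISSED_SUBSCRIBER_COST = 5.0


def campaign_cost(counts):
    """Total cost of wasted calls (FP) plus missed subscribers (FN)."""
    return counts["fp"] * CALL_COST + counts["fn"] * MISSED_SUBSCRIBER_COST


# Toy data (made up): 1 = subscribed, 0 = did not subscribe.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.90, 0.60, 0.40, 0.20, 0.70, 0.45, 0.35, 0.25, 0.10, 0.05]

at_default = confusion_counts(toy_labels, toy_scores, threshold=0.5)
at_lower = confusion_counts(toy_labels, toy_scores, threshold=0.3)

# Pick the candidate threshold with the lowest total cost.
candidates = [0.5, 0.3]
best_threshold = min(
    candidates,
    key=lambda t: campaign_cost(confusion_counts(toy_labels, toy_scores, t)),
)

print("threshold 0.5:", at_default, "cost:", campaign_cost(at_default))
print("threshold 0.3:", at_lower, "cost:", campaign_cost(at_lower))
print("chosen threshold:", best_threshold)
```

On this toy data, lowering the threshold from 0.5 to 0.3 turns two true negatives into false positives and recovers one missed subscriber, which lowers the total cost under the assumed cost ratio. With real data the threshold would be chosen on a validation set rather than on the test set.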
Thanks to Samir Gouda & Omar Eldahshoury for their support.