Credit Card Clustering for Market Segmentation

This project is trying to cluster a big credit card dataset that has 17 variables and 9000 rows of data. Prior to clustering, the dataset was cleaned, featured-selected using R-squared for multicollinearity assessment, and transformed via standardisation.

Principal component analysis (PCA) was used for exploratory data analysis (EDA) and found that the amount of transaction for cash in advance, credit limit, magnitude of payments paid back to the credit card company, magnitude of purchases made, and the percentage of full payment made by credit card users are important variables (or known as features) to look at in this analysis. Other variables can be important supportive information such as preference of making minimum payments and purchase frequency in oneoff-nature or installment-nature.

5 different machine learning clustering algorithms are performed, which are Clustering for Large Application (CLARA), Hierarchical K-means Clustering Hybrid (HKmeans), Fuzzy clustering, Model-based Clustering, Density-Based Spatial Clustering for Application with Noise (DBSCAN). Analysis outcome shows that VEV model suggested by model-based clustering is the best clustering model that is able to detect 8 distinct group of credit card users, which is also the largest amount suggested compared to other algorithms. These groups are:

Cluster 1: Less active user that prefer cash-in-advance.
Cluster 2: Less active user that prefer to use credit card to make purchases, especially installment purchases.
Cluster 3: Revolvers who prefer to make expensive purchases.
Cluster 4: Less active users who prefer to make full repayment back to credit card company.
Cluster 5: Active card users who make expensive purchases and make repayment in big amount sometime they prefer full payment, sometime they pay minimum payments.
Cluster 6: Max payers who prefer to pay money owe in full with zero tenure.
Cluster 7: Less active revolvers, they spend small amount of money to purchase cheaper products.
Cluster 8: Max Payer but this users group is more active in making purchases than cluster 6.

Highlight

Reference

Arjun Bhasin 2018, Credit card Dataset for Clustering, viewed 10 May 2022, https://www.kaggle.com/datasets/arjunbhasin2013/ccdata

Alboukadel Kassambara 2017, Practical Guide to Cluster Analysis in R, Multivariate Analysis 1, Edition 1, sthda.com

Clustering and dimensionality reduction techniques on the Berlin Airbnb data and the problem of mixed data (n.d.),viewed 15 May 2022 https://rstudio-pubs-static.s3.amazonaws.com/579984_6b9efbf84ee24f00985c29e24265d2ba.html

Jenny Listman 2019, Customer Segmentation / K Means Clustering, viewed 14 May 2022, https://www.kaggle.com/code/jennylistman/customer-segmentation-k-means-clustering

Matt.0 2019, 10 Tips for Choosing the Optimal Number of Clusters, viewed 12 May 2022, https://towardsdatascience.com/10-tips-for-choosing-the-optimal-number-of-clusters-277e93d72d92

https://www.choice.com.au/money/credit-cards-and-loans/credit-cards

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
cc_files/figure-gfm		cc_files/figure-gfm
.gitignore		.gitignore
R document.R		R document.R
README.md		README.md
cc.Rmd		cc.Rmd
cc.md		cc.md
cc_cluster.Rproj		cc_cluster.Rproj
cc_dataset.csv		cc_dataset.csv
note.Rmd		note.Rmd
pic1_highlight.png		pic1_highlight.png
pic2_thumbnail.png		pic2_thumbnail.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Credit Card Clustering for Market Segmentation

About

Releases

Packages

Languages

KAR-NG/Credit-Card-Market-Segmentation

Folders and files

Latest commit

History

Repository files navigation

Credit Card Clustering for Market Segmentation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages