Machine learning model to identify customers that are more likely to default based on employment, bank balance and annual salary.

luuisotorres/Loan-Default-Prediction

👨‍💻 | Loan Default Prediction 💰

--

Python Jupyter Notebook Pandas NumPy Plotly scikit_learn

--

Important Note

This notebook is easier to read on Kaggle, where the Plotly charts are fully interactive.

About

Loans are an essential part of our economy. People borrow money from financial institutions all the time, whether to start a business or to cover emergency expenses, vehicle financing, vacation costs, or education costs.

However, lending money always carries the risk that the borrower will not be able to pay it back. For financial institutions such as banks, which lend large amounts of money to many different people for many different reasons, the risk of losses from defaults is much higher.

For this reason, it is extremely important that financial institutions avoid lending to people who are highly likely to default, and they usually invest a lot of time and resources in background checks to limit these losses. In this notebook, I develop a machine learning model that predicts how likely a client is to default based on their employment status, bank balance, and annual salary.
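To make the idea of predicting default likelihood from these three inputs concrete, here is a minimal sketch using scikit-learn. It is not the notebook's actual pipeline: the data is randomly generated, the link between balance and default is an assumption made only so the toy problem is learnable, and logistic regression is a stand-in for whichever model the notebook selects.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 2_000

# Toy features (assumed distributions, not the real dataset).
employed = rng.integers(0, 2, size=n)                   # 1 = employed, 0 = unemployed
balance = rng.gamma(shape=2.0, scale=5_000.0, size=n)   # bank balance
salary = rng.normal(400_000, 100_000, size=n).clip(min=0)  # annual salary

# Assumption for illustration only: default probability rises with balance.
p_default = 1.0 / (1.0 + np.exp(-(balance - 20_000.0) / 3_000.0))
y = (rng.random(n) < p_default).astype(int)

X = np.column_stack([employed, balance, salary])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scale the features, then fit a classifier that reweights the rare default class.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of default (class 1).
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"ROC AUC on held-out toy data: {auc:.3f}")
```

Because defaults are usually rare, the sketch stratifies the split and uses `class_weight="balanced"`; ROC AUC is reported instead of accuracy for the same reason.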

The dataset

To develop this loan default predictor, I've used the Loan Default Prediction dataset on Kaggle, a synthetic dataset created from actual data from a financial institution and containing records for 10,000 clients. It's important to note that this data has been transformed to prevent identification of the clients and the institution.

The attributes in this dataset are as follows:

  • Employed: 1 for employed and 0 for unemployed;

  • Bank Balance: The amount of money each client had available in their account at the moment the data was obtained;

  • Annual Salary: The annual salary of each client;

  • Defaulted?: This is our target variable. It is 0 for each client who didn't default and 1 for each client who defaulted on their loan.
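The attribute list above can be turned into a quick sanity check on a loaded frame. The sketch below assumes the column headers match the attribute names exactly as listed; the sample rows and the `validate_schema` helper are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame(
    {
        "Employed": [1, 0, 1, 1],
        "Bank Balance": [8754.36, 21345.10, 0.0, 13002.75],
        "Annual Salary": [532339.56, 145273.56, 381205.68, 428453.88],
        "Defaulted?": [0, 1, 0, 0],
    }
)

# Assumed column names, copied from the attribute list.
EXPECTED_COLUMNS = ["Employed", "Bank Balance", "Annual Salary", "Defaulted?"]


def validate_schema(df):
    """Return a list of problems found; an empty list means the frame looks sane."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        # Without all columns the remaining checks cannot run.
        problems.append(f"missing columns: {missing}")
        return problems
    # The two flag columns must be strictly binary.
    for col in ("Employed", "Defaulted?"):
        if not df[col].isin([0, 1]).all():
            problems.append(f"{col} must contain only 0 or 1")
    # Monetary columns are assumed to be non-negative.
    for col in ("Bank Balance", "Annual Salary"):
        if (df[col] < 0).any():
            problems.append(f"{col} contains negative values")
    return problems


print(validate_schema(sample))
```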

I've used some EDA techniques to evaluate how the attributes interact with each other and how relevant they are to the target variable.
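Two simple checks of this kind are the correlation of each attribute with the target and the default rate per employment group. The sketch below runs both with pandas on randomly generated toy data; the distributions and the balance-to-default relationship are assumptions, so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000

# Toy data with assumed distributions; column names follow the attribute list.
balance = rng.gamma(shape=2.0, scale=5_000.0, size=n)
p_default = 1.0 / (1.0 + np.exp(-(balance - 20_000.0) / 3_000.0))  # assumed link

df = pd.DataFrame(
    {
        "Employed": rng.integers(0, 2, size=n),
        "Bank Balance": balance,
        "Annual Salary": rng.normal(400_000, 100_000, size=n).clip(min=0),
        "Defaulted?": (rng.random(n) < p_default).astype(int),
    }
)

# Pearson correlation of each attribute with the target, strongest first.
corr_with_target = (
    df.corr()["Defaulted?"]
    .drop("Defaulted?")
    .sort_values(key=abs, ascending=False)
)

# Share of defaulters within each employment group.
default_rate_by_employment = df.groupby("Employed")["Defaulted?"].mean()

print(corr_with_target)
print(default_rate_by_employment)
```

Pearson correlation against a binary target only captures linear association, so it works best as a first pass before plotting the distributions per class.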

Libraries Used

  • pandas
  • numpy
  • plotly
  • matplotlib
  • seaborn
  • sklearn
  • pycaret

Author

Luis Fernando Torres