Skip to content

A machine learning project where we first detected and removed the outliers and then checked correlation among features and then applied different ML algorithms to check if the person might get a heart attack or not.

Notifications You must be signed in to change notification settings

prathammehta16/Heart-Disease-Prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 

Repository files navigation

Heart-Disease-Prediction

Concepts Used:

EDA, Feature Extraction, Seaborn, Pandas, Numpy, Inter-Quartile Range, Z-score, Pearson's Correlation coefficient, Spearman's Correlation coefficient, Logistic regression, Decision trees, Random forest, K nearest neighbours.

Data:

Data

  • Age : Age of the patient

  • Sex : Sex of the patient

  • exang: exercise induced angina (1 = yes; 0 = no)

  • ca: number of major vessels (0-3)

  • cp : Chest Pain type chest pain type
    a)Value 1: typical angina
    b)Value 2: atypical angina
    c)Value 3: non-anginal pain
    d)Value 4: asymptomatic

  • trtbps : resting blood pressure (in mm Hg)

  • chol : cholestoral in mg/dl fetched via BMI sensor

  • fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)

  • rest_ecg : resting electrocardiographic results
    a)Value 0: normal
    b)Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    c)Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
    d)thalach : maximum heart rate achieved

  • target : 0= less chance of heart attack 1= more chance of heart attack

EDA:

First, we saw here the datatype of the parameters using data.info() then we checked the number of duplicate records in the dataset and then removed it.Then we also checked for the NULL values in the dataset using data.isnull() and then removed the NULL values.

Detecting Outliers:

Detecting Outliers using Seaborn's Boxplots:
Here we found that outliers are present in trtbps, chol, thalachh, oldpeak, caa, thall.

Removing Outliers:

  1. Removing the outliers using IQR(Inter-Quartile Range):
    In IQR the data points that are not in the range (lower limit, upper limit) are considered as outliers.
  • upper limit = Q3 + 1.5 * IQR
  • lower limit = Q1 – 1.5 * IQR Afetr performing IQR, we found that 228 records still remain.
  1. Removing outliers using Z-score:
  • Here the data point is considered as an outlier if the corresponding Z-score > 3. After performing Z-score we found that 287 records still remain.

As after performing Z-score we have more number of records, we preferred Z-score.

Correlation:

  1. Finding Correlation using Seaborn's Heatmap:

2. Finding Correlation using Pearson's Correlation:

3. Finding correlation using Spearman's correlation:

Training Models:

Here the models we used to predict are:

  1. Logistic Regression
  2. Decision Trees
  3. Random Forest
  4. K nearest neighbor.
    And their corresponding accuracy scores are:


Hence, after removeing the outliers we conclude that the Logistic regression algorithm is best suitable for this problem.

About

A machine learning project where we first detected and removed the outliers and then checked correlation among features and then applied different ML algorithms to check if the person might get a heart attack or not.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages