
MACHINE LEARNING

Based on Avik Jain's 100 Days of ML Code

DATA PREPROCESSING

day 1

Click here for code and dataset


REGRESSION

-> Regression is used when the quantity being predicted has "infinite possibilities", i.e. it is continuous.

Types of regression

Simple Linear Regression

Multiple Linear Regression

Polynomial Regression


SIMPLE LINEAR REGRESSION

day 2

Click here for Code and dataset

SLR is used when we have a "single input attribute" and we want to model a linear relationship between the variables.

2 variables: the dependent variable (the one being predicted) and the independent variable / explanatory variable (the one observed)

Simple Linear Regression follows the linear equation

                              Y = m x + C

Y = output variable to be predicted

x = input variable

m = slope

C = intercept

A line plotted through the data that passes through the intercept and the mean point (mean(x), mean(Y)) is known as the line of best fit.

The goal is to find the best estimates for the coefficients to minimize the errors in predicting Y from x.

Slope

The slope describes how x translates into the Y value, before the bias (intercept) is added.

b1 / m = Sum((x - mean(x)) * (y - mean(y))) / Sum((x - mean(x))^2)

Intercept

The intercept C is the point where the line cuts the Y axis, i.e. the value of Y when x = 0.

C = mean(y) - m * mean(x)
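
To make these formulas concrete, here is a minimal sketch in plain Python. The data points are made up for illustration; they are not the dataset used in the notebook.

```python
# Minimal sketch of the slope and intercept formulas above (plain Python).
# The x/y values are illustrative, not the notebook's dataset.

def simple_linear_regression(x, y):
    """Return slope m and intercept C of the least-squares line Y = m*x + C."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # m = Sum((x - mean(x)) * (y - mean(y))) / Sum((x - mean(x))^2)
    m = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    # C = mean(y) - m * mean(x)
    c = mean_y - m * mean_x
    return m, c

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, c = simple_linear_regression(x, y)
print(f"Y = {m:.2f}x + {c:.2f}")  # -> Y = 0.60x + 2.20
```

Note that the fitted line passes through (mean(x), mean(y)) = (3, 4), as described above: 0.60 * 3 + 2.20 = 4.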

Assumptions of Linear regression

1️⃣ Model should be Linear

2️⃣ Errors should be Independent

3️⃣ Error terms should be normally distributed

4️⃣ Homoscedasticity: constant variance of the error terms (a residual-check sketch follows this list)
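
A quick way to sanity-check assumptions 3️⃣ and 4️⃣ is to look at the residuals of a fitted line. The sketch below is one such check, assuming NumPy and matplotlib are available; the data are synthetic, not the notebook's.

```python
# Hedged sketch: eyeballing error normality and homoscedasticity via residual
# plots. Synthetic data; assumes numpy and matplotlib are installed.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 0.6 * x + 2.2 + rng.normal(0, 1, size=x.size)  # linear signal + noise

m, c = np.polyfit(x, y, 1)         # fit Y = m*x + C by least squares
residuals = y - (m * x + c)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(m * x + c, residuals)  # 4: no funnel shape => constant variance
ax1.axhline(0, color="red")
ax1.set(xlabel="fitted values", ylabel="residuals", title="Homoscedasticity check")
ax2.hist(residuals, bins=15)       # 3: roughly bell-shaped => normal errors
ax2.set(xlabel="residual", title="Normality check")
plt.show()
```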


MULTIPLE LINEAR REGRESSION

day 3

For greater numbers of independent variables, visual understanding is more abstract. For p independent variables, the data points (x1, x2, x3, …, xp, y) exist in a (p + 1)-dimensional space. What really matters is that the linear model (which is p-dimensional) can be represented by the p + 1 coefficients β0, β1, …, βp, so that y is approximated by the equation y = β0 + β1*x1 + β2*x2 + ... + βp*xp.
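
As a minimal sketch of fitting those p + 1 coefficients, the snippet below uses scikit-learn's LinearRegression on a made-up feature matrix (the notebook uses its own CSV dataset):

```python
# Minimal sketch of multiple linear regression with scikit-learn.
# The toy data follow y = 1 + 1*x1 + 2*x2 exactly, so the fit recovers
# b0 = 1 and (b1, b2) = (1, 2).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])  # p = 2 features
y = np.array([6.0, 5.0, 12.0, 11.0])

model = LinearRegression().fit(X, y)
print("b0 (intercept):", model.intercept_)      # ~1.0
print("b1..bp (coefficients):", model.coef_)    # ~[1.0, 2.0]
print("prediction for [5, 5]:", model.predict([[5.0, 5.0]]))  # ~16.0
```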

Click here for code and dataset


CLASSIFICATION


LOGISTIC REGRESSION

day 4

Click here for Code and dataset



K NEAREST NEIGHBOUR

day 7

Click here for Code and Dataset


SUPPORT VECTOR MACHINES/REGRESSION

day 12

Click here for code and Dataset.


DECISION TREE

day 23

Click here for code and Dataset for regression.

Click here for code and Dataset for Classification.


RANDOM FOREST

day 33

Click here for code and Dataset for regression.

Click here for code and Dataset for classifier.


KERNEL SVM

Click here for code and Dataset.


NAIVE BAYES

Click here for code and Dataset.


CLUSTERING


K MEANS

day 43

An unsupervised learning algorithm (meaning there are no target labels) that allows you to identify similar groups or clusters of data points within your data.

Algorithm

  1. Randomly initialize the K starting centroids.
  2. Assign each data point to its nearest centroid.
  3. Recompute each centroid as the mean of the data points assigned to its cluster.
  4. Repeat steps 2 and 3 until we trigger our stopping criterion.

Which distance are we optimizing? The answer is usually Euclidean distance, or squared Euclidean distance to be more precise. Data points are assigned to the cluster closest to them, in other words the cluster which minimizes this squared distance. We can write this more formally as minimizing the cost

                              J = Sum_i Sum_(x in cluster i) ||x - centroid_i||^2

[Figure: K-means iterations, panels (a)-(c)]

We have defined k = 2, so we are assigning data to one of two clusters at each iteration. Figure (a) corresponds to randomly initializing the centroids. In (b) we assign the data points to their closest cluster, and in (c) we assign new centroids as the average of the data in each cluster. This continues until we reach our stopping criterion (minimizing the cost function J, or reaching a predefined number of iterations). Hopefully the explanation above, coupled with the visualization, gives you a good understanding of what K-means is doing; a code sketch follows below.
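
As a companion to the description above, here is a bare-bones NumPy sketch of that loop. k, the toy data, and the convergence test are illustrative choices, not the notebook's code:

```python
# Bare-bones K-means: init, assign, recompute, repeat until centroids settle.
# Toy data and parameters; a production version would also handle empty clusters.
import numpy as np

def kmeans(X, k=2, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iters):
        # assign each point to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # recompute each centroid as the mean of the points assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # stopping criterion
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
labels, centroids = kmeans(X, k=2)
print(labels)     # e.g. [0 0 1 1] -- the two obvious groups
print(centroids)  # cluster means
```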


Click here for code and Dataset.


Hierarchical Clustering

day 54

Click here for code and Dataset.


