m0hamedAit/Parallel-implementation-Kmeans-mapreduce

What is K-Means ?

K-Means is a clustering algorithm that partitions a set of data points into K clusters (Figure). The k-means clustering algorithm is commonly used on large datasets, and the characteristics of the algorithm make it a good candidate for parallelization.

How does it work ?

There are several steps. Here, I will explain step by step how k-means works on an example dataset.

Step 1. Determine the value of K, which represents the number of clusters.

In our case, we'll select K = 3; that is to say, we want to identify 3 clusters. Is there any way to determine the value of K? Yes, there is, but we won't talk about it here.

Step 2. Randomly select 3 distinct centroids (new data points as cluster initialization)

For example, on the first attempt: K equals 3, so there are 3 centroids, which serve as the cluster initialization.

Step 3. Measure the distance (Euclidean distance) between each point and each centroid

For example, measure the distance between the first point and each centroid.
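The distance computation in step 3 can be sketched in Python like this (a minimal sketch; the point and centroid values are illustrative, not from the original dataset):

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Illustrative 2-D point and centroid.
point = (1.0, 2.0)
centroid = (4.0, 6.0)
print(euclidean_distance(point, centroid))  # → 5.0
```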

Step 4. Assign each point to the nearest cluster

For example, assign the first point to the cluster whose centroid is closest.

Apply the same treatment to the other unlabeled points, until every point is assigned.
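The assignment step can be sketched as follows (a minimal sketch; the data values are illustrative and not taken from the original figures):

```python
import math

def euclidean_distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign_to_nearest(points, centroids):
    """Step 4: return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda i: euclidean_distance(p, centroids[i]))
            for p in points]

# Illustrative 2-D data with three rough groups.
points = [(1, 1), (1.5, 2), (8, 8), (9, 9), (0, 10)]
centroids = [(1, 1), (9, 9), (0, 9)]
print(assign_to_nearest(points, centroids))  # → [0, 0, 1, 1, 2]
```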

Step 5. Calculate the mean of each cluster as the new centroid

Update each centroid with the mean of its cluster.
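The centroid update can be sketched like this (a minimal sketch with illustrative data; it assumes every cluster has at least one member):

```python
def update_centroids(points, labels, k):
    """Step 5: new centroid = mean of the points assigned to each cluster."""
    dim = len(points[0])
    centroids = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        centroids.append(tuple(sum(p[d] for p in members) / len(members)
                               for d in range(dim)))
    return centroids

# Illustrative 2-D data and cluster labels.
points = [(1, 1), (3, 3), (8, 8), (10, 10)]
labels = [0, 0, 1, 1]
print(update_centroids(points, labels, 2))  # → [(2.0, 2.0), (9.0, 9.0)]
```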

Step 6. Repeat steps 3–5 with the new cluster centers

Repeat until a stopping condition is met:

  • Convergence (no further changes)
  • Maximum number of iterations reached

Since the clustering did not change at all during the last iteration, we’re done.
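Putting steps 1–6 together, the whole loop can be sketched in plain, serial Python (this is only a sketch of the algorithm with illustrative data, not the parallel MapReduce version described below):

```python
import math
import random

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, k, max_iter=100, seed=0):
    """Steps 1–6: random init, then assign/update until convergence
    or the maximum number of iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # step 2: random distinct centroids
    labels = None
    for _ in range(max_iter):                      # stop: maximum iterations
        new_labels = [min(range(k), key=lambda i: euclidean(p, centroids[i]))
                      for p in points]             # steps 3–4: assign to nearest
        if new_labels == labels:                   # stop: convergence (no changes)
            break
        labels = new_labels
        for i in range(k):                         # step 5: mean as new centroid
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, labels

# Illustrative 2-D data with three rough groups.
pts = [(1, 1), (1.5, 2), (3, 4), (8, 8), (9, 10), (25, 30)]
centroids, labels = kmeans(pts, k=3)
```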

Implementation

The goal of this exercise is to implement k-means using Hadoop MapReduce.

In this implementation, we'll cluster a brain MRI image.

The image is a grayscale image, where the value of each pixel is between 0 and 255 (0 represents black and 255 represents white). This image shows three parts of the brain, namely the white matter, the gray matter, and the cerebrospinal fluid; the objective is to identify the pixels of each part. To do this, you have to load the image and store it in HDFS, then classify the image into three clusters. You also have to create a centersRMI.txt file that contains the three values of the initial centroids. The output of the program will be something like this:

The cerebrospinal fluid doesn't look like this in reality; that's because the image we clustered isn't very clear.
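The map/reduce logic for one iteration on grayscale pixels can be sketched as a plain-Python simulation (this is not the actual Hadoop job from this repository; the function names and centroid values are hypothetical, though the initial centroids mimic what a centersRMI.txt file might contain):

```python
from collections import defaultdict

def mapper(pixel, centroids):
    """Map: emit (index of nearest centroid, pixel value).
    For scalar pixel values, Euclidean distance reduces to abs()."""
    nearest = min(range(len(centroids)),
                  key=lambda i: abs(pixel - centroids[i]))
    return nearest, pixel

def reducer(cluster_pixels):
    """Reduce: new centroid = mean of the pixel values in the cluster."""
    return sum(cluster_pixels) / len(cluster_pixels)

# Hypothetical initial centroids (as might be listed in centersRMI.txt)
# and a handful of illustrative pixel values in the 0–255 range.
centroids = [0.0, 128.0, 255.0]
pixels = [10, 20, 30, 120, 130, 140, 240, 250]

grouped = defaultdict(list)
for px in pixels:                       # map phase + shuffle by key
    key, value = mapper(px, centroids)
    grouped[key].append(value)

new_centroids = [reducer(grouped[i]) for i in sorted(grouped)]
print(new_centroids)  # → [20.0, 130.0, 245.0]
```

In the real Hadoop job, each mapper reads pixel values from HDFS and loads the current centroids from the centers file, and each reducer averages the values it receives for one cluster; the driver then iterates the job until the centroids stop changing.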
