Malicious Executable Detection using Cluster Analysis 📊

Welcome to the Malicious Executable Detection project! This repository uses cluster analysis, an unsupervised machine learning technique, to detect malicious executable files. 🔍🤖

Problem Statement 🎯

In an era where cyber warfare is on the rise, detecting malicious code has become crucial. This project aims to develop a machine learning approach to identify malicious executable files. 💻🦠

Understanding the Data and Attributes 📚

The dataset contains features extracted from both malicious and non-malicious Windows executable files. It includes 373 samples: 301 malicious and 72 non-malicious, so the classes are imbalanced (about 81% malicious). Each sample has 531 features, named F1, F2, and so on, plus a label column indicating whether the file is malicious or non-malicious. 📈🧐

Data Preparation 🛠️

Missing Data Handling: Rows and columns with more than 70% of their values missing are removed. 🧹
Feature Selection: Relevant features are chosen for analysis. 🎯
Data Standardization: Standardization is applied to make the data suitable for clustering. 📊
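
The preparation steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the function name `prepare`, the column name `label`, and the median fill for the gaps that remain after dropping sparse rows and columns are all assumptions. Feature selection is not shown because the README does not say which method is used.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare(df, label_col="label", max_missing=0.70):
    """Drop sparse columns and rows, fill remaining gaps, then standardize.

    `label_col` and the median fill are assumptions made for this sketch.
    """
    features = df.drop(columns=[label_col])
    # Drop columns in which more than 70% of the values are missing.
    features = features.loc[:, features.isna().mean(axis=0) <= max_missing]
    # Drop rows in which more than 70% of the remaining values are missing.
    keep_rows = features.isna().mean(axis=1) <= max_missing
    features = features.loc[keep_rows]
    # Assumption: any remaining gaps are filled with the column median.
    features = features.fillna(features.median())
    # Standardize each feature to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(features)
    X = pd.DataFrame(scaled, columns=features.columns, index=features.index)
    return X, df.loc[keep_rows, label_col]


# Toy stand-in for the real dataset (which has 531 feature columns).
toy = pd.DataFrame({
    "F1": [1.0, 2.0, np.nan, 4.0, 5.0],
    "F2": [np.nan, np.nan, np.nan, np.nan, 1.0],  # 80% missing, so it is dropped
    "F3": [10.0, 20.0, 30.0, 40.0, 50.0],
    "label": [1, 1, 0, 1, 0],
})
X, y = prepare(toy)
print(X.columns.tolist())
```

Standardizing matters here because K-Means relies on Euclidean distance, so a feature with a large numeric range would otherwise dominate the clustering.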

K-Means Clustering 📈

K-Means clustering is applied to group similar instances together. The Silhouette method is used to determine the optimal number of clusters. 🧩
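
One way to pick the number of clusters with the Silhouette method is to fit K-Means for several candidate values of k and keep the one with the highest mean silhouette score. The sketch below uses synthetic blobs in place of the real features; the helper name `best_k_by_silhouette`, the candidate range, and the random seed are assumptions, not values taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values=range(2, 7), seed=0):
    """Return the k with the highest mean silhouette score, plus all scores."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in data: three well-separated groups in two dimensions.
X_demo, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
best_k, scores = best_k_by_silhouette(X_demo)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```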

Silhouette Analysis 📊

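Silhouette analysis helps evaluate the quality of clustering. The silhouette score ranges from -1 to 1: values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping clusters or misassigned samples. 📈🔍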
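
For a single sample, the silhouette value is (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the mean distance to the members of the nearest other cluster. Looking at the average per cluster can reveal a weak cluster that the overall mean hides. The following is a small sketch on synthetic data; the helper name `per_cluster_silhouette` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples


def per_cluster_silhouette(X, labels):
    """Average the per-sample silhouette values within each cluster."""
    values = silhouette_samples(X, labels)
    return {int(c): float(values[labels == c].mean()) for c in np.unique(labels)}


# Synthetic stand-in data: two well-separated groups.
X_sil, _ = make_blobs(
    n_samples=200, centers=[[0, 0], [10, 10]], cluster_std=1.0, random_state=0
)
labels_sil = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_sil)
cluster_quality = per_cluster_silhouette(X_sil, labels_sil)
print(cluster_quality)
```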

Cluster Stability Check 🔒

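Cluster stability is assessed by comparing the clusters obtained from the full dataset with those obtained from a random sample of the data. If the assignments largely agree, the clustering is considered stable. 🔄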
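
A stability check of this kind could look roughly like the sketch below: cluster the full data, cluster a random subsample, and measure how well the two assignments agree on the subsampled points. The adjusted Rand index is used here as the agreement measure, which is an assumption; the notebook may compare clusters differently. The helper name `stability_score`, the 80% sample fraction, and the seed are also illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def stability_score(X, k, frac=0.8, seed=0):
    """Agreement between full-data and subsample clusterings (1.0 = identical).

    X is expected to be a NumPy array so that it can be indexed by row.
    """
    rng = np.random.default_rng(seed)
    full_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    # Draw a random subsample without replacement and cluster it separately.
    idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
    sub_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X[idx])
    # The adjusted Rand index ignores how the cluster ids happen to be numbered.
    return adjusted_rand_score(full_labels[idx], sub_labels)


# Synthetic stand-in data: three well-separated groups.
X_stab, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
score = stability_score(X_stab, k=3)
print(round(score, 3))
```

Repeating the check with several seeds and averaging the scores gives a less noisy estimate than a single subsample.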

Categorizing New Samples 🆕

The model is used to predict clusters for new executable files. 📋
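
Assigning a new sample to a cluster means applying the same scaling that was fitted on the training data and then finding the nearest cluster centre. Wrapping both steps in a scikit-learn pipeline keeps them in sync. The sketch below uses synthetic four-feature data; the variable names and data are illustrative assumptions, not the project's real features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data: two groups of 50 samples with four features each.
rng = np.random.default_rng(0)
train = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 4)),
    rng.normal(8.0, 1.0, size=(50, 4)),
])

# The scaler and K-Means are fitted together, so new samples reuse the same scaling.
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
model.fit(train)

# Hypothetical new samples: one near each group.
new_samples = np.array([
    [0.1, -0.2, 0.3, 0.0],
    [7.9, 8.2, 8.1, 7.7],
])
new_clusters = model.predict(new_samples)
print(new_clusters)
```

Cluster ids are arbitrary integers, so deciding which cluster corresponds to malicious files still requires inspecting the labelled samples that fall into each cluster.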

Learning Outcomes 📚

Implementing cluster analysis in Python
Pre-processing data for analysis
Hierarchical clustering and dendrogram visualization
Implementing K-Means clustering
Determining the optimal number of clusters
Cluster stability evaluation
Predicting clusters for new samples

Feel free to explore the notebooks and the code to dive deeper into the analysis!

Kaggle Notebook 📊

You can also view this project on Kaggle. 📑

Open in Colab 🚀

Want to run the notebooks in Google Colab? Click here to open them directly! 💡

Connect with Us 🌐

Join our community and stay updated on our latest projects:

🌐 GitHub
🔗 LinkedIn
🐦 Twitter
📝 Medium

Happy coding! 👩‍💻👨‍💻

Vidhi1290/Malware-Detection
