Vidhi1290/Malware-Detection

Malicious Executable Detection using Cluster Analysis 📊

Welcome to the Malicious Executable Detection project! This repository explores the world of machine learning and clustering analysis to detect malicious executable files. 🔍🤖

Problem Statement 🎯

In an era where cyber warfare is on the rise, detecting malicious code has become crucial. This project aims to develop a machine learning approach to identify malicious executable files. 💻🦠

Understanding the Data and Attributes 📚

The dataset contains features extracted from both malicious and non-malicious Windows executable files. It has 373 samples in total: 301 malicious and 72 non-malicious, so the classes are imbalanced. Each sample has 531 features, named F1, F2, and so on, plus a label column indicating whether the file is malicious or non-malicious. 📈🧐

Data Preparation 🛠️

  • Imputation: Rows and columns in which more than 70% of the values are missing are removed. 🧹
  • Feature Selection: Relevant features are chosen for analysis. 🎯
  • Data Standardization: Features are standardized so that they are on a comparable scale for clustering. 📊
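
The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `prepare` helper, the `label` column name, the median fill for values that remain missing, and the tiny demo frame are all assumptions made for the example. Feature selection is not shown because the README does not say how features are chosen.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare(df: pd.DataFrame, label_col: str = "label", max_missing: float = 0.70):
    """Drop sparse columns and rows, then standardize the remaining features."""
    features = df.drop(columns=[label_col])
    # Drop columns in which more than `max_missing` of the values are missing.
    features = features.loc[:, features.isna().mean(axis=0) <= max_missing]
    # Drop rows in which more than `max_missing` of the remaining values are missing.
    features = features.loc[features.isna().mean(axis=1) <= max_missing]
    # Assumption: whatever is still missing is filled with the column median.
    features = features.fillna(features.median())
    scaler = StandardScaler()
    scaled = pd.DataFrame(
        scaler.fit_transform(features),
        index=features.index,
        columns=features.columns,
    )
    return scaled, scaler


# Tiny synthetic frame standing in for the real dataset (hypothetical values).
demo = pd.DataFrame({
    "F1": [1.0, 2.0, 3.0, 4.0],
    "F2": [np.nan, np.nan, np.nan, 1.0],  # 75% missing, so this column is dropped
    "F3": [10.0, np.nan, 30.0, 40.0],
    "label": [1, 1, 0, 1],
})
scaled, scaler = prepare(demo)
print(scaled.columns.tolist())
```

Returning the fitted scaler alongside the data makes it possible to apply the same transformation to new samples later.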

K-Means Clustering 📈

K-Means clustering is applied to group similar instances together. The Silhouette method is used to determine the optimal number of clusters. 🧩
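
As a rough sketch of this step, the snippet below fits scikit-learn's `KMeans` to standardized data. The data comes from `make_blobs` purely as a stand-in for the real feature matrix, and `n_clusters=3` is an arbitrary value chosen for the demo, not the number of clusters found in the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the standardized feature matrix (not the real data).
X_raw, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X_raw)

# n_clusters=3 is an arbitrary choice for this demo.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(np.bincount(labels))            # number of samples in each cluster
print(kmeans.cluster_centers_.shape)  # (3, 5): one centroid per cluster
```

Fixing `random_state` keeps the centroid initialization reproducible between runs.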

Silhouette Analysis 📊

Silhouette analysis helps evaluate the quality of clustering. Silhouette scores range from -1 to 1; a higher score indicates more compact, better-separated clusters. 📈🔍
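
One common way to use the silhouette score for choosing the number of clusters is to fit K-Means for several candidate values of k and keep the one with the highest mean score. For a single sample the score is (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the mean distance to the nearest other cluster. The sketch below assumes synthetic data and a candidate range of 2 to 8; neither is taken from the project.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the standardized feature matrix (not the real data).
X_raw, _ = make_blobs(
    n_samples=300, centers=4, cluster_std=0.6, n_features=5, random_state=42
)
X = StandardScaler().fit_transform(X_raw)

scores = {}
for k in range(2, 9):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # mean score over all samples

# Pick the candidate k with the highest mean silhouette score.
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("best k:", best_k)
```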

Cluster Stability Check 🔒

Cluster stability is assessed by comparing the clusters obtained on the full dataset with those obtained on a random sample of the data. 🔄
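
A stability check of this kind might look like the sketch below. It is one possible approach, not necessarily the notebook's: the 80% subsample size, the 10 repetitions, and the use of the adjusted Rand index (ARI) as the agreement measure are all assumptions, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the standardized feature matrix (not the real data).
X, _y = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=1)
rng = np.random.default_rng(0)

# Reference clustering on the full dataset.
full_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

stability = []
for _ in range(10):
    # Cluster a random 80% subsample drawn without replacement.
    idx = rng.choice(len(X), size=int(0.8 * len(X)), replace=False)
    sub_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X[idx])
    # Compare both labelings on the shared samples; ARI ignores label permutations.
    stability.append(adjusted_rand_score(full_labels[idx], sub_labels))

print(f"mean ARI over subsamples: {np.mean(stability):.3f}")
```

An ARI close to 1 means the subsample reproduces the full-data clusters; values near 0 suggest the clustering is not stable under resampling.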

Categorizing New Samples 🆕

The fitted K-Means model is used to predict the cluster of each new executable file. 📋
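
A minimal sketch of this step, assuming the scaler and the K-Means model are wrapped in a scikit-learn pipeline so that new samples receive the same standardization as the training data. The training data and the "new" samples here are synthetic placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training features (not the real data).
X_train, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# Standardization and clustering are fitted together on the training data.
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
model.fit(X_train)

# Hypothetical new samples: slightly perturbed copies of training rows.
X_new = X_train[:5] + 0.01
new_clusters = model.predict(X_new)  # index of the nearest centroid per sample
print(new_clusters)
```

Each new sample is assigned to the cluster whose centroid is closest in the standardized feature space.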

Learning Outcomes 📚

  • Implementing cluster analysis in Python
  • Pre-processing data for analysis
  • Hierarchical clustering and dendrogram visualization
  • Implementing K-Means clustering
  • Determining the optimal number of clusters
  • Cluster stability evaluation
  • Predicting clusters for new samples
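
Hierarchical clustering is listed above but not described elsewhere in this README, so the following is only an illustrative sketch using SciPy. The Ward linkage, the cut into 3 clusters, and the synthetic data are assumptions. `no_plot=True` computes the dendrogram layout without drawing it; drop that argument (with matplotlib installed) to render the plot.

```python
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import make_blobs

# Synthetic stand-in for the standardized feature matrix (not the real data).
X, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.5, random_state=0)

# Agglomerative clustering with Ward linkage; Z records each merge step.
Z = linkage(X, method="ward")

# Compute the dendrogram structure without plotting it.
tree = dendrogram(Z, no_plot=True)

# Cut the tree into at most 3 flat clusters (labels start at 1).
flat = fcluster(Z, t=3, criterion="maxclust")

print(Z.shape)              # one row per merge
print(len(tree["leaves"]))  # one leaf per sample
print(sorted(set(flat.tolist())))
```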

Feel free to explore the notebooks and the code to dive deeper into the analysis!

Kaggle Notebook 📊

You can also view this project on Kaggle. 📑

Open in Colab 🚀

Want to run the notebooks in Google Colab? Click here to open them directly! 💡

Connect with Us 🌐

Join our community and stay updated on our latest projects:

Happy coding! 👩‍💻👨‍💻
