Comparative Analysis of Unsupervised Learning Methods for Real-time Anomaly Detection in Industrial Control Systems (ICS)

This project is a collaborative assignment for the Open Source Technologies / Stream Mining subjects.


The primary objective of this project is to compare unsupervised learning methods for real-time anomaly detection in Industrial Control Systems (ICSs). It assesses how effective and applicable these algorithms are in domains such as cybersecurity and industrial monitoring.


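The project follows the architecture shown in the diagram below. SWaT data is streamed from CSV files into Kafka and preprocessed with Spark. The results are then visualized in Grafana and Chronograf through an InfluxDB data source.

[Figure: Architecture Diagram]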

Technologies Used

The project leverages several open-source technologies:

  • Kafka: An open-source distributed event streaming platform for building high-throughput data pipelines and streaming analytics applications.
  • PySpark: An interface for Apache Spark in Python, supporting various Spark functionalities like Spark SQL, DataFrame, Streaming, MLlib (Machine Learning), and Spark Core.
  • InfluxDB: An open-source time series database by InfluxData, suitable for storing and retrieving time series data in real-time applications.
  • Grafana: An open-source analytics and visualization web application capable of generating charts, graphs, and alerts when connected to compatible data sources.
  • Chronograf: A web application developed by InfluxData as part of the InfluxDB project, facilitating data visualization and exploration.

Run the Project

To run the project, follow these steps:

  1. Run Docker Compose:

    docker-compose up 
  2. Merge SWaT Normal and Attack Datasets: Run the merge script to combine the SWaT normal and attack datasets into a single dataset for further processing.

  3. Create 'swat' Topic in Kafka: Inside the kafka folder, run topic_creation.ipynb to create the 'swat' topic in Kafka. This step is needed only once.
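This README does not show the dataset merge step above. Below is a minimal sketch that uses Python's standard `csv` module. It assumes both files share an identical header and simply appends the attack rows after the normal rows. The function name `merge_swat_csvs` is hypothetical.

```python
import csv


def merge_swat_csvs(normal_path, attack_path, output_path):
    """Write the normal rows, then the attack rows, to one CSV file.

    Hypothetical helper: assumes both inputs have the same header row.
    Returns the number of data rows written (header excluded).
    """
    header = None
    rows_written = 0
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in (normal_path, attack_path):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader)
                if header is None:
                    # First file: its header becomes the output header.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header of {path} does not match {normal_path}")
                # Copy rows one at a time so large files are never fully loaded.
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, you could call `merge_swat_csvs("SWaT_normal.csv", "SWaT_attack.csv", "SWaT_merged.csv")`, where the three file names are placeholders. If the two recordings overlap in time, sort the merged file by timestamp afterwards.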
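The topic-creation step above runs topic_creation.ipynb once. Below is an idempotent sketch that assumes the kafka-python client; the README does not say which client the notebook uses. It checks whether the topic exists before creating it, so rerunning it is harmless. `ensure_topic` is a hypothetical helper. One partition with a replication factor of 1 is an assumption suited to a single-broker Docker setup.

```python
def ensure_topic(admin, name="swat", num_partitions=1, replication_factor=1):
    """Create the Kafka topic `name` unless it already exists.

    `admin` is assumed to be a kafka-python KafkaAdminClient (or any object
    with the same list_topics/create_topics methods).
    Returns True if the topic was created, False if it was already there.
    """
    if name in admin.list_topics():
        return False  # created on an earlier run; nothing to do
    # Imported lazily so the existence check works without kafka-python installed.
    from kafka.admin import NewTopic

    # Assumption: 1 partition and replication factor 1 for a single broker.
    admin.create_topics([
        NewTopic(name=name, num_partitions=num_partitions,
                 replication_factor=replication_factor)
    ])
    return True
```

With a running broker, you could call `ensure_topic(KafkaAdminClient(bootstrap_servers="localhost:9092"))`. The broker address is an assumption about the Docker Compose file.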

Data Streaming and Processing:

  1. Data Streaming to Kafka: Execute kafka_producer.ipynb in the kafka folder. This notebook streams data from CSV files into Kafka.

  2. Preprocess Data using Spark:
     a. Find the Spark container ID:
          docker ps  # Copy the Spark container ID
     b. Access the Spark container:
          docker exec -it [spark_container_id] bash
     c. Preprocess data using Spark:
          spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 /sparkScripts/

  3. Perform Batch Data Preprocessing: Run the batch preprocessing script to preprocess the data with Spark libraries.

  4. Initiate Model Training on Batch Data: Run the <model_name>.py script for each model to train it on the batch data.
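The Data Streaming to Kafka step above sends CSV rows to Kafka through kafka_producer.ipynb. Below is a minimal sketch of such a producer. It assumes kafka-python and JSON-encoded messages; the README confirms neither. It sends one row at a time with a pause between rows to imitate a live sensor feed. The function names and the one-second default delay are hypothetical.

```python
import csv
import json
import time


def csv_rows_as_json(path):
    """Yield each CSV row as UTF-8 JSON bytes keyed by column name.
    Values stay strings, exactly as they appear in the file."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield json.dumps(row).encode("utf-8")


def stream_csv_to_kafka(producer, path, topic="swat", delay_s=1.0):
    """Send rows one by one, sleeping `delay_s` seconds between sends.

    `producer` is assumed to be a kafka-python KafkaProducer.
    Returns the number of messages sent.
    """
    sent = 0
    for message in csv_rows_as_json(path):
        producer.send(topic, value=message)
        sent += 1
        if delay_s:
            time.sleep(delay_s)  # simulate readings arriving in real time
    producer.flush()  # wait until buffered messages have been sent
    return sent
```

For example, you could call `stream_csv_to_kafka(KafkaProducer(bootstrap_servers="localhost:9092"), "SWaT_merged.csv")`. Both the address and the file name are placeholders.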
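In the Spark preprocessing step above, spark-submit launches a job. A Structured Streaming job of that kind might read and parse the 'swat' topic as sketched below. The `kafka` source format comes from the spark-sql-kafka-0-10 package named in that command. Several things here are assumptions, not details from the README:
- the broker address `kafka:9092`
- JSON message values
- the two sensor column names

The parsed stream is only printed to the console.

```python
# Sketch of a Structured Streaming job for the 'swat' topic.
# Assumptions (not from the README): broker at kafka:9092, JSON message
# values, and the column names below.
SENSOR_COLUMNS = ["FIT101", "LIT101"]  # hypothetical subset of SWaT tags
LABEL_COLUMN = "Normal/Attack"          # assumed label column


def kafka_source_options(bootstrap_servers="kafka:9092", topic="swat"):
    """Options for Spark's Kafka source (spark-sql-kafka-0-10 package)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",  # read the topic from the beginning
    }


def main():
    # PySpark is imported here so the helper above can be used without it.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.appName("swat-preprocessing").getOrCreate()
    # Every field is read as a string, as a CSV-based producer may send text.
    schema = StructType(
        [StructField(name, StringType()) for name in SENSOR_COLUMNS + [LABEL_COLUMN]]
    )

    raw = spark.readStream.format("kafka").options(**kafka_source_options()).load()
    parsed = (
        raw.selectExpr("CAST(value AS STRING) AS json")  # Kafka values are bytes
        .select(from_json(col("json"), schema).alias("r"))
        .select("r.*")
    )
    for name in SENSOR_COLUMNS:  # sensor readings become numbers; label stays text
        parsed = parsed.withColumn(name, col(name).cast("double"))

    query = parsed.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()
```

To submit it with the spark-submit command above, end the file with a call to `main()`.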
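The batch preprocessing step above does not name its script or describe its transformations. One common approach with Spark's ML library, shown here purely as an assumption, works in two stages:
- assemble the sensor columns into a single feature vector;
- rescale each feature to [0, 1] with `MinMaxScaler`.

The column names `Timestamp` and `Normal/Attack` are assumed; check them against the merged CSV header.

```python
TIMESTAMP_COLUMN = "Timestamp"   # assumed column names; check the CSV header
LABEL_COLUMN = "Normal/Attack"


def feature_columns(columns, exclude=(TIMESTAMP_COLUMN, LABEL_COLUMN)):
    """Keep every column except the timestamp and label. Names are compared
    after stripping whitespace, since CSV headers can carry stray spaces."""
    return [c for c in columns if c.strip() not in exclude]


def scale_features(df):
    """Assemble the feature columns of a Spark DataFrame into one vector
    and rescale each feature to [0, 1] with MinMaxScaler."""
    # Imported here so feature_columns can be used without PySpark.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import MinMaxScaler, VectorAssembler

    assembler = VectorAssembler(inputCols=feature_columns(df.columns),
                                outputCol="raw_features")
    scaler = MinMaxScaler(inputCol="raw_features", outputCol="features")
    # fit() learns per-feature min/max; transform() adds the "features" column.
    return Pipeline(stages=[assembler, scaler]).fit(df).transform(df)
```

For example, you could call `scale_features(spark.read.csv("SWaT_merged.csv", header=True, inferSchema=True))`, where the file name is a placeholder. `VectorAssembler` needs numeric inputs, so `inferSchema=True` (or explicit casts) matters here.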
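The README does not say which unsupervised models the <model_name>.py scripts implement. The following is therefore only a reference baseline, not one of the project's models. It flags a single sensor reading whose z-score, |x - mean| / std, exceeds a threshold. The running mean and standard deviation are maintained with Welford's online algorithm, so each update costs O(1), which suits a streaming setting. The class name, the threshold of 3 and the 30-reading warm-up are hypothetical choices.

```python
import math


class OnlineZScoreDetector:
    """Flag readings far from the running mean of one sensor.

    Hypothetical baseline: mean and variance are tracked with Welford's
    online algorithm, so time and memory per update are O(1).
    """

    def __init__(self, threshold=3.0, warmup=30):
        self.threshold = threshold    # z-score above which a reading is flagged
        self.warmup = max(warmup, 2)  # need at least 2 readings for a sample std
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the readings seen so far, then add it to the
        statistics. Returns True if x is flagged as anomalous."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford update of the count, mean and sum of squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly
```

In a streaming job, one detector would be kept per sensor column, and each new reading passed to `update`. Every reading is folded into the statistics, anomalies included. This keeps the code short, but a long attack can shift the baseline; a variant could skip updates for flagged readings.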

Visualization and Result Interpretation:

  1. Configure InfluxDB Data Source in Grafana and Chronograf: Set up a new InfluxDB data source in both Grafana and Chronograf to visualize and analyze the results.

Additional Notes:

  • All related scripts and sketches are located in the spark folder.
  • To run Spark scripts, access the Spark Docker container and execute:
    spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 /sparkScripts/<>