Green Hydrogen

Introduction

Also known as GH2, this repository hosts the software part of my semester project at SDU Mechatronics, Fall 2023. The motivation behind it can be found here. In short, it is an AI-powered management system for green hydrogen production that uses only electricity from wind farms and the national grid.

The software architecture design is described here, while the mechanical, electrical and electronics designs and the test bench results are not yet disclosed.

Getting Started

Development Environment Setup

  1. Clone the source code

    git clone https://dagshub.com/hieudtrung/mlo.git
    cd mlo
  2. Create a virtual environment by using either conda or mamba

    # with mamba, use: mamba env create -f conda_env.yaml
    conda env create -f conda_env.yaml
  3. (optional) Set up DagsHub for experiment tracking & data versioning

    # MLflow with DagsHub Experiment as host
    export MLFLOW_TRACKING_URI=https://dagshub.com/hieudtrung/green-hydrogen-gh2.mlflow
    export MLFLOW_TRACKING_USERNAME=<your_username>
    export MLFLOW_TRACKING_PASSWORD=<your_password>
    
    # DVC with DagsHub as remote storage
    dvc remote add origin https://dagshub.com/hieudtrung/green-hydrogen-gh2.dvc
    dvc remote modify origin --local auth basic 
    dvc remote modify origin --local user hieudtrung 
    dvc remote modify origin --local password <your_token>
    
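    # DVC with MinIO as remote storage
    # (an alternative to the DagsHub remote above; pick one, since both use the name "origin")
    dvc remote add origin s3://dvc
    # note: endpointurl normally expects the HTTP(S) address of the MinIO server
    dvc remote modify origin endpointurl s3://gh2-emu-trials
    dvc remote modify origin --local access_key_id <your_access_key_id>
    dvc remote modify origin --local secret_access_key <your_secret_access_key>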
  4. (optional) CI/CD with GitHub Actions

    Retraining the model on newer data is tedious, but it can be automated. First, follow this guide to sync the DagsHub repo with GitHub.

    Then, create a GitHub Actions config so that any code update triggers the CI pipeline.
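
In a CI job, the credentials from step 3 would normally come from repository secrets rather than being typed by hand. The following is a minimal pre-flight sketch, assuming the job exposes them under the same variable names used in step 3. The helper name `require_env` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): fail fast when a
# required variable is unset or empty. Names of missing variables are
# printed to stderr and the function returns non-zero.
require_env() {
    _re_missing=0
    for _re_name in "$@"; do
        # Look up the value of the variable whose name is in $_re_name
        eval "_re_value=\${$_re_name:-}"
        if [ -z "$_re_value" ]; then
            echo "missing: $_re_name" >&2
            _re_missing=1
        fi
    done
    return "$_re_missing"
}

# Example: check the MLflow credentials before any training step starts
if require_env MLFLOW_TRACKING_URI MLFLOW_TRACKING_USERNAME MLFLOW_TRACKING_PASSWORD; then
    echo "MLflow credentials present"
else
    echo "MLflow credentials incomplete; skipping experiment tracking"
fi
```

Checking all variables before returning, instead of stopping at the first gap, means a single CI run reports every missing secret at once.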

Reproduce the results

A Docker Compose file is also available if you want to self-host on your own PC. Note that it has not been fully tested.
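
Bringing the stack up would typically look like the sketch below. It assumes the Compose file sits in the repository root under a default name such as `docker-compose.yml`, which this README does not confirm. The helper name `compose_cmd` is hypothetical; it only picks between the Compose v2 plugin (`docker compose`) and the legacy `docker-compose` binary.

```shell
# Hypothetical helper (not part of this repository): print whichever
# Compose command is installed, or return 1 if neither is found.
compose_cmd() {
    if docker compose version >/dev/null 2>&1; then
        echo "docker compose"
    elif command -v docker-compose >/dev/null 2>&1; then
        echo "docker-compose"
    else
        echo "Docker Compose not found" >&2
        return 1
    fi
}

# Typical usage, left commented out so nothing starts by accident:
#   $(compose_cmd) up -d    # start the services in the background
#   $(compose_cmd) ps       # list the running services
#   $(compose_cmd) down     # stop and remove the containers
```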

Architecture Design

Sequence Diagrams

Sequence diagrams for the many use cases will be uploaded to this OneDrive folder.

Overall Architecture

This picture shows the overall system architecture, which is Azure-native. For more details about each service, please see its corresponding README.

(Figure: overall-architecture-azure)

Deploy On-premise

For data management, I also have a self-hosted solution on my homelab cluster using Delta Lake, Apache Spark, and Kubeflow. I'll update this section once everything is properly tested.

Please keep in mind that the public source code is designed to work with Azure services. On-premise deployment would require substantial modification, so it is not recommended.

Contribute

This work is published under the MIT license as a showcase of my skills. If you run into a problem or have a change request, please open an issue. Feel free to fork, redistribute, or use it as you see fit.