Add Interface for OptiSim Algorithm #224

Open
wants to merge 68 commits into
base: main
3637959
Added workflow file for deployment
JackyZzZz Jun 16, 2024
afd0525
Added Main Page for the interface
JackyZzZz Jun 16, 2024
fc38712
Added some images for the interface
JackyZzZz Jun 16, 2024
b73665a
Added the MaxMin interface
JackyZzZz Jun 16, 2024
6bafce4
Added the Dockerfile
JackyZzZz Jun 16, 2024
f7bc6c4
Added the configuration file for HuggingFace
JackyZzZz Jun 16, 2024
4bca7cd
Added streamlit in the dependencies
JackyZzZz Jun 16, 2024
541f8f1
Added the option of heading in csv and xlsx file
JackyZzZz Jun 16, 2024
36f601a
Added visualization and better support for numpy array files
JackyZzZz Jun 17, 2024
f996882
Added some error handling for MaxMin interface
JackyZzZz Jun 17, 2024
4fd4492
Use git-lfs to track xlsx files
JackyZzZz Jun 17, 2024
b3bdd90
Remove binary files before deployment
JackyZzZz Jun 17, 2024
02590c3
Removed gitattributes file
JackyZzZz Jun 17, 2024
a722d3b
Added some info about the org
JackyZzZz Jun 17, 2024
e4ca43a
Added some background color
JackyZzZz Jun 17, 2024
7ac69bd
Added interface for MaxSum
JackyZzZz Jun 17, 2024
423d1d8
Added workflow file for deployment
JackyZzZz Jun 16, 2024
ceeef3f
Added Main Page for the interface
JackyZzZz Jun 16, 2024
b53e581
Added some images for the interface
JackyZzZz Jun 16, 2024
f521f26
Added the MaxMin interface
JackyZzZz Jun 16, 2024
0c37793
Added the Dockerfile
JackyZzZz Jun 16, 2024
ef5c187
Added the configuration file for HuggingFace
JackyZzZz Jun 16, 2024
c77ab5b
Added streamlit in the dependencies
JackyZzZz Jun 16, 2024
4643281
Added the option of heading in csv and xlsx file
JackyZzZz Jun 16, 2024
52823a2
Added visualization and better support for numpy array files
JackyZzZz Jun 17, 2024
7634a97
Added some error handling for MaxMin interface
JackyZzZz Jun 17, 2024
2e0c79a
Use git-lfs to track xlsx files
JackyZzZz Jun 17, 2024
9c5b04c
Remove binary files before deployment
JackyZzZz Jun 17, 2024
65bf7bd
Removed gitattributes file
JackyZzZz Jun 17, 2024
c681970
Added some info about the org
JackyZzZz Jun 17, 2024
1fd5f99
Added some background color
JackyZzZz Jun 17, 2024
b3e7754
Modified the branch that triggers Github Actions
JackyZzZz Jun 19, 2024
c98835d
Use the latest checkout version
JackyZzZz Jun 19, 2024
61d3c12
Use the latest release of setup-python and python version
JackyZzZz Jun 19, 2024
c672e5b
Removed unused dependencies and added an empty line
JackyZzZz Jun 19, 2024
d9a8afb
Updated python version and added an empty line
JackyZzZz Jun 19, 2024
94bb529
Removed streamlit from package dependencies
JackyZzZz Jun 19, 2024
7be242d
Added license for streamlit interfaces
JackyZzZz Jun 19, 2024
75ff707
Modified the name for the package
JackyZzZz Jun 19, 2024
a82282f
Modified the name for the file
JackyZzZz Jun 19, 2024
a0b5847
Modified the name for the file
JackyZzZz Jun 19, 2024
e671201
Removed symbolic text
JackyZzZz Jun 19, 2024
1314ba8
Fixed some code convention
JackyZzZz Jun 19, 2024
65bd637
Merged some changes
JackyZzZz Jun 19, 2024
77507f0
Changed strategy to setup
JackyZzZz Jun 20, 2024
ed6feb4
Remove streamlit from package dependencies
JackyZzZz Jun 20, 2024
14e751d
Delete the origin MaxMin file
JackyZzZz Jun 20, 2024
8149b82
Remove symbolic text
JackyZzZz Jun 20, 2024
b169fe9
Fixed a typo
JackyZzZz Jun 20, 2024
1d72b6d
Merge branch 'uncleaned-branch' of https://github.com/JackyZzZz/Selec…
JackyZzZz Jun 21, 2024
3024414
Fixed some code convention
JackyZzZz Jun 21, 2024
6c8604c
Created a common file for reused methods
JackyZzZz Jun 22, 2024
24e96a5
Removed duplicate code
JackyZzZz Jun 22, 2024
5ebce49
Added a blank line at the end
JackyZzZz Jun 22, 2024
25a78c5
Added starting template for DISE
JackyZzZz Jun 22, 2024
85f2562
Fixed some typo
JackyZzZz Jun 22, 2024
b38008a
Update run_algorithm to accommodate init parameters
JackyZzZz Jun 24, 2024
3dcaf4d
Added the function for clearing outputs
JackyZzZz Jun 24, 2024
29aed7b
First version of DISE algorithm
JackyZzZz Jun 24, 2024
e671796
Pass fun_dist to MaxMin as an option
JackyZzZz Jun 25, 2024
617e05f
import pairwise_distances for MaxMin
JackyZzZz Jun 25, 2024
d864d3d
Pass fun_dist to MaxSum as an option
JackyZzZz Jun 25, 2024
274fcd4
clear outputs if parameters change for MaxMin and MaxSum
JackyZzZz Jun 25, 2024
e460f43
Added page configurations
JackyZzZz Jun 25, 2024
bef0839
Added algorithm-interface branch to Github Actions for testing
JackyZzZz Jun 27, 2024
d7e9afa
Merge branch 'main' of https://github.com/JackyZzZz/Selector into alg…
JackyZzZz Jun 27, 2024
12b56d5
Added the interface for OptiSim
JackyZzZz Jun 29, 2024
36d9780
Test Deployment on HuggingFace
JackyZzZz Jun 29, 2024
77 changes: 77 additions & 0 deletions .github/workflows/interface_auto.yml
@@ -0,0 +1,77 @@
name: Deployment on DockerHub and Hugging Face

# Trigger the workflow on pushes to 'master', 'main', or 'optisim-interface' and on pull requests to 'master' or 'main'
on:
  push:
    branches:
      - "master"
      - "main"
      - "optisim-interface"
  pull_request:
    branches:
      - "master"
      - "main"

jobs:
  build-and-deploy:
    # Run the job on the latest Ubuntu runner
    runs-on: ubuntu-latest

    steps:
      # Step 1: Check out the latest code from the repository
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          lfs: true

      # Step 2: Set up Git LFS
      - name: Set up Git LFS
        run: |
          git lfs install
          git lfs pull

      # Step 3: Remove binary files from git history
      - name: Remove binary files from git history
        run: |
          git filter-branch --force --index-filter \
            "git rm --cached --ignore-unmatch DiverseSelector/test/test2/BBB_SECFP6_1024.xlsx DiverseSelector/test/test2/BBB_SECFP6_2048.xlsx" \
            --prune-empty --tag-name-filter cat -- --all

      # Step 4: Set up Python environment with version 3.11
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: 3.11

      # Step 5: Install the required dependencies
      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      # Step 6: Build the Docker image
      - name: Build Docker image
        run: |
          docker build . -t jackyzzz076/selector-deployment:latest

      # Step 7: Push the Docker image to Docker Hub
      - name: Push Docker image
        run: |
          docker login -u jackyzzz076 -p ${{ secrets.DOCKERHUB_TOKEN }}
          docker push jackyzzz076/selector-deployment:latest

      # Step 8: Replace the README.md file for Hugging Face
      - name: Replace README for Hugging Face
        run: |
          mv README_hf.md README.md
          git config --global user.name "github-actions[bot]"
          git config --global user.email "[email protected]"
          git add README.md
          git commit -m "Replace README.md with README_hf.md for Hugging Face"

      # Step 9: Push the app to Hugging Face
      - name: Push to Hugging Face
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          git push https://JackyZzZzZ:[email protected]/spaces/JackyZzZzZ/selector optisim-interface:main --force
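
Review note on Step 4: the workflow passes `python-version: 3.11` unquoted. YAML reads an unquoted scalar like this as a number, which happens to be harmless for 3.11 but would silently turn a value such as 3.10 into 3.1. The snippet below is only an illustration of that effect; it uses Python's `float` as a stand-in for YAML's numeric parsing, and the helper name `as_yaml_number` is hypothetical.

```python
def as_yaml_number(text: str) -> str:
    """Mimic an unquoted YAML scalar being read as a float and printed back.

    Assumption: Python's float() stands in for the YAML parser here.
    """
    return str(float(text))


if __name__ == "__main__":
    for version in ("3.11", "3.10"):
        # The trailing zero of "3.10" does not survive the float round trip.
        print(version, "->", as_yaml_number(version))
```

Quoting the value (`python-version: "3.11"`) keeps it a string and sidesteps the issue if the version is changed later.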
34 changes: 34 additions & 0 deletions Dockerfile
@@ -0,0 +1,34 @@
# Use the official image as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Install system dependencies required for building packages
RUN apt-get update && \
    apt-get install -y build-essential && \
    apt-get clean

# Copy the requirements files into the container
COPY requirements.txt requirements.txt
COPY requirements_dev.txt requirements_dev.txt

# Upgrade pip, setuptools, and wheel
RUN pip install --upgrade pip setuptools wheel

# Install the dependencies using --use-pep517
RUN pip install --use-pep517 --no-cache-dir -r requirements.txt
RUN pip install --use-pep517 --no-cache-dir -r requirements_dev.txt
RUN pip install --use-pep517 --no-cache-dir streamlit

# Copy the rest of the application code
COPY . .

# Install the Selector package using PEP 517 standards-based tools
RUN pip install --use-pep517 .

# Expose the port the app runs on
EXPOSE 8501

# Command to run the app
CMD ["streamlit", "run", "streamlit_app/app.py", "--server.enableXsrfProtection=false"]
8 changes: 8 additions & 0 deletions README_hf.md
@@ -0,0 +1,8 @@
---
title: QC-Selector
emoji: 🐳
colorFrom: purple
colorTo: gray
sdk: docker
app_port: 8501
---
99 changes: 99 additions & 0 deletions streamlit_app/app.py
@@ -0,0 +1,99 @@
# The Selector library provides a set of tools for selecting a
# subset of the dataset and computing diversity.
#
# Copyright (C) 2023 The QC-Devs Community
#
# This file is part of Selector.
#
# Selector is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 3
# of the License, or (at your option) any later version.
#
# Selector is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>
#
# --

import streamlit as st
import os


# Get the current directory path
current_dir = os.path.dirname(os.path.abspath(__file__))

# Construct the path to the assets directory
assets_dir = os.path.join(current_dir, "assets")

# Set page configuration
st.set_page_config(
    page_title = "Selector",
    page_icon = os.path.join(assets_dir, "QC-Devs.png"),
)

st.image(os.path.join(assets_dir, "selector_logo.png"))

st.write("# Welcome to Selector! 👋")

st.sidebar.success("Select an algorithm to get started.")

st.info("👈 Select an algorithm from the sidebar to see some examples of what Selector can do!")

st.markdown(
"""
[selector](https://github.com/theochem/Selector) is a free, open-source, and cross-platform
Python library designed to help you effortlessly identify the most diverse subset of molecules
from your dataset.
Please use the following citation in any publication using the selector library:

**“Selector: A Generic Python Package for Subset Selection”**, Fanwang Meng, Alireza Tehrani,
Valerii Chuiko, Abigail Broscius, Abdul, Hassan, Maximilian van Zyl, Marco Martínez González,
Yang, Ramón Alain Miranda-Quintana, Paul W. Ayers, and Farnaz Heidar-Zadeh.

The selector source code is hosted on [GitHub](https://github.com/theochem/Selector)
and is released under the [GNU General Public License v3.0](https://github.com/theochem/Selector/blob/main/LICENSE).
We welcome any contributions to the selector library in accordance with our Code of Conduct;
please see our [Contributing Guidelines](https://qcdevs.org/guidelines/qcdevs_code_of_conduct/).
Please report any issues you encounter while using
the selector library on [GitHub Issues](https://github.com/theochem/Selector/issues).
For further information and inquiries, please contact us at [email protected].

### Why QC-Selector?
In the world of chemistry, selecting the right subset of molecules is critical for a wide
range of applications, including drug discovery, materials science, and molecular optimization.
QC-Selector offers a cutting-edge solution to streamline this process, empowering researchers,
scientists, and developers to make smarter decisions faster.

### Key Features
1. Import Your Dataset: Simply import your molecule dataset in various file formats, including SDF, SMILES, and InChI, to get started.

2. Define Selection Criteria: Specify the desired level of diversity and other relevant parameters to tailor the subset selection to your unique requirements.

3. Run the Analysis: Let QC-Selector’s powerful algorithms process your dataset and efficiently select the most diverse molecules.

4. Export: Explore the diverse subset and export the results for further analysis and integration into your projects.
"""
)

st.sidebar.title("About QC-Devs")

st.sidebar.info("QC-Devs develops various free, open-source, and cross-platform libraries for scientific computing, especially theoretical and computational chemistry. Our goal is to make programming accessible to chemists and promote precepts of sustainable software development. For further information and inquiries please contact us at [email protected].")

# Add icons to the sidebar
st.sidebar.markdown(
"""
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
<div style="text-align: center;">
<a href="https://qcdevs.org/" target="_blank"><i class="fa fa-home" style="font-size:24px"></i> WEBSITE</a><br>
<a href="mailto:[email protected]"><i class="fa fa-envelope" style="font-size:24px"></i> EMAIL</a><br>
<a href="https://github.com/theochem" target="_blank"><i class="fa fa-github" style="font-size:24px"></i> GITHUB</a><br>
© 2024 QC-Devs. All rights reserved.
</div>
""",
unsafe_allow_html=True
)
Binary file added streamlit_app/assets/QC-Devs.png
Binary file added streamlit_app/assets/selector_logo.png
112 changes: 112 additions & 0 deletions streamlit_app/pages/page_DISE.py
@@ -0,0 +1,112 @@
# The Selector library provides a set of tools for selecting a
# subset of the dataset and computing diversity.
#
# Copyright (C) 2023 The QC-Devs Community
#
# This file is part of Selector.
#
# Selector is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 3
# of the License, or (at your option) any later version.
#
# Selector is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>
#
# --

import streamlit as st
import sys
import os

from selector.methods.distance import DISE

# Add the streamlit_app directory to the Python path
current_dir = os.path.dirname(os.path.abspath(__file__))
parent_dir = os.path.join(current_dir, "..")
sys.path.append(parent_dir)

from utils import *

# Set page configuration
st.set_page_config(
    page_title = "DISE",
    page_icon = os.path.join(parent_dir, "assets" , "QC-Devs.png"),
)

st.title("Directed Sphere Exclusion (DISE)")


description = """
In a nutshell, this algorithm iteratively excludes any sample within a given radius from
any already selected sample. The radius of the exclusion sphere is an adjustable parameter.
Compared to the Sphere Exclusion algorithm, the Directed Sphere Exclusion algorithm achieves a
more evenly distributed subset selection by abandoning the random selection approach and
instead imposing a directed selection.

The reference sample is chosen based on the `ref_index` and is excluded from the selected
subset. All samples are sorted (in ascending order) by their Minkowski p-norm distance
from the reference sample. Looping through the sorted samples, a sample is selected if it is
not already excluded. If selected, all its neighboring samples within a sphere of radius r
(i.e., the exclusion sphere) are excluded from being selected. When the number of selected points
is greater than the specified subset `size`, the selection process terminates. The `r0` is used
as the initial radius of the exclusion sphere; however, it is optimized to select the desired
number of samples.
"""

references = "Gobbi, A., and Lee, M.-L. (2002). DISE: directed sphere exclusion. "\
             "Journal of Chemical Information and Computer Sciences, "\
             "43(1), 317–323. https://doi.org/10.1021/ci025554v"

display_sidebar_info("Directed Sphere Exclusion (DISE)", description, references)

# File uploader for feature matrix or distance matrix (required)
matrix_file = st.file_uploader("Upload a feature matrix or distance matrix (required)",
                               type=["csv", "xlsx", "npz", "npy"], key="matrix_file", on_change=clear_results)

# Clear any stored results while no matrix file is uploaded
if matrix_file is None:
    clear_results()
# Load data from matrix file
else:
    matrix = load_matrix(matrix_file)
    num_points = st.number_input("Number of points to select", min_value = 1, step = 1,
                                 key = "num_points", on_change=clear_results)
    label_file = st.file_uploader("Upload a cluster label list (optional)", type = ["csv", "xlsx"],
                                  key = "label_file", on_change=clear_results)
    labels = load_labels(label_file) if label_file else None

    # Parameters for Directed Sphere Exclusion
    st.info("The parameters below are optional. If not specified, default values will be used.")

    r0 = st.number_input("Initial guess for radius of exclusion sphere (r0)", value=None, step=0.1,
                         on_change=clear_results)
    ref_index = st.number_input("Reference index (ref_index)", value=0, step=1, on_change=clear_results)
    tol = st.number_input("Percentage tolerance of sample size error (tol)", value=0.05, step=0.05,
                          on_change=clear_results)
    n_iter = st.number_input("Number of iterations for optimizing the radius of exclusion sphere (n_iter)",
                             value=10, step=10, on_change=clear_results)
    p = st.number_input("Minkowski p-norm distance (p)", value=2.0, step=1.0, on_change=clear_results)
    eps = st.number_input("Approximate nearest neighbor search parameter (eps)", value=0.0, step=0.1,
                          on_change=clear_results)

    if st.button("Run DISE Algorithm"):
        selector = DISE(r0=r0, ref_index=ref_index, tol=tol, n_iter=n_iter, p=p, eps=eps)
        selected_ids = run_algorithm(selector, matrix, num_points, labels)
        st.session_state['selector'] = selector
        st.session_state['selected_ids'] = selected_ids

# Check if the selected indices are stored in the session state
if 'selected_ids' in st.session_state and matrix_file is not None:
    selected_ids = st.session_state['selected_ids']
    st.write("Selected indices:", selected_ids)

    if 'selector' in st.session_state:
        st.write("Radius of the exclusion sphere:", st.session_state['selector'].r)

    export_results(selected_ids)
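
Review note: the selection loop described in the `description` string of `page_DISE.py` can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the `selector` library's implementation: the names `minkowski` and `dise_single_pass` are hypothetical, the radius is fixed (the optimization driven by `r0`, `tol`, and `n_iter` is omitted), neighbors are found by brute force rather than with an approximate search (`eps`), and a sample at distance exactly `r` is treated as inside the sphere.

```python
def minkowski(a, b, p=2.0):
    """Minkowski p-norm distance between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def dise_single_pass(points, r, ref_index=0, size=None, p=2.0):
    """One directed sphere exclusion pass with a fixed radius r.

    Assumption: the reference sample is only marked as excluded; it does
    not carve out an exclusion sphere of its own.
    """
    ref = points[ref_index]
    # Visit samples in ascending order of distance from the reference sample.
    order = sorted(range(len(points)), key=lambda i: minkowski(points[i], ref, p))
    excluded = [False] * len(points)
    excluded[ref_index] = True
    selected = []
    for i in order:
        if excluded[i]:
            continue
        selected.append(i)
        # Exclude every sample inside the sphere of radius r around the pick.
        for j, other in enumerate(points):
            if minkowski(points[i], other, p) <= r:
                excluded[j] = True
        # Stop once more than `size` samples have been selected.
        if size is not None and len(selected) > size:
            break
    return selected


if __name__ == "__main__":
    line = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
    print(dise_single_pass(line, r=1.5))
```

On six evenly spaced points with `r=1.5`, every second point after the reference is picked, which shows the even spacing the description attributes to the directed ordering.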