A knowledge graph of the wastewater treatment microbiome and its biological context

Multiomics-Analytics-Group/MicW2Graph

MicW2Graph

Building a knowledge graph of the wastewater treatment microbiome and its biological context

About the project

Wastewater treatment (WWT) is the process of removing contaminants from used water before it is discharged back into the environment, which helps address water scarcity and protect aquatic ecosystems. Recent advances in high-throughput omics technologies have facilitated the study of microbiomes from complex environmental samples such as those from WWT. A comprehensive study of an environmental microbiome requires integrating data from various studies and meta-omics technologies, as well as biological knowledge to interpret these data.

In this project, we investigated the microbiome of the WWT process to build MicW2Graph, an open-source knowledge graph that integrates metagenomic and metatranscriptomic information with their biological context, including biological processes, environmental and phenotypic features, chemical compounds, and additional metadata. We developed a workflow to collect meta-omics datasets from MGnify and infer potential interactions among microorganisms through microbial association networks. MicW2Graph enables the investigation of research questions related to WWT, focusing on aspects such as microbial connections, community memberships, and potential ecological functions.

The following figure shows the general workflow of the MicW2Graph project:

[Figure: Methods MicW2Graph]

Data

WWT meta-omics studies were queried from the MGnify API using experiment type and biome parameters. Further filters were applied based on experimental and taxonomic criteria. The abundance tables from the filtered studies were then grouped by biome and experiment type to infer microbial association networks. The workflow for retrieving and filtering WWT meta-omics studies from MGnify is summarized in the diagram below:

[Figure: MGnify studies filtering]

The code to retrieve the data from MGnify is available in this GitHub repository.
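
As a rough illustration of the filter-and-group step described above, the sketch below filters a list of study records and groups them by biome and experiment type. The record fields, the example values, and the `min_samples` criterion are assumptions made for this example; they are not the MGnify API schema or the actual filtering criteria used in this project.

```python
from collections import defaultdict

# Hypothetical study records; field names and values are illustrative only,
# not the MGnify API schema.
studies = [
    {"accession": "STUDY_A", "biome": "Activated sludge",
     "experiment_type": "metagenomic", "n_samples": 24},
    {"accession": "STUDY_B", "biome": "Activated sludge",
     "experiment_type": "metatranscriptomic", "n_samples": 12},
    {"accession": "STUDY_C", "biome": "Anaerobic digester",
     "experiment_type": "metagenomic", "n_samples": 3},
    {"accession": "STUDY_D", "biome": "Activated sludge",
     "experiment_type": "metagenomic", "n_samples": 40},
]


def filter_studies(records, min_samples):
    """Keep studies with at least `min_samples` samples (a made-up criterion)."""
    return [r for r in records if r["n_samples"] >= min_samples]


def group_by_biome_and_experiment(records):
    """Map each (biome, experiment type) pair to its study accessions."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["biome"], r["experiment_type"])].append(r["accession"])
    return dict(groups)


kept = filter_studies(studies, min_samples=10)
groups = group_by_biome_and_experiment(kept)
for key, accessions in groups.items():
    print(key, accessions)
```

Each resulting group would correspond to one set of abundance tables from which a microbial association network is inferred.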

Exploratory data analysis

A general overview of the filtered studies was provided through various plots describing the number of studies and samples, experiment types, sampling countries, sub-biomes, and other relevant metadata. The exploratory data analysis was encapsulated in a module of the MicW2Graph web application, containing a general overview of all studies, studies by sub-biome, individual studies, and a section for conducting pairwise comparisons between studies.

[Figure: EDA MicW2Graph]

Microbial association networks

Microbial association networks (MANs) are weighted, undirected networks, defined as G = (V, E), where V is a set of nodes and E is a set of edges. Nodes in these networks are operational taxonomic units (OTUs) at a specific taxonomic level, while edges indicate substantial co-presence (positive interaction) or mutual exclusion (negative interaction) trends in microorganism abundances across samples. Weights in MANs correspond to the association values among species defined by the inference method, and there is an edge between two nodes if this value is greater than or equal to a given cutoff t.
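
The thresholding rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the taxon names and association values are made up, and the comparison uses the absolute value of the association so that negative (mutual exclusion) edges are retained, which is an assumption about how the cutoff t is applied.

```python
from itertools import combinations


def edges_from_associations(taxa, assoc, t):
    """Return (u, v, weight) for each unordered pair with |weight| >= t.

    `assoc` is a symmetric matrix (list of lists) of association values.
    Using the absolute value is an assumption made for this sketch.
    """
    edges = []
    for i, j in combinations(range(len(taxa)), 2):
        w = assoc[i][j]
        if abs(w) >= t:
            edges.append((taxa[i], taxa[j], w))
    return edges


# Made-up taxa and association values, for illustration only.
taxa = ["Taxon_A", "Taxon_B", "Taxon_C"]
assoc = [
    [1.0, 0.62, -0.45],
    [0.62, 1.0, 0.10],
    [-0.45, 0.10, 1.0],
]

man_edges = edges_from_associations(taxa, assoc, t=0.4)
for u, v, w in man_edges:
    kind = "co-presence" if w > 0 else "mutual exclusion"
    print(f"{u} -- {v}: {w:+.2f} ({kind})")
```

Because the network is undirected, each pair of taxa is visited once, and the sign of the retained weight distinguishes positive from negative interactions.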

In this project, we selected the Correlation inference for Compositional data through Lasso (CCLasso) method. Network inference was conducted using the NetCoMi R package. The MANs for this study are available for download and visualization in the MicW2Graph web application.

[Figure: EDA MicW2Graph]

The code for the network inference and analysis of MANs is available in this GitHub repository.

MicW2Graph

MicW2Graph incorporates the MANs with the optimal association threshold for each WWT sub-biome and experiment type, the biological context of the species within the MANs, and ontologies that standardize and expand the information in this resource. This knowledge graph (KG) comprises 1,247 nodes and 9,749 relationships, categorized into 12 node labels and 8 relationship labels. The relationships in MicW2Graph are classified as taxonomic, functional, and data-driven, reflecting the different layers of knowledge available in the KG.
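
To make the idea of node labels and relationship categories concrete, here is a small sketch of a labeled graph with a tally per node label and per relationship category. All node identifiers, labels, and relationship types below are hypothetical and chosen only for illustration; they are not the actual MicW2Graph schema.

```python
from collections import Counter

# Hypothetical node labels and relationship types, for illustration only;
# they are not the actual MicW2Graph schema.
nodes = {
    "sp1": "Species",
    "sp2": "Species",
    "g1": "Genus",
    "bp1": "BiologicalProcess",
}
relationships = [
    ("sp1", "BELONGS_TO", "g1"),
    ("sp2", "BELONGS_TO", "g1"),
    ("sp1", "PARTICIPATES_IN", "bp1"),
    ("sp1", "CO_OCCURS_WITH", "sp2"),
]

# Assumed mapping from relationship type to the three categories
# named in the text (taxonomic, functional, data-driven).
category_of = {
    "BELONGS_TO": "taxonomic",
    "PARTICIPATES_IN": "functional",
    "CO_OCCURS_WITH": "data-driven",
}

label_counts = Counter(nodes.values())
category_counts = Counter(category_of[rel] for _, rel, _ in relationships)

print(dict(label_counts))
print(dict(category_counts))
```

In this toy graph the data-driven relationship plays the role of a MAN edge, while the taxonomic and functional relationships supply the biological context around it.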

The MicW2Graph metagraph and a snapshot of the graph database with nodes and edges for all sub-biomes and experiment types are shown below:

[Figure: MicW2Graph metagraph]

The KG and sub-biome subgraphs are available for download and visualization in the MicW2Graph web application.

[Figure: MicW2Graph metagraph]

Case studies

The use cases demonstrate the potential of MicW2Graph to discover new species associated with WWT biological processes, showing how the available information of well-known species can help to predict potential functions and traits for less studied species. These species and communities can be further investigated as potential candidates to optimize the bioremediation process. The subgraphs for the case studies can be visualized and downloaded in the MicW2Graph web application.

[Figure: Case studies MicW2Graph]

How to run the web app locally?

Poetry was used to create a Python virtual environment, which allows the management of Python libraries and their dependencies. Each Poetry project has a pyproject.toml file listing the required libraries and their version constraints, and a poetry.lock file, a TOML-formatted file that records the exact resolved versions of the libraries and their dependencies.

To create a Python virtual environment with libraries and dependencies required for this project, you should install Poetry, clone this GitHub repository, open a terminal, move to the folder containing this repository, and run the following commands:

# Create the Python virtual environment 
$ poetry install

# Activate the Python virtual environment
# (on Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin)
$ poetry shell

You can find a detailed guide on how to use Poetry here.

Alternatively, you can create a conda virtual environment with the required libraries using the requirements.txt file.

After installing the libraries, you can run the Streamlit app locally with the command below:

$ streamlit run MicW2Graph_Home.py

Credits and Contributors

Contact

If you have comments or suggestions about this project, you can open an issue in this repository.