This project is an automated pipeline to generate annotated gene expression atlases for kingdoms of life

wgohome/plants-pipeline

LSTrAP-Kingdom: an automated pipeline to generate annotated gene expression atlases for kingdoms of life

By NTU Plants Systems Biology and Evolution Laboratory

The code is hosted in this GitHub repository; the accompanying paper is available here and the preprint version here. Please open pull requests for bug fixes and feature requests, and contact me with feedback or bug reports.

Guides

A. First local setup of the pipeline

  • These steps only need to be carried out once, when the repository is first set up on your local machine or server.

B. Initialization for each session

  • These commands need to be run every time the pipeline is accessed from a new terminal session. They load the Python environment with the installed packages and add the ascp and kallisto commands to $PATH. If kallisto or ascp (Aspera CLI) has not been downloaded yet, it is downloaded as well.
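After initialization, it can be useful to confirm that both command-line tools are actually reachable from the current session. The snippet below is a minimal sketch, not part of the pipeline: the helper name `missing_tools` is hypothetical, and the only assumption taken from the guide is that the executables are called `ascp` and `kallisto`.

```python
import shutil


def missing_tools(tools=("ascp", "kallisto")):
    """Return the names in `tools` that cannot be found on $PATH.

    Hypothetical helper for illustration; the pipeline's own
    initialization commands are described in guide B.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on $PATH:", ", ".join(missing))
    else:
        print("ascp and kallisto are both available.")
```

If either tool is reported as missing, rerunning the initialization commands in the same terminal session is the first thing to try.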

C & D. Download guide

  • C. Bulk Download

    • The steps to run a download job for multiple species are outlined here.
  • D. Small download job

    • The steps to run a download job for a single species are outlined here.

E. Directory structure

  • This segment provides an overview of the directory structure of the main scripts directory, plants-pipeline, and of the data directory, pipeline-data.

F. Postprocessing

  • After the download job is completed, these steps take the downloaded data through postprocessing:
    • Generating TPM matrices
    • Quality control
    • Running a coexpression analysis to count the number of ribosomal gene neighbours of every gene
  • The F1 scores for the benchmark in the paper are generated using the scripts here.
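To illustrate the idea behind counting ribosomal gene neighbours, here is a minimal sketch on toy data. It is not the pipeline's implementation: the use of Pearson correlation, the neighbourhood size `top_k`, the function names, and the way ribosomal genes are supplied (a plain set of gene IDs) are all assumptions made for this example. See guide F for the actual scripts and parameters.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile has no defined correlation
    return cov / (sd_x * sd_y)


def ribosomal_neighbour_counts(tpm, ribosomal, top_k=2):
    """For every gene, count ribosomal genes among its top_k most
    correlated genes.

    tpm:       dict mapping gene ID -> list of TPM values (one per sample)
    ribosomal: set of gene IDs treated as ribosomal (assumed to be known)
    top_k:     assumed neighbourhood size, chosen for illustration only
    """
    counts = {}
    for gene, profile in tpm.items():
        scored = sorted(
            ((pearson(profile, other), name)
             for name, other in tpm.items() if name != gene),
            reverse=True,
        )
        neighbours = [name for _, name in scored[:top_k]]
        counts[gene] = sum(1 for name in neighbours if name in ribosomal)
    return counts


# Toy TPM matrix: four genes measured in four samples (made-up values).
toy_tpm = {
    "rpl1": [1, 2, 3, 4],
    "rps2": [2, 4, 6, 8],
    "geneA": [4, 3, 2, 1],
    "geneB": [1, 2, 3, 5],
}
toy_ribosomal = {"rpl1", "rps2"}

print(ribosomal_neighbour_counts(toy_tpm, toy_ribosomal, top_k=1))
```

In this toy example, genes whose profiles track the ribosomal genes get a count of 1 and the anti-correlated `geneA` gets 0. On real data, a matrix library would be used instead of the quadratic pure-Python loop shown here.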

G. Annotation Benchmark

  • Describes how the annotation accuracy and coverage were derived for the publication.

The figures in the publication were prepared with the code in this IPython notebook.
