Majid-Soheili/BMDSRA

BMD-SRA: A Boosting Model for Differentiating Sequence Read Archive Files Based on the Context.

The volume of deposited sequence files is increasing dramatically. The submitter of a sequence file is also mainly responsible for annotating it. Although submitters and public repositories take care to produce accurate metadata, mistakes can happen, and these mistakes can cause trouble in downstream analysis. BMD-SRA tries to classify a given sequence file into one of four categories:

  1. Metagenomes
  2. Amplicons
  3. Single Amplified Genomes (SAGs)
  4. Isolated Genomes

The model was developed in the stages listed below:

  1. Preparing Metadata
  2. Downloading Sequence Files
  3. Feature Extraction
  4. Outlier Detection
  5. Developing Model
  6. Evaluation Model

How can you use it?

There are two ways to use the outcomes of this study: generating your own model, or applying the generated model in your project.

Generating your own model

There is well-formed documentation about preparing training data. You can use the extracted features to generate your own model.

Applying the generated model

The generated model is accessible here. To create an object, use the BMDSRA class and pass just two parameters:

  1. The path of the model.
  2. The path of the scaler.

After creating an object of the BMDSRA class, call its predict function and pass the path of the sequence file.

Note that BMD-SRA needs access to two files, FeatureExtraction and Preprocessing. Access to the xgboost package is also essential.
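
A quick way to confirm these requirements before loading the model is sketched below. It is only an illustration: the helper name missing_requirements is hypothetical, and it assumes that FeatureExtraction and Preprocessing are .py files stored in the same folder as the BMDSRA class (the Codes folder in the example import), which may not match your layout.

```python
import importlib.util
from pathlib import Path


def missing_requirements(code_dir,
                         modules=("FeatureExtraction", "Preprocessing"),
                         packages=("xgboost",)):
    """Return the names of helper files and packages that cannot be found."""
    code_dir = Path(code_dir)
    missing = []
    for name in modules:
        # Assumption: each helper module is a .py file inside code_dir.
        if not (code_dir / f"{name}.py").is_file():
            missing.append(f"{name}.py")
    for name in packages:
        # find_spec returns None when a top-level package is not importable.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_requirements("Codes")
    print("missing:", problems if problems else "nothing")
```

An empty list means that both helper files and the xgboost package were found.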

Example:

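from Codes.BMDSRA import BMDSRA

# Paths are relative and use Windows-style separators; adjust them to your setup.
model_path = "..\\..\\resource\\4-model\\model.json"
scaler_path = "..\\..\\resource\\4-model\\scaler.gz"

# Create the classifier from the saved model and scaler.
model = BMDSRA(model_path, scaler_path)

# Predict the category of a single sequence file.
seq_path = "..\\..\\resource\\2-subsra\\SRR1588386.fastq"
res = model.predict(seq_path)
print(res)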
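
To classify several sequence files in one go, the predict call can be wrapped in a small loop. The sketch below is not part of the repository: predict_directory is a hypothetical helper, and it assumes only what the example above shows, namely that predict accepts the path of one sequence file as a string. It uses pathlib so that the same code works with both Windows and POSIX path separators.

```python
from pathlib import Path


def predict_directory(predict, seq_dir, pattern="*.fastq"):
    """Apply `predict` to every file in `seq_dir` that matches `pattern`.

    Returns a dict that maps each file name to whatever `predict` returned.
    """
    results = {}
    for seq_file in sorted(Path(seq_dir).glob(pattern)):
        # The example above passes the path as a string, so do the same here.
        results[seq_file.name] = predict(str(seq_file))
    return results


# Usage with the real class (commented out because it needs the repository,
# the model files, and xgboost):
#
# from Codes.BMDSRA import BMDSRA
# resource = Path("..") / ".." / "resource"
# model = BMDSRA(str(resource / "4-model" / "model.json"),
#                str(resource / "4-model" / "scaler.gz"))
# for name, res in predict_directory(model.predict, resource / "2-subsra").items():
#     print(name, res)
```

Passing the bound method model.predict keeps the helper independent of the BMDSRA class, so it can be tried out with any stand-in function first.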

For more examples of running the model, see here.