USask-BINFO/Predict-Phenotypes-from-Genotypes-using-NovGMDeep

Predicting Phenotypes From Novel Genomic Markers Using Deep Learning

NovGMDeep is a deep learning model for genomic selection that predicts phenotypes from novel genomic markers: structural variants (SVs) and transposable elements (TEs). To cope with the high dimensionality of genomic marker data, it uses a one-dimensional deep convolutional neural network whose convolutional, pooling, and dropout layers mitigate overfitting and reduce the complexity introduced by a large number of markers. The model was trained and evaluated on Arabidopsis thaliana and Oryza sativa samples using K-fold cross-validation. Prediction accuracy is measured with Pearson's correlation coefficient (PCC), mean absolute error (MAE), and the standard deviation of MAE. Predicted phenotypes correlated more strongly with the observed values when the model was trained on SVs and TEs than when it was trained on SNPs.
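The three evaluation metrics can be sketched in plain Python as follows. This is an illustration only, not the repository's evaluation code: the function names are hypothetical, and it assumes that "standard deviation of MAE" means the spread of the per-fold MAE values across cross-validation folds.

```python
import math
from statistics import mean, stdev


def pearson_correlation(y_true, y_pred):
    """Pearson's correlation coefficient between observed and predicted values."""
    mean_true, mean_pred = mean(y_true), mean(y_pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_true = sum((t - mean_true) ** 2 for t in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return cov / math.sqrt(var_true * var_pred)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def summarize_folds(folds):
    """Summarize metrics over folds.

    `folds` is a list of (y_true, y_pred) pairs, one per cross-validation fold.
    Assumption: the standard deviation of MAE is taken across folds.
    """
    pccs = [pearson_correlation(t, p) for t, p in folds]
    maes = [mean_absolute_error(t, p) for t, p in folds]
    return {"mean_pcc": mean(pccs), "mean_mae": mean(maes), "sd_mae": stdev(maes)}


# Made-up values for two folds, purely to show the call pattern.
example_folds = [
    ([60.0, 72.0, 65.0], [62.0, 70.0, 66.0]),
    ([58.0, 80.0, 71.0], [61.0, 77.0, 69.0]),
]
print(summarize_folds(example_folds))
```

At least two folds are needed, because the sample standard deviation is undefined for a single value.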

NovGMDeep Architecture
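To give a feel for how convolution, pooling, and dropout act on a vector of marker codes, here is a toy sketch in plain Python. It is not the NovGMDeep implementation: the kernel weights, pooling size, and marker values are arbitrary assumptions, and the real model stacks learned layers in a deep learning framework.

```python
import random


def conv1d(signal, kernel):
    """'Valid' 1D convolution (cross-correlation, as deep learning libraries compute it)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Rectified linear activation: negative values become zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


markers = [1, 0, -1, 1, 1, 0, -1, -1, 0, 1]  # made-up genotype codes
kernel = [0.5, -0.25, 0.5]                   # arbitrary, not learned weights

features = max_pool1d(relu(conv1d(markers, kernel)), size=2)
regularized = dropout(features, rate=0.5, rng=random.Random(0))
print(len(markers), "->", len(features))
```

With a kernel of width 3 and a pooling size of 2, the ten input markers shrink to four pooled features, which is the dimensionality reduction the description refers to. Dropout is applied only during training.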

Installation

Ensure you have Python 3.9 installed. Install required packages using:

pip install -r requirements.txt

Data

  • Access the full VCF variant files containing structural variants data for A. thaliana samples from the European Variation Archive (PRJEB38975).
  • Download the zipped folder containing CSV files with structural variants data: 'Deletions.csv', 'Duplications.csv', and 'Inversions.csv'.
  • Phenotype data for the flowering time of A. thaliana samples can be found in the file "FT10_arabi.csv".
  • TE genotype file for O. sativa. The three values -1, 0, and 1 indicate the genotypes '1/1', '0/1', and '0/0', respectively.
  • SNP genotype file for O. sativa
  • Associated phenotypic values for O. sativa
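The numeric coding described for the O. sativa TE genotype file can be expressed as a small lookup. This is a minimal sketch based only on the mapping stated above; the function names are hypothetical and the repository's notebooks may handle the encoding differently (for example, for missing calls).

```python
# Mapping stated in the data description: '1/1' -> -1, '0/1' -> 0, '0/0' -> 1.
GENOTYPE_CODES = {"1/1": -1, "0/1": 0, "0/0": 1}
CODE_TO_GENOTYPE = {code: call for call, code in GENOTYPE_CODES.items()}


def encode_genotypes(calls):
    """Convert VCF-style genotype strings to numeric codes.

    Raises ValueError on any call outside the three documented genotypes.
    """
    encoded = []
    for call in calls:
        if call not in GENOTYPE_CODES:
            raise ValueError(f"Unsupported genotype call: {call!r}")
        encoded.append(GENOTYPE_CODES[call])
    return encoded


def decode_genotypes(codes):
    """Convert numeric codes back to genotype strings."""
    return [CODE_TO_GENOTYPE[code] for code in codes]


print(encode_genotypes(["0/0", "0/1", "1/1"]))
```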

Usage

  1. Data Preprocessing

Select high-quality genotypes: Refer to quality_based_selection.ipynb.
Prepare data for model input: Refer to data_processing.ipynb.

  2. Data Split

Split training and testing datasets: Execute sv_data_split.py.

  3. Train the Model

Train the model: Execute sv_model_train.py.

  4. Test the Model

Test the trained model: Execute sv_model_train.py.
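For readers unfamiliar with K-fold cross-validation, the splitting step can be sketched as below. This is a generic illustration, not the logic of sv_data_split.py: the function name, the shuffling, and the seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are shuffled once with a fixed seed, then dealt into k folds
    whose sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 10 samples split into 5 folds of 2 test samples each.
for fold_number, (train, test) in enumerate(k_fold_indices(10, 5), start=1):
    print(fold_number, len(train), len(test))
```

Every sample appears in exactly one test fold, so each phenotype value is predicted once by a model that did not see it during training.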

Citation

If you use this work in your research, please cite:

@article{sehrawat2023predicting,
  title={Predicting phenotypes from novel genomic markers using deep learning},
  author={Sehrawat, Shivani and Najafian, Keyhan and Jin, Lingling},
  journal={Bioinformatics Advances},
  volume={3},
  number={1},
  pages={vbad028},
  year={2023},
  publisher={Oxford University Press}
}
