
michkam89/PAPi


title: PAPi (Pangenome Analysis Pipeline) v1.0.0
author: Michał Kamiński
output: word_document (default), pdf_document (default)

DESCRIPTION

PAPi is designed to calculate the core genome and pangenome of a defined dataset of protein sequences. At the moment PAPi is not self-sufficient and requires some effort from the user, who must provide .faa files with appropriately formatted FASTA headers and file names:

Each FASTA header must contain the protein ID and the genome name, separated by an underscore (_).
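
The header rule can be checked before preprocessing. The shell function below is a hypothetical helper, not part of PAPi; it assumes the header layout ">proteinID_GenomeName" (genome name last) and that you know which genome name each file should carry.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of PAPi. Prints every FASTA header in a .faa
# file that does not end with "_<genome>" and returns 1 if it finds any.
# Assumes the header layout ">proteinID_GenomeName" (genome name last).
check_headers() {
  local faa="$1" genome="$2"
  awk -v suffix="_${genome}" '
    /^>/ {
      # Plain suffix comparison: genome names contain dots, so avoid regexes.
      if (substr($0, length($0) - length(suffix) + 1) != suffix) {
        print "bad header: " $0
        bad = 1
      }
    }
    END { exit bad ? 1 : 0 }
  ' "$faa"
}
```

For example, `check_headers Sphingopyxis_sp._A083.faa Sphingopyxis_sp._A083` prints any header that does not end with `_Sphingopyxis_sp._A083` and returns a non-zero status if it finds one. A suffix test is used instead of a regular expression because names such as `Sphingopyxis_sp._A083` contain dots.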

Please name your genome files according to this template:

  • Genus_species_strain.faa, e.g. Sphingopyxis_lindanitolerans_WS5A3p.faa
  • Genus_sp._strain.faa, e.g. Sphingopyxis_sp._A083.faa
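
To catch naming mistakes early, you could test file names against the two templates with a regular expression. This is a sketch under my own assumptions: the function name `faa_name_ok` is hypothetical, and the pattern assumes a capitalised genus, a lower-case species epithet (or `sp.`), and a strain made of letters, digits, dots and hyphens.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of PAPi: succeeds if a file name matches
# Genus_species_strain.faa or Genus_sp._strain.faa (assumed character sets).
faa_name_ok() {
  # POSIX character classes avoid locale surprises with ranges like [A-Z].
  local re='^[[:upper:]][[:lower:]]+_([[:lower:]]+|sp\.)_[[:alnum:].-]+\.faa$'
  [[ "$1" =~ $re ]]
}
```

To check a whole input directory: `for f in your/input/dir/*.faa; do faa_name_ok "$(basename "$f")" || echo "unexpected name: $f"; done`.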

PREPROCESS .FAA FILES

  1. Run "pre_papi.sh test" to check that the preprocessing step works correctly.
  2. Run "pre_papi.sh user your/input/dir your/output/dir".
     • Your input directory should contain only .faa files.
  3. IMPORTANT: When the script has completed, run CD-HIT with your own parameters and provide the resulting .clstr file as input for further analyses.
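
CD-HIT parameters are left to you. As one illustration, the sketch below wraps the `cd-hit` command and picks the word size (`-n`) from the identity threshold (`-c`) using the ranges recommended in the CD-HIT user guide (5 for 0.7-1.0, 4 for 0.6-0.7, 3 for 0.5-0.6, 2 for 0.4-0.5). The function names, the example file names and the extra flags (`-d 0`, `-T 0`, `-M 0`) are my assumptions, not settings required by PAPi; set `DRY_RUN=1` to print the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of PAPi. Word size (-n) follows the
# CD-HIT user guide's recommended ranges for a given identity (-c).
cdhit_word_size() {
  awk -v c="$1" 'BEGIN {
    if (c >= 0.7)      print 5
    else if (c >= 0.6) print 4
    else if (c >= 0.5) print 3
    else if (c >= 0.4) print 2
    else { print "cd-hit: identity below 0.4 is not supported" > "/dev/stderr"; exit 1 }
  }'
}

run_cdhit() {
  local input="$1" prefix="$2" identity="$3" n
  n=$(cdhit_word_size "$identity") || return 1
  # -d 0: write the full sequence name (up to the first space) to the .clstr
  # -T 0: use all CPUs; -M 0: no memory limit
  local cmd=(cd-hit -i "$input" -o "$prefix" -c "$identity" -n "$n" -d 0 -T 0 -M 0)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `run_cdhit all_proteins.faa cdhit_out 0.5` (where `all_proteins.faa` is a hypothetical combined FASTA file) makes CD-HIT write the clusters to `cdhit_out.clstr`, the kind of file PAPi takes as input.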

RUN PAPi

From this point on, PAPi does the analysis for you.
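
For orientation, a core cluster is usually one that contains at least one protein from every genome in the dataset. The sketch below lists such clusters from a CD-HIT .clstr file; it is not PAPi's implementation. It assumes that gnames_file.csv lists the same genome names that end the FASTA headers, and that CD-HIT was run with `-d 0` so that full sequence names appear in the .clstr file. The function name `core_clusters` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not PAPi's code: print the IDs of clusters in a
# CD-HIT .clstr file that contain a protein from every listed genome.
core_clusters() {
  local clstr="$1" gnames="$2"
  awk '
    # Called when a cluster ends: print its ID if every genome was seen.
    function report_cluster() {
      if (id != "" && nseen == ng) print id
    }
    # First file (gnames): keep the first comma-separated field of each row.
    FNR == NR {
      sub(/\r$/, "")
      split($0, f, ",")
      if (f[1] != "") genome[++ng] = f[1]
      next
    }
    # A ">Cluster N" line closes the previous cluster and opens a new one.
    /^>Cluster / {
      report_cluster()
      id = $2
      split("", seen)    # portable way to empty an array
      nseen = 0
      next
    }
    # Member line, e.g. "0<TAB>460aa, >NAME... *" or ">NAME... at 95.00%".
    {
      name = $0
      sub(/^[^>]*>/, "", name)     # drop the text before the name
      sub(/\.\.\. .*$/, "", name)  # drop "... *" or "... at NN%"
      for (i = 1; i <= ng; i++) {
        suffix = "_" genome[i]
        if (!(i in seen) && substr(name, length(name) - length(suffix) + 1) == suffix) {
          seen[i] = 1
          nseen++
        }
      }
    }
    END { report_cluster() }
  ' "$gnames" "$clstr"
}
```

With the illustrative file names above, `core_clusters cdhit_out.clstr gnames_file.csv | wc -l` would give the number of core clusters.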

  1. Ensure that gnames_file.csv is in the PAPi directory (pre_papi generates it automatically). If you did not use pre_papi, create a .csv file listing the genomes in a single column, one genome per row, with no header row. You can check the file in R:
     # Read the single-column genome list (no header row) and preview it
     df <- readr::read_delim(
       file = "gnames_file.csv",
       delim = ",",
       col_names = FALSE
     )
     head(df)
  2. Run "./papi.sh test" to run the analysis on the test data (a sample of the dataset from my publication).
     • If the analysis completes successfully, you will see 4 plots in the PAPi/plots directory.
  3. Run PAPi with your data and have fun :)
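
If you skip pre_papi, the genome list can be written from the file names. This hypothetical helper assumes that each genome name is its .faa file name without the extension, which matches the naming templates but may not be exactly what pre_papi produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of PAPi: write one genome name per row
# (no header row) by stripping ".faa" from each file name in a directory.
make_gnames() {
  local dir="$1" out="${2:-gnames_file.csv}" f
  : > "$out"                     # start with an empty file
  for f in "$dir"/*.faa; do
    [ -e "$f" ] || continue      # no .faa files: the glob stays unexpanded
    printf '%s\n' "$(basename "$f" .faa)" >> "$out"
  done
}
```

For example, `make_gnames your/input/dir` writes gnames_file.csv to the current directory, listing the genomes in the order in which the shell sorts the file names.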

If you find this code useful, please cite this repository and/or this paper: Kaminski, M.A.; Sobczak, A.; Dziembowski, A.; Lipinski, L. Genomic Analysis of γ-Hexachlorocyclohexane-Degrading Sphingopyxis lindanitolerans WS5A3p Strain in the Context of the Pangenome of Sphingopyxis. Genes 2019, 10, 688.

Thanks!