title | author | output
---|---|---
PAPi (Pangenome Analysis Pipeline) v1.0.0 | Michał Kamiński |

The analysis is designed to calculate the core-pangenome for a defined dataset of given protein sequences. At the moment PAPi is not self-sufficient and requires some user effort. The user should provide .faa files with appropriately formatted FASTA headers and file names:
Each FASTA header must contain the protein ID and the genome name, separated by an underscore
Please name your genome files according to this template:
- Genus_species_strain.faa, e.g. Sphingopyxis_lindanitolerans_WS5A3p.faa
- Genus_sp._strain.faa, e.g. Sphingopyxis_sp._A083.faa
- Run the "pre_papi.sh test" command to check that the preprocessing step works correctly
- Run "pre_papi.sh user your/input/dir your/output/dir"
- Your input directory should contain only .faa files
- IMPORTANT: When the script has completed, please run the CD-HIT program with your own parameters and provide the resulting .clstr file as input for the further analyses
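
Before running pre_papi.sh, the input requirements listed above can be checked with a small shell helper. This is only a sketch and not part of PAPi: the function name `check_faa_dir` is hypothetical, and it assumes that a valid file name has at least three underscore-separated parts before `.faa` and that a valid header simply contains an underscore (the order of protein ID and genome name is not verified).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PAPi): report input files that do not
# follow the naming rules described above.
# Usage: check_faa_dir your/input/dir
check_faa_dir() {
    dir=$1
    rc=0
    for f in "$dir"/*; do
        [ -e "$f" ] || continue
        name=$(basename "$f")

        # The input directory should contain only .faa files.
        case "$name" in
            *.faa) ;;
            *) echo "not a .faa file: $name"; rc=1; continue ;;
        esac

        # Assumed template: Genus_species_strain.faa or Genus_sp._strain.faa,
        # i.e. at least three non-empty parts separated by underscores.
        case "$name" in
            ?*_?*_?*.faa) ;;
            *) echo "unexpected file name: $name"; rc=1 ;;
        esac

        # Count FASTA headers (lines starting with ">") without an underscore.
        bad=$(grep '^>' "$f" | grep -vc '_')
        if [ "$bad" -gt 0 ]; then
            echo "$name: $bad header(s) without an underscore"
            rc=1
        fi
    done
    return $rc
}
```

Calling `check_faa_dir your/input/dir` prints one line per problem and returns a non-zero status if anything was reported, so it can be chained as `check_faa_dir your/input/dir && ./pre_papi.sh user your/input/dir your/output/dir`.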
From this point PAPi will do the analysis for you.
- Ensure that you have gnames_file.csv in the PAPi directory (it is generated automatically by pre_papi). If you didn't use pre_papi, please create a .csv file listing the genomes in a single column, one genome per row and without a header line, so that it can be read like this:

      df <- readr::read_delim(
        file = "gnames_file.csv",
        delim = ",",
        col_names = FALSE
      )
      head(df)

- Run ./papi.sh test to run the analysis on the test data (a sample from my publication dataset)
- If the analysis completes successfully, you will see 4 plots in the PAPi/plots directory
- Run PAPi with your data and have fun :)
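
If pre_papi was not used, the one-column gnames_file.csv described above could be created from the .faa file names with a few lines of shell. This is a minimal sketch under one loud assumption: that a genome name is the file name without the `.faa` extension. Please compare the result with the file generated by `pre_papi.sh test` before relying on it. The function name `make_gnames` is hypothetical and not part of PAPi.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PAPi): write one genome name per row.
# ASSUMPTION: genome name = .faa file name without the extension.
# Usage: make_gnames your/input/dir gnames_file.csv
make_gnames() {
    in_dir=$1
    out_file=$2

    # Start from an empty file so repeated runs do not append duplicates.
    : > "$out_file"

    for f in "$in_dir"/*.faa; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        basename "$f" .faa >> "$out_file"
    done
}
```

For example, `make_gnames your/input/dir gnames_file.csv` run from the PAPi directory writes a header-less file with one genome per row.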
If you find this code useful, please cite this repository and/or this paper: Kaminski, M.A.; Sobczak, A.; Dziembowski, A.; Lipinski, L. Genomic Analysis of γ-Hexachlorocyclohexane-Degrading Sphingopyxis lindanitolerans WS5A3p Strain in the Context of the Pangenome of Sphingopyxis. Genes 2019, 10, 688.
Thanks!