file structure for no_pred #4

Closed
fungal-spore opened this issue Nov 26, 2019 · 3 comments
Labels
enhancement New feature or request

Comments

@fungal-spore

Hello,
I have the Pangloss pipeline set up, and it ran successfully on the test data provided.

We used an alternative gene prediction pipeline, so we already have nucleotide and amino acid sequences for our isolates.
We will use the --no_pred argument, but it is not clear what directory structure, files, and file locations it needs to function properly. We would still like to run all the other arguments (e.g. blastall, panoct, etc.). Can you clarify how to arrange the files for this?
Thanks!

@fungal-spore
Author

I figured it out. For future reference, you need:
*.faa and *.attributes in ./gm_pred/sets
Then you can run --no_pred.

I had to look up the *.attributes requirements from PanOCT (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3526259/) and write a script to convert the *.gff3 files from my gene prediction into the required format.
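
For anyone in the same position, here is a minimal sketch of that kind of conversion. It is not the script used in this thread and not part of Pangloss. It assumes the six tab-separated columns that PanOCT attributes files are generally described as using (contig ID, protein ID, 5' coordinate, 3' coordinate, annotation, genome tag), with the 5' coordinate larger than the 3' coordinate for minus-strand genes. The function name `gff3_to_attributes`, the `genome_tag` argument, and the choice of the `ID`, `product`, and `Name` GFF3 tags are all hypothetical; check them against your own files and the PanOCT documentation.

```python
from urllib.parse import unquote


def gff3_to_attributes(gff3_lines, genome_tag, feature_type="gene"):
    """Yield six-column rows: contig, protein ID, end5, end3, annotation, genome tag.

    Assumption: one row per feature of `feature_type`, identified by its GFF3 ID tag.
    """
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line == "##FASTA":  # an embedded sequence section ends the annotation part
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        seqid, _, _, start, end, _, strand, _, attrs = cols
        # Column 9 is a ';'-separated list of URL-escaped key=value pairs.
        tags = {
            key.strip(): unquote(value)
            for key, value in (
                pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
            )
        }
        protein_id = tags.get("ID")
        if protein_id is None:
            continue
        annotation = tags.get("product") or tags.get("Name") or "hypothetical protein"
        # GFF3 always has start <= end; swap for minus-strand features so the
        # 5' coordinate comes first (assumed PanOCT convention).
        end5, end3 = (end, start) if strand == "-" else (start, end)
        yield [seqid, protein_id, end5, end3, annotation, genome_tag]


if __name__ == "__main__":
    demo = [
        "##gff-version 3",
        "contig1\tpred\tgene\t100\t400\t.\t+\t.\tID=g1;product=kinase",
        "contig1\tpred\tgene\t500\t900\t.\t-\t.\tID=g2",
    ]
    for row in gff3_to_attributes(demo, "isolateA"):
        print("\t".join(row))
```

The protein IDs in the second column presumably have to match the sequence headers in the corresponding *.faa file, so it is worth checking that before running the pipeline.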

@chmccarthy
Owner

Hi fungal-spore,

Thanks for pointing this out. There's definitely more info I need to provide in the manual (and maybe in the README) about certain usage situations.

For the moment, if users want to use protein and location data from other sources such as NCBI, they'll need to write their own script to convert GFF or GTF files into PanOCT-compatible attributes files. In the future I might look into whether something like gffutils could make this aspect of data import easier. My past experiences with parsing GFF files in Python were... iffy, to put it mildly.

Going to pin this issue just to give people a heads-up in the meantime.

@chmccarthy chmccarthy pinned this issue Dec 4, 2019
@chmccarthy chmccarthy added the enhancement New feature or request label Dec 4, 2019
@fungal-spore
Author

I forgot to mention that you also need the *.nucl file in ./gm_pred/sets, and you need genomes.fna and genome.txt in ./genomes.
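
To sanity-check the layout described in this thread before running --no_pred, something like the sketch below may help. It is not part of Pangloss. The function name `check_no_pred_layout` is hypothetical, and it assumes that each genome's *.faa, *.attributes, and *.nucl files share the same base name, which this thread does not confirm. For ./genomes it only checks that the directory exists and is not empty.

```python
from pathlib import Path


def check_no_pred_layout(workdir):
    """Return a list of problems found; an empty list means nothing obvious is missing."""
    workdir = Path(workdir)
    sets_dir = workdir / "gm_pred" / "sets"
    genomes_dir = workdir / "genomes"
    problems = []

    for directory in (sets_dir, genomes_dir):
        if not directory.is_dir():
            problems.append(f"missing directory: {directory}")

    if sets_dir.is_dir():
        proteins = sorted(sets_dir.glob("*.faa"))
        if not proteins:
            problems.append(f"no *.faa files in {sets_dir}")
        for faa in proteins:
            # Assumption: companion files use the same base name as the .faa file.
            for ext in (".attributes", ".nucl"):
                companion = faa.with_suffix(ext)
                if not companion.is_file():
                    problems.append(f"missing {companion.name} in {sets_dir}")

    if genomes_dir.is_dir() and not any(genomes_dir.iterdir()):
        problems.append(f"no files in {genomes_dir}")

    return problems


if __name__ == "__main__":
    for problem in check_no_pred_layout("."):
        print(problem)
```

Run it from the directory that contains gm_pred and genomes. It only checks that files are present, not that their contents are valid.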
