necessary fields for "-f" not stated correctly in documentation, and also not given by "--arb-list-fields"? #96

Open
jvollme opened this issue Nov 5, 2020 · 1 comment

jvollme commented Nov 5, 2020

Until now, I have been struggling to get the LCA classification when running sina locally. Even with the newest version on bioconda (v1.7.1), I could not get it to work based on the documentation at "readthedocs" or the help output of sina itself.

All of these seemed to suggest that the field to specify would be -f tax_slv. However, this just results in an empty column. Some further trial and error based on information in the verbose output finally led me to try -f lca_tax_slv.
This seems to be the correct field, as it gives me the LCA classification.

However, this field is not mentioned anywhere in the documentation, and it does not even show up when listing the fields actually available in the reference database itself (using "--arb-list-fields").

Is this perhaps a specific error of this particular sina version? Or should the documentation be corrected?

@epruesse
Owner

Hi @jvollme,

--arb-list-fields is new - I put it in exactly for your case. Knowing where the taxonomy might be stored in the reference database was a little too esoteric.

The reference database, e.g. the SILVA one, needs to contain a taxonomy classification in "materialized path" format, i.e. some field that says "Bacteria; Proteobacteria; Gammaproteobacteria; ...". When you use --lca-fields <field>[:<field>,...], SINA will do an LCA-style classification based on the input fields specified. It will put the output classification into lca_<field>. So if you say --lca-fields tax_slv, it will generate lca_tax_slv. The -f flag is also new. It allows reducing the number of output fields so that the CSVs become more usable. So technically, it should be --lca-fields tax_slv, -f lca_tax_slv.
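To illustrate the idea, here is a minimal Python sketch of an LCA over "materialized path" strings, together with the lca_<field> naming rule described above. It is not SINA's actual implementation: the function names (split_path, lca, lca_output_field) and the example paths are made up for illustration, and the sketch uses a strict common prefix over all hits, which may well differ from how SINA weighs its search results.

```python
def split_path(path):
    """Split a materialized path like 'Bacteria; Proteobacteria;' into ranks."""
    return [rank.strip() for rank in path.split(";") if rank.strip()]


def lca(paths):
    """Return the deepest taxonomy prefix shared by all paths (strict LCA)."""
    split = [split_path(p) for p in paths]
    if not split:
        return ""
    common = []
    # zip stops at the shortest path, so the prefix never overruns any path
    for ranks in zip(*split):
        if len(set(ranks)) != 1:
            break
        common.append(ranks[0])
    return "; ".join(common)


def lca_output_field(input_field):
    """Name of the output field for a given --lca-fields input field."""
    return "lca_" + input_field


# Hypothetical tax_slv values of the reference sequences found for one query
hits = [
    "Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales;",
    "Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales;",
    "Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales;",
]

print(lca_output_field("tax_slv"), "=", lca(hits))
```

The point of the naming helper is the one made in the comment above: the field you pass to --lca-fields (tax_slv) is the input, and the field you ask for with -f is the derived one (lca_tax_slv).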

Not as straightforward as I had thought. I wasn't using both at the same time.

It sounds like perhaps --lca-fields tax_slv should generate its output in tax_slv? Or at least have an option to do that? The original thought was to make it clear which fields were input data, and which were calls made by a different method.
