For users who want to use GISAID data with this workflow, the following steps work nearly as expected.
These steps assume you have downloaded:
- all sequences in FASTA format, with whitespace replaced by underscores
- patient metadata
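If a download still has spaces in its FASTA deflines, they can be replaced before running `augur parse`. The following is a minimal sketch, assuming plain FASTA with Unix line endings; `normalize_deflines` is a hypothetical helper, not part of Augur or this workflow. It rewrites only header lines, so sequence lines pass through unchanged.

```shell
# Hypothetical helper (not part of Augur): replace every whitespace character
# in FASTA header lines (lines starting with ">") with an underscore.
# Sequence lines are left untouched. Reads FASTA on stdin, writes to stdout.
# Assumes Unix (LF) line endings.
normalize_deflines() {
  sed '/^>/s/[[:space:]]/_/g'
}
```

For example, `normalize_deflines < raw.fasta > data/gisaid_pox_2022_06_16_19.fasta`, where `raw.fasta` is a placeholder for the file GISAID provided.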
```shell
# Download sequences: data/gisaid_pox_2022_06_16_19.fasta
# Download patient metadata: data/gisaid_pox_2022_06_16_19.tsv
# Note: patient metadata lacks submitting/originating lab.

# Parse out metadata from sequence deflines.
augur parse \
  --sequences data/gisaid_pox_2022_06_16_19.fasta \
  --fields strain gisaid_epi_isl date \
  --output-sequences data/sequences.fasta \
  --output-metadata data/sequence_metadata.tsv

# Join sequence metadata with patient metadata.
csvtk --tabs join -f 1 \
  data/sequence_metadata.tsv \
  data/gisaid_pox_2022_06_16_19.tsv > data/metadata.tsv

# TODO: Need a transform for GISAID locations like the one we have for GenBank.

# Run workflow.
# TODO: This step requires users to know that the "wrangling" of metadata renames the "strain" column to "strain_original"
# so they can rename it back to "strain". Correspondingly, the user has to tell the workflow not to use "strain_original"
# as the display strain name.
nextstrain build \
  --docker \
  --image=nextstrain/base:branch-nextalign-v2 \
  --cpus 1 \
  . \
  --configfile config/config_mpxv.yaml \
  --config strain_id_field=strain_original display_strain_field=strain
```
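By default, `csvtk join` keeps only rows whose key appears in both files, so a sequence without a matching patient metadata row silently drops out of `data/metadata.tsv`. A quick check, sketched here with a hypothetical `missing_metadata` function, lists FASTA record names that have no row in the joined metadata. It assumes the strain name is the first column of the TSV and that the TSV has a header row, as is the case after the join above.

```shell
# Hypothetical helper: print FASTA record names (header text after ">" up to
# the first space or tab) that are absent from the first column of a
# tab-separated metadata file. The metadata header row is skipped.
missing_metadata() {
  local fasta="$1" metadata="$2"
  awk -F'\t' '
    # First file (metadata): remember every strain name except the header.
    FILENAME == ARGV[1] { if (FNR > 1) seen[$1] = 1; next }
    # Second file (FASTA): check each header line against the metadata.
    /^>/ {
      name = substr($0, 2)
      sub(/[ \t].*$/, "", name)
      if (!(name in seen)) print name
    }
  ' "$metadata" "$fasta"
}
```

For example, `missing_metadata data/sequences.fasta data/metadata.tsv`; an empty result means every sequence has a metadata row.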
Note that the biggest issue with the implementation above is that there is no transform command to convert GISAID's location field into the standard Nextstrain geographic columns (region, country, division, and location). As a result, the default Augur filter logic, which groups by country and year, prints a warning that it cannot find a "country" column and groups only by year. In Augur 16.0.0, this missing group-by column will produce an error, so we should consider implementing the transform for GISAID locations.
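Until such a transform exists, one stopgap is to split the GISAID location string on its `/` separators. The sketch below assumes the patient metadata stores this string in a column named `Location`, formatted like `Europe / Portugal / Lisbon`; both the column name and the format are assumptions to check against an actual download, and `split_gisaid_location` is a hypothetical helper, not the GenBank transform used by the workflow. Levels beyond the fourth are dropped and missing levels are left empty.

```shell
# Hypothetical transform: split a GISAID-style location string such as
# "Europe / Portugal / Lisbon" into region, country, division and location
# columns appended to each row of a TSV (with header) read from stdin.
# The source column name defaults to "Location" (an assumption).
split_gisaid_location() {
  local column="${1:-Location}"
  awk -F'\t' -v OFS='\t' -v col="$column" '
    NR == 1 {
      # Find the index of the source column in the header.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print $0, "region", "country", "division", "location"
      next
    }
    {
      loc = $idx
      gsub(/^ +| +$/, "", loc)          # trim surrounding spaces
      n = split(loc, parts, " */ *")    # split on "/" with optional spaces
      for (j = 1; j <= 4; j++) out[j] = (j <= n) ? parts[j] : ""
      print $0, out[1], out[2], out[3], out[4]
    }
  '
}
```

For example, `split_gisaid_location Location < data/metadata.tsv > data/metadata_geo.tsv && mv data/metadata_geo.tsv data/metadata.tsv` would add the four columns before running the workflow.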
Given the commands above, however, I get the following tree from the workflow:
The very long branches also indicate that users will need to manage their own list of strains to exclude, since strain names will not match GenBank accessions.
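One way to start such a list is to check which entries of an existing exclude file match nothing in the GISAID metadata. The hypothetical `unmatched_excludes` function below does that, assuming the strain name is in the first metadata column (with a header row) and treating `#` as the start of a comment in the exclude file. Entries it prints, such as GenBank accessions, would need to be mapped to GISAID strain names by hand.

```shell
# Hypothetical helper: print entries of an exclude file that match no strain
# in the first column of a tab-separated metadata file (header skipped).
# Text after "#" and surrounding whitespace are ignored; blank lines skipped.
unmatched_excludes() {
  local exclude="$1" metadata="$2"
  awk -F'\t' '
    # First file (metadata): collect known strain names.
    FILENAME == ARGV[1] { if (FNR > 1) seen[$1] = 1; next }
    # Second file (exclude list): strip comments and whitespace, then check.
    {
      sub(/#.*/, "")
      gsub(/^[ \t]+|[ \t]+$/, "")
      if ($0 != "" && !($0 in seen)) print $0
    }
  ' "$metadata" "$exclude"
}
```

For example, `unmatched_excludes my_exclude.txt data/metadata.tsv`, where `my_exclude.txt` is a placeholder for an exclude list carried over from the GenBank setup.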