ingest: canonicalize strain names #32

Open
joverlee521 opened this issue Jun 7, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@joverlee521
Contributor

Context

Currently, the ingest pipeline accepts strain names in any format.
We should canonicalize them so that they display cleanly in Auspice and so that we have a way to deduplicate sequences.

Description

We need a clear standard format for strain names. If we follow the pattern we already use for other pathogens (e.g. SARS-CoV-2), this would be <country>/<sample_id>/<year>.

Once we've decided on a format, we should add the necessary transforms to ingest/bin/transform-strain-names.
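
As a starting point for discussion, here is a minimal sketch of what such a transform could look like, assuming the <country>/<sample_id>/<year> format is adopted. The function names, the regular expression, and the fallback to the accession when no strain name is given are all assumptions for illustration; none of this reflects the existing contents of ingest/bin/transform-strain-names.

```python
import re

# Hypothetical check for the proposed <country>/<sample_id>/<year> format:
# three slash-separated parts without whitespace, ending in a 4-digit year.
CANONICAL_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+/\d{4}$")


def clean_part(value: str) -> str:
    """Trim a name component, turn whitespace runs into underscores,
    and replace slashes so the component cannot add extra separators."""
    value = re.sub(r"\s+", "_", value.strip())
    return value.replace("/", "-")


def canonicalize_strain(strain: str, country: str, year: str, accession: str) -> str:
    """Return a strain name in <country>/<sample_id>/<year> form.

    Assumption: a name that already matches the pattern is kept as-is;
    otherwise the original name (or the accession, if the name is empty)
    is used as the sample id.
    """
    candidate = strain.strip()
    if CANONICAL_PATTERN.match(candidate):
        return candidate
    sample_id = clean_part(strain) or accession
    return f"{clean_part(country)}/{sample_id}/{year}"
```

With made-up inputs, `canonicalize_strain(" sample 01 ", "USA", "2022", "ABC123")` gives `USA/sample_01/2022`, and a name that is already in the proposed format is returned unchanged. How to treat uninformative names is left open here and would need its own rule.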

@corneliusroemer
Member

Bumping this: there's a particularly unhelpful strain name from Sweden, "Lesion", for GenBank accession OX009124.
