[phylo] CI workflow DAG includes update_example_data #237

Open
joverlee521 opened this issue Feb 13, 2024 · 0 comments
Labels
bug Something isn't working

Context

Sometimes running the phylo workflow with the CI configs locally includes the update_example_data rule in the DAG:

$ nextstrain build . --configfile profiles/ci/builds.yaml -n
Building DAG of jobs...
Job stats:
job                            count
---------------------------  -------
align                              1
all                                1
ancestral                          1
clades                             1
colors                             1
combine_samples                    1
copy_example_data                  1
decompress                         1
download                           1
export                             1
filter                             1
final_strain_name                  1
fix_tree                           1
mask                               1
mutation_context                   1
recency                            1
refine                             1
rename_clades                      1
reverse_reverse_complements        1
subsample                          2
traits                             1
translate                          1
tree                               1
update_example_data                1
total                             25
...
Reasons:
    (check individual jobs above for details)
    code has changed since last execution:
        decompress
    input files updated by another job:
        align, all, ancestral, clades, colors, combine_samples, copy_example_data, decompress, export, filter, final_strain_name, fix_tree, mask, mutation_context, recency, refine, rename_clades, reverse_reverse_complements, subsample, traits, translate, tree, update_example_data
    missing output files:
        download
    set of input files has changed since last execution:
        decompress
Some jobs were triggered by provenance information, see 'reason' section in the rule displays above.
If you prefer that only modification time is used to determine whether a job shall be executed, use the command line option '--rerun-triggers mtime' (also see --help).
If you are sure that a change for a certain output file (say, <outfile>) won't change the result (e.g. because you just changed the formatting of a script or environment definition), you can also wipe its metadata to skip such a trigger via 'snakemake --cleanup-metadata <outfile>'. 
Rules with provenance triggered jobs: decompress

This is not an issue in our automated CI runs via GitHub Actions because the GitHub Actions workflow does a clean clone of the repo.

Possible solutions

  1. Manually remove the local .snakemake directory, which clears the Snakemake cache and resolves the issue.
  2. Conditionally include the chores.smk file in the core phylo workflow.
  3. Move the chores.smk file to a separate build-config that extends the workflow with custom_rules (conforms to the pathogen-repo-guide).
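
Options 2 and 3 both come down to making the inclusion of chores.smk depend on the config. A minimal Python sketch of that decision, assuming a `custom_rules` config key that holds a list of rule-file paths. The `rule_files_to_include` helper and both file paths are hypothetical and not taken from this repo:

```python
from typing import Any

# Hypothetical path for the core workflow rules; not taken from the repo.
CORE_RULES = ["rules/core.smk"]


def rule_files_to_include(config: dict[str, Any]) -> list[str]:
    """Return the rule files a Snakefile would include for this config.

    Core rules are always included. Extra rule files (such as chores.smk)
    are added only when the config lists them under `custom_rules`.
    """
    return CORE_RULES + list(config.get("custom_rules", []))


# In a Snakefile, the result could drive the `include:` directive, e.g.
#     for rule_file in rule_files_to_include(config):
#         include: rule_file

ci_config = {}  # stands in for a CI config without custom rules
chores_config = {"custom_rules": ["build-configs/chores/chores.smk"]}  # hypothetical path

print(rule_files_to_include(ci_config))
print(rule_files_to_include(chores_config))
```

With this shape, a plain CI run never sees the chores rules, so update_example_data cannot enter the DAG unless the chores config is passed explicitly.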
@joverlee521 joverlee521 added the bug Something isn't working label Feb 13, 2024
@joverlee521 joverlee521 self-assigned this Feb 13, 2024
joverlee521 added a commit that referenced this issue Feb 13, 2024
Since the chores rules are internal Nextstrain rules, they do not need
to be part of the core workflow.

This resolves the confusion Snakemake has when running CI locally.¹

README.md includes the new instructions on how to invoke the
workflow to update the example data. This requires two config files
because we need the default config to provide the required config params
for the core workflow and we need the custom build config to include
the custom rule.

¹ #237
joverlee521 added a commit that referenced this issue Feb 14, 2024
Part of work to update this repo to match the pathogen-repo-guide.

Since the chores rules are internal Nextstrain rules, they do not need
to be part of the core workflow. This also resolves the DAG confusion
that Snakemake occasionally runs into when running CI locally.¹

README.md includes the new instructions on how to invoke the workflow to
update the example data. This requires two config files:

1. The CI config to provide all required config params and to ensure
the example data uses the correct `strain_id_field` for CI builds.
2. The chores config to include the custom rules

¹ #237
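
Passing two config files works because Snakemake's `--configfile` option accepts several files, and later files overwrite values from earlier ones. A rough Python sketch of that layering, assuming a recursive dictionary merge. The `merge_configs` helper, the `strain_id_field` value, and the chores rule path are hypothetical:

```python
from typing import Any


def merge_configs(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `override` is layered on top of `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the value in `base`.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical contents standing in for the two config files.
ci_config = {"strain_id_field": "accession"}
chores_config = {"custom_rules": ["build-configs/chores/chores.smk"]}

config = merge_configs(ci_config, chores_config)
print(config)
```

The CI config supplies the required parameters and the chores config only adds `custom_rules`, so the two files do not compete for the same keys.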
joverlee521 added a commit that referenced this issue Feb 16, 2024
Part of work to update this repo to match the pathogen-repo-guide.

Since the chores rules are internal Nextstrain rules, they do not need
to be part of the core workflow. This also resolves the DAG confusion
that Snakemake occasionally runs into when running CI locally.¹

README.md includes the new instructions on how to invoke the workflow to
update the example data. This requires two config files:

1. The CI config to provide all required config params and to ensure
the example data uses the correct `strain_id_field` for CI builds.
2. The chores config to include the custom rules

¹ #237