Drone step hanging for a bit after cache restore #194

Open
xvandish opened this issue Dec 14, 2021 · 5 comments
Labels
bug (Something isn't working), need-more-info

Comments

@xvandish

xvandish commented Dec 14, 2021

Describe the bug
In Drone, when running the plugin on a cache restore using GCS, the restorer component prints out that restore is finished and took x seconds. The Drone step for the restore, however, doesn't terminate for another ~30-50s. A screenshot is attached below.

In the screenshot you can see that component=restorer msg="cache restored" took=11.174617156s is shown as 12s by Drone, but the pipeline step that does the restore still took ~54s.

The cache restored message is printed here and called by Exec here, and I don't see any work that occurs after. Could it be a hanging GCS connection that isn't immediately terminated and that a worker waits around for? I'll take a look on my own, but figured I'd file this just in case this is a known issue or I made a mistake somewhere.

Any questions let me know, thanks!

To Reproduce
Steps to reproduce the behavior:

  1. Using ... config (in .jsonnet format)
local RebuildOrRestoreCache(isRestore) = {
  name: "%s-cache" % (if isRestore then "restore" else "rebuild"),
  image: "meltwater/drone-cache:latest",
  environment: {
    GOOGLE_APPLICATION_CREDENTIALS: "./credentials.json",
    BACKEND_OPERATION_TIMEOUT: "12m"
  },
  pull: "if-not-exists",
  settings: {
    debug: true,
    backend: "gcs",
    restore: isRestore,
    rebuild: !isRestore,
    override: false,
    // checksum function provided by plugin - https://github.com/meltwater/drone-cache/blob/master/DOCS.md#using-cache-key-templates
    cache_key: '{{ checksum "yarn.lock" }}',
    archive_format: "gzip",
    bucket: "...",
    region: "...",
    mount: [
      "node_modules",
    ],
  },
};
  2. While ...
    Running the restore-cache step created above
  3. See error
    Pipeline step finishes much later than the final output.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
Screen Shot 2021-12-14 at 4 38 38 PM

Desktop (please complete the following information):
Running in CI off of the official image.

Additional context
Add any other context about the problem here.

@xvandish xvandish changed the title from "Mismatching drone step timing and component=restorer msg="cache restored" took=11.174617156s" to "Drone step hanging for a bit after cache restore" Dec 14, 2021
@jimsheldon
Contributor

Thank you for reporting this issue, we will investigate as soon as possible.

@jimsheldon jimsheldon added the bug (Something isn't working) label Dec 17, 2021
@bdebyl
Contributor

bdebyl commented Jul 19, 2022

@xvandish is this still an issue you are facing? If so, I can try to replicate it, though since I have not used GCS and have no live test environment for it, this may prove challenging.

@messense

@bdebyl I'm facing the same kind of issue with AWS S3.

@rmannibucau

rmannibucau commented Jul 25, 2023

Hi, I have the exact same issue on-premises using a local volume. The logs state that the restore takes a few seconds (around 10s), but the step lasts between 1 and 2 minutes.
I wondered whether it could be due to the updates Drone makes to the pod images (replacing placeholders). If so, it is inconvenient, and we should probably ask them to implement the pipeline differently on Kubernetes.

@rmannibucau

Hi,

Any news on this? I get it on a single-node Kubernetes cluster with a local host volume for the cache, so it is quite odd. It is also a nuisance: for a pipeline that should be fast (~1 min), it adds another minute of overhead/latency.
