Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support http.extraheader #262

Open
pmrowla opened this issue Jul 29, 2023 · 22 comments
Open

support http.extraheader #262

pmrowla opened this issue Jul 29, 2023 · 22 comments

Comments

@pmrowla
Copy link
Contributor

pmrowla commented Jul 29, 2023

jelmer/dulwich#882

in libgit2/pygit2 this requires exposing git_fetch_options.extra_headers in the pygit remote callbacks

@dberenbaum
Copy link
Contributor

@pmrowla What is the level of effort to support http.extraheader? Thinking about this some more, it would be nice to have that support because I think we increasingly have valid CI use cases that don't require CML at all, and avoiding both cml ci and iterative/setup-cml would be preferable.

@pmrowla
Copy link
Contributor Author

pmrowla commented Aug 1, 2023

@dberenbaum I think it would probably take me about a full sprint. One thing to note here is that the git http. config options also support a lot of ways to do url substitution which also get used in a lot of the CI platforms and I'm not sure if any of them are actually supported in either dulwich or libgit2

@dberenbaum
Copy link
Contributor

Discussed that this isn't a priority for now given the level of effort and ability to handle this in iterative/setup-cml and/or iterative/setup-dvc (and in cml ci)

@shcheklein
Copy link
Member

For the record we hit the same issue with gto here: iterative/gto#277

Also, AFAIU cml ci requires REPO_TOKEN to be set which is an extra items - security issues, and overall requires some setup.

@dberenbaum
Copy link
Contributor

@shcheklein Can we recommend to include setup-dvc and/or setup-cml when using our tools inside GHA?

@dberenbaum
Copy link
Contributor

@dacbd Do you know what exactly are the requirements to include in a GHA workflow in order to push tags via gto?

@shcheklein
Copy link
Member

@shcheklein Can we recommend to include setup-dvc and/or setup-cml when using our tools inside GHA?

we probably can. But it also means we'll need to do the same with gto for example which becomes a bit heavy and unexpected (at least by its name - setup-cml, setup-dvc).

Also, by biggest concern that it requires and extra token to be set - REPO_TOKEN AFAIU.

@dberenbaum
Copy link
Contributor

we probably can. But it also means we'll need to do the same with gto for example which becomes a bit heavy and unexpected (at least by its name - setup-cml, setup-dvc).

I can't find it ATM, but we had some discussion about whether we should consolidate to use a single action everywhere like setup-dvc even when using cml, gto, etc., or to have a setup-gto that would take care of this.

Also, by biggest concern that it requires and extra token to be set - REPO_TOKEN AFAIU.

@dacbd To confirm, is it required to include this?

        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}

@shcheklein Is your main concern about including this or managing the secret in GH? I think it only uses the auto-generated secrets.GITHUB_TOKEN, so it does not require setting up a secret manually AFAIK.

@shcheklein
Copy link
Member

I think it only uses the auto-generated secrets.GITHUB_TOKEN, so it does not require setting up a secret manually AFAIK.

If that's the case, it less important to address then. I was not sure (and I'm still not) about this. I tried to use it here https://github.com/shcheklein/PR-Merge-Register and the action failed with:

{"level":"error","message":"token not found","stack":"Error: token not found\n    at new Github (/usr/local/lib/node_modules/@dvcorg/cml/src/drivers/github.js:91:23)\n    at CML.getDriver (/usr/local/lib/node_modules/@dvcorg/cml/src/cml.js:161:35)\n    at CML.ci (/usr/local/lib/node_modules/@dvcorg/cml/src/cml.js:487:25)\n    at exports.handler (/usr/local/lib/node_modules/@dvcorg/cml/bin/cml/repo/prepare.js:13:13)\n    at /usr/local/lib/node_modules/@dvcorg/cml/node_modules/yargs/build/index.cjs:1:8993\n    at /usr/local/lib/node_modules/@dvcorg/cml/node_modules/yargs/build/index.cjs:1:4949"}

I haven't had to time dig further tbh so see if it's about token being required or something else.

@dacbd
Copy link

dacbd commented Sep 15, 2023

Can we just add the token rewriting patch to scmrepo if it detects its running in github actions? Until there is proper support for http.extraheader

I try and do a draft PR so you can see what it might look like.

@dacbd Do you know what exactly are the requirements to include in a GHA workflow in order to push tags via gto?

Just need to include a custom auth header when pushing git refs over https AUTHORIZATION: basic ***

@shcheklein
Copy link
Member

@dacbd

Can we just add the token rewriting patch to scmrepo if it detects its running in github actions?

will it be rewriting the token that GH provides?

@dacbd
Copy link

dacbd commented Sep 15, 2023

@shcheklein no, we just need to pull token out of the git config entry and add it in the git origin as a username/password.

approx: https://token:ghs_***@github.com/org/repo.git'

@shcheklein
Copy link
Member

@dacbd I guess makes sense to me. I would rely on @pmrowla judgement on this. Practically it's a patching it upstream vs implementing a GH-specific path on scmrepo (probably a temporary solution). My 2cs - if it's simple and quick it's fine to have it.

@dacbd
Copy link

dacbd commented Sep 15, 2023

@pmrowla
Copy link
Contributor Author

pmrowla commented Sep 16, 2023

I don't think a GHA specific patch belongs in scmrepo, iirc some other users encountered a similar issue with aws codecommit and whatever their CI equivalent is.

If this is a priority we should just fix it properly and add support for http.extraheader so it works everywhere.

@shcheklein
Copy link
Member

I don't think a GH specific patch belongs in scmrepo, iirc some other users encountered a similar issue with aws codecommit and whatever their CI equivalent is.

can it be generalized on the scmrepo level (e.g. if it's always about the same stuff as on GH)?

would it be faster compared to changing it upstream? Clearly it's the right way to do this, if it takes let's say 10x more time then it might make sense to consider alternatives.

@pmrowla
Copy link
Contributor Author

pmrowla commented Sep 16, 2023

It can't be generalized at the scmrepo level since the headers have to be set in the actual HTTP requests (that are crafted entirely in the upstream git backends).

The GHA example would work purely in scmrepo since the way GHA uses it ends up just setting the HTTP basic Authorization header and you can accomplish the same thing by rewriting the entire git remote URL to fill in http://username:[email protected]/... (with the gha token instead as password)

But the extraheader really lets you set any HTTP header you want, so in the event a git forge uses it to set something other than Authorization, we can't accomplish it via rewriting the remote URL (and have to set the raw header in the HTTP request)

@shcheklein
Copy link
Member

Got it. Have you seen other (more or less meaningful, production) use case for the extraheader besides Auth?

@pmrowla
Copy link
Contributor Author

pmrowla commented Sep 16, 2023

Azure pipelines CI uses it to set the Authorization header to a bearer token (so it sets Authorization: Bearer <token string>. (the URL username:password rewrite workaround only works for setting Authorization: Basic <base64 encoded username:password>)

@dberenbaum
Copy link
Contributor

Can GTO fallback to using CLI Git?

@pmrowla
Copy link
Contributor Author

pmrowla commented Sep 18, 2023

We could make GTO init the scmrepo instances with gitpython as the highest priority backend, but that's still not ideal since it means DVC will end up using gitpython when it calls into GTO.

@dberenbaum
Copy link
Contributor

What about trying to prioritize gitpython in scmrepo if we detect we are in GitHub? Fixing in dulwich sounds best but the full sprint estimate makes it hard to prioritize.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants