Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always return the upstream aliases when no alias groups are generated #2374

Open
Moullisha opened this issue Jul 5, 2024 · 12 comments · May be fixed by #2376
Open

Always return the upstream aliases when no alias groups are generated #2374

Moullisha opened this issue Jul 5, 2024 · 12 comments · May be fixed by #2376
Labels
bug Something isn't working data quality Issues with data quality

Comments

@Moullisha
Copy link

Describe the bug
Some vulnerabilities in OSV do not mention alias whereas the source link has alias data.

To Reproduce
https://vuln.go.dev/ID/GO-2024-2947.json mentions two aliases whereas https://api.osv.dev/v1/vulns/GO-2024-2947 does not.
Similar behaviour observed in case of GO-2024-2918, GHSA-v6v8-xj6m-xwqh, GO-2024-2963

Expected behaviour
If alias is available in source data, same should be provided by OSV.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

@cuixq cuixq added the data quality Issues with data quality label Jul 8, 2024
Copy link

github-actions bot commented Jul 8, 2024

✨ Thank you for your interest in OSV.dev's data quality! ✨

Please review our FAQ entry on how to most efficiently have this addressed.

@another-rex another-rex added the bug Something isn't working label Jul 8, 2024
@another-rex
Copy link
Contributor

So this is caused by our AliasGroup computation finding there were too many transitive aliases, likely because the GO entry refers to (CVE-2024-6104), and I see a large number of chain guard advisories also refer to CVE-2024-6104, so deciding to skip generating the alias group.

There are 2 separate issues here:

  1. When no alias groups are generated, we should still always return the upstream aliases for this specific entry.
  2. Chain guard might not be correctly using aliases, and should be publishing related instead.

@andrewpollock
Copy link
Contributor

2. Chain guard might not be correctly using aliases, and should be publishing related instead.

@cpanato are you the right person to review the Chainguard records being published? Refer to https://ossf.github.io/osv-schema/#aliases-field and https://ossf.github.io/osv-schema/#related-field

@cpanato
Copy link

cpanato commented Jul 8, 2024

  1. Chain guard might not be correctly using aliases, and should be publishing related instead.

@cpanato are you the right person to review the Chainguard records being published? Refer to https://ossf.github.io/osv-schema/#aliases-field and https://ossf.github.io/osv-schema/#related-field

can be, what is the issue, i did not understand that, do you have an example?

@andrewpollock
Copy link
Contributor

can be, what is the issue, i did not understand that, do you have an example?

In a nutshell, aliases is meant to be 1:1 under most circumstances, whereas related may be 1:many

https://ossf.github.io/osv-schema/#aliases-field

Aliases should be considered symmetric

https://ossf.github.io/osv-schema/#related-field

Related vulnerabilities are symmetric but not transitive.

(any feedback on how to make this aspect of the schema documentation more comprehensible is very welcome)

It sounds like there are many Chainguard advisories aliased to CVE-2024-6104, which tends to indicate something is wrong:

  • CGA-23mp-53xj-rxq8
  • CGA-27mv-v84v-3r2r
  • CGA-59jh-x873-28v5
  • CGA-6mh4-qmrc-qf44
  • CGA-cc4w-g258-gcp4

@cpanato
Copy link

cpanato commented Jul 8, 2024

I see, i think I got it, I will update to move that to related field. thank you for the feedback

cc @luhring

@luhring
Copy link

luhring commented Jul 8, 2024

I'd be cool with making the change in the Chainguard OSV data. We'd also want to give other consumers of the data a heads up if we do, but that wouldn't delay the change by more than a week or so.

I've read the docs for aliases and related a few times, and I think I'm not yet fully appreciating the correctness of related for this case, maybe you can help me. 😃

Chainguard's advisories each describe the impact of one vulnerability in a specific distro package that we ship. These distro packages are downstream of the upstream components for which the vulnerabilities are originally reported. So the vulnerability itself (i.e. the specific software weakness) is the same, and any fixes upstream are inherited in these downstream distro packages. But the two software artifacts are distinct from one another.

I think I'm getting confused around these two sentences from the aliases docs:

Two vulnerabilities can be described as aliases if a potential patch that addresses one of the vulnerabilities (and no other vulnerabilities) will also address the other vulnerability, and vice versa.

^ makes me lean toward "this isn't an alias". While it's true that our advisories don't describe multiple upstream vulnerabilities, patches to the vulnerability downstream in the distro package build process don't benefit the upstream project. But...

Aliases may be used for vulnerabilities affecting different packages or ecosystems as long as they follow this definition.

^ makes me lean back toward "well, maybe this is an alias?", because it actually allows for there to be distinct packages in distinct ecosystems, which was my only hesitation in satisfying the first constraint.

The examples given for the docs of the related field are:

  • A similar but completely different vulnerability.

^ This doesn't seem to apply here.

  • A similar OSV entry that bundles multiple distinct vulnerabilities in the same entry.

^ Our OSV entries aren't bundling multiple distinct vulnerabilities, like other distro advisory systems sometimes do. However, each vulnerability in a language ecosystem (e.g. GHSA-, GO-, etc.) is "1:many" with Chainguard's advisories, in the case when a given upstream component is used in multiple downstream distro packages.

  • Cases that do not satisfy the strict definition of aliases.

^ This is helpful, but only once we clearly understand the definition of aliases.

We defer to the OSV maintainers on how Chainguard should be using the OSV format, of course! I'm just looking to make sure I can follow along intuitively.

@michaelkedar
Copy link
Member

I think Chainguard might be on a bit of an edge case with our alias vs related distinction, but I'll try give our rationale:

The primary purpose of the aliases field is to allow for de-duplication of the same vulnerability from different data sources. e.g. CVE-2024-6104, GHSA-v6v8-xj6m-xwqh, and GO-2024-2947 all describe the same thing. A user of the OSV API would generally want to consider these all together as a single vulnerability for things like scanning and VEX generation.

Since we can't realistically expect all our data sources to know and list the correct IDs from every other data source, OSV computes the complete set of aliases by transitively combining sets of aliases from each ingested record. This becomes problematic if misused, since unrelated vulnerabilities can become lumped together by transitive chains of aliases. #888 might provide some background.

For Chainguard, CGA-23mp-53xj-rxq8, CGA-2gpx-j9wf-x7gf, and about 30 others all consider themselves aliases of CVE-2024-6104. This makes OSV treat all of them as aliases of each other as well, which is probably not intentional.

From what I can tell, these are listed as aliases because the artifacts are all built with the vulnerable Go package. While this might be considered as ambiguously correct, it doesn't produce a desirable alias grouping and therefore should not be done.
To give my intuitive reasoning as to why aliases is wrong in this case: CGA-23mp-53xj-rxq8 contains CVE-2024-6104, but CGA-23mp-53xj-rxq8 is not CVE-2024-6104.

The related field is meant for informational purposes only, not de-duplication, and doesn't expand transitively so it's generally alright to add IDs there.

Aliases may be used for vulnerabilities affecting different packages or ecosystems as long as they follow this definition.

^ makes me lean back toward "well, maybe this is an alias?", because it actually allows for there to be distinct packages in distinct ecosystems, which was my only hesitation in satisfying the first constraint.

Admittedly, I'm not entirely sure what the schema's intent was for this line. I think it's there to allow aliases across ecosystems in the same way a single OSV record can contain multiple different ecosystems, but it is also a bit unclear to me.

Hopefully this helps to clear up why we want it this way. Sorry this comment became a bit of an essay 😅

@luhring
Copy link

luhring commented Jul 9, 2024

Thanks, this is incredibly helpful, @michaelkedar! 🙏

Here's my proposal for next steps:

  1. To help address the breakage among the broader OSV ecosystem right now, we'll move our current aliases data to the related field.
  2. To help with the ambiguity in the schema docs, I'll open a PR with some suggestions. If those suggestions turn out not to be helpful, no hard feelings here. It was a bit difficult to figure out the "right answer" from the docs alone, so if we can make things any easier for the next onboarded ecosystem, then that's great.
  3. It looks like the aliases documentation line in question was updated in Update aliases & related definitions ossf/osv-schema#193 — that was a great read. I share the concern expressed in that PR: There seems to be a "hole" in the OSV spec when it comes to distros' ability to participate. By moving to related, we're missing out on the opportunity to have strong, automation-usable links to the same vulnerability as described by our advisories. It seems like there should be a new field that's similar to aliases, but for strong "asymmetric" references, to help OSV better support vulnerability workflows beyond language ecosystems and into the world of distros. I can open an issue to capture this, and hopefully we'll have a good dialog there about potential improvements to the spec.

Please let me know how this sounds to you! 😃

luhring added a commit to luhring/wolfictl that referenced this issue Jul 9, 2024
See google/osv.dev#2374 for motivation.

Signed-off-by: Dan Luhring <[email protected]>
@andrewpollock
Copy link
Contributor

@luhring

2. To help with the ambiguity in the schema docs, I'll open a PR with some suggestions.

Please do, high quality documentation is immensely helpful with scaling. If it isn't easily comprehensible to you, it's very likely to be to other unfamiliar readers. Your contributions are very welcome.

I'll fork out your third point as an issue on the OSV-Schema so it doesn't get lost.

@andrewpollock
Copy link
Contributor

Hi @Moullisha thanks for flagging this, and sorry for the tangential noise with Chainguard record issues.

@andrewpollock andrewpollock changed the title Alias missing for some CVEs in OSV Always return the upstream aliases when no alias groups are generated Jul 10, 2024
@cpanato
Copy link

cpanato commented Jul 10, 2024

now we moved to use the related field: https://packages.cgr.dev/chainguard/osv/CGA-23mp-53xj-rxq8.json

thanks @luhring

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working data quality Issues with data quality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants