Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grouping multiple unrelated Git repos into a single "affected" entry is confusing #2334

Open
Bo98 opened this issue Jun 21, 2024 · 7 comments
Assignees
Labels
documentation Improvements or additions to documentation frontend Frontend Infrastructure

Comments

@Bo98
Copy link

Bo98 commented Jun 21, 2024

Example: CVE-2020-8927

Multiple unrelated git repos are grouped into a single affected entry, which leads to a confusing "affected versions" list which seemingly only applies to the first git repo (at least in this example - is this guaranteed?).

Perhaps it would be more clear if the git repos were separated into individual entries with their own versions array?

@michaelkedar
Copy link
Member

@andrewpollock would be the best person to address this, but he's out-of-office until next week.

@michaelkedar michaelkedar added the data quality Issues with data quality label Jun 25, 2024
Copy link

✨ Thank you for your interest in OSV.dev's data quality! ✨

Please review our FAQ entry on how to most efficiently have this addressed.

@andrewpollock
Copy link
Contributor

Hi @Bo98

I'm sorry you're finding this behaviour confusing.

Do you have any suggestions on what would be helpful here?
An FAQ entry with signposting from the individual vulnerability page?

(please reopen with any concrete suggestions if you have them)

which seemingly only applies to the first git repo (at least in this example - is this guaranteed?)

In short, this is not a correct interpretation, see below.

The relevant code:

osv.dev/osv/impact.py

Lines 529 to 602 in dd9566e

def _analyze_git_ranges(repo_analyzer: RepoAnalyzer, checkout_path: str,
affected_range: vulnerability_pb2.Range,
new_versions: set, commits: set, new_introduced: set,
new_fixed: set) -> tuple[set, set]:
"""Analyze Git ranges.
Args:
repo_analyzer: an instance of RepoAnalyzer to use.
checkout_path: If defined, used in lieu of cloning the repo.
affected_range: the GIT range from the vulnerability.
new_versions: a set that will be in-place modified to contain any new
versions detected by analysis.
commits: a set that will be in-place modified to contain any commits???
new_introduced: a set that will be in-place modified to contain additional
introduced commits determined by cherry-pick detection.
new_fixed: a set that will be in-place modified to contain additional fixed
commits determined determined by cherry-pick detection.
Returns:
A tuple of the set of new_versions and commits
"""
package_repo = None
with tempfile.TemporaryDirectory() as package_repo_dir:
if checkout_path:
repo_name = os.path.basename(
affected_range.repo.rstrip('/')).rstrip('.git')
package_repo = repos.ensure_updated_checkout(
affected_range.repo, os.path.join(checkout_path, repo_name))
else:
package_repo = repos.clone_with_retries(affected_range.repo,
package_repo_dir)
all_introduced = []
all_fixed = []
all_last_affected = []
all_limit = []
for event in affected_range.events:
if event.introduced and event.introduced != '0':
all_introduced.append(event.introduced)
continue
if event.last_affected:
all_last_affected.append(event.last_affected)
continue
if event.fixed:
all_fixed.append(event.fixed)
continue
if event.limit:
all_limit.append(event.limit)
continue
try:
result = repo_analyzer.get_affected(package_repo, all_introduced,
all_fixed, all_limit,
all_last_affected)
except ImpactError:
logging.warning('Got error while analyzing git range in %s: %s',
affected_range.repo, traceback.format_exc())
return new_versions, commits
for introduced, fixed, _ in result.affected_ranges:
if introduced and introduced not in all_introduced:
new_introduced.add(introduced)
if fixed and fixed not in all_fixed:
new_fixed.add(fixed)
new_versions.update(result.tags)
commits.update(result.commits)
return new_versions, commits

So looking at https://api.osv.dev/v1/vulns/CVE-2020-8927 specifically:

curl -s https://api.osv.dev/v1/vulns/CVE-2020-8927| | jq '.affected[9].versions'
[
  "v0.1.0",
  "v0.2.0",
  "v0.3.0",
  "v0.4.0",
  "v0.6.0",
  "v1.0.0",
  "v1.0.0-rc1",
  "v1.0.0-rc2",
  "v1.0.1",
  "v1.0.2",
  "v1.0.3",
  "v1.0.4",
  "v1.0.5",
  "v1.0.6",
  "v1.0.7",
  "v1.1",
  "v1.1.0",
  "v1.1.0-preview1",
  "v1.1.1",
  "v1.1.2",
  "v1.1.4",
  "v2.0.0",
  "v2.0.0-preview1",
  "v2.0.0-preview2"
]

is derived from the relevant tags present in any of the three of the repos from the relevant ranges:

curl -s https://api.osv.dev/v1/vulns/CVE-2020-8927| jq -r '.affected[9].ranges[].repo'
https://github.com/dotnet/core
https://github.com/google/brotli
https://github.com/powershell/powershell

Proof:

$ git ls-remote -t https://github.com/dotnet/core | fgrep -f <(curl -s https://api.osv.dev/v1/vulns/CVE-2020-8927 | jq -r '.affected[9].versions[]')
1d31854df431523a4699a4a94dc924b76317f47b        refs/tags/v1.0.0
6251598c0087239b392cccda68515cd8c71d7f3d        refs/tags/v1.0.0^{}
6030c481946d683a68cd024d8b600483047783ca        refs/tags/v1.0.0-rc1
247cf6d5c4e91ea2b53182d152aa8eda5754aa1c        refs/tags/v1.0.0-rc2
4222ca3903679bbb680ef44abcde9a260ca97479        refs/tags/v1.0.1
0f3c68d295798951f6d96925f452718721b6f550        refs/tags/v1.0.1^{}
2cfe0ed1939ef93ebc245cc6bf6cbadde578075e        refs/tags/v1.0.10
cedfbca6aa645a550a3e9d221943395c9db7c510        refs/tags/v1.0.11
9bdb9f2628ae2d96fdd4a19aaa7707f1d5ec494f        refs/tags/v1.0.12
22422b4c3fe95777faf8feaebf098230291bd0f4        refs/tags/v1.0.13
ec9ff2d8f486d3ec5ee3c00b122b0c4a5f9eebc8        refs/tags/v1.0.14
9c95f339ba7e29c25cfb6753255bdb5cadea9957        refs/tags/v1.0.15
3dacb1d9cc36f1c9bd0fadb00f56ac356f7469be        refs/tags/v1.0.16
408817dd19dc3effd7e6f9657b1c31022d7a536d        refs/tags/v1.0.2
427b2f13a3b218d5fa309f6a12fa7b8b307ce20d        refs/tags/v1.0.2^{}
6474336beb517e5c5610791a30fa5cd35123dd93        refs/tags/v1.0.3
c6efffe6c7cb220af3819b4a9dabe687023984e0        refs/tags/v1.0.3^{}
1e76df0f0173a04819bfdd5f3f8c49a95e683bf2        refs/tags/v1.0.4
f0bd1c2180483dde5d5283639176f98918d1e3d4        refs/tags/v1.0.4^{}
11f502a023e97148333f90d4800d12d76240ef6e        refs/tags/v1.0.5
aaea89e8d9d2496291ff53396d0eccbca839ae83        refs/tags/v1.0.7
218888387c3e7aff6241f65599529620fb7b4127        refs/tags/v1.1
c00ac07535cc9c37cd8addfc8772e016fe6f5539        refs/tags/v1.1.0
6251598c0087239b392cccda68515cd8c71d7f3d        refs/tags/v1.1.0^{}
6c0b42a5c14815a3c6c66873c769103fe5592f28        refs/tags/v1.1.0-preview1
8b11243b41c6d44ed7f654b55d858001c129e22d        refs/tags/v1.1.1
f0bd1c2180483dde5d5283639176f98918d1e3d4        refs/tags/v1.1.1^{}
22422b4c3fe95777faf8feaebf098230291bd0f4        refs/tags/v1.1.10
ec9ff2d8f486d3ec5ee3c00b122b0c4a5f9eebc8        refs/tags/v1.1.11
9c95f339ba7e29c25cfb6753255bdb5cadea9957        refs/tags/v1.1.12
3dacb1d9cc36f1c9bd0fadb00f56ac356f7469be        refs/tags/v1.1.13
11f502a023e97148333f90d4800d12d76240ef6e        refs/tags/v1.1.2
aaea89e8d9d2496291ff53396d0eccbca839ae83        refs/tags/v1.1.4
8b41c63650f1c1223ce82de5d9040c1f7ff470c1        refs/tags/v1.1.5
d33e85105ecef7814cfff53852e89d7366ff2876        refs/tags/v1.1.6
cedfbca6aa645a550a3e9d221943395c9db7c510        refs/tags/v1.1.7
cedfbca6aa645a550a3e9d221943395c9db7c510        refs/tags/v1.1.8
9bdb9f2628ae2d96fdd4a19aaa7707f1d5ec494f        refs/tags/v1.1.9
f3c64ec2d713c053b20bcf90e962cd7fb161370c        refs/tags/v2.0.0
61c2be531cf5e36940baa5c67233b0a01c51e1dd        refs/tags/v2.0.0-preview1
a6a596009df17884a8260b5575e014fe55846d1a        refs/tags/v2.0.0-preview2

$ git ls-remote -t https://github.com/google/brotli | fgrep -f <(https://api.osv.dev/v1/vulns/CVE-2020-8927 | jq -r '.affected[9].versions[]')
d811b186c5037b434d56ddb831ceccdf5a954687        refs/tags/v0.1.0
7f7a2fb48cec63c0459ec6b6e7260810bfb01819        refs/tags/v0.2.0
98ed7a23a83d64133b0a36a884e489bffb0eb864        refs/tags/v0.3.0
29d31d5921b0a2b323ac24e7f7d0cdc9a3c0dd08        refs/tags/v0.4.0
46c1a881b41bb638c76247558aa04b1591af3aa7        refs/tags/v0.6.0
c60563591a9a86196f19987c81dde4384a088861        refs/tags/v1.0.0
5b4769990dc14a2bd466d2599c946c5652cba4b2        refs/tags/v1.0.1
0ad94eed00420bf1154cb16a289aa27efbb30c01        refs/tags/v1.0.2
533843e3546cd24c8344eaa899c6b0b681c8d222        refs/tags/v1.0.3
c6333e1e79fb62ea088443f192293f964409b04e        refs/tags/v1.0.4
b601fe817bd3217cb144bbb380a43cae8e847388        refs/tags/v1.0.5
6eba239a5bb553fd557b7d78f7da8f0059618b9e        refs/tags/v1.0.6
d6d98957ca8ccb1ef45922e978bb10efca0ea541        refs/tags/v1.0.7
ed738e842d2fbdf2d6459e39267a633c4a9b2f5d        refs/tags/v1.1.0
2a5a088b03ba5fd3aab4f34338c84e2c61d82c49        refs/tags/v1.1.0rc

$ git ls-remote -t https://github.com/powershell/powershell | fgrep -f <(https://api.osv.dev/v1/vulns/CVE-2020-8927 | jq -r '.affected[9].versions[]')
9bcbca3700ded4c2d99f8c5b2ce8b313130b162c        refs/tags/v0.1.0
704704140414dcc7f734e30c7fa1073fff5dd9a3        refs/tags/v0.1.0^{}
42afa34d2e4809f722b28e550aea44c649ebd314        refs/tags/v0.2.0
21401f852ef446d6de41b906b533a385ae83ba1e        refs/tags/v0.2.0^{}
9ec48565842980cb762f84778cb6bbed825d3546        refs/tags/v0.3.0
2629fff55ed6cf6665007fd15ea97d3cd203f465        refs/tags/v0.3.0^{}
43f2223fb24e0163b1a2a16d9bae16d968973aff        refs/tags/v0.4.0
85f45399c3c8bcef2f47f8dd1bc2cb414b0798c3        refs/tags/v0.4.0^{}
89d002d460e6d6dc32e91e97b4d80d04897a4ec1        refs/tags/v0.6.0
43f07425e9e2c92dd14c7d1f823e4aeec91dd56a        refs/tags/v0.6.0^{}

@andrewpollock andrewpollock added documentation Improvements or additions to documentation frontend Frontend Infrastructure and removed data quality Issues with data quality labels Jul 2, 2024
@Bo98
Copy link
Author

Bo98 commented Jul 2, 2024

Your example demonstrates the problem a bit.

The v1.1.0 tag is ambiguous. You identified it to apply to dotnet/core and google/brotli. However this is not correct as the last affected version for google/brotli is v1.0.7. It's seemingly impossible to use the versions array to identify affected versions unless the tag format used in each of the repos have no overlap.

Not sure what happened to the powershell/powershell repo as affected range there is v7.0.0 to <v7.0.9 according to the git commit ranges. That's why I intially interpreted it to be the first repo only.

@Bo98
Copy link
Author

Bo98 commented Jul 8, 2024

The problem really is merging git repos into one entry defeats the purpose of versions. The schema states:

The versions list is generally recommended to always be present, to allow software to easily answer the question “is this specific version affected?” without having to contain code specific to every different ecosystem.

As there's no information to tell which repo v1.1.0 applies to, it's impossible to reliably use the versions array here without querying the git repos and thus have code specific to the git ecosystem.

I've updated my code to ignore any vulnerability with multiple git repos. Does mean I lose some CVE coverage but is likely the best I can do here.

@oliverchang
Copy link
Collaborator

oliverchang commented Jul 11, 2024

Thanks for the feedback @Bo98 !

I think there's a great point to be made here to separate the affected entries by git repo.

That said -- could you please explain a bit how you are making use of the versions array here? Our intention was for consumers to use the commit hashes for unambiguous matching instead of using the version tags in a programmatic way.

@oliverchang oliverchang reopened this Jul 11, 2024
@Bo98
Copy link
Author

Bo98 commented Jul 11, 2024

I have a set of packages/dependencies where I know what git tag is being used but would need to clone the repo in order to do a commit comparison. In particular, I may not even have the git repo cloned.

For example, I may have a brotli dependency that uses https://github.com/google/brotli/archive/refs/tags/v1.1.0.tar.gz. From that I know the git tag - it's v1.1.0. The idea was to use the versions array to know whether this is affected or not. If it were commit hashes instead, I would need to fetch the repo and do git-specific comparisons I ordinarily would not need to do for other ecosystems (and is difficult to do so without significantly slowing down comparisons). The schema currently recommends the versions array as the way to do this, where available.

I already use versions array (and semver ranges) for PyPI etc dependencies so the same version comparison code is already used for all non-git ecosystems and it would be useful for git ecosystem to work the same as the others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation frontend Frontend Infrastructure
Projects
None yet
Development

No branches or pull requests

4 participants