feat: added ci script that updates mismatch database #4236

Open: wants to merge 8 commits into `main`. Changes from 6 commits.
`.github/workflows/update-mismatch.yml` (new file: 56 additions, 0 deletions)
```yaml
name: Update mismatch database

on:
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
  push:
    branches:
      - main
    paths:
      - 'data/**'

env:
  NO_EXIT_CVE_NUM: 1
  nvd_api_key: ${{ secrets.NVD_API_KEY }}

permissions:
  contents: read

jobs:
  linux:
    name: Update mismatch database
    runs-on: ubuntu-20.04
    timeout-minutes: 60
    steps:
      - name: Harden Runner
        uses: step-security/harden-runner@17d0e2bd7d51742c71671bd19fa12bdc9d40a3d6 # v2.8.1
        with:
          egress-policy: audit

      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
      - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5.1.0
        with:
          python-version: '3.10'
          cache: 'pip'
      - name: Get date
        id: get-date
        run: |
          echo "date=$(/bin/date -u "+%Y%m%d")" >> $GITHUB_OUTPUT
      - uses: actions/cache@0c45773b623bea8c8e75f6c82b208c3cf94ea4f9 # v4.0.2
        with:
          path: cache
          key: Linux-cve-bin-tool-${{ steps.get-date.outputs.date }}
      - name: Install cve-bin-tool
        run: |
          python -m pip install --upgrade pip
          python -m pip install --upgrade setuptools
          python -m pip install --upgrade wheel
          python -m pip install --editable .
      - name: Update database
        run: |
          [[ -e cache ]] && mkdir -p .cache && mv cache ~/.cache/cve-bin-tool
          python -m cve_bin_tool.cli test/assets/test-kerberos-5-1.15.1.out -u now
          cp -r ~/.cache/cve-bin-tool cache
      - name: Update mismatch database
        run: |
          python -m cve_bin_tool.mismatch_loader
```
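The `actions/cache` key in this workflow is built from the UTC date printed by `/bin/date -u "+%Y%m%d"`, so every run on the same UTC day resolves to the same key. A minimal Python sketch of that key derivation, for illustration only (`cache_key` is a hypothetical helper, not part of cve-bin-tool):

```python
from datetime import datetime, timedelta, timezone


def cache_key(now: datetime) -> str:
    """Mirror the workflow's key: Linux-cve-bin-tool-<UTC date as YYYYMMDD>."""
    # Convert to UTC first, matching the `-u` flag passed to `date`.
    utc = now.astimezone(timezone.utc)
    return f"Linux-cve-bin-tool-{utc:%Y%m%d}"


if __name__ == "__main__":
    # 01:30 at UTC+2 on June 4 is still June 3 in UTC.
    local = datetime(2024, 6, 4, 1, 30, tzinfo=timezone(timedelta(hours=2)))
    print(cache_key(local))  # → Linux-cve-bin-tool-20240603
```

Because the key only changes when the UTC date does, a run shortly after local midnight can still land on the previous day's cache entry.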
Contributor:

This is doing a full database update, which you probably don't want, and then it saves the data from that update but not the data from the mismatch loader. Here's a quick rework just to highlight what's going on and what should happen instead:

Suggested change

```diff
-      - name: Update database
-        run: |
-          [[ -e cache ]] && mkdir -p .cache && mv cache ~/.cache/cve-bin-tool
-          python -m cve_bin_tool.cli test/assets/test-kerberos-5-1.15.1.out -u now
-          cp -r ~/.cache/cve-bin-tool cache
-      - name: Update mismatch database
-        run: |
-          python -m cve_bin_tool.mismatch_loader
+      - name: Copy github cache into appropriate directory
+        run: |
+          [[ -e cache ]] && mkdir -p .cache && mv cache ~/.cache/cve-bin-tool
+      - name: Update mismatch database
+        run: |
+          python -m cve_bin_tool.mismatch_loader
+      - name: Save data back to github cache
+        run: |
+          cp -r ~/.cache/cve-bin-tool cache
```
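The point of the rework is ordering: only what is in `~/.cache/cve-bin-tool` when `cp -r` runs gets saved back to the GitHub cache, so the mismatch loader has to run before that copy. A minimal Python model of the three shell steps, using temporary directories in place of the runner's workspace and home directory; `restore`, `save`, and `fake_mismatch_loader` are hypothetical stand-ins, not cve-bin-tool functions:

```python
import shutil
import tempfile
from pathlib import Path


def restore(workspace_cache: Path, tool_cache: Path) -> None:
    """Model of `[[ -e cache ]] && mv cache ~/.cache/cve-bin-tool`."""
    if workspace_cache.exists():
        tool_cache.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(workspace_cache), str(tool_cache))


def save(tool_cache: Path, workspace_cache: Path) -> None:
    """Model of `cp -r ~/.cache/cve-bin-tool cache`."""
    shutil.copytree(tool_cache, workspace_cache)


def fake_mismatch_loader(tool_cache: Path) -> None:
    # Hypothetical stand-in for `python -m cve_bin_tool.mismatch_loader`:
    # it just writes a marker file into the tool's cache directory.
    tool_cache.mkdir(parents=True, exist_ok=True)
    (tool_cache / "mismatch.marker").write_text("loaded")


def run(save_after_loader: bool) -> bool:
    """Return True if the loader's output ends up in the saved workspace cache."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        workspace_cache = root / "workspace" / "cache"
        tool_cache = root / "home" / ".cache" / "cve-bin-tool"
        workspace_cache.mkdir(parents=True)
        (workspace_cache / "cve.db").write_text("restored")

        restore(workspace_cache, tool_cache)
        if save_after_loader:
            fake_mismatch_loader(tool_cache)
            save(tool_cache, workspace_cache)
        else:  # the order used in this PR's workflow
            save(tool_cache, workspace_cache)
            fake_mismatch_loader(tool_cache)
        return (workspace_cache / "mismatch.marker").exists()


if __name__ == "__main__":
    print("save before loader:", run(save_after_loader=False))
    print("save after loader:", run(save_after_loader=True))
```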

That said, don't accept that suggestion. You definitely want to take what's there and integrate the update into the existing cache-update.yml file instead.

The reason is mostly that I want to make sure only one job updates the date and data in the GitHub cache. Anything else is going to obscure any uptime issues with NVD, and those are already a pain to handle.

Longer-term, though, we shouldn't need to run a separate script to update the mismatch data. All of that should be integrated into what happens when you run `cve-bin-tool -u now`, so that it happens seamlessly. You probably want to treat mismatch as a `data_source`, similar to how purl2cpe is loaded.
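One way to read the `data_source` suggestion: a mismatch source would produce a `(data, source_name)` pair like the other entries in `self.data`, and `populate_db()` would dispatch on the name, the same way the `cvedb.py` change in this PR does for `"PURL2CPE"`. The sketch below is hypothetical throughout: `MismatchSource`, `get_data`, the `"MISMATCH"` label, the row shape, and the toy `Db` class are illustrations, not cve-bin-tool's actual data-source API.

```python
from dataclasses import dataclass, field


class MismatchSource:
    """Hypothetical data source that returns a (data, source_name) pair."""

    SOURCE_NAME = "MISMATCH"

    def __init__(self, rows):
        self.rows = rows  # hypothetical row shape: (purl, vendor) tuples

    def get_data(self):
        return self.rows, self.SOURCE_NAME


@dataclass
class Db:
    """Toy stand-in for CVEDB: holds (data, source_name) pairs and stores mismatch rows."""

    data: list = field(default_factory=list)
    mismatch: list = field(default_factory=list)

    def populate_db(self) -> None:
        for cve_data, source_name in self.data:
            # Dispatch on the source name, as populate_db() does for EPSS and PURL2CPE.
            if source_name == "MISMATCH" and cve_data is not None:
                self.mismatch.extend(cve_data)


if __name__ == "__main__":
    db = Db()
    db.data.append(MismatchSource([("pkg:pypi/example", "example_vendor")]).get_data())
    db.populate_db()
    print(db.mismatch)  # → [('pkg:pypi/example', 'example_vendor')]
```

With this shape, the mismatch data would be loaded during the normal update loop rather than by a separate `mismatch_loader` invocation.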

Contributor (author):

The idea behind a separate script was to make the mismatch database a standalone library, where users can add or remove data independently. If we make it a `data_source`, wouldn't it be tied to cve-bin-tool?

Contributor:

Yeah, I think we'll need to refactor a bit if we want it to work standalone, because of the database initialization: it's not really standalone if we have to run a full cve-bin-tool update to make it work. But let's get it working this way first.

I think what we eventually want is the ability to update any single data source separately (we've got an open issue for that), and we can probably solve that problem and this one at the same time.
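A sketch of the "update one data source at a time" idea, assuming each source can be represented by a name and a zero-argument fetch callable. Both the dict-of-callables shape and `select_sources` are hypothetical; cve-bin-tool's real sources and CLI options may look different.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Fetch = Callable[[], object]


def select_sources(
    sources: Dict[str, Fetch], only: Optional[Iterable[str]] = None
) -> List[Tuple[object, str]]:
    """Fetch all sources, or only the named ones, as (data, source_name) pairs."""
    wanted = set(sources) if only is None else set(only)
    unknown = wanted - set(sources)
    if unknown:
        raise ValueError(f"unknown data source(s): {sorted(unknown)}")
    # The filter runs before fetch(), so unselected sources are never contacted.
    return [(fetch(), name) for name, fetch in sources.items() if name in wanted]


if __name__ == "__main__":
    sources = {
        "NVD": lambda: "nvd rows",
        "PURL2CPE": lambda: "purl2cpe rows",
        "MISMATCH": lambda: "mismatch rows",
    }
    print(select_sources(sources, only=["MISMATCH"]))  # → [('mismatch rows', 'MISMATCH')]
```

Filtering before fetching is the property a standalone mismatch update would need: refreshing one source should not require a full update of the others.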

`cve_bin_tool/cvedb.py` (3 additions, 1 deletion)

```diff
@@ -526,7 +526,6 @@ def populate_db(self) -> None:
         self.populate_metrics()
         # EPSS uses metrics table to get the EPSS metric id.
         # It can't be run before creation of metrics table.
-        self.populate_purl2cpe()
 
         for idx, data in enumerate(self.data):
             _, source_name = data
@@ -539,6 +538,9 @@ def populate_db(self) -> None:
             # if source_name != "NVD" and cve_data[0] is not None:
             # cve_data = self.update_vendors(cve_data)
 
+            if source_name == "PURL2CPE":
+                self.populate_purl2cpe()
+
             if source_name == "EPSS":
                 if cve_data is not None:
                     self.store_epss_data(cve_data)
```