Enhanced benchmark.py for improved metrics handling and multi-commit … #31692

Open · wants to merge 2 commits into main

Conversation

@54J4N commented Jun 28, 2024

Changes:

  • Integrated enhanced metrics handling to capture decode.latency.mean, per_token.latency.mean, and per_token.throughput.value.
  • Implemented multi-commit support for benchmarking against specific commit references.
  • Incorporated functionality to summarize benchmark results into structured JSON reports for easier analysis and comparison.
  • Added support for uploading benchmark results to the HuggingFace team's results repository, making it easier to share and compare runs (a rough sketch of what such an upload could look like is shown below).
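
The upload step is not shown in the snippets below, so as a rough illustration only, here is a minimal sketch of what pushing a run directory of results to a Hub repository could look like. The repository id, folder path, and the use of `huggingface_hub.HfApi.upload_folder` are assumptions for illustration, not necessarily how this PR wires it up:

```python
from huggingface_hub import HfApi

# Illustrative sketch only: repo_id and folder_path are placeholders, not the PR's actual targets.
api = HfApi()
api.upload_folder(
    folder_path="benchmark_runs/2024-06-28_gemma-2b",    # local run directory containing benchmark_report.json / summary.json files
    repo_id="some-org/transformers-benchmark-results",   # hypothetical results repo on the Hub
    repo_type="dataset",
)
```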

@amyeroberts (Collaborator)

Hi @54J4N, thanks for opening a PR!

Could you provide some more context for these changes? Is there a discussion on the forums or related issue describing the motivations?

@54J4N (Author) commented Jun 28, 2024

Hi Amy,
These changes aim to enhance metrics handling, support multi-commit benchmarking, and provide structured JSON reports for better performance analysis across different code versions.

@amyeroberts (Collaborator)

@54J4N It would help us review this PR and decide whether it should be added to the codebase if you could provide some more information:

> enhance metrics handling

How does this change the metrics handling?

> support multi-commit benchmarking

Could you provide some more context? What does 'multi-commit' benchmarking mean here? Could you provide code snippets showing how this can and should be done?

> provide structured JSON reports

This is best shown by providing an example command and the desired output.

@54J4N (Author) commented Jun 28, 2024

Enhanced Metrics Handling
Changes Made:

• Integrated new metrics to capture detailed performance data:

  • decode.latency.mean: Captures the average latency during the decoding process.
  • per_token.latency.mean: Measures the average latency per token.
  • per_token.throughput.value: Calculates the throughput per token.

Impact:

  • These metrics offer finer granularity of performance insights, aiding in the precise identification of bottlenecks and optimization opportunities.

````python
# Imports needed for this snippet (the `Benchmark` class comes from the optimum-benchmark package).
import glob
import json
import os
import re
from pathlib import Path

from optimum_benchmark import Benchmark


def summarize(run_dir, metrics, expand_metrics=False):
    """Produce a summary for each optimum-benchmark launched job's output directory found in `run_dir`.

    Each summary's format is as follows (for `expand_metrics=False`):
    ```
    {
        "model": "google/gemma-2b",
        "commit": "3cd6ed22e4d49219f300f5055e71e3929aba20d7",
        "config": "benchmark.input_shapes.batch_size=1,benchmark.input_shapes.sequence_length=5",
        "metrics": {
            "decode.latency.mean": 1.624666809082031,
            "per_token.latency.mean": 0.012843788806628804,
            "per_token.throughput.value": 77.85864553330948
        }
    }
    ```
    """
    reports = glob.glob(os.path.join(run_dir, "**/benchmark_report.json"), recursive=True)
    report_dirs = [str(Path(report).parent) for report in reports]

    summaries = []
    for report_dir in report_dirs:
        commit = re.search(r"/commit=([^/]+)", report_dir).group(1)

        if not os.path.isfile(os.path.join(report_dir, "benchmark.json")):
            continue
        benchmark = Benchmark.from_json(os.path.join(report_dir, "benchmark.json"))
        report = benchmark.report

        model = benchmark.config.backend.get("model", "")

        # Extract benchmark name from directory path
        benchmark_name = os.path.basename(os.path.normpath(report_dir))
        if benchmark_name.startswith("commit="):
            benchmark_name = benchmark_name[len("commit="):]

        metrics_values = {}
        # Post-processing of report: extract selected metrics
        for metric in metrics:
            keys = metric.split(".")
            value = report.to_dict()  # work on a plain dict so nested metric keys can be looked up
            current = metrics_values
            for key in keys:
                if key not in value:
                    continue
                value = value[key]

                if expand_metrics:
                    if isinstance(value, dict):
                        if key not in current:
                            current[key] = {}
                        current = current[key]
                    else:
                        current[key] = value

            if not expand_metrics:
                metrics_values[metric] = value

        # Print summary details
        print(f"model: {model}")
        print(f"commit: {commit}")
        print(f"config: {benchmark_name}")
        if metrics_values:
            print("metrics:")
            if expand_metrics:
                print(metrics_values)
            else:
                for metric, value in metrics_values.items():
                    print(f"  - {metric}: {value}")
        print("-" * 80)

        summary = {
            "model": model,
            "commit": commit,
            "config": benchmark_name,
            "metrics": metrics_values,
        }
        summaries.append(summary)

        with open(os.path.join(report_dir, "summary.json"), "w") as fp:
            json.dump(summary, fp, indent=4)

    return summaries
````
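
For illustration, a minimal way to call the function above (the `run_dir` value is a hypothetical path; the metric names are the ones listed earlier):

```python
# Hypothetical usage sketch: run_dir points at the output directory of a benchmark run.
metrics = ["decode.latency.mean", "per_token.latency.mean", "per_token.throughput.value"]
summaries = summarize("benchmark_runs/2024-06-28_gemma-2b", metrics, expand_metrics=False)
# Each entry mirrors the summary.json written next to the corresponding benchmark_report.json.
```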

Support Multi-Commit Benchmarking
Changes Made:

  • Implemented functionality to benchmark multiple commits by checking out specific commits and running benchmarks for each.

Context:

  • Multi-commit benchmarking allows users to compare the performance of different code versions, providing insights into how changes impact performance over time.

```python
from contextlib import contextmanager

from git import Repo  # GitPython


@contextmanager
def checkout_commit(repo: Repo, commit_id: str):
    """
    Context manager that checks out a given commit when entered, but gets back to the reference it was at on exit.
    Args:
        repo (`git.Repo`): A git repository (for instance the Transformers repo).
        commit_id (`str`): The commit reference to checkout inside the context manager.
    """
    current_head = repo.head.commit if repo.head.is_detached else repo.head.ref
    try:
        repo.git.checkout(commit_id)
        yield
    finally:
        repo.git.checkout(current_head)

# Main-loop snippet: iterate over the requested commits and run the benchmark for each.
# (`args`, `current_head`, `repo`, `exp_run_dir`, `models` and `optimum_benchmark_args`
# are defined earlier in the script and are not reproduced here.)
commits = [x for x in args.commit if x]
if not commits:
    commits = [current_head]
elif len(commits) == 1 and commits[0] == "diff":
    commits = ["main", current_head]

for commit in commits:
    with checkout_commit(repo, commit):
        commit = str(repo.head.commit)

        commit_run_dir = exp_run_dir
        if exp_run_dir:
            commit_run_dir = os.path.join(exp_run_dir, f"commit={commit}")

        print(f"Running benchmark on commit: {commit}")

        for model in models:
            model_arg = [f"backend.model={model}"] if model else []
            dir_args = []
            if commit_run_dir:
                if "hydra.sweep.dir=" in optimum_benchmark_args:
                    optimum_benchmark_args[optimum_benchmark_args.index("hydra.sweep.dir=")] = f"hydra.sweep.dir={commit_run_dir}"
                else:
                    dir_args = [
                        f"hydra.sweep.dir={commit_run_dir}",
                        f"hydra.run.dir={commit_run_dir}/" + "${hydra.job.override_dirname}",
                    ]
            main(args.config_dir, args.config_name, model_arg + dir_args + optimum_benchmark_args)
```
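
For completeness, the `--commit` option used above could be declared along these lines. This argparse wiring is an assumption for illustration; the PR's actual argument definitions are not reproduced here:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical declaration: zero or more commit references, or the single value "diff"
# to compare `main` against the current HEAD, as handled by the loop above.
parser.add_argument(
    "--commit",
    nargs="*",
    default=[],
    help="Commit reference(s) to benchmark; pass 'diff' to compare main against the current head.",
)
# (Assumed) any remaining CLI tokens are forwarded to optimum-benchmark as `optimum_benchmark_args`.
args, optimum_benchmark_args = parser.parse_known_args()
```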

Providing Structured JSON Reports
Changes Made:

  • Added functionality to generate and summarize benchmark results into structured JSON reports for better analysis and comparison.

JSON Output:

```json
{
  "model": "google/gemma-2b",
  "commit": "9b9c7f03da625b13643e99205c691fe046461724",
  "config": "benchmark.input_shapes.batch_size=1,benchmark.input_shapes.sequence_length=5",
  "metrics": {
    "decode.latency.mean": 1.624666809082031,
    "per_token.latency.mean": 0.012843788806628804,
    "per_token.throughput.value": 77.85864553330948
  }
}
```
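
To show how these reports support comparison across commits, here is a minimal sketch of diffing the metrics in two commits' `summary.json` files. The paths and the percentage formula are illustrative and not part of the PR:

```python
import json

def load_summary(path):
    with open(path) as fp:
        return json.load(fp)

# Hypothetical paths to two runs of the same config on different commits.
base = load_summary("runs/commit=main/summary.json")
new = load_summary("runs/commit=9b9c7f03da625b13643e99205c691fe046461724/summary.json")

for metric, base_value in base["metrics"].items():
    new_value = new["metrics"].get(metric)
    if new_value is None:
        continue
    delta_pct = 100.0 * (new_value - base_value) / base_value
    print(f"{metric}: {base_value:.6f} -> {new_value:.6f} ({delta_pct:+.2f}%)")
```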
