[Bug] Gaps in sum and avg aggregations when joining histogram quantile with pod labels #7466

Open
velavokr opened this issue Jun 17, 2024 · 0 comments

#2736 (comment)

Thanos, Prometheus and Golang version used:
thanos helm chart: 15.0.5
kube-prometheus-stack helm chart: 57.2.0
Chart.yaml and values.yaml for them: https://gist.github.com/velavokr/e8410555385db7bc9b1a1c184fe99b72

Object Storage Provider:
s3

What happened:
A problem similar to the one described in the comment linked above.

After joining the metric with kube_pod_labels to filter by an additional label, the sum and avg aggregations started showing gaps. The other aggregations (max and min) still have no gaps.

How it happens:

  1. Have one pod under a small but constant load (5 rps) that emits histogram bucket counter metrics. The metrics are collected by Prometheus. There is no gap in the aggregated metric.
  2. Spin up a second, similar pod and put much more load on it (100 rps), so its bucket counters grow 20 times faster.
  3. Stop applying load to the second pod. Its bucket counters stop growing. The first pod is still loaded at 5 rps.
  4. A new gap starts in the aggregated metric.
  5. Delete the second pod.
  6. The aggregated metric still has the gap up to the time the second pod was deleted, but no gap after that time.

The PromQL expression used:
avg(histogram_quantile(0.5, sum(rate(my_latency_bucket[$__rate_interval])) by (le,pod)) * on(pod) group_right(le) kube_pod_labels{my_pod_label="my_pod_label_value"})
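One possible explanation, which nothing in this report confirms: the Prometheus documentation states that histogram_quantile returns NaN when the histogram has 0 observations, which is what an idle pod with flat bucket counters would produce. Multiplying by kube_pod_labels keeps the NaN, and sum and avg then propagate it, while upstream Prometheus min and max skip NaN unless every sample is NaN. Dashboards typically draw a NaN result as a gap. The Python sketch below only models that arithmetic. The function names and the sample values are hypothetical, and it assumes the Thanos query engine follows the upstream Prometheus semantics.

```python
import math

NAN = float("nan")


def prom_sum(values):
    # Plain floating-point addition: a single NaN makes the whole sum NaN.
    total = 0.0
    for v in values:
        total += v
    return total


def prom_avg(values):
    # The average inherits NaN from the sum.
    return prom_sum(values) / len(values)


def prom_min(values):
    # Assumed upstream behaviour: a NaN accumulator is replaced by any
    # real value, and a NaN sample never wins a comparison.
    result = values[0]
    for v in values[1:]:
        if v < result or math.isnan(result):
            result = v
    return result


def prom_max(values):
    # Same NaN handling as prom_min, with the comparison reversed.
    result = values[0]
    for v in values[1:]:
        if v > result or math.isnan(result):
            result = v
    return result


# Hypothetical per-pod median latencies (seconds) at one evaluation step.
loaded_pod = 0.12  # first pod, still receiving 5 rps
idle_pod = NAN     # second pod after load stops: no observations in the window

# Joining with kube_pod_labels multiplies each sample by 1; NaN * 1 is still NaN.
joined = [v * 1.0 for v in (loaded_pod, idle_pod)]

print("sum:", prom_sum(joined))
print("avg:", prom_avg(joined))
print("min:", prom_min(joined))
print("max:", prom_max(joined))
```

If this is what is happening, the gap would begin when the second pod goes idle (step 3) and end when its series disappears (step 5), matching the timeline above. One thing to try would be dropping NaN samples before aggregating, for example by appending `>= 0` to the histogram_quantile(...) result; this is an untested suggestion.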

Here is the unaggregated metric. There are a few pods coming and going:
Screenshot from 2024-06-16 21-50-06

Here is the result of min. No gaps, as expected:
Screenshot from 2024-06-16 21-49-27

And here is the result of avg. Notice the gaps:
Screenshot from 2024-06-16 21-49-43

What you expected to happen:
No gaps in aggregations if there are no gaps in the underlying metrics.

How to reproduce it (as minimally and precisely as possible):
I'm unsure.

Full logs to relevant components:
Cannot find the logs relevant to the expression evaluated.

Anything else we need to know:
