Cleanup stale metrics from time to time #32

Open
flaviostutz opened this issue May 12, 2021 · 4 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@flaviostutz
Member

flaviostutz commented May 12, 2021

After just a single observation of a metric, it is reported forever, even if its count stays frozen at "1", "2", etc. for days or months. When Prometheus scrapes and processes these "stale" metrics, it compares each one to its previous value in the datastore (which will be the same) and simply discards it. Now imagine you have hundreds of error-info messages, or even thousands of different paths, that are no longer used but are still returned in every /metrics scrape, wasting CPU and network resources until you restart the server. This is happening to us in production.

Proposal

Perform a "soft reset" of all in-memory metrics every 48h in order to reduce stale metrics. All metrics are erased, and on its next observation a metric starts again at "1" (Prometheus is designed to handle this kind of discontinuity/reset in a series).
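
To make the proposal concrete, here is a minimal sketch of a registry that wipes all of its counters once a configurable interval has elapsed. It is not taken from the library: `SoftResetRegistry`, `observe`, `collect`, and the injectable `clock` parameter are all hypothetical names chosen for illustration.

```python
import threading
import time
from collections import defaultdict


class SoftResetRegistry:
    """Hypothetical in-memory counter registry with a periodic soft reset."""

    def __init__(self, reset_interval_seconds=48 * 3600, clock=time.monotonic):
        self._clock = clock  # injectable so the reset can be tested without waiting
        self._reset_interval = reset_interval_seconds
        self._last_reset = clock()
        self._counters = defaultdict(int)
        self._lock = threading.Lock()

    def _maybe_reset(self):
        # Caller must hold self._lock.
        now = self._clock()
        if now - self._last_reset >= self._reset_interval:
            # Drop every series; stale ones never come back, active ones
            # reappear at 1 on their next observation.
            self._counters.clear()
            self._last_reset = now

    def observe(self, name, labels=()):
        with self._lock:
            self._maybe_reset()
            self._counters[(name, tuple(labels))] += 1

    def collect(self):
        """Return a snapshot of the series that a /metrics scrape would expose."""
        with self._lock:
            self._maybe_reset()
            return dict(self._counters)
```

The reset is checked lazily on each `observe` and `collect` call, so no background thread is needed; the trade-off is that the wipe only happens when the registry is actually touched.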

@gilliardmacedo
Member

@flaviostutz If a counter resets and then increases between two scrapes, is this meter still reliable?

Micrometer supports unregistering metrics. But I know it is very difficult to keep the records needed to decide whether a metric is stale or not.
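
The reliability question can be illustrated with a simplified model of how a counter's increase is derived from scraped samples: any drop in value is treated as a reset, and the post-reset value is counted in full. This is only a sketch; the `increase` helper below is a hypothetical stand-in and leaves out details of Prometheus's real `increase()` such as extrapolation.

```python
def increase(samples):
    """Total increase over a list of scraped counter values.

    A sample lower than its predecessor is assumed to be a counter reset,
    so the counter is taken to have restarted from 0.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            # Reset detected: count the whole new value.
            total += cur
        else:
            total += cur - prev
    return total


# Reset is visible (8 -> 2), so nothing observed after the reset is lost.
detected = increase([5, 8, 2, 4])

# Counter was reset after the scrape that saw 5 and then climbed to 7 before
# the next scrape. The drop is invisible, so only 2 is reported instead of 7.
undetected = increase([5, 7])
```

Under this model, a reset followed by a rise past the previously scraped value within one scrape interval goes unnoticed and is undercounted. With a reset only every 48h, that would need a counter to exceed its 48h total within a single scrape interval.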

@flaviostutz
Member Author

flaviostutz commented May 12, 2021 via email

@CarlosPanarello

@flaviostutz This is already possible today using MetricRegistry and executing the clean method. In both libs we can reset all metrics this way, and we already do it in tests. Maybe we can create a scheduled process that executes this clean at a default interval, and add it to both libs.
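
A scheduler along these lines could look roughly like the sketch below. It assumes only that the registry exposes some zero-argument clean callable; `PeriodicCleaner` and its parameters are hypothetical names, and the libs' actual MetricRegistry API is not shown here.

```python
import threading


class PeriodicCleaner:
    """Hypothetical background scheduler that calls `clean` at a fixed interval."""

    def __init__(self, clean, interval_seconds=48 * 3600):
        self._clean = clean  # assumed: a zero-argument callable that wipes the registry
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called,
        # so the loop cleans once per interval until it is stopped.
        while not self._stop.wait(self._interval):
            self._clean()
```

Usage would be something like `PeriodicCleaner(registry.clean).start()` at application startup, where `registry` is whatever object holds the metrics. The thread is a daemon so it never blocks process shutdown.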

@flaviostutz
Member Author

flaviostutz commented May 13, 2021 via email


3 participants