Create response time thresholds for 'degraded' and 'failed' #837

Open
mxkaske opened this issue May 26, 2024 · 0 comments
Labels
👥 Core Team Prioritized for core team

Comments

mxkaske commented May 26, 2024

Allow the user to select the response time limits at which a monitor is considered degraded or failed.

Add two new db columns for monitor:

  • limitDegraded or thresholdDegraded (e.g.)
  • limitFailed or thresholdFailed (e.g.)

Both should be numbers that the user chooses (via a select or an input). We should enforce a maximum allowed value, e.g. 45 s (45,000 ms).
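A minimal sketch of how the two values could be validated before they are written to the new columns. The names thresholdDegraded, thresholdFailed, MAX_THRESHOLD_MS, and validateThresholds are placeholders (the column names are still undecided above), and the rule that the degraded limit must be lower than the failed limit is an assumption, not something decided in this issue.

```typescript
// Hypothetical cap, taken from the "e.g. 45 s" suggestion above.
const MAX_THRESHOLD_MS = 45000;

// Hypothetical shape; the final column names are still open.
type MonitorThresholds = {
  thresholdDegraded: number; // milliseconds
  thresholdFailed: number; // milliseconds
};

// Returns a list of human-readable problems; an empty list means valid.
function validateThresholds(t: MonitorThresholds): string[] {
  const errors: string[] = [];
  const fields: Array<[string, number]> = [
    ["thresholdDegraded", t.thresholdDegraded],
    ["thresholdFailed", t.thresholdFailed],
  ];
  for (const [name, value] of fields) {
    if (!Number.isInteger(value) || value <= 0) {
      errors.push(`${name} must be a positive integer (ms)`);
    } else if (value > MAX_THRESHOLD_MS) {
      errors.push(`${name} must not exceed ${MAX_THRESHOLD_MS} ms`);
    }
  }
  // Assumption: a monitor should become degraded before it fails.
  if (t.thresholdDegraded >= t.thresholdFailed) {
    errors.push("thresholdDegraded must be lower than thresholdFailed");
  }
  return errors;
}
```

The same check could back both a select (fixed options) and a free-form input, since it only looks at the resulting numbers.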

Right now, we have export const monitorStatus = ["active", "error"] as const; which is used within the monitor_status and monitor schemas.

We could extend it with "degraded" to include that information within a specific monitor or within a specific monitor_status (based on the region).
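As a rough sketch, the extended tuple and a per-check classification might look like the following. The function classifyPing and its parameters are hypothetical; treating a missing latency as "error", mapping 'failed' onto the existing "error" value, and using inclusive boundaries are all assumptions made for illustration.

```typescript
// The existing tuple, extended with "degraded" as proposed above.
const monitorStatus = ["active", "degraded", "error"] as const;
type MonitorStatus = (typeof monitorStatus)[number];

// Hypothetical helper: maps one response time to a status.
// latencyMs is null when the request did not complete (assumption).
function classifyPing(
  latencyMs: number | null,
  thresholdDegradedMs: number,
  thresholdFailedMs: number,
): MonitorStatus {
  if (latencyMs === null || latencyMs >= thresholdFailedMs) return "error";
  if (latencyMs >= thresholdDegradedMs) return "degraded";
  return "active";
}
```

The same function could be applied per region (for monitor_status) or to an aggregate value (for monitor), depending on where the status ends up being stored.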

Within a monitor overview, extend the cards:

Image

Important: right now, the displayed data comes from Tinybird. How should the 'degraded' values be calculated? Multiple options:

  1. extend our ping_response tb schema with a 'status' column and count the number of rows per status (the counts are then fixed, because the status is hardcoded at ingestion time)
  2. within our metrics_endpoints, pass the limits as props and count the rows based on those props (dynamic: if the user changes a threshold, the counts are recalculated accordingly)

Image

I like the second option as it is dynamic - and we don't have to extend the schema.
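To illustrate the second option, here is an in-memory sketch of counting with the limits passed in as parameters. In practice this logic would live in the Tinybird endpoint rather than in TypeScript; the Ping, Limits, and countByThreshold names are made up for this example.

```typescript
// Hypothetical shapes, for illustration only.
type Ping = { region: string; latencyMs: number };
type Limits = { degradedMs: number; failedMs: number };
type StatusCounts = { ok: number; degraded: number; failed: number };

// Counts pings per bucket using limits supplied at query time,
// so nothing about the status is stored with the ping itself.
function countByThreshold(pings: Ping[], limits: Limits): StatusCounts {
  const counts: StatusCounts = { ok: 0, degraded: 0, failed: 0 };
  for (const ping of pings) {
    if (ping.latencyMs >= limits.failedMs) {
      counts.failed += 1;
    } else if (ping.latencyMs >= limits.degradedMs) {
      counts.degraded += 1;
    } else {
      counts.ok += 1;
    }
  }
  return counts;
}

const pings: Ping[] = [
  { region: "ams", latencyMs: 100 },
  { region: "ams", latencyMs: 600 },
  { region: "iad", latencyMs: 3000 },
  { region: "iad", latencyMs: 700 },
];

// Same data, different limits: the counts change without touching the rows.
const strict = countByThreshold(pings, { degradedMs: 500, failedMs: 2000 });
const relaxed = countByThreshold(pings, { degradedMs: 1000, failedMs: 5000 });
```

This is the trade-off mentioned above: historical counts shift whenever the user edits a threshold, which may or may not be what we want on the overview cards.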

Open questions:

  • How should we notify users about degraded services? (Again, only if >50% of regions are degraded?) We might want to add a boolean db column to let users enable or disable these alerts.
  • How do we extend the Tracker class to get the current status of the monitors, including "degraded", and how do we display it on the status page?
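For the first open question, one possible rule could be sketched as follows. This is only an illustration of the ">50% of regions" idea; shouldNotifyDegraded and the alertsEnabled flag (standing in for the possible boolean column) are hypothetical names, and nothing here is decided.

```typescript
type RegionStatus = "active" | "degraded" | "error";

// Notifies only when alerts are enabled and a strict majority
// of regions currently report "degraded".
function shouldNotifyDegraded(
  regionStatuses: RegionStatus[],
  alertsEnabled: boolean,
): boolean {
  if (!alertsEnabled || regionStatuses.length === 0) return false;
  const degraded = regionStatuses.filter((s) => s === "degraded").length;
  return degraded / regionStatuses.length > 0.5;
}
```

With exactly half of the regions degraded this returns false, since the comparison is strictly greater than 50%.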
@mxkaske mxkaske added the 👥 Core Team Prioritized for core team label May 26, 2024