Add Grafana stack #2549

Open, wants to merge 10 commits into base: master

@jawadqur jawadqur commented May 14, 2024

Link to JIRA ticket if there is one:
https://ctds-planx.atlassian.net/browse/GPE-1272

New Features

  • Replace the old Prometheus stack with Grafana's LGTM stack (Loki, Grafana, Tempo, Mimir).

  • This deploys the following new services in the monitoring namespace:

    • Mimir
    • Loki
    • Tempo (optional)
    • Grafana

These services provide enhanced observability into Gen3 deployments and store their data in S3.

Details:

Loki:

  • Loki is a log aggregation system designed to store and query logs efficiently.
  • It indexes only the metadata (labels) attached to logs, not the log text itself, which reduces storage costs and keeps the index small.
  • Integrated with Grafana for easy visualization and dashboard creation.
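
Because only labels are indexed, a Loki query typically starts with a label selector and then filters the matching log lines by content. The sketch below builds such a LogQL query and a URL for Loki's `query_range` HTTP endpoint. The base URL and the `app` and `namespace` label names are hypothetical; the labels actually available depend on how this deployment ships its logs.

```python
from urllib.parse import urlencode


def _quote(value):
    """Quote a string for LogQL, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_logql(labels, contains=None):
    """Build a LogQL query: a label selector plus an optional line filter.

    The selector uses the (indexed) labels to pick log streams; the
    `|=` filter then matches against the (unindexed) log text.
    """
    selector = "{" + ", ".join(
        f"{name}={_quote(value)}" for name, value in sorted(labels.items())
    ) + "}"
    if contains is None:
        return selector
    return f"{selector} |= {_quote(contains)}"


def build_loki_query_url(base_url, labels, contains=None, limit=100):
    """Build a URL for Loki's /loki/api/v1/query_range endpoint."""
    params = {"query": build_logql(labels, contains), "limit": limit}
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical in-cluster address and labels, for illustration only.
    print(build_loki_query_url(
        "http://loki.monitoring.svc:3100",
        {"namespace": "default", "app": "my-service"},
        contains="error",
    ))
```

Keeping the selector narrow matters here: the line filter scans every log line in the selected streams, so fewer streams means less data to read.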

Mimir:

  • Mimir is a scalable and highly available time-series database compatible with Prometheus.
  • It serves PromQL queries and is designed to run as multiple replicas for high availability.
  • Supports long-term storage and efficient data retrieval.
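
Prometheus compatibility means Mimir can be queried through the same HTTP API paths that Prometheus exposes. The sketch below builds an instant-query request. It assumes Mimir's default `/prometheus` path prefix and, for multi-tenant setups, the `X-Scope-OrgID` header; the host name and tenant name are hypothetical, and this deployment may configure either differently.

```python
from urllib.parse import urlencode


def build_mimir_instant_query(base_url, promql, tenant=None, prefix="/prometheus"):
    """Return (url, headers) for a Prometheus-style instant query against Mimir.

    `prefix` is assumed to be Mimir's default Prometheus API prefix.
    `tenant` is only needed when multi-tenancy is enabled.
    """
    query_string = urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}{prefix}/api/v1/query?{query_string}"
    headers = {}
    if tenant is not None:
        # Tenant ID header used by Mimir when multi-tenancy is on.
        headers["X-Scope-OrgID"] = tenant
    return url, headers


if __name__ == "__main__":
    # Hypothetical address and tenant, for illustration only.
    url, headers = build_mimir_instant_query(
        "http://mimir.monitoring.svc:8080", "up", tenant="demo"
    )
    print(url, headers)
```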

Tempo (optional):

  • Tempo is a distributed tracing backend that stores and queries trace data.
  • It helps track requests as they flow through various services, which aids performance monitoring and troubleshooting.
  • Deployment is optional; when enabled, it integrates with Grafana for trace visualization.

Grafana:

  • Grafana is a multi-platform open-source analytics and interactive visualization web application.
  • Provides a unified interface for visualizing data from Loki, Mimir, and Tempo.
  • Supports the creation of detailed, customizable dashboards for monitoring and alerting.
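
To make the "unified interface" concrete, the sketch below builds the JSON payloads one could send to Grafana's data source HTTP API (`POST /api/datasources`) to register the three backends. This is an illustration, not how this PR wires things up: the data source names and URLs are hypothetical. Mimir is registered with the `prometheus` type because it speaks the Prometheus query API.

```python
import json


def datasource_payload(name, ds_type, url, is_default=False):
    """Build one data source definition in the shape Grafana's API accepts."""
    return {
        "name": name,
        "type": ds_type,
        "access": "proxy",  # Grafana's backend proxies the requests
        "url": url,
        "isDefault": is_default,
    }


def lgtm_datasources(loki_url, mimir_url, tempo_url=None):
    """Return payloads for Mimir and Loki, plus Tempo when a URL is given."""
    payloads = [
        datasource_payload("Mimir", "prometheus", mimir_url, is_default=True),
        datasource_payload("Loki", "loki", loki_url),
    ]
    if tempo_url is not None:
        # Tempo is optional in this stack, so it is only added on request.
        payloads.append(datasource_payload("Tempo", "tempo", tempo_url))
    return payloads


if __name__ == "__main__":
    # Hypothetical in-cluster URLs, for illustration only.
    print(json.dumps(
        lgtm_datasources(
            "http://loki.monitoring.svc:3100",
            "http://mimir.monitoring.svc:8080/prometheus",
        ),
        indent=2,
    ))
```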

Together, these additions give Gen3 deployments logs, metrics, and (optionally) traces in one place, which should make performance analysis and troubleshooting easier.

Breaking Changes

Bug Fixes

Improvements

Dependency updates

Deployment changes

@jawadqur jawadqur marked this pull request as ready for review May 20, 2024 19:25