SLO Tracker
Centralized observability platform reducing Mean Time to Detection by 40% across 50+ engineering teams.
Overview
Challenge
Fidelity's 500+ microservices lacked unified reliability metrics. Teams couldn't track SLOs consistently, leading to slow incident detection and alert fatigue.
Solution
Built an end-to-end SLO platform with alerts-as-code, error budget tracking, and automated breach notifications. Instrumented all services with OpenTelemetry and added a React dashboard for visibility.
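To make the error budget idea concrete, here is a minimal Python sketch of how remaining budget and a breach check could be computed from request counts. It is an illustration only, not the platform's actual code: the `SLO` class, its fields, and the function names are hypothetical, and it assumes a simple request-based SLO (successful requests divided by total requests over a window).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLO definition; the field names are illustrative."""
    name: str
    target: float  # required success ratio, e.g. 0.999 for "three nines"


def error_budget_remaining(slo: SLO, total: int, failed: int) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means untouched, 0.0 means exactly spent, negative means breached.
    """
    if total == 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    allowed_failures = (1.0 - slo.target) * total
    if allowed_failures == 0:
        # A 100% target leaves no budget: any failure is a breach.
        return 1.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


def is_breached(slo: SLO, total: int, failed: int) -> bool:
    """True once more failures have occurred than the budget allows."""
    return error_budget_remaining(slo, total, failed) < 0


# Example: a 99% target over 10,000 requests allows about 100 failures,
# so 50 failures spends roughly half of the budget.
checkout = SLO(name="checkout-availability", target=0.99)
remaining = error_budget_remaining(checkout, total=10_000, failed=50)
breached = is_breached(checkout, total=10_000, failed=150)
```

A breach notification could then be triggered whenever `is_breached` returns true for the current window. Keeping the SLO definition as plain data is what makes an alerts-as-code workflow possible, since targets can be reviewed and versioned like any other source file.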
Impact
Reduced Mean Time to Detection by 40% across 50+ teams. A high-cardinality metrics proof of concept cut alert noise by 50%, saving 20 engineering hours per week.
Tech Stack
Python, FastAPI, AWS Lambda, PostgreSQL, React, TypeScript, Terraform, OpenTelemetry
Key Metrics
- 40% reduction in Mean Time to Detection
- 50+ engineering teams onboarded
- 20 engineering hours/week saved in alert triage