Platform SRE · Live status

Are all services healthy and within SLA right now?

Operational 10 May 2026 · 09:47 UTC

API Gateway · p99 142ms · 99.99% · Healthy

Auth Service · p99 68ms · 100% · Healthy

Database Cluster · p99 380ms · 99.82% · Degraded

CDN Edge · p99 18ms · 100% · Healthy

Payments · p99 96ms · 99.98% · Healthy

142 ms

12 ms vs 1h ago

SLA ceiling: 200ms · 29% headroom

0.08%

0.02 pp vs 1h ago

SLA budget: ≤ 0.10%

8,240 RPS

320 vs 1h ago

Peak today: 9,140 RPS at 08:12

99.94%

0.02 pp vs last 30d

SLA target: 99.9% · 43.8 min budget/mo
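The downtime budget follows from the SLA target as (1 − target) × period length. The sketch below is illustrative only: the function names are hypothetical, and the 730-hour average month is an assumption, since the dashboard does not state which period length it uses.

```python
def downtime_budget_minutes(sla_target_pct: float, period_hours: float = 730.0) -> float:
    """Allowed downtime, in minutes, for a given uptime target over one period.

    period_hours defaults to an average calendar month (assumed, not taken
    from the dashboard).
    """
    allowed_fraction = 1.0 - sla_target_pct / 100.0
    return allowed_fraction * period_hours * 60.0


def budget_consumed_fraction(observed_uptime_pct: float, sla_target_pct: float) -> float:
    """Share of the downtime budget already used, given the observed uptime."""
    allowed = 100.0 - sla_target_pct
    if allowed <= 0:
        raise ValueError("a 100% target leaves no downtime budget")
    return (100.0 - observed_uptime_pct) / allowed


if __name__ == "__main__":
    # Figures from the uptime card: 99.94% observed against a 99.9% target.
    print(f"{downtime_budget_minutes(99.9):.1f} min allowed per month")
    print(f"{budget_consumed_fraction(99.94, 99.9):.0%} of budget consumed")
```

A ratio below 1.0 from `budget_consumed_fraction` means the service is still inside its budget for the period.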

Incident	Service	Severity	Duration	On-call
DB latency spike	Database	SEV-2	24 min	A. García
Memory pressure	Worker pool	SEV-3	1h 12m	J. Kowalski
Slow query backlog	Database	SEV-3	38 min	A. García
CDN cache miss rate	CDN Edge	SEV-4	2h 04m	M. Chen

Latency percentiles (p50/p95/p99) come from Datadog APM and cover all API gateway requests. The 200ms SLA applies to p99. Error rate = HTTP 5xx responses ÷ total requests. Uptime = availability of the API gateway endpoint as measured by external monitors. All figures are synthetic and for illustration only.
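The definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the dashboard's actual pipeline: the function names are hypothetical, and the nearest-rank percentile used here is one common definition that may differ from how Datadog APM aggregates latency.

```python
import math
from typing import Sequence


def error_rate(count_5xx: int, total_requests: int) -> float:
    """HTTP 5xx responses divided by total requests, as a fraction."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return count_5xx / total_requests


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def sla_headroom(observed_ms: float, ceiling_ms: float = 200.0) -> float:
    """Fraction of the SLA ceiling still unused by the observed p99."""
    return (ceiling_ms - observed_ms) / ceiling_ms


if __name__ == "__main__":
    # Hypothetical request counts chosen to reproduce the 0.08% card value.
    print(f"error rate: {error_rate(8, 10_000):.2%}")
    # 142 ms is the p99 shown on the latency card.
    print(f"headroom:   {sla_headroom(142):.0%}")
```

Comparing `error_rate(...)` against 0.0010 and the p99 against 200 ms gives the two pass/fail checks the SLA cards summarize.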

Is API latency trending toward the 200ms SLA ceiling?

Is the error rate within the 0.10% SLA budget?

How is request throughput trending vs the 60-min average?

Active incidents requiring attention