Observability

Orimora exposes everything you need to run it like a production service: a Prometheus metrics endpoint, health probes for your orchestrator, optional error tracking to a Sentry-compatible backend, and a correlation ID on every request and log line for tracing. Metrics and error tracking are off by default and enabled through environment variables; the health probes and correlation IDs are always on. See Configuration for the full table.

| Concern | Mechanism | Enabled by |
| --- | --- | --- |
| Metrics | GET /api/metrics (Prometheus) | METRICS_TOKEN |
| Liveness | GET /api/live | always on |
| Readiness | GET /api/ready | always on |
| Error tracking | Sentry-compatible ingest | SENTRY_DSN |
| Request tracing | X-Correlation-Id header + logs | always on |

GET /api/metrics returns metrics in the Prometheus text exposition format. It is disabled until you set METRICS_TOKEN, and then requires that token as a bearer credential:

curl -s -H "Authorization: Bearer $METRICS_TOKEN" \
  https://your-orimora.example.com/api/metrics

Without the header (or with the wrong token) the endpoint returns 401; if METRICS_TOKEN is unset it returns 404. The response is never cached.
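A scraper or smoke test can tell these three outcomes apart. The following is a minimal Python sketch that assumes only the behaviour described here; the helper names `build_metrics_request` and `describe_metrics_status` are illustrative and not part of Orimora.

```python
import urllib.request


def build_metrics_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for /api/metrics with the bearer token attached."""
    url = base_url.rstrip("/") + "/api/metrics"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def describe_metrics_status(status: int) -> str:
    """Map the documented status codes to a human-readable diagnosis."""
    if status == 200:
        return "ok"
    if status == 401:
        return "missing or wrong bearer token"
    if status == 404:
        return "metrics disabled: METRICS_TOKEN is not set on the server"
    return f"unexpected status {status}"
```

To send the request, pass it to `urllib.request.urlopen`. A 401 or 404 is raised as `urllib.error.HTTPError`, and its `code` attribute can be handed to `describe_metrics_status`.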

On top of the default Node.js/process metrics from prom-client (process_*, nodejs_* — CPU, heap, event-loop lag, open handles), Orimora emits:

| Metric | Type | Labels | Meaning |
| --- | --- | --- | --- |
| http_requests_total | counter | route, method, status | Total HTTP requests |
| http_request_duration_seconds | histogram | route, method | Request latency distribution |
| auth_attempts_total | counter | method, outcome | Sign-in attempts (magic-link/SSO/MFA) by outcome (success or failure) |
| rate_limit_blocks_total | counter | bucket | Requests rejected by a rate limiter |
| audit_events_total | counter | action, outcome | Audit events recorded |

These cover the golden signals (traffic, latency, errors) plus security-relevant counters you’ll want alerts on — e.g. a spike in auth_attempts_total{outcome="failure"} or rate_limit_blocks_total.
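For a quick look at these counters without a Prometheus server, the text exposition output can be summed by label directly. The sketch below is deliberately simple: it ignores escaped quotes and braces inside label values, and the sample payload and its label values (`magic-link`, `sso`, `login`) are made up for illustration.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches lines such as: auth_attempts_total{method="sso",outcome="failure"} 3
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def sum_by_label(text: str, metric: str, label: str) -> Dict[str, float]:
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals: Dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        m = SAMPLE_RE.match(line)
        if m is None or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        totals[labels.get(label, "")] += float(m.group("value"))
    return dict(totals)


# Hypothetical scrape output, not captured from a real instance.
SAMPLE = """\
# HELP auth_attempts_total Sign-in attempts
# TYPE auth_attempts_total counter
auth_attempts_total{method="magic-link",outcome="success"} 40
auth_attempts_total{method="sso",outcome="success"} 50
auth_attempts_total{method="sso",outcome="failure"} 10
rate_limit_blocks_total{bucket="login"} 2
"""

by_outcome = sum_by_label(SAMPLE, "auth_attempts_total", "outcome")
failure_share = by_outcome["failure"] / sum(by_outcome.values())
```

In production the same aggregation is better expressed in PromQL, for example `sum by (outcome) (auth_attempts_total)`.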

# Scrape config (prometheus.yml)
scrape_configs:
  - job_name: orimora
    metrics_path: /api/metrics
    scheme: https
    authorization:
      type: Bearer
      credentials: <METRICS_TOKEN>
    static_configs:
      - targets: ['your-orimora.example.com']

# Alert rules (a separate file listed under rule_files)
groups:
  - name: orimora
    rules:
      - alert: OrimoraHighAuthFailures
        expr: rate(auth_attempts_total{outcome="failure"}[5m]) > 1
        for: 10m
      - alert: OrimoraHighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 10m
      - alert: OrimoraSlowRequests
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 15m
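To get a feel for the thresholds in these rules, it helps to see what `rate()` measures: the per-second increase of a counter over the window. The sketch below is a simplification (Prometheus's own `rate()` also extrapolates to the window edges, which is omitted here), and the sample values are invented.

```python
from typing import Sequence, Tuple


def simple_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp_seconds, value) samples.

    Returns 0.0 when there are fewer than two samples; Prometheus would
    return no result at all in that case.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count the new value.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Made-up scrapes of auth_attempts_total{outcome="failure"}, one per minute.
failures = [(0, 100), (60, 190), (120, 280), (180, 370), (240, 460), (300, 550)]
would_fire = simple_rate(failures) > 1  # same comparison as OrimoraHighAuthFailures
```

Note that Prometheus evaluates each rule per label combination, and an alert only fires once its condition has held for the whole `for:` duration.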

Two endpoints drive container orchestration; both are always available and need no token:

  • GET /api/live: pure liveness. Returns 200 whenever the process is up. Use it to decide whether to restart the container. It does not touch the database or Redis.
  • GET /api/ready: readiness. Checks the database and Redis and returns 200 only when both are reachable, otherwise 503. Use it to decide whether to route traffic.

The bundled Docker health check uses /api/ready. (/api/health remains as a backward-compatible alias.) See Deployment for the Compose wiring.

# Kubernetes
livenessProbe:
  httpGet: { path: /api/live, port: 3000 }
readinessProbe:
  httpGet: { path: /api/ready, port: 3000 }
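The division of labour between the two probes can be summarised as a small decision function. This is an illustrative sketch rather than orchestrator code: `Action` and `decide` are hypothetical names, and real orchestrators such as Kubernetes also apply failure thresholds before acting.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RESTART = "restart the container"
    HOLD_TRAFFIC = "keep running, stop routing traffic"
    SERVE = "route traffic"


def decide(live_status: Optional[int], ready_status: Optional[int]) -> Action:
    """Combine probe results; None means the probe got no HTTP response."""
    if live_status != 200:
        # The process itself is not answering: only a restart can help.
        return Action.RESTART
    if ready_status != 200:
        # Process is up but the database or Redis is unreachable (503).
        return Action.HOLD_TRAFFIC
    return Action.SERVE
```

Keeping liveness free of dependency checks means a database or Redis outage takes the instance out of rotation without triggering a restart loop.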

Set SENTRY_DSN to forward unexpected 5xx server errors and SSO authentication failures to any Sentry-compatible backend: sentry.io, a self-hosted Sentry, or GlitchTip. Switching between them only requires changing the DSN, so there is no vendor lock-in. Routine 4xx responses are never sent. Two further variables add context to each event:

  • SENTRY_ENVIRONMENT — defaults to NODE_ENV; tag staging vs production.
  • SENTRY_RELEASE — e.g. a git SHA; groups issues by deployed version.
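The forwarding policy described in this section can be written down as a predicate. This is only a restatement of the documented rules under assumed names (`should_report` and its parameters are hypothetical), not Orimora's actual implementation, and the DSN in the usage lines is a made-up placeholder.

```python
from typing import Optional


def should_report(dsn: Optional[str], status: int, sso_auth_failure: bool = False) -> bool:
    """Return True for events that this section says are forwarded."""
    if not dsn:
        # Error tracking is off until SENTRY_DSN is set.
        return False
    if sso_auth_failure:
        # SSO authentication failures are forwarded regardless of status code.
        return True
    # Server errors are forwarded; routine 4xx responses are not.
    return 500 <= status <= 599


# Placeholder DSN in the usual scheme://key@host/project shape.
EXAMPLE_DSN = "https://publickey@errors.example.invalid/1"
forwarded = should_report(EXAMPLE_DSN, 502)
ignored = should_report(EXAMPLE_DSN, 404)
```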

Every request is tagged with a correlation ID. If the incoming request carries an X-Correlation-Id header it is reused; otherwise one is generated. The ID is:

  • returned on the response as X-Correlation-Id, and
  • attached to every structured log line emitted while handling that request.

When error tracking is enabled the same ID is attached to the captured event, so you can pivot from a log line or an HTTP response straight to the corresponding error. Put a load balancer or gateway in front that injects X-Correlation-Id to trace a request end-to-end across services.
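An upstream service or gateway that wants to take part in this scheme only needs the reuse-or-generate rule described above. A minimal sketch follows; the function names are illustrative, and generating a UUID is an assumption, since the format Orimora uses for its own IDs is not specified here.

```python
import uuid
from typing import Dict, Mapping

HEADER = "X-Correlation-Id"


def ensure_correlation_id(headers: Mapping[str, str]) -> str:
    """Reuse the caller's correlation ID if present, otherwise mint a new one."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == HEADER.lower() and value.strip():
            return value.strip()
    return str(uuid.uuid4())  # assumed format; any unique string would do


def outgoing_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Headers to attach when forwarding the request to the next service."""
    return {HEADER: ensure_correlation_id(incoming)}
```

Logging the same value on the calling side lets you join its log lines with Orimora's for a single request.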