Observability

Orimora exposes everything you need to run it like a production service: a Prometheus metrics endpoint, health probes for your orchestrator, optional error tracking to a Sentry-compatible backend, and a correlation ID on every request and log line for tracing. Health probes and correlation IDs are always on; metrics and error tracking are off by default and enabled via environment variables (see Configuration for the full table).

At a glance

Concern	Mechanism	Enabled by
Metrics	`GET /api/metrics` (Prometheus)	`METRICS_TOKEN`
Liveness	`GET /api/live`	always on
Readiness	`GET /api/ready`	always on
Error tracking	Sentry-compatible ingest	`SENTRY_DSN`
Request tracing	`X-Correlation-Id` header + logs	always on

Prometheus metrics

GET /api/metrics returns metrics in the Prometheus text exposition format. It is disabled until you set METRICS_TOKEN, and then requires that token as a bearer credential:

curl -s -H "Authorization: Bearer $METRICS_TOKEN" \
  https://your-orimora.example.com/api/metrics

Without the header (or with the wrong token) the endpoint returns 401; if METRICS_TOKEN is unset it returns 404. The response is never cached.
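The status-code rules above can be summarised as a small decision function. This is a minimal sketch, not Orimora's actual source: the name `metricsAuthStatus` and its parameters are hypothetical, and it assumes the token arrives as a standard `Authorization: Bearer <token>` header.

```typescript
// Hypothetical helper illustrating the documented rules for GET /api/metrics.
type MetricsAuthStatus = 200 | 401 | 404;

function metricsAuthStatus(
    configuredToken: string | undefined, // value of METRICS_TOKEN, if set
    authorizationHeader: string | undefined, // incoming Authorization header
): MetricsAuthStatus {
    // The endpoint does not exist until METRICS_TOKEN is set.
    if (!configuredToken) return 404;

    // Expect "Bearer <token>"; HTTP auth scheme names are case-insensitive.
    const match = /^Bearer\s+(\S+)$/i.exec((authorizationHeader ?? "").trim());
    if (!match) return 401;

    // Plain equality keeps the sketch dependency-free; a real server would
    // likely prefer a constant-time comparison.
    return match[1] === configuredToken ? 200 : 401;
}
```

A monitoring client can use the distinction when debugging a failing scrape: 404 suggests the variable is not set on the server, while 401 suggests the scraper is sending a missing or wrong token.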

Exposed metrics

On top of the default Node.js/process metrics from prom-client (process_*, nodejs_* — CPU, heap, event-loop lag, open handles), Orimora emits:

Metric	Type	Labels	Meaning
`http_requests_total`	counter	`route, method, status`	Total HTTP requests
`http_request_duration_seconds`	histogram	`route, method`	Request latency distribution
`auth_attempts_total`	counter	`method, outcome`	Sign-in attempts (magic-link/SSO/MFA) by outcome, e.g. `outcome="failure"`
`rate_limit_blocks_total`	counter	`bucket`	Requests rejected by a rate limiter
`audit_events_total`	counter	`action, outcome`	Audit events recorded

These cover three of the four golden signals (traffic, latency, errors) plus security-relevant counters worth alerting on, for example a spike in auth_attempts_total{outcome="failure"} or rate_limit_blocks_total.
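For a quick check outside Prometheus, the text returned by the endpoint can be read directly. The sketch below sums every sample of one metric whose labels match a filter. The function name `sumSamples` is hypothetical, and the label values in `sample` (`magic-link`, `sso`, `login`) and all numbers are made up for illustration; only the metric and label names come from the table above. It handles the common `name{labels} value` line shape, not every corner of the exposition format.

```typescript
// Sum the values of every sample of `metric` whose labels include `wanted`.
function sumSamples(
    exposition: string,
    metric: string,
    wanted: Record<string, string> = {},
): number {
    let total = 0;
    for (const rawLine of exposition.split("\n")) {
        const line = rawLine.trim();
        if (line === "" || line.charAt(0) === "#") continue; // skip blank lines and HELP/TYPE comments

        // Matches "name{labels} value" or "name value".
        const m = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/.exec(line);
        if (!m || m[1] !== metric) continue;

        // Collect key="value" pairs from the label section, if there is one.
        const labels: Record<string, string> = {};
        const pairRe = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;
        let pair: RegExpExecArray | null;
        while ((pair = pairRe.exec(m[2] ?? "")) !== null) {
            labels[pair[1]] = pair[2];
        }

        const matches = Object.keys(wanted).every((k) => labels[k] === wanted[k]);
        if (matches) total += Number(m[3]);
    }
    return total;
}

// Illustrative scrape output; label values and counts are invented.
const sample = [
    "# TYPE auth_attempts_total counter",
    'auth_attempts_total{method="magic-link",outcome="success"} 40',
    'auth_attempts_total{method="magic-link",outcome="failure"} 3',
    'auth_attempts_total{method="sso",outcome="failure"} 2',
    'rate_limit_blocks_total{bucket="login"} 7',
].join("\n");

const failedSignIns = sumSamples(sample, "auth_attempts_total", { outcome: "failure" });
```

Counters only ever grow, so a single reading like this says little on its own; comparing two readings taken some time apart gives the rate that an alert would act on.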

Scrape config

scrape_configs:
    - job_name: orimora
      metrics_path: /api/metrics
      scheme: https
      authorization:
          type: Bearer
          credentials: <METRICS_TOKEN>
      static_configs:
          - targets: ['your-orimora.example.com']

Starter alerts

groups:
    - name: orimora
      rules:
          - alert: OrimoraHighAuthFailures
            expr: rate(auth_attempts_total{outcome="failure"}[5m]) > 1
            for: 10m
          - alert: OrimoraHighErrorRate
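            # Fraction of all requests answered with a 5xx over the last 5 minutes
            expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05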
            for: 10m
          - alert: OrimoraSlowRequests
            expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
            for: 15m

Health probes

Two endpoints drive container orchestration; both are always available and need no token:

GET /api/live — pure liveness. Returns 200 whenever the process is up. Use it to decide whether to restart the container. It does not touch the database or Redis.
GET /api/ready — readiness. Checks the database and Redis and returns 200 only when both are reachable, otherwise 503. Use it to decide whether to route traffic.

The bundled Docker health check uses /api/ready. (/api/health remains as a backward-compatible alias.) See Deployment for the Compose wiring.
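The difference between the two probes can be sketched as follows. This is an illustration under assumptions rather than Orimora's implementation: `livenessStatus`, `readinessStatus`, and the `DependencyCheck` type are hypothetical names, and each check stands in for something like a database ping or a Redis `PING`.

```typescript
// A dependency check resolves when the dependency is reachable and rejects otherwise.
type DependencyCheck = () => Promise<unknown>;

interface ReadinessResult {
    status: 200 | 503;
    failed: string[]; // names of the checks that did not succeed
}

// Liveness touches no dependencies: if this code runs, the process is up.
function livenessStatus(): 200 {
    return 200;
}

// Readiness is 200 only when every dependency check succeeds, otherwise 503.
async function readinessStatus(
    checks: Record<string, DependencyCheck>,
): Promise<ReadinessResult> {
    const names = Object.keys(checks);
    const outcomes = await Promise.all(
        names.map((name) =>
            Promise.resolve()
                .then(() => checks[name]()) // also catches checks that throw synchronously
                .then(() => true, () => false),
        ),
    );
    const failed = names.filter((_, i) => !outcomes[i]);
    return { status: failed.length === 0 ? 200 : 503, failed };
}
```

Running the checks in parallel keeps the probe about as slow as its slowest dependency. The sketch has no timeout, so a real probe would probably bound each check to stay within the orchestrator's probe deadline.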

# Kubernetes
livenessProbe:
    httpGet: { path: /api/live, port: 3000 }
readinessProbe:
    httpGet: { path: /api/ready, port: 3000 }

Error tracking

Set SENTRY_DSN to forward unexpected 5xx server errors and SSO authentication failures to any Sentry-compatible backend — sentry.io, a self-hosted Sentry, or GlitchTip all work with only the DSN changing (no vendor lock-in). Routine 4xx responses are never sent.

SENTRY_ENVIRONMENT — defaults to NODE_ENV; tag staging vs production.
SENTRY_RELEASE — e.g. a git SHA; groups issues by deployed version.

Request correlation IDs

Every request is tagged with a correlation ID. If the incoming request carries an X-Correlation-Id header it is reused; otherwise one is generated. The ID is:

returned on the response as X-Correlation-Id, and
attached to every structured log line emitted while handling that request.

When error tracking is enabled the same ID is attached to the captured event, so you can pivot from a log line or an HTTP response straight to the corresponding error. To trace a request end-to-end across services, have a load balancer or gateway in front of Orimora inject X-Correlation-Id.
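The reuse-or-generate rule can be sketched like this. The names `resolveCorrelationId` and `defaultId` and the `correlationId` log field are hypothetical, chosen for illustration; the ID format Orimora actually generates is not specified here, so the generator is passed in as a parameter.

```typescript
const CORRELATION_HEADER = "x-correlation-id";

// Fallback generator for the sketch. It is not cryptographically strong;
// a real service would more likely use a UUID.
function defaultId(): string {
    return Date.now().toString(36) + "-" + Math.random().toString(36).slice(2, 10);
}

function resolveCorrelationId(
    headers: Record<string, string | undefined>,
    generate: () => string = defaultId,
): string {
    // HTTP header names are case-insensitive, so compare in lower case.
    for (const name of Object.keys(headers)) {
        const value = headers[name];
        if (name.toLowerCase() === CORRELATION_HEADER && value && value.trim() !== "") {
            return value.trim(); // reuse the caller's ID
        }
    }
    return generate(); // no usable header: mint a new ID
}

// The same ID goes on the response and on each log line for the request.
const id = resolveCorrelationId({ "X-Correlation-Id": "abc-123" });
const responseHeaders = { "X-Correlation-Id": id };
const logLine = JSON.stringify({ level: "info", msg: "request handled", correlationId: id });
```

Because an incoming header is untrusted input that ends up in logs, an implementation may also want to cap its length or restrict its character set before reusing it.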