Disaster Recovery

This is the operational runbook for restoring Orimora after a real incident. It is the companion to Backup & Restore, which covers what is backed up and how backups are scheduled. Keep a copy of this page, and of your recovery code, somewhere you can reach when the instance is down.

Metric | Target | Notes
RPO (max acceptable loss) | 24 h | Daily backup at 03:00 UTC (BACKUP_SCHEDULE).
RTO (max acceptable outage) | 4 h | Manual restore + boot; achievable with this runbook in hand.
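
To check the RPO target against an actual artifact, the file's modification time can be compared with the 24 h window. The following is a minimal sketch, assuming the artifact's modification time reflects when the backup ran (worth confirming for off-site copies, since not every rclone backend preserves it). `backup_is_fresh` is a hypothetical helper, not part of Orimora.

```shell
# Hypothetical helper: succeeds if FILE was modified within the last HOURS hours.
# Relies on `find -mmin`, which GNU, BSD, and BusyBox find all provide.
backup_is_fresh() {
  file="$1"
  hours="$2"
  [ -n "$(find "$file" -mmin "-$(( hours * 60 ))" 2>/dev/null)" ]
}

# Example (path is a placeholder): warn if a dump is older than the 24 h RPO.
# backup_is_fresh "$BACKUP_PATH/your-latest.dump" 24 || echo "backup older than RPO" >&2
```

A missing file also counts as "not fresh", because `find` prints nothing for it.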
A restore needs three things:

  1. The latest DB dump (orimora-db-<ts>.dump, pg_dump custom format).
  2. The latest uploads archive (uploads-<ts>.tar.gz).
  3. The recovery code: the age private key (AGE-SECRET-KEY-…) shown once at setup under Settings → Admin → Backups.

Off-site artifacts are named *.age (encrypted). Local on-host copies under BACKUP_PATH are plaintext (the host already holds the live DB) — if you still have the host volume, skip the decrypt step.

The PostgreSQL client must match the server major version (16) — the image ships postgresql16-client. age and rclone are bundled in the image too.
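
If you restore from a machine other than the app image, it can help to confirm the client version before touching the database. The following is a sketch that parses the output of `pg_restore --version`. `pg_major` and `require_pg_major` are hypothetical helper names, and the expected output shape ("pg_restore (PostgreSQL) 16.x") is the usual one but may carry a distro suffix.

```shell
# Extract the major version from a line like "pg_restore (PostgreSQL) 16.4".
pg_major() {
  printf '%s\n' "$1" | sed -n 's/^.*(PostgreSQL) \([0-9][0-9]*\).*$/\1/p'
}

# Succeeds only if the version line reports the wanted major version.
require_pg_major() {
  want="$1"
  have="$(pg_major "$2")"
  [ "$have" = "$want" ]
}

# Usage before step 3 of the restore:
# require_pg_major 16 "$(pg_restore --version)" || { echo "pg_restore is not v16" >&2; exit 1; }
```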

# 0. Stop the app so nothing writes during the restore.
docker compose stop app
# 1. Fetch the latest off-site artifacts (example: an rclone remote named "offsite").
rclone copy offsite:orimora ./restore --include '*.age'
# 2. Decrypt with the recovery code (age private key).
#    Replace <ts> with the artifact's timestamp first; a literal <ts> is parsed by the shell as a redirect.
( umask 077; printf '%s\n' "$AGE_SECRET_KEY" > /tmp/orimora-recovery.key ) # key file readable by the current user only
age -d -i /tmp/orimora-recovery.key -o ./restore/db.dump ./restore/orimora-db-<ts>.dump.age
age -d -i /tmp/orimora-recovery.key -o ./restore/uploads.tar.gz ./restore/uploads-<ts>.tar.gz.age
shred -u /tmp/orimora-recovery.key # don't leave the key on disk
# 3. Restore the database (into a clean DB).
# Either drop+recreate the existing DB, or restore into a fresh one and repoint DATABASE_URL.
pg_restore --clean --if-exists --no-owner --no-privileges \
--dbname="$DATABASE_URL" ./restore/db.dump
# 4. Restore uploads (unpacks back to ./uploads/...).
tar -xzf ./restore/uploads.tar.gz -C /app # adjust target to your uploads parent
# 5. Start the app. The entrypoint runs migrations fail-closed before serving.
docker compose up -d app

Verify after boot: login works, a document with an attachment renders its image (blob resolved), and Settings → Admin → Backups health is green.
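
Before step 3 of the procedure above overwrites anything, it may be worth confirming that both decrypted artifacts are readable. The following is a sketch using `pg_restore --list` (prints the dump's table of contents without restoring) and `tar -tzf` (lists the archive without unpacking). The two function names are hypothetical.

```shell
# Succeeds if pg_restore can read the dump's table of contents.
db_dump_ok() {
  pg_restore --list "$1" >/dev/null 2>&1
}

# Succeeds if the gzip-compressed tar can be listed end to end.
uploads_archive_ok() {
  tar -tzf "$1" >/dev/null 2>&1
}

# Usage, after the decrypt step:
# db_dump_ok ./restore/db.dump || { echo "DB dump unreadable" >&2; exit 1; }
# uploads_archive_ok ./restore/uploads.tar.gz || { echo "uploads archive unreadable" >&2; exit 1; }
```

A truncated download or a decrypt with the wrong key would typically fail here, before the live database has been modified.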

Scenario | What happened | Action
(a) Volume / host loss | The server or its disk is gone. | Provision a new host, install the image, pull off-site *.age, run the full procedure.
(b) Corrupt DB | DB unreadable but host intact. | Use the local plaintext dump (skip decrypt); pg_restore --clean into the same DB.
(c) Bad migration | A migration broke the schema/data. | Restore the most recent pre-incident dump, then redeploy the fixed build.
(d) Ransomware / full loss | Host compromised, local backups gone. | Restore from the off-site copy only. This is why off-site + encryption are required.
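
For scenario (b), the restore starts from the newest plaintext dump under BACKUP_PATH. The following is a small sketch of picking it by modification time. `latest_artifact` is a hypothetical helper, and a flat directory layout under BACKUP_PATH is an assumption.

```shell
# Print the most recently modified file in DIR matching GLOB (empty if none).
# $2 is left unquoted on purpose so the shell expands the glob.
latest_artifact() {
  ls -t "$1"/$2 2>/dev/null | head -n 1
}

# Usage:
# dump="$(latest_artifact "$BACKUP_PATH" 'orimora-db-*.dump')"
# [ -n "$dump" ] || { echo "no local dump found" >&2; exit 1; }
```

For scenario (c), the newest dump may already contain the bad migration's effects, so pick the file by its timestamp relative to the incident rather than simply the latest one.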

Off-site targets (S3-free) — rclone recipes

Off-site uses an rclone remote (BACKUP_RCLONE_REMOTE). rclone reads its config from rclone.conf or RCLONE_CONFIG_* env. Prefer key/password backends (no OAuth on a headless server). Set up once with rclone config, then set the remote path.

# SFTP to a second machine (most universal — anyone with a second Linux box)
[offsite]
type = sftp
host = backup.example.com
user = orimora
key_file = /home/orimora/.ssh/id_ed25519
# → BACKUP_RCLONE_REMOTE=offsite:orimora-backups

# Backblaze B2 (cheap object storage, S3-free, key-based)
[offsite]
type = b2
account = <keyId>
key = <applicationKey>
# → BACKUP_RCLONE_REMOTE=offsite:my-orimora-bucket

# WebDAV (Nextcloud / generic)
[offsite]
type = webdav
url = https://cloud.example.com/remote.php/dav/files/orimora/
vendor = nextcloud
user = orimora
pass = <obscured — set via `rclone obscure`>
# → BACKUP_RCLONE_REMOTE=offsite:orimora-backups
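
The same remote can also be defined without an rclone.conf file, through the RCLONE_CONFIG_* environment variables, which follow the pattern RCLONE_CONFIG_<REMOTE>_<OPTION>. The sketch below mirrors the SFTP recipe; the host, user, and key path are the same placeholders as in that recipe.

```shell
# Define a remote named "offsite" purely through environment variables.
export RCLONE_CONFIG_OFFSITE_TYPE=sftp
export RCLONE_CONFIG_OFFSITE_HOST=backup.example.com
export RCLONE_CONFIG_OFFSITE_USER=orimora
export RCLONE_CONFIG_OFFSITE_KEY_FILE=/home/orimora/.ssh/id_ed25519
export BACKUP_RCLONE_REMOTE=offsite:orimora-backups

# To confirm the remote is reachable, list its contents:
# rclone lsf "$BACKUP_RCLONE_REMOTE"
```

This form suits container deployments where the configuration already lives in the compose environment.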

Run a full restore drill at least monthly, and after any change to the backup pipeline. CI runs the round-trip restore and the encrypt→off-site round-trip on every push; the monthly drill exercises the real, end-to-end manual procedure above against a throwaway environment.

Checklist:

  1. Off-site artifacts present and recent.
  2. Decrypt with the recovery code.
  3. pg_restore into a throwaway DB.
  4. Table count matches source.
  5. A known document and its attachment resolve.
  6. Record the wall-clock time — it must be ≤ RTO (4 h).
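
Items 4 and 6 of the checklist can be scripted. The following is a sketch under stated assumptions: `table_count` and `within_rto` are hypothetical helpers, SOURCE_URL and DRILL_URL are placeholder connection URLs, and "table count" is taken to mean base tables outside the system schemas.

```shell
# Count user tables in the database at the given connection URL.
table_count() {
  psql "$1" -Atc "SELECT count(*) FROM information_schema.tables
                  WHERE table_type = 'BASE TABLE'
                    AND table_schema NOT IN ('pg_catalog', 'information_schema')"
}

# Succeeds if (END - START) seconds fits inside an RTO given in hours.
within_rto() {
  start="$1"
  end="$2"
  rto_hours="$3"
  [ "$(( end - start ))" -le "$(( rto_hours * 3600 ))" ]
}

# Drill outline:
# t0="$(date +%s)"
# ... run the restore into the throwaway DB ...
# [ "$(table_count "$SOURCE_URL")" = "$(table_count "$DRILL_URL")" ] || echo "table count mismatch" >&2
# within_rto "$t0" "$(date +%s)" 4 || echo "drill exceeded the 4 h RTO" >&2
```

Matching table counts only show that the schema arrived intact, so the document and attachment check in item 5 is still needed to confirm the data.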